VariantFiltration and SelectVariants (Mitochondrial mode) - Missing True Positive Variants
Hi, I am using GATK 4.2.6.1 in mitochondrial mode. Everything works well up to the FilterMutectCalls step, but when I apply hard filtering with (1) SelectVariants and (2) VariantFiltration, I lose a few true-positive variants.
####
Here are the commands I used for hard filtering after FilterMutectCalls:
# Step 1_a: SelectVariants_SNP (Hard_Filter)
java -jar /data/applications/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar SelectVariants -V FiterMutectcalls.vcf -select-type SNP -O SelectVariants_SNP.vcf
# Step 1_b: VariantFiltration_SNP (snps_filtered)
java -jar /data/applications/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar VariantFiltration -V SelectVariants_SNP.vcf --filter-expression 'QD < 2.0' --filter-name 'QD2' --filter-expression 'QUAL < 30.0' --filter-name 'QUAL30' --filter-expression 'SOR > 3.0' --filter-name 'SOR3' --filter-expression 'FS > 60.0' --filter-name 'FS60' --filter-expression 'MQ < 40.0' --filter-name 'MQ40' --filter-expression 'MQRankSum < -12.5' --filter-name 'MQRankSum-12.5' --filter-expression 'ReadPosRankSum < -8.0' --filter-name 'ReadPosRankSum-8' -O VariantFiltration_SNP.vcf
# Step 2_a: SelectVariants_INDEL (Hard_Filter)
java -jar /data/applications/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar SelectVariants -V FiterMutectcalls.vcf -select-type INDEL -O SelectVariants_INDEL.vcf
# Step 2_b: VariantFiltration_INDELS (indels_filtered)
java -jar /data/applications/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar VariantFiltration -V SelectVariants_INDEL.vcf --filter-expression 'QD < 2.0' --filter-name 'QD2' --filter-expression 'QUAL < 30.0' --filter-name 'QUAL30' --filter-expression 'FS > 200.0' --filter-name 'FS200' --filter-expression 'ReadPosRankSum < -20.0' --filter-name 'ReadPosRankSum-20' -O VariantFiltration_INDEL.vcf
## Step 10: Merging_filtered_SNP_and_INDEL
java -jar /data/applications/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar MergeVcfs -I VariantFiltration_SNP.vcf -I VariantFiltration_INDEL.vcf -O /SNP_INDEL.vcf
I also tried "-select-type MIXED" so that variants other than SNPs and INDELs would not be missed:
# SelectVariants_MIXED (Hard_Filter)
java -jar /data/applications/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar SelectVariants -V FiterMutectcalls.vcf -select-type MIXED -O SelectVariants_MIXED.vcf
# VariantFiltration_MIXED (mixed_filtered)
java -jar /data/applications/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar VariantFiltration -V SelectVariants_MIXED.vcf --filter-expression 'QD < 2.0' --filter-name 'QD2' --filter-expression 'QUAL < 30.0' --filter-name 'QUAL30' --filter-expression 'FS > 200.0' --filter-name 'FS200' --filter-expression 'ReadPosRankSum < -20.0' --filter-name 'ReadPosRankSum-20' -O VariantFiltration_MIXED.vcf
####
I also want to confirm one thing: all of the "PASS" variants, which are true positives, are being changed to "QUAL30" in the FILTER column. If that is expected, what is the purpose of this step, and how exactly does hard filtering (VariantFiltration) work here? Is it possible or recommended to use more stringent quality parameters to identify false positives?
Is there an important parameter that I am missing in this step? I would also like to understand the zygosity information in the Mutect2 and FilterMutectCalls output. Why are all the variants called as 0/1 (het)? Is it because of noise in the data? FreeBayes, by contrast, reports the variants as hom or het, matching what is seen in the BAM. Is there a variant-calling parameter I should check here?
-
Jyoti Mridha This looks like a custom-made filtering pipeline. It does not resemble our best practices. The hard filtering most likely does very little since it is filtering based on several annotations -- QUAL, QD, ReadPosRankSum, MQRankSum -- that Mutect2 does not emit.
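One way to check this for a given file is to compare the annotation names used in the filter expressions against the INFO fields declared in the VCF header. The snippet below is a minimal sketch using only grep and sed. The helper name `missing_info_annotations` and the tiny demo header are made up for illustration; a real header written by Mutect2 contains many more lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each requested annotation that is NOT declared
# as an INFO field in the header of the given VCF.
missing_info_annotations() {
  local vcf="$1"; shift
  local declared name
  # Collect the IDs of all ##INFO header lines, one per line.
  declared=$(grep '^##INFO=<ID=' "$vcf" | sed -e 's/^##INFO=<ID=//' -e 's/,.*$//')
  for name in "$@"; do
    # -x matches the whole line, so "MQ" does not match "MQRankSum".
    if ! printf '%s\n' "$declared" | grep -qx -- "$name"; then
      printf '%s\n' "$name"
    fi
  done
}

# Made-up header, only to show the call; point the function at the real
# FilterMutectCalls output instead.
demo_header_vcf=$(mktemp)
printf '%s\n' \
  '##fileformat=VCFv4.2' \
  '##INFO=<ID=DP,Number=1,Type=Integer,Description="example">' \
  '##INFO=<ID=TLOD,Number=A,Type=Float,Description="example">' \
  > "$demo_header_vcf"

missing_info_annotations "$demo_header_vcf" QD FS SOR MQ MQRankSum ReadPosRankSum
```

Any name printed by the function is an annotation that the filter expressions refer to but the file does not declare.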
Our recommendation is simply to run Mutect2 and FilterMutectCalls in mitochondria mode with default options.
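As a rough sketch, that recommendation corresponds to two commands. The example assumes a `gatk` launcher on the PATH (overridable through a `GATK` variable, which is a convention of this example only) and a mitochondrial contig named `chrM`; the function name and file names are placeholders. Both tools accept `--mitochondria-mode`, and everything else is left at its default.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two recommended steps.
# Usage: run_mito_calling <reference.fasta> <sample.bam> <contig> <output_prefix>
run_mito_calling() {
  local ref="$1" bam="$2" contig="$3" prefix="$4"
  local gatk="${GATK:-gatk}"   # assumption: a 'gatk' launcher is on the PATH

  # Call variants on the mitochondrial contig only.
  "$gatk" Mutect2 \
    -R "$ref" \
    -I "$bam" \
    -L "$contig" \
    --mitochondria-mode \
    -O "${prefix}.unfiltered.vcf" || return 1

  # Filter with default settings; this step writes the final FILTER column.
  "$gatk" FilterMutectCalls \
    -R "$ref" \
    -V "${prefix}.unfiltered.vcf" \
    --mitochondria-mode \
    -O "${prefix}.filtered.vcf"
}

# Example call (placeholder file names):
# run_mito_calling reference.fasta sample.bam chrM sample
```

The FILTER column of the resulting `*.filtered.vcf` is then used as is, with no SelectVariants or VariantFiltration step afterwards.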
Finally, the zygosity emitted by Mutect2 is meaningless. The VCF spec assumes an integer ploidy, which is not appropriate for aneuploidy and heteroplasmy such as occur in cancer and mitochondria. A 0/1 genotype just means that a variant exists, which is totally redundant.
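If the quantity of interest is the heteroplasmy level rather than the genotype, one option is to read the allele fraction instead. The sketch below assumes the per-sample FORMAT field `AF` that Mutect2 writes and a single-sample VCF. The function name and the two demo records are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every PASS record, print CHROM, POS, REF, ALT and
# the AF value of the first sample, ignoring the GT field entirely.
pass_allele_fractions() {
  awk '
    BEGIN { FS = OFS = "\t" }
    /^#/ { next }                       # skip header lines
    $7 == "PASS" {
      n = split($9, keys, ":")          # FORMAT keys, e.g. GT:AD:AF:DP
      split($10, vals, ":")             # values for the first sample
      for (i = 1; i <= n; i++)
        if (keys[i] == "AF")
          print $1, $2, $4, $5, vals[i]
    }
  ' "$1"
}

# Two invented records: both carry GT 0/1, but their allele fractions differ.
demo_calls_vcf=$(mktemp)
{
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n'
  printf 'chrM\t100\t.\tA\tG\t.\tPASS\tDP=1000\tGT:AD:AF:DP\t0/1:20,980:0.98:1000\n'
  printf 'chrM\t200\t.\tC\tT\t.\tweak_evidence\tDP=1000\tGT:AD:AF:DP\t0/1:990,10:0.01:1000\n'
} > "$demo_calls_vcf"

pass_allele_fractions "$demo_calls_vcf"
```

Only the first demo record is printed, because the second one did not pass filtering. Both records have the same 0/1 genotype, so the AF column is the only place where they differ.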