Force-calling mode in Mutect2
REQUIRED for all errors and issues:
a) GATK version used:4.2.0.0
b) Exact command used:
c) Entire program log:
When using the force-calling mode in Mutect2 to call SNPs, I encountered two issues:
1. There is a significant difference in the INFO column of the variants detected among using the MT2 tumor-normal paired mode, tumor-only mode, and force-calling mode. such as the following sites (upper used the tumor-normal paired mode, middle used the tumor-only mode, lower used the force-calling mode) ;
1 10878001 . C T . PASS AS_FilterStatus=SITE;AS_SB_TABLE=10,8|0,4;DP=40;ECNT=1;GERMQ=21;MBQ=31,20;MFRL=226,237;MMQ=60,54;MPOS=33;NALOD=0.624;NLOD=2.76;POPAF=5.6;TLOD=14.43 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:0,0:0.468:0:0,0:0,0:0,0,0,0 0/0:9,0:0.079:9:2,0:6,0:5,4,0,0 0/1:9,4:0.331:13:3,2:1,2:5,4,0,4
1 10878001 . C T . clustered_events;germline;haplotype AS_FilterStatus=SITE;AS_SB_TABLE=5,3|0,4;DP=18;ECNT=3;GERMQ=1;MBQ=12,20;MFRL=232,237;MMQ=60,54;MPOS=33;POPAF=5.60;TLOD=14.50 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:8,4:0.331:12:3,2:0,2:0|1:10878001_C_T:10878001:5,3,0,4
1 10878001 . C T . weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=5,4|0,1;DP=14;ECNT=1;GERMQ=21;MBQ=22,20;MFRL=244,235;MMQ=60,60;MPOS=30;POPAF=7.30;TLOD=0.026 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:9,1:0.144:10:4,0:0,1:5,4,0,1
2. When employing the force-calling mode, not all SNPs were called in the alleles.
gatk --java-options "-Xmx10g" Mutect2 \
-R ${ref}/GRCh37.fa \
-I ${bam_path}/${sample}.bam \
--alleles ${GGA_path}/vcf/${sample}_GGA_union.vcf.gz \
--native-pair-hmm-threads 2 \
--force-call-filtered-alleles \
-L ${GGA_path}/vcf/${sample}_GGA_union.vcf.gz \
-bamout ${GGA_path}/bam/${sample}_GGA.bam \
-O ${GGA_path}/recall/${sample}_GGA_recall.vcf.gz
-
When a sample has no read coverage near a force-calling allele Mutect2 can't emit it because the GATK engine skips over the region. Is that the case for your SNPs that were not called?
Some discrepancy between force-calling and regular calling is normal but I admit that this is more than I would expect. Are there any other force-called alleles within 100 bases of the variant in question?
It would also be helpful if you could generate a bamout.bam file by adding "-bamout bamout.bam" to your command line and post IGV screenshots.
-
Hi David Benjamin,
Thank you for your response, but I still have some questions.
1. Regarding the site 1:10878001:C:T mentioned in my first question, we retrieved the 50bp upstream and downstream reads from the raw BAM file. However, we observed that the AD and DP values for the three modes of MT2 appeared inaccurate, and there were significant discrepancies in other feature values as well. Could you please elaborate on the potential reasons behind this inconsistency?
2. As for my second question, concerning the uncalled SNP 3:11792584:G:T in the force-calling mode, we utilized the IGV tool to inspect both the force-calling BAMout (upper track) and raw BAM (lower track). Notably, we detected a mutation in the raw BAM, but surprisingly, there was no read coverage at that site in the force-calling BAM.
-
Regarding 2), restricting your intervals to just the force-calling alleles (i.e. -L ${GGA_path}/vcf/${sample}_GGA_union.vcf.gz) might cause problems. Can you try applying interval padding (I would recommend -ip 150) and see what the bamout near 3:11792584 looks like?
For 1), I suspect that the clustered events and haplotype filters are being applied because of a bug we very recently fixed (https://github.com/broadinstitute/gatk/pull/8717, which is merged into master but is not in a release yet) where somatic variants were penalized for being on haplotypes containing germline variants. Even after the fix, there is a good chance the variant will still be filtered as germline in tumor-only mode. This is inevitable because FilterMutectCalls has to be very cautious since the prior probability of germline variants is so much higher than somatic variants. If you are curious, an IGV width of 300 or so bases -- large enough to see any variants in the same local assembly -- would be needed.
The ADs and DPs from tumor-normal and tumor-only modes look okay to me, and the force-calling values might become more consistent after using interval padding.
Please sign in to leave a comment.
3 comments