GATK Mutect2 counts the alternate base twice in overlapping paired reads
REQUIRED for all errors and issues:
a) GATK version used: 4.1.8.1
Hi GATK team,
I observed a significant increase in variant calls in samples with overlapping paired reads and discovered that Mutect2 (version 4.1.8.1) counts the alternate base twice in these regions, leading to numerous false positive variants. Below are the read counts for one variant (chr1:83879352) from the Mutect2 VCF file and the corresponding IGV plot.
vcf file:
chr1 83879352 . G T . PASS . GT:AD:AF:DP:F1R2:F2R1:SB 0/1:18,3:0.188:21:10,0:6,3:10,8,2,1
IGV plot:
In the IGV plot, we can see that only two fragments support the alt variant at position chr1:83879352:G>T, whereas Mutect2 reports 3 counts, which can sometimes result in a false positive.
I’ve seen some discussions about this issue here, but I’m unsure of the current status of the bug. In addition, I've noticed that HaplotypeCaller also generates significantly more variants in samples with overlapping paired reads. My questions are:
1) Does HaplotypeCaller in version 4.1.xx also count the alternate base twice in overlapping paired reads?
2) Would using Mutect2 or HaplotypeCaller in versions 4.0.xx or 4.6.xx avoid this problem when calling variants?
Could someone please help to clarify?
Thanks,
Junhui
-
Hi JUNHUI LI
Can you try running the same sample through GATK 4.6.0.0 and see if the issue persists?
-
Thanks, Gökalp Çelik.
This issue is still there in GATK version 4.6.0.0. I would appreciate any recommendations you might have for working around this issue.
--Junhui
-
Hi JUNHUI LI
The current behavior in 4.6.0.0 is the expected one. Mutect2 itself distinguishes overlapping evidence caused by PCR error from real evidence of mutations in the DNA. To do so, base call qualities at overlapping sites are adjusted during reassembly until the evidence falls into the PCR error category. If you would like us to look further into this site or any other false positive site, please share the entire variant context, including the INFO fields, so that we can see what Mutect2 reported.
I hope this helps.
Regards.
-
Hi Gökalp Çelik,
Thanks. Here are two variants called with version 4.6.0.0
chr1 42256836 . C T . . AS_SB_TABLE=20,21|2,2;DP=45;ECNT=4;ECNTH=2;MBQ=20,20;MFRL=121,97;MMQ=60,60;MPOS=49;POPAF=3.83;TLOD=5.72 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:41,4:0.100:45:15,0:5,2:26,2:20,21,2,2
chr1 83879352 . G T . . AS_SB_TABLE=9,9|2,1;DP=21;ECNT=3;ECNTH=2;MBQ=20,20;MFRL=160,172;MMQ=57,58;MPOS=44;POPAF=7.30;TLOD=5.67 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:18,3:0.188:21:6,0:3,2:12,2:9,9,2,1
Version 4.0.12 only outputs the second variant, as below:
chr1 83879352 . G T . . DP=15;ECNT=2;MBQ=32,33;MFRL=167,203;MMQ=58,58;MPOS=36;POPAF=7.30;TLOD=4.41 GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:12,2:0.186:14:7,0:5,2:0.00,0.141,0.143:0.044,0.011,0.944
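For readers comparing read-level and fragment-level support, here is a minimal Python sketch that parses the FORMAT and sample columns of the two 4.6.0.0 records above and prints the alt count from AD next to the alt count from FAD. It assumes FAD holds per-allele fragment counts (as its name and values suggest; the VCF header is the authoritative definition). The helper names `parse_sample` and `alt_read_vs_fragment` are made up for illustration and are not part of GATK.

```python
def parse_sample(format_col, sample_col):
    """Map FORMAT keys to sample values; comma-separated integers become lists of int."""
    out = {}
    for key, val in zip(format_col.split(":"), sample_col.split(":")):
        try:
            out[key] = [int(p) for p in val.split(",")]
        except ValueError:
            out[key] = val  # e.g. GT ("0/1") or AF ("0.188") stay as strings
    return out


def alt_read_vs_fragment(format_col, sample_col):
    """Return (alt count from AD, alt count from FAD) for a single-alt record."""
    s = parse_sample(format_col, sample_col)
    return s["AD"][1], s["FAD"][1]


# FORMAT and sample columns copied from the two 4.6.0.0 records in this thread.
records = {
    "chr1:42256836": ("GT:AD:AF:DP:F1R2:F2R1:FAD:SB",
                      "0/1:41,4:0.100:45:15,0:5,2:26,2:20,21,2,2"),
    "chr1:83879352": ("GT:AD:AF:DP:F1R2:F2R1:FAD:SB",
                      "0/1:18,3:0.188:21:6,0:3,2:12,2:9,9,2,1"),
}

for site, (fmt, smp) in records.items():
    alt_reads, alt_frags = alt_read_vs_fragment(fmt, smp)
    print(site, "alt reads (AD):", alt_reads, "alt fragments (FAD):", alt_frags)
```

For chr1:83879352 the AD alt count is 3 while the FAD alt count is 2, which matches the two supporting fragments seen in IGV.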
I’m grateful for your support.
Thanks,
Junhui
-
Hi JUNHUI LI
Those values are low, and the calls will definitely get filtered out by FilterMutectCalls. Our suggestion is to run FilterMutectCalls first and then check whether any false positives remain.
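As a quick check of whether a record has been through filtering yet, the sketch below reads the FILTER column (the seventh column of a VCF record). In VCF, `.` there means no filters have been applied, `PASS` means the record passed all filters, and anything else lists the filters that failed. The function name `filter_status` is hypothetical, and the sketch assumes whitespace-separated columns as pasted in this thread (real VCF files are tab-separated).

```python
def filter_status(vcf_line):
    """Classify a VCF record by its FILTER column (7th column)."""
    flt = vcf_line.split()[6]
    if flt == ".":
        return "unfiltered"
    if flt == "PASS":
        return "pass"
    return "filtered:" + flt


# Records as quoted earlier in this thread.
rec_4181 = "chr1 83879352 . G T . PASS . GT:AD:AF:DP:F1R2:F2R1:SB 0/1:18,3:0.188:21:10,0:6,3:10,8,2,1"
rec_4600 = ("chr1 83879352 . G T . . AS_SB_TABLE=9,9|2,1;DP=21;ECNT=3;ECNTH=2;MBQ=20,20;"
            "MFRL=160,172;MMQ=57,58;MPOS=44;POPAF=7.30;TLOD=5.67 "
            "GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:18,3:0.188:21:6,0:3,2:12,2:9,9,2,1")

print(filter_status(rec_4181))
print(filter_status(rec_4600))
```

The 4.6.0.0 record shows `.` in the FILTER column, so it has not yet been evaluated by any filter.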
I hope this helps.
Regards.