Mismatch between Mutect2 VCF and IGV BAM stats (causing strand bias issue)
I have filtered Mutect2 VCFs generated from deduplicated, UMI-collapsed BAM files.
When I check called variants in IGV, the stats differ from what is reported in the Mutect2 VCF, which makes sense to an extent, since Mutect2 may have discarded some uninformative reads. So one might expect DP, for example, to be slightly lower in the VCF than in IGV.
However, when I introduced a strand bias filter (SOR<3) I noticed some odd results, so I checked IGV. For some variants IGV shows, for example, Alt Forward: 1000 and Alt Reverse: 0 (extreme strand bias, with no alt-supporting reads on the reverse strand), yet Mutect2 has called the variant, and the stats in the Mutect2 VCF do not show 1000 and 0; they show an even distribution.
For the attached example SNP, if my understanding of F1R2 and F2R1 is correct, Mutect2 reports Reference Forward 411, Alt Forward 8, Reference Reverse 488, Alt Reverse 8. IGV shows Reference Forward 812, Alt Forward 0, Reference Reverse 588, Alt Reverse 230.
I must be misunderstanding something, as I cannot think of any plausible reason for such an orientation discrepancy between the two. The DP is higher in IGV than in the VCF (as expected), yet the Alt Forward count is 0 in IGV and 8 in the Mutect2 VCF. Where is Mutect2 getting these Alt Forward reads from if they are absent in IGV? Any ideas or explanation would be appreciated!
(gatk4-4.6.1.0). Many thanks :)
-
Hi Ben Thompson
F1R2 and F2R1 are not the numbers of reads supporting each allele on the forward and reverse strands. They are counts of reads in each pair orientation (F1R2 or F2R1) that support the ref and alt alleles. What you see in IGV is the number of individual piled-up reads at that site that support each allele.
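To make the distinction concrete, here is a minimal Python sketch that pulls the F1R2 and F2R1 values out of a single VCF sample column. The FORMAT and sample strings are made up around the counts quoted in the question (the GT, AD, AF and DP values are illustrative only), and orientation_counts is a hypothetical helper, not part of GATK.

```python
from typing import Dict


def orientation_counts(format_field: str, sample_field: str) -> Dict[str, Dict[str, int]]:
    """Return ref/alt read counts for each pair orientation tag.

    F1R2 and F2R1 each hold one count per allele, ref first.
    Only the first alt allele is read here (biallelic assumption).
    """
    record = dict(zip(format_field.split(":"), sample_field.split(":")))
    counts = {}
    for tag in ("F1R2", "F2R1"):
        ref, alt = (int(x) for x in record[tag].split(",")[:2])
        counts[tag] = {"ref": ref, "alt": alt}
    return counts


# Hypothetical sample column built around the counts quoted in the question.
fmt = "GT:AD:AF:DP:F1R2:F2R1"
sample = "0/1:899,16:0.018:915:411,8:488,8"

counts = orientation_counts(fmt, sample)
for tag, per_allele in counts.items():
    print(tag, per_allele)
```

Each tag describes the orientation of the read pair a supporting read came from, so these numbers are not expected to line up with the per-strand counts in the IGV pileup.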
I hope this helps.
-
Thank you Gökalp! That makes sense. I think I misunderstood the IGV view and got strand bias and read orientation bias mixed up. I believe I successfully filtered out strand bias using Mutect2's F1R2/F2R1 annotations, but when I checked the results in IGV, I mistakenly thought the remaining read orientation artifacts were still strand bias.
Given this, would you recommend adding LearnReadOrientationModel to my pipeline so I can use --ob-priors in FilterMutectCalls? From what I understand, this is currently the best way to remove read orientation bias from my Mutect2 VCFs. I don't have a Panel of Normals (PoN) or control samples, just MSA brain samples, so would the best approach be:
- Running LearnReadOrientationModel separately on each sample,
- Combining all .tar.gz model files into one orientation bias model file,
- Using that as the --ob-priors input when running FilterMutectCalls?
Many thanks for your help!
-
Hi again.
The read orientation model should be generated per sample only and should not be combined into a single model. For the tool to be useful, you need to make sure that UMI collapsing:
- does not create collapsed reads based solely on UMI similarity
- takes read strandedness and orientation into account during collapsing.
If these two conditions are not met, orientation bias filtering is not useful and should definitely be avoided.
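As a rough sketch of the per-sample workflow, the Python below only builds the three GATK command lines for each sample and prints them; it does not run anything. It assumes Mutect2 is (re)run with --f1r2-tar-gz so that LearnReadOrientationModel has its input. The sample names, file paths, output directory and the orientation_filter_commands helper are all hypothetical, and any other Mutect2 arguments your pipeline needs are left out.

```python
from typing import List


def orientation_filter_commands(sample: str, bam: str, reference: str,
                                outdir: str = "out") -> List[List[str]]:
    """Build Mutect2 -> LearnReadOrientationModel -> FilterMutectCalls
    commands for ONE sample. Each sample gets its own model file."""
    f1r2 = f"{outdir}/{sample}.f1r2.tar.gz"
    model = f"{outdir}/{sample}.read-orientation-model.tar.gz"
    unfiltered = f"{outdir}/{sample}.unfiltered.vcf.gz"
    filtered = f"{outdir}/{sample}.filtered.vcf.gz"
    return [
        # Collect F1R2 counts while calling variants.
        ["gatk", "Mutect2", "-R", reference, "-I", bam,
         "--f1r2-tar-gz", f1r2, "-O", unfiltered],
        # Learn the orientation model from this sample's counts only.
        ["gatk", "LearnReadOrientationModel", "-I", f1r2, "-O", model],
        # Pass this sample's model (not a pooled one) to the filter.
        ["gatk", "FilterMutectCalls", "-R", reference, "-V", unfiltered,
         "--ob-priors", model, "-O", filtered],
    ]


# Hypothetical sample names and BAM paths.
samples = {"MSA_01": "MSA_01.bam", "MSA_02": "MSA_02.bam"}
for name, bam_path in samples.items():
    for cmd in orientation_filter_commands(name, bam_path, "ref.fasta"):
        print(" ".join(cmd))
```

Keeping the model path keyed on the sample name makes it harder to pass one sample's priors to another sample's FilterMutectCalls run by accident.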
I hope this helps.