I would like to better understand the FORMAT fields in Mutect2 output. Here is an example of a somatic variant from running the Mutect2 pipeline using GATK 18.104.22.168:
17 29665040 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=19,17|1,2;DP=39;ECNT=1;GERMQ=75;MBQ=20,20;MFRL=174,205;MMQ=60,60;MPOS=63;POPAF=7.3;TLOD=4.51 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:36,3:0.103:39:11,0:14,2:25,2:19,17,1,2
What is the true number of informative forward and reverse reads overlapping the variant position? AD and SB suggest 36 REF and 3 ALT reads, split 19/17 (forward/reverse) for REF and 1/2 for ALT. Yet F1R2, F2R1 and FAD suggest 25 REF and 2 ALT reads, split 11/14 for REF and 0/2 for ALT across the forward and reverse strands.
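The totals compared above can be reproduced mechanically from the sample column. This is a minimal sketch, assuming SB is ordered ref-forward, ref-reverse, alt-forward, alt-reverse (consistent with AS_SB_TABLE=19,17|1,2 in the INFO column) and that the site is biallelic; `parse_sample` and `ints` are hypothetical helpers, not part of GATK. It only checks the arithmetic and does not explain why the fields differ.

```python
# Sample columns copied from the Mutect2 record above.
FORMAT = "GT:AD:AF:DP:F1R2:F2R1:FAD:SB"
SAMPLE = "0/1:36,3:0.103:39:11,0:14,2:25,2:19,17,1,2"


def parse_sample(fmt: str, sample: str) -> dict:
    """Hypothetical helper: pair each FORMAT key with its sample value."""
    return dict(zip(fmt.split(":"), sample.split(":")))


def ints(value: str) -> list[int]:
    """Hypothetical helper: split a comma-separated VCF value into ints."""
    return [int(x) for x in value.split(",")]


fields = parse_sample(FORMAT, SAMPLE)

ad = ints(fields["AD"])      # [ref, alt]
sb = ints(fields["SB"])      # assumed order: ref fwd, ref rev, alt fwd, alt rev
f1r2 = ints(fields["F1R2"])  # [ref, alt] reads in F1R2 orientation
f2r1 = ints(fields["F2R1"])  # [ref, alt] reads in F2R1 orientation
fad = ints(fields["FAD"])    # [ref, alt]

# Collapse the strand/orientation splits into per-allele totals.
sb_ref, sb_alt = sb[0] + sb[1], sb[2] + sb[3]
orient_ref, orient_alt = f1r2[0] + f2r1[0], f1r2[1] + f2r1[1]

print("DP:", int(fields["DP"]))                      # DP: 39
print("AD:", ad, "total", sum(ad))                   # AD: [36, 3] total 39
print("SB ref/alt:", sb_ref, sb_alt)                 # SB ref/alt: 36 3
print("F1R2+F2R1 ref/alt:", orient_ref, orient_alt)  # F1R2+F2R1 ref/alt: 25 2
print("FAD:", fad, "total", sum(fad))                # FAD: [25, 2] total 27
# Reads counted in AD but absent from the orientation counts.
print("missing:", sum(ad) - orient_ref - orient_alt)  # missing: 12
```

On this record AD and SB both sum to DP (39), while F1R2 + F2R1 matches FAD (25,2), leaving 12 reads that appear in one group of fields but not the other.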
Can the difference between these fields also be explained by uninformative reads? If I want to keep only variants where the ALT allele is supported by at least one read on each of the forward and reverse strands, do I need to remove this variant? That is, should I use the data from SB or from F1R2 / F2R1 to answer this question?
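The two candidate filters can be written side by side to show that they disagree on this record. This is a sketch under stated assumptions only: SB is assumed to be ordered ref-forward, ref-reverse, alt-forward, alt-reverse; the site is assumed biallelic, so the second value of F1R2/F2R1 is the ALT count; and the function names are hypothetical. It does not decide which field is the right one to filter on.

```python
FORMAT = "GT:AD:AF:DP:F1R2:F2R1:FAD:SB"
SAMPLE = "0/1:36,3:0.103:39:11,0:14,2:25,2:19,17,1,2"


def alt_on_both_strands_sb(fields: dict, min_reads: int = 1) -> bool:
    """Hypothetical filter: ALT seen on both strands according to SB.

    Assumes SB = ref fwd, ref rev, alt fwd, alt rev.
    """
    _, _, alt_fwd, alt_rev = (int(x) for x in fields["SB"].split(","))
    return alt_fwd >= min_reads and alt_rev >= min_reads


def alt_in_both_orientations(fields: dict, min_reads: int = 1) -> bool:
    """Hypothetical filter: ALT seen in both F1R2 and F2R1 orientations.

    Assumes a biallelic site, so index 1 of each field is the ALT count.
    """
    alt_f1r2 = int(fields["F1R2"].split(",")[1])
    alt_f2r1 = int(fields["F2R1"].split(",")[1])
    return alt_f1r2 >= min_reads and alt_f2r1 >= min_reads


fields = dict(zip(FORMAT.split(":"), SAMPLE.split(":")))
print(alt_on_both_strands_sb(fields))    # True  (ALT fwd 1, ALT rev 2)
print(alt_in_both_orientations(fields))  # False (F1R2 ALT 0, F2R1 ALT 2)
```

The SB-based check keeps the variant and the F1R2/F2R1-based check removes it, which is exactly the ambiguity in the question.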