How to decrease the likelihood threshold for a candidate haplotype to be called as a genotype in Mutect2
AnsweredI am using Mutect2, GATKv4.2.0.0, on a single tumour, no matched normal, on targeted capture of ctDNA and it's performing exceptionally well. I'm using serial titrations of a cell-line standard, so I know exactly what variants are expected, and I'm just trying to track down a few remaining ones. I have read the articles on expected variants that are not called, and am running my command with the following parameters:
--pruning-lod-threshold 0.0001
-init-lod 0.001
-emit-lod 0.001
-mbq 5
--min-pruning 0
--linked-de-bruijn-graph true
--recover-all-dangling-branches true
--f1r2-median-mq 5
--f1r2-min-bq 10
--f1r2-max-depth 30000
-max-reads-per-alignment-start 0
-native-pair-hmm-threads 16
--force-active true
--pcr-indel-model NONE
In the debug output, I can see that the algorithm has successfully assembled across the haplotype I'm looking for, so I don't think it's a matter of tweaking the assembly graph parameters.
Here is the relevant snippet from the debug output, and the insertion I'm looking for is this one:
Events = EventMap{chr5:112840254-112840254 [G*, GA],}
When I look at the bam output with --bam-writer-type ALL_POSSIBLE_HAPLOTYPES, my haplotype seems well-supported (when the -bamout option is used with CALLED_HAPLOTYPES, nothing is shown here). The supporting reads are those in purple in the lower pane below:
It looks to me like the haplotype is not making it past the posterior likelihood cutoff to go from potential haplotype to genotype call. This is a one-base insertion inside a 6-base homopolymer, so I know it's a bit of a marginal call. In a different sample with a higher allele frequency, the use of pcr-indel-model NONE worked for this particular variant, but I can't see any other parameters that adjusted. Any information on how the likelihood is calculated and how I can relax that likelihood cutoff would be very much appreciated!
-
Hi Yussanne Ma,
Thank you for your thorough post! I am starting to look into this. Unfortunately, there is a holiday in the US and so I won't be able to get back to you until Monday. Could you run Mutect2 around this site and force call it with the --alleles option? I want to see what the evidence for the site would be in the VCF.
Thank you,
Genevieve
-
Hi Genevieve,
Thank you so much for getting back to me! I ran the command on the same file with all the same params, forcing the call at that position, and here is the relevant line from the resulting vcf:
chr5 112840254 . G GA . . AS_SB_TABLE=8004,8324|14,15;
DP=17495;ECNT=1;MBQ=20,20;MFRL=166,167;MMQ=60,60;
MPOS=30;POPAF=7.30;RPA=6,7;RU=A;STR;TLOD=-4.740e+00 GT:AD:AF:DP:F1R2:F2R1:SB
0/1:16328,29:5.914e-04:16357:7254,13:1195,3:8004,8324,14,15
-
Hi Genevieve,
After looking more closely at what I sent you yesterday, I can see that the TLOD is -4.7, so I am able to now get this variant by setting my init-lod and emit-lod to -10. I am still a little confused as to why the odds ratio is so low, given that the alternate depth and F/R balance is not bad, and certainly the tool has called other variants with less confidence, so I'd appreciate any insight you have, but I understand that there is a lot involved in the models to get to these likelihood calculations so this could just be an edge case!
-
The --alleles option does change some of the likelihood calculations. I'm wondering what the site looks like when you are able to call it without the --alleles option?
-
Here it is in the two samples where I wasn't able to call it previously:
chr5 112840254 . G GA . . AS_SB_TABLE=8004,8324|14,15;DP=17495;ECNT=1;MBQ=20,20;MFRL=166,167;MMQ=60,60;MPOS=30;POPAF=7.30;RPA=6,7;RU=A;STR;TLOD=-4.740e+00 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:16328,29:5.914e-04:16357:7254,13:1195,3:8004,8324,14,15
chr5 112840254 . G A,GA . . AS_SB_TABLE=6731,7049|1,1|15,16;DP=14829;ECNT=2;MBQ=20,20,20;MFRL=168,164,162;MMQ=60,60,60;MPOS=27,48;POPAF=7.30,7.30;TLOD=-5.369e+00,-5.211e+00 GT:AD:AF:DP:F1R2:F2R1:SB 0/1/2:13780,2,31:2.153e-04,1.052e-03:13813:6443,1,12:709,0,3:6731,7049,16,17
-
Ok, thank you, we'll take a closer look and get back to you.
-
Hi Yussanne Ma,
The reason this site is not called without decreasing the lod arguments is that the frequency of the reads supporting the alternate allele is much lower than the normal error rate with illumina short reads. Since this is an insertion in a homopolymer, the error rate is around 1/1000, so we cannot be confident that these reads supporting the alternate allele are not just errors. Different types of variants could be called with lower support if the normal error rate for those variant types is lower.
I hope this helps you to understand what is going on at this site. Please let me know if you have further questions.
Best,
Genevieve
-
That definitely makes sense, Genevieve, thank you so much for looking into this for me. If you develop a version of mutect2 with priors adjusted for high-depth low-AF data, we would be really excited to use it :)
-
Happy to help!
Please sign in to leave a comment.
9 comments