gatk 4.1 vs 4.6 - increased number of low AF variants detected by mutect2
Hi, thanks very much for developing a great suite of tools. This is a question more than an "issue" -- I have been using Mutect2 to identify somatic variants in tumor-only WGS data, and recently updated my GATK version from 4.1.0.0 to 4.6.0.0. I noticed that, while the variants mostly overlap, on most samples the newer Mutect2 version outputs more variants, especially in the lower AF ranges such as 0.2-0.3, where a call may have only a handful of supporting reads. Relatedly, a minority of variants are detected in v4.1 but not in v4.6, and it would be great to have a sense of what could cause that as well.
I am wondering if there is a summary or basic intuition for what changed between these versions that may lead to more or different low AF variants passing filter? I looked through the mutect2 changelogs of each release but didn't see anything too obvious as to what changed.
Exact command:
gatk Mutect2 \
--input $INPUT_CRAM \
--output ${NAME}_chr${CHR}_mutect2.vcf.gz \
--intervals chr${CHR} \
--reference $REF \
--dont-use-soft-clipped-bases \
--f1r2-tar-gz ${NAME}_chr${CHR}_f1r2.tar.gz \
--annotation OrientationBiasReadCounts &> logs/${NAME}_chr${CHR}_mutect2.log
-
Hi Julia Belk,
Mutect2 has changed in many ways between these two versions, and the changes are intended to improve the tool. Here are some of the release-note highlights that may affect its behavior:
Make the Mutect2 haplotype and clustered events filters smarter about germline events (#8717)
Added a --base-qual-correction-factor to allow a scale factor to be provided to modify the base qualities reported by the sequencer and used in the Mutect2 substitution error model (#8447)
Fixed a rare edge case in the AdaptiveChainPruner where the JavaPriorityQueue is undefined for tied elements (#7851)
The palindrome ITR artifact transformer now skips reads whose contigs are not in sequence dictionary (#6968)
Fixed a bug where Mutect2 failed to filter germline variants with alternate representations (#7103)
Fixed the --dont-use-soft-clipped-bases argument in Mutect2 to actually work as intended (#6823)
Fixed a bug in the Mutect2 engine active region code that could affect the ability to call tumor alts when the normal has a different alt at the same site (#6908)
Fixed a bug in HaplotypeCaller and Mutect2 where we were losing insertion events that immediately followed a deletion (#6696)
Made improvements to the Mutect2 active region detection code that resulted in recovering some low-AF calls that we were missing (#6821)
Made the HaplotypeCaller/Mutect2 adaptive pruner smarter in complex graphs, resulting in modest improvements to indel sensitivity when using the adaptive pruning option (#6520)
Fixed a bug in variation event detection code that could sometimes lead to mistreating indel assembly windows as SNP assembly windows (#6661)
Fixed a bug in FragmentUtils where insertion quals were used instead of deletion quals when adjusting base qualities for two overlapping reads from the same fragment (#6815)
Fixed a regression in HaplotypeCaller and Mutect2 where alt haplotypes with a deletion at the end of the padded region caused exceptions (#6544)
This is not an exhaustive list of changes to Mutect2, but these are probably the main reasons the called sites differ between the two versions, with more low allele fraction calls in the newer one. In particular, the improvements to the active region detection code (#6821), which recover more low-AF variants, and the fix to base quality adjustment for overlapping reads (#6815) are the most likely reasons you are seeing more of those new variants.
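If it helps to see exactly which calls are specific to each version, something along these lines could be a starting point. It is only a rough sketch using standard shell tools, not a GATK feature. It assumes both call sets have already been run through FilterMutectCalls (so the FILTER column is populated), that each VCF holds a single tumor sample, and that AF is present in the FORMAT column. The function names and the file names in the usage comments are made up for illustration.

```shell
# Print "CHROM:POS:REF:ALT<TAB>AF" for every PASS record of a VCF (plain or gzipped).
# Assumption: single-sample VCF with an AF key in the FORMAT column.
pass_sites() {
  gzip -cdf "$1" | awk -F'\t' '
    /^#/ { next }                      # skip header lines
    $7 != "PASS" { next }              # keep only records that passed filtering
    {
      n = split($9, keys, ":")         # FORMAT keys, e.g. GT:AD:AF
      split($10, vals, ":")            # values for the first (only) sample
      af = "NA"
      for (i = 1; i <= n; i++) if (keys[i] == "AF") af = vals[i]
      print $1 ":" $2 ":" $4 ":" $5 "\t" af
    }' | LC_ALL=C sort
}

# Print the lines of the first site table whose site key is absent from the second.
only_in_first() {
  awk -F'\t' 'FILENAME == ARGV[1] { seen[$1]; next } !($1 in seen)' "$2" "$1"
}

# Count sites per 0.1-wide AF bin, reading a site table on stdin.
# For multi-allelic records only the first AF value is used.
af_histogram() {
  awk -F'\t' '
    $2 != "NA" { b = int($2 * 10); if (b > 9) b = 9; n[b]++ }
    END { for (b = 0; b <= 9; b++) printf "%.1f-%.1f\t%d\n", b / 10, (b + 1) / 10, n[b] }'
}

# Example usage (hypothetical file names):
#   pass_sites sample_v4.1.filtered.vcf.gz > v41.tsv
#   pass_sites sample_v4.6.filtered.vcf.gz > v46.tsv
#   only_in_first v46.tsv v41.tsv | af_histogram   # calls gained in 4.6, by AF
#   only_in_first v41.tsv v46.tsv | af_histogram   # calls lost in 4.6, by AF
```

Sites are matched on exact CHROM, POS, REF and ALT, so a variant that the two versions represent differently (for example, an indel with a different allele representation) would show up as unique to both. Dropping the PASS check in pass_sites and printing the FILTER column instead may also show whether a "lost" call is still emitted but now filtered.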
I hope this helps.
Regards.