Variants missing in 4.2.0.0 due to assembly cycles
We have a known SNV with AF ~1% and are trying to use Mutect2 to detect it in amplicon sequencing data.
- It can be called with GATK 4.1.8.1 without problems in multiple samples.
- It cannot be called in GATK 4.2.0.0 or GATK 4.2.5.0 with the same BAM files as input, again across multiple samples.
- When running in debug mode, the log revealed that the region is not assembled successfully in GATK 4.2 (Error: Not using kmer size of 10 in read threading assembler because it contains a cycle) and thus the variant is not called.
- In many cases, both versions see the same input reads and even the same active region, but GATK 4.2 cannot assemble it while GATK 4.1.8.1 has no problem assembling the region.
We would like to understand whether any assembler changes introduced in GATK 4.2+ could cause the discrepancy between v4.1.8.1 and v4.2+. Is there any parameter we could set to recover the expected SNV with GATK 4.2+? Any suggestions will be very much appreciated.
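For readers unfamiliar with the log message quoted above, here is a simplified illustration of why a small kmer size can produce a cycle in a kmer graph. It is only a sketch of the general idea, not GATK's read threading assembler; the function name and the toy sequence are hypothetical.

```python
def kmer_graph_has_cycle(sequence: str, k: int) -> bool:
    """Return True if threading `sequence` through a k-mer graph revisits a k-mer.

    Nodes are k-mers and edges join consecutive k-mers, so a single sequence
    traces a path. Revisiting a node means the path loops back on itself.
    """
    seen = set()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in seen:
            return True  # the path returned to an existing node: a cycle
        seen.add(kmer)
    return False


# Hypothetical toy sequence containing a short tandem repeat.
toy_sequence = "GATTACAGATTACAGGC"

for k in (4, 10):
    print(k, kmer_graph_has_cycle(toy_sequence, k))
```

In this toy case the repeat is shorter than 10 bases, so the larger kmer size spans it and the cycle disappears. Whether a given real region behaves this way depends on its sequence and on the reads threaded into the graph.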
-
Hi yqiu,
I'm not sure what change specifically could be responsible for the differences you are seeing. All of our release notes are here: https://github.com/broadinstitute/gatk/releases, if you want to track the changes that have happened.
I would recommend that you turn off the downsampler with amplicon data in order to get the best results. You would need to set --max-reads-per-alignment-start to 0 to turn it off. The caveats for amplicon data are covered at the bottom of this troubleshooting document that I also linked in your other post: https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant
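As a minimal sketch of the setting described above, the following builds a Mutect2 command line with the downsampler turned off. The file names and the helper function are placeholders, not taken from this thread; only `--max-reads-per-alignment-start 0` comes from the advice itself.

```python
import shlex


def build_mutect2_command(reference, bam, output_vcf, intervals=None):
    """Assemble a Mutect2 command with per-start downsampling disabled."""
    cmd = [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", bam,
        "-O", output_vcf,
        # 0 turns the downsampler off, as recommended for amplicon data.
        "--max-reads-per-alignment-start", "0",
    ]
    if intervals:
        cmd += ["-L", intervals]  # optional: restrict calling to the amplicons
    return cmd


# Placeholder paths, for illustration only.
cmd = build_mutect2_command(
    "reference.fasta", "tumor.bam", "calls.vcf.gz", "amplicons.interval_list"
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```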
Please let me know if you have further questions.
Best,
Genevieve
-
I see! Could you please share the site as it is called in 4.1.8.1? And also a screenshot of the BAM and the bam output in IGV with the specific site highlighted (to see the depth)?
What does this site look like in the most recent GATK version, 4.2.5.0?
I also did find that there were some changes to the active region code in 4.1.9.0, though this change on our end led to better results: https://github.com/broadinstitute/gatk/pull/6821.
-
Hi yqiu,
I am going to move your post into our Community Discussions -> General Discussion topic, as the Somatic topic is for reporting bugs and issues with GATK.
You can read more about our forum guidelines and the topics here: Forum Guidelines.
Best,
Genevieve
-
Thanks a lot for the suggestion! Following your previous advice, we did set --max-reads-per-alignment-start to 0 in both GATK 4.1.8 and GATK 4.2 for the results I mentioned above.
-
Hi Genevieve,
Thanks for the response! Please see the answers to your questions below.
Here is the SNV as called in GATK 4.1.8.1.
CHROM POS . C T . . AS_SB_TABLE=9605,9821|113,112;DP=19664;ECNT=16;MBQ=20,20;MFRL=150,150;MMQ=60,60;MPOS=55;POPAF=7.30;TLOD=275.06 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:19426,225:0.011:19651:19231,220:0,0:9605,9821,113,112
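As a quick sanity check on the record above, the sample fields can be parsed to confirm that the reported AF matches the allele depths. This is a small sketch; the interpretation of SB as reference-forward, reference-reverse, alt-forward, alt-reverse counts is an assumption based on the usual GATK convention.

```python
# FORMAT keys and sample values copied from the record above.
format_keys = "GT:AD:AF:DP:F1R2:F2R1:SB".split(":")
sample_values = "0/1:19426,225:0.011:19651:19231,220:0,0:9605,9821,113,112".split(":")
sample = dict(zip(format_keys, sample_values))

# AD holds the reference and alternate allele depths.
ref_depth, alt_depth = (int(x) for x in sample["AD"].split(","))
allele_fraction = alt_depth / (ref_depth + alt_depth)

# SB is assumed to be ref-forward, ref-reverse, alt-forward, alt-reverse.
ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in sample["SB"].split(","))

print(f"alt reads: {alt_depth} of {ref_depth + alt_depth}")
print(f"allele fraction: {allele_fraction:.4f} (reported AF: {sample['AF']})")
print(f"alt reads by strand: {alt_fwd} forward, {alt_rev} reverse")
```

The alternate reads are split almost evenly between strands, which is consistent with a real low-frequency variant rather than a strand-specific artifact.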
Please see the IGV screenshot below. I included three tracks: the raw BAM from BWA, the realigned BAM from GATK 4.1.8.1, and the realigned BAM from GATK 4.2.0.0. I highlighted the SNV with a yellow box. As you can see, both the raw BAM and the 4.1.8.1 realigned BAM show a clear SNV signal at ~1%. The region is not assembled in 4.2.0.0, and there are no reads there.
The site is not called in GATK 4.2.5.0 either, and it reports the same error as 4.2.0.0 (Error: Not using kmer size of 10 in read threading assembler because it contains a cycle).
-
yqiu, can you click on the site of interest in IGV so we can see the depth in the raw BAM? Did this site pass FilterMutectCalls in 4.1.8.1?