Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Variants missing in 4.2.0.0 due to assembly cycles

Answered

6 comments

  • Genevieve Brandt (she/her)

    Hi yqiu,

    I'm not sure what change specifically could be responsible for the differences you are seeing. All of our release notes are here: https://github.com/broadinstitute/gatk/releases, if you want to track the changes that have happened.

    I would recommend that you turn off the downsampler with amplicon data in order to get the best results. You would need to set --max-reads-per-alignment-start to 0 to turn it off. The caveats for amplicon data are covered at the bottom of this troubleshooting document that I also linked in your other post: https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant

    Please let me know if you have further questions.

    Best,

    Genevieve

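For reference, the downsampling advice above can be sketched as a small Python helper that assembles a Mutect2 command line. This is only an illustration: the helper name `build_mutect2_command` and the file names are hypothetical, and only `--max-reads-per-alignment-start 0` comes from the thread itself. The remaining arguments (`-R`, `-I`, `-O`) are the usual Mutect2 reference, input, and output options.

```python
from typing import List


def build_mutect2_command(reference: str, bam: str, output_vcf: str) -> List[str]:
    """Build a Mutect2 argument list with the per-start downsampler turned off.

    Hypothetical helper for illustration; the paths are supplied by the caller.
    """
    return [
        "gatk", "Mutect2",
        "-R", reference,      # reference FASTA
        "-I", bam,            # input BAM (placeholder name in the example below)
        "-O", output_vcf,     # output VCF
        # Setting this to 0 disables downsampling, as recommended for amplicon data.
        "--max-reads-per-alignment-start", "0",
    ]


if __name__ == "__main__":
    # Placeholder file names; print the command instead of running it.
    cmd = build_mutect2_command("ref.fasta", "amplicon.bam", "calls.vcf.gz")
    print(" ".join(cmd))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without GATK installed; passing the list to `subprocess.run` would be one way to launch it.
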
  • Genevieve Brandt (she/her)

    I see! Could you please share the site as it is called in 4.1.8.1? And also a screenshot of the BAM and the bam output in IGV with the specific site highlighted (to see the depth)?

    What does this site look like in the most recent GATK version, 4.2.5.0? 

    I also did find that there were some changes to the active region code in 4.1.9.0, though this change on our end led to better results: https://github.com/broadinstitute/gatk/pull/6821.

  • Genevieve Brandt (she/her)

    Hi yqiu,

    I am going to move your post into our Community Discussions -> General Discussion topic, as the Somatic topic is for reporting bugs and issues with GATK.

    You can read more about our forum guidelines and the topics here: Forum Guidelines.

    Best,

    Genevieve

  • yqiu

    Thanks a lot for the suggestion! Following your previous advice, we did set --max-reads-per-alignment-start to 0 in both GATK 4.1.8 and GATK 4.2 for the results I mentioned above.

  • yqiu

    Hi Genevieve,

    Thanks for the response! Please see the answers to your questions below.

    Here is the SNV as called in GATK 4.1.8.1.

    CHROM    POS    .    C    T    .    .    AS_SB_TABLE=9605,9821|113,112;DP=19664;ECNT=16;MBQ=20,20;MFRL=150,150;MMQ=60,60;MPOS=55;POPAF=7.30;TLOD=275.06    GT:AD:AF:DP:F1R2:F2R1:SB    0/1:19426,225:0.011:19651:19231,220:0,0:9605,9821,113,112

    Please see the IGV screenshot below. I included three tracks: the raw BAM from BWA, and the realigned BAMs from GATK 4.1.8.1 and GATK 4.2.0.0. I highlighted the SNV with a yellow box. As you can see, in both the raw BAM and the 4.1.8.1 realigned BAM, there is a clear SNV signal at ~1%. The region is not assembled in 4.2.0.0, and there are no reads there.

    The site is not called in GATK 4.2.5.0 either, and that version reports the same error as 4.2.0.0 (Error: Not using kmer size of 10 in read threading assembler because it contains a cycle).

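As a quick sanity check on the "~1%" figure, the allele depths in the 4.1.8.1 record quoted above can be turned into a raw alternate-allele fraction. The sketch below parses only the FORMAT and sample columns copied from that record; the helper names are hypothetical. Note that the AF value Mutect2 reports is a model estimate, so it does not have to equal the raw AD ratio exactly.

```python
from typing import Dict


def parse_sample(format_field: str, sample_field: str) -> Dict[str, str]:
    """Pair each FORMAT key with the matching value from the sample column."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def alt_fraction_from_ad(ad: str) -> float:
    """Raw alt fraction from an 'AD' string of the form 'ref_depth,alt_depth'."""
    ref_depth, alt_depth = (int(x) for x in ad.split(",")[:2])
    return alt_depth / (ref_depth + alt_depth)


# FORMAT and sample columns copied from the record posted in the thread.
FORMAT = "GT:AD:AF:DP:F1R2:F2R1:SB"
SAMPLE = "0/1:19426,225:0.011:19651:19231,220:0,0:9605,9821,113,112"

fields = parse_sample(FORMAT, SAMPLE)
raw_af = alt_fraction_from_ad(fields["AD"])

# SB is ref-forward, ref-reverse, alt-forward, alt-reverse.
ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in fields["SB"].split(","))

print(f"raw alt fraction from AD: {raw_af:.4f}")
print(f"reported AF:              {fields['AF']}")
print(f"alt reads by strand:      {alt_fwd} forward, {alt_rev} reverse")
```

The raw ratio is 225 / 19651, which agrees with the reported AF of 0.011 to three decimal places, and the alternate reads are split almost evenly between strands.
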
  • Genevieve Brandt (she/her)

    yqiu, can you click on the site of interest in IGV so we can see the depth in the raw BAM? Did this site pass FilterMutectCalls in 4.1.8.1?

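The "contains a cycle" message quoted in the thread refers to the assembly graph built from k-mers. As a simplified illustration of the idea (not GATK's actual implementation), a sequence that visits the same k-mer twice produces a cycle in a de Bruijn-style graph, and a larger k can resolve it. The function names, the toy sequence, and the candidate k values below are all assumptions chosen for the example.

```python
from typing import Iterable, Optional


def has_repeated_kmer(sequence: str, k: int) -> bool:
    """Return True if any k-mer occurs more than once in the sequence.

    In a graph whose nodes are k-mers, walking the sequence would revisit
    that node, which closes a cycle.
    """
    seen = set()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False


def smallest_cycle_free_k(sequence: str, candidates: Iterable[int]) -> Optional[int]:
    """Return the first candidate k with no repeated k-mer, or None if all repeat."""
    for k in candidates:
        if not has_repeated_kmer(sequence, k):
            return k
    return None


# Toy sequence: a 12-base unit repeated twice, with short flanks.
UNIT = "ACGTTGCAAGCT"
TOY_SEQUENCE = "TT" + UNIT + UNIT + "GG"

for k in (10, 25):
    status = "cycle" if has_repeated_kmer(TOY_SEQUENCE, k) else "no cycle"
    print(f"k={k}: {status}")

print("first usable k:", smallest_cycle_free_k(TOY_SEQUENCE, (10, 25)))
```

In this toy case the tandem repeat is longer than 10 bases, so k=10 revisits a k-mer while k=25 spans the repeat and does not. Real amplicon data involves many reads plus the reference, so the behaviour reported in the thread may have other causes.
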
