Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Discrepancy between GATK joint vcf and recaled.bam files

Answered

5 comments

  • Xiao Ran Luo

    Here are the tables and figures:

    Fig. 2

    Fig. 1

  • Genevieve Brandt (she/her)

    Hi Xiao Ran Luo,

    I am going to move your post into our Community Discussions -> General Discussion topic, as the Non-Human topic is for reporting bugs and issues with GATK.

    You can read more about our forum guidelines and the topics here: Forum Guidelines.

    Best,

    Genevieve

  • Genevieve Brandt (she/her)

    Hi Xiao Ran Luo,

    Thanks for writing in to the forum! I have a few key suggestions that might help you make sense of what is going on with your samples.

    1. We have a troubleshooting document that goes over why certain variants are or are not called, titled "When HaplotypeCaller and Mutect2 do not call an expected variant". It was written by our developers and lists the steps they follow when troubleshooting these types of cases.
    2. I would recommend looking at the -bamout file from HaplotypeCaller to see what these variants look like after read realignment and the other steps in the HaplotypeCaller algorithm. HaplotypeCaller is not a simple pileup caller; see this article for more details.
    3. There are certain read filters that are used by HaplotypeCaller, listed in the tool documentation under Read Filters.
    4. I wasn't able to understand your first example; the table didn't make it clear to me. Can you show the actual VCF lines and the bamout view?
    5. For the second example, the HOM_REF sample has a GQ of 0, which indicates that the tool identified something happening in the region but could not make a confident genotype call.
    6. Could you share the bamout example in IGV for the third example and the VCF lines?
    7. We have an article about cases where allele depth is lower than expected; you can take a look here.
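
As a rough illustration of suggestion 2, the sketch below assembles a HaplotypeCaller command that also writes a bamout file for one small region. The file names and the interval are placeholders, not values from this thread, and the helper name `build_bamout_command` is hypothetical; `-R`, `-I`, `-L`, `-O`, and `-bamout` are standard HaplotypeCaller arguments. The command is only built and printed, so it can be checked before it is run.

```python
import shlex


def build_bamout_command(reference, input_bam, interval, output_vcf, output_bam):
    """Return a HaplotypeCaller argument list that also writes a bamout file."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,        # reference FASTA
        "-I", input_bam,        # recalibrated input BAM
        "-L", interval,         # restrict calling to the region of interest
        "-O", output_vcf,       # calls for that region
        "-bamout", output_bam,  # reassembled reads, viewable in IGV
    ]


# Placeholder paths and interval; substitute your own.
command = build_bamout_command(
    "reference.fasta", "sample.recal.bam", "chr1:1000-2000",
    "sample.region.vcf.gz", "sample.region.bamout.bam",
)
print(shlex.join(command))
```

Loading the original BAM and the bamout side by side in IGV at the same locus shows how local reassembly changed the read alignments.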

    I'm not sure about the genomes you mentioned in your post, or about specific mRNA contamination. Potentially other users who have insight into these topics can chime in. Let me know what you find and if you have further questions.
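
To follow up on suggestion 5 in the list above, here is a minimal sketch for listing which samples carry a GQ of 0 at a site. It assumes an uncompressed, tab-delimited, multi-sample VCF with a GQ key in the FORMAT column. The function name `samples_with_zero_gq` and the example record are made up for illustration and are not taken from this thread.

```python
def samples_with_zero_gq(header_line, record_line):
    """Return (sample, genotype) pairs whose GQ is 0 in one VCF record."""
    # Sample names start at the tenth column of the #CHROM header line.
    sample_names = header_line.rstrip("\n").split("\t")[9:]
    fields = record_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column, e.g. GT:AD:DP:GQ
    flagged = []
    for name, sample_field in zip(sample_names, fields[9:]):
        values = dict(zip(keys, sample_field.split(":")))
        if values.get("GQ") == "0":
            flagged.append((name, values.get("GT", ".")))
    return flagged


# Made-up example record with two hypothetical samples.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB"
record = "chr1\t1500\t.\tA\tG\t312.5\t.\t.\tGT:AD:DP:GQ\t0/0:4,0:4:0\t0/1:10,9:19:99"
print(samples_with_zero_gq(header, record))
```

Sites flagged this way are good candidates for a closer look in the bamout, since a GQ of 0 means the genotype shown should not be trusted on its own.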

    Best,

    Genevieve

  • Xiao Ran Luo

    Hi Genevieve, 

    Thank you so much for your reply! 

    Re: 4, here are the VCF lines for the 1st example, bmp2k, with some columns that only contained ‘NA’ deleted:

    Re: 6, here are the VCF lines for the 3rd example, spin2c, with some columns that only contained ‘NA’ deleted: 

    Please let me know if you have any additional questions, and thanks again for your help!

    Best,

    Luna

  • Genevieve Brandt (she/her)

    Hi Luna, 

    This information is too difficult to read in screenshots. Could you paste the VCF lines as text into a comment?
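
One way to get VCF lines as plain text is to pull out only the records around the site of interest. The sketch below assumes an uncompressed, tab-delimited VCF; the function name `vcf_lines_in_region`, the coordinates, and the example records are hypothetical and not taken from this thread.

```python
def vcf_lines_in_region(lines, chrom, start, end):
    """Keep the #CHROM header line and records on chrom with start <= POS <= end."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#CHROM"):
            kept.append(line)  # column names, including the sample names
        elif line and not line.startswith("#"):
            fields = line.split("\t")
            if fields[0] == chrom and start <= int(fields[1]) <= end:
                kept.append(line)
    return kept


# Made-up records standing in for a real file.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA",
    "chr1\t900\t.\tC\tT\t50\t.\t.\tGT:GQ\t0/1:40",
    "chr1\t1500\t.\tA\tG\t312.5\t.\t.\tGT:GQ\t0/1:99",
    "chr2\t1500\t.\tG\tA\t80\t.\t.\tGT:GQ\t1/1:30",
]
for line in vcf_lines_in_region(example, "chr1", 1000, 2000):
    print(line)
```

For a real file, an open file handle can be passed in place of the list; for a gzip-compressed VCF, `gzip.open(path, "rt")` yields text lines in the same way.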

    Thanks,

    Genevieve

