Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

REAL Variant not called by HaplotypeCaller

Answered
4 comments

  • Genevieve Brandt (she/her)

    Hi, please note that your question was posted while the GATK Team was Out of Office

    Please repost any outstanding GATK issues and we will get to them if possible. Our first priority is solving GATK issues and abnormal results, see our support policy for more details.

  • NawarDalila

    Hi,

    I have a similar issue. Here is what I ran into recently, and I really can't figure out where to start.

    I have a sample (Sample1) that was analyzed with BWA v0.7.6a-r433 and GATK 4.1.0.0. I called variants with HaplotypeCaller in ERC mode (GVCF) and then joint-genotyped the samples (8 samples) with GenotypeGVCFs.

    In one region of this sample, there is a variant (chr1:169510380) that is TRUE, based on third-party quality control, but that we did NOT call.

    As you can see below, the variant is present in Sample1 in the BAM file generated by BWA. But in the BAM file written by HaplotypeCaller with the -bamout option (which contains the assembled active regions), it is clear that HaplotypeCaller did not treat this region in Sample1 as an active region, so the variant could not be called.

    This variant was called in another sample from the same run (Sample2). As you can see below as well, there is no big difference in the quality of the reads between the two samples (The coverage of both is around 500 and there is no bias in one strand or the other in both samples).

    My questions are:

    • Where should I start looking for the reason? What can I check to find out why this is not called in Sample1?
    • How can I force HaplotypeCaller to call this variant?

    I hope I was able to explain the case clearly. Please let me know if I missed any required details and/or you need any more info.

    Thanks for your help.

  • HASSAN BADRANE

    Hi, you are describing the same issue that I (and others) reported before. It is not due to the BAM. GATK HaplotypeCaller simply considers that region too messy, with so many gaps that it discards the whole chunk.

    What resolved the issue for me was setting the --kmer-size option of "gatk HaplotypeCaller". I forget what the default values are, but you can supply multiple values (the program will run with these different values and choose the best result).

    In my case the problem was resolved when I tried some values smaller than the default, as follows:

    --kmer-size 18 --kmer-size 22
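
To try this suggestion quickly, one option is to rerun HaplotypeCaller on a small window around the site instead of the whole sample. The sketch below is only an illustration under assumptions: the reference and BAM file names and the output names are hypothetical, the 500 bp padding is an arbitrary choice, and the k-mer sizes are simply the ones quoted above. By default it prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical file names: replace with your own reference and BAM.
REF="reference.fasta"
BAM="Sample1.bam"

# Window of 500 bp on each side of the site from the post (chr1:169510380).
# The padding value is an arbitrary assumption.
SITE=169510380
PAD=500
REGION="chr1:$((SITE - PAD))-$((SITE + PAD))"

# Dry run by default: the command is printed, not executed.
# Set RUN="" in the environment to execute it for real.
RUN="${RUN-echo}"

# Rerun with the two k-mer sizes from the post and write the
# assembled haplotypes to a separate BAM for inspection.
$RUN gatk HaplotypeCaller \
    -R "$REF" \
    -I "$BAM" \
    -L "$REGION" \
    -ERC GVCF \
    --kmer-size 18 \
    --kmer-size 22 \
    -bamout Sample1.kmer18_22.bamout.bam \
    -O Sample1.kmer18_22.g.vcf.gz
```

Comparing the new -bamout file with the one from the original run in a genome browser should show whether the region is now assembled as an active region.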

    GOOD LUCK.

  • Genevieve Brandt (she/her)

    NawarDalila A few other helpful tips:

    • You can force call an allele with the --alleles argument and it might give more insight into why it was not called.
    • Use the --debug argument and look at the stderr or stdout files to see what is happening with the assembly.
    • I would recommend a newer version of GATK. I think our adaptive pruning improvements have been added since GATK 4.1.0.0 and could potentially help with this case.
    • How many reads are supporting the alt allele in Sample 1?
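
The force-calling tip and the read-count question can be sketched together as follows. This is a hedged example, not an official recipe: the file names are hypothetical, the REF and ALT bases are PLACEHOLDERS (the thread does not state the actual alleles), and it assumes a recent GATK release in which --alleles alone triggers force-calling and IndexFeatureFile takes -I (older releases used -F). The commands are printed, not run, unless RUN is set to an empty string.

```shell
#!/bin/sh
# Hypothetical file names: replace with your own reference and BAM.
REF="reference.fasta"
BAM="Sample1.bam"

# Site from the post.
SITE="chr1:169510380"
CHROM="${SITE%%:*}"
POS="${SITE##*:}"

# PLACEHOLDER alleles: the post does not give them. Use the real ones.
REF_BASE="A"
ALT_BASE="G"

# Write a minimal sites-only VCF holding the allele to force-call.
ALLELES="force_alleles.vcf"
{
    printf '##fileformat=VCFv4.2\n'
    printf '##contig=<ID=%s>\n' "$CHROM"
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '%s\t%s\t.\t%s\t%s\t.\t.\t.\n' "$CHROM" "$POS" "$REF_BASE" "$ALT_BASE"
} > "$ALLELES"

# Dry run by default; set RUN="" in the environment to execute.
RUN="${RUN-echo}"

# Index the alleles VCF, then force-call the allele around the site.
$RUN gatk IndexFeatureFile -I "$ALLELES"
$RUN gatk HaplotypeCaller \
    -R "$REF" \
    -I "$BAM" \
    -L "$SITE" \
    --interval-padding 500 \
    --alleles "$ALLELES" \
    -O Sample1.forced.vcf.gz

# Pile up the reads at the single position; the read-base column
# shows how many reads carry the alt base.
$RUN samtools mpileup -f "$REF" -r "${SITE}-${POS}" "$BAM"
```

The AD field of the force-called record and the pileup count give two independent views of how many reads support the alt allele in Sample1.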

    Hope this helps!

    Genevieve

