Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Variant Calling at Interval Boundaries in GATK

3 comments

  • James Emery

    Hello Meryem Akdeniz. There are a number of complicated assembly- and mapping-related reasons why a variant right at the border of an intervals file might not be called. I notice that `31182732` is the very next base after the start of the interval. It's conceivable that there is an off-by-one or indexing issue between formats that is not being handled correctly, but I would first confirm that HaplotypeCaller is actually able to call that variant. In general, we expect the tool to produce artifacts very close to the edges of traversal boundaries that you would not see elsewhere, because of how the activity profile code works. We automatically pad intervals by a small amount for that code, but this is often not enough to cut down on the messiness. If you want the best performance, I would recommend adding roughly 200-500 extra bases at the ends of your intervals.

    I would recommend adding the `-ip` (interval padding) argument with perhaps 500 bases, just to force HaplotypeCaller to build good assembly regions. If the variant is still not called, the cause is likely related to assembly in some way. You can try some of the recommendations here: https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant
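
To make the padding suggestion concrete, here is a minimal sketch of the arithmetic involved in widening a BED-style interval by a fixed number of bases on each side. It is not GATK's own implementation (GATK applies the padding internally when `-ip` is given). The function name `pad_bed_interval`, the optional `contig_length` parameter, and the example coordinates are all hypothetical and chosen only for illustration.

```python
from typing import Optional, Tuple


def pad_bed_interval(
    start0: int,
    end0: int,
    padding: int,
    contig_length: Optional[int] = None,
) -> Tuple[int, int]:
    """Widen a 0-based half-open interval [start0, end0) by `padding` bases per side.

    Hypothetical helper for illustration only; this is not GATK code.
    """
    # The start cannot go below the first base of the contig.
    padded_start = max(0, start0 - padding)
    padded_end = end0 + padding
    # If the contig length is known, do not run past its end.
    if contig_length is not None:
        padded_end = min(contig_length, padded_end)
    return padded_start, padded_end


# Hypothetical interval padded by 500 bases on each side.
print(pad_bed_interval(1000, 2000, 500))
# An interval near the start of a contig is clamped at 0.
print(pad_bed_interval(100, 2000, 500))
```

Padding this way only changes where the caller is allowed to assemble and emit calls, so variants inside the added flanks may also appear in the output and can be filtered back to the original targets afterwards if needed.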

  • Louis Bergelson

    BED uses different coordinates than VCF. BED is 0-based and half-open, while VCF is 1-based and closed: [0, 10) vs. [1, 10].

    Isn't chrX:31182732 in BED's 0-based coordinates equivalent to chrX:31182733 in VCF's 1-based coordinates? I think your variant is just 1 base outside of your intervals, so it's not being called.
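
The coordinate mismatch described above can be checked with a short sketch. It converts a 0-based half-open BED interval to the 1-based closed convention used by VCF and tests whether a VCF position falls inside a BED interval. The function names are hypothetical, and the interval end used in the example is an assumed value, since the original intervals file is not shown in this thread.

```python
from typing import Tuple


def bed_to_one_based_closed(start0: int, end0: int) -> Tuple[int, int]:
    """Convert BED [start0, end0) to the equivalent 1-based closed [start1, end1]."""
    # Only the start shifts: the half-open end already equals the closed 1-based end.
    return start0 + 1, end0


def vcf_pos_in_bed_interval(pos1: int, start0: int, end0: int) -> bool:
    """Return True if a 1-based VCF position lies inside BED [start0, end0)."""
    return start0 < pos1 <= end0


# BED [0, 10) covers the same ten bases as 1-based closed [1, 10].
print(bed_to_one_based_closed(0, 10))

bed_start = 31182732
bed_end = 31182832  # assumed end coordinate, for illustration only

# A VCF record at POS 31182732 sits one base before the interval's first base.
print(vcf_pos_in_bed_interval(31182732, bed_start, bed_end))
# POS 31182733 is the first base the BED interval actually covers.
print(vcf_pos_in_bed_interval(31182733, bed_start, bed_end))
```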

  • Meryem Akdeniz
    Thank you for replying, James Emery and Louis Bergelson.
