Variant Calling at Interval Boundaries in GATK
a) GATK version used: 4.2.0.0
b) Exact command used:
"-I {bam} "
Hi GATK community,
I am experiencing an issue while working with GATK, and I'd appreciate your assistance in resolving it.
In my BED file, I have the following line:
chrX 31182732 31182909 DMD . .
There is a significant variant at the position chrX-31182732 T>G. However, when I run the command with the provided BED file, I cannot see the variant at the "chrX 31182732" position in the output VCF file.
Interestingly, when I perform a test by expanding the BED file by one base on each side (changing it to "chrX 31182731 31182910 DMD . ."), I can observe the variant at the "chrX 31182732" position in the VCF file.
Is it expected behavior for GATK not to consider variants at the boundaries of interval files? Your insights and guidance on resolving this issue would be greatly appreciated.
Thank you in advance for your help.
Best regards,
Meryem
-
Hello Meryem Akdeniz. There are a number of complicated assembly/mapping related reasons that could result in not calling just on the border of an intervals file. I notice that `31182732` is the very next base after the start of the interval file. Its conceivable there is some off-by-one bug/indexing issue between formats that is not being handled correctly but I would first confirm that HaplotypeCaller is actually able to call that variant. Generally we expect the tool often has artifacts that are inequivalent to normal sequencing very close to the edge of traversal boundaries due to how the activity profile code works. We automatically pad intervals by a small amount for that code but it can often not be enough to cut down on messieness. If you want the best performance I would recommend adding extra ~200-500 bases at the ends of your intervals.
I would recommend adding the -ip (interval padding) argument with perhaps 500 bases just to force haplotype caller to assemble good assembly regions. If the variant is still not called, it is likely related to assembly in some way. You can try some of the recommendations here: https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant. -
Bed uses different coordinates than vcf. BED is 0-based half open and vcf is 1 based closed coordinates. [0, 10) vs [1, 10]
Isn't chrX:31182732 in bed's 0 based coordinates the equivalent to chrX:31182733 in vcf's 1 based coordinates? I think you're variant is just 1 base outside of your intervals so it's not being called.
-
Thank you for replying James Emery Louis Bergelson
Please sign in to leave a comment.
3 comments