REAL Variant not called by HaplotypeCaller
Hello, I'm running into that widespread bug in which real variants are not called by HaplotypeCaller. There are 10 variants that are clearly visible by eye when viewing the BAM file. However, the whole region containing these 10 variants (about 105 bases) is missing from the bamout, so they are not called. I'm focusing on this region as a test to troubleshoot the problem, but I expect it to occur in other regions of the genome as well.
I tried various tricks:
--force-active, --adaptive-pruning, --linked-de-bruijn-graph, --recover-all-dangling-branches, etc., but with no success.
When I run HaplotypeCaller on just that 105-base region, the reads do show up in this short bamout and the variants are called. When I rerun HaplotypeCaller on the whole chromosome, the same problem occurs: they are not called.
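A minimal dry-run sketch of the two runs described above, so the region-only and chromosome-wide bamouts can be compared side by side in IGV. The file names and the coordinates in region are hypothetical placeholders; the extra flags are the ones listed above. With DRY_RUN=1 (the default here) the script only prints the gatk commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch: compare HaplotypeCaller on a short interval vs. the whole chromosome.
# All file names and coordinates are hypothetical placeholders.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them
run() { if [[ "$DRY_RUN" == "1" ]]; then printf '%s\n' "$*"; else "$@"; fi; }

ref="reference.fasta"
bam="sample.bam"
region="chr1:1000000-1000105"   # hypothetical ~105-base problem region
chrom="${region%%:*}"           # chromosome name taken from the region string

# Troubleshooting flags mentioned in the post
extra=(--force-active --adaptive-pruning --linked-de-bruijn-graph --recover-all-dangling-branches)

# Run 1: only the problem region (where the variants were reportedly called)
region_cmd=(gatk HaplotypeCaller -R "$ref" -I "$bam" -L "$region"
            -O region.vcf.gz -bamout region.bamout.bam "${extra[@]}")

# Run 2: the whole chromosome (where the variants were reportedly missed)
chrom_cmd=(gatk HaplotypeCaller -R "$ref" -I "$bam" -L "$chrom"
           -O chrom.vcf.gz -bamout chrom.bamout.bam "${extra[@]}")

run "${region_cmd[@]}"
run "${chrom_cmd[@]}"
```

Keeping both bamouts lets you check whether the reads at the region are present in one run and absent in the other, which is the symptom described here.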
Any suggestions to circumvent this problem?
-
Hi, please note that your question was posted while the GATK Team was Out of Office.
Please repost any outstanding GATK issues and we will get to them if possible. Our first priority is solving GATK issues and abnormal results, see our support policy for more details.
-
Hi,
I have a similar issue. Here is what I experienced recently; I really can't figure out where to start.
I have a sample (Sample1) that was processed with BWA v0.7.6a-r433 and GATK 4.1.0.0. I called variants with HaplotypeCaller in ERC (GVCF) mode and then jointly genotyped the 8 samples with GenotypeGVCFs.
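For reference, the per-sample GVCF plus joint-genotyping workflow described above can be sketched as a dry run. The sample and file names are hypothetical, and the post does not say how the 8 GVCFs were merged before GenotypeGVCFs; this sketch assumes CombineGVCFs (GenomicsDBImport is the other common option).

```shell
#!/usr/bin/env bash
# Sketch of the GVCF (-ERC GVCF) + joint-genotyping workflow.
# Sample names and paths are hypothetical; CombineGVCFs is an assumption.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them
run() { if [[ "$DRY_RUN" == "1" ]]; then printf '%s\n' "$*"; else "$@"; fi; }

ref="reference.fasta"
samples=(Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8)

combine_args=()
for s in "${samples[@]}"; do
  # Step 1: per-sample HaplotypeCaller in GVCF (ERC) mode
  run gatk HaplotypeCaller -R "$ref" -I "$s.bam" -O "$s.g.vcf.gz" -ERC GVCF
  combine_args+=(-V "$s.g.vcf.gz")
done

# Step 2: merge the per-sample GVCFs into one cohort GVCF
run gatk CombineGVCFs -R "$ref" "${combine_args[@]}" -O cohort.g.vcf.gz

# Step 3: joint genotyping across all samples
run gatk GenotypeGVCFs -R "$ref" -V cohort.g.vcf.gz -O cohort.vcf.gz
```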
In one region of this sample there is a true variant (chr1:169510380, confirmed by third-party quality control) that we did NOT call.
As you can see below, the variant is present in Sample1 in the BAM file generated by BWA. But in the BAM file generated by HaplotypeCaller with the -bamout option (which contains the active regions), one sees clearly that HaplotypeCaller did not consider this region active in Sample1, so there is no way the variant could be called.
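A sketch for regenerating a bamout around the missed site for both samples, so the two can be compared directly. The file names and the padding value are hypothetical. Because the first post in this thread reports that short intervals can behave differently from chromosome-wide runs, it may be worth repeating the comparison without -L as well.

```shell
#!/usr/bin/env bash
# Sketch: regenerate a bamout around the missed locus for both samples.
# File names and the padding value are hypothetical placeholders.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them
run() { if [[ "$DRY_RUN" == "1" ]]; then printf '%s\n' "$*"; else "$@"; fi; }

ref="reference.fasta"
locus="chr1:169510380"   # the missed variant from the post
pad=300                  # hypothetical padding so the assembly sees flanking context

bamouts=()
for s in Sample1 Sample2; do
  run gatk HaplotypeCaller -R "$ref" -I "$s.bam" \
    -L "$locus" --interval-padding "$pad" \
    -ERC GVCF -O "$s.locus.g.vcf.gz" \
    -bamout "$s.locus.bamout.bam"
  bamouts+=("$s.locus.bamout.bam")
done

# Load both bamouts in IGV; no reads at the locus in a bamout suggests the
# site was not inside an assembled region for that sample.
printf 'Compare in IGV: %s\n' "${bamouts[*]}"
```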
This variant was called in another sample from the same run (Sample2). As you can also see below, there is no big difference in read quality between the two samples (coverage is around 500 in both, and neither shows strand bias).
My questions are:
- Where should I start searching for the reason? What can I do to check why this variant is not called in Sample1?
- Which properties of the original BAM files (from BWA) cause HaplotypeCaller in ERC mode to skip a region? I read the documentation here (https://www.biorxiv.org/content/10.1101/201178v3.full) but could not find an exact threshold to check at a given position; it was too technical and I got lost in the equations. I also read this simpler document (https://gatk.broadinstitute.org/hc/en-us/articles/360036227652-ActiveRegion-determination-HaplotypeCaller-and-Mutect2-) without getting much wiser.
- How can I force HaplotypeCaller to call this variant?
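One way to start on the first two questions is to have HaplotypeCaller write out its per-site activity profile and the assembly regions it chose, then view them in IGV next to the BAM. The sketch below assumes the advanced arguments --activity-profile-out and --assembly-region-out are available in your GATK version (check gatk HaplotypeCaller --help); file names and padding are hypothetical. Running the same command on Sample2 would show whether the activity score at the locus differs between the samples.

```shell
#!/usr/bin/env bash
# Sketch: dump HaplotypeCaller's activity profile and assembly regions near the
# missed site. File names and padding are hypothetical placeholders.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them
run() { if [[ "$DRY_RUN" == "1" ]]; then printf '%s\n' "$*"; else "$@"; fi; }

ref="reference.fasta"
bam="Sample1.bam"
locus="chr1:169510380"

# --activity-profile-out: per-site activity probabilities (IGV format)
# --assembly-region-out: boundaries of the regions that were assembled
profile_cmd=(gatk HaplotypeCaller -R "$ref" -I "$bam"
  -L "$locus" --interval-padding 300
  -O Sample1.profile.vcf.gz
  --activity-profile-out Sample1.activity.igv
  --assembly-region-out Sample1.regions.igv)

run "${profile_cmd[@]}"
```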
I hope I was able to explain the case clearly. Please let me know if I missed any required details and/or you need any more info.
Thanks for your help.
-
Hi, you are describing the same issue that I (and others) reported before. It's not due to the BAM; HaplotypeCaller simply considers that region too messy, with so many gaps, and removes the whole chunk.
The solution that resolved the issue for me was to set the --kmer-size option of "gatk HaplotypeCaller". I forget what the default values are, but you can pass multiple values (the program will run using these different values and choose the best result).
In my case, the problem was resolved when I tried values smaller than the default, as follows:
--kmer-size 18 --kmer-size 22
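A minimal sketch of a full command using the k-mer sizes from this reply; the argument is simply repeated once per value. The reference and BAM names are hypothetical, and the script only prints the command unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: pass several k-mer sizes to HaplotypeCaller by repeating --kmer-size.
# File names are hypothetical placeholders.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them
run() { if [[ "$DRY_RUN" == "1" ]]; then printf '%s\n' "$*"; else "$@"; fi; }

ref="reference.fasta"
bam="sample.bam"
kmers=(18 22)   # values from the reply above

kmer_args=()
for k in "${kmers[@]}"; do
  kmer_args+=(--kmer-size "$k")   # one --kmer-size per value
done

run gatk HaplotypeCaller -R "$ref" -I "$bam" -O sample.vcf.gz "${kmer_args[@]}"
```

Keeping the sizes in an array makes it easy to try several combinations on the problem region before committing to a genome-wide run.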
GOOD LUCK.
-
Nawar, Dalila: a few other helpful tips:
- You can force call an allele with the --alleles argument and it might give more insight into why it was not called.
- Use the --debug argument and look at the stderr or stdout files to see what is happening with the assembly.
- I would recommend a newer version of GATK; I think our adaptive pruning improvements were added after GATK 4.1.0.0, and they could potentially help with this case.
- How many reads support the alt allele in Sample1?
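A sketch combining the first two tips: force-calling the missed allele with --alleles while logging debug output. It assumes a recent GATK 4 release where --alleles force-calls the given sites, and a hypothetical bgzipped, indexed VCF (alleles.vcf.gz) that holds the exact REF/ALT at chr1:169510380. --debug is spelled as in the tip above; check gatk HaplotypeCaller --help for the exact name in your version. The AD field of the forced call may also help answer the read-support question.

```shell
#!/usr/bin/env bash
# Sketch: force-call the missed allele with --alleles and keep --debug output.
# The alleles VCF and all other file names are hypothetical placeholders.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command only, 0 = execute it

ref="reference.fasta"
bam="Sample1.bam"
alleles="alleles.vcf.gz"   # hypothetical indexed VCF with the allele to force-call
log="Sample1.debug.log"

force_cmd=(gatk HaplotypeCaller -R "$ref" -I "$bam"
  -L chr1:169510380 --interval-padding 300
  --alleles "$alleles" --debug
  -O Sample1.forced.vcf.gz -bamout Sample1.forced.bamout.bam)

if [[ "$DRY_RUN" == "1" ]]; then
  printf '%s > %s 2>&1\n' "${force_cmd[*]}" "$log"
else
  # Capture stdout and stderr together so the debug messages end up in one file
  "${force_cmd[@]}" > "$log" 2>&1
fi
```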
Hope this helps!
Genevieve