I have a sample (Sample1) that is analyzed using BWA v0.7.6a-r433 and GATK 188.8.131.52. I called the variants using HaplotypeCaller using the ERC mode (GVCF) and then joint genotyped the samples (8 samples) using GenotypeGVCFs.
In one region in this sample, there is a variant that is TRUE (based on a third-party quality control, (chr1:169510380) that we did NOT call.
As you can see below, the variant exists in Sample1 when we look at the BAM file which is generated from BWA. But when I take the BAM file that is generated by HaplotypeCaller (containing the active regions by using –bamOut option), one sees clearly that HaplotypeCaller didn’t consider this region in Sample1 as an active region and thus there is no way of calling the variant.
This variant was called in another sample from the same run (Sample2). As you can see below as well, there is no big difference in the quality of the reads between the two samples (The coverage of both is around 500 and there is no bias in one strand or the other in both samples).
My questions are:
- Where should I start searching for the reason? What can I do to check why is this not called in Sample1?
- What are the parameters in the original BAM files (from BWA) that lead to skip a region in HaplotypeCallerERC. I read the documentation here (https://www.biorxiv.org/content/10.1101/201178v3.full) but was not able to figure out an exact number for a position to look after, it was too technical and I was lost in-between equations. I also read this simple document (https://gatk.broadinstitute.org/hc/en-us/articles/360036227652-ActiveRegion-determination-HaplotypeCaller-and-Mutect2-) without getting much wiser.
- How can I force HaplotypeCaller to call this variant?
I hope I was able to explain the case clearly. Please let me know if I missed any required details and/or you need any more info.
Thanks for your help.
Please sign in to leave a comment.