ArtificialHaplotypeRG makes variant heterozygous
AnsweredDear GATK,
Why do I see the Artificial Haplotype RG add more heterozygosity?
In the top part we have the bam file after using HaplotypeCaller with the option --bamfile. In the lower part there is the bam file after mapping with BWA-MEM.
We can see that the Artificial HaplotypeCaller adds 120 reads to T, and thus when variants are called this position gets called as an heterozygote.
How can it be that 2 T's can have such a large influence that it adds 120 more?
Thanks in advance.
Regards,
Wiebe
-
Hi Wiebe Kooistra,
HaplotypeCaller does local realignment so the counts before and after at a specific site will not always match. Could you check how many of the reads are marked with ArtificialHaplotype in the RG field?
Best,
Genevieve
-
I did read about the local realignment. There are 173 ArtificialHaplotypeRG's marked sequences. That the numbers don't add up I understand, I was just wondering why my homozygous SNP for the alternative allele is getting so much heterozygous scores in the Artificial Haplotype RGs?
Regards,
Wiebe
-
Could you share the GVCF or VCF output line? It may help me better understand why the bamout appears like this.
-
Hey,
Sure. Here is the VCF output line: SL4.0ch11 25548831 . T C,A 4.09785e+06 PASS AC=1,0;AF=0.523,0.006162;AN=2;BaseQRankSum=-0.295;DP=257674;ExcessHet=-0;FS=0;InbreedingCoeff=0.6408;MLEAC=819,8;MLEAF=0.713,0.006969;MQ=60;MQRankSum=0;QD=34.5;ReadPosRankSum=-0.269;SOR=2.688GT:AD:DP:FT:GQ:PL 0/1:41,91,0:132:PASS:99:1045,0,601,1162,800,1962
The gVCF output line is: SL4.0ch11 25548831 . T C,<NON_REF> 1037.60 . BaseQRankSum=8.323;DP=374;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=1346400,374;ReadPosRankSum=-0.517 GT:AD:DP:GQ:PL:SB 0/1:41,91,0:132:99:1045,0,601,1162,800,1962:41,0,91,0
-
Thanks for sharing these lines!
The bamout is a debugging tool meant to view the local reassembly of HaplotypeCaller and understand the called haplotypes. The reads marked with ArtificialHaplotypeRG are not real reads and are constructed by HaplotypeCaller during the local realignment step. You don't need to worry about these reads in the bamout and can remove them from the view if you want.
Otherwise, it looks like this site ended up being called because of the local realignment here. After local realignment, HaplotypeCaller found 41 reads in support of of T and 91 reads in support of C.
If you think that this haplotype should not have been called, you can try some troubleshooting methods following this document: When HaplotypeCaller and Mutect2 do not call an expected variant. I would recommend trying --linked-de-bruijn graph and also making sure your GATK version is up to date.
Hopefully this helps you to understand your results. Let me know if you have any follow up questions.
Best,
Genevieve
Please sign in to leave a comment.
5 comments