I am using the following command to call variants:
gatk HaplotypeCaller \
-R dmel-all-chromosome-r5.44.fasta \
-I ../121002_I637_FCD1B4GACXX_L1_SZAXPI015086-62.sort.noDup.realign.bam \
-L 3R:10100000-10400000 \
--dont-use-soft-clipped-bases true \
--bamout test_62_3R10209297.bam \
At position 3R:10209297, HaplotypeCaller reports a heterozygous genotype:
3R 10209297 . G T 1578.64 . AC=1;AF=0.500;AN=2;DP=63;ExcessHet=3.0103;FS=4.152;MLEAC=1;MLEAF=0.500;MQ=59.39;MQRankSum=-0.621;QD=28.19;SOR=1.179 GT:AD:DP:GQ:PL 0/1:17,39:56:99:1586,0,568
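To make the numbers in this record easier to read, here is a minimal Python sketch that pulls out the per-sample fields. The helper name `parse_sample` is made up for illustration, and the record is split on whitespace because that is how it was pasted (a real VCF is tab-separated). Note that, as far as I understand, the AD values HaplotypeCaller reports are counted after reads are realigned to the assembled haplotypes, so they need not match a pileup of the input BAM.

```python
# The record from the post, whitespace-separated as pasted (real VCF uses tabs).
RECORD = (
    "3R 10209297 . G T 1578.64 . "
    "AC=1;AF=0.500;AN=2;DP=63;ExcessHet=3.0103;FS=4.152;MLEAC=1;MLEAF=0.500;"
    "MQ=59.39;MQRankSum=-0.621;QD=28.19;SOR=1.179 "
    "GT:AD:DP:GQ:PL 0/1:17,39:56:99:1586,0,568"
)


def parse_sample(record):
    """Map FORMAT keys (column 9) to the single sample's values (column 10)."""
    fields = record.split()
    keys = fields[8].split(":")
    values = fields[9].split(":")
    return dict(zip(keys, values))


sample = parse_sample(RECORD)

# AD = reads assigned to each allele (REF, ALT).
ad = [int(x) for x in sample["AD"].split(",")]
# PL = Phred-scaled likelihoods for 0/0, 0/1, 1/1 (0 marks the best genotype).
pl = [int(x) for x in sample["PL"].split(",")]

ref_fraction = ad[0] / sum(ad)
# Gap between the best and second-best genotype; GQ is this gap capped at 99.
pl_gap = sorted(pl)[1] - sorted(pl)[0]

print("GT:", sample["GT"])
print("REF reads:", ad[0], "ALT reads:", ad[1])
print("REF fraction: %.3f" % ref_fraction)
print("PL gap to next-best genotype:", pl_gap)
```

So the caller assigned 17 of 56 informative reads to G, and the second-best genotype (1/1, PL=568) is far behind 0/1, which is why GQ is capped at 99.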
But when I checked the input BAM file in IGV, the site looks homozygous alternate (T/T): no read supports the reference allele (upper track in the following figure).
[upper track: input bam file; bottom track: bamout bam file and grouped/colored by "HC" ]
However, when I checked the --bamout output file (the bottom track in the above figure), the site appears heterozygous, with some reads supporting the G allele. And all the reads containing G at this position come from the HC-tagged artificial haplotypes! Clearly, the artificial haplotype changes the position from homozygous alternate to heterozygous by introducing a G allele at this position.
The genotype, as indicated in the gVCF file, is a high-quality call (high AD and GQ), so it is unlikely to be filtered out.
But I think the genotype is more likely wrong, because of what the parents show. The individual's father:
The individual's mother:
Both parents are homozygous alternate (homoAlt), with ZERO reads carrying the reference allele. Therefore, I believe their child's genotype should also be 1/1 rather than 0/1 (barring a de novo mutation, which is very rare).
It seems the artificial haplotype causes the problem, but I have no idea how it could introduce a new allele, and I am wondering whether it is possible to disable artificial haplotypes.
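The inheritance argument can be written down as a small check. This is only a sketch of simple Mendelian consistency for one diploid site: the function names are hypothetical, the parental genotypes are the ones I infer from IGV (both 1/1), and de novo mutation is deliberately ignored.

```python
from itertools import product


def possible_child_genotypes(father, mother):
    """All unordered genotypes a child can inherit: one allele from each parent."""
    return {tuple(sorted((f, m))) for f, m in product(father, mother)}


def is_mendelian_consistent(child, father, mother):
    """True if the child's genotype can be formed from the parents' alleles."""
    return tuple(sorted(child)) in possible_child_genotypes(father, mother)


# Alleles are coded as in the VCF GT field: 0 = REF (G), 1 = ALT (T).
father = (1, 1)        # assumed from IGV: no reference reads
mother = (1, 1)        # assumed from IGV: no reference reads
child_called = (0, 1)  # HaplotypeCaller's call
child_expected = (1, 1)

print("Possible child genotypes:", possible_child_genotypes(father, mother))
print("Called 0/1 consistent?", is_mendelian_consistent(child_called, father, mother))
print("Expected 1/1 consistent?", is_mendelian_consistent(child_expected, father, mother))
```

With two 1/1 parents the only genotype the child can inherit is 1/1, so the 0/1 call is a Mendelian violation unless a new mutation occurred.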