We are using GATK HaplotypeCaller to call SNPs from exome sequencing data in wheat. Wheat is an allopolyploid with three sub-genomes (A, B and D) that are highly similar to each other (for gene sequences, nucleotide identity between them is in the range 95-99%). These three sub-genomes do not recombine with each other, so for the purpose of genotyping each one effectively behaves as a diploid.
We have observed an excessive number of HET calls (see this previous post), but the suggestions given there did not fix the problem. Most of these HET calls have very few reads supporting the alternative allele (e.g. 2 reads supporting ALT and 18 supporting REF).
Given the high similarity between the three sub-genomes (and the high homozygosity of the samples), we suspect that these HET calls are caused by read mis-mapping, i.e. reads from one sub-genome aligning to the homoeologous sequence in another. I would like to understand how best to use the --phred-scaled-global-read-mismapping-rate parameter to reduce the weight of possibly mis-mapped reads that lead to HET calls (for example, when a HET is called even though only 2 of 20 reads support the ALT allele).
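To make the question concrete, here is a toy Python sketch of how we understand the cap to work. It assumes that each read's per-allele log10 likelihood is floored at (best allele likelihood - phred/10), so that no single read can count more than that against any allele. Everything else is a simplifying assumption, not GATK's implementation: single-base reads, a fixed base error rate, no PairHMM, no priors, and the function names are hypothetical.

```python
import math

def read_allele_log10_liks(read_base, alleles, error_rate):
    """Toy log10 P(read | allele) for a single-base read with a fixed error rate."""
    liks = {}
    for allele in alleles:
        p = 1.0 - error_rate if read_base == allele else error_rate / 3.0
        liks[allele] = math.log10(p)
    return liks

def apply_mismapping_cap(liks, phred_mismapping_rate):
    """Floor every allele likelihood at (best - phred/10).

    Assumption: this mirrors how --phred-scaled-global-read-mismapping-rate
    bounds the penalty a single read can contribute.
    """
    floor = max(liks.values()) - phred_mismapping_rate / 10.0
    return {allele: max(lik, floor) for allele, lik in liks.items()}

def diploid_genotype_log10_liks(reads, ref, alt, error_rate, phred_mismapping_rate):
    """Sum log10 P(reads | G) for the three diploid genotypes (no priors)."""
    genotypes = {"0/0": (ref, ref), "0/1": (ref, alt), "1/1": (alt, alt)}
    totals = {gt: 0.0 for gt in genotypes}
    for base in reads:
        liks = apply_mismapping_cap(
            read_allele_log10_liks(base, (ref, alt), error_rate),
            phred_mismapping_rate,
        )
        for gt, (a1, a2) in genotypes.items():
            # P(read | G) = 0.5 * P(read | a1) + 0.5 * P(read | a2)
            totals[gt] += math.log10(0.5 * 10 ** liks[a1] + 0.5 * 10 ** liks[a2])
    return totals

def best_genotype(totals):
    """Genotype with the highest likelihood (ignores genotype priors)."""
    return max(totals, key=totals.get)

if __name__ == "__main__":
    reads = ["A"] * 18 + ["G"] * 2  # 18 REF reads, 2 ALT reads
    for phred in (45, 20):
        totals = diploid_genotype_log10_liks(reads, "A", "G", 0.001, phred)
        print(phred, best_genotype(totals), {gt: round(v, 2) for gt, v in totals.items()})
```

In this toy model, the default value of 45 leaves the likelihoods unchanged and 0/1 wins. A value of 20 (i.e. assuming roughly 1 read in 100 is mis-mapped) caps each ALT read's penalty against 0/0 at 2 log10 units, and the call flips to 0/0. The same cap would also weaken genuine low-coverage HETs, which is the trade-off we would like advice on.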
Thanks a lot for your help.