Combining mpileup and HaplotypeCaller calls
Hello,
I built a dataset from complete genomes using HaplotypeCaller in GVCF mode followed by joint genotyping. Later I wanted to expand the dataset, but because I was short on time I decided to call only the SNPs found in my initial dataset in the BAM files of the new samples, using mpileup (samtools).
The result is a lot of missing data in the new samples (around 40%).
I guess this is due to the different variant-calling methods, but I would need help understanding the details of what causes this problem.
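To put a number on the missing data described above, here is a minimal sketch that computes the fraction of missing genotypes per sample from uncompressed VCF text. It is only an illustration: the function name `missing_rate_per_sample` and the sample names `old_1` and `new_1` are hypothetical, and it assumes tab-separated records whose FORMAT column contains a `GT` field, with missing calls written as `.` or `./.`.

```python
def missing_rate_per_sample(vcf_lines):
    """Return {sample_name: fraction of records with a missing GT}."""
    samples = []
    missing = []
    total = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            missing = [0] * len(samples)
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # record carries no genotypes; not counted
        gt_idx = fmt.index("GT")
        total += 1
        for i, call in enumerate(fields[9:]):
            parts = call.split(":")
            gt = parts[gt_idx] if gt_idx < len(parts) else "."
            alleles = gt.replace("|", "/").split("/")
            if all(a == "." for a in alleles):
                missing[i] += 1  # every allele is '.', so the call is missing
    return {s: (m / total if total else 0.0) for s, m in zip(samples, missing)}


# Tiny made-up example: one original sample and one newly added sample.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\told_1\tnew_1",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t./.:0",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:9\t0/0:7",
]
rates = missing_rate_per_sample(example)
print(rates)
```

Comparing these rates between the original and the new samples shows whether the missing calls are concentrated in the mpileup-called samples.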
-
Hi Charlotte Her,
The two methods definitely use different algorithms: they differ in how they handle sequencing errors and in the heuristics they use to identify active regions and variants. You can read more about the specific HaplotypeCaller algorithm here:
https://gatk.broadinstitute.org/hc/en-us/articles/360035531412-HaplotypeCaller-in-a-nutshell
https://gatk.broadinstitute.org/hc/en-us/sections/360007226771-Algorithms
I also found this post explaining some of the inherent differences between how the tools work. In general, I wouldn't recommend trying to combine the outputs, but rather stick with one.
Kind regards,
Pamela
-
Thank you Pamela, that helps me a lot!