HaplotypeCaller's Underlying Statistical Model
What statistical model does HaplotypeCaller use to decide between possible genotypes? In the simplest case, with only a reference base and one alternate base at a position, I imagine something like a binomial distribution with parameters p = 0.5 and n equal to the number of reads overlapping the position. If the p-value of the observed data is large and the null hypothesis is not rejected, report 0/1; otherwise report 0/0 if mostly the reference base is seen, or 1/1 if mostly the alternate base is seen. Is it a multinomial distribution or something else?
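To make the rule I have in mind concrete, here is a minimal Python sketch of it. This is only my guess at a possible model, not anything taken from GATK: the function names, the exact two-sided test, and the 0.05 significance threshold are all my own assumptions.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k (hypothetical helper, not a GATK function).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are counted despite float rounding.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


def naive_genotype(alt_reads, total_reads, alpha=0.05):
    """Genotype call from the binomial rule described in the question.

    alpha is an assumed significance threshold.
    """
    if total_reads == 0:
        return "./."  # no reads, no call
    if binom_two_sided_p(alt_reads, total_reads) >= alpha:
        return "0/1"  # consistent with a 50/50 allele mix
    # Null rejected: call the homozygote that matches the majority base.
    return "1/1" if alt_reads > total_reads / 2 else "0/0"


if __name__ == "__main__":
    for alt, total in [(10, 20), (0, 20), (20, 20)]:
        print(alt, total, naive_genotype(alt, total))
```

Note that this rule ignores base qualities and sequencing error entirely, which is part of why I suspect the real model is different.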
-
Hi Dario,
You can take a look at our documentation at the link below.
In short, HaplotypeCaller uses a Bayesian approach to assign genotypes.
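As a rough illustration of what a Bayesian genotype assignment looks like in the simple biallelic, diploid case from your question, here is a minimal Python sketch. It is a simplification and not HaplotypeCaller's implementation: it assumes a single fixed per-base error rate and a flat prior over the three genotypes, and it works from raw read counts, whereas the actual tool derives per-read likelihoods from reassembled haplotypes. The function name and parameters are hypothetical.

```python
import math


def genotype_posteriors(ref_reads, alt_reads, error_rate=0.01, priors=None):
    """Posterior probability of 0/0, 0/1 and 1/1 given read counts.

    Simplified sketch: assumes independent reads, one fixed error rate
    (0 < error_rate < 1), and a flat prior unless priors is supplied.
    """
    genotypes = ["0/0", "0/1", "1/1"]
    if priors is None:
        priors = [1 / 3] * 3  # assumed flat prior

    log_post = []
    for alt_copies, prior in zip((0, 1, 2), priors):
        frac_alt = alt_copies / 2  # fraction of alt alleles in the genotype
        # Probability that a single read shows the alt base: either it
        # came from an alt allele and was read correctly, or it came
        # from a ref allele and was misread.
        p_alt = frac_alt * (1 - error_rate) + (1 - frac_alt) * error_rate
        p_ref = 1 - p_alt
        # The binomial coefficient is the same for every genotype,
        # so it cancels on normalization and is omitted.
        log_lik = ref_reads * math.log(p_ref) + alt_reads * math.log(p_alt)
        log_post.append(math.log(prior) + log_lik)

    # Normalize in log space to avoid underflow at high depth.
    top = max(log_post)
    weights = [math.exp(x - top) for x in log_post]
    total = sum(weights)
    return {g: w / total for g, w in zip(genotypes, weights)}


if __name__ == "__main__":
    post = genotype_posteriors(ref_reads=10, alt_reads=10)
    print(max(post, key=post.get), post)
```

The main difference from a hypothesis test is that all three genotypes are scored against the same data and the most probable one is reported, rather than treating 0/1 as a null hypothesis to be rejected.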
Regards.