The phasing algorithm decides whether to phase by checking whether two variants always occur on the same haplotype or always occur on different haplotypes. Excess haplotypes severely dilute that signal.
For example, let's say variants A and B both occur on real haplotype H1, but that HC also assembled a similar false haplotype H2. If any reads supporting variant A match H2 better than H1, the phasing via H1 is lost.
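To make the H1/H2 example concrete, here is a minimal sketch of that kind of "always together or always apart" check. It is not the actual HC code. The function name `phase_relation`, the representation of haplotypes as sets of variant labels, and the assumption that the false haplotype H2 carries A but not B are all hypothetical, chosen only for illustration.

```python
from typing import Dict, Set


def phase_relation(haplotypes: Dict[str, Set[str]], a: str, b: str) -> str:
    """Classify two variants as 'same', 'opposite', or 'unphased'.

    `haplotypes` maps a haplotype name to the set of variants it carries.
    This is an illustrative toy model, not the real implementation.
    """
    with_a = {name for name, variants in haplotypes.items() if a in variants}
    with_b = {name for name, variants in haplotypes.items() if b in variants}

    # A variant seen on no haplotype cannot be phased against anything.
    if not with_a or not with_b:
        return "unphased"
    # Both variants occur on exactly the same haplotypes.
    if with_a == with_b:
        return "same"
    # The variants never share a haplotype.
    if with_a.isdisjoint(with_b):
        return "opposite"
    # Mixed evidence: sometimes together, sometimes apart.
    return "unphased"


# Only the real haplotype H1 (plus the reference) was assembled.
clean = {"H1": {"A", "B"}, "ref": set()}

# A similar false haplotype H2 was also assembled (assumed to carry A only).
with_false = {"H1": {"A", "B"}, "H2": {"A"}, "ref": set()}

print(phase_relation(clean, "A", "B"))
print(phase_relation(with_false, "A", "B"))
```

In this toy model, A and B are phased together when only H1 exists, and a single extra haplotype carrying A without B is enough to break the "always" condition and leave the pair unphased.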
This raises the question of whether we could do better, and the answer is yes, easily. The current code is very naive.
However, instead of improving the phasing algorithm, our current efforts are focused on assembling fewer and better haplotypes.
Basically, the goal is to prevent H2 from existing in the first place, in which case the current naive phasing algorithm will probably work well enough.