In the FORMAT field of a VCF produced by a tool like GATK HaplotypeCaller, you may find a PGT entry with a description similar to this in the header metadata:
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
What exactly is meant by the "physical phasing" of the haplotype, in this instance?
Two or more variants will share a PID tag when they are close enough for a tool like HaplotypeCaller to attempt to phase. Within any set of variants sharing a PID, the PGT will tell us which of two homologous chromosomes the alleles fall on.
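To make the PID/PGT relationship concrete, here is a minimal sketch that reads tab-separated VCF data lines and groups each record's PGT by its PID. It assumes an uncompressed VCF with a single sample in column 10 and treats a missing PID (absent key or `.`) as "not physically phased". The example records and the PID strings in them are hypothetical and only illustrate the shape of the data; they are not real HaplotypeCaller output.

```python
from collections import defaultdict


def format_fields(cols, sample_index=0):
    """Map FORMAT keys (column 9) to one sample's values (column 10+)."""
    keys = cols[8].split(":")
    values = cols[9 + sample_index].split(":")
    # VCF allows trailing FORMAT values to be dropped, so pad with "." (missing)
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


def group_by_pid(vcf_lines, sample_index=0):
    """Return {PID: [(CHROM, POS, PGT), ...]}; records without a PID are skipped."""
    groups = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header/metadata lines
        cols = line.rstrip("\n").split("\t")
        fields = format_fields(cols, sample_index)
        pid = fields.get("PID", ".")
        if pid == ".":
            continue  # not part of any physically phased set
        groups[pid].append((cols[0], int(cols[1]), fields.get("PGT", ".")))
    return dict(groups)


# Hypothetical records: two variants share a PID, the third has no PGT/PID at all
records = [
    "chr1\t1000\t.\tA\tC\t50\t.\t.\tGT:AD:PGT:PID\t0/1:10,8:0|1:1000_A_C",
    "chr1\t1012\t.\tT\tG\t50\t.\t.\tGT:AD:PGT:PID\t0/1:9,9:1|0:1000_A_C",
    "chr1\t5000\t.\tG\tA\t50\t.\t.\tGT:AD\t0/1:12,11",
]
print(group_by_pid(records))
```

Only variants inside the same PID group can be compared with each other; PGT values from different PID groups say nothing about their relative phase.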
For example, if variants A→C and T→G have PGTs of 0|1 and 1|0 respectively, the alt alleles are out of phase: the C alt allele lies on the second haplotype and the G alt allele on the first, so they occur on the chromosomes inherited from different parents (call them Parent 2 and Parent 1). (Note that which parent is the mother and which is the father is unknown.)
If the PGTs were instead both 0|1, the alt alleles would be in phase (occurring on the same parent's copy of the chromosome). If the PGTs were both 1|0, it would mean the same thing, since which parent is considered Parent 1 is arbitrary.
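The rule in the example above boils down to one comparison: two heterozygous sites sharing a PID are in phase ("cis") when the alt allele sits on the same side of the `|` in both PGTs, and out of phase ("trans") otherwise. A minimal sketch, assuming biallelic sites whose PGT is either `0|1` or `1|0` (the function names are made up for illustration):

```python
def parse_pgt(pgt):
    """Split a phased genotype such as '0|1' into a tuple of allele indices."""
    if "|" not in pgt:
        raise ValueError(f"not a phased genotype: {pgt!r}")
    return tuple(int(allele) for allele in pgt.split("|"))


def alt_phase(pgt_a, pgt_b):
    """Return 'cis' if the alt alleles of two het sites share a haplotype, else 'trans'."""
    a, b = parse_pgt(pgt_a), parse_pgt(pgt_b)
    if sorted(a) != [0, 1] or sorted(b) != [0, 1]:
        raise ValueError("this sketch only handles biallelic het sites (0|1 or 1|0)")
    # a.index(1) is the haplotype (0 = left, 1 = right) carrying the alt allele
    return "cis" if a.index(1) == b.index(1) else "trans"


print(alt_phase("0|1", "1|0"))  # trans: C and G on different parental copies
print(alt_phase("0|1", "0|1"))  # cis
print(alt_phase("1|0", "1|0"))  # cis: same meaning, parent labels swapped
```

Because only the relative position matters, `0|1`/`0|1` and `1|0`/`1|0` both come out as `cis`, which is exactly why the choice of "Parent 1" is arbitrary.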
GATK's physical phasing works on a single sample and phases variants based only on the co-occurrence of alleles on actual sequencing reads. This is in contrast to statistical phasing, which is more powerful and works over much longer ranges but requires multiple samples.
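To illustrate what "co-occurrence of alleles on actual reads" means, here is a toy read-backed phasing sketch for two heterozygous SNVs. It is a simplified majority vote, not HaplotypeCaller's actual method (HaplotypeCaller derives phase from the haplotypes it assembles in an active region). The input format, a list of `{position: base}` dicts per read, is a hypothetical stand-in for real alignments, and ties or a lack of spanning reads simply leave the pair unphased.

```python
from collections import Counter


def phase_two_sites(reads, site1, site2):
    """
    Toy read-backed phasing of two het SNVs.
    reads: list of dicts mapping position -> observed base (hypothetical input format)
    site1, site2: (position, ref_base, alt_base)
    Returns a pair of PGT strings, or None if the reads do not settle the phase.
    """
    (p1, r1, a1), (p2, r2, a2) = site1, site2
    votes = Counter()
    for read in reads:
        b1, b2 = read.get(p1), read.get(p2)
        if b1 not in (r1, a1) or b2 not in (r2, a2):
            continue  # read does not span both sites with a ref or alt base
        # ref+ref or alt+alt on one read supports cis; a mixed pair supports trans
        votes["cis" if (b1 == a1) == (b2 == a2) else "trans"] += 1
    if votes["cis"] == votes["trans"]:
        return None  # no spanning reads, or an unresolved tie
    # Arbitrarily put the first site's alt allele on the second haplotype
    return ("0|1", "0|1") if votes["cis"] > votes["trans"] else ("0|1", "1|0")


# Reads over the A->C (pos 100) and T->G (pos 112) example; the last read misses pos 112
reads = [{100: "C", 112: "T"}, {100: "C", 112: "T"}, {100: "A", 112: "G"}, {100: "C"}]
print(phase_two_sites(reads, (100, "A", "C"), (112, "T", "G")))  # ('0|1', '1|0')
```

The key limitation is visible in the loop: a pair of sites can only be phased if individual reads (or read pairs) cover both, which is why physical phasing is restricted to nearby variants.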
One consequence is that GATK's physical phasing can give some phasing information for rare variants, which population-based statistical tools handle poorly because few other samples carry them.