Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

What is 'physical phasing'?


  • Bhanu Gandham

    Hi,

    The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.

    Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.

    We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.

    For context, check out our support policy.

  • David Benjamin

    whynot: Two or more variants that are close enough for HaplotypeCaller to attempt to phase share a PID tag. Within a set of variants with the same PID, the PGT tells us which of the two homologous chromosomes each alt allele falls on. For example, if variants A->C and T->G have PGTs of 0|1 and 1|0, respectively, it means that the alt alleles are out of phase. The C alt allele occurs in the chromosome inherited from parent 2 and the G alt allele occurs in the chromosome from parent 2. (Note that which parent is the mother and which is the father is unknown). If the PGTs were instead both 0|1, the alt alleles would be in phase, occurring on the same parent's copy of the chromosome. If the PGTs were both 1|0, it would mean the same thing, since which parent is considered parent 1 is arbitrary.
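
    The PGT comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not GATK code, and it assumes that (a) the caller has already checked that both variants carry the same PID, and (b) each PGT is a biallelic, heterozygous phased genotype string such as "0|1" or "1|0". The function names are hypothetical.

```python
def alt_haplotype(pgt: str) -> int:
    """Return which haplotype (1 = left of '|', 2 = right of '|') carries
    the alt allele in a biallelic, heterozygous phased genotype."""
    left, right = pgt.split("|")
    if {left, right} != {"0", "1"}:
        # Assumption: only 0|1 and 1|0 are handled in this sketch.
        raise ValueError(f"expected a phased het genotype, got {pgt!r}")
    return 1 if left == "1" else 2


def phase_relation(pgt_a: str, pgt_b: str) -> str:
    """Compare the PGTs of two variants assumed to share one PID.

    The alt alleles are 'in phase' when they sit on the same haplotype,
    and 'out of phase' when they sit on different haplotypes.
    """
    same = alt_haplotype(pgt_a) == alt_haplotype(pgt_b)
    return "in phase" if same else "out of phase"
```

    For the example above, phase_relation("0|1", "1|0") returns "out of phase", while phase_relation("0|1", "0|1") and phase_relation("1|0", "1|0") both return "in phase", since only the relative position of the alt alleles matters.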

    The GATK's physical phasing means that we use only one sample and phase based only on the co-occurrence of alleles on actual reads. This is in contrast to statistical phasing, which is more powerful and works over much longer ranges but requires multiple samples. That is, the GATK's phasing is not a substitute for a tool like Eagle, but it does give some phasing information, for example about rare variants, that a population-based tool would not handle.
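
    To illustrate why physical phasing is limited to short ranges, here is a deliberately simplified, majority-vote sketch of read-backed phasing for two heterozygous sites. It is not how HaplotypeCaller works internally (HaplotypeCaller phases from its assembled haplotypes); it only shows the core idea that a read informs phasing only if it covers both sites. The input format and function name are hypothetical: each read is a pair (allele at site A, allele at site B), with 0 for ref, 1 for alt, and None where the read does not cover the site.

```python
from collections import Counter


def phase_from_reads(read_alleles):
    """Majority-vote phasing of two het sites from reads spanning both.

    read_alleles: iterable of (allele_at_A, allele_at_B) pairs, where each
    allele is 0 (ref), 1 (alt), or None if the read does not cover the site.
    """
    # Reads that miss either site carry no information about phase.
    counts = Counter(pair for pair in read_alleles if None not in pair)
    # ref-ref and alt-alt reads support the alt alleles being on the same haplotype.
    cis = counts[(0, 0)] + counts[(1, 1)]
    # ref-alt and alt-ref reads support the alt alleles being on different haplotypes.
    trans = counts[(0, 1)] + counts[(1, 0)]
    if cis == trans:
        return "unphased"
    return "in phase" if cis > trans else "out of phase"
```

    Once two variants are farther apart than a typical read or fragment, almost every pair contains a None and the sites stay unphased, which is the range limitation that statistical phasing across many samples avoids.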

  • that girl

    There may be a typo here:

    The C alt allele occurs in the chromosome inherited from parent 2 and the G alt allele occurs in the chromosome from parent 2.  (Note that which parent is the mother and which is the father is unknown).  

    One of the two "parent 2"s should be "parent 1": with a PGT of 1|0, the G alt allele is on the chromosome from parent 1.
