Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Haplotypes of duplicated genes

6 comments

  • Genevieve Brandt (she/her)

    Hi,

    The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.

    Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.

    We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.

    For context, check out our support policy.

  • Genevieve Brandt (she/her)

    Hi Muhammad Shoaib Akhtar,

    Would you be able to clarify your question so I can better understand what you are trying to do? What kind of output are you hoping for? Are you looking to do phasing across these genes?

    Thank you,

    Genevieve

  • Muhammad Shoaib Akhtar

    Genevieve Brandt (she/her) Thank you for your reply.

    In my dataset, several human genes were found to be duplicated using GATK's CNV pipeline. By duplicated I mean that their copy number is 3N, 4N or 5N. Previously, I used GATK's Short Variant Discovery Pipeline and was able to build a consensus sequence for each of the two haplotypes. Now, if the copy number is 3N, there must be 3 haplotypes. How can I build a consensus sequence for the third haplotype?

  • Genevieve Brandt (she/her)

    What do you use to make the consensus sequence?

    One method that might work for you would be to re-run HaplotypeCaller with the -ploidy (--sample-ploidy) argument set to the copy number called by the CNV pipeline. You can use the -L (--intervals) argument to run it only on those regions of your genome.

    You will get better calls if the ploidy is accurate.

    Hope this helps!
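
A minimal sketch of the re-run suggested above, assuming GATK4 is on the PATH as `gatk`. It only builds and prints one HaplotypeCaller command per duplicated region. The file names, the intervals, and the `cnv_calls` mapping are hypothetical placeholders, not values from this thread.

```python
import shlex


def haplotypecaller_cmd(reference, bam, interval, ploidy, out_vcf):
    """Build a GATK4 HaplotypeCaller command for one interval at a given ploidy."""
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,                  # reference FASTA
        "-I", bam,                        # aligned reads
        "-L", interval,                   # restrict calling to this region
        "--sample-ploidy", str(ploidy),   # long form of -ploidy
        "-O", out_vcf,
    ]


# Hypothetical CNV results: region -> copy number reported by the CNV pipeline.
cnv_calls = {
    "chr1:1000000-1050000": 3,
    "chr7:5000000-5020000": 4,
}

commands = [
    haplotypecaller_cmd(
        "ref.fasta", "sample.bam", interval, copy_number,
        f"dup_{index}.ploidy{copy_number}.vcf.gz",
    )
    for index, (interval, copy_number) in enumerate(cnv_calls.items())
]

# Print the commands for review instead of executing them.
for cmd in commands:
    print(shlex.join(cmd))
```

Each region gets its own command because the ploidy setting applies to the whole run, so regions with different copy numbers cannot share a single invocation.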

  • Muhammad Shoaib Akhtar

    Genevieve Brandt (she/her) Thank you so much

    I already tried this method, but the calls did not change. I still get only 2 haplotypes.

    I usually use bcftools consensus to build the consensus sequence from a hard-filtered VCF file.

  • SkyWarrior

    That is not possible with short-read sequencing technologies. Also, 3N does not always mean 3 different haplotypes: 3 copies of the same haplotype, or 2 copies of one haplotype and 1 copy of another, also give a copy number of 3.

    If you really want to haplotype multi-copy genes, I suggest long-read sequencing with PacBio and using WhatsHap to generate haplotypes backed by long reads.

    Regards. 
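
The point about copy number versus distinct haplotypes can be illustrated with a short enumeration. This is a sketch under a simplifying assumption: the labels A, B and C stand for hypothetical distinct haplotypes, and the order of the copies is ignored. It lists every way three copies can be composed and counts how many distinct haplotypes each composition contains.

```python
from collections import Counter
from itertools import combinations_with_replacement


def copy_compositions(haplotype_labels, copy_number):
    """List every unordered way to fill `copy_number` copies from the given haplotypes."""
    return [
        "".join(combo)
        for combo in combinations_with_replacement(haplotype_labels, copy_number)
    ]


# Hypothetical labels for up to three distinct haplotypes at copy number 3.
compositions = copy_compositions("ABC", 3)

# Tally compositions by the number of distinct haplotypes they contain.
by_distinct = Counter(len(set(composition)) for composition in compositions)

for composition in compositions:
    print(composition, "->", len(set(composition)), "distinct haplotype(s)")
```

Of the ten compositions, only ABC contains three distinct haplotypes. The other nine carry one or two, which is consistent with seeing only two haplotypes in a region called as 3N.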
