Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Multi-sample VCF for CNV calling

Answered

8 comments

  • Official comment
    Genevieve Brandt (she/her)

    Hi Muhammad Shoaib Akhtar,

    Thank you for your post. We keep track of these feature requests so our developers can use them when improving the GATK tools. I encourage other users who could use this feature to interact with this post so that it gets more attention!

    For running our gCNV pipeline on many samples, I recommend checking out our WDLs, which are designed to scale to many samples. For what you are looking to do specifically, I recommend joint_call_exome_cnvs.wdl.

    Hope this helps!

    Genevieve

  • Genevieve Brandt (she/her)

    Hi Muhammad Shoaib Akhtar,

    The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.

    Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.

    We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.

    For context, check out our support policy.

  • Muhammad Shoaib Akhtar

    Genevieve Brandt (she/her) thank you for your suggestion.

    I have already finished this task by writing a bash script. I hope the WDLs will be helpful for other people trying to do a similar analysis.

  • Genevieve Brandt (she/her)

    Thanks for the update Muhammad Shoaib Akhtar!

  • Faisal Almalki

    Dear Muhammad Shoaib Akhtar

    I would be interested in trying your bash script. I am facing the same problem.

  • Muhammad Shoaib Akhtar

    Faisal Almalki it's simple. Please use bgzip, tabix and bcftools merge.

    I added my script below:

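    #!/bin/sh

    # Compress each per-sample VCF (bgzip replaces gen*.vcf with gen*.vcf.gz).
    for i in gen*.vcf; do
        [ -e "$i" ] || continue
        echo "$i"
        bgzip "$i"
    done

    # Index each compressed VCF and record it in the list file.
    # list_of_vcf.gz is a plain-text list, one VCF path per line.
    : > list_of_vcf.gz
    for j in gen*.vcf.gz; do
        [ -e "$j" ] || continue
        echo "$j"
        tabix -p vcf "$j"
        echo "$j" >> list_of_vcf.gz
    done

    # Merge all listed VCFs into one bgzip-compressed multi-sample VCF.
    bcftools merge -l list_of_vcf.gz -O z -o output.vcf.gz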

    The gen*.vcf files are the genotyped-intervals output of PostprocessGermlineCNVCalls. You can run the same steps on the genotyped-segments output.

    Good luck!
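
bcftools merge expects every input to be bgzip-compressed and indexed, so it can help to confirm that before merging. The following is a minimal sketch of such a check, assuming the same gen*.vcf.gz naming as the script above and tabix's default .tbi index suffix. check_vcf_inputs is a hypothetical helper written for illustration, not part of GATK, tabix, or bcftools.

```shell
#!/bin/sh
# Pre-merge check: collect every bgzip-compressed VCF matching a prefix and
# confirm that each one has a tabix index (.tbi) next to it.
# check_vcf_inputs is a hypothetical helper, not part of GATK or bcftools.

check_vcf_inputs() {
    prefix=$1      # e.g. "gen" to match gen*.vcf.gz
    list_file=$2   # output: one VCF path per line, for `bcftools merge -l`
    missing=0

    # Start from an empty list file.
    : > "$list_file"
    for vcf in "$prefix"*.vcf.gz; do
        # If nothing matched, the glob stays literal; skip it.
        [ -e "$vcf" ] || continue
        if [ -e "$vcf.tbi" ]; then
            echo "$vcf" >> "$list_file"
        else
            echo "missing index: $vcf" >&2
            missing=$((missing + 1))
        fi
    done

    # Succeed only if no index was missing and at least one VCF was listed.
    [ "$missing" -eq 0 ] && [ -s "$list_file" ]
}
```

One way to use it would be to gate the merge on the check, for example: check_vcf_inputs gen list_of_vcf.gz && bcftools merge -l list_of_vcf.gz -O z -o output.vcf.gz. If any index is missing, the helper names the file on stderr and the merge is skipped.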

  • Pamela Bretscher

    Thank you for providing your solution Muhammad Shoaib Akhtar!

  • Faisal Almalki

    Thank you for the script. It's really helpful.
