Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Running GenomeSTRiP in batch-like manner

2 comments

  • Bhanu Gandham

    Hi Emma Wiener,

    Tagging Bob Handsaker in this thread; he might be able to help out with this question. Thank you, Bob!

  • Bob Handsaker

    Hi, Emma,

    The basic strategy is to make two passes: first run discovery (e.g. CNV discovery) in each batch, then merge the CNVs and remove duplicates, and finally re-genotype the unified set in each batch to try to get unbiased ascertainment.

    We have some WDL pipelines that implement this strategy. These are available in Terra and the WDLs are also in the release (under wdl/firecloud/...). We don't have cluster-based versions of these pipelines, as we have been moving to cloud processing for scalability.

    You could upload your data to Google Cloud Storage and run the pipelines in Terra. Alternatively, you could look at the WDLs and use them as a guide to write your own cluster-based scripts.

    -Bob
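
For anyone writing their own cluster-based scripts, the merge step between the two passes described above might be sketched roughly as follows. This is only an illustration, not the logic used by the Genome STRiP WDLs: the `Cnv` record, the function names, and the 50% reciprocal-overlap rule for treating two calls as duplicates are all assumptions made for this example. The discovery and genotyping passes themselves are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cnv:
    """Hypothetical CNV call record; coordinates are 1-based and inclusive."""
    chrom: str
    start: int
    end: int
    batch: str


def reciprocal_overlap(a: Cnv, b: Cnv) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start + 1), shared / (b.end - b.start + 1))


def merge_batches(calls_by_batch: dict, min_overlap: float = 0.5) -> list:
    """Combine per-batch discovery calls into one de-duplicated site list.

    A call is dropped when it reciprocally overlaps an already kept site by
    at least min_overlap (an assumed threshold, not a Genome STRiP default).
    """
    unified = []
    for batch in sorted(calls_by_batch):
        ordered = sorted(calls_by_batch[batch],
                         key=lambda c: (c.chrom, c.start, c.end))
        for call in ordered:
            is_duplicate = any(reciprocal_overlap(call, kept) >= min_overlap
                               for kept in unified)
            if not is_duplicate:
                unified.append(call)
    return sorted(unified, key=lambda c: (c.chrom, c.start, c.end))


def genotyping_plan(unified: list, batches: list) -> dict:
    """Second pass: every batch is genotyped at the same unified site list."""
    return {batch: list(unified) for batch in batches}
```

The point of the second function is that each batch receives the full unified list, including sites discovered only in other batches, which is what allows genotypes to be compared across batches.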

     
