Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Potential guidance on GenomeSTRiP CNV discovery on Anvil Terra app

  • Genevieve Brandt (she/her)

    Thank you for your post. Bob Handsaker has been tagged and will get back to you shortly.

  • Jason Ni

    [Latest development]:
    By inspecting the scripts, I made my best educated guess at the order in which to run them:

    1. Run gs_preprocess.wdl on each sample.

    2. Run gs_run_cnv_pipeline.wdl on each metadata.zip.

    3. Run gs_create_cnv_callset.wdl on the set of CNV call outputs from step 2.

    However, running step 2 produced the following error:


    "Task gs_run_cnv_pipeline_wf.setupDiscovery:NA:1 failed.
    The job was stopped before the command finished. PAPI error code 9.
    Execution failed: generic::failed_precondition: pulling image:
    docker pull: running ["docker" "pull" "skashin/genome-strip:latest"]:
    exit status 1 (standard error: "Error response from daemon:
    manifest for skashin/genome-strip:latest not found: manifest unknown:
    manifest unknown\n")"

    For step 1, I was able to run the workflow successfully with the docker image "gcr.io/mccarroll-genomestrip/genome-strip:latest", so I'm going to try using it in place of "skashin/genome-strip:latest". Is this the correct thing to do, or has there been an unannounced change to the docker image? Thanks again.
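
The image swap described above can be sketched as a plain text substitution. This is only a minimal sketch, assuming the image name appears literally in the WDL source or in its inputs JSON. The function name `swap_docker_image` and the sample `runtime` fragment are hypothetical, and nothing in this thread confirms that the `gcr.io` image is the intended replacement for the CNV pipeline.

```python
from typing import Tuple

# Image names are taken from the error message and the post above.
OLD_IMAGE = "skashin/genome-strip:latest"  # failed to pull in step 2
NEW_IMAGE = "gcr.io/mccarroll-genomestrip/genome-strip:latest"  # worked for step 1


def swap_docker_image(text: str, old: str = OLD_IMAGE, new: str = NEW_IMAGE) -> Tuple[str, int]:
    """Replace every literal occurrence of `old` with `new`.

    Returns the updated text and the number of replacements made,
    so the caller can tell whether the image was hard-coded at all.
    """
    count = text.count(old)
    return text.replace(old, new), count


# Hypothetical fragment for illustration; the real WDL may declare the image differently.
sample = 'runtime {\n    docker: "skashin/genome-strip:latest"\n}\n'
updated, n_replaced = swap_docker_image(sample)
print(n_replaced)
print(updated)
```

A replacement count of zero would suggest that the image is passed in as a workflow input rather than hard-coded, in which case it would be changed in the workflow configuration instead.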

  • Bob Handsaker

    Hi, Jason,

    You seem to be mostly on the right path. To help, I created a public workspace with 1000 Genomes data and some sample workflows that you can use as templates:

    https://app.terra.bio/#workspaces/mccarroll-genomestrip-terra/Genome%20STRiP%201000G

    This sample currently only contains preprocessing workflows, however. I'm happy to work with you on the CNV pipeline, which is not in as polished a form. If we get a workflow working, perhaps it can be cloned into this sample workspace as a useful template.

    You should also feel free to reach out directly by email. One initial question: what size of cohort do you intend to process? It is feasible to do joint calling on up to 500 or so genomes in a single run. When we have scaled up beyond this, we have taken a scatter-merge-regenotype approach. If you need this, I may have some WDLs that can help.
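
The scatter step of the approach mentioned above can be illustrated by splitting a cohort into batches no larger than the roughly 500-genome ceiling. This is a sketch under stated assumptions, not the workflow used by the Genome STRiP authors: the function name `scatter_batches`, the near-equal batch sizing, and the sample names are all hypothetical, and the merge and regenotype stages are not shown because the thread does not describe them.

```python
from typing import List, Sequence

# Approximate single-run ceiling quoted in the thread ("500 or so"); not a hard limit.
MAX_BATCH = 500


def scatter_batches(samples: Sequence[str], max_batch: int = MAX_BATCH) -> List[List[str]]:
    """Split samples into near-equal batches, each no larger than max_batch."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    n = len(samples)
    if n == 0:
        return []
    n_batches = -(-n // max_batch)  # ceiling division
    base, extra = divmod(n, n_batches)
    batches: List[List[str]] = []
    start = 0
    for i in range(n_batches):
        # The first `extra` batches take one additional sample each.
        size = base + (1 if i < extra else 0)
        batches.append(list(samples[start:start + size]))
        start += size
    return batches


# Hypothetical cohort of 1200 samples.
cohort = [f"sample_{i:04d}" for i in range(1200)]
batch_sizes = [len(b) for b in scatter_batches(cohort)]
print(batch_sizes)  # [400, 400, 400]
```

Near-equal batches are used here instead of filling each batch to the ceiling, so that no batch ends up much smaller than the others. A cohort at or below the ceiling comes back as a single batch, which corresponds to an ordinary joint-calling run.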

  • Jason Ni

    Hi Bob,

    Thanks for the reply. It would be great if we could correspond by email. What is your email address? I couldn't find it on the GATK pages on Broad's website.

    To answer your question, the cohort size is 28. It would be great if you could share the WDLs for this purpose. Thanks in advance. I'm also happy to help modify some scripts to make them work on Terra.

    Bohan
