Potential guidance on GenomeSTRiP CNV discovery on AnVIL Terra app
I've been trying to run the GenomeSTRiP CnvDiscoveryPipeline on Terra, and I'm trying to find the correct sequence of workflows to go through. I couldn't match the WDL scripts on FireCloud with the stages described in this link.
My use case is germline CNV discovery on a set of ~30 WGS (30X) samples from individuals with Mendelian diseases. I checked the RunCNVPipeline script, and it seems like the most likely match for what I need. Do I need to run SingleStepPreprocessing/ before I run RunCNVPipeline? And do I need to run CreatCnvCallset afterward, or is this a separate version of the CNVPipeline script? I'm hoping for some guidance on the sequence of steps I need to run on Terra.
It would be great if there were a schematic describing the relationships between the modules. Thank you in advance!
Thank you for your post. Bob Handsaker has been tagged and will get back to you shortly.
After inspecting the scripts, I made my best educated guess at the sequence:
1. Run gs_preprocess.wdl on each sample.
2. Run gs_run_cnv_pipeline.wdl on each metadata.zip.
3. Run gs_create_cnv_callset.wdl on the set of CNV call outputs from step 2.
However, when I ran step 2, the following error occurred:
"Task gs_run_cnv_pipeline_wf.setupDiscovery:NA:1 failed.
The job was stopped before the command finished. PAPI error code 9.
Execution failed: generic::failed_precondition: pulling image:
docker pull: running ["docker" "pull" "skashin/genome-strip:latest"]:
exit status 1 (standard error: "Error response from daemon:
manifest for skashin/genome-strip:latest not found: manifest unknown:
For step 1, I was able to run the workflow successfully with the Docker image "gcr.io/mccarroll-genomestrip/genome-strip:latest". I'm going to try using this instead of "skashin/genome-strip:latest". I wonder whether this is the correct thing to do, or whether there was an unannounced change to the Docker image. Thanks again.
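For anyone hitting the same "manifest unknown" error, here is a minimal sketch of swapping the image reference in a local copy of the WDL before re-uploading it to Terra. It assumes the WDL hard-codes the image string in its runtime section; the function name `swap_docker_image` and the sample WDL fragment are hypothetical, and whether the gcr.io image is the intended replacement has not been confirmed in this thread.

```python
from typing import Tuple

# Image names taken from the error message and the post above.
OLD_IMAGE = "skashin/genome-strip:latest"
NEW_IMAGE = "gcr.io/mccarroll-genomestrip/genome-strip:latest"


def swap_docker_image(wdl_text: str, old: str = OLD_IMAGE, new: str = NEW_IMAGE) -> Tuple[str, int]:
    """Replace every occurrence of `old` with `new`; return the new text and the number of replacements."""
    count = wdl_text.count(old)
    return wdl_text.replace(old, new), count


# Hypothetical WDL fragment, for illustration only (not copied from gs_run_cnv_pipeline.wdl).
example_wdl = '''
task setupDiscovery {
    runtime {
        docker: "skashin/genome-strip:latest"
    }
}
'''

patched_wdl, replaced = swap_docker_image(example_wdl)
print(f"replaced {replaced} image reference(s)")
print(patched_wdl)
```

If the WDL exposes the image as a workflow input instead of a literal string, changing that input in the Terra workflow configuration would achieve the same thing without editing the script.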
You seem to be mostly on the right path. To help, I created a public workspace with 1000 Genomes data and some sample workflows that you can use as templates:
This sample workspace currently contains only preprocessing workflows, however. I'm happy to work with you on the CNV pipeline, which is not in as polished a form. If we get a workflow working, perhaps it can be cloned into this sample workspace as a useful template.
One initial question: what size of cohort do you intend to process? It is feasible to do joint calling on up to 500 or so genomes in a single run. When we have scaled up beyond this, we have taken a scatter-merge-regenotype approach. If you need this, I may have some WDLs that can help. You should also feel free to reach out to me directly by email.
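As a rough illustration of the cohort-size point, the sketch below plans the scatter step only: it splits a sample list into evenly sized batches that stay at or under a joint-calling limit. The limit of 500 is the approximate figure mentioned above; the function name `plan_batches` and the sample IDs are hypothetical, and the merge and regenotype steps are not shown. This is not part of the GenomeSTRiP tooling.

```python
from typing import List, Sequence


def plan_batches(samples: Sequence[str], max_batch_size: int = 500) -> List[List[str]]:
    """Split samples into the fewest batches of at most max_batch_size, sized as evenly as possible."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    n = len(samples)
    if n == 0:
        return []
    n_batches = -(-n // max_batch_size)  # ceiling division
    base, extra = divmod(n, n_batches)   # the first `extra` batches get one more sample
    batches: List[List[str]] = []
    start = 0
    for i in range(n_batches):
        size = base + (1 if i < extra else 0)
        batches.append(list(samples[start:start + size]))
        start += size
    return batches


# Hypothetical sample IDs for a 28-sample cohort: fits in a single joint-calling run.
small_cohort = [f"sample_{i:02d}" for i in range(28)]
print(len(plan_batches(small_cohort)))

# A hypothetical 1200-sample cohort would be scattered into several batches.
large_cohort = [f"sample_{i:04d}" for i in range(1200)]
print([len(b) for b in plan_batches(large_cohort)])
```

Even batch sizes are a design choice here, so that no single batch is much smaller than the others; a cohort well under the limit simply yields one batch and needs no scatter.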
Thanks for the reply. It would be great if we could correspond by email. What is your email address? I couldn't find it on the GATK pages of Broad's website.
To answer your question, the cohort size is 28. It would be great if you could share the WDLs for this purpose. Thanks in advance. I'm also happy to help modify some scripts to make them work on Terra.