Germline CNV WDL - SIGKILL
Hello,
I have been testing the germline CNV workflow. I downloaded the WDLs from GitHub and pulled the latest GATK Docker image (as of Friday).
I have prepared the project folder:
cromwell/
-- cromwell-50.jar
-- bams_cnv/ (60 BAMs and BAIs, symlinked)
-- germline_CNV/ (inputs.json, cnv_germline_cohort_workflow.wdl, ploidy_table.csv, wes_intervals.bed)
-- hg19/ (hg19 reference stored here)
I am running this command:
java -jar cromwell-50.jar run cnv_germline_cohort_workflow.wdl --inputs ./germline_CNV/inputs.json
The workflow ran with some warnings for about an hour, but then it hung on "WaitingForReturnCode" for two days, received a SIGKILL, and was aborted.
Could you please point me to where the mistake is?
I am uploading workflow log and inputs.json file.
https://www.dropbox.com/s/z7783eu36dktjg5/inputs_short.json?dl=0
https://www.dropbox.com/s/vxjm2vass2jnkya/workflow.71b72b96-8d7c-470f-8b35-3514af3a0ae7.log?dl=0
I am running the workflow locally on Ubuntu 18 with 64 GB RAM and a 16-core CPU.
-
Hi stanedav,
Happy to see if we can help here. Do you have stdout and stderr files you can provide?
Kind regards,
Jason
-
Hi Jason, here is the generated stdout/stderr file:
https://www.dropbox.com/s/tbwkk4d645bd6q4/stdout_stderr.txt?dl=0
The command I ran to generate this file:
java -jar cromwell-50.jar run ./germline_CNV/cnv_germline_cohort_workflow.wdl --inputs ./germline_CNV/inputs_short.json &>stdout_stderr.txt
-
Hi stanedav,
Thank you for providing the file. Since you've collected stdout and stderr into a single file, it's about as verbose as the log; I am essentially looking for a way to narrow down where the actual issue is occurring.
It looks like the issue is specifically occurring with the GermlineCNVCallerCohortMode task. It may take some time to dig into this issue, but I'll try to get back to you as soon as I can.
Just to confirm, can you link me to where you downloaded the workflow?
Kind regards,
Jason
-
Hi stanedav,
Do you still require any assistance with this? If so, can you link me to where you downloaded the workflow?
Many thanks,
Jason
-
Hi Jason, sorry for the late response. I downloaded the workflow from here:
https://github.com/broadinstitute/gatk/tree/master/scripts/cnv_wdl/germline
(I used git to clone the whole GATK master repository with the scripts, where I found the workflow.)
-
Hello Jason, I would like to ask if you have any updates on my issue.
Thank you very much.
-
Hi stanedav,
Looks like your workflow is failing at the GermlineCNVCallerCohortMode task. The workflow log file you linked only describes the status of each task at a high level; it doesn't tell me what caused the shards for that task to fail. Those messages should be found in the log of each individual shard running for that task.
You may be able to find them in the following directory:
<some local path>/cromwell-executions/CNVGermlineCohortWorkflow/71b72b96-8d7c-470f-8b35-3514af3a0ae7/call-GermlineCNVCallerCohortMode/shard-14/
I'm using shard 14 as an example, but any of the shards that failed should be fine. Please provide a link to this type of log file.
Something else to double-check is the resource usage of the machine you are running the workflow on. While the workflow is running, make sure memory and disk space aren't approaching their limits (e.g. `free -h` for memory and `df -h` for disk usage).
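As an illustration, here is a tiny shell helper (my own sketch, not part of the workflow or of Cromwell) that samples both into a log file you can inspect afterwards:

```shell
#!/bin/sh
# Hypothetical helper: periodically record memory and disk usage while the
# workflow runs. The sample count and interval are arbitrary demo values;
# in practice you would loop until the workflow finishes.
LOG=resource_usage.log
for i in 1 2 3; do
  {
    date
    free -h      # memory and swap usage
    df -h .      # disk usage of the current filesystem
  } >> "$LOG"
  sleep 1
done
```

Run it in the background (`./monitor.sh &`) before starting the workflow, then look at the log around the time a shard fails.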
-
Hello Beri,
here are the logs; you were right that there is a memory issue. How can I set a memory limit for the whole workflow? I have 64 GB RAM in my machine.
-
As you mentioned, and judging from the stderr log message below, you may need additional memory for your input file.
The exit code indicates that the process was terminated. This may mean the process requires additional memory.
There doesn't seem to be a variable that lets you set the memory for the whole workflow, but there are individual variables you can set for each task. Currently the default Java memory used by the GermlineCNVCallerCohortMode task is 6.5 GB; you can increase this by adding the mem_gb_for_germline_cnv_caller input parameter to your inputs JSON file and setting the variable.
"CNVGermlineCohortWorkflow.mem_gb_for_germline_cnv_caller": <add GB integer here>
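For example, merged into your existing inputs JSON (16 here is only an illustrative value; pick one that fits your machine):

```json
{
  "CNVGermlineCohortWorkflow.mem_gb_for_germline_cnv_caller": 16
}
```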
-
Hello Beri,
I have tested the workflow again with an increased amount of memory ("CNVGermlineCohortWorkflow.mem_gb_for_germline_cnv_caller": 58), but I am still getting the not-enough-memory error in one shard:
The exit code indicates that the process was terminated. This may mean the process requires additional memory.
at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:130)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeArgs(PythonScriptExecutor.java:170)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:151)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:121)
at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.executeGermlineCNVCallerPythonScript(GermlineCNVCaller.java:438)
at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.doWork(GermlineCNVCaller.java:309)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
I also tested the workflow with fewer samples (6), and then it runs without errors. Is there any way to run the workflow with this number of samples (>40) without adding RAM?
If not, do you recommend creating a model in Terra, downloading it, and using it locally only for case samples?
Thank you,
David
-
I talked to the dev team and they mentioned you could try decreasing the num_intervals_per_scatter variable in the workflow JSON. This variable specifies the number of intervals in a scatter, so decreasing it reduces the number of intervals each shard of the workflow processes. Also, to reap the benefits of reducing num_intervals_per_scatter, be sure to set a sensible value for concurrent-job-limit. The job limiter is a Cromwell option that limits how many jobs (shards) are executed on your machine concurrently.
Terra is a great platform to use when you are limited on compute resources. If I'm not mistaken, each shard gets its own compute resource there, instead of all shards running on one system (as happens when running locally).
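As an illustrative sketch (both values are arbitrary examples, not recommendations), the two knobs would be set in the inputs JSON:

```json
{
  "CNVGermlineCohortWorkflow.num_intervals_per_scatter": 5000
}
```

and in a Cromwell configuration file (assuming the default `Local` backend; pass the file with `java -Dconfig.file=your.conf -jar cromwell-50.jar run ...`):

```
backend {
  providers {
    Local {
      config {
        concurrent-job-limit = 4
      }
    }
  }
}
```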
-
Hi Beri, David Benjamin,
Would someone be able to explain whether
`backend.providers.YourBackend.config.concurrent-job-limit`
actually applies to scattered calls? I am trying to align 114 FASTQ files to a reference and run alignment QC, so I use scatter as below, but my HPC admins report that CPU usage increases substantially even when I set `concurrent-job-limit = 10`, and the server crashes.
my.conf:

```
include required(classpath("application"))

call-caching {
  enabled = true
}

backend {
  providers {
    BackendName {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        concurrent-job-limit = 10
      }
    }
  }
}
```
The workflow, where `samples` is 114 sample structs with FASTQs:

```
scatter (s in samples) {
  call bwa_task.Mem as bwa {
    input:
      trim = trim,
      read1 = s.read1,
      #read2 = s.read2,
      bwaIndex = bwaIndex,
      outputPrefix = s.outputPrefix,
      readgroup = s.readgroup,
      runtime_params = standard_runtime_bwa
  }

  call samtools.sort as samsort {
    input:
      sam = bwa.outputSam,
      outputPrefix = s.outputPrefix,
      runtime_params = standard_runtime_samtools
  }
}
```
-
This forum supports issues and questions related to GATK tools and pipelines; external/personal instances of Cromwell aren't supported. Bioinformatics Stack Exchange and cromwellhq.slack.com are the best resources for those questions.
Best