Help with WDL workflow for Mutect2?
Hi, this is concerning the workflow: https://github.com/gatk-workflows/gatk4-somatic-snvs-indels
I'm not sure where to ask, but I thought I'd try here first. I'm trying to run the example in the JSON locally. All files have been downloaded locally and I'm using cromwell-49.jar. However, the last merging step seems to fail with an error I cannot decipher, and I'm wondering if someone can help or point me in the right direction.
The error is this:
cromwell.backend.standard.StandardAsyncExecutionActor$$anon$2: Failed to evaluate job outputs:
Bad output 'M2.tumor_sample': Failed to read_string("tumor_name.txt") (reason 1 of 1): Futures timed out after [60 seconds]
Bad output 'M2.normal_sample': Failed to read_string("normal_name.txt") (reason 1 of 1): Futures timed out after [60 seconds]
It's strange, because "tumor_name.txt" is literally a string, and in each shard subdirectory I can clearly see that these files exist. Does anyone know what is going on here? Thanks.
In the WDL script the output block for the M2 task specifies two variables
String tumor_sample = read_string("tumor_name.txt")
String normal_sample = read_string("normal_name.txt")
The read_string() function should be reading the contents of those files; perhaps the files are empty? Try checking the log.stderr and log.stdout files to confirm that the M2 task ran without any problems.
Also, the repo you're pointing to is archived; it's best to use the latest version of the workflow found here.
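To rule out empty or missing name files across all scattered shards at once, a small script can help. This is only a sketch: it assumes the usual local-backend layout of call-M2/shard-N/execution/ under cromwell-executions, and the function name check_name_files is made up for illustration. If your run used retries or a different backend, the directory pattern may need adjusting.

```python
from pathlib import Path


def check_name_files(call_dir, names=("tumor_name.txt", "normal_name.txt")):
    """Return (path, problem) pairs for name files that are missing or empty.

    call_dir is assumed to be the call-M2 directory of one workflow run,
    containing shard-*/execution subdirectories (an assumption about the
    local Cromwell layout, not something guaranteed for every backend).
    """
    problems = []
    for exec_dir in sorted(Path(call_dir).glob("shard-*/execution")):
        for name in names:
            candidate = exec_dir / name
            if not candidate.is_file():
                problems.append((str(candidate), "missing"))
            elif not candidate.read_text().strip():
                # File exists but holds only whitespace or nothing at all.
                problems.append((str(candidate), "empty"))
    return problems
```

Calling check_name_files on the call-M2 directory and printing the result should give an empty list if every shard wrote both files with some content. In that case the files themselves are probably not the cause of the timeout.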
Beri, thanks. What is strange is that each subdirectory looks fine. There are no errors, and both tumor_name.txt and normal_name.txt contain text. Each subdirectory has an output VCF. Stdout looks fine.
Moreover, the stderr shows no errors; the last entry was a command:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3000m -jar /root/gatk.jar GetPileupSummaries -R /cromwell-executions/Mutect2/40fdbfbc-bc9c-4bdf-be88-3630342aaca1/call-M2/shard-40/inputs/-1275058761/Homo_sapiens_assembly19.fasta -I /cromwell-executions/Mutect2/40fdbfbc-bc9c-4bdf-be88-3630342aaca1/call-M2/shard-40/inputs/2119282986/HM5086F_A.b37.bam --interval-set-rule INTERSECTION -L /cromwell-executions/Mutect2/40fdbfbc-bc9c-4bdf-be88-3630342aaca1/call-M2/shard-40/inputs/1020858285/0040-scattered.interval_list -V /cromwell-executions/Mutect2/40fdbfbc-bc9c-4bdf-be88-3630342aaca1/call-M2/shard-40/inputs/-1275058761/small_exac_common_3.vcf -L /cromwell-executions/Mutect2/40fdbfbc-bc9c-4bdf-be88-3630342aaca1/call-M2/shard-40/inputs/-1275058761/small_exac_common_3.vcf -O normal-pileups.table
So if there aren't any errors in the stderr and GATK says it was successful, chances are the problem is related to Cromwell/WDL.
The WDL documentation has some information on possible issues when using read_string():
If the entire contents of the file cannot be read for any reason, the calling task or workflow will be considered to have failed. Examples of failure include but are not limited to not having access to the file, resource limitations (e.g. memory) when reading the file, and implementation imposed file size limits.
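The failure modes in that passage can be illustrated with a rough Python emulation of read_string(). This is not Cromwell's implementation. The max_bytes default is a placeholder, not a documented limit. Stripping trailing end-of-line characters reflects my reading of the WDL spec. Note also that the error in this thread is a timeout, which this sketch does not reproduce.

```python
from pathlib import Path


def read_string(path, max_bytes=128_000):
    """Rough emulation of WDL read_string(): whole file as one string.

    max_bytes is a hypothetical, implementation-imposed size limit used
    only for illustration.
    """
    p = Path(path)
    size = p.stat().st_size  # raises FileNotFoundError if the file is absent
    if size > max_bytes:
        raise ValueError(f"{p} is {size} bytes, over the {max_bytes}-byte limit")
    # Trailing end-of-line characters are dropped; other whitespace is kept.
    return p.read_text().rstrip("\r\n")
```

With this emulation, a missing file and an oversized file both raise an error rather than returning a partial value, which matches the "task is considered to have failed" behavior described above.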
Beri, thanks. What is bizarre is that even if I hard-code the file names, that is, just get rid of read_string(), there are other errors. I've pretty much given up on this, since it would have been faster had I just built this from scratch!