Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Help with WDL workflow for Mutect2?

Answered

5 comments

  • Beri

    In the WDL script, the output block for the M2 task specifies two variables:
    String tumor_sample = read_string("tumor_name.txt")
    String normal_sample = read_string("normal_name.txt")

    The read_string() function should be reading the contents of those files; perhaps the files are empty? Try checking the log.stderr and log.stdout files to confirm that the M2 task ran without any problems.
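A quick way to rule out empty files across all scattered shards is a small script along these lines. This is only a sketch: the two file names come from the output block quoted above, while the function name `find_bad_name_files` and the assumption that each shard sits in a `shard-<N>` directory under the call directory are illustrative. The files are searched for recursively, so the exact sub-directory layout inside each shard does not matter.

```python
from pathlib import Path

# File names taken from the WDL output block quoted above.
NAME_FILES = ("tumor_name.txt", "normal_name.txt")


def find_bad_name_files(call_root):
    """Return (shard, file name, problem) tuples for missing or empty name files.

    Assumption: every scattered shard lives in a directory named shard-<N>
    directly under call_root. Each name file is looked up recursively
    inside its shard directory.
    """
    problems = []
    shards = sorted(p for p in Path(call_root).glob("shard-*") if p.is_dir())
    for shard in shards:
        for name in NAME_FILES:
            matches = list(shard.rglob(name))
            if not matches:
                problems.append((shard.name, name, "missing"))
            elif any(not m.read_text().strip() for m in matches):
                # Whitespace-only content counts as empty here.
                problems.append((shard.name, name, "empty"))
    return problems
```

Calling `find_bad_name_files(...)` on the call-M2 directory of the run and getting back an empty list would suggest the name files themselves are not the problem.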

  • Beri

    Also, the repo you're pointing to is archived; it's best to use the latest version of the workflow found here.

  • Alex Lee

    Beri, thanks. What is strange is that each subdirectory looks fine. There are no errors, and both tumor_name.txt and normal_name.txt contain text. Each subdirectory has an output VCF. Stdout looks fine:

    Tool returned:
    SUCCESS
    Tool returned:
    SUCCESS
    Tool returned:
    SUCCESS

    Moreover, the stderr shows no errors; the last entry was a command:

    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3000m -jar /root/gatk.jar GetPileupSummaries -R /cromwell-executions/Mutect2/40fdbfbc-bc9c-4bdf-be88-3630342aaca1/call-M2/shard-40/inputs/-1275058761/Homo_sapiens_assembly19.fasta -I /cromwell-executions/Mutect2/40fdbfbc-bc9c-4bdf-be88-3630342aaca1/call-M2/shard-40/inputs/2119282986/HM5086F_A.b37.bam --interval-set-rule INTERSECTION -L /cromwell-executions/Mutect2/40fdbfbc-bc9c-4bdf-be88-3630342aaca1/call-M2/shard-40/inputs/1020858285/0040-scattered.interval_list -V /cromwell-executions/Mutect2/40fdbfbc-bc9c-4bdf-be88-3630342aaca1/call-M2/shard-40/inputs/-1275058761/small_exac_common_3.vcf -L /cromwell-executions/Mutect2/40fdbfbc-bc9c-4bdf-be88-3630342aaca1/call-M2/shard-40/inputs/-1275058761/small_exac_common_3.vcf -O normal-pileups.table

     

     

  • Beri

    So if there aren't any errors in the stderr and GATK says it was successful, chances are it's related to Cromwell/WDL.

    The WDL documentation has some info mentioning possible issues when using read_string():

    If the entire contents of the file can not be read for any reason, the calling task or workflow will be considered to have failed. Examples of failure include but are not limited to not having access to the file, resource limitations (e.g. memory) when reading the file, and implementation imposed file size limits.
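To make those failure modes concrete, here is a rough Python analogue of what read_string() is described as doing. It is not Cromwell's implementation: the function name `read_string_like` and the `max_bytes` default are hypothetical, and the trailing-newline stripping reflects my reading of the WDL spec rather than anything confirmed in this thread.

```python
from pathlib import Path


def read_string_like(path, max_bytes=128_000):
    """Read a whole file as text and strip trailing line endings.

    max_bytes is a hypothetical stand-in for an engine-imposed size limit.
    Any failure raises, loosely mirroring "the calling task or workflow
    will be considered to have failed".
    """
    p = Path(path)
    size = p.stat().st_size  # raises OSError (e.g. FileNotFoundError) if inaccessible
    if size > max_bytes:
        raise ValueError(f"{p} is {size} bytes, over the {max_bytes}-byte limit")
    # Only trailing \r and \n are removed; other whitespace is kept.
    return p.read_text().rstrip("\r\n")
```

Note that under these assumptions an empty file is not a failure: it simply yields an empty string, which would then be passed downstream as the sample name.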

  • Alex Lee

    Beri, thanks. What is bizarre is that even if I hard-code the file names, which just gets rid of read_string(), there are other errors. I've pretty much given up on this, since it would have been faster had I just built this from scratch!
