Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Cannot parallelize GATK

4 comments

  • Anthony DiCi

    Hi Shaun Clare,

    Thank you for writing to the GATK forum! I hope that we can help you sort this out.

    You are correct that another filename seems to be appended to the current one: the %09%20%20 is a URL-encoded TAB character followed by two spaces, sitting between file:///nfs4/HORT/Bassil_Lab/HopSex/AAC5HWMM5_21538.sorted.bam and AAC7WMGM5_W1130-068.sorted.bam. The only explanation I can see is that the command is receiving a malformed input string, which would mean that the problem is not GATK itself but that something in your ls | parallel … pipe isn't working as you expect. Can you please verify that GATK gets exactly one file path per call?

    Thank you! I look forward to your reply.

    Best,

    Anthony
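
One way to run the check suggested above, as a minimal sketch: %09 is the percent-encoding of TAB (ASCII 9) and %20 of a space (ASCII 32), so a value containing a TAB is a strong hint that two names were glued into one argument. The helper name is_single_path is hypothetical, and the test strings only reuse the two file names quoted in the reply; none of this comes from the original command.

```shell
# Hypothetical helper: succeed only if the argument looks like a single path,
# i.e. it contains neither a TAB nor a newline.
is_single_path() {
    tab=$(printf '\t')
    nl='
'
    case "$1" in
        *"$tab"*|*"$nl"*) return 1 ;;
        *) return 0 ;;
    esac
}

# A malformed value rebuilt by hand: two names joined by TAB + two spaces.
bad="$(printf 'AAC5HWMM5_21538.sorted.bam\t  AAC7WMGM5_W1130-068.sorted.bam')"
good='AAC5HWMM5_21538.sorted.bam'

is_single_path "$good" && echo "ok: single path"
is_single_path "$bad" || echo "malformed: more than one name in one argument"
```

Running each input line through a check like this before it reaches GATK would show whether the list itself is at fault.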

  • kvn95ss

    You can try writing the BAM list into a file, then use parallel -a to read that list as input.

    For example:

    ls *.sorted.bam > input_list # Works, but not the best way to be frank
    find "$PWD" -name "*.sorted.bam" > input_list # Recommended, also gives full path of bam files
    parallel -a input_list ~~~ rest of the command ~~~

    We've also noticed that parallel can sometimes get fussy when using multiple repeating {} elements.

    You could make a function for HaplotypeCaller, export it, and then call the function with parallel.

    # Assuming bash
    haplo_call () {
        input="$1"
        output="${input/.sorted.bam/.gvcf.gz}"
        gatk HaplotypeCaller ~~~ your commands~~~
    }
    export -f haplo_call
    parallel -a input_list haplo_call {1}
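
To see what such a function would hand to GATK before launching any real job, a dry-run variant can help. The sketch below is an assumption-laden illustration: it only echoes a command, the -I and -O arguments stand in for whatever options the real call needs (a reference and other settings are omitted), and the sample path is made up. It strips the suffix with ${input%.sorted.bam}, which only touches the end of the name.

```shell
# Hypothetical dry-run version of haplo_call: prints the command instead of running GATK.
haplo_call() {
    input="$1"
    # Remove the known suffix from the end of the name, then add the new extension.
    output="${input%.sorted.bam}.gvcf.gz"
    # -I and -O are placeholders for the full set of options the real call needs.
    echo gatk HaplotypeCaller -I "$input" -O "$output"
}

# Made-up path, used only to show the name derivation.
haplo_call /data/sample1.sorted.bam
# prints: gatk HaplotypeCaller -I /data/sample1.sorted.bam -O /data/sample1.gvcf.gz
```

In bash, the same function could then be exported with export -f and passed to parallel as in the comment above, once the echo is replaced by the real command.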
  • Shaun Clare

    I managed to fix it by modifying ls to write as one column using:

    ls -1 *.sorted.bam | <rest of command>

    I'm not sure why it worked in the first place, or maybe I accidentally deleted part of my code at some point. Thank you for your replies. I mainly wanted to do it this way to avoid writing intermediate files.
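
For readers who also want to avoid an intermediate file, one alternative to ls is to let the shell expand the glob and have printf emit one name per line. This is only a sketch of that idea, not the fix used in the thread; the directory and the two empty stand-in files are hypothetical.

```shell
# Work in a throwaway directory with two empty stand-in files (hypothetical names).
start=$PWD
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch sampleA.sorted.bam sampleB.sorted.bam

# printf repeats its format for every argument, so each name lands on its own line.
list=$(printf '%s\n' *.sorted.bam)
count=$(printf '%s\n' *.sorted.bam | wc -l)

printf '%s\n' "$list"
echo "names emitted: $((count))"

# Clean up the throwaway directory.
cd "$start" || exit 1
rm -rf "$workdir"
```

The printf line could be piped into the rest of the command in the same position where ls -1 is used above.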

     

  • Anthony DiCi

    Hi Shaun Clare,

    I'm glad that we were able to help solve this issue collectively! Thank you for being a valued contributor to the GATK community.

    Please do not hesitate to reach out with any other questions/issues in the future!

    kvn95ss, thank you for your contribution to the GATK forum. We greatly value collaboration among members of the GATK community.

    Best,
    Anthony
