Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Help with running GATK RNAseq workflow with docker backend and multiple cpus?

  • Genevieve Brandt (she/her)

    Hi ThyagoLC,

    How you configure your WDL depends on where you are running it. If you are running on your local machine, the Docker containers will have access to the available CPU and memory.

    You can edit the commands in each task of the workflow to use a specific number of CPUs. For example, StarGenerateReferences (the task where STAR is called) lets you specify the number of CPUs with --runThreadN ${threads}. Some tools do not have an option to specify CPUs, but you can adjust the memory for GATK commands with --java-options and the Xmx parameter. Here is an article covering resource specification with GATK: https://gatk.broadinstitute.org/hc/en-us/articles/360035532372-Java-is-using-too-many-resources-threads-memory-or-CPU-

    Hope this helps! 

    Best,

    Genevieve
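
As a rough illustration of the two settings described in this reply, the sketch below assembles a STAR command with --runThreadN and a GATK command with --java-options and -Xmx. The thread count, heap size, file names, and the choice of MarkDuplicates are placeholders rather than values taken from the workflow, and the commands are only printed, not run.

```shell
#!/usr/bin/env bash
# All values and file names below are hypothetical placeholders.
threads=4        # CPUs to hand to STAR
java_mem_gb=8    # upper bound on the JVM heap for a GATK tool

# STAR accepts a thread count directly through --runThreadN.
star_cmd="STAR --runMode genomeGenerate --genomeDir star_index --genomeFastaFiles ref.fasta --runThreadN ${threads}"

# GATK tools receive JVM settings through --java-options; -Xmx caps the heap.
gatk_cmd="gatk --java-options \"-Xmx${java_mem_gb}g\" MarkDuplicates -I input.bam -O dedup.bam -M metrics.txt"

# Print the commands instead of running them, so the sketch works
# on a machine without STAR or GATK installed.
echo "${star_cmd}"
echo "${gatk_cmd}"
```

In a WDL task, the same values would normally come from task inputs instead of being hard-coded.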

  • ThyagoLC

    Genevieve Brandt (she/her), Thanks!

    Got it. So the problem is that, other than STAR, the Picard tools don't support multi-threading, so I can't pass multiple CPUs in the runtime section as I can for STAR, right?

    So, for these tools without a CPU option, I can only use the Java option -XX:ConcGCThreads, set it to the maximum I have, and Docker will automatically use those.

    One last question: when HaplotypeCaller runs on scattered intervals, does each interval run on multiple CPUs? Will I gain speed by increasing scatter_count from the default of 6?

    Thank you.

  • Genevieve Brandt (she/her)

    No problem!

    Your first two assumptions are correct, yes. 

    For your question, each HaplotypeCaller scatter will use a thread. Increasing the scatter count helps with speed because multiple shards run at the same time. However, once the scatter count exceeds the available threads, the shards compete for CPU resources, so the CPU becomes a bottleneck and could slow down the workflow.

    Best,

    Genevieve
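
One way to act on this advice is to keep the scatter count at or below the number of CPUs on the machine. The sketch below is a simple heuristic under that assumption, not a GATK recommendation; the function name cap_scatter_count is made up for illustration, and 6 is the default scatter_count mentioned earlier in the thread.

```shell
#!/usr/bin/env bash
# Heuristic sketch (an assumption, not a GATK rule): cap the scatter count
# at the number of CPUs so that shards do not compete for cores.

# Hypothetical helper: print the smaller of the requested count and the CPU count.
cap_scatter_count() {
  local requested=$1 cpus=$2
  if [ "$requested" -gt "$cpus" ]; then
    echo "$cpus"
  else
    echo "$requested"
  fi
}

# getconf reports the number of processors currently online.
available_cpus=$(getconf _NPROCESSORS_ONLN)
requested_scatter=6   # default scatter_count mentioned in the thread

echo "scatter_count=$(cap_scatter_count "$requested_scatter" "$available_cpus")"
```

On a 4-CPU machine this caps the default of 6 at 4; on a 16-CPU machine it leaves the requested value unchanged.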

