Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GATK-gCNV for WES is taking too long on the server (138 WES samples running for >20 days)

  • Gökalp Çelik

    Hi S

    We do not support Singularity for container execution. That said, each GermlineCNVCaller instance is designed to use every available thread unless the execution engine limits it, so I suggest you check whether that is happening here. We usually run our instances with as few cores as possible to avoid overloading the virtual machines. In my experience, 4 to 8 cores are more than enough for a single GermlineCNVCaller instance.
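One way to apply the core limit described above, as a minimal Python sketch: build an environment with explicit thread caps before launching the tool. `OMP_NUM_THREADS` and `MKL_NUM_THREADS` are the standard OpenMP and Intel MKL controls; that they govern the numerical backend of a particular gCNV install is an assumption, and `thread_limited_env` is a hypothetical helper name.

```python
import os

def thread_limited_env(num_threads, base_env=None):
    """Return a copy of the environment with numerical thread pools capped."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    # Standard OpenMP / Intel MKL thread controls
    # (assumed, not confirmed, to apply to the backend in use).
    env["OMP_NUM_THREADS"] = str(num_threads)
    env["MKL_NUM_THREADS"] = str(num_threads)
    return env

env = thread_limited_env(8, base_env={"PATH": "/usr/bin"})
print(env["OMP_NUM_THREADS"], env["MKL_NUM_THREADS"])
```

The returned dictionary can be passed as `env=` to `subprocess.run` when starting the job, so the cap applies only to that process.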

    There may also be other reasons for the low performance:

    1- GermlineCNVCaller's memory use grows with the number of samples × the number of targets. You may wish to reduce the number of targets so that each task needs less memory.

    2- I/O performance matters because intermediate files are written to the temporary folder during the run. We recommend allocating a large temporary space for Theano/PyTorch compilation.
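As a rough pre-flight check for the second point, the sketch below reports whether the temporary directory has a given amount of free space. The 50 GiB threshold is an arbitrary illustrative value, not a GATK recommendation, and `has_free_space` is a hypothetical helper.

```python
import shutil
import tempfile

def has_free_space(path, required_gib):
    """True if the filesystem holding `path` has at least `required_gib` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * 1024**3

# tempfile.gettempdir() honours TMPDIR, so this checks the directory
# Python-based tools would use by default.
tmp_dir = tempfile.gettempdir()
print(tmp_dir, has_free_space(tmp_dir, 50))
```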

    I hope this helps.

    Regards. 

  • S

    Thank you very much, Gökalp Çelik.

    Based on your suggestion, I changed the IntervalListTools flag --SCATTER_CONTENT from 30000 to 10000, and that made a drastic difference. A single scatter shard that had been running for 20 days completed in a few hours.
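The effect of that change on per-shard problem size can be sketched with simple arithmetic. This assumes the shards are split by interval count, and the total of 200,000 intervals is a hypothetical figure, not taken from this thread; only the 138 samples and the two scatter sizes come from the posts above.

```python
import math

def shard_plan(total_intervals, intervals_per_shard, num_samples):
    """Return (number of shards, samples x intervals cells in a full shard)."""
    num_shards = math.ceil(total_intervals / intervals_per_shard)
    cells_per_shard = num_samples * min(intervals_per_shard, total_intervals)
    return num_shards, cells_per_shard

TOTAL_INTERVALS = 200_000  # hypothetical exome interval count
for scatter_content in (30_000, 10_000):
    shards, cells = shard_plan(TOTAL_INTERVALS, scatter_content, num_samples=138)
    print(f"SCATTER_CONTENT={scatter_content}: {shards} shards, {cells:,} cells per shard")
```

Under these assumptions, the smaller setting produces more shards, each covering a third as many samples × intervals cells, which fits the earlier advice to reduce the number of targets per task.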

