Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GenotypeGVCFs hangs at GenomicsDBLibLoader

  • Dr. Oppenheim

    UPDATE:

    After further testing, I have determined that GenotypeGVCFs works when I run it on a single-sample GVCF file or on a multi-sample GVCF generated with CombineGVCFs. But when I run it on the same samples combined with GenomicsDBImport, it stalls at the GenomicsDBLibLoader step.

    My SysAdmin reports that it gets stuck at

    futex(0x7f17840b8910, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 4009871, NULL, FUTEX_BITSET_MATCH_ANY
     
    It is still the case that using a GenomicsDB workspace created with GenomicsDBImport as the -V input to GenotypeGVCFs works fine on other servers.
     
    It is frustrating to have to change my protocol, as my intention was to spread my work across servers, carrying out the exact same procedures on all my data.

    So I am still soliciting advice on this.
     
    Thanks!
    Sara
  • Dr. Oppenheim

    Further update:

    Using the same dataset combined using CombineGVCFs, the GenotypeGVCFs job runs fine.

    Can anyone suggest an explanation?

  • David Roazen

    Hi Sara Oppenheim,

    I don't believe it's hanging during load of the GenomicsDB library -- the fact that the log message "GenomicsDBLibLoader - GenomicsDB native library version : 1.5.1-84e800e" is output indicates that the library was actually loaded successfully.

    More likely the issue has to do with memory. You are giving 160 GB of memory to Java via the "-Xmx160g" flag, but GenomicsDB is implemented in C/C++ and requires its own memory on top of the Java memory required by GATK. If your machine only has 160 GB of physical memory total, this means that you are leaving no memory for GenomicsDB.

    I'd recommend decreasing your "-Xmx/-Xms" values to give less memory to Java and more memory to GenomicsDB, and then see if the tool is able to run to completion.
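
    As a rough sketch of that adjustment: the helper below (a hypothetical name, suggest_xmx_gb) computes a Java heap size that leaves a share of physical memory outside the heap for native code such as GenomicsDB. The 25% reserve is an assumed starting point rather than a GATK recommendation, and the file paths are placeholders.

    ```shell
    # Hypothetical helper: choose a Java heap size that leaves headroom for native code.
    #   $1 = physical memory on the machine, in GB
    #   $2 = percentage to keep outside the Java heap (assumed default: 25)
    suggest_xmx_gb() {
        total_gb=$1
        native_pct=${2:-25}
        echo $(( total_gb * (100 - native_pct) / 100 ))
    }

    xmx=$(suggest_xmx_gb 160 25)

    # Print the command instead of running it; reference, workspace, and output
    # names are placeholders to replace with real paths.
    echo "gatk --java-options \"-Xmx${xmx}g\" GenotypeGVCFs -R reference.fasta -V gendb://my_database -O output.vcf.gz"
    ```

    If the job still stalls, lowering the heap further (a larger reserve percentage) is a reasonable next experiment.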

    Another thing you could try, if that doesn't work: while the Java process is running and appears to be hung, you can run the "jstack" command ("jstack <gatk_process_id>") to inspect what's going on inside the process and see where it's spending its time. 
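
    A minimal sketch of the jstack approach, assuming the JDK's jstack is on the PATH and is run as the same user that owns the GATK process. The helper name, dump count, and delay are illustrative choices. Taking several dumps a few seconds apart makes it easier to see whether the same threads stay blocked in the same place.

    ```shell
    # Hypothetical helper: save several thread dumps of a running JVM for comparison.
    #   $1 = process ID of the GATK Java process
    #   $2 = number of dumps to take (assumed default: 3)
    #   $3 = seconds to wait after each dump (assumed default: 10)
    dump_threads() {
        pid=$1
        count=${2:-3}
        delay=${3:-10}
        i=1
        while [ "$i" -le "$count" ]; do
            # Each dump goes to its own file, e.g. jstack_12345_1.txt
            jstack "$pid" > "jstack_${pid}_${i}.txt"
            sleep "$delay"
            i=$((i + 1))
        done
    }

    # Example call (left commented out; substitute the real process ID):
    # dump_threads 12345
    ```

    Threads that show the same stack in every dump are the likely place where the process is waiting.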

    Hope this helps,

    David

