Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK GenotypeGVCFs stuck at starting traversal

1 comment

  • David Roazen

    Hi Rômulo Carleial,

    When running this tool with a GenomicsDB database as input, it's important to specify appropriate memory limits for Java in order to leave sufficient free memory for GenomicsDB, which is a native library. Failing to do so can cause significant slowdowns. As an example, if each parallel task has 32 GB of physical memory available, you might try limiting Java to 16 GB so as to leave 16 GB free for GenomicsDB to use. You can limit Java memory usage using the -Xmx argument, which you can pass to GATK like so:

    gatk --java-options "-Xmx16G" ...<rest of command>
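
The arithmetic behind that example can be sketched in shell. This is only an illustration: the helper name `java_heap_gb` and the 50/50 split are assumptions taken from the 32 GB example above, not a GATK rule, and the command is printed rather than run.

```shell
# Hypothetical helper: give Java half of the task's physical memory (in GB),
# leaving the other half free for GenomicsDB's native allocations.
java_heap_gb() {
  task_mem_gb=$1
  echo $(( task_mem_gb / 2 ))
}

TASK_MEM_GB=32                           # assumed memory available to one task
HEAP_GB=$(java_heap_gb "$TASK_MEM_GB")
JAVA_OPTS="-Xmx${HEAP_GB}G"

# Print the command prefix instead of executing it.
echo "gatk --java-options \"$JAVA_OPTS\" ...<rest of command>"
```

If your scheduler gives each task a different amount of memory, change `TASK_MEM_GB` to match; the right split may also differ for your data.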


    Another argument that can help in a cluster environment is --genomicsdb-shared-posixfs-optimizations, which improves GenomicsDB performance on shared POSIX filesystems such as NFS and Lustre, which are commonly used in compute clusters.
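
As a rough sketch, a GenotypeGVCFs command with both the memory limit and this argument might be assembled like this. The reference, workspace, and output names are placeholders, not values from your run, and the command is printed rather than executed.

```shell
# Placeholder inputs: substitute your own reference, GenomicsDB workspace, and output.
REF="reference.fasta"
WORKSPACE="gendb://my_database"
OUT="output.vcf.gz"

CMD="gatk --java-options \"-Xmx16G\" GenotypeGVCFs -R $REF -V $WORKSPACE -O $OUT --genomicsdb-shared-posixfs-optimizations"

# Dry run: show the command so it can be checked before submitting to the cluster.
echo "$CMD"
```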

    Lastly, if these suggestions don't help, you can try splitting up the two problematic intervals into smaller sub-intervals.
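
One way to produce the sub-intervals is a small shell loop like the one below. The function name `split_interval`, the contig `chr1`, and the coordinates are made up for illustration; use the contig and bounds of your own problematic intervals and pick a chunk size that suits your data.

```shell
# Hypothetical helper: print fixed-size sub-intervals in contig:start-end form
# (1-based, inclusive), covering start..end with no gaps or overlaps.
split_interval() {
  contig=$1; start=$2; end=$3; size=$4
  s=$start
  while [ "$s" -le "$end" ]; do
    e=$(( s + size - 1 ))
    # Clamp the last chunk to the end of the original interval.
    if [ "$e" -gt "$end" ]; then e=$end; fi
    echo "${contig}:${s}-${e}"
    s=$(( e + 1 ))
  done
}

# Example with made-up coordinates: four chunks of 250,000 bases each.
split_interval chr1 1 1000000 250000
```

Each printed line can then be passed to a separate run with -L.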

    Regards,

    David
