Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GenomicsDBImport Memory and Disks

1 comment

  • Gökalp Çelik

    Hi Noah Fields,

    We do have suggestions about how much disk space is needed for the number of samples you might have; in summary, it scales linearly with sample count.

    https://gatk.broadinstitute.org/hc/en-us/articles/360056138571-GenomicsDBImport-usage-and-performance-guidelines 
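    Since disk usage scales roughly linearly with sample count, one practical approach is to run a small pilot import, measure the resulting workspace size, and extrapolate. The sketch below illustrates this; the pilot numbers in the example are hypothetical placeholders, not measurements from this thread:

    ```python
    def extrapolate_disk_gb(pilot_samples: int, pilot_disk_gb: float,
                            target_samples: int) -> float:
        """Extrapolate GenomicsDB workspace disk usage from a pilot import,
        assuming usage grows linearly with the number of samples."""
        per_sample_gb = pilot_disk_gb / pilot_samples
        return per_sample_gb * target_samples

    # Hypothetical pilot: a 10-sample workspace occupied 5 GB on disk.
    print(extrapolate_disk_gb(10, 5.0, 1000))  # -> 500.0
    ```

    Leave some extra headroom on top of the estimate, since per-sample size varies with variant density across the genome.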

    For the memory requirements, a machine with 32GB of memory should generally be enough for the default parameters of GenomicsDBImport (batch size, number of reader threads, etc.).

    Our default recommendation for the workflow in WARP is set as below:

    --java-options "-Xms8000M -Xmx25000M"

    The rest of the machine's memory should be left for the native code running outside of the Java heap.
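    For reference, a full invocation using those Java options might look like the sketch below. The workspace path, interval, sample map, and batch size are illustrative placeholders, not values taken from this thread:

    ```shell
    # Import per-sample GVCFs into a GenomicsDB workspace.
    # -Xmx is kept well below total machine memory (e.g. 25 GB of a 32 GB VM)
    # so the native TileDB/GenomicsDB code has headroom outside the Java heap.
    gatk --java-options "-Xms8000M -Xmx25000M" GenomicsDBImport \
        --genomicsdb-workspace-path /path/to/my_workspace \
        --sample-name-map cohort.sample_map \
        -L chr20 \
        --batch-size 50
    ```

    Batching (here, 50 samples at a time) bounds how many GVCF readers are open at once, which is the main lever if memory becomes tight.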

    Memory requirements for the import function depend mostly not on the number of samples you have but on the number of alleles present, and they are not a linear function of allele count per se. If you observe OOM errors with the default settings, you may want to decrease the batch size, or increase the heap size and use a machine or VM with a larger amount of memory.

    I hope this helps. 

    Regards. 

