GenomicsDBImport Memory and Disks
Hi GATK Team,
I'm curious whether there are ballpark figures for how much memory and disk space should be allocated when running GenomicsDBImport for X GVCFs. Do memory and/or disk requirements scale linearly? And how does this depend on which chromosome interval we are looking at?
Specifically, I'm looking at the ImportGVCFs task in this WDL: https://github.com/broadinstitute/warp/blob/develop/pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.wdl . Let me know if you have any suggestions for optimizing disk and memory for this task. Thanks!
-
Hi Noah Fields,
We do have suggestions for how much disk space is needed based on the number of samples you have, and in summary it scales linearly with the number of samples.
As for memory, a machine with 32 GB should generally be enough when running GenomicsDBImport with its default parameters (batch size, number of reader threads, etc.).
Our default recommendation for the warp workflow is set as below:
--java-options "-Xms8000M -Xmx25000M"
The rest of the memory should be left for the native code that runs outside of the Java heap.
The memory requirements of the import step depend mostly not on the number of samples but on the number of alleles present, and they are not a linear function of the allele count per se. If you see out-of-memory (OOM) errors with the default settings, you may want to decrease the batch size, or increase the heap size and move to a machine or VM with more memory.
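To make that concrete, here is a minimal sketch of a GenomicsDBImport invocation that combines the recommended Java options with the tuning knobs mentioned above. The workspace path, sample-name map, interval, and the specific --batch-size and --reader-threads values are illustrative placeholders rather than required settings:

gatk --java-options "-Xms8000M -Xmx25000M" \
    GenomicsDBImport \
    --genomicsdb-workspace-path /path/to/genomicsdb_workspace \
    --sample-name-map sample_name_map.tsv \
    -L chr20 \
    --batch-size 50 \
    --reader-threads 5
# --batch-size caps how many samples are read into memory at a time; lowering it reduces peak memory use.
# --reader-threads adds parallel readers, each of which consumes additional native (off-heap) memory.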
I hope this helps.
Regards.