Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GenotypeGVCFs won't use more than 2G no matter how much memory I give it...

3 comments

  • Gökalp Çelik

    Hi Charity Z Goeckeritz,

    Does the process continue and import variants properly? Do you see any error messages stating that the process did not complete?

    GenomicsDBImport does not need much memory. In fact, the smaller the Java heap you give it, the better: the GenomicsDB library is written in C/C++ and allocates memory outside the Java heap, so it is not bound by the JVM's heap limit.

    I hope this helps.

    Regards. 

  • Charity Z Goeckeritz

    Hi Gökalp, 

    Thanks for your quick response! This issue is about GenotypeGVCFs; is that what you mean? Either way, my problem still stands. GenomicsDBImport was giving me the same issue that GenotypeGVCFs is giving me now: it did not use the resources given to it. However, I was able to get GenomicsDBImport to complete after giving it ~5 days of wall time. No such luck with GenotypeGVCFs, so I really do need to figure out how to get it to use its allocated memory. Otherwise it will probably need to run for more than a week, despite only genotyping 30ish samples per VCF against a reference genome of about 620 Mb. As far as I can tell, the log doesn't show any errors or issues aside from the syntax issues posted in the opening comment. It just runs until it has no more wall time. If I could figure out why the program is not using the allocated memory, I could probably get it to finish much more quickly, which is why I opened the issue in the first place.

    There is one other WARN message, though. It just says 'WARN  InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples', which is odd (I have more than 10 individuals...), but I don't need this calculated anyway.

    Based on your comment about the Java VM, I tried leaving out --java-options "-Xmx32G" from my command, but GenotypeGVCFs still won't use more than 3.5G.

    Any other ideas on what might be going on would be greatly appreciated! Thanks so much.

    Kindly,
    Charity 


  • Gökalp Çelik

    Hi again. 

    Sorry for the late response. What is the ploidy of your samples? The import and genotyping steps take longer as the number of expected alleles per locus grows. If the whole process seems slow, we recommend dividing your calls into multiple shards, genotyping them in parallel, and then combining the results into a single call set. More memory won't make the process faster, but reducing the number of alleles per locus may.

    I hope this helps. 

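The sharding advice in the last reply can be sketched roughly as follows. This is a minimal illustration, not the GATK team's own workflow: it splits contigs into fixed-size intervals, builds one GenotypeGVCFs command per interval with a deliberately modest Java heap (in line with the earlier point that GenomicsDB allocates memory outside the heap), and builds a final GatherVcfs command to merge the shards. The file names, workspace name, contig lengths, 4g heap, and shard size are all hypothetical placeholders. The sketch also assumes the GenomicsDB workspace covers every interval being genotyped. Fixed-size cuts are arbitrary, so sharding on whole contigs or using GATK's SplitIntervals tool may be the safer choice in practice.

```python
from typing import Dict, List


def shard_intervals(contig_lengths: Dict[str, int], shard_size: int) -> List[str]:
    """Split each contig into 1-based, inclusive intervals of at most shard_size bases."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    intervals = []
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            end = min(start + shard_size - 1, length)
            intervals.append(f"{contig}:{start}-{end}")
            start = end + 1
    return intervals


def genotype_command(reference: str, workspace: str, interval: str,
                     output_vcf: str, heap: str = "4g") -> List[str]:
    """Build (but do not run) a GenotypeGVCFs command restricted to one interval.

    The small default heap is an assumption for illustration, not a tuned value.
    """
    return [
        "gatk", "--java-options", f"-Xmx{heap}", "GenotypeGVCFs",
        "-R", reference,
        "-V", f"gendb://{workspace}",
        "-L", interval,
        "-O", output_vcf,
    ]


def gather_command(shard_vcfs: List[str], output_vcf: str) -> List[str]:
    """Build a GatherVcfs command; shard VCFs must be listed in genomic order."""
    cmd = ["gatk", "GatherVcfs"]
    for vcf in shard_vcfs:
        cmd += ["-I", vcf]
    cmd += ["-O", output_vcf]
    return cmd


if __name__ == "__main__":
    # Hypothetical contig lengths; in practice these would come from the
    # reference's .fai index or sequence dictionary.
    contigs = {"chr1": 2_500_000, "chr2": 1_000_000}
    shard_vcfs = []
    for i, interval in enumerate(shard_intervals(contigs, 1_000_000)):
        out = f"shard_{i:04d}.vcf.gz"
        shard_vcfs.append(out)
        print(" ".join(genotype_command("ref.fasta", "my_workspace", interval, out)))
    print(" ".join(gather_command(shard_vcfs, "all_samples.vcf.gz")))
```

Each printed GenotypeGVCFs command can be submitted as its own cluster job so the shards run side by side. Building the commands as lists rather than shell strings keeps them safe to pass to `subprocess.run` without quoting problems.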
