Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data



Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 Does Not Finish or Create Stats; Stops Abruptly Without Errors

13 comments

  • SkyWarrior

    Hi Musaddiq Awan

    Can you confirm whether both bam files have the same sample name imprinted?

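A quick way to check this is to inspect the @RG header lines: the SM: field carries the sample name Mutect2 compares. With samtools you would normally run `samtools view -H your.bam | grep '^@RG'`; the sketch below just parses one example header line (the ID/SM values are hypothetical) to show where SM: lives.

```shell
# Parse the SM: (sample name) field out of a BAM @RG header line.
# The header line here is a made-up example; on real data, feed in
# the output of: samtools view -H your.bam | grep '^@RG'
rg_line="$(printf '@RG\tID:1\tSM:tumor\tPL:ILLUMINA')"
sample="$(printf '%s\n' "$rg_line" | tr '\t' '\n' | sed -n 's/^SM://p')"
echo "$sample"    # tumor
```
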
  • Musaddiq Awan

    No, one sample ID is 1 and the other is 2. I added the read groups myself.

  • Louis Bergelson

    This sort of sudden failure is often caused by running out of memory; it can result in the process being killed without any message. You also need to leave some memory for non-heap use.

    How much memory is available on your machine / container?

    You should always set -Xmx somewhat smaller than your available memory, in order to leave space for off-heap memory (used by native code called from Java, as well as your OS and other running software).

    For instance, if you have 32 GB available in your container, I would probably specify only -Xmx28G or so, to leave some available for overhead.

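As a rough sketch of that rule of thumb (the fixed 4 GB overhead figure is an assumption for illustration, not an official GATK recommendation):

```shell
# Suggest an -Xmx value by subtracting a fixed headroom from the
# memory actually available to the container/machine.
# suggest_xmx is a hypothetical helper, not part of GATK.
suggest_xmx() {
  total_gb=$1
  overhead_gb=4              # headroom for native code, OS, other processes
  echo $(( total_gb - overhead_gb ))
}

xmx_gb="$(suggest_xmx 32)"   # 32 GB container -> 28
echo "-Xmx${xmx_gb}G"        # -Xmx28G
```
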
  • gabriele tosadori

    Louis Bergelson, I am trying to make Mutect2 work. I am running a somatic mutation analysis with tumor and normal genomes (see the first post here). I changed the Mutect2 call to:

    ./gatk --java-options "-Xmx96G" Mutect2 \

    However, I am having problems with Java heap memory. I allocated 64 GB to Java, and the machine I am using has 78 GB of RAM plus swap, which, if I understood correctly, should total about 160 GB of available memory. And yet Mutect2 failed because of heap space. I have now allocated 96 GB of heap space, but I feel this may be more a bug than a memory problem.

    Do you have any clue regarding how much memory I may need?

  • Gökalp Çelik

    Hi gabriele tosadori

    The Java VM does not use swap as you might expect, so you should normally set your heap somewhere between 4 and 8 GB, unless you really need more memory to run. Setting the heap size close to the total amount of RAM is not good practice, as it leaves almost no memory for the rest of the system if the heap fills with lots of objects.

    I hope this helps

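For reference, a tumor-normal Mutect2 call with a modest heap along those lines might look like the following; all file names and the normal sample name are placeholders, not files from this thread:

```shell
# Hypothetical tumor/normal Mutect2 invocation with an 8 GB heap.
# ref.fa, tumor.bam, normal.bam, normal_sample and the output path
# are placeholders for your own files and sample names.
./gatk --java-options "-Xmx8G" Mutect2 \
    -R ref.fa \
    -I tumor.bam \
    -I normal.bam \
    -normal normal_sample \
    -O somatic.vcf.gz
```
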
  • gabriele tosadori

    Gökalp Çelik, I am afraid it doesn't help much. If I set it to 4 GB it blows up in no time; even with 32 GB it is unable to finish. I also tried 64 GB, same problem. The machine I am working with has 78 GB of RAM, so I really don't know what the problem is. Maybe a wrong version of Java?

  • gabriele tosadori

    Gökalp Çelik, fun fact: even if I use the GATK container provided by GATK itself, it is not working. I had the same error again:

    > [March 16, 2024 8:48:47 AM UTC] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 976.60 minutes. Runtime.totalMemory()=17783324672 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space...

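The Runtime.totalMemory() figure in that log line can be converted to GiB to sanity-check how much heap the JVM actually had; a small sketch:

```shell
# Convert the Runtime.totalMemory() value from the log above into GiB
# (integer division, so this floors to whole GiB).
bytes=17783324672
gib=$(( bytes / 1024 / 1024 / 1024 ))
echo "${gib} GiB"    # 16 GiB
```
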
  • Gökalp Çelik

    Are you trying to run this command on a managed instance in a cluster, or on a local machine? Runtime.totalMemory() shows around 16 GB of memory. This seems like a resource and permissions issue rather than a Mutect2 bug.

    Can you run the following command and send the output?

    free -h 
  • gabriele tosadori

    I am running it on a local machine. This is the output you requested:

                         total        used        free        shared    buff/cache   available
    Mem:            78Gi        20Gi        14Gi       142Mi        44Gi        57Gi
    Swap:          179Gi        15Gi       164Gi

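Note that the "available" column, not the total, is the realistic upper bound for -Xmx. Parsing it out of the posted output:

```shell
# Extract the "available" figure from the free -h output above;
# that, minus some headroom, is roughly what -Xmx can safely use.
free_output='              total        used        free      shared  buff/cache   available
Mem:           78Gi        20Gi        14Gi       142Mi        44Gi        57Gi
Swap:         179Gi        15Gi       164Gi'
available="$(printf '%s\n' "$free_output" | awk '/^Mem:/ {print $7}')"
echo "$available"    # 57Gi
```
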
  • Louis Bergelson

    gabriele tosadori

    Based on the free stats you posted, it looks like you're allocating more memory to Java than you have available. I believe "available" is the amount of memory you are able to allocate at the moment. I would never intentionally rely on swap space for memory, as it is catastrophically slow. (Maybe you can get away with it in some cases, but in my experience, as soon as you get into swap you basically grind to a halt.) Swap is useful for offloading things that are not being actively used when you need a lot of memory, but it's really not a good idea to treat it like RAM.

    Even so, it's weird that you're hitting memory issues with 32 GB. Mutect2 usually runs with much less than that, as far as I understand.

    I noticed in the stack trace you posted that it seems to run out of memory while loading the sequence dictionary from your reference. It's possible that's a red herring, but it's a bit strange. Can you tell me about your reference file? Is it a standard human reference? Is it another organism? Does it have a very large number of contigs? I'm trying to figure out what's special about your data that makes it use so much memory.

    One common cause of memory issues in our tools is sites that have a large number of different alleles, or organisms with very high ploidy. Some of the calculations scale super-linearly with the number of alleles / the ploidy number.

  • gabriele tosadori

    Louis Bergelson, the organism is actually the hamster (CHO cells). I don't think it is a complex genome, but I may be wrong. Not sure if it is connected, but the files provided by NCBI have some issues: for instance, I am trying to build a custom database using SnpEff, and nothing I tried works, even though I am just using the base files downloaded from NCBI. Maybe Mutect2 gets lost within NCBI's problematic annotations?

    I tried assembling with the Ensembl version of the genome too, though, and the problem still stands.

    It is also probably worth mentioning that I tried several configurations for the Java options: skipping the --java-options parameter entirely, setting it at 4 GB, at 8 GB, and up to 64 GB. Nothing worked.

  • Louis Bergelson

    I would expect hamsters to have pretty standard mammalian diploid genomes. It looks like your references have a lot of contigs; 250k/100k are many more than we usually use with a human genome. It's possible that that's causing unexpected memory pressure.

    Is it possible to find a higher-quality assembly with fewer, longer contigs? This paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6045439/) discusses some different Chinese hamster genome versions, so maybe there is something out there. That's definitely a long shot and a huge inconvenience, though; if you're really desperate, it's one thing to try.

    In the meantime I'm looking into possible memory leaks, but I'm not sure if or when I'll find something.

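One way to check the contig count Louis mentions is to count lines in the reference's faidx index (one contig per line). The tiny index below is fabricated just to show the mechanics; on a real reference you would first run `samtools faidx ref.fa`:

```shell
# Count contigs via the .fai index: one line per contig.
# This miniature index is made up for illustration only.
printf 'chr1\t1000\t6\t60\t61\nscaffold_1\t800\t1100\t60\t61\nscaffold_2\t500\t2000\t60\t61\n' > /tmp/ref.fa.fai
n_contigs="$(awk 'END{print NR}' /tmp/ref.fa.fai)"
echo "$n_contigs"    # 3
```
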
  • gabriele tosadori

    Louis Bergelson, thanks for your replies and for the reference. However, I am not sure whether we can actually use that genome, or others that are available, for this task. I am working with CHO-K1, and I guess that if I am to make a comparison, I should use the genome that best represents the cell type I am working with.

