Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data



CollectAllelicCounts- Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

Answered

12 comments

  • rahelp

    Hi Genevieve!

    Thank you for these links. Increasing memory helped, and now I can run it. However, the whole process is still extremely slow: it takes about 20 hours to complete one sample. How long does it usually run? Is there any way to make the process faster?

    Thank you for your help.

  • Genevieve Brandt (she/her)

    rahelp, while it is running, can you determine whether it starts out normally and then slows down at some point? If so, you may still be running out of memory: when that happens, the tool falls back on heavy disk read/write instead of keeping data in memory, which slows it down considerably.

    It can also really help to use a temporary directory on storage that is fast for reading and writing.
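    As a minimal sketch (the directory path here is hypothetical; point it at fast local storage such as an SSD scratch disk rather than network storage):

    ```shell
    # Create a scratch directory on fast local storage (path is hypothetical):
    mkdir -p /tmp/gatk_scratch

    # Then pass it to the tool, e.g.:
    #   gatk --java-options "-Xmx200g" CollectAllelicCounts ... --tmp-dir /tmp/gatk_scratch
    ```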

  • rahelp

    Dear Genevieve,

    I have tried creating a tmp folder, but it did not seem to help much. I am using the latest GATK version on Docker.

    gatk --java-options "-Xmx200g" CollectAllelicCounts \
        -L targets_C.preprocessed.interval_list \
        -I 3.bam \
        -R Homo_sapiens_assembly19.fasta \
        --tmp-dir tmp \
        -O 3.allelicCounts.tsv

    Using GATK jar /gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar

    Running:

        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx200g -jar /gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar CollectAllelicCounts -L targets_C.preprocessed.interval_list -I 3.bam -R Homo_sapiens_assembly19.fasta --tmp-dir tmp -O 3.allelicCounts.tsv

    08:11:22.249 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so

    08:11:22.428 INFO  CollectAllelicCounts - ------------------------------------------------------------

    08:11:22.428 INFO  CollectAllelicCounts - The Genome Analysis Toolkit (GATK) v4.1.9.0-SNAPSHOT

    08:11:22.428 INFO  CollectAllelicCounts - For support and documentation go to https://software.broadinstitute.org/gatk/

    08:11:22.429 INFO  CollectAllelicCounts - Executing as root@f64f5651979b on Linux v5.10.25-linuxkit amd64

    08:11:22.429 INFO  CollectAllelicCounts - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08

    08:11:22.429 INFO  CollectAllelicCounts - Start Date/Time: May 20, 2021 8:11:22 AM GMT

    08:11:22.429 INFO  CollectAllelicCounts - ------------------------------------------------------------

    08:11:22.429 INFO  CollectAllelicCounts - ------------------------------------------------------------

    08:11:22.430 INFO  CollectAllelicCounts - HTSJDK Version: 2.23.0

    08:11:22.430 INFO  CollectAllelicCounts - Picard Version: 2.23.3

    08:11:22.430 INFO  CollectAllelicCounts - HTSJDK Defaults.COMPRESSION_LEVEL : 2

    08:11:22.430 INFO  CollectAllelicCounts - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false

    08:11:22.430 INFO  CollectAllelicCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true

    08:11:22.430 INFO  CollectAllelicCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false

    08:11:22.430 INFO  CollectAllelicCounts - Deflater: IntelDeflater

    08:11:22.430 INFO  CollectAllelicCounts - Inflater: IntelInflater

    08:11:22.430 INFO  CollectAllelicCounts - GCS max retries/reopens: 20

    08:11:22.430 INFO  CollectAllelicCounts - Requester pays: disabled

    08:11:22.430 INFO  CollectAllelicCounts - Initializing engine

    08:11:22.672 INFO  FeatureManager - Using codec IntervalListCodec to read file file:///gatk/USZ_melanoma/reordered/targets_C.preprocessed.interval_list

    08:11:29.414 INFO  IntervalArgumentCollection - Processing 2865216360 bp from intervals

    08:11:29.419 INFO  CollectAllelicCounts - Done initializing engine

    08:11:29.425 INFO  CollectAllelicCounts - Collecting allelic counts...

    08:11:29.425 INFO  ProgressMeter - Starting traversal

    08:11:29.425 INFO  ProgressMeter -        Current Locus  Elapsed Minutes        Loci Processed      Loci/Minute

    08:11:29.690 WARN  AllelicCountCollector - The reference position at 1:177418-177418 has an unknown base call (value: N). Skipping...

    08:11:29.690 WARN  AllelicCountCollector - The reference position at 1:177419-177419 has an unknown base call (value: N). Skipping...

    08:11:29.690 WARN  AllelicCountCollector - The reference position at 1:177420-177420 has an unknown base call (value: N). Skipping...

    ———

    21:02:14.433 INFO  ProgressMeter -          14:68146000            770.8            2147261000        2785936.6

    21:02:14.696 INFO  CollectAllelicCounts - Shutting down engine

    [May 20, 2021 9:02:14 PM GMT] org.broadinstitute.hellbender.tools.copynumber.CollectAllelicCounts done. Elapsed time: 770.88 minutes.

    Runtime.totalMemory()=203279040512

    Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit

    at java.util.Arrays.copyOf(Arrays.java:3181)

    at java.util.ArrayList.grow(ArrayList.java:265)

    at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:239)

    at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:231)

    at java.util.ArrayList.add(ArrayList.java:462)

    at org.broadinstitute.hellbender.tools.copynumber.datacollection.AllelicCountCollector.collectAtLocus(AllelicCountCollector.java:72)

    at org.broadinstitute.hellbender.tools.copynumber.CollectAllelicCounts.apply(CollectAllelicCounts.java:163)

    at org.broadinstitute.hellbender.engine.LocusWalker.lambda$traverse$0(LocusWalker.java:162)

    at org.broadinstitute.hellbender.engine.LocusWalker$$Lambda$109/1868366224.accept(Unknown Source)

    at java.util.Iterator.forEachRemaining(Iterator.java:116)

    at org.broadinstitute.hellbender.engine.LocusWalker.traverse(LocusWalker.java:160)

    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)

    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)

    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)

    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)

    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)

    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)

    at org.broadinstitute.hellbender.Main.main(Main.java:289)


    Now I get a different error: Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit. Also, the process is still very slow. Could checking read depth help with this issue?

    Thank you for your help!

  • Genevieve Brandt (she/her)

    This is another memory issue. How much memory do you have available on your machine? Do you have the full 200 GB you requested with -Xmx200g?
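    As a sketch of how to check, assuming a Linux host (the cgroup file path differs between cgroup v1 and v2):

    ```shell
    # Report total/available memory on the host, in GiB:
    free -g

    # Inside a Docker container, a cgroup memory limit (if any) caps what the
    # JVM can actually use, regardless of -Xmx. One of these files exists
    # depending on cgroup version; a very large value means "no limit":
    cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null \
      || cat /sys/fs/cgroup/memory.max 2>/dev/null \
      || echo "no cgroup memory limit file found"
    ```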

  • rahelp

    Dear Genevieve,

    I checked the memory, and both the machine and the Docker container have enough (over 200 GB). The whole process is still very slow; creating a tmp folder did not help much.

    Thank you for your help!

  • Samuel Lee

    Hi rahelp,

    You should not be collecting allelic counts at every site contained in your preprocessed intervals. Rather, you should provide a (typically much smaller) set of common variant sites as input to `-L`. See discussion in the tutorial at https://gatk.broadinstitute.org/hc/en-us/articles/360035890011--How-to-part-II-Sensitively-detect-copy-ratio-alterations-and-allelic-segments.
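    For illustration, a sketch of the intended invocation. The `common_sites.vcf.gz` filename is hypothetical; a typical choice is a population resource such as gnomAD, subset to common biallelic SNPs as described in the tutorial, and the `-Xmx16g` heap is an assumption that is usually ample at this scale:

    ```shell
    # Hypothetical common-sites input; substitute your own resource file.
    gatk --java-options "-Xmx16g" CollectAllelicCounts \
        -L common_sites.vcf.gz \
        -I 3.bam \
        -R Homo_sapiens_assembly19.fasta \
        -O 3.allelicCounts.tsv
    ```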

  • rahelp

    Dear Samuel,

    Thank you for your reply. Could you send me a link explaining how to collect common variant sites? I was unable to find it in the link you provided earlier.

    I apologize if there is something that I missed.

    Thank you so much for your help!

  • Pamela Bretscher

    Hi rahelp,

    Here are a couple of links that may help you. Please let me know if this does not answer your question.

    https://gatk.broadinstitute.org/hc/en-us/articles/360035531132 

    https://gatk.broadinstitute.org/hc/en-us/articles/360035531092--How-to-part-I-Sensitively-detect-copy-ratio-alterations-and-allelic-segments

    Kind regards,

    Pamela

  • williamholding

    A java.lang.OutOfMemoryError means that your program needs more memory than the Java Virtual Machine (JVM) allows it to use.

    How to track down the error?

    • Increase the memory your program is allowed to use with the -Xmx option (for instance, -Xmx1024m for 1024 MB). By default, the limit is based on the JRE version and system configuration. Note: increasing the heap size alone is only a temporary fix; you will hit the same issue again if you handle several parallel requests or process a bigger file.
    • Find the root cause of any memory leaks with profiling tools such as Eclipse MAT, VisualVM, or jconsole. Once you find the root cause, you can fix the leaks.
    • Optimize your code so that it needs less memory: use smaller data structures and get rid of objects that are no longer used at some point in your program.

    How to avoid this issue?

    • Use local variables wherever possible.
    • Release objects that will no longer be needed.
    • Avoid creating objects inside loops.
    • Use caches where appropriate.
    • Consider multithreading.


  • Genevieve Brandt (she/her)

    Thank you for the insight, williamholding!

  • Samuel Lee

    Thanks everyone for contributing to this thread!

    Just to make sure this doesn't get lost in the shuffle: if you use -L to run this tool only over common variant sites (typically around 1-10M sites for human WGS), as intended, you shouldn't run into memory issues. If you try to run it over the whole genome (~3B sites), you definitely will!

    Hopefully the resources linked above provide any additional details you need.
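    To put the scale difference in perspective, a rough back-of-envelope comparison (site counts are order-of-magnitude approximations, not exact figures):

    ```shell
    # Approximate site counts (order-of-magnitude only):
    sites_common=10000000        # ~10M common variant sites for human WGS
    sites_genome=3000000000      # ~3B sites in the whole genome

    # Whole-genome traversal touches roughly 300x as many loci:
    echo "Whole-genome traversal touches $((sites_genome / sites_common))x as many loci as a common-sites list."
    ```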

