Mutect2 LearnReadOrientationModel Memory Error
Hi,
I am running into a memory problem with LearnReadOrientationModel in a large Mutect2 scatter job. I am performing multisample calling on 50 8X whole genomes scattered 300 ways.
I was initially getting errors due to the GC overhead limit, then due to memory. I am currently running this on a 200 GB memory machine with 140 GB allocated to the Java heap.
Is there an adjustment to the settings that would make this feasible? Alternatively, would performing this step at the per scatter level (prior to gathering) have a significant disadvantage in terms of the calling results?
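As a purely hypothetical sketch of what I mean by the per-scatter alternative (file names are illustrative; there would be one F1R2 tarball per scattered Mutect2 shard), something like:
for shard in shard_0001 shard_0002; do
  # learn artifact priors for each shard separately instead of on the gathered inputs
  gatk LearnReadOrientationModel \
    -I ${shard}.f1r2.tar.gz \
    -O ${shard}.artifact-priors.tar.gz
done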
Thanks as always for your insight.
---
Workspace: broad-firecloud-ibmwatson/Getz_IBM_Ravi_SeqOnly_WGS_copy_3-3-2020
(already shared with GROUP_FireCloud-Support@firecloud.org)
Submission ID: 93280b2c-4a1b-4a6e-a780-e00a124548a6
If you are seeing an error, please provide (REQUIRED):
a) GATK version used: 4.1.8.1
b) Exact command used: gatk --java-options "-Xmx140G -Xms140G -XX:-UseGCOverheadLimit" LearnReadOrientationModel \
-I ${sep=" -I " orientation_bias_files} \
-O "artifact-priors.tar.gz"
c) Entire error log:
22:59:22.466 INFO LearnReadOrientationModel - Shutting down engine
[September 28, 2020 10:59:22 PM GMT] org.broadinstitute.hellbender.tools.walkers.readorientation.LearnReadOrientationModel done. Elapsed time: 77.88 minutes.
Runtime.totalMemory()=133621612544
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.TreeMap.put(TreeMap.java:577)
    at htsjdk.samtools.util.Histogram.increment(Histogram.java:146)
    at htsjdk.samtools.metrics.MetricsFile.read(MetricsFile.java:435)
    at org.broadinstitute.hellbender.tools.walkers.readorientation.LearnReadOrientationModel.readMetricsFile(LearnReadOrientationModel.java:296)
    at org.broadinstitute.hellbender.tools.walkers.readorientation.LearnReadOrientationModel.lambda$doWork$7(LearnReadOrientationModel.java:96)
    at org.broadinstitute.hellbender.tools.walkers.readorientation.LearnReadOrientationModel$$Lambda$53/779511842.apply(Unknown Source)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at org.broadinstitute.hellbender.tools.walkers.readorientation.LearnReadOrientationModel.doWork(LearnReadOrientationModel.java:97)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
-
Arvind Ravi, have you tried specifying a temporary directory with --tmp-dir?
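For illustration only (the paths and input name are made up), pointing the tool at a scratch directory with plenty of free space could look like:
mkdir -p /path/to/scratch/gatk_tmp    # illustrative scratch location with ample free space
gatk LearnReadOrientationModel \
  -I input.f1r2.tar.gz \
  -O artifact-priors.tar.gz \
  --tmp-dir /path/to/scratch/gatk_tmp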
-
Thanks Genevieve. I take it that allows the task to offload memory to disk during execution?
I've updated the call as follows, but I'm still running into a memory issue on a 30 GB memory VM...
Command:
gatk --java-options "-Xmx21G -Xms21G -XX:-UseGCOverheadLimit" LearnReadOrientationModel \
-I ${sep=" -I " orientation_bias_files} \
-O "artifact-priors.tar.gz" \
--tmp-dir ob_tmp
Error:
02:10:58.629 INFO LearnReadOrientationModel - Shutting down engine
[October 2, 2020 2:10:58 AM GMT] org.broadinstitute.hellbender.tools.walkers.readorientation.LearnReadOrientationModel done. Elapsed time: 168.95 minutes.
Runtime.totalMemory()=20043530240
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1875)
    at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    at java.lang.Double.parseDouble(Double.java:538)
    at htsjdk.samtools.util.FormatUtil.parseDouble(FormatUtil.java:141)
    at htsjdk.samtools.metrics.MetricsFile.read(MetricsFile.java:434)
    at org.broadinstitute.hellbender.tools.walkers.readorientation.LearnReadOrientationModel.readMetricsFile(LearnReadOrientationModel.java:296)
    at org.broadinstitute.hellbender.tools.walkers.readorientation.LearnReadOrientationModel.lambda$doWork$7(LearnReadOrientationModel.java:96)
    at org.broadinstitute.hellbender.tools.walkers.readorientation.LearnReadOrientationModel$$Lambda$53/805561728.apply(Unknown Source)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at org.broadinstitute.hellbender.tools.walkers.readorientation.LearnReadOrientationModel.doWork(LearnReadOrientationModel.java:97)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
2020/10/02 02:11:05 Starting delocalization.
-
Arvind Ravi, it can help with memory issues if the temporary directory your machine is currently using does not have a lot of space, or is slow for reading and writing.
Did you retry your command on the 200 GB memory machine with 140 GB of Java heap space?
How many files are in your input?
-
Still getting the same error with the larger machine memory settings:
"Elapsed time: 85.25 minutes. Runtime.totalMemory()=133621612544"
There are 50 input files (6X whole genomes) for the call.
Any other suggestions welcome!
-
Arvind Ravi, are you running multiple samples with the same command? This should be run on a single sample.
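A minimal sketch of a per-sample setup, assuming the F1R2 counts had been collected separately for each sample (file names here are hypothetical):
# Illustrative only: one LearnReadOrientationModel run per sample
for sample in sample01 sample02; do
  gatk --java-options "-Xmx20G" LearnReadOrientationModel \
    -I ${sample}.f1r2.tar.gz \
    -O ${sample}.artifact-priors.tar.gz
done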