SVDiscovery OutOfMemoryError.
If you are seeing an error, please provide (REQUIRED):
a) GATK version used: The Genome Analysis Toolkit (GATK) v4.1.4.0
b) Exact command used:
export SV_DIR=/usr/local/svtoolkit
classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"
reffile=/home/genomics.mj/1.project/2.DOG/1.DOG_WGS/2.DG/0.reference/dog_canFam3.fa
bam_files="input_bam_files.list"
runDir=DG_GenStrip
mkdir -p ${runDir}/logs_discovery || exit 1
java -Xmx100g -cp ${classpath} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/SVDiscovery.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-cp ${classpath} \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
-configFile ${SV_DIR}/conf/genstrip_parameters.txt \
-R ${reffile} \
-I ${bam_files} \
-md ${runDir}/metadata \
-genderMapFile gender_map \
-runDirectory ${runDir} \
-jobLogDir ${runDir}/logs_discovery \
-O ${runDir}/svdiscovery.dels.vcf \
-minimumSize 100 \
-maximumSize 100000 \
-run \
-l DEBUG 2>&1 | tee debug_discovery.log
c) Entire error log:
ERROR 09:25:51,871 FunctionEdge - Error: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/home/genomics.mj/1.project/2.DOG/1.DOG_WGS/2.DG/OUTPUT/GenomeStrip/DG_5sample/.queue/tmp' '-cp' '/usr/local/svtoolkit/lib/SVToolkit.jar:/usr/local/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/usr/local/svtoolkit/lib/gatk/Queue.jar' '-cp' '/usr/local/svtoolkit/lib/SVToolkit.jar:/usr/local/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/usr/local/svtoolkit/lib/gatk/Queue.jar' 'org.broadinstitute.sv.apps.MergeDiscoveryOutput' '-R' '/home/genomics.mj/1.project/2.DOG/1.DOG_WGS/2.DG/0.reference/dog_canFam3.fa' '-runDirectory' 'DG_GenStrip' '-O' '/home/genomics.mj/1.project/2.DOG/1.DOG_WGS/2.DG/OUTPUT/GenomeStrip/DG_5sample/DG_GenStrip/svdiscovery.dels.unfiltered.vcf'
ERROR 09:25:51,874 FunctionEdge - Contents of /home/genomics.mj/1.project/2.DOG/1.DOG_WGS/2.DG/OUTPUT/GenomeStrip/DG_5sample/DG_GenStrip/logs_discovery/SVDiscovery-3269.out:
INFO 09:17:51,195 HelpFormatter - -------------------------------------------------------------
INFO 09:17:51,197 HelpFormatter - Program Name: org.broadinstitute.sv.apps.MergeDiscoveryOutput
INFO 09:17:51,200 HelpFormatter - Program Args: -R /home/genomics.mj/1.project/2.DOG/1.DOG_WGS/2.DG/0.reference/dog_canFam3.fa -runDirectory DG_GenStrip -O /home/genomics.mj/1.project/2.DOG/1.DOG_WGS/2.DG/OUTPUT/GenomeStrip/DG_5sample/DG_GenStrip/svdiscovery.dels.unfiltered.vcf
INFO 09:17:51,203 HelpFormatter - Executing as genomics.mj@genome on Linux 3.10.0-862.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_232-b09.
INFO 09:17:51,203 HelpFormatter - Date/Time: 2020/11/24 09:17:51
INFO 09:17:51,203 HelpFormatter - -------------------------------------------------------------
INFO 09:17:51,203 HelpFormatter - -------------------------------------------------------------
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at htsjdk.tribble.readers.PositionalBufferedStream.<init>(PositionalBufferedStream.java:51)
at htsjdk.tribble.readers.PositionalBufferedStream.<init>(PositionalBufferedStream.java:46)
at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:94)
at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:82)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:81)
at org.broadinstitute.sv.util.vcf.VCFReader.<init>(VCFReader.java:46)
at org.broadinstitute.sv.common.RunFileMerger.mergeVCFFilesInternal(RunFileMerger.java:215)
at org.broadinstitute.sv.common.RunFileMerger.mergeVCFOutputFiles(RunFileMerger.java:147)
at org.broadinstitute.sv.discovery.SVDiscoveryMerger.mergePartitions(SVDiscoveryMerger.java:36)
at org.broadinstitute.sv.common.RunFileMerger.merge(RunFileMerger.java:93)
at org.broadinstitute.sv.common.RunFileMerger.merge(RunFileMerger.java:83)
at org.broadinstitute.sv.apps.MergeDiscoveryOutput.run(MergeDiscoveryOutput.java:59)
at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:58)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.sv.commandline.CommandLineProgram.runAndReturnResult(CommandLineProgram.java:31)
at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:27)
at org.broadinstitute.sv.apps.MergeDiscoveryOutput.main(MergeDiscoveryOutput.java:45)
-
Hi latte kim,
To help with runtime or memory usage, try the following:
- Verify that this issue persists with the latest version of GATK.
- Specify a --tmp-dir that has room for all necessary temporary files.
- Specify Java memory usage with the java -Xmx option.
- Run the gatk command through the gatk wrapper script.
- Check the depth of coverage of your sample at the region of interest.
- Check memory and disk space availability on your end.
-
Hi Genevieve Brandt,
Thanks for the comments.
It seems that by default this program runs with an older version of GATK, importing files from ${SV_DIR}/lib/gatk.
When I replace that path with the new GATK jar files, I encounter another error message, as follows:
-------------------------------------------------------------------------------------------------------
SLF4J: The requested version 1.6.99 by your slf4j binding is not compatible with [1.5.5, 1.5.6]
SLF4J: See http://www.slf4j.org/codes.html#version_mismatch for further details.
Exception in thread "main" java.lang.NoSuchMethodError: org.reflections.util.ClasspathHelper.forManifest()Ljava/util/Set;
at org.broadinstitute.gatk.utils.classloader.JVMUtils.getClasspathURLs(JVMUtils.java:200)
at org.broadinstitute.gatk.utils.classloader.PluginManager.<clinit>(PluginManager.java:73)
at org.broadinstitute.gatk.queue.engine.QGraph.<init>(QGraph.scala:83)
at org.broadinstitute.gatk.queue.QCommandLine.<init>(QCommandLine.scala:88)
at org.broadinstitute.gatk.queue.QCommandLine$.main(QCommandLine.scala:49)
at org.broadinstitute.gatk.queue.QCommandLine.main(QCommandLine.scala)
-------------------------------------------------------------------------------------------------------
Can you please let me know how to run it with the most recent version of GATK? I really appreciate your help!
-
Hi latte kim, I apologize; I missed in your earlier post that you were running Genome STRiP. Please disregard my earlier recommendations. Bob Handsaker can provide more insight into your issue.
-
That job is defaulting to a 2 GB Java heap size, which is probably not enough for your data set.
There are two ways to increase the memory:
The easiest is the -memLimit argument to Queue, which sets the default memory limit for every Java job that does not specify an explicit memory requirement. The default is 2 (GB). You could use -memLimit 4, for example, or more if your data set is large. When you rerun Queue, it should only run the failed jobs and any jobs that depend on them.
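As a sketch of the first option (reusing the variables and flags from the command posted above, with -memLimit 4 added; the value 4 is illustrative and may need to be larger for big cohorts), the rerun might look like:

```shell
# Rerun Queue with a 4 GB default heap for Java jobs that do not set an
# explicit memory requirement. Queue should only rerun the failed jobs.
java -Xmx100g -cp ${classpath} \
    org.broadinstitute.gatk.queue.QCommandLine \
    -S ${SV_DIR}/qscript/SVDiscovery.q \
    -S ${SV_DIR}/qscript/SVQScript.q \
    -cp ${classpath} \
    -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    -configFile ${SV_DIR}/conf/genstrip_parameters.txt \
    -R ${reffile} \
    -I ${bam_files} \
    -md ${runDir}/metadata \
    -genderMapFile gender_map \
    -runDirectory ${runDir} \
    -jobLogDir ${runDir}/logs_discovery \
    -O ${runDir}/svdiscovery.dels.vcf \
    -minimumSize 100 \
    -maximumSize 100000 \
    -memLimit 4 \
    -run \
    -l DEBUG 2>&1 | tee debug_discovery.log
```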
The second option is to edit the file ${SV_DIR}/qscript/SVQScript.q, which contains the MergeDiscoveryOutput class. If you add a line like
this.javaMemoryLimit = Some(4)
it will set the memory limit for only the MergeDiscoveryOutput jobs to 4 (GB).
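A minimal sketch of applying that edit from the shell, assuming SV_DIR is set as above and that the class is declared as `class MergeDiscoveryOutput` in SVQScript.q (that match pattern is an assumption; check the file first and back it up before editing):

```shell
# Back up the qscript, then append the heap override after the (assumed)
# 'class MergeDiscoveryOutput' declaration line using GNU sed.
cp "${SV_DIR}/qscript/SVQScript.q" "${SV_DIR}/qscript/SVQScript.q.bak"
sed -i '/class MergeDiscoveryOutput/a this.javaMemoryLimit = Some(4)' \
  "${SV_DIR}/qscript/SVQScript.q"
# Confirm the line landed where expected before rerunning Queue.
grep -n 'javaMemoryLimit' "${SV_DIR}/qscript/SVQScript.q"
```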
-
I added the line to the script, and the discovery completed successfully!
Thank you very much for your help.