java.lang.OutOfMemoryError using BaseRecalibrator
Update
The error was solved by re-downloading Mills_and_1000G_gold_standard.indels.hg38.vcf.gz(.tbi),
but the error message is misleading anyway.
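For anyone who lands here with the same trace: the stack trace below fails inside TabixReader.readIndex, that is, while the .tbi index is being loaded, so it may be worth checking the known-sites file and its index before raising -Xmx further. The following is only a rough sketch of such a check. The function name check_known_sites is made up for illustration, and it relies on the assumption that both the bgzipped VCF and the tabix index are gzip-compatible (BGZF), so that gzip -t can read them.

```shell
# Hypothetical helper (not part of GATK): run a basic gzip integrity test
# on a bgzipped VCF and on its tabix index.
# Assumption: both files are BGZF, which gzip -t can read as ordinary gzip.
check_known_sites() {
    local vcf="$1"
    local f
    local status=0
    for f in "$vcf" "$vcf.tbi"; do
        if [ ! -s "$f" ]; then
            # File is absent or has zero length.
            echo "MISSING_OR_EMPTY $f"
            status=1
        elif ! gzip -t "$f" 2>/dev/null; then
            # Truncated stream or CRC mismatch.
            echo "CORRUPT $f"
            status=1
        else
            echo "OK $f"
        fi
    done
    return "$status"
}
```

Usage would be something like `check_known_sites ~/resource/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz`. A check like this only catches truncation and checksum errors. An index that decompresses cleanly but has wrong contents would still pass, in which case rebuilding the index (for example with `tabix -f -p vcf` or `gatk IndexFeatureFile -I`) or re-downloading both files, as I did, is probably the safer route.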
----------------------------------------------------------------
GATK version used: gatk-4.2.4.0
I hit a java.lang.OutOfMemoryError when running BaseRecalibrator. The error looks like the one posted here, so I tried increasing -Xmx to 60G, but the error persisted.
The command I ran first, which included three known-sites files from the GATK resource bundle, was:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 \
-jar ~/bin/gatk-4.2.4.0/gatk-package-4.2.4.0-local.jar BaseRecalibrator \
-R ~/resource/Homo_sapiens_assembly38.fasta \
-I ~/markdup/N0.markdup.small.bam \
--known-sites ~/resource/Homo_sapiens_assembly38.dbsnp138.vcf.gz \
--known-sites ~/resource/Homo_sapiens_assembly38.known_indels.vcf.gz \
--known-sites ~/resource/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
-O ~/bqsr/N0.recal.table
The entire log is:
$ gatk BaseRecalibrator -R /public/home/lijing/wangzw/wes0619/resource/Homo_sapiens_assembly38.fasta -I /public/home/lijing/wangzw/wes_blca/markdup/N0.markdup.small.bam --known-sites /public/home/lijing/wangzw/wes0619/resource/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz -O /public/home/lijing/wangzw/wes_blca/bqsr/N0.recal.table
Using GATK jar /public/home/lijing/wangzw/wes_blca/bin/gatk-4.2.4.0/gatk-package-4.2.4.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /public/home/lijing/wangzw/wes_blca/bin/gatk-4.2.4.0/gatk-package-4.2.4.0-local.jar BaseRecalibrator -R /public/home/lijing/wangzw/wes0619/resource/Homo_sapiens_assembly38.fasta -I /public/home/lijing/wangzw/wes_blca/markdup/N0.markdup.small.bam --known-sites /public/home/lijing/wangzw/wes0619/resource/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz -O /public/home/lijing/wangzw/wes_blca/bqsr/N0.recal.table
23:32:11.151 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/public/home/lijing/wangzw/wes_blca/bin/gatk-4.2.4.0/gatk-package-4.2.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 05, 2022 11:32:11 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
23:32:11.380 INFO BaseRecalibrator - ------------------------------------------------------------
23:32:11.381 INFO BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.2.4.0
23:32:11.381 INFO BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
23:32:11.381 INFO BaseRecalibrator - Executing as lijing@gateway2 on Linux v2.6.32-431.el6.x86_64 amd64
23:32:11.381 INFO BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_312-b07
23:32:11.382 INFO BaseRecalibrator - Start Date/Time: January 5, 2022 11:32:11 PM CST
23:32:11.382 INFO BaseRecalibrator - ------------------------------------------------------------
23:32:11.382 INFO BaseRecalibrator - ------------------------------------------------------------
23:32:11.383 INFO BaseRecalibrator - HTSJDK Version: 2.24.1
23:32:11.383 INFO BaseRecalibrator - Picard Version: 2.25.4
23:32:11.383 INFO BaseRecalibrator - Built for Spark Version: 2.4.5
23:32:11.383 INFO BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
23:32:11.383 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
23:32:11.383 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
23:32:11.383 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
23:32:11.383 INFO BaseRecalibrator - Deflater: IntelDeflater
23:32:11.384 INFO BaseRecalibrator - Inflater: IntelInflater
23:32:11.384 INFO BaseRecalibrator - GCS max retries/reopens: 20
23:32:11.384 INFO BaseRecalibrator - Requester pays: disabled
23:32:11.384 INFO BaseRecalibrator - Initializing engine
23:32:12.244 INFO FeatureManager - Using codec VCFCodec to read file file:///public/home/lijing/wangzw/wes0619/resource/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
23:32:19.569 WARN IntelInflater - Zero Bytes Written : 0
23:32:35.636 INFO BaseRecalibrator - Shutting down engine
[January 5, 2022 11:32:35 PM CST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.41 minutes.
Runtime.totalMemory()=11849957376
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:705)
at java.util.HashMap.putVal(HashMap.java:630)
at java.util.HashMap.put(HashMap.java:613)
at htsjdk.tribble.readers.TabixReader.readIndex(TabixReader.java:251)
at htsjdk.tribble.readers.TabixReader.readIndex(TabixReader.java:287)
at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:165)
at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:129)
at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:80)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:433)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:377)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:246)
at org.broadinstitute.hellbender.engine.FeatureManager.initializeFeatureSources(FeatureManager.java:209)
at org.broadinstitute.hellbender.engine.FeatureManager.<init>(FeatureManager.java:156)
at org.broadinstitute.hellbender.engine.ReadWalker.initializeFeatures(ReadWalker.java:72)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
at org.broadinstitute.hellbender.engine.ReadWalker.onStartup(ReadWalker.java:51)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Then I tried to debug by passing one known-sites file at a time, and found that the error was reproduced with CODE 3 (Mills_and_1000G_gold_standard.indels.hg38.vcf.gz), while CODE 1 and CODE 2 worked fine.
# CODE 1
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 \
-jar ~/bin/gatk-4.2.4.0/gatk-package-4.2.4.0-local.jar BaseRecalibrator \
-R ~/resource/Homo_sapiens_assembly38.fasta \
-I ~/markdup/N0.markdup.small.bam \
--known-sites ~/resource/Homo_sapiens_assembly38.dbsnp138.vcf.gz \
-O ~/bqsr/N0.recal.table
# CODE 2
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 \
-jar ~/bin/gatk-4.2.4.0/gatk-package-4.2.4.0-local.jar BaseRecalibrator \
-R ~/resource/Homo_sapiens_assembly38.fasta \
-I ~/markdup/N0.markdup.small.bam \
--known-sites ~/resource/Homo_sapiens_assembly38.known_indels.vcf.gz \
-O ~/bqsr/N0.recal.table
# CODE 3
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 \
-jar ~/bin/gatk-4.2.4.0/gatk-package-4.2.4.0-local.jar BaseRecalibrator \
-R ~/resource/Homo_sapiens_assembly38.fasta \
-I ~/markdup/N0.markdup.small.bam \
--known-sites ~/resource/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
-O ~/bqsr/N0.recal.table
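The three runs above could also be written as a loop over the candidate files. This is only a sketch: the function name test_known_sites and the variables GATK, REF, BAM and OUT_DIR are placeholders I made up for illustration, and it assumes the gatk launcher exits with a non-zero status when the tool fails.

```shell
# Sketch: run BaseRecalibrator once per known-sites file and report
# which files pass and which fail.
# GATK, REF, BAM and OUT_DIR are illustrative placeholders; adjust as needed.
GATK="${GATK:-gatk}"
REF="${REF:-$HOME/resource/Homo_sapiens_assembly38.fasta}"
BAM="${BAM:-$HOME/markdup/N0.markdup.small.bam}"
OUT_DIR="${OUT_DIR:-$HOME/bqsr}"

test_known_sites() {
    local vcf
    for vcf in "$@"; do
        # Assumption: a failed run (such as the OutOfMemoryError) gives a
        # non-zero exit status from the launcher.
        if "$GATK" BaseRecalibrator \
               -R "$REF" \
               -I "$BAM" \
               --known-sites "$vcf" \
               -O "$OUT_DIR/$(basename "$vcf").recal.table" >/dev/null 2>&1; then
            echo "PASS $vcf"
        else
            echo "FAIL $vcf"
        fi
    done
}
```

It would be called with the three files as arguments, for example `test_known_sites ~/resource/Homo_sapiens_assembly38.dbsnp138.vcf.gz ~/resource/Homo_sapiens_assembly38.known_indels.vcf.gz ~/resource/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz`. The tool output is discarded here to keep the summary short, so a failing file should be rerun on its own to see the full log.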
-
Thank you for posting this solution WangZiwei! Did you ever determine why the original file was causing the error? If you identified the exact cause, then I could put in a feature request to add a check for that type of problem.
I'm glad you were able to find a solution!
-
Hello Genevieve Brandt (she/her),
Perhaps Mills_and_1000G_gold_standard.indels.hg38.vcf.gz(.tbi) caused the problem, because the error disappeared when I re-downloaded it from the GATK resource bundle.
I guess the file had become corrupted, which is a little strange since the file had been used before.
I did not expect this to be the solution, so I did not keep the original file for further examination. Sorry about that.
Regards,
Wang.
-
That makes sense. I'm glad you got it fixed quickly!