Memory issues when running BaseRecalibrator
Hi, I am running BaseRecalibrator using the GATK Docker image (latest). My command is:
gatk --java-options "-Xms100G -Xmx100G -XX:ParallelGCThreads=4" BaseRecalibrator -I VR0024SA.withoutERCCs.markedDup.splitNcigar.bam -O "VR0024SA.withoutERCCs.markedDup.splitNcigar.baseRecal.table" -R GRCh38.primary_assembly.genome.fa --intervals exome.interval_list --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --tmp-dir .
But I keep getting memory-related errors! I have tried raising the heap to -Xmx200G, but it still fails with a memory error. I have tried running the BAM file with and without read groups (added with Picard's AddOrReplaceReadGroups tool). I get the error no matter what...
This is the log:
Command exit status:
137
Command output:
(empty)
Command error:
21:26:56.350 INFO BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
21:26:56.350 INFO BaseRecalibrator - Executing as root@72f609facbfe on Linux v5.15.0-91-generic amd64
21:26:56.350 INFO BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v17.0.9+9-Ubuntu-122.04
21:26:56.351 INFO BaseRecalibrator - Start Date/Time: March 12, 2024 at 9:26:56 PM GMT
21:26:56.351 INFO BaseRecalibrator - ------------------------------------------------------------
21:26:56.351 INFO BaseRecalibrator - ------------------------------------------------------------
21:26:56.351 INFO BaseRecalibrator - HTSJDK Version: 4.1.0
21:26:56.351 INFO BaseRecalibrator - Picard Version: 3.1.1
21:26:56.351 INFO BaseRecalibrator - Built for Spark Version: 3.5.0
21:26:56.352 INFO BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
21:26:56.352 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
21:26:56.352 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
21:26:56.352 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
21:26:56.352 INFO BaseRecalibrator - Deflater: IntelDeflater
21:26:56.352 INFO BaseRecalibrator - Inflater: IntelInflater
21:26:56.352 INFO BaseRecalibrator - GCS max retries/reopens: 20
21:26:56.352 INFO BaseRecalibrator - Requester pays: disabled
21:26:56.353 INFO BaseRecalibrator - Initializing engine
21:26:56.603 INFO FeatureManager - Using codec VCFCodec to read file file://1000G_phase1.snps.high_confidence.hg38.vcf.gz
21:26:56.765 INFO FeatureManager - Using codec VCFCodec to read file file://Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
21:38:48.567 INFO BaseRecalibrator - Shutting down engine
[March 12, 2024 at 9:38:48 PM GMT] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 11.87 minutes.
Runtime.totalMemory()=107374182400
java.lang.OutOfMemoryError: Java heap space
at htsjdk.tribble.readers.TabixReader.readInt(TabixReader.java:189)
at htsjdk.tribble.readers.TabixReader.readIndex(TabixReader.java:274)
at htsjdk.tribble.readers.TabixReader.readIndex(TabixReader.java:287)
at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:165)
at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:129)
at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:80)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:433)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:377)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:245)
at org.broadinstitute.hellbender.engine.FeatureManager.initializeFeatureSources(FeatureManager.java:208)
at org.broadinstitute.hellbender.engine.FeatureManager.<init>(FeatureManager.java:155)
at org.broadinstitute.hellbender.engine.ReadWalker.initializeFeatures(ReadWalker.java:72)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
at org.broadinstitute.hellbender.engine.ReadWalker.onStartup(ReadWalker.java:51)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:147)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
at org.broadinstitute.hellbender.Main.main(Main.java:306)
Using GATK jar /gatk/gatk-package-4.5.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms100G -Xmx100G -XX:ParallelGCThreads=4 -jar /gatk/gatk-package-4.5.0.0-local.jar BaseRecalibrator -I VR0024SA.withoutERCCs.markedDup.splitNcigar.bam -O VR0024SA.withoutERCCs.markedDup.splitNcigar.baseRecal.table -R GRCh38.primary_assembly.genome.fa --intervals exome.interval_list --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --tmp-dir .
I am running only one BAM file with ONE sample. Eventually I will have to run multiple BAM files (one per sample). Why does it take that much memory? Am I doing something wrong here?
Hope you can help!
Br,
Mette
-
BaseRecalibrator does not need much memory to run. It is possible that your system does not have enough memory to back the heap size you requested. A setting like the one below should suffice for most users.
--java-options "-Xmx8G"
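For reference, here is a sketch of the original command with only the Java options changed. Note that exit status 137 is 128 + 9, meaning the process received SIGKILL, which is what a container or scheduler memory limit typically produces when a process grows past it. The `run` helper and `DRY_RUN` variable are illustrative additions (not part of GATK) so the command can be inspected before it is executed; all file names are taken from the post above.

```shell
#!/bin/sh
# Sketch: the original BaseRecalibrator command with a modest heap.
# Assumption: 8G heap, and no -Xms or -XX:ParallelGCThreads overrides.
# run() and DRY_RUN are hypothetical helpers, not GATK features:
# DRY_RUN=1 (the default here) prints the command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

run gatk --java-options "-Xmx8G" BaseRecalibrator \
  -I VR0024SA.withoutERCCs.markedDup.splitNcigar.bam \
  -R GRCh38.primary_assembly.genome.fa \
  --intervals exome.interval_list \
  --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz \
  --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
  -O VR0024SA.withoutERCCs.markedDup.splitNcigar.baseRecal.table \
  --tmp-dir .
```

Setting `DRY_RUN=0` in the environment makes the script actually invoke `gatk`.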
I hope this helps.
-
Hi Gökalp,
Thanks for the reply. I am running the command on an AWS HPC cluster with more than 1 TB of available memory, so I cannot figure out why I get the memory error.
Could something else be causing the error? I found a similar post on this subject, but the poster "solved" the problem by downloading the reference files again. Why would that cause an issue like this? I have not tried it, but as far as I can see my files are not corrupt.
Best regards,
Mette
-
Hi again.
This could be related to how the memory pool is set up on your HPC system, so I still think it is worth trying a low heap size such as 8 to 12G, without overriding the number of parallel GC threads that Java chooses by default. A typical BaseRecalibrator run does not use more than 10G of memory for a whole genome. If you want to speed up the process, you can split the BaseRecalibrator step across multiple intervals, run the pieces in parallel, then merge the recalibration reports with GatherBQSRReports and run ApplyBQSR. The results are the same whether BaseRecalibrator runs on the whole file or on scattered intervals.
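A minimal sketch of that scatter-gather flow follows. It assumes scattering by contig, and that repeating `-L` with `--interval-set-rule INTERSECTION` restricts each piece to the exome targets on that contig. The contig subset, the `recal.*` file names, the output BAM name, and the `run`/`DRY_RUN` helpers are illustrative choices, not taken from the post or from GATK itself.

```shell
#!/bin/sh
# Sketch only: scatter BaseRecalibrator by contig, gather the reports, apply.
# run() and DRY_RUN are hypothetical helpers: DRY_RUN=1 (default) prints
# each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

BAM=VR0024SA.withoutERCCs.markedDup.splitNcigar.bam
REF=GRCh38.primary_assembly.genome.fa
CONTIGS="chr1 chr2 chr3"   # assumption: illustrative subset; list all contigs in practice

gather_args=""
for c in $CONTIGS; do
  # One BaseRecalibrator job per contig, started in the background.
  # On a cluster, each of these would more likely be its own scheduler job.
  run gatk --java-options "-Xmx8G" BaseRecalibrator \
    -I "$BAM" -R "$REF" \
    -L "$c" -L exome.interval_list --interval-set-rule INTERSECTION \
    --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz \
    --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
    -O "recal.$c.table" &
  gather_args="$gather_args -I recal.$c.table"
done
wait   # block until every per-contig job has finished

# $gather_args is left unquoted on purpose so it splits into separate -I arguments.
run gatk GatherBQSRReports $gather_args -O recal.merged.table

# Hypothetical output name for the recalibrated BAM.
run gatk --java-options "-Xmx8G" ApplyBQSR \
  -I "$BAM" -R "$REF" \
  --bqsr-recal-file recal.merged.table \
  -O VR0024SA.withoutERCCs.markedDup.splitNcigar.recal.bam
```

With 8G of heap per job, the number of contigs run at once sets the total memory needed, so the scatter width should be chosen with the node or container limit in mind.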
I hope this helps.