Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GetPileupSummaries Error: java.lang.OutOfMemoryError: Java heap space

Answered

11 comments

  • Genevieve Brandt (she/her)

    Hi Lynn Loy,

    Are you able to update your version of GATK? The version you are using is fairly old at this point, and there may be changes in the newer version that resolve this issue.

    Could you also paste the command you use with the -Xmx option, along with the full program log?

    Thank you,

    Genevieve

  • Lynn Loy

    Hi Genevieve,

    Thank you for your help! It took me a while to install the new version and run some tests. Now loading the VCF file seems to take very long (I reserved a lot of memory, and the run showed no progress for 10 hours). I tested the program with a reduced VCF (10,000 lines) and it worked.

    How can I use GetPileupSummaries more efficiently? I suppose it would be useful to use an interval list. So far, I have always used the same VCF resource as the input for the "-L" flag. I am unsure which intervals I should define and in which format they should be.

    Thank you!

     

    GATK version:

    GATK/4.2.3.0-Java-1.8

     

    Command:

    gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' GetPileupSummaries \
    --tmp-dir ${SCRATCH_DIR}/Alignment_QC/tmp_dir2/ \
    -I ${QC_BAM} \
    -V ${GR} \
    -L ${GR} \
    -O ${PILEUPS}

     

    Log file from run that did not finish after 10 hours:

    Using GATK jar /Applic.HPC/Easybuild/skylake/2021a/software/GATK/4.2.3.0-GCCcore-10.3.0-Java-1.8/gatk-package-4.2.3.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /Applic.HPC/Easybuild/skylake/2021a/software/GATK/4.2.3.0-GCCcore-10.3.0-Java-1.8/gatk-package-4.2.3.0-local.jar GetPileupSummaries --tmp-dir /<path>/tmp_dir2/ -I /<path>/xxx.bam -V /<path>/af-only-gnomad.hg38.vcf.gz -L /<path>/af-only-gnomad.hg38.vcf.gz -O /<path>/xxx_pileups.table
    11:19:48.388 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/Applic.HPC/Easybuild/skylake/2021a/software/GATK/4.2.3.0-GCCcore-10.3.0-Java-1.8/gatk-package-4.2.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Nov 24, 2021 11:19:49 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    11:19:49.542 INFO GetPileupSummaries - ------------------------------------------------------------
    11:19:49.543 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.3.0
    11:19:49.543 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
    11:19:49.544 INFO GetPileupSummaries - Executing as xxx on Linux v3.10.0-1160.36.2.el7.x86_64 amd64
    11:19:49.544 INFO GetPileupSummaries - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_281-b09
    11:19:49.544 INFO GetPileupSummaries - Start Date/Time: November 24, 2021 11:19:47 AM GMT
    11:19:49.544 INFO GetPileupSummaries - ------------------------------------------------------------
    11:19:49.544 INFO GetPileupSummaries - ------------------------------------------------------------
    11:19:49.545 INFO GetPileupSummaries - HTSJDK Version: 2.24.1
    11:19:49.545 INFO GetPileupSummaries - Picard Version: 2.25.4
    11:19:49.545 INFO GetPileupSummaries - Built for Spark Version: 2.4.5
    11:19:49.545 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    11:19:49.545 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    11:19:49.545 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    11:19:49.545 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    11:19:49.545 INFO GetPileupSummaries - Deflater: IntelDeflater
    11:19:49.545 INFO GetPileupSummaries - Inflater: IntelInflater
    11:19:49.545 INFO GetPileupSummaries - GCS max retries/reopens: 20
    11:19:49.545 INFO GetPileupSummaries - Requester pays: disabled
    11:19:49.545 INFO GetPileupSummaries - Initializing engine
    11:19:50.128 INFO FeatureManager - Using codec VCFCodec to read file file:/<path>/af-only-gnomad.hg38.vcf.gz
    11:19:50.381 INFO FeatureManager - Using codec VCFCodec to read file file:/<path>/af-only-gnomad.hg38.vcf.gz
    11:48:31.694 WARN IntelInflater - Zero Bytes Written : 0
    11:51:25.326 INFO IntervalArgumentCollection - Processing 326649654 bp from intervals
    11:56:46.504 INFO GetPileupSummaries - Done initializing engine
    11:56:46.504 INFO ProgressMeter - Starting traversal
    11:56:46.505 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute

    slurmstepd: error: *** JOB 9959435 ON r09n25 CANCELLED AT 2021-11-24T22:19:42 DUE TO TIME LIMIT ***


  • Genevieve Brandt (she/her)

    Hi Lynn Loy,

    VCF interval files can really slow down the run if they are very large. What resource files are you using? Could you also try running with the -Xmx parameter?

    Another alternative is to run this tool chromosome by chromosome then combine the results.
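    The chromosome-by-chromosome approach can be sketched as a dry-run shell loop. This is only an illustration: the file names are placeholders, the -Xmx value is arbitrary, and merging the per-chromosome tables with GatherPileupSummaries assumes a recent GATK 4 release. Remove the leading `echo`s to actually run the commands.

```shell
# Dry-run sketch: one GetPileupSummaries invocation per chromosome.
# All file names here are placeholders.
GATHER_INPUTS=""
for CHR in chr1 chr2 chr21 chrX; do   # extend to all chromosomes
    echo gatk --java-options '-Xmx8G' GetPileupSummaries \
        -I sample.bam \
        -V af-only-gnomad.hg38.vcf.gz \
        -L "${CHR}" \
        -O "pileups.${CHR}.table"
    GATHER_INPUTS="${GATHER_INPUTS} -I pileups.${CHR}.table"
done

# GatherPileupSummaries concatenates the per-chromosome tables;
# it needs the reference sequence dictionary.
echo gatk GatherPileupSummaries \
    --sequence-dictionary ref.dict \
    ${GATHER_INPUTS} \
    -O pileups.merged.table
```

    Because each shard only traverses one chromosome's worth of sites, memory stays bounded and the shards can run in parallel on a cluster.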

    Hope this helps!

    Best,

    Genevieve

  • Lynn Loy

    Hi Genevieve,

    In the meantime, I tried to run the program with a custom interval list and also with a list from the resource bundle. I used a VCF file from the same bucket.

    Interval list:

    somatic-hg38_CNV_and_centromere_blacklist.hg38liftover.list

    VCF file:

    af-only-gnomad.hg38.vcf.gz

    Custom interval list:

    a list with the contig names of my reference genome file (the one I used for the alignment)

     

    It works with both interval lists. What do you think is the better approach?

     

    Version:

    GATK/4.2.3.0-Java-1.8

    Command:

    gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' GetPileupSummaries \
    --tmp-dir tmp/dir/ \
    -I aligned.bam \
    -V af-only-gnomad.hg38.vcf.gz \
    -L somatic-hg38_CNV_and_centromere_blacklist.hg38liftover.list \
    -O pileups.table

     

    Thank you!

    Lynn

  • Genevieve Brandt (she/her)

    Hi Lynn Loy,

    I think with GetPileupSummaries you want to use a specific VCF containing sites to serve as your intervals.

    There's a forum post with this question where our Mutect2 developer provided an answer: https://gatk.broadinstitute.org/hc/en-us/community/posts/360067310872-How-to-find-or-generate-common-germline-variant-sites-VCF-required-by-GetPileupSummaries
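    For reference, that post is about building a VCF of common germline variant sites. A minimal, self-contained sketch of filtering a VCF down to sites above an allele-frequency cutoff is shown below on a tiny inline VCF; the 0.05 cutoff and file names are assumptions, and for the real resource GATK's SelectVariants with a `-select "AF > 0.05"` expression is the more standard route.

```shell
# Toy stand-in for a large resource like af-only-gnomad.hg38.vcf.gz.
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr1\t100\t.\tA\tG\t.\t.\tAF=0.25\n'
    printf 'chr1\t200\t.\tC\tT\t.\t.\tAF=0.001\n'
    printf 'chr1\t300\t.\tG\tA\t.\t.\tAF=0.08\n'
} > toy.vcf

# Keep header lines plus records whose INFO AF exceeds the (assumed)
# 0.05 cutoff; the result is a smaller sites-only VCF for -V/-L.
awk -F'\t' '/^#/ { print; next }
    match($8, /AF=[0-9.eE+-]+/) {
        af = substr($8, RSTART + 3, RLENGTH - 3)
        if (af + 0 > 0.05) print
    }' toy.vcf > common_sites.vcf
```

    Remember to index the filtered VCF (e.g. with IndexFeatureFile) before passing it to GetPileupSummaries.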

    Let me know if you have any other questions!

    Best,

    Genevieve

  • Ahmad Al Alwash

    Hi,

    I'm running into the same issue here where I get the error message:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space.

    I tried increasing the -Xmx (up to 50G!), and also providing a tmp directory, but it still shows the same error. The VCF file I'm using is af-only-gnomad.hg38.vcf.gz, and the GATK version is v4.2.3.0. The gatk command I'm using is as follows:

    gatk GetPileupSummaries --java-options '-Xmx50G' --tmp-dir /scratch/user/ \
        -I file.bam \
        -V af-only-gnomad.hg38.vcf.gz \
        -L af-only-gnomad.hg38.vcf.gz \
        -O file.pileup.table

  • Anthony DiCi

    Hi Ahmad Al Alwash,

    Thank you for writing to the GATK forum! We hope that we can help you sort this out.

    I checked with our developers, and this is a recently fixed bug. Unfortunately, the updated code will not be available until the next version of GATK is released. That said, I can offer you two options to move forward:

    1. Wait for the latest version to be released.
    2. Use the nightly build published in our Docker repository.

    The most up-to-date code and any unreleased updates/bug fixes planned for future versions are published on Docker every night.

    I hope this helps! Please feel free to reach out with any other questions at any time.

    Best,

    Anthony

  • Anitha R

    While doing all these steps, I am getting this error after running CalculateContamination:

    1. Error:  
    WARN  KernelSegmenter - No changepoint candidates were found.  The specified window sizes may be inappropriate, or there may be insufficient data points.
    17:17:17.611 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (12) to segment; using all data points to calculate kernel matrix.
    17:17:17.612 WARN  KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (12).  Local changepoint costs will not be calculated for this window size.

    Commands for attempt 1:

    bed file creation:

    zcat file.vcfz | tail -n +12 | awk '{FS="\t";OFS="\t";print $1,$2-1,$2,$3, etc}' > file.bed

    GetPileupSummaries:

    java -jar gatk-package-4.5.0.0-local.jar GetPileupSummaries -I tumor_bqsr.bam -V af-only-gnomad.hg38.vcf.gz -L file.bed -O tumor_pileups.table
    java -jar gatk-package-4.5.0.0-local.jar GetPileupSummaries -I tumor_bqsr.bam -V af-only-gnomad.hg38.vcf.gz -L file.bed -O normal_pileups.table

    CalculateContamination:

    java -jar gatk-package-4.5.0.0-local.jar CalculateContamination -I tumor_pileups.table -matched normal_pileups.table -O contamination.table

    2. I have also tried the following, and I still get the same error:

    KernelSegmenter - No changepoint candidates were found.  The specified window sizes may be inappropriate, or there may be insufficient data points.
    13:40:48.284 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    13:40:48.418 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    13:40:48.547 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    13:40:48.600 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    13:40:48.602 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (78) to segment; using all data points to calculate kernel matrix.


    Commands for attempt 2:

    GetPileupSummaries:

    java -jar gatk-package-4.5.0.0-local.jar GetPileupSummaries -I tumor_bqsr.bam -V somatic-hg38_small_exac_common_3.hg38.vcf.gz -L somatic-hg38_small_exac_common_3.hg38.vcf.gz -O tumor_pileups.table
    java -jar gatk-package-4.5.0.0-local.jar GetPileupSummaries -I tumor_bqsr.bam -V somatic-hg38_small_exac_common_3.hg38.vcf.gz -L somatic-hg38_small_exac_common_3.hg38.vcf.gz -O normal_pileups.table

    CalculateContamination:

    java -jar gatk-package-4.5.0.0-local.jar CalculateContamination -I tumor_pileups.table -matched normal_pileups.table -O contamination.table

    3. I have also tried -L and -V with the same VCF file that was used to generate the recal table. While running the following step, I get a Java heap error:

    Error: GetPileupSummaries - Shutting down engine
    [20 August 2024 at 4:22:50 pm IST] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 21.42 minutes.
    Runtime.totalMemory()=377487360
    java.lang.OutOfMemoryError: Java heap space

    Commands for attempt 3:

    GetPileupSummaries:

    java -jar gatk-package-4.5.0.0-local.jar GetPileupSummaries -I tumor_bqsr.bam -V af-only-gnomad.hg38.vcf.gz -L af-only-gnomad.hg38.vcf.gz -O tumor_pileups.table
    java -jar gatk-package-4.5.0.0-local.jar GetPileupSummaries -I tumor_bqsr.bam -V af-only-gnomad.hg38.vcf.gz -L af-only-gnomad.hg38.vcf.gz -O normal_pileups.table

    CalculateContamination:

    java -jar gatk-package-4.5.0.0-local.jar CalculateContamination -I tumor_pileups.table -matched normal_pileups.table -O contamination.table

    Can anyone suggest where I am making a mistake?

  • Gökalp Çelik

    Hi Anitha R

    You have a tail command in the first step of making the bed file. Can you check how many lines are present in your bed file? I am assuming it may not be enough to call a good number of pileups for calculating contamination and segmentation. The CalculateContamination tool performs a basic copy-number prediction for the tumor sample to provide a proper prior for germline vs. contamination vs. somatic mutation filtering.
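    As a side note on that bed-file step: a fixed `tail -n +12` keeps the wrong lines whenever the VCF header is not exactly eleven lines long. Below is a self-contained sketch of a more robust conversion on a toy VCF, skipping every `#` header line instead of a fixed count (the extra annotation columns from the original command are omitted here):

```shell
# Toy gzipped VCF standing in for the real resource file.
{
    printf '##fileformat=VCFv4.2\n'
    printf '##contig=<ID=chr1>\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr1\t16019\t.\tA\tG\t.\t.\tAF=0.1\n'
    printf 'chr1\t17452\t.\tC\tT\t.\t.\tAF=0.2\n'
} | gzip > toy.vcf.gz

# Skip ALL header lines with grep rather than a fixed tail count, then
# emit 0-based, half-open BED intervals (start = POS - 1, end = POS).
gzip -dc toy.vcf.gz | grep -v '^#' \
    | awk 'BEGIN { FS = OFS = "\t" } { print $1, $2 - 1, $2, $3 }' > toy.bed
```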

    Also, make sure that your data covers enough of the common variants you provide for the pileups. Otherwise you will not have enough points for the segmentation. If that is the case, you may skip this step entirely.

    Additionally, it may be better to post this under a new topic, which will provide better context for your problem.

    I hope this helps.

  • Anitha R

    The bed file consists of 399,171 rows.

    chr1    16018    16019    .    
    chr1    17451    17452    .    
    chr1    56986    56987    .    
    chr1    82733    82734    .    
    chr1    98992    98993    .    

  • Gökalp Çelik

    How about your BAM files? How many genes do they cover?

