Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GetPileupSummaries Error: java.lang.OutOfMemoryError: Java heap space

Answered

11 comments

  • Genevieve Brandt (she/her)

    Hi Lynn Loy,

    Are you able to update your version of GATK? The version you are using is fairly old at this point, and there may be changes in the newer version that resolve this issue.

    Could you also paste the command you use with the -Xmx option, along with the full program log?

    Thank you,

    Genevieve

  • Lynn Loy

    Hi Genevieve,

    Thank you for your help! It took me a while to install the new version and run some tests. Now loading the VCF file seems to take very long (I reserved a lot of memory, and the run showed no progress for 10 hours). I tested the program with a reduced VCF (10,000 lines) and it worked.

    How can I use GetPileupSummaries more efficiently? I suppose it would be useful to use an interval list. So far, I have always used the same VCF resource as the input for the "-L" flag. I am unsure which intervals I should define and in which format they should be.

    Thank you!

     

    GATK version:

    GATK/4.2.3.0-Java-1.8

     

    Command:

    gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' GetPileupSummaries \
    --tmp-dir ${SCRATCH_DIR}/Alignment_QC/tmp_dir2/ \
    -I ${QC_BAM} \
    -V ${GR} \
    -L ${GR} \
    -O ${PILEUPS}

     

    Log file from run that did not finish after 10 hours:

    Using GATK jar /Applic.HPC/Easybuild/skylake/2021a/software/GATK/4.2.3.0-GCCcore-10.3.0-Java-1.8/gatk-package-4.2.3.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /Applic.HPC/Easybuild/skylake/2021a/software/GATK/4.2.3.0-GCCcore-10.3.0-Java-1.8/gatk-package-4.2.3.0-local.jar GetPileupSummaries --tmp-dir /<path>/tmp_dir2/ -I /<path>/xxx.bam -V /<path>/af-only-gnomad.hg38.vcf.gz -L /<path>/af-only-gnomad.hg38.vcf.gz -O /<path>/xxx_pileups.table
    11:19:48.388 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/Applic.HPC/Easybuild/skylake/2021a/software/GATK/4.2.3.0-GCCcore-10.3.0-Java-1.8/gatk-package-4.2.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Nov 24, 2021 11:19:49 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    11:19:49.542 INFO GetPileupSummaries - ------------------------------------------------------------
    11:19:49.543 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.3.0
    11:19:49.543 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
    11:19:49.544 INFO GetPileupSummaries - Executing as xxx on Linux v3.10.0-1160.36.2.el7.x86_64 amd64
    11:19:49.544 INFO GetPileupSummaries - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_281-b09
    11:19:49.544 INFO GetPileupSummaries - Start Date/Time: November 24, 2021 11:19:47 AM GMT
    11:19:49.544 INFO GetPileupSummaries - ------------------------------------------------------------
    11:19:49.544 INFO GetPileupSummaries - ------------------------------------------------------------
    11:19:49.545 INFO GetPileupSummaries - HTSJDK Version: 2.24.1
    11:19:49.545 INFO GetPileupSummaries - Picard Version: 2.25.4
    11:19:49.545 INFO GetPileupSummaries - Built for Spark Version: 2.4.5
    11:19:49.545 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    11:19:49.545 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    11:19:49.545 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    11:19:49.545 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    11:19:49.545 INFO GetPileupSummaries - Deflater: IntelDeflater
    11:19:49.545 INFO GetPileupSummaries - Inflater: IntelInflater
    11:19:49.545 INFO GetPileupSummaries - GCS max retries/reopens: 20
    11:19:49.545 INFO GetPileupSummaries - Requester pays: disabled
    11:19:49.545 INFO GetPileupSummaries - Initializing engine
    11:19:50.128 INFO FeatureManager - Using codec VCFCodec to read file file:/<path>/af-only-gnomad.hg38.vcf.gz
    11:19:50.381 INFO FeatureManager - Using codec VCFCodec to read file file:/<path>/af-only-gnomad.hg38.vcf.gz
    11:48:31.694 WARN IntelInflater - Zero Bytes Written : 0
    11:51:25.326 INFO IntervalArgumentCollection - Processing 326649654 bp from intervals
    11:56:46.504 INFO GetPileupSummaries - Done initializing engine
    11:56:46.504 INFO ProgressMeter - Starting traversal
    11:56:46.505 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute

    slurmstepd: error: *** JOB 9959435 ON r09n25 CANCELLED AT 2021-11-24T22:19:42 DUE TO TIME LIMIT ***


  • Genevieve Brandt (she/her)

    Hi Lynn Loy,

    VCF interval files can really slow down the run if they are very large. What resource files are you using? Could you also try running with the -Xmx parameter?

    Another alternative is to run this tool chromosome by chromosome then combine the results.
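    The chromosome-by-chromosome approach can be sketched as a dry-run shell loop. This is only an illustration: the file names are placeholders, the -Xmx value is arbitrary, and merging the per-chromosome tables with GatherPileupSummaries assumes a recent GATK 4 release. Remove the leading `echo`s to actually run the commands.

```shell
# Dry-run sketch: one GetPileupSummaries invocation per chromosome.
# All file names here are placeholders.
GATHER_INPUTS=""
for CHR in chr1 chr2 chr21 chrX; do   # extend to all chromosomes
    echo gatk --java-options '-Xmx8G' GetPileupSummaries \
        -I sample.bam \
        -V af-only-gnomad.hg38.vcf.gz \
        -L "${CHR}" \
        -O "pileups.${CHR}.table"
    GATHER_INPUTS="${GATHER_INPUTS} -I pileups.${CHR}.table"
done

# GatherPileupSummaries concatenates the per-chromosome tables;
# it needs the reference sequence dictionary.
echo gatk GatherPileupSummaries \
    --sequence-dictionary ref.dict \
    ${GATHER_INPUTS} \
    -O pileups.merged.table
```

    Because each shard only traverses one chromosome's worth of sites, memory stays bounded and the shards can run in parallel on a cluster.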

    Hope this helps!

    Best,

    Genevieve

  • Lynn Loy

    Hi Genevieve,

    In the meantime, I tried to run the program with a custom interval list and also with a list from the resource bundle. I used a VCF file from the same bucket.

    Interval list:

    somatic-hg38_CNV_and_centromere_blacklist.hg38liftover.list

    VCF file:

    af-only-gnomad.hg38.vcf.gz

    Custom interval list:

    a list with the contig names of my reference genome file (the one I used for the alignment)

     

    It works with both interval lists. What do you think is the better approach?

     

    Version:

    GATK/4.2.3.0-Java-1.8

    Command:

    gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' GetPileupSummaries \
    --tmp-dir tmp/dir/ \
    -I aligned.bam \
    -V af-only-gnomad.hg38.vcf.gz \
    -L somatic-hg38_CNV_and_centromere_blacklist.hg38liftover.list \
    -O pileups.table

     

    Thank you!

    Lynn

  • Genevieve Brandt (she/her)

    Hi Lynn Loy,

    I think with GetPileupSummaries you want to use a specific VCF containing sites to serve as your intervals.

    There's a forum post with this question where our Mutect2 developer provided an answer: https://gatk.broadinstitute.org/hc/en-us/community/posts/360067310872-How-to-find-or-generate-common-germline-variant-sites-VCF-required-by-GetPileupSummaries
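    For reference, that post is about building a VCF of common germline variant sites. A minimal, self-contained sketch of filtering a VCF down to sites above an allele-frequency cutoff is shown below on a tiny inline VCF; the 0.05 cutoff and file names are assumptions, and for the real resource GATK's SelectVariants with a `-select "AF > 0.05"` expression is the more standard route.

```shell
# Toy stand-in for a large resource like af-only-gnomad.hg38.vcf.gz.
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr1\t100\t.\tA\tG\t.\t.\tAF=0.25\n'
    printf 'chr1\t200\t.\tC\tT\t.\t.\tAF=0.001\n'
    printf 'chr1\t300\t.\tG\tA\t.\t.\tAF=0.08\n'
} > toy.vcf

# Keep header lines plus records whose INFO AF exceeds the (assumed)
# 0.05 cutoff; the result is a smaller sites-only VCF for -V/-L.
awk -F'\t' '/^#/ { print; next }
    match($8, /AF=[0-9.eE+-]+/) {
        af = substr($8, RSTART + 3, RLENGTH - 3)
        if (af + 0 > 0.05) print
    }' toy.vcf > common_sites.vcf
```

    Remember to index the filtered VCF (e.g. with IndexFeatureFile) before passing it to GetPileupSummaries.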

    Let me know if you have any other questions!

    Best,

    Genevieve

  • Ahmad Al Alwash

    Hi,

    I'm running into the same issue here where I get the error message:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space.

    I tried increasing the -Xmx (up to 50G!), and also providing a tmp directory, but it still shows the same error. The VCF file I'm using is af-only-gnomad.hg38.vcf.gz, and the GATK version is v4.2.3.0. The gatk command I'm using is as follows:

    gatk GetPileupSummaries --java-options '-Xmx50G' --tmp-dir /scratch/user/ \
        -I file.bam \
        -V af-only-gnomad.hg38.vcf.gz \
        -L af-only-gnomad.hg38.vcf.gz \
        -O file.pileup.table

  • Anthony DiCi

    Hi Ahmad Al Alwash,

    Thank you for writing to the GATK forum! We hope that we can help you sort this out.

    I checked with our developers, and this is a recently fixed bug. Unfortunately, the updated code will not be available until the next version of GATK is released. That said, I can offer you two options to move forward:

    1. Wait for the latest version to be released.
    2. Use the nightly build published in our Docker repository.

    The most up-to-date code and any unreleased updates/bug fixes planned for future versions are published on Docker every night.

    I hope this helps! Please feel free to reach out with any other questions at any time.

    Best,

    Anthony

  • Anitha R

    While doing all these steps, I am getting this error after running CalculateContamination:

    1. Error:  
    WARN  KernelSegmenter - No changepoint candidates were found.  The specified window sizes may be inappropriate, or there may be insufficient data points.
    17:17:17.611 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (12) to segment; using all data points to calculate kernel matrix.
    17:17:17.612 WARN  KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (12).  Local changepoint costs will not be calculated for this window size.

    Commands for attempt 1:

    bed file creation:

    zcat file.vcfz | tail -n +12 | awk '{FS="\t";OFS="\t";print $1,$2-1,$2,$3, etc}' > file.bed

    GetPileupSummaries:

    java -jar gatk-package-4.5.0.0-local.jar GetPileupSummaries -I tumor_bqsr.bam -V af-only-gnomad.hg38.vcf.gz -L file.bed -O tumor_pileups.table
    java -jar gatk-package-4.5.0.0-local.jar GetPileupSummaries -I tumor_bqsr.bam -V af-only-gnomad.hg38.vcf.gz -L file.bed -O normal_pileups.table

    CalculateContamination:

    java -jar gatk-package-4.5.0.0-local.jar CalculateContamination -I tumor_pileups.table -matched normal_pileups.table -O contamination.table

    2. I have also tried the following, and I still get the same error:

    KernelSegmenter - No changepoint candidates were found.  The specified window sizes may be inappropriate, or there may be insufficient data points.
    13:40:48.284 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    13:40:48.418 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    13:40:48.547 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    13:40:48.600 INFO  KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
    13:40:48.602 WARN  KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (78) to segment; using all data points to calculate kernel matrix.


    Commands for attempt 2:

    GetPileupSummaries:

    java -jar gatk-package-4.5.0.0-local.jar GetPileupSummaries -I tumor_bqsr.bam -V somatic-hg38_small_exac_common_3.hg38.vcf.gz -L somatic-hg38_small_exac_common_3.hg38.vcf.gz -O tumor_pileups.table
    java -jar gatk-package-4.5.0.0-local.jar GetPileupSummaries -I tumor_bqsr.bam -V somatic-hg38_small_exac_common_3.hg38.vcf.gz -L somatic-hg38_small_exac_common_3.hg38.vcf.gz -O normal_pileups.table

    CalculateContamination:

    java -jar gatk-package-4.5.0.0-local.jar CalculateContamination -I tumor_pileups.table -matched normal_pileups.table -O contamination.table

    3. I have also tried -L and -V with the same VCF file that was used to generate the recal table. While running the following step, I get a Java heap error:

    Error: GetPileupSummaries - Shutting down engine
    [20 August 2024 at 4:22:50 pm IST] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 21.42 minutes.
    Runtime.totalMemory()=377487360
    java.lang.OutOfMemoryError: Java heap space

    Commands for attempt 3:

    GetPileupSummaries:

    java -jar gatk-package-4.5.0.0-local.jar GetPileupSummaries -I tumor_bqsr.bam -V af-only-gnomad.hg38.vcf.gz -L af-only-gnomad.hg38.vcf.gz -O tumor_pileups.table
    java -jar gatk-package-4.5.0.0-local.jar GetPileupSummaries -I tumor_bqsr.bam -V af-only-gnomad.hg38.vcf.gz -L af-only-gnomad.hg38.vcf.gz -O normal_pileups.table

    CalculateContamination:

    java -jar gatk-package-4.5.0.0-local.jar CalculateContamination -I tumor_pileups.table -matched normal_pileups.table -O contamination.table

    Can anyone suggest where I am making a mistake?

  • Gökalp Çelik

    Hi Anitha R

    You have a tail command in the first step of making the bed file. Can you check how many lines are present in your bed file? I am assuming it may not be enough to call a good number of pileups for calculating contamination and segmentation. The CalculateContamination tool performs a basic copy-number prediction for the tumor sample to provide a proper prior for germline vs. contamination vs. somatic mutation filtering.
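    As a side note on that bed-file step: a fixed `tail -n +12` keeps the wrong lines whenever the VCF header is not exactly eleven lines long. Below is a self-contained sketch of a more robust conversion on a toy VCF, skipping every `#` header line instead of a fixed count (the extra annotation columns from the original command are omitted here):

```shell
# Toy gzipped VCF standing in for the real resource file.
{
    printf '##fileformat=VCFv4.2\n'
    printf '##contig=<ID=chr1>\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr1\t16019\t.\tA\tG\t.\t.\tAF=0.1\n'
    printf 'chr1\t17452\t.\tC\tT\t.\t.\tAF=0.2\n'
} | gzip > toy.vcf.gz

# Skip ALL header lines with grep rather than a fixed tail count, then
# emit 0-based, half-open BED intervals (start = POS - 1, end = POS).
gzip -dc toy.vcf.gz | grep -v '^#' \
    | awk 'BEGIN { FS = OFS = "\t" } { print $1, $2 - 1, $2, $3 }' > toy.bed
```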

    Also, make sure that your data covers enough of the common variants you provide for the pileups. Otherwise you will not have enough points for the segmentation. If that is the case, you may skip this step entirely.

    Additionally, it may be better to post this under a new topic, which will provide better context for your problem.

    I hope this helps.

  • Anitha R

    The bed file consists of 399,171 rows.

    chr1    16018    16019    .    
    chr1    17451    17452    .    
    chr1    56986    56987    .    
    chr1    82733    82734    .    
    chr1    98992    98993    .    

  • Gökalp Çelik

    How about your BAM files? How many genes do they cover?

