GetPileupSummaries Empty File Output
Issue Synopsis: I have been trying to run GetPileupSummaries for some time to create a pileup table for calculating contamination, so that I can then run FilterMutectCalls. I am working with WES data and followed the GATK guidelines to clean the FASTQs, convert them to BAM, and create VCFs with Mutect2 after alignment to the reference genome provided by GATK. I am running in tumor-only mode. I used --panel-of-normals dbSNP.vcf while running Mutect2 (I think this may be an issue, but I had the same problem previously without it, so I am unsure). The end goal is to run Funcotator on the filtered VCF files. The commands and output log are all below. The run appears to complete successfully (no fail codes, which I have encountered before), but the output pileup tables are empty. For the PON I am using the af-only-gnomad.hg38.vcf file from the Broad Institute online repository.
REQUIRED for all errors and issues:
a) GATK version used: 4.4.0.0
b) Exact command used:
GetPileupSummaries -I ./134/Alignments/134.L1.clean.marked_dup.bam -V ../broad.reference.genome/af-only-gnomad.hg38.vcf -L ../broad.reference.genome/af-only-gnomad.hg38.vcf -O ./134/Pileups/134.L1.pileups.table
c) Entire program log:
Using GATK jar /util/opt/anaconda/deployed-conda-envs/packages/gatk4/envs/gatk4-4.4.0.0/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar defined in environment variable GATK_LOCAL_JAR
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /util/opt/anaconda/deployed-conda-envs/packages/gatk4/envs/gatk4-4.4.0.0/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar GetPileupSummaries -I ./134/Alignments/134.L1.clean.marked_dup.bam -V ../broad.reference.genome/af-only-gnomad.hg38.vcf -L ../broad.reference.genome/af-only-gnomad.hg38.vcf -O ./134/Pileups/134.L1.pileups.table
12:43:39.346 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/util/opt/anaconda/deployed-conda-envs/packages/gatk4/envs/gatk4-4.4.0.0/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
12:43:39.374 INFO GetPileupSummaries - -----------------------------------------------------
12:43:39.376 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.4.0.0
12:43:39.376 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
12:43:39.377 INFO GetPileupSummaries - Executing as willmiklav@c1517.swan.hcc.unl.edu on Linux v4.18.0-477.21.1.el8_8.x86_64 amd64
12:43:39.377 INFO GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v17.0.7+4-jvmci-23.0-b10
12:43:39.377 INFO GetPileupSummaries - Start Date/Time: January 15, 2024 at 12:43:39 PM CST
12:43:39.377 INFO GetPileupSummaries - ---------------------------------------------------------
12:43:39.377 INFO GetPileupSummaries - ---------------------------------------------------------
12:43:39.378 INFO GetPileupSummaries - HTSJDK Version: 3.0.5
12:43:39.378 INFO GetPileupSummaries - Picard Version: 3.0.0
12:43:39.378 INFO GetPileupSummaries - Built for Spark Version: 3.3.1
12:43:39.378 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:43:39.378 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:43:39.378 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:43:39.378 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:43:39.378 INFO GetPileupSummaries - Deflater: IntelDeflater
12:43:39.379 INFO GetPileupSummaries - Inflater: IntelInflater
12:43:39.379 INFO GetPileupSummaries - GCS max retries/reopens: 20
12:43:39.379 INFO GetPileupSummaries - Requester pays: disabled
12:43:39.379 INFO GetPileupSummaries - Initializing engine
12:43:40.967 INFO FeatureManager - Using codec VCFCodec to read file file:///lustre/work/mahollin/willmiklav/WES/../broad.reference.genome/af-only-gnomad.hg38.vcf
12:43:41.007 INFO FeatureManager - Using codec VCFCodec to read file file:///lustre/work/mahollin/willmiklav/WES/../broad.reference.genome/af-only-gnomad.hg38.vcf
12:49:00.464 INFO IntervalArgumentCollection - Processing 326649654 bp from intervals
12:49:06.696 INFO GetPileupSummaries - Done initializing engine
12:49:06.713 INFO ProgressMeter - Starting traversal
12:49:06.714 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
12:53:55.965 INFO GetPileupSummaries - Shutting down engine
[January 15, 2024 at 12:53:55 PM CST] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 10.28 minutes.
Runtime.totalMemory()=21139292160
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at htsjdk.samtools.GenomicIndexUtil.regionToBins(GenomicIndexUtil.java:164)
at htsjdk.samtools.BinningIndexContent.getChunksOverlapping(BinningIndexContent.java:121)
at htsjdk.samtools.CachingBAMFileIndex.getSpanOverlapping(CachingBAMFileIndex.java:75)
at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:930)
at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:947)
at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:628)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:550)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:417)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextIterator(SamReaderQueryingIterator.java:130)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.<init>(SamReaderQueryingIterator.java:69)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:413)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.iterator(ReadsPathDataSource.java:336)
at java.base/java.lang.Iterable.spliterator(Iterable.java:101)
at org.broadinstitute.hellbender.utils.Utils.stream(Utils.java:1176)
at org.broadinstitute.hellbender.engine.GATKTool.getTransformedReadStream(GATKTool.java:384)
at org.broadinstitute.hellbender.engine.LocusWalker.getAlignmentContextIterator(LocusWalker.java:174)
at org.broadinstitute.hellbender.engine.LocusWalker.traverse(LocusWalker.java:149)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /util/opt/anaconda/deployed-conda-envs/packages/gatk4/envs/gatk4-4.4.0.0/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar defined in environment variable GATK_LOCAL_JAR
-
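It looks like the problem is your Java heap size. The log ends with:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
The tool ran out of memory right after starting traversal, before it processed any loci, which is why the pileup table is empty. It would be better to set the Java options with a larger heap size to compensate for this tool's memory needs.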
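As a rough sketch of what raising the heap could look like, the gatk wrapper accepts JVM settings through --java-options. The 32g value below is only an assumption: the log reports Runtime.totalMemory()=21139292160 (about 19.7 GiB), so the new limit would need to be higher than that and still fit within the memory actually available to the job. The `run` helper and the RUN variable are hypothetical conveniences for previewing the command and are not part of GATK.

```shell
# Assumed heap size; adjust to what the node or job allocation can provide.
HEAP="32g"

# Hypothetical helper: print the command, and execute it only when RUN=1.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Same inputs as the original command; only --java-options is added.
run gatk --java-options "-Xmx${HEAP}" GetPileupSummaries \
  -I ./134/Alignments/134.L1.clean.marked_dup.bam \
  -V ../broad.reference.genome/af-only-gnomad.hg38.vcf \
  -L ../broad.reference.genome/af-only-gnomad.hg38.vcf \
  -O ./134/Pileups/134.L1.pileups.table
```

Running the snippet as written only prints the command for inspection; setting RUN=1 in the environment executes it. If the job still runs out of memory at a larger heap, the heap size alone may not be the whole story.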
Regards.
-
Thank you very much- can't believe I missed that.