java.lang.OutOfMemoryError: GC overhead limit exceeded
If you are seeing an error, please provide (REQUIRED):
a) GATK version used: 4.1.9
b) Exact command used:
Master run file:
bash ../src/Mutect_run.sh "$patient"_exT_rmdup.bam "$patient"_exH_rmdup.bam mutecto -Xmx75g --active-probability-threshold .0015 &
Mutect script:
$gatk GetPileupSummaries -I ../$1 -V /users/aas4579/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf -L /users/aas4579/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf -O tumorgetpileupsummaries.table
c) Entire error log:
10:20:57.791 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/users/aas4579/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 28, 2021 10:20:57 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
10:20:57.970 INFO GetPileupSummaries - ------------------------------------------------------------
10:20:57.970 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.1.9.0
10:20:57.970 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
10:20:57.971 INFO GetPileupSummaries - Executing as jp57634@genome.bme.utexas.edu on Linux v2.6.32-642.6.1.el6.x86_64 amd64
10:20:57.971 INFO GetPileupSummaries - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_131-b11
10:20:57.971 INFO GetPileupSummaries - Start Date/Time: April 28, 2021 10:20:57 AM CDT
10:20:57.971 INFO GetPileupSummaries - ------------------------------------------------------------
10:20:57.971 INFO GetPileupSummaries - ------------------------------------------------------------
10:20:57.972 INFO GetPileupSummaries - HTSJDK Version: 2.23.0
10:20:57.972 INFO GetPileupSummaries - Picard Version: 2.23.3
10:20:57.972 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:20:57.972 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:20:57.972 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:20:57.972 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:20:57.972 INFO GetPileupSummaries - Deflater: IntelDeflater
10:20:57.972 INFO GetPileupSummaries - Inflater: IntelInflater
10:20:57.972 INFO GetPileupSummaries - GCS max retries/reopens: 20
10:20:57.972 INFO GetPileupSummaries - Requester pays: disabled
10:20:57.972 INFO GetPileupSummaries - Initializing engine
10:20:58.447 INFO FeatureManager - Using codec VCFCodec to read file file:///users/aas4579/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf
10:21:01.364 INFO FeatureManager - Using codec VCFCodec to read file file:///users/aas4579/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf
11:01:38.324 INFO IntervalArgumentCollection - Processing 326649654 bp from intervals
11:04:23.835 INFO GetPileupSummaries - Done initializing engine
11:04:23.836 INFO ProgressMeter - Starting traversal
11:04:23.836 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
21:03:40.149 INFO GetPileupSummaries - Shutting down engine
[April 28, 2021 9:03:40 PM CDT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 642.71 minutes.
Runtime.totalMemory()=23432003584
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3181)
at java.util.ArrayList.grow(ArrayList.java:261)
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:227)
at java.util.ArrayList.add(ArrayList.java:458)
at htsjdk.samtools.BinningIndexContent.getChunksOverlapping(BinningIndexContent.java:131)
at htsjdk.samtools.CachingBAMFileIndex.getSpanOverlapping(CachingBAMFileIndex.java:75)
at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:935)
at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:952)
at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:533)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:405)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextIterator(SamReaderQueryingIterator.java:125)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.<init>(SamReaderQueryingIterator.java:66)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:407)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.iterator(ReadsPathDataSource.java:331)
at java.lang.Iterable.spliterator(Iterable.java:101)
at org.broadinstitute.hellbender.utils.Utils.stream(Utils.java:1099)
at org.broadinstitute.hellbender.engine.GATKTool.getTransformedReadStream(GATKTool.java:380)
at org.broadinstitute.hellbender.engine.LocusWalker.getAlignmentContextIterator(LocusWalker.java:182)
at org.broadinstitute.hellbender.engine.LocusWalker.traverse(LocusWalker.java:157)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
If not an error, choose a category for your question (REQUIRED):
a) How do I fix this? I saw a similar post, https://gatk.broadinstitute.org/hc/en-us/community/posts/360072627111-HaplotypeCaller-Exception-in-thread-main-java-lang-OutOfMemoryError-Java-Heap-Size-Allele-Specific-Annotation, but I tried their suggestion and it did not fix the memory problem.
-
Hi Jack Prazich,
To help with runtime or memory usage, try the following:
- Check memory/disk space availability on your end.
- Specify Java memory usage using the java option -Xmx.
- Specify a --tmp-dir that has room for all necessary temporary files.
- Verify that this issue persists with the latest version of GATK.
- Check the depth of coverage of your sample at the area of interest.
-
You have to specify the heap size whenever you run your program. If you are executing on the command line, include a parameter such as "-Xms4g -Xmx4g" (or whatever you want your heap size to be) whenever you execute with "java". The -Xmx flag specifies the maximum memory allocation pool for the Java Virtual Machine (JVM), while -Xms specifies the initial memory allocation pool. The -Xms flag has no default value, and -Xmx typically has a default value of 256 MB. A common use for these flags is when you encounter a java.lang.OutOfMemoryError. If you are dealing with large data sets, it's always better to initialize your Java collections (e.g., Java ArrayList) with the correct size.
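The advice above can be sketched as two equivalent command forms. This is a hedged illustration only: the jar name, tool name, and 4g size are placeholders, not values from this thread.

```shell
heap="-Xms4g -Xmx4g"                       # initial and maximum JVM heap, set together

# Plain java invocation: heap flags go before -jar
plain_cmd="java $heap -jar gatk-package-local.jar GetPileupSummaries"

# GATK4 launcher: the same flags ride inside --java-options
wrapped_cmd="gatk --java-options '$heap' GetPileupSummaries"

echo "$plain_cmd"
echo "$wrapped_cmd"
```

Setting -Xms equal to -Xmx avoids the JVM starting small and repeatedly growing the heap under pressure.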
-
Thank you for the insight marktoddy!
-
Hello Genevieve Brandt (she/her) and marktoddy, thank you for the comments and help. I'm still having trouble with this, unfortunately. I'm running the program in a folder with 180 GB of free space, so I don't think it's a disk space problem. Both last time I ran it and in my original post, I specified a heap size of 75 GB ("-Xmx75g"); is that the correct syntax? Should I lower it to 4 GB? I'm running this on the most recent version of GATK.
In this next attempt I will specify a temporary directory. But I was wondering whether my memory error could be due to my using Conda to keep my GATK packages installed and updated, since I'm running this command from within that gatk conda environment? https://gatk.broadinstitute.org/hc/en-us/articles/360035889851--How-to-Install-and-use-Conda-for-GATK4
-
Ok, interesting. As a way to test the gatk conda installation, the article says to do conda list, and my environment correctly showed gatkpythonpackages as one of the packages installed, so I thought it was fine earlier. But when I do 'which gatkpythonpackages' I get this:
(gatk) [jp57634@genome src]$ which gatkpythonpackages
/usr/bin/which: no gatkpythonpackages in (/work/jjlab/jp57634/miniconda3/envs/gatk/bin:/work/jjlab/jp57634/miniconda3/condabin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jp57634/Tools/samtools/bin:/home/jp57634/Tools/bwa-0.7.17:/home/jp57634/Tools/groovy-2.4.7/bin:/work/jjlab/jp57634/tools/seqtk:/users/aas4579/tools/FastQC:/home/jp57634/bin)
-
Yeah, it looks like there might be a problem with the installation, but that doesn't look like what would cause the out of memory errors. I would definitely try the tmp directory to make sure that isn't causing the issue.
-
So I tried specifying a tmp directory, and I got the same out-of-memory error after four hours; my temporary directory is empty. Did I specify the temporary directory correctly? Here is my command:
bash ../src/Mutect_run.sh "$patient"_exT_rmdup.bam "$patient"_exH_rmdup.bam "$patient" "-Xmx75g" --active-probability-threshold .0015 --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir >log4 2>&1 &
-
Jack Prazich Could you share your Mutect2 command line? I don't know what is contained in the bash script.
-
Hi Genevieve Brandt (she/her), thank you so much for your patience with this. Yeah so I tried to simplify it by pulling the Mutect part out of my master run script and running it separately.
Bash command:
bash Mutect_run.sh UTMPi5039testpatient_exT_rmdup.bam UTMPi5039testpatient_exH_rmdup.bam "-Xmx75g" --active-probability-threshold .0015 --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir >Contamlog2 2>&1
Mutect_run.sh script:
#Estimate contamination of bam files
$gatk GetPileupSummaries -I $1 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf -O tumorgetpileupsummaries.table
$gatk GetPileupSummaries -I $2 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf -O healthygetpileupsummaries.table
It just stopped running after an hour? Without an out-of-memory error? And no output files? So weird. I'm running it again now.
But here's the thing: I decided to take out all references to contamination tables and ran FilterMutectCalls on my Mutect2 output, and it ran fine and really quickly. How important is it to include the contamination tables in FilterMutectCalls?
A couple other questions:
Do I need to put --java-options before I specify -Xmx for it to work correctly?
I noticed on the GetPileupSummaries manual page that the variant and intervals files (-V and -L) are typically zipped files (file.vcf.gz). I have been using plain .vcf files. Could that lead to my memory problem? I tried zipping my file into a .vcf.gz and got this error:
A USER ERROR has occurred: An index is required but was not found for file /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz. Support for unindexed block-compressed files has been temporarily disabled. Try running IndexFeatureFile on the input.
-
Okay, I'm going to try to address all your main points:
- Did Mutect2 finish once you ran it separately from GetPileupSummaries?
- Could you share the GetPileupSummaries stack trace? If there are no output files there will be something in the stack trace to figure out what went wrong.
- The contamination tables are pretty important so let's see if we can get it to work!
- Here is an article on gatk command line syntax: https://gatk.broadinstitute.org/hc/en-us/articles/360035531892-GATK4-command-line-syntax
- Zipped files might help your memory problem. Create an index for somatic-hg38_af-only-gnomad.hg38.vcf.gz with IndexFeatureFile and it should run just fine!
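On the --java-options question above: per GATK4 syntax, the -Xmx flag goes inside --java-options, placed between "gatk" and the tool name, not after the tool arguments. A hedged sketch (the file names are placeholders, not from the thread):

```shell
# Wrong: placed after the tool arguments, --java-options is handed to the
# tool itself, which does not recognize it:
#   gatk GetPileupSummaries -I in.bam -O out.table --java-options "-Xmx75g"

# Right: launcher options come between "gatk" and the tool name:
cmd='gatk --java-options "-Xmx75g" GetPileupSummaries -I in.bam -O out.table'
echo "$cmd"
```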
-
1) Yes, Mutect2 ran fine and gave me mutect_vars.bai mutect_vars.bam mutect_vars.vcf.gz mutect_vars.vcf.gz.stats mutect_vars.vcf.gz.tbi output files.
2) GetPileupSummaries stack trace when I call Mutect_run.sh from master file:
08:59:06.420 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/jp57634/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jun 30, 2021 8:59:06 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
08:59:06.594 INFO GetPileupSummaries - ------------------------------------------------------------
08:59:06.595 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.0.0
08:59:06.595 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
08:59:06.595 INFO GetPileupSummaries - Executing as jp57634@genome.bme.utexas.edu on Linux v2.6.32-642.6.1.el6.x86_64 amd64
08:59:06.595 INFO GetPileupSummaries - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_131-b11
08:59:06.595 INFO GetPileupSummaries - Start Date/Time: June 30, 2021 8:59:06 AM CDT
08:59:06.595 INFO GetPileupSummaries - ------------------------------------------------------------
08:59:06.596 INFO GetPileupSummaries - ------------------------------------------------------------
08:59:06.596 INFO GetPileupSummaries - HTSJDK Version: 2.24.0
08:59:06.596 INFO GetPileupSummaries - Picard Version: 2.25.0
08:59:06.596 INFO GetPileupSummaries - Built for Spark Version: 2.4.5
08:59:06.596 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:59:06.596 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:59:06.596 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:59:06.596 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:59:06.597 INFO GetPileupSummaries - Deflater: IntelDeflater
08:59:06.597 INFO GetPileupSummaries - Inflater: IntelInflater
08:59:06.597 INFO GetPileupSummaries - GCS max retries/reopens: 20
08:59:06.597 INFO GetPileupSummaries - Requester pays: disabled
08:59:06.597 INFO GetPileupSummaries - Initializing engine
08:59:07.063 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf
08:59:09.955 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf
09:39:54.954 INFO IntervalArgumentCollection - Processing 326649654 bp from intervals
09:42:07.480 INFO GetPileupSummaries - Done initializing engine
09:42:07.481 INFO ProgressMeter - Starting traversal
09:42:07.481 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
13:05:31.025 INFO GetPileupSummaries - Shutting down engine
[June 30, 2021 1:05:31 PM CDT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 246.41 minutes.
Runtime.totalMemory()=31269060608
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.BitSet.initWords(BitSet.java:166)
at java.util.BitSet.<init>(BitSet.java:161)
at htsjdk.samtools.GenomicIndexUtil.regionToBins(GenomicIndexUtil.java:164)
at htsjdk.samtools.BinningIndexContent.getChunksOverlapping(BinningIndexContent.java:121)
at htsjdk.samtools.CachingBAMFileIndex.getSpanOverlapping(CachingBAMFileIndex.java:75)
at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:914)
at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:931)
at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:550)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:417)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextIterator(SamReaderQueryingIterator.java:130)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.<init>(SamReaderQueryingIterator.java:69)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:412)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.iterator(ReadsPathDataSource.java:336)
at java.lang.Iterable.spliterator(Iterable.java:101)
at org.broadinstitute.hellbender.utils.Utils.stream(Utils.java:1176)
at org.broadinstitute.hellbender.engine.GATKTool.getTransformedReadStream(GATKTool.java:378)
at org.broadinstitute.hellbender.engine.LocusWalker.getAlignmentContextIterator(LocusWalker.java:182)
at org.broadinstitute.hellbender.engine.LocusWalker.traverse(LocusWalker.java:157)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1058)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
GetPileupSummaries stack trace when I run Mutect_run.sh separately (no error or stack trace, it just stops running?):
15:42:02.856 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/jp57634/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jun 29, 2021 3:42:03 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
15:42:03.023 INFO GetPileupSummaries - ------------------------------------------------------------
15:42:03.023 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.0.0
15:42:03.023 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
15:42:03.023 INFO GetPileupSummaries - Executing as jp57634@genome.bme.utexas.edu on Linux v2.6.32-642.6.1.el6.x86_64 amd64
15:42:03.023 INFO GetPileupSummaries - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_131-b11
15:42:03.024 INFO GetPileupSummaries - Start Date/Time: June 29, 2021 3:42:02 PM CDT
15:42:03.024 INFO GetPileupSummaries - ------------------------------------------------------------
15:42:03.024 INFO GetPileupSummaries - ------------------------------------------------------------
15:42:03.024 INFO GetPileupSummaries - HTSJDK Version: 2.24.0
15:42:03.024 INFO GetPileupSummaries - Picard Version: 2.25.0
15:42:03.024 INFO GetPileupSummaries - Built for Spark Version: 2.4.5
15:42:03.025 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:42:03.025 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:42:03.025 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:42:03.025 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:42:03.025 INFO GetPileupSummaries - Deflater: IntelDeflater
15:42:03.025 INFO GetPileupSummaries - Inflater: IntelInflater
15:42:03.025 INFO GetPileupSummaries - GCS max retries/reopens: 20
15:42:03.025 INFO GetPileupSummaries - Requester pays: disabled
15:42:03.025 INFO GetPileupSummaries - Initializing engine
15:42:05.254 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf
15:42:07.012 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf
16:22:23.217 INFO IntervalArgumentCollection - Processing 326649654 bp from intervals
16:24:31.584 INFO GetPileupSummaries - Done initializing engine
16:24:31.584 INFO ProgressMeter - Starting traversal
16:24:31.585 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
3) I tried your suggestion to create an index for somatic-hg38_af-only-gnomad.hg38.vcf.gz with IndexFeatureFile and got this error:
A USER ERROR has occurred: Error while trying to create index for /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz. Error was: htsjdk.tribble.TribbleException.MalformedFeatureFile: Input file is not in valid block compressed format., for input source: /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException$CouldNotIndexFile: Error while trying to create index for /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz. Error was: htsjdk.tribble.TribbleException.MalformedFeatureFile: Input file is not in valid block compressed format., for input source: /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
at org.broadinstitute.hellbender.tools.IndexFeatureFile.createAppropriateIndexInMemory(IndexFeatureFile.java:123)
at org.broadinstitute.hellbender.tools.IndexFeatureFile.doWork(IndexFeatureFile.java:75)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Input file is not in valid block compressed format., for input source: /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
at htsjdk.tribble.index.IndexFactory$FeatureIterator.initIndexableBlockCompressedStream(IndexFactory.java:628)
at htsjdk.tribble.index.IndexFactory$FeatureIterator.<init>(IndexFactory.java:599)
at htsjdk.tribble.index.IndexFactory.createTabixIndex(IndexFactory.java:476)
at htsjdk.tribble.index.IndexFactory.createTabixIndex(IndexFactory.java:502)
at htsjdk.tribble.index.IndexFactory.createIndex(IndexFactory.java:403)
at org.broadinstitute.hellbender.tools.IndexFeatureFile.createAppropriateIndexInMemory(IndexFeatureFile.java:109)
-
Thank you!
- Could you share your GetPileupSummaries gatk command line? I can't see what is going on in your bash scripts.
- If the command ends unexpectedly, your machine could be shutting it down for taking too much memory.
- It looks like there was an issue when you zipped the somatic-hg38_af-only-gnomad.hg38.vcf file and it became malformed. You can try zipping it again.
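The "not in valid block compressed format" message is typically what you see when a file was compressed with plain gzip rather than bgzip: htsjdk expects BGZF, the block-gzip variant produced by bgzip from htslib/samtools. A hedged way to check a file, using the "BC" extra-field signature that BGZF headers carry:

```shell
# Heuristic check: BGZF files start with the gzip magic (1f 8b 08) with the
# FEXTRA flag set (04) and carry an extra subfield whose ID is "BC"
# (hex 42 43). Plain gzip output lacks that subfield.
is_bgzf() {
    head -c 16 "$1" | od -An -tx1 | tr -d ' \n' | grep -q '^1f8b0804.*4243'
}

# Typical repair (commented out; requires bgzip from htslib and gatk):
#   gunzip somatic-hg38_af-only-gnomad.hg38.vcf.gz   # undo the plain-gzip pass
#   bgzip  somatic-hg38_af-only-gnomad.hg38.vcf      # rewrite as BGZF
#   gatk IndexFeatureFile -I somatic-hg38_af-only-gnomad.hg38.vcf.gz
```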
-
My bad.
Command line:
bash Mutect_run.sh UTMPi5039testpatient_exT_rmdup.bam UTMPi5039testpatient_exH_rmdup.bam --java-options "-Xmx75g" --active-probability-threshold .0015 --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir 2>&1 &
Mutect_run.sh script:
gatk=/home/jp57634/Tools/gatk-4.2.0.0/gatk
source /work/jjlab/jp57634/miniconda3/etc/profile.d/conda.sh
conda activate gatk
$gatk GetPileupSummaries -I $1 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf -O tumorgetpileupsummaries.table
conda deactivate
Ok I will try zipping it again and retry.
-
Try using the Xmx options for your GetPileupSummaries command and specify a temporary directory in the command itself.
-
Ok, so I've put -Xmx in the GetPileupSummaries command while keeping the --tmp-dir in the command line, and I've also tried to reduce memory by working with the zipped .vcf file and using IndexFeatureFile as you suggested. IndexFeatureFile outputs a .vcf.gz.tbi file, which I reference in the GetPileupSummaries command, and that is leading to an error.
Command line:
bash Mutect_run.sh UTMPi5039testpatient_exT_rmdup.bam UTMPi5039testpatient_exH_rmdup.bam --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir >Contamlog5 2>&1 &
Mutect_run.sh script:
$gatk IndexFeatureFile -I /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'
$gatk GetPileupSummaries -I $1 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz.tbi -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz.tbi -O tumorgetpileupsummaries.table --java-options "-Xmx75g"
Error:
A USER ERROR has occurred: Cannot read file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz.tbi because no suitable codecs found
-
Jack Prazich the .tbi index file should not be specified in the command, it just needs to exist in the same directory as the vcf file. For your -V and -L inputs, those should be the .vcf.gz files. Your error message is occurring because the tool is looking for a VCF file and it is not in that format.
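Putting that together, a hedged sketch of the corrected script line, in dry-run form (tumor.bam stands in for the script's $1; the command is stored in the positional parameters and only echoed):

```shell
ref_vcf=/home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
# Expected alongside it on disk (found automatically, never passed to -V/-L):
#   somatic-hg38_af-only-gnomad.hg38.vcf.gz.tbi
set -- gatk --java-options "-Xmx75g" GetPileupSummaries \
    -I tumor.bam \
    -V "$ref_vcf" \
    -L "$ref_vcf" \
    -O tumorgetpileupsummaries.table
echo "dry run: $*"      # replace the echo with "$@" to execute for real
```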
-
Genevieve Brandt (she/her) I see.
So first I ran the Mutect script separately and again it stopped running after an hour:
13:13:55.757 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/jp57634/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 06, 2021 1:13:55 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
13:13:55.942 INFO GetPileupSummaries - ------------------------------------------------------------
13:13:55.942 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.0.0
13:13:55.942 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
13:13:55.942 INFO GetPileupSummaries - Executing as jp57634@genome.bme.utexas.edu on Linux v2.6.32-642.6.1.el6.x86_64 amd64
13:13:55.942 INFO GetPileupSummaries - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_131-b11
13:13:55.943 INFO GetPileupSummaries - Start Date/Time: July 6, 2021 1:13:55 PM CDT
13:13:55.943 INFO GetPileupSummaries - ------------------------------------------------------------
13:13:55.943 INFO GetPileupSummaries - ------------------------------------------------------------
13:13:55.943 INFO GetPileupSummaries - HTSJDK Version: 2.24.0
13:13:55.944 INFO GetPileupSummaries - Picard Version: 2.25.0
13:13:55.944 INFO GetPileupSummaries - Built for Spark Version: 2.4.5
13:13:55.944 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:13:55.944 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:13:55.944 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:13:55.944 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:13:55.944 INFO GetPileupSummaries - Deflater: IntelDeflater
13:13:55.944 INFO GetPileupSummaries - Inflater: IntelInflater
13:13:55.944 INFO GetPileupSummaries - GCS max retries/reopens: 20
13:13:55.944 INFO GetPileupSummaries - Requester pays: disabled
13:13:55.944 INFO GetPileupSummaries - Initializing engine
13:13:56.434 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
13:13:56.654 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
13:49:35.341 INFO IntervalArgumentCollection - Processing 326649654 bp from intervals
13:49:59.408 INFO GetPileupSummaries - Done initializing engine
13:49:59.409 INFO ProgressMeter - Starting traversal
13:49:59.409 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
Command line:
bash Mutect_run.sh UTMPi5039testpatient_exT_rmdup.bam UTMPi5039testpatient_exH_rmdup.bam --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir >Contamlog5 2>&1 &
Mutect script:
$gatk GetPileupSummaries -I $1 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -O tumorgetpileupsummaries.table --java-options "-Xmx75g"
Then I tried to run it from my master script. It ran for 8 hours, when previously it had been running for ~4 hours, and gave a slightly different memory error message:
19:56:06.739 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/jp57634/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 06, 2021 7:56:06 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
19:56:06.911 INFO GetPileupSummaries - ------------------------------------------------------------
19:56:06.912 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.0.0
19:56:06.912 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
19:56:06.912 INFO GetPileupSummaries - Executing as jp57634@genome.bme.utexas.edu on Linux v2.6.32-642.6.1.el6.x86_64 amd64
19:56:06.912 INFO GetPileupSummaries - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_131-b11
19:56:06.912 INFO GetPileupSummaries - Start Date/Time: July 6, 2021 7:56:06 PM CDT
19:56:06.912 INFO GetPileupSummaries - ------------------------------------------------------------
19:56:06.912 INFO GetPileupSummaries - ------------------------------------------------------------
19:56:06.913 INFO GetPileupSummaries - HTSJDK Version: 2.24.0
19:56:06.913 INFO GetPileupSummaries - Picard Version: 2.25.0
19:56:06.913 INFO GetPileupSummaries - Built for Spark Version: 2.4.5
19:56:06.913 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
19:56:06.913 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
19:56:06.913 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
19:56:06.913 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
19:56:06.914 INFO GetPileupSummaries - Deflater: IntelDeflater
19:56:06.914 INFO GetPileupSummaries - Inflater: IntelInflater
19:56:06.914 INFO GetPileupSummaries - GCS max retries/reopens: 20
19:56:06.914 INFO GetPileupSummaries - Requester pays: disabled
19:56:06.914 INFO GetPileupSummaries - Initializing engine
19:56:07.379 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
19:56:07.575 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
20:32:48.701 INFO IntervalArgumentCollection - Processing 326649654 bp from intervals
20:33:19.620 INFO GetPileupSummaries - Done initializing engine
20:33:19.621 INFO ProgressMeter - Starting traversal
20:33:19.621 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
03:40:48.257 INFO GetPileupSummaries - Shutting down engine
org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 464.69 minutes.
Runtime.totalMemory()=78694055936
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3181)
at java.util.ArrayList.grow(ArrayList.java:261)
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:227)
at java.util.ArrayList.addAll(ArrayList.java:579)
at htsjdk.samtools.BAMFileSpan.merge(BAMFileSpan.java:307)
at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:919)
at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:931)
at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:550)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:417)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextIterator(SamReaderQueryingIterator.java:130)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.<init>(SamReaderQueryingIterator.java:69)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:412)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.iterator(ReadsPathDataSource.java:336)
at java.lang.Iterable.spliterator(Iterable.java:101)
at org.broadinstitute.hellbender.utils.Utils.stream(Utils.java:1176)
at org.broadinstitute.hellbender.engine.GATKTool.getTransformedReadStream(GATKTool.java:378)
at org.broadinstitute.hellbender.engine.LocusWalker.getAlignmentContextIterator(LocusWalker.java:182)
at org.broadinstitute.hellbender.engine.LocusWalker.traverse(LocusWalker.java:157)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1058)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Master run file:
bash ../src/Mutect_run.sh "$patient"_exT_rmdup.bam "$patient"_exH_rmdup.bam "$patient" --java-options "-Xmx75g" --active-probability-threshold .0015 --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir >Contamlog6 2>&1 &
Mutect_run.sh script:
$gatk GetPileupSummaries -I $1 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -O tumorgetpileupsummaries.table --java-options "-Xmx75g"
I also got a tmp_read_resource_8814225368090996308.config file written to my temporary directory from one of the runs. I'm not sure which because I didn't check it until both had run.
-
Jack Prazich These still look like memory issues.
I noticed that your GetPileupSummaries commands are not following the GATK command-line syntax: the Java options should come immediately after the gatk wrapper script, before the tool name. You also do not specify a temporary directory in that command. I really think you need to include a temporary directory in the GetPileupSummaries command itself.
Here is the command line syntax article with examples: https://gatk.broadinstitute.org/hc/en-us/articles/360035531892-GATK4-command-line-syntax
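For reference, here is a minimal example of that syntax: --java-options goes right after the gatk wrapper, before the tool name. All file paths below are placeholders, not your actual files.

```shell
# GATK4 syntax: java options immediately after the wrapper, then the tool name,
# then the tool arguments. Paths are placeholders.
gatk --java-options "-Xmx8g" GetPileupSummaries \
    -I tumor.bam \
    -V gnomad.vcf.gz \
    -L gnomad.vcf.gz \
    -O pileups.table \
    --tmp-dir /path/to/tmpdir
```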
-
Ahh I see when you said "command itself" you meant the gatk command. I thought you meant the command line.
So I changed my command: I put the --tmp-dir in the Mutect run file and moved --java-options to the front to match the command-line syntax article you sent:
Command line:
bash Mutect_run.sh UTMPi5039testpatient_exT_rmdup.bam UTMPi5039testpatient_exH_rmdup.bam >Contamlog5 2>&1 &
Mutect_run.sh:
$gatk GetPileupSummaries --java-options "-Xmx5g" -I $1 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -O tumorgetpileupsummaries.table --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir2
Stack trace still getting a memory problem:
[July 8, 2021 9:35:22 AM CDT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 127.04 minutes.
Runtime.totalMemory()=5344591872
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.String.substring(String.java:1969)
at htsjdk.tribble.util.ParsingUtils.split(ParsingUtils.java:259)
at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:375)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:328)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:48)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:173)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:205)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:149)
at org.broadinstitute.hellbender.utils.IntervalUtils.featureFileToIntervals(IntervalUtils.java:359)
at org.broadinstitute.hellbender.utils.IntervalUtils.parseIntervalArguments(IntervalUtils.java:319)
at org.broadinstitute.hellbender.utils.IntervalUtils.loadIntervals(IntervalUtils.java:239)
at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.parseIntervals(IntervalArgumentCollection.java:200)
at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.getTraversalParameters(IntervalArgumentCollection.java:180)
at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.getIntervals(IntervalArgumentCollection.java:111)
at org.broadinstitute.hellbender.engine.GATKTool.initializeIntervals(GATKTool.java:514)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:709)
at org.broadinstitute.hellbender.engine.LocusWalker.onStartup(LocusWalker.java:136)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
In my temporary directory I'm now getting three .config output files:
tmp_read_resource_3935243153837023284.config tmp_read_resource_5129549662898584203.config tmp_read_resource_9116916725022102535.config
-
Jack Prazich I see! Glad there is more clarity now.
I noticed earlier you specified 75g for the -Xmx Java option, but here you are only specifying 5g. Did you try it both ways yet?
-
Genevieve Brandt (she/her) Yeah I tried it with 75 GB. It ran for 12 hours and hit the same memory error. It output six .config files to the tmpdir this time instead of three.
Mutect_run.sh Command:
$gatk GetPileupSummaries --java-options "-Xmx75g" -I $1 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -O tumorgetpileupsummaries.table --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir2
I'm basically out of ideas.
-
Most likely this tool is taking a long time because you are using an intervals file with many variants, which is slow for GATK to process. Since you have exhausted the other options, I would recommend splitting the intervals file and scattering the jobs. You can split the intervals file in many different ways, but one option is by chromosome. After running multiple instances of GetPileupSummaries, you can gather the output tables with the GATK tool GatherPileupSummaries and proceed to your next step.
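As a sketch of that scatter/gather approach (the paths, variable names, and dictionary file here are assumptions based on the commands in this thread, so adjust them to your setup):

```shell
# Hypothetical sketch: scatter GetPileupSummaries by chromosome, then gather.
# $gatk, the gnomad path, and reference.dict are assumptions from this thread.
GNOMAD=/home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz

for chr in chr{1..22} chrX chrY; do
    $gatk GetPileupSummaries --java-options "-Xmx8g" \
        -I "$1" \
        -V "$GNOMAD" \
        -L "$chr" \
        -O "pileups.${chr}.table" \
        --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir2 &
done
wait

# Gather the per-chromosome tables; inputs must follow the order of the
# sequence dictionary passed via --sequence-dictionary.
$gatk GatherPileupSummaries \
    --sequence-dictionary reference.dict \
    $(for chr in chr{1..22} chrX chrY; do echo "-I pileups.${chr}.table"; done) \
    -O tumorgetpileupsummaries.table
```

Running all 24 jobs at once may exceed your node's memory; on a cluster you would more likely submit each chromosome as its own scheduler job.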
-
Hi, I recently ran into the same out of memory/GC overhead error. For me, it turned out the issue was that the gnomad vcf file I provided was way too big. After filtering the vcf to keep only the variants with AF>0.01 (i.e. corresponding to the default --minimum-population-allele-frequency of GetPileupSummaries, so the discarded variants won't be used by default anyway), I ended up with a vcf.gz file that is about 150M, and then everything worked without any issue. Hope this may help.
-
Thanks for this insight Kenneth, it's very helpful!
-
Hi Kenneth and Genevieve Brandt (she/her), that sounds promising. Would you mind sharing how you filtered? I was going to use awk on the INFO column, but I see that both the AC and AF values are in that column. Now I'm trying to use Python, but I'm running into problems reading in the vcf.gz file.
-
Hello Genevieve Brandt (she/her), I feel like I'm very close to getting this. My original gnomad vcf file was 17 GB uncompressed and 3 GB zipped. To filter on AF like Kenneth recommended, I used gatk VariantFiltration:
Command:
$gatk VariantFiltration -R /home/jp57634/References/BWA_Reference/GRCh38/GCA_000001405.15_GRCh38_full_analysis_set.fna -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -O filtsomatic-hg38_af-only-gnomad.hg38.vcf.gz --filter-expression "AF<0.01||AF>0.2" --filter-name "AF_out_of_range" &
That changed the filter status of the majority of my variants. However, I couldn't figure out how to keep only the "PASS" variants using VariantFiltration. I know there has to be an easier way.
Anyway instead I unzipped the file and ran:
sed '/#CHROM/q' filtsomatic-hg38_af-only-gnomad.hg38.vcf > test.vcf # Pull off the header
awk '$7 == "PASS" { print $0 }' filtsomatic-hg38_af-only-gnomad.hg38.vcf >> test.vcf # Append all PASS variants to the new vcf file.
That appeared to do the job, so I recompressed it; my new gnomad file with only "PASS" variants was 518 MB.
I then ran GetPileupSummaries. It says GetPileupSummaries completes and I'm not getting a memory problem, but I'm getting a different error and no output table.
Command:
$gatk GetPileupSummaries --java-options "-Xmx10g" -I $1 -V /scratch/jp57634/UTMPi5039/Neoantigen_Pipeline_DNA/src/test.vcf.gz -L /scratch/jp57634/UTMPi5039/Neoantigen_Pipeline_DNA/src/test.vcf.gz -O tumorgetpileupsummaries.table --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir2
Output:
10:28:36.717 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/jp57634/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 21, 2021 10:28:36 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
10:28:36.914 INFO GetPileupSummaries - ------------------------------------------------------------
10:28:36.915 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.0.0
10:28:36.915 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
10:28:36.915 INFO GetPileupSummaries - Executing as jp57634@genome.bme.utexas.edu on Linux v2.6.32-642.6.1.el6.x86_64 amd64
10:28:36.915 INFO GetPileupSummaries - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_131-b11
10:28:36.916 INFO GetPileupSummaries - Start Date/Time: July 21, 2021 10:28:36 AM CDT
10:28:36.916 INFO GetPileupSummaries - ------------------------------------------------------------
10:28:36.916 INFO GetPileupSummaries - ------------------------------------------------------------
10:28:36.916 INFO GetPileupSummaries - HTSJDK Version: 2.24.0
10:28:36.917 INFO GetPileupSummaries - Picard Version: 2.25.0
10:28:36.917 INFO GetPileupSummaries - Built for Spark Version: 2.4.5
10:28:36.917 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:28:36.917 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:28:36.917 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:28:36.917 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:28:36.917 INFO GetPileupSummaries - Deflater: IntelDeflater
10:28:36.917 INFO GetPileupSummaries - Inflater: IntelInflater
10:28:36.917 INFO GetPileupSummaries - GCS max retries/reopens: 20
10:28:36.917 INFO GetPileupSummaries - Requester pays: disabled
10:28:36.917 INFO GetPileupSummaries - Initializing engine
10:28:37.449 INFO FeatureManager - Using codec VCFCodec to read file file:///scratch/jp57634/UTMPi5039/Neoantigen_Pipeline_DNA/src/test.vcf.gz
10:28:37.649 INFO FeatureManager - Using codec VCFCodec to read file file:///scratch/jp57634/UTMPi5039/Neoantigen_Pipeline_DNA/src/test.vcf.gz
10:31:35.909 INFO IntervalArgumentCollection - Processing 47666635 bp from intervals
10:31:49.566 INFO GetPileupSummaries - Done initializing engine
10:31:49.567 INFO ProgressMeter - Starting traversal
10:31:49.567 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
10:39:19.616 INFO GetPileupSummaries - 0 read(s) filtered by: MappingQualityAvailableReadFilter
1044983 read(s) filtered by: MappingQualityNotZeroReadFilter
0 read(s) filtered by: MappedReadFilter
29456 read(s) filtered by: PrimaryLineReadFilter
15721092 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
198797 read(s) filtered by: MateOnSameContigOrNoMappedMateReadFilter
0 read(s) filtered by: GoodCigarReadFilter
40777909 read(s) filtered by: WellformedReadFilter
57772237 total reads filtered
10:39:19.617 INFO ProgressMeter - unmapped 7.5 0 0.0
10:39:19.617 INFO ProgressMeter - Traversal complete. Processed 0 total loci in 7.5 minutes.
10:39:19.619 INFO GetPileupSummaries - Shutting down engine
[July 21, 2021 10:39:19 AM CDT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 10.72 minutes.
Runtime.totalMemory()=10668212224
java.util.NoSuchElementException: No value present
at java.util.Optional.get(Optional.java:135)
at org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries.onTraversalSuccess(GetPileupSummaries.java:210)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1062)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /home/jp57634/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
-
Jack Prazich, yes, VariantFiltration is great for what you are looking to do. To remove the variants that do not pass your filtering threshold from the file, you can use SelectVariants with --exclude-filtered set to true.
The error message you got from GetPileupSummaries is really strange; I have never seen it before. Could you try filtering out your non-PASS variants with SelectVariants instead of sed/awk and see if you still get the error message?
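For that step, a minimal sketch might look like the following (the input and output file names are assumptions carried over from the commands earlier in this thread):

```shell
# Hypothetical sketch: keep only records whose FILTER field is PASS (or .)
# by excluding everything flagged by VariantFiltration above.
$gatk SelectVariants \
    -V filtsomatic-hg38_af-only-gnomad.hg38.vcf.gz \
    --exclude-filtered true \
    -O passonly-somatic-hg38_af-only-gnomad.hg38.vcf.gz
```

A side benefit over sed/awk is that the output is written as a properly bgzipped vcf.gz with a tabix index, which the -V/-L arguments of GetPileupSummaries expect.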
-
Hi Jack Prazich, I obtained the original gnomad vcf file that someone else at my institution processed, which contains only AF in the INFO field. I used SelectVariants like below to obtain the smaller-sized vcf which I then used for GetPileupSummaries.
java -Xmx8G -Djava.io.tmpdir=$JAVA_TEMP -jar $GATK SelectVariants -V gnomad.vcf \
--select-type-to-include SNP \
--restrict-alleles-to BIALLELIC \
-select "AF > 0.01" \
-O gnomad.subset.vcf.gz \
--lenient