Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data



Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

java.lang.OutOfMemoryError: GC overhead limit exceeded

Answered

28 comments

  • Genevieve Brandt (she/her)

    Hi Jack Prazich,

    To help with runtime or memory usage, try the following:

    1. Check memory/disk space availability on your end.

    2. Specify java memory usage using java option -Xmx.

    3. Specify a --tmp-dir that has room for all necessary temporary files.

    4. Verify this issue persists with the latest version of GATK.

    5. Check the depth of coverage of your sample at the area of interest.
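    The first three checks can be sketched on the command line (a minimal illustration; the heap size and tmp path below are examples only, not values from this thread):

```shell
# 1. Free disk space where outputs and temporary files will be written:
df -h .

# 2. Explicit JVM heap via --java-options (GATK4 wrapper syntax), and
# 3. a temporary directory with enough room, passed as a tool argument:
#    gatk --java-options "-Xmx4g" ToolName ... --tmp-dir /scratch/tmpdir
```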

  • marktoddy

    You have to specify the heap size whenever you run your program. If you are executing on the command line, include the parameters "-Xms4g -Xmx4g" (or whatever you want your heap size to be) whenever you execute using "java". The -Xmx flag specifies the maximum memory allocation pool for a Java Virtual Machine (JVM), while -Xms specifies the initial memory allocation pool. The -Xms flag has no default value, and -Xmx typically has a default value of 256 MB. A common use for these flags is when you encounter a java.lang.OutOfMemoryError. If you are dealing with large data sets, it's always better to initialize your Java collections (e.g. java.util.ArrayList) with the correct capacity.
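    A minimal sketch of the two ways to pass these flags (shown as a dry run that just builds and echoes the commands; the jar name and sizes here are examples, not from this thread):

```shell
# Plain java invocation: -Xms (initial heap) and -Xmx (maximum heap).
HEAP_CMD='java -Xms4g -Xmx4g -jar some-tool.jar'
echo "$HEAP_CMD"

# With the GATK4 wrapper script, the same flags go through --java-options,
# placed before the tool name:
GATK_CMD='gatk --java-options "-Xms4g -Xmx4g" GetPileupSummaries'
echo "$GATK_CMD"
```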

     

  • Genevieve Brandt (she/her)

    Thank you for the insight marktoddy!

  • Jack Prazich

    Hello Genevieve-Brandt-she-her and marktoddy, thank you for the comments and help. I'm still having trouble with this, unfortunately. I'm running the program in a folder with 180 GB of free space, so I don't think it's a disk space problem. Last time I ran it, and in my original post, I specified a heap size of 75 GB ("-Xmx75g"); is that the correct syntax? Should I lower it to 4 GB? I'm running this on the most recent version of GATK.

     

    In this next attempt I will specify a temporary directory. But I was wondering: could my memory error be due to the fact that I use Conda to keep my gatk packages installed and updated, and that I'm running this command inside that gatk conda environment? https://gatk.broadinstitute.org/hc/en-us/articles/360035889851--How-to-Install-and-use-Conda-for-GATK4

  • Jack Prazich

    Ok, interesting. As a way to test the gatk conda installation, the article says to run 'conda list', and my environment correctly showed gatkpythonpackages as one of the installed packages, so I thought it was fine earlier. But when I do 'which gatkpythonpackages' I get this:

    (gatk) [jp57634@genome src]$ which gatkpythonpackages
    /usr/bin/which: no gatkpythonpackages in (/work/jjlab/jp57634/miniconda3/envs/gatk/bin:/work/jjlab/jp57634/miniconda3/condabin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jp57634/Tools/samtools/bin:/home/jp57634/Tools/bwa-0.7.17:/home/jp57634/Tools/groovy-2.4.7/bin:/work/jjlab/jp57634/tools/seqtk:/users/aas4579/tools/FastQC:/home/jp57634/bin)

  • Genevieve Brandt (she/her)

    Yeah, it looks like there might be a problem with the installation, but that doesn't look like something that would cause the out of memory errors. I would definitely try the tmp directory to make sure that isn't causing the issue.

  • Jack Prazich

    So I tried to specify a tmp directory and I got the same out of memory error after four hours and my temporary directory is empty. Did I specify the temporary directory correctly? Here is my command:

    bash ../src/Mutect_run.sh "$patient"_exT_rmdup.bam "$patient"_exH_rmdup.bam "$patient" "-Xmx75g" --active-probability-threshold .0015 --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir >log4 2>&1 &

  • Genevieve Brandt (she/her)

    Jack Prazich Could you share your Mutect2 command line? I don't know what is contained in the bash script.

  • Jack Prazich

    Hi Genevieve-Brandt-she-her, thank you so much for your patience with this. Yeah so I tried to simplify it by pulling the Mutect part out of my master run script and running it separately. 

    Bash command: 

    bash Mutect_run.sh UTMPi5039testpatient_exT_rmdup.bam UTMPi5039testpatient_exH_rmdup.bam "-Xmx75g" --active-probability-threshold .0015 --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir >Contamlog2 2>&1

    Mutect_run.sh script:

    #Estimate contamination of bam files
    $gatk GetPileupSummaries -I $1 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf -O tumorgetpileupsummaries.table
    $gatk GetPileupSummaries -I $2 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf -O healthygetpileupsummaries.table

    It just stopped running after an hour? Without an out of memory error? And no output files? So weird. I'm running it again now.

     

    But here's the thing. I decided to take out all references to contamination tables and ran FilterMutectCalls on my Mutect2 output, and it ran fine and really quickly. How important is it to include the contamination tables in FilterMutectCalls?

     

    A couple other questions:

    Do I need to put --java-options before I specify -Xmx for it to work correctly?

    I noticed in the GetPileupSummaries manual page that typically the variant and intervals files, -V and -L, are zipped files (file.vcf.gz). I have been using plain .vcf files. Could that be causing my memory problem? I tried zipping my file into a .vcf.gz and got this error:

    A USER ERROR has occurred: An index is required but was not found for file /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz. Support for unindexed block-compressed files has been temporarily disabled. Try running IndexFeatureFile on the input.

  • Genevieve Brandt (she/her)

    Okay, I'm going to try to address all your main points:

    • Did Mutect2 finish once you ran it separately from GetPileupSummaries?
    • Could you share the GetPileupSummaries stack trace? If there are no output files there will be something in the stack trace to figure out what went wrong.
    • The contamination tables are pretty important so let's see if we can get it to work!
    • Here is an article on gatk command line syntax: https://gatk.broadinstitute.org/hc/en-us/articles/360035531892-GATK4-command-line-syntax
    • Zipped files might help your memory problem. Create an index for somatic-hg38_af-only-gnomad.hg38.vcf.gz with IndexFeatureFile and it should run just fine!
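    That compress-then-index step can be sketched as follows (a dry run that just echoes the commands; note that it must be bgzip from htslib, not plain gzip, because GATK expects block-compressed .vcf.gz):

```shell
# Dry-run sketch of the compress-then-index workflow for the gnomAD VCF.
vcf=/home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf
gatk=/home/jp57634/Tools/gatk-4.2.0.0/gatk

echo "bgzip $vcf"                            # would produce ${vcf}.gz (BGZF)
echo "$gatk IndexFeatureFile -I ${vcf}.gz"   # would produce ${vcf}.gz.tbi
```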
  • Jack Prazich

    1) Yes, Mutect2 ran fine and gave me mutect_vars.bai mutect_vars.bam mutect_vars.vcf.gz mutect_vars.vcf.gz.stats mutect_vars.vcf.gz.tbi output files.

     

    2) GetPileupSummaries stack trace when I call Mutect_run.sh from master file:

    08:59:06.420 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/jp57634/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jun 30, 2021 8:59:06 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    08:59:06.594 INFO GetPileupSummaries - ------------------------------------------------------------
    08:59:06.595 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.0.0
    08:59:06.595 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
    08:59:06.595 INFO GetPileupSummaries - Executing as jp57634@genome.bme.utexas.edu on Linux v2.6.32-642.6.1.el6.x86_64 amd64
    08:59:06.595 INFO GetPileupSummaries - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_131-b11
    08:59:06.595 INFO GetPileupSummaries - Start Date/Time: June 30, 2021 8:59:06 AM CDT
    08:59:06.595 INFO GetPileupSummaries - ------------------------------------------------------------
    08:59:06.596 INFO GetPileupSummaries - ------------------------------------------------------------
    08:59:06.596 INFO GetPileupSummaries - HTSJDK Version: 2.24.0
    08:59:06.596 INFO GetPileupSummaries - Picard Version: 2.25.0
    08:59:06.596 INFO GetPileupSummaries - Built for Spark Version: 2.4.5
    08:59:06.596 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    08:59:06.596 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    08:59:06.596 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    08:59:06.596 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    08:59:06.597 INFO GetPileupSummaries - Deflater: IntelDeflater
    08:59:06.597 INFO GetPileupSummaries - Inflater: IntelInflater
    08:59:06.597 INFO GetPileupSummaries - GCS max retries/reopens: 20
    08:59:06.597 INFO GetPileupSummaries - Requester pays: disabled
    08:59:06.597 INFO GetPileupSummaries - Initializing engine
    08:59:07.063 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf
    08:59:09.955 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf
    09:39:54.954 INFO IntervalArgumentCollection - Processing 326649654 bp from intervals
    09:42:07.480 INFO GetPileupSummaries - Done initializing engine
    09:42:07.481 INFO ProgressMeter - Starting traversal
    09:42:07.481 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
    13:05:31.025 INFO GetPileupSummaries - Shutting down engine
    [June 30, 2021 1:05:31 PM CDT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 246.41 minutes.
    Runtime.totalMemory()=31269060608
    Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.BitSet.initWords(BitSet.java:166)
    at java.util.BitSet.<init>(BitSet.java:161)
    at htsjdk.samtools.GenomicIndexUtil.regionToBins(GenomicIndexUtil.java:164)
    at htsjdk.samtools.BinningIndexContent.getChunksOverlapping(BinningIndexContent.java:121)
    at htsjdk.samtools.CachingBAMFileIndex.getSpanOverlapping(CachingBAMFileIndex.java:75)
    at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:914)
    at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:931)
    at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:550)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:417)
    at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextIterator(SamReaderQueryingIterator.java:130)
    at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.<init>(SamReaderQueryingIterator.java:69)
    at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:412)
    at org.broadinstitute.hellbender.engine.ReadsPathDataSource.iterator(ReadsPathDataSource.java:336)
    at java.lang.Iterable.spliterator(Iterable.java:101)
    at org.broadinstitute.hellbender.utils.Utils.stream(Utils.java:1176)
    at org.broadinstitute.hellbender.engine.GATKTool.getTransformedReadStream(GATKTool.java:378)
    at org.broadinstitute.hellbender.engine.LocusWalker.getAlignmentContextIterator(LocusWalker.java:182)
    at org.broadinstitute.hellbender.engine.LocusWalker.traverse(LocusWalker.java:157)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1058)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

     

    GetPileupSummaries Stack trace when I run Mutect_run.sh separately (no error or stack trace just stops running?):

    15:42:02.856 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/jp57634/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jun 29, 2021 3:42:03 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    15:42:03.023 INFO GetPileupSummaries - ------------------------------------------------------------
    15:42:03.023 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.0.0
    15:42:03.023 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
    15:42:03.023 INFO GetPileupSummaries - Executing as jp57634@genome.bme.utexas.edu on Linux v2.6.32-642.6.1.el6.x86_64 amd64
    15:42:03.023 INFO GetPileupSummaries - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_131-b11
    15:42:03.024 INFO GetPileupSummaries - Start Date/Time: June 29, 2021 3:42:02 PM CDT
    15:42:03.024 INFO GetPileupSummaries - ------------------------------------------------------------
    15:42:03.024 INFO GetPileupSummaries - ------------------------------------------------------------
    15:42:03.024 INFO GetPileupSummaries - HTSJDK Version: 2.24.0
    15:42:03.024 INFO GetPileupSummaries - Picard Version: 2.25.0
    15:42:03.024 INFO GetPileupSummaries - Built for Spark Version: 2.4.5
    15:42:03.025 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    15:42:03.025 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    15:42:03.025 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    15:42:03.025 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    15:42:03.025 INFO GetPileupSummaries - Deflater: IntelDeflater
    15:42:03.025 INFO GetPileupSummaries - Inflater: IntelInflater
    15:42:03.025 INFO GetPileupSummaries - GCS max retries/reopens: 20
    15:42:03.025 INFO GetPileupSummaries - Requester pays: disabled
    15:42:03.025 INFO GetPileupSummaries - Initializing engine
    15:42:05.254 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf
    15:42:07.012 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf
    16:22:23.217 INFO IntervalArgumentCollection - Processing 326649654 bp from intervals
    16:24:31.584 INFO GetPileupSummaries - Done initializing engine
    16:24:31.584 INFO ProgressMeter - Starting traversal
    16:24:31.585 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute

     

    3) Tried your suggestion to create an index for somatic-hg38_af-only-gnomad.hg38.vcf.gz with IndexFeatureFile and got this error:

    A USER ERROR has occurred: Error while trying to create index for /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz. Error was: htsjdk.tribble.TribbleException.MalformedFeatureFile: Input file is not in valid block compressed format., for input source: /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz

    ***********************************************************************
    org.broadinstitute.hellbender.exceptions.UserException$CouldNotIndexFile: Error while trying to create index for /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz. Error was: htsjdk.tribble.TribbleException.MalformedFeatureFile: Input file is not in valid block compressed format., for input source: /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
    at org.broadinstitute.hellbender.tools.IndexFeatureFile.createAppropriateIndexInMemory(IndexFeatureFile.java:123)
    at org.broadinstitute.hellbender.tools.IndexFeatureFile.doWork(IndexFeatureFile.java:75)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
    Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Input file is not in valid block compressed format., for input source: /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
    at htsjdk.tribble.index.IndexFactory$FeatureIterator.initIndexableBlockCompressedStream(IndexFactory.java:628)
    at htsjdk.tribble.index.IndexFactory$FeatureIterator.<init>(IndexFactory.java:599)
    at htsjdk.tribble.index.IndexFactory.createTabixIndex(IndexFactory.java:476)
    at htsjdk.tribble.index.IndexFactory.createTabixIndex(IndexFactory.java:502)
    at htsjdk.tribble.index.IndexFactory.createIndex(IndexFactory.java:403)
    at org.broadinstitute.hellbender.tools.IndexFeatureFile.createAppropriateIndexInMemory(IndexFeatureFile.java:109)

  • Genevieve Brandt (she/her)

    Thank you!

    • Could you share your GetPileupSummaries gatk command line? I can't see what is going on in your bash scripts.
    • If the command ends unexpectedly, your machine could be shutting it down for taking too much memory.
    • It looks like there was an issue when you zipped the somatic-hg38_af-only-gnomad.hg38.vcf file and it became malformed. You can try zipping it again.
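    One plausible cause worth ruling out before re-zipping: the file was compressed with plain gzip instead of bgzip. Both produce a .vcf.gz, but only bgzip writes the BGZF block format htsjdk accepts. A simplified header check (a sketch; a full check would also look for the 'BC' extra subfield):

```shell
is_bgzf() {
  # BGZF members begin with the gzip magic plus FLG.FEXTRA set: 1f 8b 08 04.
  magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \t\n')
  [ "$magic" = "1f8b0804" ]
}

# Usage on the file from this thread:
#   is_bgzf /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz \
#     && echo "BGZF (ok)" || echo "plain gzip: recompress with bgzip"
```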
  • Jack Prazich

    My bad.

    Command line:

    bash Mutect_run.sh UTMPi5039testpatient_exT_rmdup.bam UTMPi5039testpatient_exH_rmdup.bam --java-options "-Xmx75g" --active-probability-threshold .0015 --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir 2>&1 &

     

    Mutect_run.sh script: 

    gatk=/home/jp57634/Tools/gatk-4.2.0.0/gatk

    source /work/jjlab/jp57634/miniconda3/etc/profile.d/conda.sh

    conda activate gatk

    $gatk GetPileupSummaries -I $1 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf -O tumorgetpileupsummaries.table

    conda deactivate

     

    Ok I will try zipping it again and retry.

  • Genevieve Brandt (she/her)

    Try using the Xmx options for your GetPileupSummaries command and specify a temporary directory in the command itself.

  • Jack Prazich

    Ok so I've put -Xmx in the GetPileupSummaries command while keeping the --tmp-dir on the command line, and I also tried to reduce memory by working with the zipped .vcf file, using IndexFeatureFile as you suggested. IndexFeatureFile produced an output .vcf.gz.tbi file, which I reference in the GetPileupSummaries command, and that is leading to an error.

     

    Command line:

    bash Mutect_run.sh UTMPi5039testpatient_exT_rmdup.bam UTMPi5039testpatient_exH_rmdup.bam --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir >Contamlog5 2>&1 &

     

    Mutect_run.sh script:

    $gatk IndexFeatureFile -I /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'
    $gatk GetPileupSummaries -I $1 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz.tbi -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz.tbi -O tumorgetpileupsummaries.table --java-options "-Xmx75g"

     

    Error: 

    A USER ERROR has occurred: Cannot read file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz.tbi because no suitable codecs found

  • Genevieve Brandt (she/her)

    Jack Prazich the .tbi index file should not be specified in the command, it just needs to exist in the same directory as the vcf file. For your -V and -L inputs, those should be the .vcf.gz files. Your error message is occurring because the tool is looking for a VCF file and it is not in that format.
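    In other words, the corrected call would look something like this (a dry-run sketch built from the paths in this thread; the .tbi just has to sit next to the .vcf.gz):

```shell
gatk=/home/jp57634/Tools/gatk-4.2.0.0/gatk
vcf=/home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
# ${vcf}.tbi must exist in the same directory, but is never passed to -V or -L.
cmd="$gatk GetPileupSummaries -I \$1 -V $vcf -L $vcf -O tumorgetpileupsummaries.table"
echo "$cmd"
```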

  • Jack Prazich

    Genevieve-Brandt-she-her I see. 

     

    So first I ran the Mutect script separately and again it stopped running after an hour: 

    13:13:55.757 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/jp57634/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jul 06, 2021 1:13:55 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    13:13:55.942 INFO GetPileupSummaries - ------------------------------------------------------------
    13:13:55.942 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.0.0
    13:13:55.942 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
    13:13:55.942 INFO GetPileupSummaries - Executing as jp57634@genome.bme.utexas.edu on Linux v2.6.32-642.6.1.el6.x86_64 amd64
    13:13:55.942 INFO GetPileupSummaries - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_131-b11
    13:13:55.943 INFO GetPileupSummaries - Start Date/Time: July 6, 2021 1:13:55 PM CDT
    13:13:55.943 INFO GetPileupSummaries - ------------------------------------------------------------
    13:13:55.943 INFO GetPileupSummaries - ------------------------------------------------------------
    13:13:55.943 INFO GetPileupSummaries - HTSJDK Version: 2.24.0
    13:13:55.944 INFO GetPileupSummaries - Picard Version: 2.25.0
    13:13:55.944 INFO GetPileupSummaries - Built for Spark Version: 2.4.5
    13:13:55.944 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    13:13:55.944 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    13:13:55.944 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    13:13:55.944 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    13:13:55.944 INFO GetPileupSummaries - Deflater: IntelDeflater
    13:13:55.944 INFO GetPileupSummaries - Inflater: IntelInflater
    13:13:55.944 INFO GetPileupSummaries - GCS max retries/reopens: 20
    13:13:55.944 INFO GetPileupSummaries - Requester pays: disabled
    13:13:55.944 INFO GetPileupSummaries - Initializing engine
    13:13:56.434 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
    13:13:56.654 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
    13:49:35.341 INFO IntervalArgumentCollection - Processing 326649654 bp from intervals
    13:49:59.408 INFO GetPileupSummaries - Done initializing engine
    13:49:59.409 INFO ProgressMeter - Starting traversal
    13:49:59.409 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute

     

    Command line:

    bash Mutect_run.sh UTMPi5039testpatient_exT_rmdup.bam UTMPi5039testpatient_exH_rmdup.bam --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir >Contamlog5 2>&1 &

    Mutect script:

    $gatk GetPileupSummaries -I $1 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -O tumorgetpileupsummaries.table --java-options "-Xmx75g"

     

    And then I tried to run it from my master script. It ran for 8 hours, when previously it had been running for ~4 hours, and gave a slightly different memory error message:

    19:56:06.739 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/jp57634/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jul 06, 2021 7:56:06 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    19:56:06.911 INFO GetPileupSummaries - ------------------------------------------------------------
    19:56:06.912 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.0.0
    19:56:06.912 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
    19:56:06.912 INFO GetPileupSummaries - Executing as jp57634@genome.bme.utexas.edu on Linux v2.6.32-642.6.1.el6.x86_64 amd64
    19:56:06.912 INFO GetPileupSummaries - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_131-b11
    19:56:06.912 INFO GetPileupSummaries - Start Date/Time: July 6, 2021 7:56:06 PM CDT
    19:56:06.912 INFO GetPileupSummaries - ------------------------------------------------------------
    19:56:06.912 INFO GetPileupSummaries - ------------------------------------------------------------
    19:56:06.913 INFO GetPileupSummaries - HTSJDK Version: 2.24.0
    19:56:06.913 INFO GetPileupSummaries - Picard Version: 2.25.0
    19:56:06.913 INFO GetPileupSummaries - Built for Spark Version: 2.4.5
    19:56:06.913 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    19:56:06.913 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    19:56:06.913 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    19:56:06.913 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    19:56:06.914 INFO GetPileupSummaries - Deflater: IntelDeflater
    19:56:06.914 INFO GetPileupSummaries - Inflater: IntelInflater
    19:56:06.914 INFO GetPileupSummaries - GCS max retries/reopens: 20
    19:56:06.914 INFO GetPileupSummaries - Requester pays: disabled
    19:56:06.914 INFO GetPileupSummaries - Initializing engine
    19:56:07.379 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
    19:56:07.575 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
    20:32:48.701 INFO IntervalArgumentCollection - Processing 326649654 bp from intervals
    20:33:19.620 INFO GetPileupSummaries - Done initializing engine
    20:33:19.621 INFO ProgressMeter - Starting traversal
    20:33:19.621 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
    03:40:48.257 INFO GetPileupSummaries - Shutting down engine
    org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 464.69 minutes.
    Runtime.totalMemory()=78694055936
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3181)
    at java.util.ArrayList.grow(ArrayList.java:261)
    at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235)
    at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:227)
    at java.util.ArrayList.addAll(ArrayList.java:579)
    at htsjdk.samtools.BAMFileSpan.merge(BAMFileSpan.java:307)
    at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:919)
    at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:931)
    at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:550)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:417)
    at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextIterator(SamReaderQueryingIterator.java:130)
    at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.<init>(SamReaderQueryingIterator.java:69)
    at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:412)
    at org.broadinstitute.hellbender.engine.ReadsPathDataSource.iterator(ReadsPathDataSource.java:336)
    at java.lang.Iterable.spliterator(Iterable.java:101)
    at org.broadinstitute.hellbender.utils.Utils.stream(Utils.java:1176)
    at org.broadinstitute.hellbender.engine.GATKTool.getTransformedReadStream(GATKTool.java:378)
    at org.broadinstitute.hellbender.engine.LocusWalker.getAlignmentContextIterator(LocusWalker.java:182)
    at org.broadinstitute.hellbender.engine.LocusWalker.traverse(LocusWalker.java:157)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1058)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

     

    Master run file:

    bash ../src/Mutect_run.sh "$patient"_exT_rmdup.bam "$patient"_exH_rmdup.bam "$patient" --java-options "-Xmx75g" --active-probability-threshold .0015 --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir >Contamlog6 2>&1 &

    Mutect_run.sh script: 

    $gatk GetPileupSummaries -I $1 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -O tumorgetpileupsummaries.table --java-options "-Xmx75g"

     

    I also got a tmp_read_resource_8814225368090996308.config file written to my temporary directory from one of the runs. I'm not sure which because I didn't check it until both had run.

  • Genevieve Brandt (she/her)

    Jack Prazich These still look like memory issues.

    I noticed that your GetPileupSummaries commands are not following the gatk command line syntax: the java options should come immediately after the gatk wrapper script, before the tool name. You also do not have a temporary directory in that command. I really think you need to include a temporary directory in the specific GetPileupSummaries command.

    Here is the command line syntax article with examples: https://gatk.broadinstitute.org/hc/en-us/articles/360035531892-GATK4-command-line-syntax
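    Putting both points together, the command would have roughly this shape (a dry-run sketch using the paths from this thread; in a real invocation the java options are quoted, e.g. --java-options "-Xmx75g"):

```shell
gatk=/home/jp57634/Tools/gatk-4.2.0.0/gatk
vcf=/home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz
tmp=/scratch/jp57634/UTMPi5039/tmpdir

# --java-options comes between the wrapper script and the tool name;
# --tmp-dir is an ordinary tool argument on the GetPileupSummaries call itself.
cmd="$gatk --java-options -Xmx75g GetPileupSummaries -I tumor.bam -V $vcf -L $vcf -O tumorgetpileupsummaries.table --tmp-dir $tmp"
echo "$cmd"
```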

  • Jack Prazich

    Ahh, I see. When you said "command itself" you meant the gatk command; I thought you meant the command line.

     

    So I changed my command to put the tmpdir in the Mutect run file and changed the --java-options to the front to match the command line syntax article you sent: 

    Command line:

    bash Mutect_run.sh UTMPi5039testpatient_exT_rmdup.bam UTMPi5039testpatient_exH_rmdup.bam >Contamlog5 2>&1 &

    Mutect_run.sh:

    $gatk GetPileupSummaries --java-options "-Xmx5g" -I $1 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -O tumorgetpileupsummaries.table --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir2

     

    Stack trace still getting a memory problem:

    [July 8, 2021 9:35:22 AM CDT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 127.04 minutes.
    Runtime.totalMemory()=5344591872
    Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)
    at java.lang.String.substring(String.java:1969)
    at htsjdk.tribble.util.ParsingUtils.split(ParsingUtils.java:259)
    at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:375)
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:328)
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:48)
    at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:173)
    at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:205)
    at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:149)
    at org.broadinstitute.hellbender.utils.IntervalUtils.featureFileToIntervals(IntervalUtils.java:359)
    at org.broadinstitute.hellbender.utils.IntervalUtils.parseIntervalArguments(IntervalUtils.java:319)
    at org.broadinstitute.hellbender.utils.IntervalUtils.loadIntervals(IntervalUtils.java:239)
    at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.parseIntervals(IntervalArgumentCollection.java:200)
    at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.getTraversalParameters(IntervalArgumentCollection.java:180)
    at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.getIntervals(IntervalArgumentCollection.java:111)
    at org.broadinstitute.hellbender.engine.GATKTool.initializeIntervals(GATKTool.java:514)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:709)
    at org.broadinstitute.hellbender.engine.LocusWalker.onStartup(LocusWalker.java:136)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

     

    In my temporary directory I'm now getting three .config output files:

    tmp_read_resource_3935243153837023284.config tmp_read_resource_5129549662898584203.config tmp_read_resource_9116916725022102535.config

  • Genevieve Brandt (she/her)

    Jack Prazich I see! Glad there is more clarity now. 

    I noticed earlier you specified 75g for the -Xmx java option, but here you are only specifying 5g. Did you try it both ways yet?

  • Jack Prazich

    Genevieve-Brandt-she-her Yeah I tried it with 75 GB. It ran for 12 hours and hit the same memory error. It output six .config files to the tmpdir this time instead of three.

     

    Mutect_run.sh Command: 

    $gatk GetPileupSummaries --java-options "-Xmx75g" -I $1 -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -L /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -O tumorgetpileupsummaries.table --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir2

     

    I'm basically out of ideas.

  • Genevieve Brandt (she/her)

    Most likely this tool is taking a long time because you are using an intervals file with many variants, which is slow for GATK to process. Since you have exhausted the other options, I would recommend splitting the intervals file and scattering the jobs. You can split the intervals file in many different ways, but one option would be by chromosome. After running multiple instances of GetPileupSummaries, you can gather the output tables with the GATK tool GatherPileupSummaries and proceed to your next step.
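
    A hedged sketch of that scatter/gather pattern is below. GetPileupSummaries and GatherPileupSummaries are real GATK4 tools, but the file names, -Xmx value, and chromosome list are placeholders; the commands are echoed into files rather than executed, so the shape can be checked without GATK installed.

```shell
# Scatter: one GetPileupSummaries job per chromosome via -L, then gather
# the per-chromosome tables with GatherPileupSummaries (which needs the
# reference .dict to order its inputs). Remove the echo (and add '&'
# plus a wait to parallelize) to actually run the jobs.
for chr in chr{1..22} chrX chrY; do
  echo "gatk --java-options '-Xmx8g' GetPileupSummaries" \
       "-I tumor.bam -V af-only-gnomad.vcf.gz" \
       "-L $chr -O pileups.$chr.table --tmp-dir /scratch/tmpdir"
done > scatter_cmds.txt

inputs=$(for chr in chr{1..22} chrX chrY; do printf -- ' -I pileups.%s.table' "$chr"; done)
echo "gatk GatherPileupSummaries --sequence-dictionary reference.dict$inputs -O pileups.table" > gather_cmd.txt
wc -l scatter_cmds.txt   # 24 scatter jobs: chr1-22, X, Y
```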

  • Kenneth

    Hi, I recently ran into the same out-of-memory/GC overhead error. For me, it turned out the issue was that the gnomad vcf file I provided was way too big. After filtering the vcf to keep only the variants with AF > 0.01 (i.e. matching the default --minimum-population-allele-frequency of GetPileupSummaries, so the discarded variants would not be used by default anyway), I ended up with a vcf.gz file of about 150 MB, and then everything worked without any issue. Hope this may help.
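
    For illustration only, the AF cutoff described above can be sketched with awk on a toy two-record VCF. A real run would zcat the bgzipped gnomad file instead of the printf, and the result would need to be re-compressed with bgzip and indexed with tabix before handing it to GATK; SelectVariants (shown further down the thread) is the supported route.

```shell
# Toy stand-in for the af-only-gnomad file: two records, one common
# (AF=0.05) and one rare (AF=0.001). INFO is column 8 in a VCF.
printf 'chr1\t100\t.\tA\tT\t.\tPASS\tAF=0.05\nchr1\t200\t.\tA\tT\t.\tPASS\tAF=0.001\n' \
  | awk -F'\t' '
      /^#/ { print; next }                # header lines pass through
      match($8, /(^|;)AF=[^;]+/) {        # pull the AF=<x> entry out of INFO
        af = substr($8, RSTART, RLENGTH)
        sub(/.*AF=/, "", af)
        if (af + 0 > 0.01) print          # keep only common variants
      }' > common.vcf
cat common.vcf   # only the AF=0.05 record survives
```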

  • Genevieve Brandt (she/her)

    Thanks for this insight Kenneth, it's very helpful!

  • Jack Prazich

    Hi Kenneth and Genevieve-Brandt-she-her, that sounds promising. Would you mind sharing how you filtered? I was going to use awk on the INFO column, but I see that both the AC and AF values are in that column. Now I'm trying to use Python, but I'm running into problems reading in the vcf.gz file.

  • Jack Prazich

    Hello Genevieve-Brandt-she-her, I feel like I'm very close to getting this. So my original gnomad vcf file was 17 GB and 3 GB when zipped. To filter on AF like Kenneth recommended I used gatk VariantFiltration:

     

    Command:

    #$gatk VariantFiltration -R /home/jp57634/References/BWA_Reference/GRCh38/GCA_000001405.15_GRCh38_full_analysis_set.fna -V /home/jp57634/References/Mutect/somatic-hg38_af-only-gnomad.hg38.vcf.gz -O filtsomatic-hg38_af-only-gnomad.hg38.vcf.gz --filter-expression "AF<0.01||AF>0.2" --filter-name "AF_out_of_range" &

     

    That changed the filter status of the majority of my variants. However, I couldn’t figure out how to keep only the “PASS” variants using VariantFiltration. I know there has to be an easier way.

     

    Anyway instead I unzipped the file and ran:

    sed '/#CHROM/q' filtsomatic-hg38_af-only-gnomad.hg38.vcf > test.vcf # Pull off the header (everything up to and including the #CHROM line)

    awk '$7 == "PASS"' filtsomatic-hg38_af-only-gnomad.hg38.vcf >> test.vcf # Append all PASS variants to the new vcf file.

     

    That appeared to do the job, so I rezipped it, and my new gnomad file with only “PASS” variants was 518 MB.

     

    I then ran GetPileupSummaries. It says GetPileupSummaries completes and I'm not getting a memory problem, but I'm getting a different error and no output table.

     

    Command:

    $gatk GetPileupSummaries --java-options "-Xmx10g" -I $1  -V /scratch/jp57634/UTMPi5039/Neoantigen_Pipeline_DNA/src/test.vcf.gz -L /scratch/jp57634/UTMPi5039/Neoantigen_Pipeline_DNA/src/test.vcf.gz -O tumorgetpileupsummaries.table --tmp-dir /scratch/jp57634/UTMPi5039/tmpdir2


    Output:

    10:28:36.717 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/jp57634/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jul 21, 2021 10:28:36 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    10:28:36.914 INFO  GetPileupSummaries - ------------------------------------------------------------
    10:28:36.915 INFO  GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.0.0
    10:28:36.915 INFO  GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
    10:28:36.915 INFO  GetPileupSummaries - Executing as jp57634@genome.bme.utexas.edu on Linux v2.6.32-642.6.1.el6.x86_64 amd64
    10:28:36.915 INFO  GetPileupSummaries - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_131-b11
    10:28:36.916 INFO  GetPileupSummaries - Start Date/Time: July 21, 2021 10:28:36 AM CDT
    10:28:36.916 INFO  GetPileupSummaries - ------------------------------------------------------------
    10:28:36.916 INFO  GetPileupSummaries - ------------------------------------------------------------
    10:28:36.916 INFO  GetPileupSummaries - HTSJDK Version: 2.24.0
    10:28:36.917 INFO  GetPileupSummaries - Picard Version: 2.25.0
    10:28:36.917 INFO  GetPileupSummaries - Built for Spark Version: 2.4.5
    10:28:36.917 INFO  GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    10:28:36.917 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    10:28:36.917 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    10:28:36.917 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    10:28:36.917 INFO  GetPileupSummaries - Deflater: IntelDeflater
    10:28:36.917 INFO  GetPileupSummaries - Inflater: IntelInflater
    10:28:36.917 INFO  GetPileupSummaries - GCS max retries/reopens: 20
    10:28:36.917 INFO  GetPileupSummaries - Requester pays: disabled
    10:28:36.917 INFO  GetPileupSummaries - Initializing engine
    10:28:37.449 INFO  FeatureManager - Using codec VCFCodec to read file file:///scratch/jp57634/UTMPi5039/Neoantigen_Pipeline_DNA/src/test.vcf.gz
    10:28:37.649 INFO  FeatureManager - Using codec VCFCodec to read file file:///scratch/jp57634/UTMPi5039/Neoantigen_Pipeline_DNA/src/test.vcf.gz
    10:31:35.909 INFO  IntervalArgumentCollection - Processing 47666635 bp from intervals
    10:31:49.566 INFO  GetPileupSummaries - Done initializing engine
    10:31:49.567 INFO  ProgressMeter - Starting traversal
    10:31:49.567 INFO  ProgressMeter -        Current Locus  Elapsed Minutes        Loci Processed      Loci/Minute
    10:39:19.616 INFO  GetPileupSummaries - 0 read(s) filtered by: MappingQualityAvailableReadFilter
    1044983 read(s) filtered by: MappingQualityNotZeroReadFilter
    0 read(s) filtered by: MappedReadFilter
    29456 read(s) filtered by: PrimaryLineReadFilter
    15721092 read(s) filtered by: NotDuplicateReadFilter
    0 read(s) filtered by: PassesVendorQualityCheckReadFilter
    0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
    198797 read(s) filtered by: MateOnSameContigOrNoMappedMateReadFilter
    0 read(s) filtered by: GoodCigarReadFilter
    40777909 read(s) filtered by: WellformedReadFilter
    57772237 total reads filtered
    10:39:19.617 INFO  ProgressMeter -             unmapped              7.5                     0              0.0
    10:39:19.617 INFO  ProgressMeter - Traversal complete. Processed 0 total loci in 7.5 minutes.
    10:39:19.619 INFO  GetPileupSummaries - Shutting down engine
    [July 21, 2021 10:39:19 AM CDT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 10.72 minutes.
    Runtime.totalMemory()=10668212224
    java.util.NoSuchElementException: No value present
            at java.util.Optional.get(Optional.java:135)
            at org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries.onTraversalSuccess(GetPileupSummaries.java:210)
            at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1062)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
            at org.broadinstitute.hellbender.Main.main(Main.java:289)
    Using GATK jar /home/jp57634/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar

     

  • Genevieve Brandt (she/her)

    Jack Prazich, yes, VariantFiltration is great for what you are looking to do. To remove the variants that do not pass your filtering threshold from the file, you can use SelectVariants with --exclude-filtered set to true.

    The error message you got from GetPileupSummaries is really strange; I have never seen it before. Could you try removing your non-PASS variants with SelectVariants instead of sed/awk and see if you still get the error message?
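
    For example (file names are placeholders, and the command is echoed so it can be inspected without GATK; --exclude-filtered drops every record whose FILTER field is not PASS):

```shell
# Sketch of a SelectVariants call that keeps only PASS records.
echo 'gatk SelectVariants -V filtsomatic-hg38_af-only-gnomad.hg38.vcf.gz --exclude-filtered -O pass-only.vcf.gz' > sv_cmd.txt
cat sv_cmd.txt
```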

  • Kenneth

    Hi Jack Prazich, I obtained the original gnomad vcf file that someone else at my institution processed, which contains only AF in the INFO field. I used SelectVariants like below to obtain the smaller-sized vcf which I then used for GetPileupSummaries.

    java -Xmx8G -Djava.io.tmpdir=$JAVA_TEMP -jar $GATK SelectVariants -V gnomad.vcf \
    --select-type-to-include SNP \
    --restrict-alleles-to BIALLELIC \
    -select "AF > 0.01" \
    -O gnomad.subset.vcf.gz \
    --lenient
