Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GATK Liftover memory issues on cohort vcf

Answered

11 comments

  • Genevieve Brandt (she/her)

    Hi Jose Arcadio Buendia, please include your command, GATK version, and the entire stack trace so we can troubleshoot.

  • Jose Arcadio Buendia
    gatk LiftoverVcf \
        --TMP_DIR tmp/ \
        -I test.vcf \
        -O out.vcf \
        --CHAIN /igm/home/liftover/Hg38Tob37.over.chain \
        --REJECT out.rejected.vcf \
        -R /igm/apps/genomes/Homo_sapiens/human_g1k_v37_decoy/human_g1k_v37_decoy.fasta \
        --WRITE_ORIGINAL_POSITION TRUE
  • Genevieve Brandt (she/her)

    Hi Jose Arcadio Buendia, you can set the memory allocation using the Java option -Xmx; more info can be found here: https://gatk.broadinstitute.org/hc/en-us/articles/360035531892-GATK4-command-line-syntax

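    For reference, the -Xmx syntax looks like the sketch below; the 32G figure and the file paths are placeholders, not values taken from this thread:

    ```shell
    # Pass JVM options through the gatk wrapper's --java-options flag.
    # 32G is a placeholder heap size -- raise it until the run fits in RAM.
    gatk --java-options "-Xmx32G" LiftoverVcf \
        -I test.vcf \
        -O out.vcf \
        --CHAIN Hg38Tob37.over.chain \
        --REJECT out.rejected.vcf \
        -R human_g1k_v37_decoy.fasta \
        --WRITE_ORIGINAL_POSITION TRUE
    ```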
  • Jose Arcadio Buendia

    Yes, thank you, that ended up working; it just required a ton of memory.

  • Genevieve Brandt (she/her)

    Glad it ended up working! Thanks for the update.

  • chirag lakhani

    I'm having a similar issue. I am trying to use GATK LiftoverVcf on a WGS VCF with GATK 4.1.8.1, on a VM for which I have requested 100 GB of RAM. I have set my Java heap to 80 GB, but I still get a Java heap space error, even though the file is only 15 GB in size.

    gatk --java-options "-Xmx80G" LiftoverVcf \
        --INPUT xxx.vcf.gz \
        --OUTPUT yyy.vcf.gz \
        --REJECT zzz.vcf.gz \
        --CHAIN $chain \
        --REFERENCE_SEQUENCE $fasta

     

    Using GATK jar /nfs/sw/gatk/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar

    Running:

      java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx80G -jar /nfs/sw/gatk/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar LiftoverVcf --INPUT xxxx/xxxx.vcf.gz --OUTPUT xxxx/xxxx.vcf.gz --REJECT xxxx/xxxxx.chr1.vcf.gz --CHAIN /gpfs/commons/groups/xxxxx/data/xxxxx/xxxxx/liftover/hg38ToHg19.over.chain.gz --REFERENCE_SEQUENCE /gpfs/commons/groups/xxxxx/data/xxxx/xxxxx/resources/hg19.fa

    21:13:06.689 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/nfs/sw/gatk/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so

    [Wed Dec 08 21:13:06 EST 2021] LiftoverVcf --INPUT filtered_data_biallelic/ADSP_annotated_chr1.vcf.gz --OUTPUT filtered_data_biallelic_hg37/ADSP_annotated_hg37_chr1.vcf.gz --CHAIN /gpfs/commons/groups/xxxx/data/yyyy/zzzz/liftover/hg38ToHg19.over.chain.gz --REJECT xxxx/zzzz --REFERENCE_SEQUENCE /gpfs/commons/groups/xxxxx/data/yyyy/zzzzz/zzzzz/hg19.fa --WARN_ON_MISSING_CONTIG false --LOG_FAILED_INTERVALS true --WRITE_ORIGINAL_POSITION false --WRITE_ORIGINAL_ALLELES false --LIFTOVER_MIN_MATCH 1.0 --ALLOW_MISSING_FIELDS_IN_HEADER false --RECOVER_SWAPPED_REF_ALT false --TAGS_TO_REVERSE AF --TAGS_TO_DROP MAX_AF --DISABLE_SORT false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false

    Dec 08, 2021 9:13:07 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine

    INFO: Failed to detect whether we are running on Google Compute Engine.

    [Wed Dec 08 21:13:07 EST 2021] Executing as xxxx@xxxx.org on Linux 3.10.0-1062.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.8.1

    INFO    2021-12-08 21:13:08     LiftoverVcf     Loading up the target reference genome.

    INFO    2021-12-08 21:13:22     LiftoverVcf     Lifting variants over and sorting (not yet writing the output file.)

    INFO    2021-12-08 21:15:47     LiftOver        Interval chr1:180992-180996 failed to match chain 2410 because intersection length 4 < minMatchSize 5.0 (0.8 < 1.0)

    INFO    2021-12-08 21:15:47     LiftOver        Interval chr1:180992-180996 failed to match chain 2576 because intersection length 1 < minMatchSize 5.0 (0.2 < 1.0)

    INFO    2021-12-08 21:15:48     LiftOver        Interval chr1:181768-181824 failed to match chain 2410 because intersection length 30 < minMatchSize 57.0 (0.5263158 < 1.0)

    INFO    2021-12-08 21:15:48     LiftOver        Interval chr1:181768-181824 failed to match chain 1926 because intersection length 27 < minMatchSize 57.0 (0.47368422 < 1.0)

    INFO    2021-12-08 21:19:12     LiftOver        Interval chr1:183291-183300 failed to match chain 1926 because intersection length 2 < minMatchSize 10.0 (0.2 < 1.0)

    INFO    2021-12-08 21:19:12     LiftOver        Interval chr1:183381-183401 failed to match chain 1926 because intersection length 14 < minMatchSize 21.0 (0.6666667 < 1.0)

    INFO    2021-12-08 21:19:13     LiftOver        Interval chr1:184426-184429 failed to match chain 2410 because intersection length 1 < minMatchSize 4.0 (0.25 < 1.0)

    INFO    2021-12-08 21:19:13     LiftOver        Interval chr1:184426-184429 failed to match chain 1926 because intersection length 3 < minMatchSize 4.0 (0.75 < 1.0)

    INFO    2021-12-08 21:19:14     LiftOver        Interval chr1:185455-185472 failed to match chain 1926 because intersection length 14 < minMatchSize 18.0 (0.7777778 < 1.0)

    INFO    2021-12-08 21:19:44     LiftOver        Interval chr1:196676-196687 failed to match chain 1926 because intersection length 11 < minMatchSize 12.0 (0.9166667 < 1.0)

    INFO    2021-12-08 21:19:47     LiftOver        Interval chr1:198559-198563 failed to match chain 1926 because intersection length 1 < minMatchSize 5.0 (0.2 < 1.0)

    INFO    2021-12-08 21:19:47     LiftOver        Interval chr1:198892-198895 failed to match chain 1926 because intersection length 3 < minMatchSize 4.0 (0.75 < 1.0)

    INFO    2021-12-08 21:19:48     LiftOver        Interval chr1:201410-201423 failed to match chain 1926 because intersection length 8 < minMatchSize 14.0 (0.5714286 < 1.0)

    INFO    2021-12-08 21:21:08     LiftOver        Interval chr1:455118-455119 failed to match chain 381 because intersection length 1 < minMatchSize 2.0 (0.5 < 1.0)

    INFO    2021-12-08 21:21:16     LiftOver        Interval chr1:494561-494593 failed to match chain 381 because intersection length 28 < minMatchSize 33.0 (0.8484849 < 1.0)

    INFO    2021-12-08 21:23:40     LiftOver        Interval chr1:533511-533521 failed to match chain 706 because intersection length 2 < minMatchSize 11.0 (0.18181819 < 1.0)

    INFO    2021-12-08 21:23:40     LiftOver        Interval chr1:533511-533521 failed to match chain 829 because intersection length 9 < minMatchSize 11.0 (0.8181818 < 1.0)

    [Wed Dec 08 21:32:43 EST 2021] picard.vcf.LiftoverVcf done. Elapsed time: 19.61 minutes.

    Runtime.totalMemory()=76355207168

    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

            at java.util.Arrays.copyOf(Arrays.java:3332)

            at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)

            at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)

            at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)

            at java.lang.StringBuilder.append(StringBuilder.java:136)

            at htsjdk.tribble.util.ParsingUtils.split(ParsingUtils.java:266)

            at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:375)

            at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:328)

            at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:48)

            at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:173)

            at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:205)

            at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:149)

            at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:411)

            at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)

            at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)

            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)

            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)

            at org.broadinstitute.hellbender.Main.main(Main.java:289)
  • Genevieve Brandt (she/her)

    Jose Arcadio Buendia, how much memory did you end up needing to give the command?

  • Genevieve Brandt (she/her)

    Hi chirag lakhani! There is one other common argument you can try for a Java heap space error: directly specify a temporary directory. LiftoverVcf may be writing to a temporary location that does not have enough room for its temporary files. You can set the temporary directory with the --TMP_DIR option.

    I hope this helps!

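    Putting the two suggestions from this thread together, the invocation would look roughly like this sketch (all paths and the heap size are placeholders):

    ```shell
    # Combine a large heap with an explicit temp directory on a filesystem
    # that has plenty of free space (all paths below are placeholders).
    mkdir -p /scratch/liftover_tmp
    gatk --java-options "-Xmx80G" LiftoverVcf \
        --TMP_DIR /scratch/liftover_tmp \
        --INPUT input.vcf.gz \
        --OUTPUT lifted.vcf.gz \
        --REJECT rejected.vcf.gz \
        --CHAIN hg38ToHg19.over.chain.gz \
        --REFERENCE_SEQUENCE hg19.fa
    ```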
  • chirag lakhani

    Thanks, I will try this out and let you know.

  • Genevieve Brandt (she/her)

    This is a different error message: it changed from Java heap space to GC overhead limit exceeded, so I think you should keep the temp directory option. Did you still run this with the -Xmx option? How much memory and disk space is available where you are running this job? How many samples are in this VCF, and are there a lot of alternate alleles at some loci?

  • chirag lakhani

    Sorry, I deleted the message from before. I modified the script by adding --TMP_DIR as well as --CREATE_INDEX true, which is how it failed. I am re-running it now without the --CREATE_INDEX flag and will let you know if it still fails. I am trying to run liftover on chromosome 1 of the gnomAD 3.1 VCF. Have you tried to run liftover on the gnomAD data?

