GATK LiftoverVcf memory issues on cohort VCF
Dear GATK team,
I've been attempting to lift over a chromosome from the 1000 Genomes cohort from hg38 to b37, but I keep running into memory errors.
Do I just need to throw more heap memory at the problem? The file size is ~10 GB.
Picked up JAVA_TOOL_OPTIONS: -XX:ParallelGCThreads=1 -Djava.io.tmpdir=/igm/temp
11:36:47.793 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Aug 24 11:36:49 EDT 2020] LiftoverVcf --INPUT new.vcf --OUTPUT b37lo.vcf --CHAIN /home/Hg38Tob37.over.chain --REJECT rejected.vcf --WRITE_ORIGINAL_POSITION true --TMP_DIR tmp21 --REFERENCE_SEQUENCE human_g1k_v37_decoy.fasta --WARN_ON_MISSING_CONTIG false --LOG_FAILED_INTERVALS true --WRITE_ORIGINAL_ALLELES false --LIFTOVER_MIN_MATCH 1.0 --ALLOW_MISSING_FIELDS_IN_HEADER false --RECOVER_SWAPPED_REF_ALT false --TAGS_TO_REVERSE AF --TAGS_TO_DROP MAX_AF --DISABLE_SORT false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Aug 24, 2020 11:36:50 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Mon Aug 24 11:36:50 EDT 2020] Executing as XXXXX@XX-XXXX on Linux 3.10.0-1127.13.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_242-b08; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.7.0
INFO 2020-08-24 11:36:51 LiftoverVcf Loading up the target reference genome.
INFO 2020-08-24 11:37:05 LiftoverVcf Lifting variants over and sorting (not yet writing the output file.)
[Mon Aug 24 11:55:40 EDT 2020] picard.vcf.LiftoverVcf done. Elapsed time: 18.89 minutes.
Runtime.totalMemory()=12168724480
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at htsjdk.tribble.readers.LongLineBufferedReader.readLine(LongLineBufferedReader.java:332)
at htsjdk.tribble.readers.LongLineBufferedReader.readLine(LongLineBufferedReader.java:356)
at htsjdk.tribble.readers.SynchronousLineReader.readLine(SynchronousLineReader.java:51)
at htsjdk.tribble.readers.LineIteratorImpl.advance(LineIteratorImpl.java:24)
at htsjdk.tribble.readers.LineIteratorImpl.advance(LineIteratorImpl.java:11)
at htsjdk.samtools.util.AbstractIterator.next(AbstractIterator.java:57)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:70)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:373)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:354)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:315)
at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:389)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
-
Hi Jose Arcadio Buendia, please include your command, GATK version, and entire stack trace for troubleshooting.
-
gatk LiftoverVcf --TMP_DIR tmp/ -I test.vcf -O out.vcf --CHAIN /igm/home/liftover/Hg38Tob37.over.chain --REJECT out.rejected.vcf -R /igm/apps/genomes/Homo_sapiens/human_g1k_v37_decoy/human_g1k_v37_decoy.fasta --WRITE_ORIGINAL_POSITION TRUE
-
Hi Jose Arcadio Buendia, you can set the memory allocation using the Java option -Xmx. More info can be found here: https://gatk.broadinstitute.org/hc/en-us/articles/360035531892-GATK4-command-line-syntax
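As a hedged sketch of that advice: the helper below picks a heap size from the input file size and passes it via --java-options. The 5x multiplier and the 8 GB floor are rough assumptions, not GATK-documented figures (LiftoverVcf holds lifted records in memory while sorting, so a large cohort VCF can need several times its file size in heap), and the file names are placeholders, not the poster's real paths.

```shell
#!/usr/bin/env bash
# Sketch: choose a Java heap for LiftoverVcf from the input size.

suggest_heap_gb() {
    local vcf_gb=$1
    local heap=$(( vcf_gb * 5 ))     # assumption: ~5x the file size
    if (( heap < 8 )); then heap=8; fi   # arbitrary 8 GB floor
    echo "$heap"
}

heap_gb=$(suggest_heap_gb 10)        # e.g. the ~10 GB chromosome VCF above

# Guarded so the sketch is harmless where gatk is not installed;
# input/output names below are placeholders.
if command -v gatk >/dev/null 2>&1; then
    gatk --java-options "-Xmx${heap_gb}G" LiftoverVcf \
        --INPUT input.vcf \
        --OUTPUT lifted.vcf \
        --REJECT rejected.vcf \
        --CHAIN Hg38Tob37.over.chain \
        --REFERENCE_SEQUENCE human_g1k_v37_decoy.fasta
fi
```

If the job still dies with OutOfMemoryError, raise the multiplier rather than re-running at the same size.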
-
Yes, thank you, that ended up working; it just required a ton of memory.
-
Glad it ended up working! Thanks for the update.
-
I'm having a similar issue. I am trying to use GATK LiftoverVcf on a WGS VCF. I am using GATK 4.1.8.1 on a VM with 100 GB of RAM. I have set my Java memory to 80 GB, but I still get a Java heap space error. The file is only 15 GB in size.
gatk --java-options "-Xmx80G" LiftoverVcf --INPUT xxx.vcf.gz --OUTPUT yyy.vcf.gz --REJECT zzz.vcf.gz --CHAIN $chain --REFERENCE_SEQUENCE $fasta
Using GATK jar /nfs/sw/gatk/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx80G -jar /nfs/sw/gatk/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar LiftoverVcf --INPUT xxxx/xxxx.vcf.gz --OUTPUT xxxx/xxxx.vcf.gz --REJECT xxxx/xxxxx.chr1.vcf.gz --CHAIN /gpfs/commons/groups/xxxxx/data/xxxxx/xxxxx/liftover/hg38ToHg19.over.chain.gz --REFERENCE_SEQUENCE /gpfs/commons/groups/xxxxx/data/xxxx/xxxxx/resources/hg19.fa
21:13:06.689 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/nfs/sw/gatk/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Dec 08 21:13:06 EST 2021] LiftoverVcf --INPUT filtered_data_biallelic/ADSP_annotated_chr1.vcf.gz --OUTPUT filtered_data_biallelic_hg37/ADSP_annotated_hg37_chr1.vcf.gz --CHAIN /gpfs/commons/groups/xxxx/data/yyyy/zzzz/liftover/hg38ToHg19.over.chain.gz --REJECT xxxx/zzzz --REFERENCE_SEQUENCE /gpfs/commons/groups/xxxxx/data/yyyy/zzzzz/zzzzz/hg19.fa --WARN_ON_MISSING_CONTIG false --LOG_FAILED_INTERVALS true --WRITE_ORIGINAL_POSITION false --WRITE_ORIGINAL_ALLELES false --LIFTOVER_MIN_MATCH 1.0 --ALLOW_MISSING_FIELDS_IN_HEADER false --RECOVER_SWAPPED_REF_ALT false --TAGS_TO_REVERSE AF --TAGS_TO_DROP MAX_AF --DISABLE_SORT false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Dec 08, 2021 9:13:07 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Wed Dec 08 21:13:07 EST 2021] Executing as xxxx@xxxx.org on Linux 3.10.0-1062.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.8.1
INFO 2021-12-08 21:13:08 LiftoverVcf Loading up the target reference genome.
INFO 2021-12-08 21:13:22 LiftoverVcf Lifting variants over and sorting (not yet writing the output file.)
INFO 2021-12-08 21:15:47 LiftOver Interval chr1:180992-180996 failed to match chain 2410 because intersection length 4 < minMatchSize 5.0 (0.8 < 1.0)
INFO 2021-12-08 21:15:47 LiftOver Interval chr1:180992-180996 failed to match chain 2576 because intersection length 1 < minMatchSize 5.0 (0.2 < 1.0)
INFO 2021-12-08 21:15:48 LiftOver Interval chr1:181768-181824 failed to match chain 2410 because intersection length 30 < minMatchSize 57.0 (0.5263158 < 1.0)
INFO 2021-12-08 21:15:48 LiftOver Interval chr1:181768-181824 failed to match chain 1926 because intersection length 27 < minMatchSize 57.0 (0.47368422 < 1.0)
INFO 2021-12-08 21:19:12 LiftOver Interval chr1:183291-183300 failed to match chain 1926 because intersection length 2 < minMatchSize 10.0 (0.2 < 1.0)
INFO 2021-12-08 21:19:12 LiftOver Interval chr1:183381-183401 failed to match chain 1926 because intersection length 14 < minMatchSize 21.0 (0.6666667 < 1.0)
INFO 2021-12-08 21:19:13 LiftOver Interval chr1:184426-184429 failed to match chain 2410 because intersection length 1 < minMatchSize 4.0 (0.25 < 1.0)
INFO 2021-12-08 21:19:13 LiftOver Interval chr1:184426-184429 failed to match chain 1926 because intersection length 3 < minMatchSize 4.0 (0.75 < 1.0)
INFO 2021-12-08 21:19:14 LiftOver Interval chr1:185455-185472 failed to match chain 1926 because intersection length 14 < minMatchSize 18.0 (0.7777778 < 1.0)
INFO 2021-12-08 21:19:44 LiftOver Interval chr1:196676-196687 failed to match chain 1926 because intersection length 11 < minMatchSize 12.0 (0.9166667 < 1.0)
INFO 2021-12-08 21:19:47 LiftOver Interval chr1:198559-198563 failed to match chain 1926 because intersection length 1 < minMatchSize 5.0 (0.2 < 1.0)
INFO 2021-12-08 21:19:47 LiftOver Interval chr1:198892-198895 failed to match chain 1926 because intersection length 3 < minMatchSize 4.0 (0.75 < 1.0)
INFO 2021-12-08 21:19:48 LiftOver Interval chr1:201410-201423 failed to match chain 1926 because intersection length 8 < minMatchSize 14.0 (0.5714286 < 1.0)
INFO 2021-12-08 21:21:08 LiftOver Interval chr1:455118-455119 failed to match chain 381 because intersection length 1 < minMatchSize 2.0 (0.5 < 1.0)
INFO 2021-12-08 21:21:16 LiftOver Interval chr1:494561-494593 failed to match chain 381 because intersection length 28 < minMatchSize 33.0 (0.8484849 < 1.0)
INFO 2021-12-08 21:23:40 LiftOver Interval chr1:533511-533521 failed to match chain 706 because intersection length 2 < minMatchSize 11.0 (0.18181819 < 1.0)
INFO 2021-12-08 21:23:40 LiftOver Interval chr1:533511-533521 failed to match chain 829 because intersection length 9 < minMatchSize 11.0 (0.8181818 < 1.0)
[Wed Dec 08 21:32:43 EST 2021] picard.vcf.LiftoverVcf done. Elapsed time: 19.61 minutes.
Runtime.totalMemory()=76355207168
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at htsjdk.tribble.util.ParsingUtils.split(ParsingUtils.java:266)
at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:375)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:328)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:48)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:173)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:205)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:149)
at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:411)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
-
Jose Arcadio Buendia, how much memory did you end up giving the command?
-
Hi chirag lakhani! There is one other common argument you can try to fix your Java heap space error, which is to directly specify a temporary directory. The LiftoverVcf command could be using a temporary directory that does not have enough room for its temporary files. You can specify the temporary directory with the --TMP_DIR option.
I hope this helps!
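A minimal sketch of that setup, assuming a scratch path and a 50 GB free-space floor that are illustrative, not values from this thread (the sort step spills its temporary files into whatever --TMP_DIR points at, so the directory needs real room):

```shell
# Sketch: give LiftoverVcf a temp directory with plenty of free space.
tmp_dir="${TMPDIR:-/tmp}/liftover_tmp"   # hypothetical scratch location
mkdir -p "$tmp_dir"

# df -Pk reports free space in 1 KB blocks for the filesystem holding tmp_dir
free_kb=$(df -Pk "$tmp_dir" | awk 'NR==2 {print $4}')
if [ "$free_kb" -lt $((50 * 1024 * 1024)) ]; then
    echo "warning: less than 50 GB free in $tmp_dir" >&2
fi

# Guarded so the sketch is harmless where gatk is not installed;
# file names are placeholders.
if command -v gatk >/dev/null 2>&1; then
    gatk --java-options "-Xmx80G" LiftoverVcf \
        --INPUT input.vcf.gz \
        --OUTPUT lifted.vcf.gz \
        --REJECT rejected.vcf.gz \
        --CHAIN hg38ToHg19.over.chain.gz \
        --REFERENCE_SEQUENCE hg19.fa \
        --TMP_DIR "$tmp_dir"
fi
```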
-
Thanks, I will try this out and let you know.
-
This is a different error message: it changed from Java heap space to GC overhead limit exceeded, so I think you should keep the temp directory option. Did you still run this with the -Xmx option? How much memory and disk space are available where you are running this job? How many samples are in this VCF, and are there a lot of alternate alleles at some loci?
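In the meantime, one more knob visible in your command log is --MAX_RECORDS_IN_RAM (500000 in these runs). Lowering it makes the in-memory sorter spill to the temp directory sooner, trading heap for disk. A sketch with an illustrative value of 100000 (not a recommendation) and placeholder file names:

```shell
# Sketch: lower --MAX_RECORDS_IN_RAM so sorting spills to disk sooner.
args=(
    --INPUT input.vcf.gz
    --OUTPUT lifted.vcf.gz
    --REJECT rejected.vcf.gz
    --CHAIN hg38ToHg19.over.chain.gz
    --REFERENCE_SEQUENCE hg19.fa
    --TMP_DIR /tmp/liftover_tmp
    --MAX_RECORDS_IN_RAM 100000
)

# Guarded so the sketch is harmless where gatk is not installed
if command -v gatk >/dev/null 2>&1; then
    gatk --java-options "-Xmx80G" LiftoverVcf "${args[@]}"
fi
```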
-
Sorry, I deleted the message from before. I modified the script by adding --TMP_DIR as well as --CREATE_INDEX true, which is how it failed. I am re-running it now without the --CREATE_INDEX flag and will let you know if it still fails. I am trying to run liftover on chromosome 1 of the gnomAD 3.1 VCF. Have you tried running liftover on the gnomAD data?