LiftoverVcf fails to lift over a GVCF file. Error java.lang.ArrayIndexOutOfBoundsException: -1
Hello,
LiftoverVcf gives me the error java.lang.ArrayIndexOutOfBoundsException: -1 when trying to lift over a GVCF file from hg38 to hg19. I have validated the input VCF file with ValidateVariants and checked that the reference FASTA file is accompanied by its .fai and .dict files.
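For reference, the validation step looked roughly like this (a sketch; the hg38 FASTA name below is a placeholder, as I have not listed it in this thread):
# Validate the GVCF against the source (hg38) reference before attempting liftover.
# "hg38.fa" is a placeholder for the actual hg38 reference FASTA used.
gatk ValidateVariants \
    -V HG01679.alt_bwamem_GRCh38DH.20150826.IBS.exome.cram.g.vcf.gz \
    -R hg38.fa \
    --validate-GVCF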
I have provided below the requested details.
Any advice will be much appreciated.
Thanks,
Jorge
a) GATK version used:
4.1.6.0
b) Exact GATK commands used:
gatk LiftoverVcf \
--INPUT=HG01679.alt_bwamem_GRCh38DH.20150826.IBS.exome.cram.g.vcf.gz \
--OUTPUT=lifted_over.vcf \
--CHAIN=hg38ToHg19.over.chain.gz \
--REJECT=rejected_variants.vcf \
--REFERENCE_SEQUENCE=hg19.fa
c) The entire error log if applicable.
java.lang.ArrayIndexOutOfBoundsException: -1
at picard.util.LiftoverUtils.lambda$leftAlignVariant$4(LiftoverUtils.java:379)
at java.util.stream.Collectors.lambda$groupingBy$45(Collectors.java:907)
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1625)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at picard.util.LiftoverUtils.leftAlignVariant(LiftoverUtils.java:379)
at picard.util.LiftoverUtils.reverseComplementVariantContext(LiftoverUtils.java:178)
at picard.util.LiftoverUtils.liftVariant(LiftoverUtils.java:76)
at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:426)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
-
Hi,
Can you please post the entire error log? Also, where did you get the chain file from?
-
Hello Bhanu,
I'm afraid I deleted the full logs, but I will re-run and get back to you.
The chain file comes from UCSC:
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz
Thanks so much,
Jorge
-
Hello again Bhanu,
I still owed you the full log. Here it is.
Thanks so much
Jorge
Using GATK jar /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Core/gatk/4.1.6.0/gatk-package-4.1.6.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Core/gatk/4.1.6.0/gatk-package-4.1.6.0-local.jar LiftoverVcf --INPUT=/mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/SAME/HG01679.alt_bwamem_GRCh38DH.20150826.IBS.exome.cram.g.vcf.gz --OUTPUT=/mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/5000_NEUROMEGEN-PE/lifted_over.vcf --CHAIN=/mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/LIFTOVER/hg38ToHg19.over.chain --REJECT=/mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/5000_NEUROMEGEN-PE/rejected_variants.vcf --REFERENCE_SEQUENCE=/mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/hg19.fa
12:19:09.529 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Core/gatk/4.1.6.0/gatk-package-4.1.6.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Jun 03 12:19:09 CEST 2020] LiftoverVcf --INPUT /mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/SAME/HG01679.alt_bwamem_GRCh38DH.20150826.IBS.exome.cram.g.vcf.gz --OUTPUT /mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/5000_NEUROMEGEN-PE/lifted_over.vcf --CHAIN /mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/LIFTOVER/hg38ToHg19.over.chain --REJECT /mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/5000_NEUROMEGEN-PE/rejected_variants.vcf --REFERENCE_SEQUENCE /mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/hg19.fa --WARN_ON_MISSING_CONTIG false --LOG_FAILED_INTERVALS true --WRITE_ORIGINAL_POSITION false --WRITE_ORIGINAL_ALLELES false --LIFTOVER_MIN_MATCH 1.0 --ALLOW_MISSING_FIELDS_IN_HEADER false --RECOVER_SWAPPED_REF_ALT false --TAGS_TO_REVERSE AF --TAGS_TO_DROP MAX_AF --DISABLE_SORT false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Jun 03, 2020 12:19:10 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Wed Jun 03 12:19:10 CEST 2020] Executing as uscmgjzb@c6606 on Linux 3.10.0-862.14.4.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.6.0
INFO 2020-06-03 12:19:12 LiftoverVcf Loading up the target reference genome.
INFO 2020-06-03 12:19:29 LiftoverVcf Lifting variants over and sorting (not yet writing the output file.)
INFO 2020-06-03 12:19:29 LiftOver Interval chr1:1-10365 failed to match chain 2 because intersection length 365 < minMatchSize 10365.0 (0.035214666 < 1.0)
INFO 2020-06-03 12:19:30 LiftOver Interval chr1:176332-180923 failed to match chain 977 because intersection length 38 < minMatchSize 4592.0 (0.008275261 < 1.0)
INFO 2020-06-03 12:19:30 LiftOver Interval chr1:176332-180923 failed to match chain 2410 because intersection length 49 < minMatchSize 4592.0 (0.010670732 < 1.0)
INFO 2020-06-03 12:19:30 LiftOver Interval chr1:176332-180923 failed to match chain 2 because intersection length 1086 < minMatchSize 4592.0 (0.23649825 < 1.0)
INFO 2020-06-03 12:19:30 LiftOver Interval chr1:176332-180923 failed to match chain 23749811 because intersection length 63 < minMatchSize 4592.0 (0.013719512 < 1.0)
INFO 2020-06-03 12:19:30 LiftOver Interval chr1:176332-180923 failed to match chain 2576 because intersection length 29 < minMatchSize 4592.0 (0.006315331 < 1.0)
INFO 2020-06-03 12:19:30 LiftOver Interval chr1:176332-180923 failed to match chain 816 because intersection length 3296 < minMatchSize 4592.0 (0.71777004 < 1.0)
INFO 2020-06-03 12:19:30 LiftOver Interval chr1:176332-180923 failed to match chain 7666873 because intersection length 26 < minMatchSize 4592.0 (0.0056620208 < 1.0)
INFO 2020-06-03 12:19:30 LiftOver Interval chr1:180924-180949 failed to match chain 2410 because intersection length 11 < minMatchSize 26.0 (0.42307693 < 1.0)
INFO 2020-06-03 12:19:30 LiftOver Interval chr1:180924-180949 failed to match chain 2576 because intersection length 15 < minMatchSize 26.0 (0.5769231 < 1.0)
[Wed Jun 03 12:19:30 CEST 2020] picard.vcf.LiftoverVcf done. Elapsed time: 0.35 minutes.
Runtime.totalMemory()=4290260992
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
java.lang.ArrayIndexOutOfBoundsException: -1
at picard.util.LiftoverUtils.lambda$leftAlignVariant$4(LiftoverUtils.java:379)
at java.util.stream.Collectors.lambda$groupingBy$45(Collectors.java:907)
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1625)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at picard.util.LiftoverUtils.leftAlignVariant(LiftoverUtils.java:379)
at picard.util.LiftoverUtils.reverseComplementVariantContext(LiftoverUtils.java:178)
at picard.util.LiftoverUtils.liftVariant(LiftoverUtils.java:76)
at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:426)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
slurmstepd: error: Exceeded step memory limit at some point.
-
Hi jorgez
This happens when the contig names are not compatible between the sequence dictionary and the chain file.
- Can you please confirm that the contig names are compatible between the dict file and the chain file?
- Can you please share your sequence dictionary file?
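One quick way to compare them, assuming the standard .dict and UCSC chain formats (adjust the file names to yours):
# Contig names declared in the sequence dictionary (the SN: field of each @SQ line)
grep '^@SQ' hg19.dict | cut -f 2 | sed 's/^SN://' | sort -u > contig_names_in_dict_file.txt
# Target-assembly (hg19) contig names in the chain file: field 8 of each "chain" header line
zcat hg38ToHg19.over.chain.gz | awk '$1 == "chain" {print $8}' | sort -u > contig_names_in_chain_file.txt
# Names present in one file but not the other
diff contig_names_in_chain_file.txt contig_names_in_dict_file.txt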
-
Hello Bhanu,
I have shared my dict file here:
https://gist.github.com/jazberna1/4b4ca9cba42753c5ca884687574b2e05#file-hg19-dict
These are the contig names in the dict file:
https://gist.github.com/jazberna1/4b4ca9cba42753c5ca884687574b2e05#file-contig_names_in_dict_file-txt
These are the contig names in the chain file:
https://gist.github.com/jazberna1/4b4ca9cba42753c5ca884687574b2e05#file-contig_names_in_chain_file-txt
I see there are 395 contigs in the chain file not present in the dict file by doing:
diff contig_names_in_chain_file.txt contig_names_in_dict_file.txt | grep '<' | wc -l
I also see there are 19 contigs in the dict file not present in the chain file by doing:
diff contig_names_in_chain_file.txt contig_names_in_dict_file.txt | grep '>' | wc -l
Is it possible the 19 contigs listed below are causing the issue?
chr17_ctg5_hap1
chr17_gl000206_random
chr21_gl000210_random
chr4_ctg9_hap1
chr6_apd_hap1
chr6_cox_hap2
chr6_dbb_hap3
chr6_mann_hap4
chr6_mcf_hap5
chr6_qbl_hap6
chr6_ssto_hap7
chr8_gl000197_random
chr9_gl000201_random
chrUn_gl000223
chrUn_gl000227
chrUn_gl000238
chrUn_gl000242
chrUn_gl000248
chrUn_gl000249
Many thanks,
Jorge
-
Hi jorgez
Yes, we suspect the reason you are seeing this error is the 19 contigs in the dict file that are not present in the chain file. Variants in your VCF may be on contigs that are not present in the chain file, which means the chain file is not compatible with the input and target references.
There are lots of versions of hg19 out there. We created a doc explaining the common ones: https://gatk.broadinstitute.org/hc/en-us/articles/360035890711-GRCh37-hg19-b37-humanG1Kv37-Human-Reference-Discrepancies It should give you a better understanding of why differences between hg19 versions cause such errors. Here is the resource bundle with a few chain files we provide, although I am not sure whether it is helpful in your case: https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle
Sorry if this is not very helpful, but issues with incompatible chain and reference files are difficult to solve.
-
Hello Bhanu,
I am afraid the problem persists even when lifting over to a reference containing only the autosomes, X, Y and M, like this:
chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrM
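For anyone following along, one way to build such a subsetted reference (a sketch, assuming samtools is available; file names are illustrative):
# Extract only the primary contigs from the full hg19 FASTA (file names are illustrative)
samtools faidx hg19.fa chr{1..22} chrX chrY chrM > hg19.primary.fa
# Re-index and rebuild the sequence dictionary for the subsetted reference
samtools faidx hg19.primary.fa
gatk CreateSequenceDictionary -R hg19.primary.fa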
I also tried from a fresh download of the hg19 reference fasta file:
ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta.gz
Again, the same error occurred.
Is it possible that the error is related to the g.vcf file itself? If so, do you know of a tool that checks g.vcf files for inconsistencies?
Many thanks for your help,
Jorge
-
Hi jorgez
Did you re-run with a new chain file or with the old one? With a new version of hg19 you would need the corresponding chain file too.
-
Hello Bhanu,
This is the only chain file I have used:
http://hgdownload.cse.ucsc.edu/goldenpath/hg38/liftOver/hg38ToHg19.over.chain.gz
Jorge
-
Hi jorgez
Ah, that is the issue. You need a chain file built specifically for your original and target assemblies; I think that is why you are seeing this error. This is not a GATK issue, but take a look at this link: https://groups.google.com/a/soe.ucsc.edu/g/genome/c/LCNw5ADFuJk
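If it helps, you can see which contigs a chain file actually maps by inspecting its header lines (a sketch assuming the standard UCSC chain format; file names are the ones from this thread):
# Each "chain" header line has the source (hg38) contig in field 3 and the target (hg19) contig in field 8
zcat hg38ToHg19.over.chain.gz | awk '$1 == "chain" {print $3, "->", $8}' | sort -u | head
# Compare with the contigs declared in the GVCF header (the source side)
zcat HG01679.alt_bwamem_GRCh38DH.20150826.IBS.exome.cram.g.vcf.gz | awk -F'[=,]' '/^##contig/ {print $3}' | sort -u | head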