Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data



LiftoverVcf fails to lift over a GVCF file. Error java.lang.ArrayIndexOutOfBoundsException: -1


10 comments

  • Bhanu Gandham

    Hi,

     

    Can you please post the entire error log. Also where did you get the chain file from?

  • jorgez

    Hello Bhanu,

    I'm afraid I deleted the full logs, but I will re-run and get back to you.

     

    The chain file comes from UCSC:

    http://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz

     

    Thanks so much,

    Jorge

  • jorgez

    Hello again Bhanu,

    I forgot to get you the full log. Here it is.

    Thanks so much 

    Jorge

     

     

    Using GATK jar /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Core/gatk/4.1.6.0/gatk-package-4.1.6.0-local.jar

    Running:

        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Core/gatk/4.1.6.0/gatk-package-4.1.6.0-local.jar LiftoverVcf --INPUT=/mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/SAME/HG01679.alt_bwamem_GRCh38DH.20150826.IBS.exome.cram.g.vcf.gz --OUTPUT=/mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/5000_NEUROMEGEN-PE/lifted_over.vcf --CHAIN=/mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/LIFTOVER/hg38ToHg19.over.chain --REJECT=/mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/5000_NEUROMEGEN-PE/rejected_variants.vcf --REFERENCE_SEQUENCE=/mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/hg19.fa

    12:19:09.529 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Core/gatk/4.1.6.0/gatk-package-4.1.6.0-local.jar!/com/intel/gkl/native/libgkl_compression.so

    [Wed Jun 03 12:19:09 CEST 2020] LiftoverVcf  --INPUT /mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/SAME/HG01679.alt_bwamem_GRCh38DH.20150826.IBS.exome.cram.g.vcf.gz --OUTPUT /mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/5000_NEUROMEGEN-PE/lifted_over.vcf --CHAIN /mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/LIFTOVER/hg38ToHg19.over.chain --REJECT /mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/5000_NEUROMEGEN-PE/rejected_variants.vcf --REFERENCE_SEQUENCE /mnt/lustre/scratch/home/usc/mg/jzb/NORMALIZATION/hg19.fa  --WARN_ON_MISSING_CONTIG false --LOG_FAILED_INTERVALS true --WRITE_ORIGINAL_POSITION false --WRITE_ORIGINAL_ALLELES false --LIFTOVER_MIN_MATCH 1.0 --ALLOW_MISSING_FIELDS_IN_HEADER false --RECOVER_SWAPPED_REF_ALT false --TAGS_TO_REVERSE AF --TAGS_TO_DROP MAX_AF --DISABLE_SORT false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false

    Jun 03, 2020 12:19:10 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine

    INFO: Failed to detect whether we are running on Google Compute Engine.

    [Wed Jun 03 12:19:10 CEST 2020] Executing as uscmgjzb@c6606 on Linux 3.10.0-862.14.4.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.6.0

    INFO 2020-06-03 12:19:12 LiftoverVcf Loading up the target reference genome.

    INFO 2020-06-03 12:19:29 LiftoverVcf Lifting variants over and sorting (not yet writing the output file.)

    INFO 2020-06-03 12:19:29 LiftOver Interval chr1:1-10365 failed to match chain 2 because intersection length 365 < minMatchSize 10365.0 (0.035214666 < 1.0)

    INFO 2020-06-03 12:19:30 LiftOver Interval chr1:176332-180923 failed to match chain 977 because intersection length 38 < minMatchSize 4592.0 (0.008275261 < 1.0)

    INFO 2020-06-03 12:19:30 LiftOver Interval chr1:176332-180923 failed to match chain 2410 because intersection length 49 < minMatchSize 4592.0 (0.010670732 < 1.0)

    INFO 2020-06-03 12:19:30 LiftOver Interval chr1:176332-180923 failed to match chain 2 because intersection length 1086 < minMatchSize 4592.0 (0.23649825 < 1.0)

    INFO 2020-06-03 12:19:30 LiftOver Interval chr1:176332-180923 failed to match chain 23749811 because intersection length 63 < minMatchSize 4592.0 (0.013719512 < 1.0)

    INFO 2020-06-03 12:19:30 LiftOver Interval chr1:176332-180923 failed to match chain 2576 because intersection length 29 < minMatchSize 4592.0 (0.006315331 < 1.0)

    INFO 2020-06-03 12:19:30 LiftOver Interval chr1:176332-180923 failed to match chain 816 because intersection length 3296 < minMatchSize 4592.0 (0.71777004 < 1.0)

    INFO 2020-06-03 12:19:30 LiftOver Interval chr1:176332-180923 failed to match chain 7666873 because intersection length 26 < minMatchSize 4592.0 (0.0056620208 < 1.0)

    INFO 2020-06-03 12:19:30 LiftOver Interval chr1:180924-180949 failed to match chain 2410 because intersection length 11 < minMatchSize 26.0 (0.42307693 < 1.0)

    INFO 2020-06-03 12:19:30 LiftOver Interval chr1:180924-180949 failed to match chain 2576 because intersection length 15 < minMatchSize 26.0 (0.5769231 < 1.0)

    [Wed Jun 03 12:19:30 CEST 2020] picard.vcf.LiftoverVcf done. Elapsed time: 0.35 minutes.

    Runtime.totalMemory()=4290260992

    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp

    java.lang.ArrayIndexOutOfBoundsException: -1

    at picard.util.LiftoverUtils.lambda$leftAlignVariant$4(LiftoverUtils.java:379)

    at java.util.stream.Collectors.lambda$groupingBy$45(Collectors.java:907)

    at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)

    at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1625)

    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)

    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)

    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)

    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)

    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)

    at picard.util.LiftoverUtils.leftAlignVariant(LiftoverUtils.java:379)

    at picard.util.LiftoverUtils.reverseComplementVariantContext(LiftoverUtils.java:178)

    at picard.util.LiftoverUtils.liftVariant(LiftoverUtils.java:76)

    at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:426)

    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)

    at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)

    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)

    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)

    at org.broadinstitute.hellbender.Main.main(Main.java:292)

    slurmstepd: error: Exceeded step memory limit at some point.

     

     

  • Bhanu Gandham

    Hi jorgez

    This happens when the contig names are not compatible between the dictionary and the chain file:

    1. Can you please confirm that the contig names are compatible between dict file and chain file?
    2. Can you please share your sequence dictionary file?
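One way to make that check concrete is to extract the contig names from both files and compare them. A minimal shell sketch, using tiny inline stand-ins instead of the real hg19.dict and hg38ToHg19.over.chain (the file names and sample data here are illustrative only):

```shell
# Build tiny stand-ins for the real dict and chain files (illustrative data only).
printf '@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chr2\tLN:243199373\n' > sample.dict
printf 'chain 1000 chr1 1000 + 0 1000 chr1 1000 + 0 1000 1\n'  > sample.chain
printf 'chain 900 chr2 500 + 0 500 chr2_fix 500 + 0 500 2\n'  >> sample.chain

# Contig names in the sequence dictionary (@SQ lines, SN: field):
grep '^@SQ' sample.dict | cut -f2 | sed 's/^SN://' | sort > dict_contigs.txt

# Target-side contig names in the chain file
# ("chain" header lines: field 8 is qName, the assembly being lifted to):
awk '$1=="chain"{print $8}' sample.chain | sort -u > chain_contigs.txt

# Chain contigs that are missing from the dict:
comm -23 chain_contigs.txt dict_contigs.txt
```

On the real files, the last command should print nothing when every lift-over target contig exists in the dictionary; any output marks a mismatch.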
  • jorgez

    Hello Bhanu,

    I have shared my dict file here:
    https://gist.github.com/jazberna1/4b4ca9cba42753c5ca884687574b2e05#file-hg19-dict

    These are the contig names in the dict file:
    https://gist.github.com/jazberna1/4b4ca9cba42753c5ca884687574b2e05#file-contig_names_in_dict_file-txt

    These are the contig names in the chain file:
    https://gist.github.com/jazberna1/4b4ca9cba42753c5ca884687574b2e05#file-contig_names_in_chain_file-txt

    I see there are 395 contigs in the chain file that are not present in the dict file, by doing
    diff contig_names_in_chain_file.txt contig_names_in_dict_file.txt | grep '<' | wc -l

    I also see there are 19 contigs in the dict file that are not present in the chain file, by doing
    diff contig_names_in_chain_file.txt contig_names_in_dict_file.txt | grep '>' | wc -l

    Is it possible that those 19 contigs, listed below, are causing the issue?
    chr17_ctg5_hap1
    chr17_gl000206_random
    chr21_gl000210_random
    chr4_ctg9_hap1
    chr6_apd_hap1
    chr6_cox_hap2
    chr6_dbb_hap3
    chr6_mann_hap4
    chr6_mcf_hap5
    chr6_qbl_hap6
    chr6_ssto_hap7
    chr8_gl000197_random
    chr9_gl000201_random
    chrUn_gl000223
    chrUn_gl000227
    chrUn_gl000238
    chrUn_gl000242
    chrUn_gl000248
    chrUn_gl000249

    Many thanks
    Jorge

  • Bhanu Gandham

    Hi jorgez

    Yes, we suspect you are seeing this error because of the 19 contigs that are in the dict file but not in the chain file. Variants in your VCF may come from contigs that are not present in the chain file, which would make the chain file incompatible with your input and target references.

    There are many versions of hg19 out there. We explain the common ones here: https://gatk.broadinstitute.org/hc/en-us/articles/360035890711-GRCh37-hg19-b37-humanG1Kv37-Human-Reference-Discrepancies That doc should give you a better understanding of why differences between hg19 versions cause this kind of error. Here is the resource bundle with a few chain files we provide, though I am not sure whether it helps in your case: https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle

    Sorry if this is not very helpful, but incompatible chain and reference files are a difficult issue to resolve.

  • jorgez

    Hello Bhanu,

    I am afraid the problem persists even when lifting over to a reference containing only the autosomes, X, Y and M, like this:

    chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrM
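For reference, a main-chromosome-only FASTA like that can be produced with `samtools faidx hg19.fa chr1 chr2 ... chrM`, or with a short awk filter. A minimal sketch on a tiny inline FASTA (file names and sequences are illustrative only):

```shell
# Tiny stand-in FASTA (illustrative only); the real input would be hg19.fa.
printf '>chr1\nACGT\n>chr1_gl000191_random\nTTTT\n>chrM\nGGCC\n' > sample.fa

# Keep only records whose name is chr1-chr22, chrX, chrY or chrM.
awk '/^>/{name=substr($1,2); keep=(name ~ /^chr([0-9]+|X|Y|M)$/)} keep' sample.fa > main.fa
cat main.fa
```

After subsetting, the index (.fai) and sequence dictionary (.dict) would need to be regenerated to match the new FASTA.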

    I also tried from a fresh download of the hg19 reference fasta file:

    ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta.gz

    Again, the same error occurred.

    Is it possible that the error is related to the g.vcf file? If so, do you happen to know of a tool that checks g.vcf files for inconsistencies?

    Many thanks for your help

    Jorge

     

     

  • Bhanu Gandham

    Hi jorgez

     

    Did you rerun with a new chain file or with the old one? With a new version of hg19 you would also need the corresponding chain file.

  • jorgez

    Hello Bhanu,

    This is the only chain file I have used:

    http://hgdownload.cse.ucsc.edu/goldenpath/hg38/liftOver/hg38ToHg19.over.chain.gz

    Jorge

     

  • Bhanu Gandham

    Hi jorgez

     

    Ah, that is the issue. You need a chain file built specifically for your original and target assemblies, and I think that is why you are facing this problem. This is not a GATK issue as such, but take a look at this link: https://groups.google.com/a/soe.ucsc.edu/g/genome/c/LCNw5ADFuJk
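One quick sanity check along those lines: a chain file built for the exact target reference should carry qSize values (field 9 of its "chain" header lines) equal to the LN: lengths in the dict, even when the contig names agree. A minimal sketch on tiny inline stand-ins (the data below is hypothetical):

```shell
# Tiny stand-ins (illustrative only) for the real dict and chain files.
printf '@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chr2\tLN:100\n' > sample.dict
printf 'chain 1000 chr1 248956422 + 0 1000 chr1 249250621 + 0 1000 1\n'  > sample.chain
printf 'chain 900 chr2 500 + 0 500 chr2 99 + 0 99 2\n'                  >> sample.chain

# For every chain whose target contig (qName, $8) is in the dict,
# flag length disagreements between the chain qSize ($9) and the dict LN.
awk 'NR==FNR && /^@SQ/ { split($2,a,":"); split($3,b,":"); len[a[2]]=b[2]; next }
     $1=="chain" && ($8 in len) && len[$8] != $9 {
         print $8 ": chain qSize " $9 " != dict LN " len[$8]
     }' sample.dict sample.chain > chain_vs_dict_mismatches.txt
cat chain_vs_dict_mismatches.txt
```

A contig whose name matches but whose length differs is exactly the kind of hg19-flavor discrepancy described in the doc linked earlier in this thread.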

