java.lang.ArrayIndexOutOfBoundsException: 32772 while running GenotypeGVCFs
I don't get a GATK error, but I do get the message java.lang.ArrayIndexOutOfBoundsException: 32772
a) GATK version used: 4.2.0
b) Exact command used: I ran this line for GenomicsDBImport
gatk GenomicsDBImport -V MA1.g.vcf -V MA2.g.vcf -V MA3.g.vcf -V MH1.g.vcf -V MH2.g.vcf -V MH3.g.vcf -V F4_1.g.vcf -V F4_2.g.vcf -V F4_3.g.vcf --genomicsdb-workspace-path my_database1AB -L 1A -L 1B -L 2A -L 2B -L 3A -L 3B -L 4A -L 4B -L 5A -L 5B -L 6A -L 6B -L 7A -L 7B
and this one for GenotypeGVCFs:
gatk --java-options "-Xmx12g -Xms12g" GenotypeGVCFs -R Triticum_dicoccoides.WEWSeq_v.1.0.dna.toplevel.fa -V gendb://my_database -O output.vcf.gz --new-qual --tmp-dir temp/
c) Entire error log:
Using GATK jar /home/alonzi/miniconda3/envs/rna-seq/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx12g -Xms12g -jar /home/alonzi/miniconda3/envs/rna-seq/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar GenotypeGVCFs -R Triticum_dicoccoides.WEWSeq_v.1.0.dna.toplevel.fa -V gendb://my_database -O output.vcf.gz --new-qual --tmp-dir temp/
14:28:22.448 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/alonzi/miniconda3/envs/rna-seq/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 07, 2021 2:28:22 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
14:28:22.617 INFO GenotypeGVCFs - ------------------------------------------------------------
14:28:22.618 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.2.0.0
14:28:22.618 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
14:28:22.618 INFO GenotypeGVCFs - Executing as alonzi@khalil1 on Linux v4.19.0-17-amd64 amd64
14:28:22.618 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_282-b08
14:28:22.618 INFO GenotypeGVCFs - Start Date/Time: July 7, 2021 2:28:22 PM IDT
14:28:22.618 INFO GenotypeGVCFs - ------------------------------------------------------------
14:28:22.618 INFO GenotypeGVCFs - ------------------------------------------------------------
14:28:22.618 INFO GenotypeGVCFs - HTSJDK Version: 2.24.0
14:28:22.618 INFO GenotypeGVCFs - Picard Version: 2.25.0
14:28:22.618 INFO GenotypeGVCFs - Built for Spark Version: 2.4.5
14:28:22.619 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:28:22.619 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:28:22.619 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:28:22.619 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:28:22.619 INFO GenotypeGVCFs - Deflater: IntelDeflater
14:28:22.619 INFO GenotypeGVCFs - Inflater: IntelInflater
14:28:22.619 INFO GenotypeGVCFs - GCS max retries/reopens: 20
14:28:22.619 INFO GenotypeGVCFs - Requester pays: disabled
14:28:22.619 INFO GenotypeGVCFs - Initializing engine
14:28:24.073 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.2-e18fa63
14:28:24.123 info NativeGenomicsDB - pid=21427 tid=21428 No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
14:28:24.123 info NativeGenomicsDB - pid=21427 tid=21428 No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
14:28:24.123 info NativeGenomicsDB - pid=21427 tid=21428 No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
14:28:24.177 INFO GenotypeGVCFs - Done initializing engine
14:28:24.210 INFO ProgressMeter - Starting traversal
14:28:24.210 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
14:28:47.179 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position 1A:219798 and possibly subsequent; at least 10 samples must have called genotypes
14:28:47.373 INFO ProgressMeter - 1A:568402 0.4 1000 2590.3
14:28:57.413 INFO ProgressMeter - 1A:44165059 0.6 255000 460815.6
14:29:07.419 INFO ProgressMeter - 1A:78552884 0.7 435000 604040.8
14:29:25.201 INFO ProgressMeter - 1A:137636565 1.0 670000 659113.6
14:29:35.211 INFO ProgressMeter - 1A:278089494 1.2 994000 839988.2
14:29:45.226 INFO ProgressMeter - 1A:317697103 1.4 1162000 860570.8
14:30:01.906 INFO ProgressMeter - 1A:363225043 1.6 1347000 827260.1
14:30:12.084 INFO ProgressMeter - 1A:441459399 1.8 1676000 932198.7
14:30:22.093 INFO ProgressMeter - 1A:466677934 2.0 1835000 933976.9
14:30:38.874 INFO ProgressMeter - 1A:495722203 2.2 1996000 889324.5
14:30:48.882 INFO ProgressMeter - 1A:536558193 2.4 2320000 962176.5
14:30:49.143 INFO GenotypeGVCFs - Shutting down engine
GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),84.09050524498751,Cpu time(s),59.479603645012425
[July 7, 2021 2:30:49 PM IDT] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 2.45 minutes.
Runtime.totalMemory()=12867076096
java.lang.ArrayIndexOutOfBoundsException: 32772
at htsjdk.samtools.BinningIndexBuilder.processFeature(BinningIndexBuilder.java:142)
at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeFeature(TabixIndexCreator.java:106)
at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeIndex(TabixIndexCreator.java:129)
at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.close(IndexingVariantContextWriter.java:177)
at htsjdk.variant.variantcontext.writer.VCFWriter.close(VCFWriter.java:233)
at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.closeTool(GenotypeGVCFs.java:295)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1064)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
What am I doing wrong? How can I find out what is happening?
-
Hi Alon Ziv,
Here is a forum post about a similar issue that might have a helpful workaround for you. This error is most likely occurring due to an error or mismatch in your gvcf files. I would also suggest running ValidateVariants on the files to pinpoint the problem.
Kind regards,
Pamela
-
Thanks for the quick reply! I will try running ValidateVariants.
I saw the forum post you sent, but I couldn't make it work right, so I'll try again.
Thank you!
Alon
-
Hi Pamela Bretscher,
I've tried to run ValidateVariants as follows:
gatk ValidateVariants -R Triticum_dicoccoides.WEWSeq_v.1.0.dna.toplevel.fa -V MA1.g.vcf
and received this error message:
Using GATK jar /home/alonzi/miniconda3/envs/rna-seq/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/alonzi/miniconda3/envs/rna-seq/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar ValidateVariants -R Triticum_dicoccoides.WEWSeq_v.1.0.dna.toplevel.fa -V MA1.g.vcf
14:54:31.346 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/alonzi/miniconda3/envs/rna-seq/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 08, 2021 2:54:31 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
14:54:31.464 INFO ValidateVariants - ------------------------------------------------------------
14:54:31.465 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.2.0.0
14:54:31.465 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
14:54:31.465 INFO ValidateVariants - Executing as alonzi@khalil1 on Linux v4.19.0-17-amd64 amd64
14:54:31.465 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_282-b08
14:54:31.465 INFO ValidateVariants - Start Date/Time: July 8, 2021 2:54:31 PM IDT
14:54:31.465 INFO ValidateVariants - ------------------------------------------------------------
14:54:31.465 INFO ValidateVariants - ------------------------------------------------------------
14:54:31.465 INFO ValidateVariants - HTSJDK Version: 2.24.0
14:54:31.465 INFO ValidateVariants - Picard Version: 2.25.0
14:54:31.465 INFO ValidateVariants - Built for Spark Version: 2.4.5
14:54:31.465 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:54:31.465 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:54:31.465 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:54:31.466 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:54:31.466 INFO ValidateVariants - Deflater: IntelDeflater
14:54:31.466 INFO ValidateVariants - Inflater: IntelInflater
14:54:31.466 INFO ValidateVariants - GCS max retries/reopens: 20
14:54:31.466 INFO ValidateVariants - Requester pays: disabled
14:54:31.466 INFO ValidateVariants - Initializing engine
14:54:31.717 INFO FeatureManager - Using codec VCFCodec to read file file:///media/alonzi/DATA/Alon/MA1.g.vcf
14:54:31.896 INFO ValidateVariants - Done initializing engine
14:54:31.896 WARN ValidateVariants - IDS validation cannot be done because no DBSNP file was provided
14:54:31.896 WARN ValidateVariants - Other possible validations will still be performed
14:54:31.896 INFO ProgressMeter - Starting traversal
14:54:31.896 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
14:54:32.049 INFO ValidateVariants - Shutting down engine
[July 8, 2021 2:54:32 PM IDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=964689920
***********************************************************************
A USER ERROR has occurred: Input MA1.g.vcf fails strict validation of type ALL: one or more of the ALT allele(s) for the record at position 1A:3456221 are not observed at all in the sample genotypes
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace
I'm assuming something is wrong with my GVCF files. Are there any suggestions about what I should do now?
Thanks in advance,
Alon
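A note for readers who hit the same ValidateVariants message: MA1.g.vcf is a GVCF, so its variant records list a <NON_REF> ALT allele that no genotype carries, and that may be what the strict ALLELES check is reporting. ValidateVariants has a GVCF mode (--validate-GVCF, short form -gvcf) meant for such files. The sketch below only prints the commands so they can be reviewed before running. The helper name print_validate_cmds is made up for illustration, and whether GVCF mode clears the message for these particular files is an assumption, not something confirmed in this thread.

```shell
# Print (do not run) one ValidateVariants command per GVCF, in GVCF mode.
# print_validate_cmds is a hypothetical helper, not part of GATK.
# usage: print_validate_cmds REF.fa FILE.g.vcf [FILE.g.vcf ...]
print_validate_cmds() {
    pv_ref=$1
    shift
    for pv_gvcf in "$@"; do
        # -gvcf is the short form of --validate-GVCF
        echo "gatk ValidateVariants -R $pv_ref -V $pv_gvcf -gvcf"
    done
}
# Example: print_validate_cmds Triticum_dicoccoides.WEWSeq_v.1.0.dna.toplevel.fa MA1.g.vcf MA2.g.vcf
```

Once the printed lines look right, they can be pasted into the terminal or piped to sh.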
-
Hi Alon Ziv,
I was able to find a few past forum posts with the same error message, and it seems that you most likely don't need to worry about this error. You can try the --warn-on-errors argument when running ValidateVariants so warnings will be emitted on these errors rather than terminating the job.
Kind regards,
Pamela
-
Hi Alon Ziv,
I apologize, the issue that I referenced in my previous response has actually been resolved already in an earlier version of GATK. If you are still seeing this error, could you please share the site that is causing the issue so we can look further into it?
Kind regards,
Pamela
-
Pamela Bretscher, regarding your last message: should I try running it with the --warn-on-errors argument and, if it still does not work, send you the site that is causing the issue? Also, how do I share the site that is causing the issue?
Thanks again,
Alon
-
Hi Alon Ziv,
No, you do not need to try --warn-on-errors because the issue this addresses has already been solved. Could you share the portion of the VCF file that is causing the error message when you run ValidateVariants?
Kind regards,
Pamela
-
Do you mean this picture?
The first error I get is at 1A:3456221, which is the first 'blue' rectangle, indicating a SNP in some of my samples.

-
Hi Alon Ziv,
I showed the original stack trace from your GenotypeGVCFs error to some of the GATK developers, and it is possible that the error is caused by a mismatch between your reference file and your VCF file headers. Could you verify that the headers/contig lengths in your VCF files match your reference file (Triticum_dicoccoides.WEWSeq_v.1.0.dna.toplevel.fa)? If everything is compatible, then I can submit this as a GitHub issue for our developers to investigate further.
Kind regards,
Pamela
-
Hi Pamela Bretscher,
These are my reference headers and contig lengths:
>1A dna:chromosome chromosome:WEWSeq_v.1.0:1A:1:593586810:1 REF
>1B dna:chromosome chromosome:WEWSeq_v.1.0:1B:1:690537804:1 REF
>2A dna:chromosome chromosome:WEWSeq_v.1.0:2A:1:775183943:1 REF
>2B dna:chromosome chromosome:WEWSeq_v.1.0:2B:1:803365466:1 REF
>3A dna:chromosome chromosome:WEWSeq_v.1.0:3A:1:754274518:1 REF
>3B dna:chromosome chromosome:WEWSeq_v.1.0:3B:1:841096276:1 REF
>4A dna:chromosome chromosome:WEWSeq_v.1.0:4A:1:726427787:1 REF
>4B dna:chromosome chromosome:WEWSeq_v.1.0:4B:1:673896466:1 REF
>5A dna:chromosome chromosome:WEWSeq_v.1.0:5A:1:700855599:1 REF
>5B dna:chromosome chromosome:WEWSeq_v.1.0:5B:1:712180895:1 REF
>6A dna:chromosome chromosome:WEWSeq_v.1.0:6A:1:621432051:1 REF
>6B dna:chromosome chromosome:WEWSeq_v.1.0:6B:1:703217322:1 REF
>7A dna:chromosome chromosome:WEWSeq_v.1.0:7A:1:727576108:1 REF
>7B dna:chromosome chromosome:WEWSeq_v.1.0:7B:1:755408349:1 REF
And here is an example from one of my VCF files (I checked all of them):
test of Alt vs. Ref read position bias">
##contig=<ID=1A,length=593586810>
##contig=<ID=1B,length=690537804>
##contig=<ID=2A,length=775183943>
##contig=<ID=2B,length=803365466>
##contig=<ID=3A,length=754274518>
##contig=<ID=3B,length=841096276>
##contig=<ID=4A,length=726427787>
##contig=<ID=4B,length=673896466>
##contig=<ID=5A,length=700855599>
##contig=<ID=5B,length=712180895>
##contig=<ID=6A,length=621432051>
##contig=<ID=6B,length=703217322>
##contig=<ID=7A,length=727576108>
##contig=<ID=7B,length=755408349>
##source=HaplotypeCaller
##bcftools_viewVersion=1.10.2+htslib-1.10.2
##bcftools_viewCommand=view --header-only 1.g.vcf; Date=Tue Jul 13 09:35:11 2021
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT F4_1
I don't see any differences. Is there something I'm missing here?
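For anyone who would rather not compare the two lists by eye, here is a minimal sketch of the same check. It assumes Ensembl-style FASTA headers like the ones shown above (length in the fifth colon-separated field of the third word) and ##contig lines that list ID before length. The function names are made up for illustration.

```shell
# Contig names and lengths from Ensembl-style FASTA headers, e.g.
# ">1A dna:chromosome chromosome:WEWSeq_v.1.0:1A:1:593586810:1 REF"
fasta_contigs() {
    grep '^>' "$1" |
        awk '{ split($3, f, ":"); print substr($1, 2) "\t" f[5] }'
}

# Contig names and lengths from "##contig=<ID=...,length=...>" header lines.
# Stops reading at the #CHROM line so the records are never scanned.
vcf_contigs() {
    awk -F '[=,>]' '
        /^##contig=<ID=/ { print $3 "\t" $5 }
        /^#CHROM/        { exit }
    ' "$1"
}

# Prints "contigs match" and returns 0 if both lists agree;
# otherwise prints the diff and returns a non-zero status.
compare_contigs() {   # usage: compare_contigs REF.fa FILE.vcf
    cc_ref=$(mktemp)
    cc_vcf=$(mktemp)
    fasta_contigs "$1" | sort > "$cc_ref"
    vcf_contigs "$2" | sort > "$cc_vcf"
    diff "$cc_ref" "$cc_vcf" && echo "contigs match"
    cc_status=$?
    rm -f "$cc_ref" "$cc_vcf"
    return "$cc_status"
}
# Example: compare_contigs Triticum_dicoccoides.WEWSeq_v.1.0.dna.toplevel.fa MA1.g.vcf
```

Scanning a large FASTA with grep takes a while; if the reference has a .fai index next to it, its first two columns hold the same names and lengths.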
-
Hi Alon Ziv,
From what I can tell, everything matches up, so I created a GitHub ticket so the issue can be investigated further. I'm going to see if there is a workaround you could use in the meantime and will follow up.
https://github.com/broadinstitute/gatk/issues/7348
Kind regards,
Pamela
-
Thanks, Pamela Bretscher.
I hope we will find a way to solve this or work around it.
-
Hi Alon Ziv,
Could you please post the lines of your MA1.g.vcf file that include the 1A:3456221 position causing the ValidateVariants error?
Thanks,
Pamela
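For readers wondering how to pull those lines out of a large GVCF, here is a minimal sketch. The function name site_context and the default window of 10 bp on either side are made up for illustration. It matches on the POS column only, so a reference block that starts before the window but reaches the site through its END value would not be printed.

```shell
# Print the records of a VCF/GVCF whose POS lies within +/- FLANK bp of a site.
# site_context is a hypothetical helper; FLANK defaults to 10.
# usage: site_context FILE CONTIG POS [FLANK]
site_context() {
    awk -v chr="$2" -v pos="$3" -v flank="${4:-10}" '
        /^#/ { next }    # skip header lines
        $1 == chr && $2 >= pos - flank && $2 <= pos + flank
    ' "$1"
}
# Example: site_context MA1.g.vcf 1A 3456221
```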
-
Pamela Bretscher, do you mean this?
1A 3456210 . G <NON_REF> . . END=3456210 GT:DP:GQ:MIN_DP:PL 0/0:21:51:21:0,51,765
1A 3456211 . A <NON_REF> . . END=3456212 GT:DP:GQ:MIN_DP:PL 0/0:21:45:21:0,45,675
1A 3456213 . T <NON_REF> . . END=3456214 GT:DP:GQ:MIN_DP:PL 0/0:20:42:20:0,42,630
1A 3456215 . G <NON_REF> . . END=3456217 GT:DP:GQ:MIN_DP:PL 0/0:18:39:18:0,39,585
1A 3456218 . T <NON_REF> . . END=3456220 GT:DP:GQ:MIN_DP:PL 0/0:17:27:17:0,27,405
1A 3456221 . C G,<NON_REF> 592.06 . DP=16;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQandDP=57600,16 GT:AD:DP:GQ:PGT:PID:PL:PS:SB 1|1:0,14,0:14:42:0|1:3456221_C_G:606,42,0,606,42,606:3456221:0,0,6,8
1A 3456222 . G <NON_REF> . . END=3456223 GT:DP:GQ:MIN_DP:PL 0/0:14:27:14:0,27,405
1A 3456224 . A <NON_REF> . . END=3456224 GT:DP:GQ:MIN_DP:PL 0/0:14:24:14:0,24,360
1A 3456225 . G <NON_REF> . . END=3456226 GT:DP:GQ:MIN_DP:PL 0/0:14:18:13:0,18,270
-
Hi Pamela Bretscher, I think I might have managed to solve my problem.
I created a BED file directly from the reference genome FASTA using
grep "^>" Triticum_dicoccoides.WEWSeq_v.1.0.dna.toplevel.fa > file.bed
and then just edited each line to look like this:
1A 1 593586810
1B 1 690537804
2A 1 775183943
2B 1 803365466
3A 1 754274518
3B 1 841096276
4A 1 726427787
4B 1 673896466
5A 1 700855599
5B 1 712180895
6A 1 621432051
6B 1 703217322
7A 1 727576108
7B 1 755408349
I then ran GenomicsDBImport with the BED file as the intervals:
gatk GenomicsDBImport -V 1.g.vcf -V 2.g.vcf -V 3.g.vcf -V 4.g.vcf -V 5.g.vcf -V 6.g.vcf -V 7.g.vcf -V 8.g.vcf -V 9.g.vcf --genomicsdb-workspace-path my_database1AB -L file.bed
and then ran the GenotypeGVCFs command, and it worked:
gatk --java-options "-Xmx12g -Xms12g" GenotypeGVCFs -R Triticum_dicoccoides.WEWSeq_v.1.0.dna.toplevel.fa -V gendb://my_database1AB -O global.vcf --new-qua
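For anyone repeating this, the manual editing step can probably be scripted. The sketch below assumes the Ensembl-style FASTA headers shown earlier in the thread (contig length in the fifth colon-separated field of the third word). The function name is made up for illustration. The start column defaults to 1 to mirror the lines above, but BED coordinates are conventionally 0-based, so passing 0 as the second argument may be what you want if the first base of each chromosome matters.

```shell
# Turn Ensembl-style FASTA header lines into one whole-contig interval per line:
# contig <TAB> start <TAB> length. fasta_to_intervals is a hypothetical helper.
# usage: fasta_to_intervals REF.fa [START]    (START defaults to 1, as in the post)
fasta_to_intervals() {
    grep '^>' "$1" |
        awk -v start="${2:-1}" '{
            split($3, f, ":")                       # f[5] holds the contig length
            print substr($1, 2) "\t" start "\t" f[5]
        }'
}
# Example: fasta_to_intervals Triticum_dicoccoides.WEWSeq_v.1.0.dna.toplevel.fa > file.bed
```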
-
Hi Alon Ziv,
I'm glad you were able to find a workaround and thank you for posting your solution here for other researchers who may have a similar problem!
Please let me know if you need anything else or have additional questions.
Kind regards,
Pamela
-
Hi Pamela Bretscher,
i don't have any additional questions at the moment
and again,
thank you for your help!!!
Alon
-
Change -O output.vcf.gz to -O output.vcf if any chromosome is larger than about 530 Mb.
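A possible explanation for why this helps, offered as a sketch rather than a confirmed diagnosis: the stack trace in the original post ends in TabixIndexCreator, and tabix (.tbi) indexes can only address contigs up to 2^29 = 536870912 bp. Every chromosome in this reference is longer than that. Writing an uncompressed .vcf makes GATK build a different kind of index, and the command that finally worked earlier in the thread also wrote a plain global.vcf. The helper names below are made up for illustration, and the input is assumed to be the reference's .fai index (contig name and length in the first two columns, as produced by samtools faidx).

```shell
# Largest contig length a tabix (.tbi) index can address: 2^29 bp.
TBI_MAX=536870912

# Print name and length of every contig in a .fai-style table that exceeds the limit.
contigs_over_tbi_limit() {   # usage: contigs_over_tbi_limit REF.fa.fai
    awk -v max="$TBI_MAX" '$2 > max { print $1 "\t" $2 }' "$1"
}

# Suggest an output name: plain .vcf if any contig is too long, else .vcf.gz.
choose_output() {            # usage: choose_output REF.fa.fai BASENAME
    if [ -n "$(contigs_over_tbi_limit "$1")" ]; then
        echo "$2.vcf"
    else
        echo "$2.vcf.gz"
    fi
}
# Example: choose_output Triticum_dicoccoides.WEWSeq_v.1.0.dna.toplevel.fa.fai output
```

The result can then be passed to -O in the GenotypeGVCFs command.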