HaplotypeCaller java.lang.ArrayIndexOutOfBoundsException: Index 32770 out of bounds for length 32770
I keep getting this error when running GATK HaplotypeCaller on my amphibian sequence dataset, after it has been running for 6-7 hours. Can you please advise me on how to proceed? My BAM file has a .csi index.
REQUIRED for all errors and issues:
a) GATK version used: gatk/4.2.5.0
b) Exact command used:
gatk HaplotypeCaller \
-R ${REF} \
-I ${dir}/bams/${BAM} \
-L ${INT} \
-O ${dir}/vcfs/${ID}.nd.g.vcf.gz \
-ERC GVCF
c) Entire program log: <NOTE: entire log was too large to upload here so I omitted most of the middle.>
WARNING: GATK v4.2.5.0 support for Java 11 is in beta state. Use at your own risk.
WARNING: GATK v4.2.5.0 support for Java 11 is in beta state. Use at your own risk.
WARNING: GATK v4.2.5.0 support for Java 11 is in beta state. Use at your own risk.
Inactive Modules:
1) curl/7.78.0
Due to MODULEPATH changes, the following have been reloaded:
1) bzip2/1.0.8 2) ncurses/6.2 3) xz/5.2.5 4) zlib/1.2.11
The following have been reloaded with a version change:
1) binutils/2.37 => binutils/2.35 2) gcccore/11.2.0 => gcccore/10.2.0
16:52:23.749 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/easybuild-2019/easybuild/software/compiler/gcccore/10.2.0/gatk/4.2.5.0-python-3.8.6/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jun 19, 2023 4:52:23 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:52:23.950 INFO HaplotypeCaller - ------------------------------------------------------------
16:52:23.951 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.2.5.0
16:52:23.951 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
16:52:23.951 INFO HaplotypeCaller - Executing as tkosch@spartan-bm021.hpc.unimelb.edu.au on Linux v3.10.0-1160.66.1.el7.x86_64 amd64
16:52:23.951 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v11.0.2+9
16:52:23.951 INFO HaplotypeCaller - Start Date/Time: 19 June 2023 at 4:52:23 pm AEST
16:52:23.951 INFO HaplotypeCaller - ------------------------------------------------------------
16:52:23.951 INFO HaplotypeCaller - ------------------------------------------------------------
16:52:23.953 INFO HaplotypeCaller - HTSJDK Version: 2.24.1
16:52:23.953 INFO HaplotypeCaller - Picard Version: 2.25.4
16:52:23.953 INFO HaplotypeCaller - Built for Spark Version: 2.4.5
16:52:23.953 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:52:23.953 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:52:23.953 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:52:23.953 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:52:23.953 INFO HaplotypeCaller - Deflater: IntelDeflater
16:52:23.953 INFO HaplotypeCaller - Inflater: IntelInflater
16:52:23.953 INFO HaplotypeCaller - GCS max retries/reopens: 20
16:52:23.953 INFO HaplotypeCaller - Requester pays: disabled
16:52:23.953 INFO HaplotypeCaller - Initializing engine
16:52:24.620 INFO FeatureManager - Using codec IntervalListCodec to read file file:///data/gpfs/projects/punim1525/Projects/PSCO-genome/chromosomes.interval_list
16:52:24.633 INFO IntervalArgumentCollection - Processing 8190274449 bp from intervals
16:52:24.651 INFO HaplotypeCaller - Done initializing engine
16:52:24.668 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
16:52:24.762 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
16:52:24.762 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
16:52:24.791 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/usr/local/easybuild-2019/easybuild/software/compiler/gcccore/10.2.0/gatk/4.2.5.0-python-3.8.6/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
16:52:24.794 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/usr/local/easybuild-2019/easybuild/software/compiler/gcccore/10.2.0/gatk/4.2.5.0-python-3.8.6/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
16:52:24.824 INFO IntelPairHmm - Using CPU-supported AVX-512 instructions
16:52:24.825 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
16:52:24.825 INFO IntelPairHmm - Available threads: 2
16:52:24.825 INFO IntelPairHmm - Requested threads: 4
16:52:24.825 WARN IntelPairHmm - Using 2 available threads, but 4 were requested
16:52:24.825 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
16:52:24.949 INFO ProgressMeter - Starting traversal
16:52:24.950 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
16:52:26.843 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position 1:25606 and possibly subsequent; at least 10 samples must have called genotypes
16:52:34.980 INFO ProgressMeter - 1:129143 0.2 660 3948.5
16:52:44.966 WARN DepthPerSampleHC - Annotation will not be calculated at position 1:277510 and possibly subsequent; genotype for sample 08UCPB10 is not called
16:52:44.967 WARN StrandBiasBySample - Annotation will not be calculated at position 1:277510 and possibly subsequent; genotype for sample 08UCPB10 is not called
16:52:45.000 INFO ProgressMeter - 1:278401 0.3 1550 4638.4
16:52:55.518 INFO ProgressMeter - 1:498043 0.5 2530 4966.0
...
23:54:38.786 INFO ProgressMeter - 1:535614399 422.2 2607420 6175.3
23:54:48.809 INFO ProgressMeter - 1:536385965 422.4 2610510 6180.2
23:54:59.574 INFO ProgressMeter - 1:536842572 422.6 2612370 6182.0
23:55:00.307 INFO HaplotypeCaller - Shutting down engine
[19 June 2023 at 11:55:00 pm AEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 422.61 minutes.
Runtime.totalMemory()=5343543296
java.lang.ArrayIndexOutOfBoundsException: Index 32770 out of bounds for length 32770
at htsjdk.samtools.BinningIndexBuilder.processFeature(BinningIndexBuilder.java:142)
at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeFeature(TabixIndexCreator.java:106)
at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeIndex(TabixIndexCreator.java:129)
at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.close(IndexingVariantContextWriter.java:177)
at htsjdk.variant.variantcontext.writer.VCFWriter.close(VCFWriter.java:233)
at org.broadinstitute.hellbender.utils.variant.writers.GVCFWriter.close(GVCFWriter.java:71)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.closeTool(HaplotypeCaller.java:279)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1091)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /usr/local/easybuild-2019/easybuild/software/compiler/gcccore/10.2.0/gatk/4.2.5.0-python-3.8.6/gatk-package-4.2.5.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /usr/local/easybuild-2019/easybuild/software/compiler/gcccore/10.2.0/gatk/4.2.5.0-python-3.8.6/gatk-package-4.2.5.0-local.jar HaplotypeCaller -R /data/gpfs/projects/punim1525/Projects/PSCO-genome/psco-genome.fasta.gz -I /data/gpfs/projects/punim1525/Projects/PSCO-genome/SNP-chip/sams/08UCPB10.merged.nd.fix.bam -L /data/gpfs/projects/punim1525/Projects/PSCO-genome/chromosomes.interval_list -O /data/gpfs/projects/punim1525/Projects/PSCO-genome/SNP-chip/vcfs/08UCPB10.nd.g.vcf.gz -ERC GVCF
-
Can you try generating a bai index with samtools and retry without the csi index?
-
Hello SkyWarrior,
I've tried this but cannot index the file with bai because the chromosomes are too large. Is there another way around this?
-
Hi Tiffany Kosch,
I'm sorry to give you bad news. GATK doesn't support CSI indexes for VCFs, either reading or writing them. If you need a CSI for your input BAM, you probably need one for your output VCF as well. (It's a long-standing open issue that we'd love to fix if we had more time and resources.)
One possible workaround: if HaplotypeCaller is your last step and you're not planning to use any of the downstream GATK tools, you could try running it with `--create-output-variant-index false`. This keeps it from creating an index for the output file, which should work around the issue. The catch is that even if you then index the result with something like tabix, GATK tools won't be able to use that index.
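As a rough sketch, the original command with that flag added could be wrapped like this. The function name `call_gvcf_without_index`, the `GATK_BIN` override, and the file names in the example call are placeholders for illustration, not part of GATK.

```shell
# Sketch: write a GVCF without letting GATK build an index for it.
# GATK_BIN is a hypothetical override for the launcher; it defaults to
# the `gatk` wrapper found on PATH.
call_gvcf_without_index() {
    ref="$1"; bam="$2"; intervals="$3"; out="$4"
    "${GATK_BIN:-gatk}" HaplotypeCaller \
        -R "$ref" \
        -I "$bam" \
        -L "$intervals" \
        -O "$out" \
        -ERC GVCF \
        --create-output-variant-index false
}

# Example call with placeholder file names:
# call_gvcf_without_index ref.fasta sample.bam chromosomes.interval_list sample.g.vcf.gz
```

Everything except the last flag mirrors the command from the original post, so any difference in behaviour should come from skipping the index step.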
The more general workaround is to split your reference into smaller chunks, which is obviously a hassle. Alternatively, you could make the VCF without an index and then do some nasty hackery to pretend the out-of-range chunks are on a different contig for downstream processing.
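For the splitting route, a minimal sketch of how the chunk boundaries might be planned from the reference's `.fai` file follows. BAI and tabix indexes only address positions up to 2^29 (536,870,912 bp) per contig, so the sketch assumes a chunk size of 500 Mbp to stay under that. The function name `chunk_regions` and the default chunk size are assumptions made for illustration.

```shell
# Sketch: print regions of at most max_len bp for every contig in a .fai file.
# Columns 1 and 2 of a .fai file are the contig name and its length.
chunk_regions() {
    fai="$1"
    max_len="${2:-500000000}"   # assumed chunk size, below the 2^29 limit
    awk -v max="$max_len" '{
        for (start = 1; start <= $2; start += max) {
            end = start + max - 1
            if (end > $2) end = $2
            # %.0f avoids integer overflow in awk builds with 32-bit %d
            printf "%s:%.0f-%.0f\n", $1, start, end
        }
    }' "$fai"
}

# Example call with a placeholder file name:
# chunk_regions ref.fasta.fai 500000000
```

Each printed region uses the `contig:start-end` form that `samtools faidx` accepts, so the regions could be used to extract the chunks as separate sequences. Reads aligned to the original contigs keep their original coordinates, so they would need to be mapped again to the split reference (or have their coordinates translated) before calling.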
Sorry we don't have a good solution.
-
Hello, I did as suggested and split the BAM file into pieces of less than 512 Mbp so that each can be indexed with a .bai file. I still get the 32770 error. The indexing of ref.fasta (10 GB) with the dictionary and .fai was not changed. I note that the error occurred when HaplotypeCaller was traversing beyond the region covered by the BAM file, which is limited to chr1A; yet it did not write all SNPs of chr1A to the g.vcf file. Thank you.
11:58:11.029 INFO ProgressMeter - chr1B:316847401 47.8 3142070 65760.9
11:58:21.136 INFO ProgressMeter - chr1B:354836401 47.9 3268700 68170.9
11:58:31.136 INFO ProgressMeter - chr1B:397121401 48.1 3409650 70864.2
11:58:41.136 INFO ProgressMeter - chr1B:439004401 48.3 3549260 73511.1
11:58:51.136 INFO ProgressMeter - chr1B:481301401 48.4 3690250 76168.3
11:59:01.136 INFO ProgressMeter - chr1B:523625401 48.6 3831330 78809.1
11:59:11.136 INFO ProgressMeter - chr1B:557510401 48.8 3944280 80855.3
11:59:21.136 INFO ProgressMeter - chr1B:599843401 48.9 4085390 83462.8
11:59:31.136 INFO ProgressMeter - chr1B:643640401 49.1 4231380 86152.0
11:59:45.580 INFO HaplotypeCaller - Shutting down engine
[August 2, 2023 at 11:59:45 AM IDT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 49.36 minutes.
Runtime.totalMemory()=23924310016
java.lang.ArrayIndexOutOfBoundsException: Index 32770 out of bounds for length 32770
at htsjdk.samtools.BinningIndexBuilder.processFeature(BinningIndexBuilder.java:142)
at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeFeature(TabixIndexCreator.java:106)
at htsjdk.tribble.index.tabix.TabixIndexCreator.addFeature(TabixIndexCreator.java:92)
at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.add(IndexingVariantContextWriter.java:203)
at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:242)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at org.broadinstitute.hellbender.utils.variant.writers.GVCFWriter.output(GVCFWriter.java:95)
at org.broadinstitute.hellbender.utils.variant.writers.GVCFWriter.add(GVCFWriter.java:90)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:271)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Suppressed: java.lang.ArrayIndexOutOfBoundsException: Index 32770 out of bounds for length 32770
at htsjdk.samtools.BinningIndexBuilder.processFeature(BinningIndexBuilder.java:142)
at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeFeature(TabixIndexCreator.java:106)
at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeIndex(TabixIndexCreator.java:129)
at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.close(IndexingVariantContextWriter.java:177)
at htsjdk.variant.variantcontext.writer.VCFWriter.close(VCFWriter.java:233)
at org.broadinstitute.hellbender.utils.variant.writers.GVCFWriter.close(GVCFWriter.java:71)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.closeTool(HaplotypeCaller.java:277)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1095)
... 6 more