java.lang.ArrayIndexOutOfBoundsException: Index 32770 out of bounds for length 32770
REQUIRED for all errors and issues:
a) GATK version used: gatk-4.4.0.0
b) Exact command used: ~/gatk-4.4.0.0/gatk --java-options "-Xmx100g" HaplotypeCaller -R ~/SVEVO.fas -I SAMP.bam -O SAMP.g.vcf.gz -ERC GVCF
c) Entire program log:
11:58:11.029 INFO ProgressMeter - chr1B:316847401 47.8 3142070 65760.9
11:58:21.136 INFO ProgressMeter - chr1B:354836401 47.9 3268700 68170.9
11:58:31.136 INFO ProgressMeter - chr1B:397121401 48.1 3409650 70864.2
11:58:41.136 INFO ProgressMeter - chr1B:439004401 48.3 3549260 73511.1
11:58:51.136 INFO ProgressMeter - chr1B:481301401 48.4 3690250 76168.3
11:59:01.136 INFO ProgressMeter - chr1B:523625401 48.6 3831330 78809.1
11:59:11.136 INFO ProgressMeter - chr1B:557510401 48.8 3944280 80855.3
11:59:21.136 INFO ProgressMeter - chr1B:599843401 48.9 4085390 83462.8
11:59:31.136 INFO ProgressMeter - chr1B:643640401 49.1 4231380 86152.0
11:59:45.580 INFO HaplotypeCaller - Shutting down engine
[August 2, 2023 at 11:59:45 AM IDT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 49.36 minutes.
Runtime.totalMemory()=23924310016
java.lang.ArrayIndexOutOfBoundsException: Index 32770 out of bounds for length 32770
at htsjdk.samtools.BinningIndexBuilder.processFeature(BinningIndexBuilder.java:142)
at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeFeature(TabixIndexCreator.java:106)
at htsjdk.tribble.index.tabix.TabixIndexCreator.addFeature(TabixIndexCreator.java:92)
at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.add(IndexingVariantContextWriter.java:203)
at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:242)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at org.broadinstitute.hellbender.utils.variant.writers.GVCFWriter.output(GVCFWriter.java:95)
at org.broadinstitute.hellbender.utils.variant.writers.GVCFWriter.add(GVCFWriter.java:90)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:271)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Suppressed: java.lang.ArrayIndexOutOfBoundsException: Index 32770 out of bounds for length 32770
at htsjdk.samtools.BinningIndexBuilder.processFeature(BinningIndexBuilder.java:142)
at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeFeature(TabixIndexCreator.java:106)
at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeIndex(TabixIndexCreator.java:129)
at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.close(IndexingVariantContextWriter.java:177)
at htsjdk.variant.variantcontext.writer.VCFWriter.close(VCFWriter.java:233)
at org.broadinstitute.hellbender.utils.variant.writers.GVCFWriter.close(GVCFWriter.java:71)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.closeTool(HaplotypeCaller.java:277)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1095)
... 6 more
The BAM file was indexed with a BAI index. It was split into chunks smaller than 512 Mbp, which is the limit of the BAI index. The dict and fai indexes cover the whole genome (10 GB).
Thank you
-
Hi Hanan Sela
According to the SAM specification v1:
"In the BAI format, each bin may span 2^29, 2^26, 2^23, 2^20, 2^17 or 2^14 bp. Bin 0 spans a 512Mbp region, bins 1–8 span 64Mbp, 9–72 8Mbp, 73–584 1Mbp, 585–4680 128kbp, and bins 4681–37448 span 16kbp regions. This implies that this index format does not support reference chromosome sequences longer than 2^29 − 1. The CSI format generalises the sizes of the bins, and supports reference sequences of the same length as are supported by SAM and BAM."
This means that you need to create a CSI index for your BAM file and run HaplotypeCaller with the additional parameter
--create-output-variant-index false
This will let HaplotypeCaller run without issues and generate a VCF file without an index. Unfortunately, there is no CSI-like index support for VCFs in HTSJDK, so using this VCF file in downstream analyses might require additional work that is currently beyond GATK's capabilities.
There may be future plans to implement such functionality, but I cannot give a definitive answer as to when such an implementation may occur.
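To make the bin limits quoted from the SAM specification more concrete, here is a minimal Python sketch of the fixed binning scheme. `reg2bin` follows the reference routine given in the specification; `window_index` and `fits_fixed_binning` are hypothetical helper names made up for this illustration. It also assumes that the tabix writer keeps one linear-index slot per 16 kbp window, which is my reading of the scheme and not something confirmed in this thread. Under that assumption, slot 32770 from the exception would correspond to positions from 536,903,680 onward, which is already past 2^29.

```python
# Sketch of the fixed BAI/tabix binning scheme described in the SAM specification.
# Names other than reg2bin are hypothetical and exist only for this illustration.

MAX_FIXED_LENGTH = (1 << 29) - 1  # longest contig the fixed scheme can address
WINDOW_SHIFT = 14                 # smallest bins span 2^14 bp = 16 kbp


def reg2bin(beg: int, end: int) -> int:
    """Return the bin for the zero-based, half-open interval [beg, end)."""
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)  # bins 4681-37448, 16 kbp
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)  # bins 585-4680, 128 kbp
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)   # bins 73-584, 1 Mbp
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)   # bins 9-72, 8 Mbp
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)   # bins 1-8, 64 Mbp
    return 0                                        # bin 0, 512 Mbp


def window_index(pos: int) -> int:
    """16 kbp window that a zero-based position falls into (assumed layout)."""
    return pos >> WINDOW_SHIFT


def fits_fixed_binning(contig_length: int) -> bool:
    """True if a contig of this length stays within the 2^29 - 1 limit."""
    return contig_length <= MAX_FIXED_LENGTH


if __name__ == "__main__":
    # Last position reported by the ProgressMeter in the log above.
    last_logged = 643_640_401
    print("window of last logged position:", window_index(last_logged))
    print("fits fixed binning:", fits_fixed_binning(last_logged))
    print("first position in window 32770:", 32770 << WINDOW_SHIFT)
```

Running `fits_fixed_binning` over the contig lengths in the .fai or .dict file is a quick way to see which chromosomes exceed the limit before starting a long HaplotypeCaller run.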
I hope this helps.
-
Hi
In this post it is claimed that indexing with samtools after GVCF generation can help. Is a SAMtools-generated CSI index compatible with downstream applications such as GenomicsDBImport and GenotypeGVCFs?
Thank you.
-
Hi Hanan Sela
Unfortunately, neither HTSJDK nor any GATK tool is currently compatible with the CSI-format VCF index. If you wish to perform joint genotyping, glnexus might be an option, but since it is outside of our realm we cannot provide any support for it. Here is the wording from the GitHub page of glnexus:
"glnexus_cli does not use tabix indices for the input gVCFs. If you need to process only a few selected genomic ranges, then it may be advantageous to slice your gVCFs beforehand."
Since it does not care about the tabix index (which you cannot have anyway with your genome size), you may genotype the whole gVCF without issues. Of course, YMMV.
I hope this helps.