Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

HaplotypeCaller java.lang.ArrayIndexOutOfBoundsException: Index 32770 out of bounds for length 32770

4 comments

  • SkyWarrior

    Can you try generating a bai index with samtools and retry without the csi index? 

  • Tiffany Kosch

    Hello SkyWarrior,

    I've tried this but cannot index the file with bai because the chromosomes are too large. Is there another way around this?
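
For context, the BAI format can only address contigs up to 2^29 bp (512 Mbp), which is why very large chromosomes need a CSI index instead. Below is a minimal Python sketch for checking which contigs in a reference exceed that limit. It assumes the standard tab-separated `.fai` layout (name, length, then offset and line-width columns); the function name, contig names, and lengths are made up for illustration.

```python
BAI_MAX_LEN = 2**29  # 536,870,912 bp (512 Mbp), the BAI coordinate limit


def oversized_contigs(fai_lines, limit=BAI_MAX_LEN):
    """Return {contig: length} for contigs longer than `limit`.

    `fai_lines` is any iterable of lines from a samtools faidx .fai file.
    """
    too_long = {}
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, length = fields[0], int(fields[1])
        if length > limit:
            too_long[name] = length
    return too_long


# Hypothetical .fai content; real values come from `samtools faidx ref.fasta`.
example_fai = [
    "chrA\t600000000\t6\t60\t61\n",
    "chrB\t400000000\t610000012\t60\t61\n",
]
print(oversized_contigs(example_fai))
```

With a real index, the same function can be fed an open file handle, for example `oversized_contigs(open("ref.fasta.fai"))`.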

     

     

  • Louis Bergelson

    Hi Tiffany Kosch,

    I'm sorry to give you bad news. GATK doesn't support CSI indexes for VCFs, either reading or writing them. If you need a CSI index for your input BAM, you probably need one for your output VCF as well. (It's a long-standing open issue that we'd love to fix if we had more time and resources.)

    One possible workaround: if you're done at HaplotypeCaller and not planning to use any of the downstream GATK tools, you could try running HaplotypeCaller with `--create-output-variant-index false` set. This keeps it from creating an index for the output file, which should work around the issue. The catch is that even if you then index the result with something like tabix, GATK tools won't be able to use that index.
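
As a rough illustration of that workaround, here is a small Python sketch that assembles the command line. Only `--create-output-variant-index false` comes from the suggestion above; the helper name and file paths are placeholders, and any other arguments your run needs (intervals, GVCF mode, and so on) would have to be added.

```python
import shlex


def haplotypecaller_cmd(reference, bam, output_vcf):
    """Build a HaplotypeCaller command that skips writing the output index."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-O", output_vcf,
        # Do not build a tabix index for the output file.
        "--create-output-variant-index", "false",
    ]


# Placeholder paths, for illustration only.
cmd = haplotypecaller_cmd("ref.fasta", "sample.bam", "sample.vcf.gz")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` to launch the tool.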

    The more general workaround is to split your reference into smaller chunks. This is obviously a hassle, though. You could maybe make the VCF without an index, then do some nasty hackery to pretend the out-of-range chunks are on a different contig for downstream processing.
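
To make the splitting idea concrete, here is a hedged Python sketch that plans how oversized contigs could be cut into pieces. The `_partN` naming scheme, the 500 Mbp default (chosen to sit safely below the 2^29 bp limit), and the example length are all assumptions for illustration; actually rewriting the FASTA and shifting coordinates back afterwards is left out.

```python
DEFAULT_CHUNK = 500_000_000  # assumed chunk size, below the 2**29 bp limit


def plan_chunks(contig_lengths, max_len=DEFAULT_CHUNK):
    """Return a list of (new_name, contig, start0, end0) half-open chunks.

    Contigs that already fit are passed through unchanged. `start0` is the
    offset to add back when mapping chunk coordinates to the original contig.
    """
    plan = []
    for contig, length in contig_lengths.items():
        if length <= max_len:
            plan.append((contig, contig, 0, length))
            continue
        for part, start in enumerate(range(0, length, max_len), start=1):
            end = min(start + max_len, length)
            plan.append((f"{contig}_part{part}", contig, start, end))
    return plan


# Hypothetical contig lengths, not taken from any real assembly.
for row in plan_chunks({"chrA": 700_000_000, "chrB": 300_000_000}):
    print(row)
```

Choosing split points inside long gaps or low-complexity stretches, rather than at fixed offsets, would reduce the chance of cutting through a region of interest.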

    Sorry we don't have a good solution.

  • Hanan Sela

    Hello, I did as suggested and split the BAM file into pieces of less than 512 Mbp so that it can be indexed with a BAI file. I still get the 32770 error. The indexing of ref.fasta (10 GB), with dictionary and fai, was not changed. I note that the error occurred when HaplotypeCaller was searching outside the region of the BAM file, which is limited to chr1A, yet it did not write all SNPs of chr1A to the g.vcf file. Thank you.

    11:58:11.029 INFO  ProgressMeter -      chr1B:316847401             47.8               3142070          65760.9
    11:58:21.136 INFO  ProgressMeter -      chr1B:354836401             47.9               3268700          68170.9
    11:58:31.136 INFO  ProgressMeter -      chr1B:397121401             48.1               3409650          70864.2        
    11:58:41.136 INFO  ProgressMeter -      chr1B:439004401             48.3               3549260          73511.1
    11:58:51.136 INFO  ProgressMeter -      chr1B:481301401             48.4               3690250          76168.3
    11:59:01.136 INFO  ProgressMeter -      chr1B:523625401             48.6               3831330          78809.1
    11:59:11.136 INFO  ProgressMeter -      chr1B:557510401             48.8               3944280          80855.3
    11:59:21.136 INFO  ProgressMeter -      chr1B:599843401             48.9               4085390          83462.8
    11:59:31.136 INFO  ProgressMeter -      chr1B:643640401             49.1               4231380          86152.0       
    11:59:45.580 INFO  HaplotypeCaller - Shutting down engine                                                                                                                                                  
    [August 2, 2023 at 11:59:45 AM IDT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 49.36 minutes.
    Runtime.totalMemory()=23924310016                                                                    
    java.lang.ArrayIndexOutOfBoundsException: Index 32770 out of bounds for length 32770                                                                                                                       
            at htsjdk.samtools.BinningIndexBuilder.processFeature(BinningIndexBuilder.java:142)
            at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeFeature(TabixIndexCreator.java:106)
            at htsjdk.tribble.index.tabix.TabixIndexCreator.addFeature(TabixIndexCreator.java:92)
            at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.add(IndexingVariantContextWriter.java:203)
            at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:242)
            at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
            at org.broadinstitute.hellbender.utils.variant.writers.GVCFWriter.output(GVCFWriter.java:95)
            at org.broadinstitute.hellbender.utils.variant.writers.GVCFWriter.add(GVCFWriter.java:90)
            at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:271)
            at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
            at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
            at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
            at org.broadinstitute.hellbender.Main.main(Main.java:289)
            Suppressed: java.lang.ArrayIndexOutOfBoundsException: Index 32770 out of bounds for length 32770
                    at htsjdk.samtools.BinningIndexBuilder.processFeature(BinningIndexBuilder.java:142)
                    at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeFeature(TabixIndexCreator.java:106)
                    at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeIndex(TabixIndexCreator.java:129)
                    at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.close(IndexingVariantContextWriter.java:177)
                    at htsjdk.variant.variantcontext.writer.VCFWriter.close(VCFWriter.java:233)
                    at org.broadinstitute.hellbender.utils.variant.writers.GVCFWriter.close(GVCFWriter.java:71)
                    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.closeTool(HaplotypeCaller.java:277)
                    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1095)
                    ... 6 more
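
The array length in the exception is consistent with the size of a tabix linear index. Assuming the standard scheme from the SAM/tabix specifications (16 kb windows and a 2^29 bp coordinate limit), there is room for about 32,768 windows, and any record placed beyond roughly 537 Mbp on a contig would land in a window past the end of the array. The Python sketch below is back-of-the-envelope arithmetic under those assumptions, not a reading of the htsjdk source; the constant and function names are made up.

```python
WINDOW_SHIFT = 14        # assumed: 2**14 = 16,384 bp per linear-index window
MAX_INDEXED_LEN = 2**29  # assumed: BAI/tabix coordinate limit (512 Mbp)


def linear_window(pos0):
    """Linear-index window number for a 0-based position."""
    return pos0 >> WINDOW_SHIFT


# Number of windows needed to cover the full addressable range.
windows_within_limit = MAX_INDEXED_LEN >> WINDOW_SHIFT

# Last chr1B coordinate in the ProgressMeter output above, made 0-based.
last_logged = 643_640_401 - 1

print("windows within limit:", windows_within_limit)
print("window for last logged position:", linear_window(last_logged))
```

Under these assumptions, splitting the input BAM does not change the outcome, because the coordinates written to the output file are still positions on the full-length contig.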
