Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

HaplotypeCaller --alleles: ArrayIndexOutOfBoundsException

11 comments

  • David Roazen

    Hi Geert Vandeweyer,

    Can you please try running with the latest release (GATK 4.5), and report whether the error still occurs? There were some fixes for --alleles mode in that release.

    Regards,

    David

  • Geert Vandeweyer

    Hi David,

    Version 4.5.0 produced the same error, and the same workaround still works.

    Best, 

    Geert

  • David Roazen

    Hi Geert Vandeweyer,

    Thank you for checking with the latest version. This does look like an edge-case bug in the TandemRepeat annotation. I've filed a bug report on GitHub here: https://github.com/broadinstitute/gatk/issues/8675

    In the meantime, while we wait for a proper fix, it sounds like you have a workaround that allows HaplotypeCaller to complete.

    Regards,

    David

  • James Emery

    Hello Geert Vandeweyer

    This is likely an inherent issue with how the size of the assembly window interacts with very long indel events in --alleles mode, and it is something we should fix. To be sure of that and to debug it, we would like access to a minimal section of your inputs so that we can reproduce the error ourselves and, hopefully, fix it. If you are able to provide them, the BAM and VCF inputs to HaplotypeCaller covering roughly 2,000 base pairs around the failing site should be enough for us to reproduce the crash.

    Here are instructions for uploading data: https://gatk.broadinstitute.org/hc/en-us/articles/360035889671-How-do-I-submit-a-detailed-bug-report.

  • Marc Crepeau

    I am getting a similar crash with identical error messages, also when running GATK 4.4.0 HaplotypeCaller with the --alleles option. I have 12 samples (12 BAM files) and I am trying to genotype them at 10,145,661 sites, including SNVs and short indels. The process crashes for all of the BAM files. Increasing the --assembly-region-padding parameter allows the processes to proceed further into the BAM files, but eventually they still crash. I'm not sure how high I can reasonably set the parameter; I am currently trying it at 1000.

  • Gökalp Çelik

    Hi Marc Crepeau

    This issue is still not clear to us because other users have not provided samples, data, or snippets, so we need help from our users to diagnose the problem.

    There are a couple of steps you could take. First, can you split your target alleles into separate SNP and INDEL files and try genotyping each one, to see which input is really causing this issue?

    For INDELs, there are additional steps to consider, such as tagging them with VariantAnnotator using the -A TandemRepeat option and then filtering for and against this tag to see which group the problematic alleles fall into. You may also want to check whether your alleles VCF contains any structural variants, or complex/MNP variants that have not been split into primitives, as these may also cause this issue.

    Second, please make sure that the alleles in your target VCF are biallelic by splitting them using

    bcftools norm -m-both

    and try running your workflow with 

    --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'

    so that you get a detailed stack trace to share with us.

    I hope this helps.
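
The first suggestion above, splitting the alleles file into SNPs and indels, can be sketched in plain Python for readers who want to see the classification logic spelled out. This is a minimal illustration, not a replacement for a dedicated VCF tool: the function names `classify` and `split_alleles` and the toy records are assumptions made for this example, and the "indel" rule (every ALT differs in length from REF) is a simplification.

```python
def classify(ref, alts):
    """Label one VCF record as 'snp', 'indel', or 'other' (simplified rules)."""
    if any(a in (".", "*") or a.startswith("<") or "[" in a or "]" in a for a in alts):
        return "other"  # missing, spanning-deletion, symbolic, or breakend alleles
    if len(ref) == 1 and all(len(a) == 1 for a in alts):
        return "snp"
    if all(len(a) != len(ref) for a in alts):
        return "indel"
    return "other"  # MNPs, complex substitutions, and mixed SNP/indel sites


def split_alleles(vcf_lines):
    """Split VCF text lines into per-class lists; header lines go to every list."""
    out = {"snp": [], "indel": [], "other": []}
    for line in vcf_lines:
        if line.startswith("#"):
            for records in out.values():
                records.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        out[classify(ref, alts)].append(line)
    return out


# Toy records invented for this example (not taken from the thread's data).
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "ctg1\t100\t.\tA\tG\t.\t.\t.\n",
    "ctg1\t200\t.\tAT\tA\t.\t.\t.\n",
    "ctg1\t300\t.\tAC\tGT\t.\t.\t.\n",
]
parts = split_alleles(demo)
for name, lines in parts.items():
    print(name, sum(1 for l in lines if not l.startswith("#")))
```

Records that land in the "other" bucket (MNPs, symbolic alleles, mixed sites) are worth inspecting separately, since the comment above names them as possible triggers too.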

  • Marc Crepeau

    First thing to mention is that using the workaround of setting --assembly-region-padding to 1000 still results in failure for at least some of the BAM files (others are still running as of this morning).

    Next, I tried some of the tests you mentioned using one of the BAM files.  After splitting the alleles file into one file for SNPs and one for indels, the calling completed normally for the SNPs but failed for the indels.

    Next I used VariantAnnotator on the indels-only alleles file as suggested and filtered the result into two files, one with RPA tags and one without.  The calling completed normally for the indels with RPA tags but failed for the indels without RPA tags.
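
For readers repeating this test, the with-RPA / without-RPA split can be sketched as follows. It assumes the annotated VCF carries `RPA` as a key in the INFO column (the tag mentioned above); the helper names `has_info_key` and `partition_by_rpa` and the toy records are made up for illustration, and this is not necessarily how the split was done in the thread.

```python
def has_info_key(info, key):
    """True if the INFO column contains `key`, either as a flag or as key=value."""
    return any(item.split("=", 1)[0] == key for item in info.split(";"))


def partition_by_rpa(vcf_lines):
    """Return (with_rpa, without_rpa); header lines are copied to both lists."""
    with_rpa, without_rpa = [], []
    for line in vcf_lines:
        if line.startswith("#"):
            with_rpa.append(line)
            without_rpa.append(line)
            continue
        info = line.rstrip("\n").split("\t")[7]
        target = with_rpa if has_info_key(info, "RPA") else without_rpa
        target.append(line)
    return with_rpa, without_rpa


# Toy records invented for this example.
demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "ctg1\t100\t.\tCAT\tC\t.\t.\tRPA=3,2;RU=AT;STR\n",
    "ctg1\t200\t.\tGTTAC\tG\t.\t.\t.\n",
]
tagged, untagged = partition_by_rpa(demo)
print(len(tagged) - 1, "record(s) with RPA;", len(untagged) - 1, "without")
```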

    Finally, I performed the suggested bcftools norm command on the indels without RPA tags and tried the variant calling using the resulting alleles file.  The variant calling failed again and in fact did not seem to progress as far into the BAM file before failing as when run using the non-normalized alleles.  I am pasting in below the stack traces for the runs with non-normalized and normalized no-RPA indels.  All of this was done with 4.4.0.  I'm working in a shared compute environment and that is the most recent version installed.  I've made a request for the sysadmins to install the latest version, but I'm not sure when that will happen.

    *********************** non-normalized ***********************

    (vgl)mcrepeau@grassi:/share/lanzarolab/seq/variant_calling/Pf_GATK/test_temp$ gatk --java-options "-DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx4g" HaplotypeCaller -I /share/lanzarolab/seq/map/Pf_comb_refs/merged_runs/re-mapped_bams/final_bams/DBS_02_1_C.bam --alleles indel_sites_only.noRPA.vcf.gz -R /share/lanzarolab/archive/reference/Pfalciparum.genome.fasta -O test_indel_noRPA.vcf -ERC GVCF
    Using GATK jar /afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx4g -jar /afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar HaplotypeCaller -I /share/lanzarolab/seq/map/Pf_comb_refs/merged_runs/re-mapped_bams/final_bams/DBS_02_1_C.bam --alleles indel_sites_only.noRPA.vcf.gz -R /share/lanzarolab/archive/reference/Pfalciparum.genome.fasta -O test_indel_noRPA.vcf -ERC GVCF
    16:14:16.617 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    16:14:16.689 INFO  HaplotypeCaller - ------------------------------------------------------------
    16:14:16.694 INFO  HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.4.0.0
    16:14:16.694 INFO  HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
    16:14:16.695 INFO  HaplotypeCaller - Executing as mcrepeau@grassi on Linux v4.15.0-99-generic amd64
    16:14:16.695 INFO  HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v21.0.1+12-LTS-29
    16:14:16.695 INFO  HaplotypeCaller - Start Date/Time: March 1, 2024, 4:14:16 PM PST
    16:14:16.696 INFO  HaplotypeCaller - ------------------------------------------------------------
    16:14:16.696 INFO  HaplotypeCaller - ------------------------------------------------------------
    16:14:16.697 INFO  HaplotypeCaller - HTSJDK Version: 3.0.5
    16:14:16.697 INFO  HaplotypeCaller - Picard Version: 3.0.0
    16:14:16.698 INFO  HaplotypeCaller - Built for Spark Version: 3.3.1
    16:14:16.698 INFO  HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    16:14:16.698 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    16:14:16.698 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    16:14:16.699 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    16:14:16.699 INFO  HaplotypeCaller - Deflater: IntelDeflater
    16:14:16.699 INFO  HaplotypeCaller - Inflater: IntelInflater
    16:14:16.699 INFO  HaplotypeCaller - GCS max retries/reopens: 20
    16:14:16.699 INFO  HaplotypeCaller - Requester pays: disabled
    16:14:16.700 INFO  HaplotypeCaller - Initializing engine
    16:14:17.570 INFO  FeatureManager - Using codec VCFCodec to read file file:///share/lanzarolab/seq/variant_calling/Pf_GATK/test_temp/indel_sites_only.noRPA.vcf.gz
    16:14:17.625 INFO  HaplotypeCaller - Done initializing engine
    16:14:17.628 INFO  HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
    16:14:17.648 INFO  HaplotypeCallerEngine - Standard Emitting and Calling confidence set to -0.0 for reference-model confidence output
    16:14:17.648 INFO  HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
    16:14:17.668 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
    16:14:17.851 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
    16:14:17.892 INFO  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
    16:14:18.020 INFO  IntelPairHmm - Available threads: 64
    16:14:18.020 INFO  IntelPairHmm - Requested threads: 4
    16:14:18.020 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
    16:14:18.084 INFO  ProgressMeter - Starting traversal
    16:14:18.085 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
    16:14:18.832 WARN  InbreedingCoeff - InbreedingCoeff will not be calculated at position Pf3D7_01_v3:1781 and possibly subsequent; at least 10 samples must have called genotypes
    16:14:22.552 WARN  DepthPerSampleHC - Annotation will not be calculated at position Pf3D7_01_v3:39318 and possibly subsequent; genotype for sample DBS_02_1_C is not called
    16:14:22.552 WARN  StrandBiasBySample - Annotation will not be calculated at position Pf3D7_01_v3:39318 and possibly subsequent; genotype for sample DBS_02_1_C is not called
    16:14:28.201 INFO  ProgressMeter -    Pf3D7_01_v3:93676              0.2                   420           2491.3
    16:14:38.353 INFO  ProgressMeter -   Pf3D7_01_v3:104356              0.3                   480           1421.0
    16:14:51.322 INFO  ProgressMeter -   Pf3D7_01_v3:116801              0.6                   560           1011.0
    16:15:02.212 INFO  ProgressMeter -   Pf3D7_01_v3:120042              0.7                   580            788.6
    16:15:15.314 INFO  ProgressMeter -   Pf3D7_01_v3:123057              1.0                   600            629.1
    16:15:29.175 INFO  ProgressMeter -   Pf3D7_01_v3:124799              1.2                   610            514.8
    16:15:47.783 INFO  ProgressMeter -   Pf3D7_01_v3:129492              1.5                   640            428.1
    16:15:58.901 INFO  ProgressMeter -   Pf3D7_01_v3:135411              1.7                   680            404.7
    16:16:10.478 INFO  ProgressMeter -   Pf3D7_01_v3:140895              1.9                   720            384.4
    16:16:27.399 INFO  ProgressMeter -   Pf3D7_01_v3:143500              2.2                   740            343.4
    16:16:41.973 INFO  ProgressMeter -   Pf3D7_01_v3:148630              2.4                   770            321.1
    16:16:50.969 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 0.029914973
    16:16:50.970 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 52.165572267
    16:16:50.970 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 70.18 sec
    16:16:50.972 INFO  HaplotypeCaller - Shutting down engine
    [March 1, 2024, 4:16:50 PM PST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 2.57 minutes.
    Runtime.totalMemory()=1126170624
    java.lang.ArrayIndexOutOfBoundsException: arraycopy: source index -66 out of bounds for byte[432]
            at java.base/java.lang.System.arraycopy(Native Method)
            at java.base/java.util.Arrays.copyOfRangeByte(Arrays.java:3864)
            at java.base/java.util.Arrays.copyOfRange(Arrays.java:3854)
            at org.broadinstitute.hellbender.tools.walkers.annotator.TandemRepeat.getNumTandemRepeatUnits(TandemRepeat.java:54)
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyRegionTrimmer.trim(AssemblyRegionTrimmer.java:189)
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:656)
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:271)
            at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
            at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
            at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
            at org.broadinstitute.hellbender.Main.main(Main.java:289)

    ************************* normalized *************************

    (vgl)mcrepeau@grassi:/share/lanzarolab/seq/variant_calling/Pf_GATK/test_temp$ gatk --java-options "-DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx4g" HaplotypeCaller -I /share/lanzarolab/seq/map/Pf_comb_refs/merged_runs/re-mapped_bams/final_bams/DBS_02_1_C.bam --alleles indel_sites_only.noRPA.norm.vcf.gz -R /share/lanzarolab/archive/reference/Pfalciparum.genome.fasta -O test_indel_noRPA.norm.vcf -ERC GVCF
    Using GATK jar /afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx4g -jar /afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar HaplotypeCaller -I /share/lanzarolab/seq/map/Pf_comb_refs/merged_runs/re-mapped_bams/final_bams/DBS_02_1_C.bam --alleles indel_sites_only.noRPA.norm.vcf.gz -R /share/lanzarolab/archive/reference/Pfalciparum.genome.fasta -O test_indel_noRPA.norm.vcf -ERC GVCF
    16:32:08.926 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    16:32:09.035 INFO  HaplotypeCaller - ------------------------------------------------------------
    16:32:09.039 INFO  HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.4.0.0
    16:32:09.040 INFO  HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
    16:32:09.040 INFO  HaplotypeCaller - Executing as mcrepeau@grassi on Linux v4.15.0-99-generic amd64
    16:32:09.040 INFO  HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v21.0.1+12-LTS-29
    16:32:09.040 INFO  HaplotypeCaller - Start Date/Time: March 1, 2024, 4:32:08 PM PST
    16:32:09.040 INFO  HaplotypeCaller - ------------------------------------------------------------
    16:32:09.041 INFO  HaplotypeCaller - ------------------------------------------------------------
    16:32:09.042 INFO  HaplotypeCaller - HTSJDK Version: 3.0.5
    16:32:09.042 INFO  HaplotypeCaller - Picard Version: 3.0.0
    16:32:09.042 INFO  HaplotypeCaller - Built for Spark Version: 3.3.1
    16:32:09.043 INFO  HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    16:32:09.043 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    16:32:09.043 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    16:32:09.044 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    16:32:09.044 INFO  HaplotypeCaller - Deflater: IntelDeflater
    16:32:09.044 INFO  HaplotypeCaller - Inflater: IntelInflater
    16:32:09.044 INFO  HaplotypeCaller - GCS max retries/reopens: 20
    16:32:09.044 INFO  HaplotypeCaller - Requester pays: disabled
    16:32:09.045 INFO  HaplotypeCaller - Initializing engine
    16:32:09.351 INFO  FeatureManager - Using codec VCFCodec to read file file:///share/lanzarolab/seq/variant_calling/Pf_GATK/test_temp/indel_sites_only.noRPA.norm.vcf.gz
    16:32:09.402 INFO  HaplotypeCaller - Done initializing engine
    16:32:09.405 INFO  HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
    16:32:09.424 INFO  HaplotypeCallerEngine - Standard Emitting and Calling confidence set to -0.0 for reference-model confidence output
    16:32:09.425 INFO  HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
    16:32:09.446 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
    16:32:09.457 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
    16:32:09.503 INFO  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
    16:32:09.676 INFO  IntelPairHmm - Available threads: 64
    16:32:09.677 INFO  IntelPairHmm - Requested threads: 4
    16:32:09.677 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
    16:32:09.743 INFO  ProgressMeter - Starting traversal
    16:32:09.744 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
    16:32:10.695 WARN  InbreedingCoeff - InbreedingCoeff will not be calculated at position Pf3D7_01_v3:1781 and possibly subsequent; at least 10 samples must have called genotypes
    16:32:18.167 WARN  DepthPerSampleHC - Annotation will not be calculated at position Pf3D7_01_v3:39318 and possibly subsequent; genotype for sample DBS_02_1_C is not called
    16:32:18.168 WARN  StrandBiasBySample - Annotation will not be calculated at position Pf3D7_01_v3:39318 and possibly subsequent; genotype for sample DBS_02_1_C is not called
    16:32:19.825 INFO  ProgressMeter -    Pf3D7_01_v3:47077              0.2                   190           1131.0
    16:32:31.513 INFO  ProgressMeter -    Pf3D7_01_v3:95376              0.4                   430           1185.2
    16:32:39.611 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 0.005375431
    16:32:39.612 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 1.35759391
    16:32:39.612 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 8.72 sec
    16:32:39.613 INFO  HaplotypeCaller - Shutting down engine
    [March 1, 2024, 4:32:39 PM PST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.51 minutes.
    Runtime.totalMemory()=1126170624
    java.lang.ArrayIndexOutOfBoundsException: arraycopy: source index -45 out of bounds for byte[387]
            at java.base/java.lang.System.arraycopy(Native Method)
            at java.base/java.util.Arrays.copyOfRangeByte(Arrays.java:3864)
            at java.base/java.util.Arrays.copyOfRange(Arrays.java:3854)
            at org.broadinstitute.hellbender.tools.walkers.annotator.TandemRepeat.getNumTandemRepeatUnits(TandemRepeat.java:54)
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyRegionTrimmer.trim(AssemblyRegionTrimmer.java:189)
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:656)
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:271)
            at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
            at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
            at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
            at org.broadinstitute.hellbender.Main.main(Main.java:289)
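
Both traces end in `Arrays.copyOfRange` being handed a negative start index (-66 for a 432-byte array, -45 for a 387-byte one). The toy model below shows one generic way such an index can arise: an event that starts before the left edge of a padded window gets a negative offset into that window. It is only an illustration under stated assumptions. It is not GATK code, the function names are hypothetical, and the coordinates are invented so that the arithmetic lands on -66; the real cause inside `TandemRepeat.getNumTandemRepeatUnits` may differ.

```python
def copy_of_range(data, start, end):
    """Mimic the bounds behaviour of Java's Arrays.copyOfRange for byte arrays."""
    if start < 0 or start > len(data):
        raise IndexError(f"source index {start} out of bounds for byte[{len(data)}]")
    if start > end:
        raise ValueError(f"{start} > {end}")
    chunk = data[start:min(end, len(data))]
    # Java pads with zeros when `end` runs past the source array.
    return chunk + bytes(end - start - len(chunk))


def offset_in_window(event_start, region_start, padding):
    """Offset of an event inside a window that begins `padding` bases before the region."""
    window_start = region_start - padding
    return event_start - window_start


window = bytes(432)  # stand-in for the reference bases of the padded window

# Hypothetical coordinates: the event begins 166 bases before the region start.
for padding in (100, 200):
    offset = offset_in_window(event_start=834, region_start=1000, padding=padding)
    try:
        copy_of_range(window, offset, offset + 10)
        print(f"padding={padding}: offset {offset} is inside the window")
    except IndexError as err:
        print(f"padding={padding}: {err}")
```

In this toy setup a larger padding moves the window start left and the offset becomes non-negative, which is at least consistent with the observation above that raising --assembly-region-padding lets runs proceed further before failing.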

  • James Emery

    Hello Marc Crepeau. It is useful context to hear that you are encountering a similar failure with --alleles set. It would help us immensely in tracking down the issue to have some test data to debug with, ideally the most minimal chunk of the files necessary to reproduce the error, so that we can debug what is happening empirically. We have a hunch about what might be causing this error, but we need to confirm that it is the correct exception, and given the difficulty of reproducing assembly failures it would be best to work from an already failing file.

    Here are instructions for uploading data: https://gatk.broadinstitute.org/hc/en-us/articles/360035889671-How-do-I-submit-a-detailed-bug-report.

  • Marc Crepeau

    Ok, I've uploaded the required files: mini.tar.gz

  • James Emery

    Hi Marc Crepeau. I have looked at your example and there is indeed a problem with our code related to the --alleles mode. Specifically, very long deletions can cause indexing issues when they are injected into HaplotypeCaller in --alleles mode. I have a branch with a fix here: https://github.com/broadinstitute/gatk/pull/8731, which should hopefully make it into the next point release of GATK and resolve your issue.

    In the meantime, there are a few workarounds you can consider. Perhaps the biggest one is to filter out of your VCF any deletions longer than ~150 bases or so, as they are prone to causing this issue in rare cases if they belong to very noisy assembly regions in your reference. You could also try changing the --assembly-region-padding argument, as it changes the amount of padding used by the specific method that is causing the exception, which makes you less likely to hit the bug. Increasing the padding too much can lead to assembly failures in repetitive regions and slower calling in some cases, so be careful about using too high a number; however, anything up to ~400 bases should work in many cases.
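
As a rough sketch of the first workaround, the snippet below drops records whose longest deletion exceeds a threshold, measuring deletion length as len(REF) minus len(ALT). The names `longest_deletion` and `drop_long_deletions`, the default of 150 (taken from the "~150 bases or so" above), and the toy records are assumptions for illustration. Symbolic alleles such as `<DEL>` are not measured here, and a multiallelic record is dropped as a whole if any of its ALT alleles is a long deletion.

```python
def longest_deletion(ref, alts):
    """Longest deletion at a site, in bases, as len(REF) - len(ALT); 0 if none."""
    lengths = [len(ref) - len(a) for a in alts
               if a not in (".", "*") and not a.startswith("<")]
    return max(lengths + [0])


def drop_long_deletions(vcf_lines, max_len=150):
    """Keep header lines and every record whose longest deletion is <= max_len."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if longest_deletion(fields[3], fields[4].split(",")) <= max_len:
            kept.append(line)
    return kept


# Toy records invented for this example: a 1-base deletion, a 200-base
# deletion, and a 2-base insertion.
demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "ctg1\t100\t.\tAT\tA\t.\t.\t.\n",
    "ctg1\t500\t.\t" + "A" * 201 + "\tA\t.\t.\t.\n",
    "ctg1\t900\t.\tA\tATT\t.\t.\t.\n",
]
filtered = drop_long_deletions(demo)
print(len(demo) - len(filtered), "record(s) removed")
```

Running the filter on an alleles file that has already been split to biallelic records makes the decision per allele rather than per site.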

    Thank you for bringing this to our attention. 

  • Marc Crepeau

    That's great!  Thanks for your help!