HaplotypeCaller --alleles: ArrayIndexOutOfBoundsException
When running GATK 4.4.0.0, I get an error when re-genotyping (--alleles) a 175 bp deletion:
command:
gatk --java-options "-Djava.io.tmpdir=/tmp -Xmx3g" HaplotypeCaller -R /home/gvandeweyer/AWS/hg19/genome/hg19_ref_genome.fasta -O head.out.100.vcf -L head.vcf -I wes-199345-i_0007-scattered_final_INT.bam -bamout bamout.bam --alleles head.vcf --bam-writer-type CALLED_HAPLOTYPES --output-mode EMIT_ALL_ACTIVE_SITES --interval-padding 50
input vcf (hg19) :
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT wes-199345-i
chr7 100550014 . CCCATAGTGACAGTGACACCCTCCTCTGTGTCAGCCACAGACACAACCTTCCACACTACAATCTCATCTACAACTAGAACCACAGAAAGGACTCCCCTGCCCACTGGAAGCATCCATACAACCACGTCCCCAACCCCAGTATTTACTACTCTCAAAACAGCAGTGACTTCCACTT C 290.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=3.955;DP=688;ExcessHet=0.0000;FS=2.076;MLEAC=1;MLEAF=0.500;MQ=59.71;MQRankSum=-15.483;QD=0.43;ReadPosRankSum=-5.710;SOR=0.971 GT:AD:DP:GQ:PL 0/1:618,65:683:99:298,0,25657
chr7 100550032 . C CCCT 3378.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.703;DP=365;ExcessHet=0.0000;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=59.73;MQRankSum=0.000;QD=11.98;ReadPosRankSum=-0.507;SOR=0.707 GT:AD:DP:GQ:PL 0/1:151,131:282:99:4978,0,5884
chr7 100550079 . A C,G 8125.09 . AC=1,1;AF=0.500,0.500;AN=2;DP=355;ExcessHet=0.0000;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.72;QD=27.36;SOR=0.713 GT:AD:DP:GQ:PL 1/2:0,135,162:297:99:9856,4142,3719,3948,0,3138
chr7 100550133 . A G 4670.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.652;DP=275;ExcessHet=0.0000;FS=0.444;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=17.11;ReadPosRankSum=-0.372;SOR=0.727 GT:AD:DP:GQ:PL 0/1:148,125:273:99:4678,0,5640
chr7 100550138 . C T 4613.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.586;DP=268;ExcessHet=0.0000;FS=0.451;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=17.34;ReadPosRankSum=-0.713;SOR=0.650 GT:AD:DP:GQ:PL 0/1:144,122:266:99:4621,0,5562
chr7 100550184 . C A 3404.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.436;DP=234;ExcessHet=0.0000;FS=4.688;MLEAC=1;MLEAF=0.500;MQ=59.72;MQRankSum=0.000;QD=14.93;ReadPosRankSum=-0.918;SOR=0.464 GT:AD:DP:GQ:PL 0/1:132,96:228:99:3412,0,4974
chr7 100550203 . T A 3143.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.803;DP=230;ExcessHet=0.0000;FS=1.715;MLEAC=1;MLEAF=0.500;MQ=59.57;MQRankSum=0.000;QD=14.29;ReadPosRankSum=-0.374;SOR=0.560 GT:AD:DP:GQ:PL 0/1:134,86:220:99:3151,0,5338
chr7 100550205 . A C 3096.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.554;DP=229;ExcessHet=0.0000;FS=0.518;MLEAC=1;MLEAF=0.500;MQ=59.57;MQRankSum=0.000;QD=14.14;ReadPosRankSum=0.612;SOR=0.625 GT:AD:DP:GQ:PL 0/1:135,84:219:99:3104,0,5386
chr7 100550245 . G A 2098.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=2.448;DP=197;ExcessHet=0.0000;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=59.79;MQRankSum=-2.014;QD=11.34;ReadPosRankSum=1.148;SOR=0.693 GT:AD:DP:GQ:PL 0/1:124,61:185:99:2106,0,5009
error:
[December 27, 2023 at 3:52:38 PM GMT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.10 minutes.
Runtime.totalMemory()=190840832
java.lang.ArrayIndexOutOfBoundsException: arraycopy: source index -26 out of bounds for byte[355]
at java.base/java.lang.System.arraycopy(Native Method)
at java.base/java.util.Arrays.copyOfRange(Arrays.java:3823)
at org.broadinstitute.hellbender.tools.walkers.annotator.TandemRepeat.getNumTandemRepeatUnits(TandemRepeat.java:54)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyRegionTrimmer.trim(AssemblyRegionTrimmer.java:189)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:656)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:271)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
When I increase --assembly-region-padding from the default of 100 to 150, the problem resolves.
Is this a bug?
-
Hi Geert Vandeweyer,
Can you please try running with the latest release (GATK 4.5), and report whether the error still occurs? There were some fixes for --alleles mode in that release.
Regards,
David
-
Hi David,
Version 4.5.0 gives the same error, and the same workaround fixes it.
Best,
Geert
-
Hi Geert Vandeweyer,
Thank you for checking with the latest version. This is indeed likely an edge-case bug in the TandemRepeat annotation. I've filed a bug report on GitHub here: https://github.com/broadinstitute/gatk/issues/8675
In the meantime, while we wait for a proper fix, it sounds like you have a workaround that allows HaplotypeCaller to complete.
Regards,
David
-
Hello Geert Vandeweyer.
This is likely an inherent issue with the size of the assembly window and very long indel events in --alleles mode, and one that we should fix. However, to be sure of that and to debug it, we would like access to a minimal section of the inputs so that we can run them ourselves, track this error down, and hopefully fix it. The BAM and VCF inputs to HaplotypeCaller covering roughly 2000 base pairs around that site should be enough for us to reproduce the crash and attempt a fix, if that is something you are able to provide.
Here are instructions for uploading data: https://gatk.broadinstitute.org/hc/en-us/articles/360035889671-How-do-I-submit-a-detailed-bug-report.
-
I am getting a similar crash with identical error messages, also when running GATK 4.4.0 HaplotypeCaller with the --alleles option. I have 12 samples (12 BAM files) and I am trying to genotype them at 10,145,661 sites, including SNVs and short indels. The process crashes for all of the BAM files. Increasing the --assembly-region-padding parameter allows the processes to proceed further into the BAM files, but eventually they still crash. I am not sure how high I can reasonably set the parameter; I am currently trying 1000.
-
Hi Marc Crepeau,
This issue is still not clear to us because other users have not provided any samples, data, or snippets, so we need help from our users to diagnose the problem.
There are a couple of steps you could take. First, can you split your target alleles into separate SNP and INDEL files and try genotyping each, to see which input is really causing the issue?
For INDELs, there are additional steps to consider, such as tagging them with VariantAnnotator using the -A TandemRepeat option and then filtering for and against this tag to see which group the problematic alleles fall into. You may also want to check whether your alleles VCF contains any structural variants, or complex/MNP variants that are not split into primitives, which may also cause this issue.
Second, please make sure that the alleles in your target VCF are biallelic by splitting them using
bcftools norm -m-both
and try running your workflow with
--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'
so that you can get a detailed stack trace and share it with us.
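As a rough illustration of the first step, the split into SNP and INDEL records can be sketched in plain Python. This is only a sketch under stated assumptions: the function names (classify_record, split_vcf_lines) are hypothetical, the input is assumed to be an uncompressed, tab-separated VCF, and dedicated tools such as bcftools or GATK SelectVariants would normally be used instead.

```python
def classify_record(ref: str, alt_field: str) -> str:
    """Classify one VCF record as 'snp', 'indel', or 'other' from REF and ALT."""
    alts = alt_field.split(",")
    # Symbolic (<DEL>), missing (.) and spanning-deletion (*) alleles are set aside.
    if any(a.startswith("<") or a in {".", "*"} for a in alts):
        return "other"
    if len(ref) == 1 and all(len(a) == 1 for a in alts):
        return "snp"
    if any(len(a) != len(ref) for a in alts):
        return "indel"
    return "other"  # same-length multi-base substitution, e.g. an MNP


def split_vcf_lines(lines):
    """Split VCF text lines into (header, snps, indels, other)."""
    header, snps, indels, other = [], [], [], []
    buckets = {"snp": snps, "indel": indels, "other": other}
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.rstrip("\n").split("\t")  # column 4 is REF, column 5 is ALT
        buckets[classify_record(fields[3], fields[4])].append(line)
    return header, snps, indels, other
```

Writing the header plus each bucket to its own file gives separate --alleles inputs; anything that lands in the "other" bucket is worth inspecting by hand.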
I hope this helps.
-
First thing to mention is that using the workaround of setting --assembly-region-padding to 1000 still results in failure for at least some of the BAM files (others are still running as of this morning).
Next, I tried some of the tests you mentioned using one of the BAM files. After splitting the alleles file into one file for SNPs and one for indels, the calling completed normally for the SNPs but failed for the indels.
Next I used VariantAnnotator on the indels-only alleles file as suggested and filtered the result into two files, one with RPA tags and one without. The calling completed normally for the indels with RPA tags but failed for the indels without RPA tags.
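For reference, the with-RPA / without-RPA split described here can be approximated with a few lines of Python. This is a minimal sketch, assuming an uncompressed, tab-separated VCF that has already been annotated; the helper names (info_keys, partition_by_info_key) are hypothetical and are not part of GATK or bcftools.

```python
def info_keys(info_field: str) -> set:
    """Return the set of keys present in a VCF INFO column."""
    if info_field == ".":
        return set()
    return {entry.split("=", 1)[0] for entry in info_field.split(";") if entry}


def partition_by_info_key(lines, key="RPA"):
    """Split VCF data lines into (with_key, without_key); header lines are skipped."""
    with_key, without_key = [], []
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")  # column 8 is INFO
        target = with_key if key in info_keys(fields[7]) else without_key
        target.append(line)
    return with_key, without_key
```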
Finally, I performed the suggested bcftools norm command on the indels without RPA tags and tried the variant calling using the resulting alleles file. The variant calling failed again and in fact did not seem to progress as far into the BAM file before failing as when run using the non-normalized alleles. I am pasting in below the stack traces for the runs with non-normalized and normalized no-RPA indels. All of this was done with 4.4.0. I'm working in a shared compute environment and that is the most recent version installed. I've made a request for the sysadmins to install the latest version, but I'm not sure when that will happen.
*********************** non-normalized ***********************
(vgl)mcrepeau@grassi:/share/lanzarolab/seq/variant_calling/Pf_GATK/test_temp$ gatk --java-options "-DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx4g" HaplotypeCaller -I /share/lanzarolab/seq/map/Pf_comb_refs/merged_runs/re-mapped_bams/final_bams/DBS_02_1_C.bam --alleles indel_sites_only.noRPA.vcf.gz -R /share/lanzarolab/archive/reference/Pfalciparum.genome.fasta -O test_indel_noRPA.vcf -ERC GVCF
Using GATK jar /afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx4g -jar /afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar HaplotypeCaller -I /share/lanzarolab/seq/map/Pf_comb_refs/merged_runs/re-mapped_bams/final_bams/DBS_02_1_C.bam --alleles indel_sites_only.noRPA.vcf.gz -R /share/lanzarolab/archive/reference/Pfalciparum.genome.fasta -O test_indel_noRPA.vcf -ERC GVCF
16:14:16.617 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:14:16.689 INFO HaplotypeCaller - ------------------------------------------------------------
16:14:16.694 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.4.0.0
16:14:16.694 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
16:14:16.695 INFO HaplotypeCaller - Executing as mcrepeau@grassi on Linux v4.15.0-99-generic amd64
16:14:16.695 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v21.0.1+12-LTS-29
16:14:16.695 INFO HaplotypeCaller - Start Date/Time: March 1, 2024, 4:14:16 PM PST
16:14:16.696 INFO HaplotypeCaller - ------------------------------------------------------------
16:14:16.696 INFO HaplotypeCaller - ------------------------------------------------------------
16:14:16.697 INFO HaplotypeCaller - HTSJDK Version: 3.0.5
16:14:16.697 INFO HaplotypeCaller - Picard Version: 3.0.0
16:14:16.698 INFO HaplotypeCaller - Built for Spark Version: 3.3.1
16:14:16.698 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:14:16.698 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:14:16.698 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:14:16.699 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:14:16.699 INFO HaplotypeCaller - Deflater: IntelDeflater
16:14:16.699 INFO HaplotypeCaller - Inflater: IntelInflater
16:14:16.699 INFO HaplotypeCaller - GCS max retries/reopens: 20
16:14:16.699 INFO HaplotypeCaller - Requester pays: disabled
16:14:16.700 INFO HaplotypeCaller - Initializing engine
16:14:17.570 INFO FeatureManager - Using codec VCFCodec to read file file:///share/lanzarolab/seq/variant_calling/Pf_GATK/test_temp/indel_sites_only.noRPA.vcf.gz
16:14:17.625 INFO HaplotypeCaller - Done initializing engine
16:14:17.628 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
16:14:17.648 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to -0.0 for reference-model confidence output
16:14:17.648 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
16:14:17.668 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
16:14:17.851 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
16:14:17.892 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
16:14:18.020 INFO IntelPairHmm - Available threads: 64
16:14:18.020 INFO IntelPairHmm - Requested threads: 4
16:14:18.020 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
16:14:18.084 INFO ProgressMeter - Starting traversal
16:14:18.085 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
16:14:18.832 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position Pf3D7_01_v3:1781 and possibly subsequent; at least 10 samples must have called genotypes
16:14:22.552 WARN DepthPerSampleHC - Annotation will not be calculated at position Pf3D7_01_v3:39318 and possibly subsequent; genotype for sample DBS_02_1_C is not called
16:14:22.552 WARN StrandBiasBySample - Annotation will not be calculated at position Pf3D7_01_v3:39318 and possibly subsequent; genotype for sample DBS_02_1_C is not called
16:14:28.201 INFO ProgressMeter - Pf3D7_01_v3:93676 0.2 420 2491.3
16:14:38.353 INFO ProgressMeter - Pf3D7_01_v3:104356 0.3 480 1421.0
16:14:51.322 INFO ProgressMeter - Pf3D7_01_v3:116801 0.6 560 1011.0
16:15:02.212 INFO ProgressMeter - Pf3D7_01_v3:120042 0.7 580 788.6
16:15:15.314 INFO ProgressMeter - Pf3D7_01_v3:123057 1.0 600 629.1
16:15:29.175 INFO ProgressMeter - Pf3D7_01_v3:124799 1.2 610 514.8
16:15:47.783 INFO ProgressMeter - Pf3D7_01_v3:129492 1.5 640 428.1
16:15:58.901 INFO ProgressMeter - Pf3D7_01_v3:135411 1.7 680 404.7
16:16:10.478 INFO ProgressMeter - Pf3D7_01_v3:140895 1.9 720 384.4
16:16:27.399 INFO ProgressMeter - Pf3D7_01_v3:143500 2.2 740 343.4
16:16:41.973 INFO ProgressMeter - Pf3D7_01_v3:148630 2.4 770 321.1
16:16:50.969 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.029914973
16:16:50.970 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 52.165572267
16:16:50.970 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 70.18 sec
16:16:50.972 INFO HaplotypeCaller - Shutting down engine
[March 1, 2024, 4:16:50 PM PST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 2.57 minutes.
Runtime.totalMemory()=1126170624
java.lang.ArrayIndexOutOfBoundsException: arraycopy: source index -66 out of bounds for byte[432]
at java.base/java.lang.System.arraycopy(Native Method)
at java.base/java.util.Arrays.copyOfRangeByte(Arrays.java:3864)
at java.base/java.util.Arrays.copyOfRange(Arrays.java:3854)
at org.broadinstitute.hellbender.tools.walkers.annotator.TandemRepeat.getNumTandemRepeatUnits(TandemRepeat.java:54)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyRegionTrimmer.trim(AssemblyRegionTrimmer.java:189)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:656)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:271)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
************************* normalized *************************
(vgl)mcrepeau@grassi:/share/lanzarolab/seq/variant_calling/Pf_GATK/test_temp$ gatk --java-options "-DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx4g" HaplotypeCaller -I /share/lanzarolab/seq/map/Pf_comb_refs/merged_runs/re-mapped_bams/final_bams/DBS_02_1_C.bam --alleles indel_sites_only.noRPA.norm.vcf.gz -R /share/lanzarolab/archive/reference/Pfalciparum.genome.fasta -O test_indel_noRPA.norm.vcf -ERC GVCF
Using GATK jar /afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx4g -jar /afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar HaplotypeCaller -I /share/lanzarolab/seq/map/Pf_comb_refs/merged_runs/re-mapped_bams/final_bams/DBS_02_1_C.bam --alleles indel_sites_only.noRPA.norm.vcf.gz -R /share/lanzarolab/archive/reference/Pfalciparum.genome.fasta -O test_indel_noRPA.norm.vcf -ERC GVCF
16:32:08.926 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:32:09.035 INFO HaplotypeCaller - ------------------------------------------------------------
16:32:09.039 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.4.0.0
16:32:09.040 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
16:32:09.040 INFO HaplotypeCaller - Executing as mcrepeau@grassi on Linux v4.15.0-99-generic amd64
16:32:09.040 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v21.0.1+12-LTS-29
16:32:09.040 INFO HaplotypeCaller - Start Date/Time: March 1, 2024, 4:32:08 PM PST
16:32:09.040 INFO HaplotypeCaller - ------------------------------------------------------------
16:32:09.041 INFO HaplotypeCaller - ------------------------------------------------------------
16:32:09.042 INFO HaplotypeCaller - HTSJDK Version: 3.0.5
16:32:09.042 INFO HaplotypeCaller - Picard Version: 3.0.0
16:32:09.042 INFO HaplotypeCaller - Built for Spark Version: 3.3.1
16:32:09.043 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:32:09.043 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:32:09.043 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:32:09.044 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:32:09.044 INFO HaplotypeCaller - Deflater: IntelDeflater
16:32:09.044 INFO HaplotypeCaller - Inflater: IntelInflater
16:32:09.044 INFO HaplotypeCaller - GCS max retries/reopens: 20
16:32:09.044 INFO HaplotypeCaller - Requester pays: disabled
16:32:09.045 INFO HaplotypeCaller - Initializing engine
16:32:09.351 INFO FeatureManager - Using codec VCFCodec to read file file:///share/lanzarolab/seq/variant_calling/Pf_GATK/test_temp/indel_sites_only.noRPA.norm.vcf.gz
16:32:09.402 INFO HaplotypeCaller - Done initializing engine
16:32:09.405 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
16:32:09.424 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to -0.0 for reference-model confidence output
16:32:09.425 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
16:32:09.446 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
16:32:09.457 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/afs/genomecenter.ucdavis.edu/software/gatk/4.4.0.0/static/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
16:32:09.503 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
16:32:09.676 INFO IntelPairHmm - Available threads: 64
16:32:09.677 INFO IntelPairHmm - Requested threads: 4
16:32:09.677 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
16:32:09.743 INFO ProgressMeter - Starting traversal
16:32:09.744 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
16:32:10.695 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position Pf3D7_01_v3:1781 and possibly subsequent; at least 10 samples must have called genotypes
16:32:18.167 WARN DepthPerSampleHC - Annotation will not be calculated at position Pf3D7_01_v3:39318 and possibly subsequent; genotype for sample DBS_02_1_C is not called
16:32:18.168 WARN StrandBiasBySample - Annotation will not be calculated at position Pf3D7_01_v3:39318 and possibly subsequent; genotype for sample DBS_02_1_C is not called
16:32:19.825 INFO ProgressMeter - Pf3D7_01_v3:47077 0.2 190 1131.0
16:32:31.513 INFO ProgressMeter - Pf3D7_01_v3:95376 0.4 430 1185.2
16:32:39.611 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.005375431
16:32:39.612 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 1.35759391
16:32:39.612 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 8.72 sec
16:32:39.613 INFO HaplotypeCaller - Shutting down engine
[March 1, 2024, 4:32:39 PM PST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.51 minutes.
Runtime.totalMemory()=1126170624
java.lang.ArrayIndexOutOfBoundsException: arraycopy: source index -45 out of bounds for byte[387]
at java.base/java.lang.System.arraycopy(Native Method)
at java.base/java.util.Arrays.copyOfRangeByte(Arrays.java:3864)
at java.base/java.util.Arrays.copyOfRange(Arrays.java:3854)
at org.broadinstitute.hellbender.tools.walkers.annotator.TandemRepeat.getNumTandemRepeatUnits(TandemRepeat.java:54)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyRegionTrimmer.trim(AssemblyRegionTrimmer.java:189)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:656)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:271)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
-
Hello Marc Crepeau. It is useful to hear that you are encountering a similar failure with --alleles set. It would help us immensely in tracking down the issue to have some test data to debug with, ideally the most minimal chunk of the files needed to reproduce the error, so that we can debug what is happening empirically. We have a hunch about what might be causing this error, but we need to confirm that it is the correct exception, and given the difficulty of reproducing assembly failures, it would be best to work from a file that already fails.
Here are instructions for uploading data: https://gatk.broadinstitute.org/hc/en-us/articles/360035889671-How-do-I-submit-a-detailed-bug-report.
-
Ok, I've uploaded the required files: mini.tar.gz
-
Hi Marc Crepeau. I have looked at your example, and there is indeed a problem with our code related to the --alleles mode. Specifically, very long deletions can cause indexing issues when they are injected into HaplotypeCaller in --alleles mode. I have a branch here: https://github.com/broadinstitute/gatk/pull/8731 to fix it, which should hopefully make it into the next point release of GATK and resolve your issue.
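To make the failure mode concrete, here is a toy Python model of it, not GATK's actual code. The stack traces in this thread show Arrays.copyOfRange being called with a negative start index. The sketch assumes, purely for illustration, that the start index is computed as the variant start minus the start of the reference window; the function names and coordinates are hypothetical, chosen only to echo the -26 and byte[355] values from the first trace.

```python
def copy_of_range(data: bytes, start: int, end: int) -> bytes:
    """Rough stand-in for the bounds behaviour of Java's Arrays.copyOfRange."""
    if start > end:
        raise ValueError(f"{start} > {end}")
    # Python slicing would silently accept a negative start, so check explicitly.
    if start < 0 or start > len(data):
        raise IndexError(f"source index {start} out of bounds for byte[{len(data)}]")
    # Java pads with zero bytes when end runs past the array; mirror that here.
    return data[start:end].ljust(end - start, b"\0")


def bases_after_variant_start(window_bases: bytes, window_start: int, variant_start: int) -> bytes:
    """Hypothetical helper: reference bases from the variant start to the window end."""
    offset = variant_start - window_start  # assumed calculation, for illustration only
    return copy_of_range(window_bases, offset, len(window_bases))
```

With a 355-base window starting at position 1000, a variant starting at 1010 yields the last 345 bases, while a variant starting at 974 (26 bases before the window) raises an IndexError. In this toy model, widening the window so that it starts at or before the variant makes the offset non-negative again.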
In the meantime, there are a few workarounds you can consider. Perhaps the biggest one is to filter out of your VCF any deletions longer than ~150 bases, as they are prone to causing this issue in rare cases if they fall in very noisy assembly regions of your reference. You could also try changing the --assembly-region-padding argument, since it changes the amount of padding used by the specific method that is throwing the exception here, making you less likely to see the bug. Increasing the padding too much can lead to assembly failures in repetitive regions and slower calling in some cases, so be careful about using too high a number; however, anything up to ~400 bases should work in many cases.
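A minimal sketch of the first workaround in plain Python, assuming an uncompressed, tab-separated alleles VCF. The 150-base cutoff is the approximate figure mentioned above, and the names (MAX_DELETION_LENGTH, longest_deletion, drop_long_deletions) are hypothetical rather than part of any GATK tool.

```python
MAX_DELETION_LENGTH = 150  # approximate cutoff from the advice above; adjust as needed


def longest_deletion(ref: str, alt_field: str) -> int:
    """Length in bases of the longest deletion among the ALT alleles (0 if none)."""
    lengths = [
        len(ref) - len(alt)
        for alt in alt_field.split(",")
        if not alt.startswith("<") and alt not in {".", "*"}
    ]
    return max([0] + lengths)  # insertions give negative values, clamped to 0


def drop_long_deletions(lines, max_len=MAX_DELETION_LENGTH):
    """Keep header lines and every record whose longest deletion is <= max_len."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")  # column 4 is REF, column 5 is ALT
        if longest_deletion(fields[3], fields[4]) <= max_len:
            kept.append(line)
    return kept
```

Note that a multi-allelic record is dropped as a whole if any one of its ALT alleles is a long deletion, so splitting to biallelic records first keeps more sites.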
Thank you for bringing this to our attention.
-
That's great! Thanks for your help!