Errors with CNNScoreVariants and HaplotypeCaller: java.nio.BufferUnderflowException
Dear GATK team,
I have two problems. First, I get an error when I run CNNScoreVariants with the 2D model settings on a VCF file that I created with samtools mpileup; the same file works fine with the 1D model settings. The error is: java.nio.BufferUnderflowException
The second problem is that I get the same error when I run HaplotypeCaller on a BAM file that works fine with samtools mpileup and freebayes.
Many thanks in advance for any helpful tips.
Below is the required information:
a) GATK version used:
GATK: 4.1.9.0
To write the workflow I use snakemake with conda:
conda version: 4.9.2
snakemake-minimal: 5.28.0
python: 3.8.3
java: openjdk 11.0.9.1 2020-11-04
I built the conda gatk environment as described in the GATK guide.
gatkpythonpackages in the conda environment is version 0.1.
I also put gatk 4.1.9.0 on the PATH:
PATH=$PATH:/home/wk/daten1/tools/gatk-4.1.9.0/
b) Exact command used:
for CNNScoreVariants:
gatk CNNScoreVariants \
    -I data/recal/NIST7035.bam \
    -V data/variance_call/all_samtools.vcf \
    -R chrom/GRCh37_latest_genomic.fasta \
    -O data/variance_call_cnnscored/all_samtools_cnnscored2D.vcf \
    -tensor-type read_tensor
for HaplotypeCaller:
gatk HaplotypeCaller \
    -R chrom/GRCh37_latest_genomic.fasta \
    -I data/recal/NIST7035.bam \
    -ERC GVCF \
    -L Exome_kit_targets.bed \
    -D chrom/GRCh37_latest_dbSNP_all.vcf.gz \
    -O data/variance_call/all_haplotypecaller.vcf
c) Entire error log:
CNNScoreVariants Error:
Using GATK jar /home/wk/daten1/praktikum/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/wk/daten1/praktikum/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar CNNScoreVariants -I data/recal/NIST7035.bam -V data/variance_call/all_samtools.vcf -R chrom/GRCh37_latest_genomic.fasta -O data/variance_call_cnnscored/all_samtools_cnnscored2D.vcf -tensor-type read_tensor
16:58:47.161 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/wk/daten1/praktikum/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Dec 03, 2020 4:58:47 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:58:47.309 INFO CNNScoreVariants - ------------------------------------------------------------
16:58:47.309 INFO CNNScoreVariants - The Genome Analysis Toolkit (GATK) v4.1.9.0
16:58:47.309 INFO CNNScoreVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
16:58:47.309 INFO CNNScoreVariants - Executing as wk@wk on Linux v5.4.0-56-generic amd64
16:58:47.310 INFO CNNScoreVariants - Java runtime: OpenJDK 64-Bit Server VM v11.0.9.1+1-Ubuntu-0ubuntu1.20.04
16:58:47.310 INFO CNNScoreVariants - Start Date/Time: 3. Dezember 2020 um 16:58:47 MEZ
16:58:47.310 INFO CNNScoreVariants - ------------------------------------------------------------
16:58:47.310 INFO CNNScoreVariants - ------------------------------------------------------------
16:58:47.310 INFO CNNScoreVariants - HTSJDK Version: 2.23.0
16:58:47.310 INFO CNNScoreVariants - Picard Version: 2.23.3
16:58:47.310 INFO CNNScoreVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:58:47.310 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:58:47.311 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:58:47.311 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:58:47.311 INFO CNNScoreVariants - Deflater: IntelDeflater
16:58:47.311 INFO CNNScoreVariants - Inflater: IntelInflater
16:58:47.311 INFO CNNScoreVariants - GCS max retries/reopens: 20
16:58:47.311 INFO CNNScoreVariants - Requester pays: disabled
16:58:47.311 INFO CNNScoreVariants - Initializing engine
16:58:47.481 INFO FeatureManager - Using codec VCFCodec to read file file:///home/wk/data2/praktikum/Variant_calling_uebung/data/variance_call/all_samtools.vcf
16:58:47.506 INFO CNNScoreVariants - Done initializing engine
16:58:47.506 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/wk/daten1/praktikum/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
16:58:49.164 INFO CNNScoreVariants - Using key:CNN_2D for CNN architecture:/tmp/small_2d.2299821329080019988.json and weights:/tmp/small_2d.4514848924757952291.hd5
16:58:49.761 INFO ProgressMeter - Starting traversal
16:58:49.761 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
16:58:49.764 INFO CNNScoreVariants - Starting pass 0 through the variants
16:58:49.793 INFO CNNScoreVariants - Done scoring variants with CNN.
16:58:49.793 INFO CNNScoreVariants - Shutting down engine
[3. Dezember 2020 um 16:58:49 MEZ] org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=177209344
java.nio.BufferUnderflowException
at java.base/java.nio.ByteBuffer.get(ByteBuffer.java:735)
at java.base/java.nio.DirectByteBuffer.get(DirectByteBuffer.java:318)
at java.base/java.nio.ByteBuffer.get(ByteBuffer.java:762)
at htsjdk.samtools.MemoryMappedFileBuffer.readBytes(MemoryMappedFileBuffer.java:34)
at htsjdk.samtools.AbstractBAMFileIndex.readBytes(AbstractBAMFileIndex.java:439)
at htsjdk.samtools.AbstractBAMFileIndex.verifyIndexMagicNumber(AbstractBAMFileIndex.java:376)
at htsjdk.samtools.AbstractBAMFileIndex.<init>(AbstractBAMFileIndex.java:70)
at htsjdk.samtools.AbstractBAMFileIndex.<init>(AbstractBAMFileIndex.java:64)
at htsjdk.samtools.DiskBasedBAMFileIndex.<init>(DiskBasedBAMFileIndex.java:46)
at htsjdk.samtools.BAMFileReader.getIndex(BAMFileReader.java:419)
at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:952)
at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:533)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:405)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextIterator(SamReaderQueryingIterator.java:125)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.<init>(SamReaderQueryingIterator.java:66)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:407)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:384)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.query(ReadsPathDataSource.java:347)
at org.broadinstitute.hellbender.engine.ReadsContext.iterator(ReadsContext.java:119)
at org.broadinstitute.hellbender.engine.ReadsContext.iterator(ReadsContext.java:102)
at org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants.transferReadsToPythonViaFifo(CNNScoreVariants.java:437)
at org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants.firstPassApply(CNNScoreVariants.java:332)
at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.nthPassApply(TwoPassVariantWalker.java:17)
at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.lambda$traverse$0(MultiplePassVariantWalker.java:40)
at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.lambda$traverseVariants$1(MultiplePassVariantWalker.java:77)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverseVariants(MultiplePassVariantWalker.java:75)
at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverse(MultiplePassVariantWalker.java:40)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
[Thu Dec 3 16:58:50 2020]
Error in rule CNNScoreVariants2D:
jobid: 0
output: data/variance_call_cnnscored/all_samtools_cnnscored2D.vcf
shell:
gatk CNNScoreVariants -I data/recal/NIST7035.bam -V data/variance_call/all_samtools.vcf -R chrom/GRCh37_latest_genomic.fasta -O data/variance_call_cnnscored/all_samtools_cnnscored2D.vcf -tensor-type read_tensor
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
HaplotypeCaller Error:
Using GATK jar /home/wk/daten1/praktikum/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/wk/daten1/praktikum/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar HaplotypeCaller -R chrom/GRCh37_latest_genomic.fasta -I data/recal/NIST7035.bam -ERC GVCF -L chrom/nexterarapidcapture_expandedexome_targetedregions_names_adapted.bed -D chrom/GRCh37_latest_dbSNP_all.vcf.gz -O data/variance_call/all_haplotypecaller.vcf
16:59:59.024 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/wk/daten1/praktikum/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Dec 03, 2020 4:59:59 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:59:59.146 INFO HaplotypeCaller - ------------------------------------------------------------
16:59:59.146 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.9.0
16:59:59.146 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
16:59:59.146 INFO HaplotypeCaller - Executing as wk@wk on Linux v5.4.0-56-generic amd64
16:59:59.146 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v11.0.9.1+1-Ubuntu-0ubuntu1.20.04
16:59:59.146 INFO HaplotypeCaller - Start Date/Time: 3. Dezember 2020 um 16:59:59 MEZ
16:59:59.146 INFO HaplotypeCaller - ------------------------------------------------------------
16:59:59.146 INFO HaplotypeCaller - ------------------------------------------------------------
16:59:59.147 INFO HaplotypeCaller - HTSJDK Version: 2.23.0
16:59:59.147 INFO HaplotypeCaller - Picard Version: 2.23.3
16:59:59.147 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:59:59.147 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:59:59.147 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:59:59.147 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:59:59.147 INFO HaplotypeCaller - Deflater: IntelDeflater
16:59:59.147 INFO HaplotypeCaller - Inflater: IntelInflater
16:59:59.147 INFO HaplotypeCaller - GCS max retries/reopens: 20
16:59:59.147 INFO HaplotypeCaller - Requester pays: disabled
16:59:59.147 INFO HaplotypeCaller - Initializing engine
16:59:59.310 INFO FeatureManager - Using codec VCFCodec to read file file:///home/wk/data2/praktikum/Variant_calling_uebung/chrom/GRCh37_latest_dbSNP_all.vcf.gz
16:59:59.479 INFO FeatureManager - Using codec BEDCodec to read file file:///home/wk/data2/praktikum/Variant_calling_uebung/chrom/nexterarapidcapture_expandedexome_targetedregions_names_adapted.bed
17:00:00.272 INFO IntervalArgumentCollection - Processing 62085286 bp from intervals
17:00:00.301 WARN IndexUtils - Feature file "/home/wk/data2/praktikum/Variant_calling_uebung/chrom/GRCh37_latest_dbSNP_all.vcf.gz" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file
17:00:00.505 INFO HaplotypeCaller - Done initializing engine
17:00:00.507 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
17:00:00.514 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
17:00:00.514 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
17:00:00.524 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/wk/daten1/praktikum/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
17:00:00.525 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/wk/daten1/praktikum/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
17:00:00.547 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
17:00:00.547 INFO IntelPairHmm - Available threads: 1
17:00:00.547 INFO IntelPairHmm - Requested threads: 4
17:00:00.547 WARN IntelPairHmm - Using 1 available threads, but 4 were requested
17:00:00.547 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
17:00:00.580 INFO ProgressMeter - Starting traversal
17:00:00.581 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
17:00:00.598 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
17:00:00.598 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
17:00:00.599 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.00 sec
17:00:00.599 INFO HaplotypeCaller - Shutting down engine
[3. Dezember 2020 um 17:00:00 MEZ] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=278921216
java.nio.BufferUnderflowException
at java.base/java.nio.ByteBuffer.get(ByteBuffer.java:735)
at java.base/java.nio.DirectByteBuffer.get(DirectByteBuffer.java:318)
at java.base/java.nio.ByteBuffer.get(ByteBuffer.java:762)
at htsjdk.samtools.MemoryMappedFileBuffer.readBytes(MemoryMappedFileBuffer.java:34)
at htsjdk.samtools.AbstractBAMFileIndex.readBytes(AbstractBAMFileIndex.java:439)
at htsjdk.samtools.AbstractBAMFileIndex.verifyIndexMagicNumber(AbstractBAMFileIndex.java:376)
at htsjdk.samtools.AbstractBAMFileIndex.<init>(AbstractBAMFileIndex.java:70)
at htsjdk.samtools.AbstractBAMFileIndex.<init>(AbstractBAMFileIndex.java:64)
at htsjdk.samtools.CachingBAMFileIndex.<init>(CachingBAMFileIndex.java:56)
at htsjdk.samtools.BAMFileReader.getIndex(BAMFileReader.java:418)
at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:952)
at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:533)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:405)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextIterator(SamReaderQueryingIterator.java:125)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.<init>(SamReaderQueryingIterator.java:66)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:407)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.iterator(ReadsPathDataSource.java:331)
at org.broadinstitute.hellbender.engine.MultiIntervalLocalReadShard.iterator(MultiIntervalLocalReadShard.java:134)
at org.broadinstitute.hellbender.engine.AssemblyRegionIterator.<init>(AssemblyRegionIterator.java:86)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:188)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
[Thu Dec 3 17:00:00 2020]
Error in rule haplotypecaller:
jobid: 0
output: data/variance_call/all_haplotypecaller.vcf
shell:
gatk HaplotypeCaller -R chrom/GRCh37_latest_genomic.fasta -I data/recal/NIST7035.bam -ERC GVCF -L chrom/nexterarapidcapture_expandedexome_targetedregions_names_adapted.bed -D chrom/GRCh37_latest_dbSNP_all.vcf.gz -O data/variance_call/all_haplotypecaller.vcf
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
-
Hello WKaiser, could you run some troubleshooting steps to determine what the issue might be?
- Validate your SAM or BAM file with ValidateSamFile.
- Validate your variant file with ValidateVariants using default parameters to get strict validation.
- Re-index your files and verify GATK is using the correct index.
Please also check that you have enough memory for these jobs:
- Specify a --tmp-dir that has room for all necessary temporary files.
- Specify Java memory usage with the Java option -Xmx.
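For example, the checks above might look like this on the files from your commands (flag spellings per GATK 4.1.x; heap size and temp path are placeholders to adjust):

```shell
# Validate the BAM that both tools read (SUMMARY mode lists error counts)
gatk ValidateSamFile \
    -I data/recal/NIST7035.bam \
    --MODE SUMMARY

# Strictly validate the VCF against the same reference
gatk ValidateVariants \
    -V data/variance_call/all_samtools.vcf \
    -R chrom/GRCh37_latest_genomic.fasta

# Rebuild the BAM index so a stale .bai cannot be picked up
samtools index data/recal/NIST7035.bam

# Re-run with an explicit heap size and temp directory (values are examples)
gatk --java-options "-Xmx8g" HaplotypeCaller \
    -R chrom/GRCh37_latest_genomic.fasta \
    -I data/recal/NIST7035.bam \
    -ERC GVCF \
    -O data/variance_call/all_haplotypecaller.vcf \
    --tmp-dir /path/with/space
```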
-
Hello Genevieve Brandt,
The problem is solved. I had an older, faulty .bai and a newer, correct .bai for the BAM file in the same folder, and my command used the older one. After adjusting the code and removing the older index from the folder, it worked.
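For other readers: a stale index (one older than the BAM it indexes) can be spotted with a plain timestamp comparison. A minimal, self-contained sketch — the file names below are invented for the demonstration and created in a temp directory:

```shell
#!/usr/bin/env bash
# Demonstrates detecting a .bai that is older than its .bam -- the
# situation behind the BufferUnderflowException in this thread.
set -euo pipefail

workdir=$(mktemp -d)
bam="$workdir/sample.bam"
bai="$workdir/sample.bam.bai"

touch "$bai"   # a stale index, created first...
sleep 1
touch "$bam"   # ...then the BAM is rewritten (e.g. after recalibration)

# -nt: "newer than" by modification time
if [ "$bam" -nt "$bai" ]; then
    status="stale"
    echo "index is older than BAM: re-run 'samtools index $bam'"
else
    status="ok"
fi

rm -rf "$workdir"
```

Running `samtools index` after every step that rewrites the BAM avoids this entirely.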
Thank you very much for the quick response.
-
WKaiser thanks for the update and posting the solution, this will be helpful for other GATK users in the future! Glad the problem is solved.