Error in GenomicsDBImport: Invalid deflate block found
Hello GATK community!
I keep getting an error when running GenomicsDBImport on just one of my chromosomes. The other 37 chromosomes worked perfectly, and I'm wondering whether the forum might be able to suggest a troubleshooting direction.
Within the GenomicsDBImport workspace for the problematic chromosome, all of the expected files are present except callset.json.
I tried rerunning GenomicsDBImport with more memory (current script below), but it still failed.
a) GATK version used: GATK/4.2.3.0
b) Exact command used:
#!/bin/bash -l
#SBATCH --job-name=36gbdb
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --time 10-00:00:00
#SBATCH --mem=70GB
#SBATCH -A ctbrowngrp
#SBATCH -p bmh
#SBATCH -o /home/hennelly/projects/GATK/slurmoutJan102021/GBImport_Dec23_chr36.out
#SBATCH -e /home/hennelly/projects/GATK/slurmoutJan102021/GBImport_Dec23_chr36.err
module load R
module load maven
module load java
module load GATK
SAMPLEMAP=/home/hennelly/projects/GATK/scripts/mygcflist_Oct222021.txt
OUTDIR=/home/hennelly/projects/GATK/GenomeDBImport_Nov142021/
TEMPDIR=/home/hennelly/projects/GATK/scratchJan62022_genotype/
gatk --java-options "-Xmx70g -Xms70g" \
GenomicsDBImport \
--genomicsdb-workspace-path ${OUTDIR}chr36_gvcf_db_try2 \
--batch-size 50 \
-L chr36 \
--sample-name-map ${SAMPLEMAP} \
--tmp-dir ${TEMPDIR}
c) Entire program log:
Module R/3.6.3 loaded
Module maven/3.2.3 loaded
Module JAVA 1.8 Loaded.
Module GATK/4.2.3.0 loaded
17:30:32.366 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/gatk-4.2.3.0/gatk-package-4.2.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Feb 23, 2022 5:30:32 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
17:30:32.656 INFO GenomicsDBImport - ------------------------------------------------------------
17:30:32.656 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.2.3.0
17:30:32.656 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
17:30:32.656 INFO GenomicsDBImport - Executing as hennelly@bm4 on Linux v5.4.0-87-generic amd64
17:30:32.657 INFO GenomicsDBImport - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_20-b26
17:30:32.657 INFO GenomicsDBImport - Start Date/Time: February 23, 2022 5:30:32 PM PST
17:30:32.657 INFO GenomicsDBImport - ------------------------------------------------------------
17:30:32.657 INFO GenomicsDBImport - ------------------------------------------------------------
17:30:32.658 INFO GenomicsDBImport - HTSJDK Version: 2.24.1
17:30:32.658 INFO GenomicsDBImport - Picard Version: 2.25.4
17:30:32.658 INFO GenomicsDBImport - Built for Spark Version: 2.4.5
17:30:32.658 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:30:32.658 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:30:32.658 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:30:32.658 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:30:32.658 INFO GenomicsDBImport - Deflater: IntelDeflater
17:30:32.658 INFO GenomicsDBImport - Inflater: IntelInflater
17:30:32.659 INFO GenomicsDBImport - GCS max retries/reopens: 20
17:30:32.659 INFO GenomicsDBImport - Requester pays: disabled
17:30:32.659 INFO GenomicsDBImport - Initializing engine
17:30:33.529 INFO IntervalArgumentCollection - Processing 30810995 bp from intervals
17:30:33.531 INFO GenomicsDBImport - Done initializing engine
17:30:33.766 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.4.2-71dc25d
17:30:33.771 INFO GenomicsDBImport - Vid Map JSON file will be written to /home/hennelly/projects/GATK/GenomeDBImport_Nov142021/chr36_gvcf_db_try2/vidmap.json
17:30:33.771 INFO GenomicsDBImport - Callset Map JSON file will be written to /home/hennelly/projects/GATK/GenomeDBImport_Nov142021/chr36_gvcf_db_try2/callset.json
17:30:33.771 INFO GenomicsDBImport - Complete VCF Header will be written to /home/hennelly/projects/GATK/GenomeDBImport_Nov142021/chr36_gvcf_db_try2/vcfheader.vcf
17:30:33.771 INFO GenomicsDBImport - Importing to workspace - /home/hennelly/projects/GATK/GenomeDBImport_Nov142021/chr36_gvcf_db_try2
17:30:45.163 WARN IntelInflater - Zero Bytes Written : 0
17:31:09.332 INFO GenomicsDBImport - Importing batch 1 with 50 samples
13:06:14.396 INFO GenomicsDBImport - Done importing batch 1/2
13:06:19.240 WARN IntelInflater - Zero Bytes Written : 0
13:06:34.992 INFO GenomicsDBImport - Importing batch 2 with 43 samples
19:16:12.135 INFO GenomicsDBImport - Shutting down engine
[February 24, 2022 7:16:12 PM PST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 1,545.66 minutes.
Runtime.totalMemory()=75139907584
java.lang.RuntimeException: Invalid deflate block found.
at com.intel.gkl.compression.IntelInflater.inflateNative(Native Method)
at com.intel.gkl.compression.IntelInflater.inflate(IntelInflater.java:174)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:145)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:550)
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:241)
at htsjdk.tribble.readers.TabixReader.readLine(TabixReader.java:215)
at htsjdk.tribble.readers.TabixReader.access$300(TabixReader.java:48)
at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:434)
at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:46)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:170)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:205)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:149)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$1$NoMnpIterator.next(GenomicsDBImport.java:1016)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$1$NoMnpIterator.next(GenomicsDBImport.java:1007)
at org.genomicsdb.importer.GenomicsDBImporterStreamWrapper.next(GenomicsDBImporterStreamWrapper.java:110)
at org.genomicsdb.importer.GenomicsDBImporter.doSingleImport(GenomicsDBImporter.java:578)
at org.genomicsdb.importer.GenomicsDBImporter.lambda$null$4(GenomicsDBImporter.java:730)
at org.genomicsdb.importer.GenomicsDBImporter$$Lambda$79/1034879960.get(Unknown Source)
at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:476)
at java.util.concurrent.CompletableFuture$Async.run(CompletableFuture.java:428)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Using GATK jar /share/apps/gatk-4.2.3.0/gatk-package-4.2.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx70g -Xms70g -jar /share/apps/gatk-4.2.3.0/gatk-package-4.2.3.0-local.jar GenomicsDBImport --genomicsdb-workspace-path /home/hennelly/projects/GATK/GenomeDBImport_Nov142021/chr36_gvcf_db_try2 --batch-size 50 -L chr36 --sample-name-map /home/hennelly/projects/GATK/scripts/mygcflist_Oct222021.txt --tmp-dir /home/hennelly/projects/GATK/scratchJan62022_genotype/
See forum topic details at forum guidelines page: https://gatk.broadinstitute.org/hc/en-us/articles/360053845952-Forum-Guidelines
-
Hi Lauren Hennelly,
Thanks for writing into the forum! Let's see if we can get this figured out.
I think one of your input files may be malformed. You can check your input GVCFs for this chromosome with ValidateVariants. You can also check the HaplotypeCaller logs from when you created these GVCFs to see if there were any issues. To narrow your search down, it looks like the problem is in the 2nd batch.
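For reference, here is a minimal dry-run sketch of looping ValidateVariants over each per-sample GVCF (the file names, list, and reference path are placeholders; drop the leading "echo" to actually run the commands):

```shell
#!/bin/bash
# Dry-run sketch: print one ValidateVariants command per GVCF in list.txt.
# sampleA/sampleB and reference.fa are stand-ins for your own files.
REF=reference.fa
printf '%s\n' sampleA.g.vcf.gz sampleB.g.vcf.gz > list.txt  # stand-in GVCF list
while read -r GVCF; do
  # remove "echo" to run for real; a malformed file will error out here
  echo gatk ValidateVariants -V "${GVCF}" -L chr36 -R "${REF}" -gvcf
done < list.txt
```

Running the commands one sample at a time makes it easy to see exactly which GVCF triggers the "Invalid deflate block found" error.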
Let me know what you find and if you have any other questions.
Best,
Genevieve
-
Hi GATK community,
I have a similar problem with a cohort of 120 human samples: GenomicsDBImport ran fine on every chromosome except chr1. However, my command is slightly different.
GATK version used:
> gatk --version
Using GATK jar /home/cgonzalez/tools/gatk-4.2.2.0/gatk-package-4.2.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/cgonzalez/tools/gatk-4.2.2.0/gatk-package-4.2.2.0-local.jar --version
Picked up JAVA_TOOL_OPTIONS: -Djava.io.tmpdir=/scratch
The Genome Analysis Toolkit (GATK) v4.2.2.0
HTSJDK Version: 2.24.1
Picard Version: 2.25.4
Command:
gatk GenomicsDBImport -V ${sample1} -V ${sample2} ... -V ${sampleN} --genomicsdb-workspace-path ${database} -L ${interval_file} --tmp-dir ${params.tempdir}
When I run it for the whole chromosome it gives me this output:
LOG
09:57:09.063 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/cgonzalez/tools/gatk-4.2.2.0/gatk-package-4.2.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 01, 2022 9:57:09 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
09:57:09.277 INFO GenomicsDBImport - ------------------------------------------------------------
09:57:09.279 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.2.2.0
09:57:09.279 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
09:57:09.280 INFO GenomicsDBImport - Executing as cgonzalez@compute-2-2.hpc.lji.org on Linux v3.10.0-1160.42.2.el7.x86_64 amd64
09:57:09.281 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_232-b09
09:57:09.282 INFO GenomicsDBImport - Start Date/Time: March 1, 2022 9:57:08 AM PST
09:57:09.283 INFO GenomicsDBImport - ------------------------------------------------------------
09:57:09.283 INFO GenomicsDBImport - ------------------------------------------------------------
09:57:09.284 INFO GenomicsDBImport - HTSJDK Version: 2.24.1
09:57:09.285 INFO GenomicsDBImport - Picard Version: 2.25.4
09:57:09.285 INFO GenomicsDBImport - Built for Spark Version: 2.4.5
09:57:09.286 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:57:09.287 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:57:09.288 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:57:09.289 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:57:09.289 INFO GenomicsDBImport - Deflater: IntelDeflater
09:57:09.290 INFO GenomicsDBImport - Inflater: IntelInflater
09:57:09.291 INFO GenomicsDBImport - GCS max retries/reopens: 20
09:57:09.291 INFO GenomicsDBImport - Requester pays: disabled
09:57:09.292 INFO GenomicsDBImport - Initializing engine
09:58:11.755 INFO FeatureManager - Using codec BEDCodec to read file file:///home/cgonzalez/myscratch/R24/wgs/tmp/chr1.bed
09:58:11.769 INFO IntervalArgumentCollection - Processing 248956421 bp from intervals
09:58:11.774 INFO GenomicsDBImport - Done initializing engine
09:58:12.170 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.4.1-d59e886
09:58:12.178 INFO GenomicsDBImport - Vid Map JSON file will be written to /mnt/beegfs/lts/cgonzalez/R24/wgs/DICE_Cancer_WGS/2.Processed_data/vcf_database_chr1/vidmap.json
09:58:12.179 INFO GenomicsDBImport - Callset Map JSON file will be written to /mnt/beegfs/lts/cgonzalez/R24/wgs/DICE_Cancer_WGS/2.Processed_data/vcf_database_chr1/callset.json
09:58:12.180 INFO GenomicsDBImport - Complete VCF Header will be written to /mnt/beegfs/lts/cgonzalez/R24/wgs/DICE_Cancer_WGS/2.Processed_data/vcf_database_chr1/vcfheader.vcf
09:58:12.180 INFO GenomicsDBImport - Importing to workspace - /mnt/beegfs/lts/cgonzalez/R24/wgs/DICE_Cancer_WGS/2.Processed_data/vcf_database_chr1
09:58:53.786 INFO GenomicsDBImport - Importing batch 1 with 120 samples
14:00:06.343 INFO GenomicsDBImport - Shutting down engine
[March 6, 2022 2:00:06 PM PST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 7,442.96 minutes.
Runtime.totalMemory()=6135742464
java.lang.RuntimeException: Invalid deflate block found.
at com.intel.gkl.compression.IntelInflater.inflateNative(Native Method)
at com.intel.gkl.compression.IntelInflater.inflate(IntelInflater.java:174)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:145)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:550)
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:241)
at htsjdk.tribble.readers.TabixReader.readLine(TabixReader.java:215)
at htsjdk.tribble.readers.TabixReader.access$300(TabixReader.java:48)
at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:434)
at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:46)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:170)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:205)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:149)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$1$NoMnpIterator.next(GenomicsDBImport.java:984)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$1$NoMnpIterator.next(GenomicsDBImport.java:975)
at org.genomicsdb.importer.GenomicsDBImporterStreamWrapper.next(GenomicsDBImporterStreamWrapper.java:110)
at org.genomicsdb.importer.GenomicsDBImporter.doSingleImport(GenomicsDBImporter.java:578)
at org.genomicsdb.importer.GenomicsDBImporter.lambda$null$4(GenomicsDBImporter.java:730)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I tried splitting chr1 into 6 chunks and ran it again with this interval file:
> cat hg38.bed
chr1 1 41492737
chr1 41492738 82985475
chr1 82985476 124478213
chr1 124478214 165970951
chr1 165970952 207463689
chr1 207463690 248956422
It crashes while importing the sample batch for the 5th interval:
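For anyone reproducing this, a chunked interval file like the one above can be generated with a short awk sketch (hg38 chr1 length assumed; 1-based inclusive coordinates, matching the style of the file above):

```shell
#!/bin/bash
# Sketch: split one chromosome into N roughly equal, non-overlapping intervals.
CHROM=chr1
LEN=248956422   # hg38 chr1 length
N=6
awk -v chrom="$CHROM" -v len="$LEN" -v n="$N" 'BEGIN {
  size = int(len / n)
  for (i = 0; i < n; i++) {
    start = i * size + 1
    end = (i == n - 1) ? len : (i + 1) * size   # last chunk absorbs the remainder
    print chrom "\t" start "\t" end
  }
}' > chunks.bed
cat chunks.bed
```

Each interval can then be passed to GenomicsDBImport with -L, which also helps bisect which region of the chromosome contains the corrupt data.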
LOG
15:21:05.720 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/cgonzalez/tools/gatk-4.2.2.0/gatk-package-4.2.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 15, 2022 3:21:05 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
15:21:05.889 INFO GenomicsDBImport - ------------------------------------------------------------
15:21:05.890 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.2.2.0
15:21:05.891 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
15:21:05.891 INFO GenomicsDBImport - Executing as cgonzalez@gpu-3-2.hpc.lji.org on Linux v3.10.0-1160.42.2.el7.x86_64 amd64
15:21:05.892 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_232-b09
15:21:05.892 INFO GenomicsDBImport - Start Date/Time: March 15, 2022 3:21:05 PM PDT
15:21:05.893 INFO GenomicsDBImport - ------------------------------------------------------------
15:21:05.893 INFO GenomicsDBImport - ------------------------------------------------------------
15:21:05.894 INFO GenomicsDBImport - HTSJDK Version: 2.24.1
15:21:05.895 INFO GenomicsDBImport - Picard Version: 2.25.4
15:21:05.896 INFO GenomicsDBImport - Built for Spark Version: 2.4.5
15:21:05.896 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:21:05.897 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:21:05.897 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:21:05.897 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:21:05.898 INFO GenomicsDBImport - Deflater: IntelDeflater
15:21:05.899 INFO GenomicsDBImport - Inflater: IntelInflater
15:21:05.899 INFO GenomicsDBImport - GCS max retries/reopens: 20
15:21:05.900 INFO GenomicsDBImport - Requester pays: disabled
15:21:05.900 INFO GenomicsDBImport - Initializing engine
15:22:42.658 INFO FeatureManager - Using codec BEDCodec to read file file:///home/cgonzalez/myscratch/R24/wgs/tmp/hg38.bed
15:22:42.673 INFO IntervalArgumentCollection - Processing 248956416 bp from intervals
15:22:42.678 INFO GenomicsDBImport - Done initializing engine
15:22:43.300 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.4.1-d59e886
15:22:43.309 INFO GenomicsDBImport - Vid Map JSON file will be written to /home/cgonzalez/myscratch/R24/wgs/tmp/vcf_database_chr1/vidmap.json
15:22:43.310 INFO GenomicsDBImport - Callset Map JSON file will be written to /home/cgonzalez/myscratch/R24/wgs/tmp/vcf_database_chr1/callset.json
15:22:43.310 INFO GenomicsDBImport - Complete VCF Header will be written to /home/cgonzalez/myscratch/R24/wgs/tmp/vcf_database_chr1/vcfheader.vcf
15:22:43.311 INFO GenomicsDBImport - Importing to workspace - /home/cgonzalez/myscratch/R24/wgs/tmp/vcf_database_chr1
15:24:16.949 INFO GenomicsDBImport - Importing batch 1 with 120 samples
03:40:59.710 INFO GenomicsDBImport - Importing batch 1 with 120 samples
07:58:09.631 INFO GenomicsDBImport - Importing batch 1 with 120 samples
09:13:20.816 INFO GenomicsDBImport - Importing batch 1 with 120 samples
23:05:49.016 INFO GenomicsDBImport - Importing batch 1 with 120 samples
23:29:03.385 INFO GenomicsDBImport - Shutting down engine
[March 20, 2022 11:29:03 PM PDT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 7,687.96 minutes.
Runtime.totalMemory()=10736893952
java.lang.RuntimeException: Invalid deflate block found.
at com.intel.gkl.compression.IntelInflater.inflateNative(Native Method)
at com.intel.gkl.compression.IntelInflater.inflate(IntelInflater.java:174)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:145)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:550)
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:241)
at htsjdk.tribble.readers.TabixReader.readLine(TabixReader.java:215)
at htsjdk.tribble.readers.TabixReader.access$300(TabixReader.java:48)
at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:434)
at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:46)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:170)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:205)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:149)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$1$NoMnpIterator.next(GenomicsDBImport.java:984)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$1$NoMnpIterator.next(GenomicsDBImport.java:975)
at org.genomicsdb.importer.GenomicsDBImporterStreamWrapper.next(GenomicsDBImporterStreamWrapper.java:110)
at org.genomicsdb.importer.GenomicsDBImporter.doSingleImport(GenomicsDBImporter.java:578)
at org.genomicsdb.importer.GenomicsDBImporter.lambda$null$4(GenomicsDBImporter.java:730)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I'm not sure whether the problem lies with one of my input files in this specific region. Would you know what the issue could be? Lauren Hennelly, were you able to solve it?
-
Hi Genevieve and Cristian,
With Genevieve's advice to use ValidateVariants, I did end up solving the issue! I have 95 individuals in my GATK pipeline. After running ValidateVariants on every individual's GVCF for chromosome 36, the output showed that only one individual produced the "Invalid deflate block found" error. There were no issues with any other sample for chromosome 36.
I ended up removing that individual from my dataset, and after rerunning GenomicsDBImport without them, it worked perfectly.
Here's the ValidateVariants command I used:
echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID
GVCF=$(sed "${SLURM_ARRAY_TASK_ID}q;d" list.txt)
echo ${GVCF}
gatk ValidateVariants \
-V /home/hennelly/projects/GATK/GVCFfiles/${GVCF} \
-L chr36 \
-R /home/hennelly/fastqfiles/DogRefwithY/genomes/canFam3_withY.fa \
-gvcf
Hope that helps!
--Lauren
-
Thank you for your input and for sharing how you solved the issue, Lauren Hennelly! Cristian Gonzalez-Colin, have you tried running ValidateVariants on your GVCF input files for chromosome 1 to pinpoint any potential issues?
Kind regards,
Pamela
-
Thanks, Pamela Bretscher and Lauren Hennelly, for your input; I was able to find the problematic donor.
Hi Pamela Bretscher, as in Lauren's example, the validation for this donor gave me this error:
14:24:37.501 WARN ValidateVariants - Current interval chr1:206489377-206489377 overlaps previous interval ending at 206489377
14:24:37.733 WARN ValidateVariants - Current interval chr1:206619239-206619239 overlaps previous interval ending at 206619239
14:24:38.390 WARN ValidateVariants - Current interval chr1:207100059-207100063 overlaps previous interval ending at 207100063
14:24:38.390 WARN ValidateVariants - Current interval chr1:207100091-207100091 overlaps previous interval ending at 207100094
14:24:38.390 WARN ValidateVariants - Current interval chr1:207100092-207100094 overlaps previous interval ending at 207100094
14:24:38.422 WARN ValidateVariants - Current interval chr1:207118166-207118166 overlaps previous interval ending at 207118169
14:24:38.422 WARN ValidateVariants - Current interval chr1:207118167-207118167 overlaps previous interval ending at 207118169
14:24:38.422 WARN ValidateVariants - Current interval chr1:207118168-207118169 overlaps previous interval ending at 207118169
14:24:39.254 INFO ValidateVariants - Shutting down engine
[March 24, 2022 2:24:39 PM PDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 4.32 minutes.
Runtime.totalMemory()=375697408
java.lang.RuntimeException: Invalid deflate block found.
at com.intel.gkl.compression.IntelInflater.inflateNative(Native Method)
at com.intel.gkl.compression.IntelInflater.inflate(IntelInflater.java:174)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:145)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:550)
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:241)
at htsjdk.tribble.readers.TabixReader.readLine(TabixReader.java:215)
at htsjdk.tribble.readers.TabixReader.access$300(TabixReader.java:48)
at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:434)
at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:46)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:170)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:205)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:149)
at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.loadNextFeature(FeatureIntervalIterator.java:98)
at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.loadNextNovelFeature(FeatureIntervalIterator.java:74)
at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.next(FeatureIntervalIterator.java:62)
at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.next(FeatureIntervalIterator.java:24)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
I checked the log from the HaplotypeCaller step and didn't find any issue when the GVCF was generated. Do you know how I can save this donor's data?
Thanks again,
Cristian
-
Thank you for working through this suggestion and finding the problematic sample. There are a few things you can try to pinpoint the issue with this donor and attempt to keep the data. First, try the --bypass-feature-reader argument when running GenomicsDBImport. However, if the problem is in the BGZF blocks themselves, there may not be much you can do. You can run PrintBGZFBlockInformation on the file to pinpoint where the error is. If nothing else works, try decompressing and recompressing the file with bgzip, or reindexing it with tabix, to see whether GATK can then import it. These suggestions may help locate the error in the file, but the simplest solution is most likely to remove this donor from the analysis. Please let me know if you have any questions.
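Those recovery steps could look roughly like the sketch below, written out as a script rather than run directly (donor.g.vcf.gz and recover.sh are placeholder names, and the exact PrintBGZFBlockInformation argument should be confirmed with --help for your GATK version):

```shell
#!/bin/bash
# Write the recovery attempts to a script; file names are placeholders.
cat > recover.sh <<'EOF'
#!/bin/bash
set -e
IN=donor.g.vcf.gz
# 1) locate the corrupt BGZF block(s); confirm the input argument with
#    "gatk PrintBGZFBlockInformation --help" for your version
gatk PrintBGZFBlockInformation --bgzf-file "${IN}"
# 2) decompress (this may itself fail on a truly corrupt block), recompress
#    with bgzip to rewrite every BGZF block, then rebuild the tabix index
gunzip -c "${IN}" > donor.g.vcf
bgzip -f donor.g.vcf          # produces a fresh donor.g.vcf.gz
tabix -f -p vcf donor.g.vcf.gz
EOF
chmod +x recover.sh
```

If gunzip cannot read past the bad block, the underlying data is lost and dropping the donor (or re-running HaplotypeCaller for that sample) is the remaining option.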
Kind regards,
Pamela