GenotypeGVCFs 'Cannot decompress with GZIP: inflate error: Z_DATA_ERROR'
Hi,
I am running into this issue when running GenotypeGVCFs on a GenomicsDB workspace of 412 samples. The error seems to occur right after the 'Chromosome CM023249 position 134311 (TileDB column 93840333) has too many alleles' message shown in the log below, for example when I split the chromosome into intervals. The version, command, and log are below.
Running in a docker container on Ubuntu 22.04.03 LTS with 128 CPUs and 2.5TB of RAM.
Any help would be much appreciated!
Thank you for the tools and for the assistance,
Tristan
REQUIRED for all errors and issues:
a) GATK version used: v4.1.8.0
b) Exact command used:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx50g -Xms50g -jar /gatk/gatk-package-4.1.8.0-local.jar GenotypeGVCFs --max-genotype-count 4096 --max-alternate-alleles 151 -R /my_data/cease/genomes/anstep/VectorBase-61_AstephensiUCISS2018_Genome.fasta -V gendb://CM023249_batched.db -L CM023249 -O CM023249.vcf.gz
c) Entire program log:
18:48:50.949 INFO GenotypeGVCFs - ------------------------------------------------------------
18:48:50.949 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.1.8.0
18:48:50.949 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
18:48:50.949 INFO GenotypeGVCFs - Executing as root@dd8829b87f98 on Linux v6.5.7-060507-generic amd64
18:48:50.949 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
18:48:50.950 INFO GenotypeGVCFs - Start Date/Time: May 29, 2024 6:48:50 PM GMT
18:48:50.950 INFO GenotypeGVCFs - ------------------------------------------------------------
18:48:50.950 INFO GenotypeGVCFs - ------------------------------------------------------------
18:48:50.950 INFO GenotypeGVCFs - HTSJDK Version: 2.22.0
18:48:50.950 INFO GenotypeGVCFs - Picard Version: 2.22.8
18:48:50.950 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
18:48:50.950 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
18:48:50.950 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
18:48:50.950 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
18:48:50.951 INFO GenotypeGVCFs - Deflater: IntelDeflater
18:48:50.951 INFO GenotypeGVCFs - Inflater: IntelInflater
18:48:50.951 INFO GenotypeGVCFs - GCS max retries/reopens: 20
18:48:50.951 INFO GenotypeGVCFs - Requester pays: disabled
18:48:50.951 INFO GenotypeGVCFs - Initializing engine
18:48:51.480 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.0-e701905
18:49:25.253 info NativeGenomicsDB - pid=2608 tid=2609 No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
18:49:25.253 info NativeGenomicsDB - pid=2608 tid=2609 No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
18:49:25.253 info NativeGenomicsDB - pid=2608 tid=2609 No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
18:50:18.650 INFO IntervalArgumentCollection - Processing 88747589 bp from intervals
18:50:18.657 INFO GenotypeGVCFs - Done initializing engine
18:50:18.704 INFO ProgressMeter - Starting traversal
18:50:18.704 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
18:51:02.046 INFO ProgressMeter - CM023249:3505 0.7 1000 1384.4
18:51:12.377 INFO ProgressMeter - CM023249:7236 0.9 3000 3353.7
18:51:34.532 INFO ProgressMeter - CM023249:21604 1.3 9000 7121.4
18:51:45.268 INFO ProgressMeter - CM023249:28265 1.4 13000 9010.7
18:51:55.353 INFO ProgressMeter - CM023249:34265 1.6 19000 11795.4
18:52:09.482 INFO ProgressMeter - CM023249:41265 1.8 26000 14082.2
18:52:19.911 INFO ProgressMeter - CM023249:47265 2.0 32000 15840.7
18:52:30.032 INFO ProgressMeter - CM023249:53265 2.2 38000 17361.1
18:52:41.149 INFO ProgressMeter - CM023249:55265 2.4 40000 16848.7
18:52:52.100 INFO ProgressMeter - CM023249:61265 2.6 46000 17992.6
18:53:02.827 INFO ProgressMeter - CM023249:68265 2.7 53000 19375.7
18:53:14.590 INFO ProgressMeter - CM023249:74265 2.9 59000 20126.7
18:53:25.001 INFO ProgressMeter - CM023249:81265 3.1 66000 21256.4
18:53:35.923 INFO ProgressMeter - CM023249:88265 3.3 73000 22208.8
18:53:47.238 INFO ProgressMeter - CM023249:94265 3.5 79000 22730.1
18:53:58.227 INFO ProgressMeter - CM023249:101265 3.7 86000 23505.5
18:54:09.211 INFO ProgressMeter - CM023249:107265 3.8 92000 23947.2
18:54:20.719 INFO ProgressMeter - CM023249:114267 4.0 99000 24543.9
18:54:32.706 INFO ProgressMeter - CM023249:121267 4.2 106000 25039.2
18:54:44.869 INFO ProgressMeter - CM023249:127267 4.4 112000 25247.5
18:54:55.628 INFO ProgressMeter - CM023249:133267 4.6 118000 25566.6
Chromosome CM023249 position 134311 (TileDB column 93840333) has too many alleles in the combined VCF record : 132 : current limit : 50. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
18:55:06.532 WARN MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location CM023249:134311
18:55:08.671 INFO ProgressMeter - CM023249:135267 4.8 120000 24830.4
[TileDB::utils] Error: (gzip_handle_error) Cannot decompress with GZIP: inflate error: Z_DATA_ERROR
[TileDB::Codec] Error: Could not compress with .
[TileDB::ReadState] Error: Cannot decompress tile.
terminate called after throwing an instance of 'VariantStorageManagerException'
what(): VariantStorageManagerException exception : VariantArrayCellIterator increment failed
TileDB error message : [TileDB::ReadState] Error: Cannot decompress tile
-
You may want to try running GenotypeGVCFs with the latest version of GATK. However, this error looks like a corrupt GenomicsDB import folder, in which case the only option would be to start over and recreate it.
Can you tell us how you imported your samples? Did you use incremental updates? If so, did you use the consolidate option?
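Before re-importing, it may help to check whether the decompression failure is tied to one region of the workspace. The sketch below is only an illustration, not an official GATK procedure: it splits the contig into fixed-size windows in the standard `contig:start-end` interval syntax and builds (without running) one GenotypeGVCFs command per window. The helper names `windows` and `genotype_command`, the 10 Mb window size, the `reference.fasta` path, and the output file naming are all assumptions made for the example; the contig length is the 88747589 bp reported in the log.

```python
from typing import Iterator, List


def windows(contig: str, length: int, size: int) -> Iterator[str]:
    """Yield 1-based, inclusive intervals in contig:start-end form."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(1, length + 1, size):
        end = min(start + size - 1, length)  # clip the last window
        yield f"{contig}:{start}-{end}"


def genotype_command(interval: str, workspace: str, reference: str) -> List[str]:
    """Build (but do not run) a GenotypeGVCFs call for one interval.

    The output file name is a hypothetical convention for this sketch.
    """
    out = interval.replace(":", "_").replace("-", "_") + ".vcf.gz"
    return [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", f"gendb://{workspace}",
        "-L", interval,
        "-O", out,
    ]


if __name__ == "__main__":
    # Contig length taken from the "Processing 88747589 bp" log line.
    for iv in windows("CM023249", 88747589, 10_000_000):
        # reference.fasta is a placeholder path, not the poster's file.
        print(" ".join(genotype_command(iv, "CM023249_batched.db", "reference.fasta")))
```

If only the window containing position 134311 fails while the others complete, that would be consistent with a damaged tile in that part of the workspace rather than a problem with the whole import.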