Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GenotypeGVCFs error bgzf_open: Assertion `compressBound(0xff00) < 0x10000' failed.



  • Gökalp Çelik

    Hi Hugo DENIS

    This clearly looks like a corrupt GenomicsDBImport instance. You may need to run the import again, possibly to a different destination drive or location. Unfortunately, we do not have a tool to check the integrity of a GenomicsDB folder. You may want to try the parameter below to see if it helps you get a working import.

    --genomicsdb-shared-posixfs-optimizations true

    I hope this helps.
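
Since there is no integrity checker for a GenomicsDB folder, a much weaker sanity check is still possible: confirming that the files GenomicsDBImport reports writing (vidmap.json, callset.json and vcfheader.vcf, as named in the import log further down this thread) exist and are non-empty. The sketch below is only that; the function name is hypothetical and passing the check says nothing about the array data itself.

```shell
#!/usr/bin/env bash
# Sanity check (NOT an integrity check) for a GenomicsDB workspace directory.
# The three file names are the ones GenomicsDBImport lists in its own log.
check_genomicsdb_workspace() {
    local workspace="$1"
    local missing=0
    local name
    if [ ! -d "$workspace" ]; then
        echo "not a directory: $workspace" >&2
        return 2
    fi
    for name in vidmap.json callset.json vcfheader.vcf; do
        # -s is true only if the file exists and is non-empty
        if [ ! -s "$workspace/$name" ]; then
            echo "missing or empty: $workspace/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_genomicsdb_workspace /home/hdenis/Gatk/scaffold_1 && echo "metadata files present"`.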


  • Hugo DENIS

    Hi, thank you for your answer. 

    I tried your suggestions (changing the destination drive and adding the option) and also increased the number of samples, but unfortunately this does not seem to solve the issue. Here are the commands and log files.

    gatk GenomicsDBImport --genomicsdb-workspace-path "/home/hdenis/Gatk/${CONTIG}" -L $CONTIG --sample-name-map "${INDIR}aspat_gvcf_clean.sample_map" --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp --reader-threads 2 --genomicsdb-shared-posixfs-optimizations true --batch-size 50

    gatk GenotypeGVCFs -R $REF_3 -V "gendb://${CONTIG}" -O "${OUTDIR}aspat_clean_${CONTIG}.vcf.gz" --include-non-variant-sites --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp
    Using GATK jar /home/hdenis/Programs/gatk-
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/hdenis/Programs/gatk- GenomicsDBImport --genomicsdb-workspace-path /home/hdenis/Gatk/scaffold_1 -L scaffold_1 --sample-name-map /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/aspat_gvcf_clean.sample_map --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp --reader-threads 2 --genomicsdb-shared-posixfs-optimizations true --batch-size 50
    08:53:48.050 INFO  NativeLibraryLoader - Loading from jar:file:/home/hdenis/Programs/gatk-!/com/intel/gkl/native/
    08:53:48.168 INFO  GenomicsDBImport - ------------------------------------------------------------
    08:53:48.171 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.5.0.0
    08:53:48.171 INFO  GenomicsDBImport - For support and documentation go to
    08:53:48.171 INFO  GenomicsDBImport - Executing as hdenis@R740xd on Linux v5.14.0-362.13.1.el9_3.x86_64 amd64
    08:53:48.171 INFO  GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v17.0.2+8-86
    08:53:48.172 INFO  GenomicsDBImport - Start Date/Time: June 21, 2024 at 8:53:48 AM NCT
    08:53:48.172 INFO  GenomicsDBImport - ------------------------------------------------------------
    08:53:48.172 INFO  GenomicsDBImport - ------------------------------------------------------------
    08:53:48.173 INFO  GenomicsDBImport - HTSJDK Version: 4.1.0
    08:53:48.173 INFO  GenomicsDBImport - Picard Version: 3.1.1
    08:53:48.174 INFO  GenomicsDBImport - Built for Spark Version: 3.5.0
    08:53:48.174 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    08:53:48.174 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    08:53:48.174 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    08:53:48.175 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    08:53:48.175 INFO  GenomicsDBImport - Deflater: IntelDeflater
    08:53:48.175 INFO  GenomicsDBImport - Inflater: IntelInflater
    08:53:48.175 INFO  GenomicsDBImport - GCS max retries/reopens: 20
    08:53:48.175 INFO  GenomicsDBImport - Requester pays: disabled
    08:53:48.175 INFO  GenomicsDBImport - Initializing engine
    08:53:48.363 INFO  IntervalArgumentCollection - Processing 38134904 bp from intervals
    08:53:48.364 INFO  GenomicsDBImport - Done initializing engine
    08:53:48.509 INFO  GenomicsDBLibLoader - GenomicsDB native library version : 1.5.1-84e800e
    08:53:48.510 INFO  GenomicsDBImport - Vid Map JSON file will be written to /home/hdenis/Gatk/scaffold_1/vidmap.json
    08:53:48.510 INFO  GenomicsDBImport - Callset Map JSON file will be written to /home/hdenis/Gatk/scaffold_1/callset.json
    08:53:48.511 INFO  GenomicsDBImport - Complete VCF Header will be written to /home/hdenis/Gatk/scaffold_1/vcfheader.vcf
    08:53:48.512 INFO  GenomicsDBImport - Importing to workspace - /home/hdenis/Gatk/scaffold_1
    08:53:48.719 INFO  GenomicsDBImport - Starting batch input file preload
    08:53:48.841 INFO  GenomicsDBImport - Finished batch preload
    08:53:48.843 INFO  GenomicsDBImport - Importing batch 1 with 4 samples
    08:57:18.522 INFO  GenomicsDBImport - Done importing batch 1/1
    08:57:18.526 INFO  GenomicsDBImport - Import of all batches to GenomicsDB completed!
    08:57:18.526 INFO  GenomicsDBImport - Shutting down engine
    [June 21, 2024 at 8:57:18 AM NCT] done. Elapsed time: 3.51 minutes.
    Using GATK jar /home/hdenis/Programs/gatk-
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/hdenis/Programs/gatk- GenotypeGVCFs -R /nvme/disk0/lecellier_data/WGS_GBR_data/Ref_genomes/Amil_scaffolds_final_v3.fa -V gendb://scaffold_1 -O /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/Vcf_files/aspat_clean_scaffold_1.vcf.gz --include-non-variant-sites --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp
    08:57:20.266 INFO  NativeLibraryLoader - Loading from jar:file:/home/hdenis/Programs/gatk-!/com/intel/gkl/native/
    08:57:20.390 INFO  GenotypeGVCFs - ------------------------------------------------------------
    08:57:20.392 INFO  GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.5.0.0
    08:57:20.393 INFO  GenotypeGVCFs - For support and documentation go to
    08:57:20.393 INFO  GenotypeGVCFs - Executing as hdenis@R740xd on Linux v5.14.0-362.13.1.el9_3.x86_64 amd64
    08:57:20.393 INFO  GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v17.0.2+8-86
    08:57:20.393 INFO  GenotypeGVCFs - Start Date/Time: June 21, 2024 at 8:57:20 AM NCT
    08:57:20.393 INFO  GenotypeGVCFs - ------------------------------------------------------------
    08:57:20.393 INFO  GenotypeGVCFs - ------------------------------------------------------------
    08:57:20.395 INFO  GenotypeGVCFs - HTSJDK Version: 4.1.0
    08:57:20.395 INFO  GenotypeGVCFs - Picard Version: 3.1.1
    08:57:20.395 INFO  GenotypeGVCFs - Built for Spark Version: 3.5.0
    08:57:20.395 INFO  GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    08:57:20.395 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    08:57:20.396 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    08:57:20.397 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    08:57:20.397 INFO  GenotypeGVCFs - Deflater: IntelDeflater
    08:57:20.398 INFO  GenotypeGVCFs - Inflater: IntelInflater
    08:57:20.398 INFO  GenotypeGVCFs - GCS max retries/reopens: 20
    08:57:20.398 INFO  GenotypeGVCFs - Requester pays: disabled
    08:57:20.398 INFO  GenotypeGVCFs - Initializing engine
    08:57:20.696 INFO  GenomicsDBLibLoader - GenomicsDB native library version : 1.5.1-84e800e
    java: /build/GenomicsDB/dependencies/htslib/bgzf.c:449: bgzf_open: Assertion `compressBound(0xff00) < 0x10000' failed.
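
The assertion in the last log line requires that compressBound(0xff00), the worst-case compressed size of a 0xff00-byte input, stays below 0x10000. The arithmetic can be checked by hand. The sketch below assumes the bound formula used by stock zlib (n + n/4096 + n/16384 + n/2^25 + 13); a different zlib implementation may compute a larger bound, which is one way this assertion could fail.

```shell
#!/usr/bin/env bash
# Worst-case compressed size under the (assumed) stock zlib compressBound()
# formula: n + (n >> 12) + (n >> 14) + (n >> 25) + 13.
zlib_compress_bound() {
    local n=$(( $1 ))
    echo $(( n + (n >> 12) + (n >> 14) + (n >> 25) + 13 ))
}

bound=$(zlib_compress_bound 0xff00)
limit=$(( 0x10000 ))
echo "compressBound(0xff00) = $bound, limit = $limit"
if [ "$bound" -lt "$limit" ]; then
    echo "assertion would hold with this formula"
else
    echo "assertion would fail with this formula"
fi
```

With this formula the bound is 65311 against a limit of 65536, so the assertion holds. A failure therefore suggests that the compressBound actually called at run time is not the stock one.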

    I don't see anything in the log file that suggests the GenomicsDB import has failed.

    I also tried to check the VCF files generated by HaplotypeCaller with ValidateVariants. An error is raised, but it seems that this is not an issue:

    gatk ValidateVariants -V /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1718_L1_pe_aln_Amilleporav3.g.vcf.gz

    Using GATK jar /home/hdenis/Programs/gatk-
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/hdenis/Programs/gatk- ValidateVariants -V /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1718_L1_pe_aln_Amilleporav3.g.vcf.gz
    08:45:42.630 INFO  NativeLibraryLoader - Loading from jar:file:/home/hdenis/Programs/gatk-!/com/intel/gkl/native/
    08:45:42.761 INFO  ValidateVariants - ------------------------------------------------------------
    08:45:42.763 INFO  ValidateVariants - The Genome Analysis Toolkit (GATK) v4.5.0.0
    08:45:42.763 INFO  ValidateVariants - For support and documentation go to
    08:45:42.763 INFO  ValidateVariants - Executing as hdenis@R740xd on Linux v5.14.0-362.13.1.el9_3.x86_64 amd64
    08:45:42.763 INFO  ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v17.0.2+8-86
    08:45:42.764 INFO  ValidateVariants - Start Date/Time: June 21, 2024 at 8:45:42 AM NCT
    08:45:42.764 INFO  ValidateVariants - ------------------------------------------------------------
    08:45:42.764 INFO  ValidateVariants - ------------------------------------------------------------
    08:45:42.764 INFO  ValidateVariants - HTSJDK Version: 4.1.0
    08:45:42.765 INFO  ValidateVariants - Picard Version: 3.1.1
    08:45:42.765 INFO  ValidateVariants - Built for Spark Version: 3.5.0
    08:45:42.765 INFO  ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    08:45:42.765 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    08:45:42.766 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    08:45:42.766 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    08:45:42.766 INFO  ValidateVariants - Deflater: IntelDeflater
    08:45:42.766 INFO  ValidateVariants - Inflater: IntelInflater
    08:45:42.766 INFO  ValidateVariants - GCS max retries/reopens: 20
    08:45:42.766 INFO  ValidateVariants - Requester pays: disabled
    08:45:42.766 INFO  ValidateVariants - Initializing engine
    08:45:42.839 INFO  FeatureManager - Using codec VCFCodec to read file file:///nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1718_L1_pe_aln_Amilleporav3.g.vcf.gz
    08:45:42.941 INFO  ValidateVariants - Done initializing engine
    08:45:42.942 WARN  ValidateVariants - IDS validation cannot be done because no DBSNP file was provided
    08:45:42.942 WARN  ValidateVariants - Other possible validations will still be performed
    08:45:42.942 WARN  ValidateVariants - REF validation cannot be done because no reference file was provided
    08:45:42.942 WARN  ValidateVariants - Other possible validations will still be performed
    08:45:42.942 INFO  ProgressMeter - Starting traversal
    08:45:42.943 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
    08:45:42.952 INFO  ValidateVariants - Shutting down engine
    [June 21, 2024 at 8:45:42 AM NCT] done. Elapsed time: 0.01 minutes.

    A USER ERROR has occurred: Input /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1718_L1_pe_aln_Amilleporav3.g.vcf.gz fails strict validation of type ALL: one or more of the ALT allele(s) for the record at position scaffold_1:75 are not observed at all in the sample genotypes

    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
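
The ALT-allele error above is what one would expect when a gVCF is validated as an ordinary VCF: HaplotypeCaller gVCF records carry a symbolic <NON_REF> ALT allele that usually does not appear in the sample genotype. A sketch of rerunning the check in gVCF mode follows. It assumes the -gvcf (--validate-GVCF) and --validation-type-to-exclude options of ValidateVariants; whether excluding ALLELES is needed in addition to -gvcf is an assumption to check against the tool documentation. The function name and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Validate a gVCF in gVCF mode. With DRY_RUN=1 the command is only printed.
validate_gvcf() {
    local gvcf="$1"
    local ref="$2"
    # Assumed ValidateVariants options: -gvcf and --validation-type-to-exclude
    local cmd=(gatk ValidateVariants -V "$gvcf" -R "$ref" -gvcf
               --validation-type-to-exclude ALLELES)
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, `DRY_RUN=1 validate_gvcf sample.g.vcf.gz "$REF_3"` prints the command without running it.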

    Here is the content of my map file, in case you notice something wrong. 

    RRAP-ECT01-2022-Aspat-CBHE-1718_L1_pe_aln_Amilleporav3    /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1718_L1_pe_aln_Amilleporav3.g.vcf.gz
    RRAP-ECT01-2022-Aspat-CBHE-1719_L1_pe_aln_Amilleporav3    /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1719_L1_pe_aln_Amilleporav3.g.vcf.gz
    RRAP-ECT01-2022-Aspat-CBHE-1720_L2_pe_aln_Amilleporav3    /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1720_L2_pe_aln_Amilleporav3.g.vcf.gz
    RRAP-ECT01-2022-Aspat-CBHE-1721_L2_pe_aln_Amilleporav3    /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1721_L2_pe_aln_Amilleporav3.g.vcf.gz
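
A sample map like the one above can be checked mechanically before the import. The sketch below assumes the map is tab-separated (sample name, then path, with an optional third column), that sample names must be unique, and that bash 4 or later is available for associative arrays; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Check a GenomicsDBImport sample map: <sample><TAB><path>[<TAB><index>]
check_sample_map() {
    local map="$1"
    local status=0 line_no=0
    local sample path extra
    local -A seen=()
    # "|| [ -n "$sample" ]" also handles a last line without a newline
    while IFS=$'\t' read -r sample path extra || [ -n "$sample" ]; do
        line_no=$((line_no + 1))
        if [ -z "$sample" ] || [ -z "$path" ]; then
            echo "line $line_no: expected <sample><TAB><path>" >&2
            status=1
            continue
        fi
        if [ -n "${seen[$sample]:-}" ]; then
            echo "line $line_no: duplicate sample name $sample" >&2
            status=1
        fi
        seen[$sample]=1
        # -s is true only if the file exists and is non-empty
        if [ ! -s "$path" ]; then
            echo "line $line_no: missing or empty file $path" >&2
            status=1
        fi
    done < "$map"
    return "$status"
}
```

For example, `check_sample_map "${INDIR}aspat_gvcf_clean.sample_map" && echo "sample map looks consistent"`. A map that uses spaces instead of tabs is reported as malformed.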

    The java version I am using 

    java --version
    openjdk 17.0.2 2022-01-18
    OpenJDK Runtime Environment (build 17.0.2+8-86)
    OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing)

    Is there something else I could try?




  • Hugo DENIS

    Hi again, 

    I have tried running the same code and inputs on a different machine and it worked, which suggests that the issue is indeed related to dependency conflicts.

    I have seen this post that suggests a problem when htslib is used in conjunction with zlib-ng:

    Although zlib-ng is installed on the cluster, it is not loaded. I have also tried to install and load a version of vanilla zlib, but it did not solve the problem.
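
Whether a given zlib is actually loaded by a process can be checked rather than inferred from the module system. On Linux with glibc, setting LD_DEBUG=libs makes the dynamic loader log the libraries it searches for and loads, and /proc/<pid>/maps lists the files mapped into a running process. The sketch below filters either kind of text for zlib-like paths; the function name is hypothetical. Note that zlib-ng built in compatibility mode is typically installed under the plain libz.so name, so the file name alone may not settle the question.

```shell
#!/usr/bin/env bash
# Read text on stdin (LD_DEBUG=libs output or /proc/<pid>/maps) and print
# the unique paths of libz / libz-ng shared objects mentioned in it.
list_libz_paths() {
    grep -oE '(/[^[:space:]=/]+)*/libz(-ng)?\.so[^[:space:]]*' | sort -u
}
```

A possible use is `LD_DEBUG=libs gatk GenotypeGVCFs ... 2>&1 | list_libz_paths`, with the arguments elided here being the ones from the failing command.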

    Is there a specific way to install GATK that would solve this conflict?

    I have downloaded the latest gatk version zip here:

    and installed samtools separately, following the GATK installation recommendations:


    samtools --version
    samtools 1.20
    Using htslib 1.20
    Copyright (C) 2024 Genome Research Ltd.

    Samtools compilation details:
        Features:       build=configure curses=yes
        CC:             gcc
        CFLAGS:         -Wall -g -O2
        HTSDIR:         htslib-1.20
        CURSES_LIB:     -lncursesw

    HTSlib compilation details:
        Features:       build=configure libcurl=yes S3=yes GCS=yes libdeflate=no lzma=yes bzip2=yes plugins=no htscodecs=1.6.0
        CC:             gcc
        CFLAGS:         -Wall -g -O2 -fvisibility=hidden
        LDFLAGS:        -fvisibility=hidden

    HTSlib URL scheme handlers present:
        built-in:    preload, data, file
        S3 Multipart Upload:         s3w, s3w+https, s3w+http
        Amazon S3:   s3+https, s3+http, s3
        Google Cloud Storage:        gs+http, gs+https, gs
        libcurl:     imaps, pop3, gophers, http, smb, gopher, ftps, imap, smtp, smtps, rtsp, ftp, telnet, mqtt, https, smbs, tftp, pop3s, dict
        crypt4gh-needed:     crypt4gh
        mem:         mem

    Any help would be greatly appreciated, 

    Thank you very much!

  • Gökalp Çelik

    I am not sure that GATK depends on any system-installed libraries. It uses the deflater and inflater from Intel GKL, so whether zlib-ng or vanilla zlib is installed on the system should have nothing to do with this issue.

    I will consult with the developers about this issue. In the meantime, you could try our Docker image as an alternative way of installing GATK, or run the same commands with the master branch compiled from our GitHub source. Note that the second suggestion is only meant to show whether the issue persists in our latest code. We do not recommend running the master branch directly for production purposes unless we say it is OK to do so.

    I hope this helps. 

  • Hugo DENIS


    Thank you for your responsiveness. 

    Docker is not installed on the cluster I am working with; I have asked the administrators about it. In the meantime, I tried the latest GitHub master branch, which reproduced the same error (see below).

    Thank you for your help

    [hdenis@R740xd GenomicDB]$ /home/hdenis/gatk/gatk --java-options "-Xmx4g" GenotypeGVCFs -R /nvme/disk0/lecellier_data/WGS_GBR_data/Ref_genomes/Amil_scaffolds_final_v3.fa -V "gendb://scaffold_1" -O "/nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/Vcf_files/aspat_clean_scaffold_1.vcf.gz" --include-non-variant-sites --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp
    Using GATK jar /home/hdenis/gatk/build/libs/gatk-package-
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -jar /home/hdenis/gatk/build/libs/gatk-package- GenotypeGVCFs -R /nvme/disk0/lecellier_data/WGS_GBR_data/Ref_genomes/Amil_scaffolds_final_v3.fa -V gendb://scaffold_1 -O /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/Vcf_files/aspat_clean_scaffold_1.vcf.gz --include-non-variant-sites --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp
    08:35:44.538 INFO  NativeLibraryLoader - Loading from jar:file:/home/hdenis/gatk/build/libs/gatk-package-!/com/intel/gkl/native/
    08:35:44.675 INFO  GenotypeGVCFs - ------------------------------------------------------------
    08:35:44.678 INFO  GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.5.0.0-40-g948cd4f-SNAPSHOT
    08:35:44.678 INFO  GenotypeGVCFs - For support and documentation go to
    08:35:44.679 INFO  GenotypeGVCFs - Executing as hdenis@R740xd on Linux v5.14.0-362.13.1.el9_3.x86_64 amd64
    08:35:44.679 INFO  GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v17.0.2+8-86
    08:35:44.679 INFO  GenotypeGVCFs - Start Date/Time: June 25, 2024 at 8:35:44 AM NCT
    08:35:44.679 INFO  GenotypeGVCFs - ------------------------------------------------------------
    08:35:44.679 INFO  GenotypeGVCFs - ------------------------------------------------------------
    08:35:44.680 INFO  GenotypeGVCFs - HTSJDK Version: 4.1.0
    08:35:44.680 INFO  GenotypeGVCFs - Picard Version: 3.1.1
    08:35:44.680 INFO  GenotypeGVCFs - Built for Spark Version: 3.5.0
    08:35:44.680 INFO  GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    08:35:44.680 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    08:35:44.680 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    08:35:44.680 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    08:35:44.681 INFO  GenotypeGVCFs - Deflater: IntelDeflater
    08:35:44.681 INFO  GenotypeGVCFs - Inflater: IntelInflater
    08:35:44.681 INFO  GenotypeGVCFs - GCS max retries/reopens: 20
    08:35:44.681 INFO  GenotypeGVCFs - Requester pays: disabled
    08:35:44.681 INFO  GenotypeGVCFs - Initializing engine
    08:35:45.024 INFO  GenomicsDBLibLoader - GenomicsDB native library version : 1.5.3-b586a26
    java: /build/GenomicsDB/dependencies/htslib/bgzf.c:449: bgzf_open: Assertion `compressBound(0xff00) < 0x10000' failed.
  • Hugo DENIS

    Dear Gökalp, 

    I managed to install Docker and the GATK Docker image, and it works perfectly now.
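
For readers in the same situation, a minimal sketch of running a GATK tool through Docker follows. The thread does not say which image or tag was used; broadinstitute/gatk:4.5.0.0 is an assumption here, as are the function name, the GATK_IMAGE override and the DRY_RUN switch. The data directory is mounted at the same path inside the container so that absolute paths in the original commands stay valid.

```shell
#!/usr/bin/env bash
# Run "gatk <args...>" inside a container, with data_dir mounted at the
# same path and used as the working directory.
# With DRY_RUN=1 the docker command is only printed.
gatk_in_docker() {
    local data_dir="$1"
    shift
    # Assumed image name and tag; override with GATK_IMAGE if needed
    local image="${GATK_IMAGE:-broadinstitute/gatk:4.5.0.0}"
    local cmd=(docker run --rm -v "$data_dir:$data_dir" -w "$data_dir"
               "$image" gatk "$@")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, `DRY_RUN=1 gatk_in_docker /nvme/disk0/lecellier_data/WGS_GBR_data GenotypeGVCFs -V gendb://scaffold_1` prints the docker command so it can be inspected before anything is run.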

    Thank you for your help, 



  • Gökalp Çelik

    Hi Hugo DENIS

    We are happy to hear that Docker worked well for you. I am in contact with the main GenomicsDB developer and waiting for their response, but our team also suggested that zlib-ng could be the actual culprit here, given that htslib is known to have issues with this library. Normally, systems come with zlib1g installed by default, and GATK works without any issues on a default system installation.

    I will update here once I get the definitive answer from the GenomicsDB developers. 

