GenotypeGVCFs error bgzf_open: Assertion `compressBound(0xff00) < 0x10000' failed.
Dear All:
I ran a first test using GenomicsDBImport (GATK 4.5.0.0) to combine 2 GVCF files, which worked fine. But when I try to perform joint genotyping, an error is raised and the java process aborts.
The command line for GenomicsDBImport :
gatk --java-options "-Xmx4g -Xms4g" GenomicsDBImport --genomicsdb-workspace-path "${INDIR}GenomicDB/${CONTIG}" --batch-size 50 -L $CONTIG --sample-name-map "${INDIR}aspat_gvcf_clean.sample_map" --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp --reader-threads 2
Here is an example of GenomicsDBImport log file for one of the chromosomes :
Using GATK jar /home/hdenis/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -Xms4g -jar /home/hdenis/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar GenomicsDBImport --genomicsdb-workspace-path /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/GenomicDB/scaffold_1 --batch-size 50 -L scaffold_1 --sample-name-map /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/aspat_gvcf_clean.sample_map --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp --reader-threads 2
10:57:40.472 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/hdenis/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
10:57:40.609 INFO GenomicsDBImport - ------------------------------------------------------------
10:57:40.611 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.5.0.0
10:57:40.611 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
10:57:40.611 INFO GenomicsDBImport - Executing as hdenis@R740xd on Linux v5.14.0-362.13.1.el9_3.x86_64 amd64
10:57:40.612 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v17.0.2+8-86
10:57:40.612 INFO GenomicsDBImport - Start Date/Time: June 20, 2024 at 10:57:40 AM NCT
10:57:40.612 INFO GenomicsDBImport - ------------------------------------------------------------
10:57:40.612 INFO GenomicsDBImport - ------------------------------------------------------------
10:57:40.614 INFO GenomicsDBImport - HTSJDK Version: 4.1.0
10:57:40.614 INFO GenomicsDBImport - Picard Version: 3.1.1
10:57:40.614 INFO GenomicsDBImport - Built for Spark Version: 3.5.0
10:57:40.614 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:57:40.614 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:57:40.615 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:57:40.615 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:57:40.616 INFO GenomicsDBImport - Deflater: IntelDeflater
10:57:40.616 INFO GenomicsDBImport - Inflater: IntelInflater
10:57:40.616 INFO GenomicsDBImport - GCS max retries/reopens: 20
10:57:40.616 INFO GenomicsDBImport - Requester pays: disabled
10:57:40.616 INFO GenomicsDBImport - Initializing engine
10:57:40.790 INFO IntervalArgumentCollection - Processing 38134904 bp from intervals
10:57:40.792 INFO GenomicsDBImport - Done initializing engine
10:57:40.939 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.5.1-84e800e
10:57:40.940 INFO GenomicsDBImport - Vid Map JSON file will be written to /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/GenomicDB/scaffold_1/vidmap.json
10:57:40.940 INFO GenomicsDBImport - Callset Map JSON file will be written to /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/GenomicDB/scaffold_1/callset.json
10:57:40.942 INFO GenomicsDBImport - Complete VCF Header will be written to /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/GenomicDB/scaffold_1/vcfheader.vcf
10:57:40.942 INFO GenomicsDBImport - Importing to workspace - /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/GenomicDB/scaffold_1
10:57:41.143 INFO GenomicsDBImport - Starting batch input file preload
10:57:41.195 INFO GenomicsDBImport - Finished batch preload
10:57:41.197 INFO GenomicsDBImport - Importing batch 1 with 2 samples
10:59:32.529 INFO GenomicsDBImport - Done importing batch 1/1
10:59:32.530 INFO GenomicsDBImport - Import of all batches to GenomicsDB completed!
10:59:32.530 INFO GenomicsDBImport - Shutting down engine
[June 20, 2024 at 10:59:32 AM NCT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 1.87 minutes.
Runtime.totalMemory()=4294967296
The command line for GenotypeGVCFs :
cd ${INDIR}GenomicDB/
gatk --java-options "-Xmx4g" GenotypeGVCFs -R $REF_3 -V "gendb://${CONTIG}" -O "${OUTDIR}aspat_clean_${CONTIG}.vcf.gz" --include-non-variant-sites --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp
And the error I am getting :
Using GATK jar /home/hdenis/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -jar /home/hdenis/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar GenotypeGVCFs -R /nvme/disk0/lecellier_data/WGS_GBR_data/Ref_genomes/Amil_scaffolds_final_v3.fa -V gendb://scaffold_1 -O /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/Vcf_files/aspat_clean_scaffold_1.vcf.gz --include-non-variant-sites --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp
11:53:41.464 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/hdenis/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
11:53:41.581 INFO GenotypeGVCFs - ------------------------------------------------------------
11:53:41.584 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.5.0.0
11:53:41.584 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
11:53:41.584 INFO GenotypeGVCFs - Executing as hdenis@R740xd on Linux v5.14.0-362.13.1.el9_3.x86_64 amd64
11:53:41.584 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v17.0.2+8-86
11:53:41.584 INFO GenotypeGVCFs - Start Date/Time: June 20, 2024 at 11:53:41 AM NCT
11:53:41.584 INFO GenotypeGVCFs - ------------------------------------------------------------
11:53:41.584 INFO GenotypeGVCFs - ------------------------------------------------------------
11:53:41.585 INFO GenotypeGVCFs - HTSJDK Version: 4.1.0
11:53:41.585 INFO GenotypeGVCFs - Picard Version: 3.1.1
11:53:41.585 INFO GenotypeGVCFs - Built for Spark Version: 3.5.0
11:53:41.585 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:53:41.585 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:53:41.585 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:53:41.585 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:53:41.586 INFO GenotypeGVCFs - Deflater: IntelDeflater
11:53:41.586 INFO GenotypeGVCFs - Inflater: IntelInflater
11:53:41.586 INFO GenotypeGVCFs - GCS max retries/reopens: 20
11:53:41.586 INFO GenotypeGVCFs - Requester pays: disabled
11:53:41.586 INFO GenotypeGVCFs - Initializing engine
11:53:41.877 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.5.1-84e800e
java: /build/GenomicsDB/dependencies/htslib/bgzf.c:449: bgzf_open: Assertion `compressBound(0xff00) < 0x10000' failed.
The problem seems to come from dependencies of the htslib package but I haven't been able to find a solution to the issue on the forum or elsewhere online.
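For context on what the assertion compares: a BGZF block is limited to 64 KiB (0x10000 bytes), and `compressBound(n)` is zlib's worst-case compressed size for an `n`-byte input. The following is a minimal sketch of that arithmetic, assuming the bound formula used by stock zlib. A different zlib implementation loaded at runtime may return a larger bound, which is one way this assertion could fail.

```shell
#!/usr/bin/env bash
# Worst-case compressed size as computed by stock zlib's compressBound().
# Assumption: this mirrors the stock zlib formula; another zlib build
# picked up at runtime may compute a different (larger) bound.
stock_compress_bound() {
    local n=$(( $1 ))
    echo $(( n + (n >> 12) + (n >> 14) + (n >> 25) + 13 ))
}

bound=$(stock_compress_bound 0xff00)
limit=$(( 0x10000 ))
echo "compressBound(0xff00) = ${bound}, BGZF block limit = ${limit}"
if (( bound < limit )); then
    echo "assertion would hold"
else
    echo "assertion would fail"
fi
```

With the stock formula the bound is 65311, below the 65536 limit, so the assertion is not expected to fire against stock zlib.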
Thank you very much for your help
-
Hi Hugo DENIS
This clearly looks like a corrupt GenomicsDBImport instance. You may need to perform the import operation again, possibly using a different destination drive/location for this one. Unfortunately we do not have a tool to check the integrity of a GenomicsDB folder. You may want to try the parameter below to see if it helps you get an import that works.
--genomicsdb-shared-posixfs-optimizations true
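In the absence of an integrity checker, a rough sanity check is still possible: the import log names three metadata files written into the workspace (`vidmap.json`, `callset.json`, `vcfheader.vcf`). The sketch below only confirms that these exist and are non-empty. The function name `check_workspace` is hypothetical, and passing this check says nothing about whether the imported array data itself is intact.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that the metadata files named in the
# GenomicsDBImport log exist and are non-empty in a workspace directory.
# This is NOT an integrity check of the imported data itself.
check_workspace() {
    local ws=$1 status=0 f
    for f in vidmap.json callset.json vcfheader.vcf; do
        if [[ -s "${ws}/${f}" ]]; then
            echo "ok       ${f}"
        else
            echo "missing  ${f}"
            status=1
        fi
    done
    return "${status}"
}

# Example (workspace path taken from the import log in the first post):
# check_workspace /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/GenomicDB/scaffold_1
```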
I hope this helps.
Regards.
-
Hi, thank you for your answer.
I tried your suggestions, changing the destination drive and adding the option, as well as increasing the number of samples, but unfortunately it does not seem to solve the issue. Here are the commands and log files.
gatk GenomicsDBImport --genomicsdb-workspace-path "/home/hdenis/Gatk/${CONTIG}" -L $CONTIG --sample-name-map "${INDIR}aspat_gvcf_clean.sample_map" --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp --reader-threads 2 --genomicsdb-shared-posixfs-optimizations true --batch-size 50
gatk GenotypeGVCFs -R $REF_3 -V "gendb://${CONTIG}" -O "${OUTDIR}aspat_clean_${CONTIG}.vcf.gz" --include-non-variant-sites --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp
Using GATK jar /home/hdenis/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/hdenis/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar GenomicsDBImport --genomicsdb-workspace-path /home/hdenis/Gatk/scaffold_1 -L scaffold_1 --sample-name-map /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/aspat_gvcf_clean.sample_map --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp --reader-threads 2 --genomicsdb-shared-posixfs-optimizations true --batch-size 50
08:53:48.050 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/hdenis/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
08:53:48.168 INFO GenomicsDBImport - ------------------------------------------------------------
08:53:48.171 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.5.0.0
08:53:48.171 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
08:53:48.171 INFO GenomicsDBImport - Executing as hdenis@R740xd on Linux v5.14.0-362.13.1.el9_3.x86_64 amd64
08:53:48.171 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v17.0.2+8-86
08:53:48.172 INFO GenomicsDBImport - Start Date/Time: June 21, 2024 at 8:53:48 AM NCT
08:53:48.172 INFO GenomicsDBImport - ------------------------------------------------------------
08:53:48.172 INFO GenomicsDBImport - ------------------------------------------------------------
08:53:48.173 INFO GenomicsDBImport - HTSJDK Version: 4.1.0
08:53:48.173 INFO GenomicsDBImport - Picard Version: 3.1.1
08:53:48.174 INFO GenomicsDBImport - Built for Spark Version: 3.5.0
08:53:48.174 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:53:48.174 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:53:48.174 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:53:48.175 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:53:48.175 INFO GenomicsDBImport - Deflater: IntelDeflater
08:53:48.175 INFO GenomicsDBImport - Inflater: IntelInflater
08:53:48.175 INFO GenomicsDBImport - GCS max retries/reopens: 20
08:53:48.175 INFO GenomicsDBImport - Requester pays: disabled
08:53:48.175 INFO GenomicsDBImport - Initializing engine
08:53:48.363 INFO IntervalArgumentCollection - Processing 38134904 bp from intervals
08:53:48.364 INFO GenomicsDBImport - Done initializing engine
08:53:48.509 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.5.1-84e800e
08:53:48.510 INFO GenomicsDBImport - Vid Map JSON file will be written to /home/hdenis/Gatk/scaffold_1/vidmap.json
08:53:48.510 INFO GenomicsDBImport - Callset Map JSON file will be written to /home/hdenis/Gatk/scaffold_1/callset.json
08:53:48.511 INFO GenomicsDBImport - Complete VCF Header will be written to /home/hdenis/Gatk/scaffold_1/vcfheader.vcf
08:53:48.512 INFO GenomicsDBImport - Importing to workspace - /home/hdenis/Gatk/scaffold_1
08:53:48.719 INFO GenomicsDBImport - Starting batch input file preload
08:53:48.841 INFO GenomicsDBImport - Finished batch preload
08:53:48.843 INFO GenomicsDBImport - Importing batch 1 with 4 samples
08:57:18.522 INFO GenomicsDBImport - Done importing batch 1/1
08:57:18.526 INFO GenomicsDBImport - Import of all batches to GenomicsDB completed!
08:57:18.526 INFO GenomicsDBImport - Shutting down engine
[June 21, 2024 at 8:57:18 AM NCT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 3.51 minutes.
Runtime.totalMemory()=1224736768
Using GATK jar /home/hdenis/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/hdenis/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar GenotypeGVCFs -R /nvme/disk0/lecellier_data/WGS_GBR_data/Ref_genomes/Amil_scaffolds_final_v3.fa -V gendb://scaffold_1 -O /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/Vcf_files/aspat_clean_scaffold_1.vcf.gz --include-non-variant-sites --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp
08:57:20.266 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/hdenis/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
08:57:20.390 INFO GenotypeGVCFs - ------------------------------------------------------------
08:57:20.392 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.5.0.0
08:57:20.393 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
08:57:20.393 INFO GenotypeGVCFs - Executing as hdenis@R740xd on Linux v5.14.0-362.13.1.el9_3.x86_64 amd64
08:57:20.393 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v17.0.2+8-86
08:57:20.393 INFO GenotypeGVCFs - Start Date/Time: June 21, 2024 at 8:57:20 AM NCT
08:57:20.393 INFO GenotypeGVCFs - ------------------------------------------------------------
08:57:20.393 INFO GenotypeGVCFs - ------------------------------------------------------------
08:57:20.395 INFO GenotypeGVCFs - HTSJDK Version: 4.1.0
08:57:20.395 INFO GenotypeGVCFs - Picard Version: 3.1.1
08:57:20.395 INFO GenotypeGVCFs - Built for Spark Version: 3.5.0
08:57:20.395 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:57:20.395 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:57:20.396 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:57:20.397 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:57:20.397 INFO GenotypeGVCFs - Deflater: IntelDeflater
08:57:20.398 INFO GenotypeGVCFs - Inflater: IntelInflater
08:57:20.398 INFO GenotypeGVCFs - GCS max retries/reopens: 20
08:57:20.398 INFO GenotypeGVCFs - Requester pays: disabled
08:57:20.398 INFO GenotypeGVCFs - Initializing engine
08:57:20.696 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.5.1-84e800e
java: /build/GenomicsDB/dependencies/htslib/bgzf.c:449: bgzf_open: Assertion `compressBound(0xff00) < 0x10000' failed.
I don't see anything in the log file that suggests the GenomicsDB import has failed.
I also tried to check the GVCF files generated by HaplotypeCaller. An error is raised, but it seems that this is not an issue (see https://gatk.broadinstitute.org/hc/en-us/community/posts/360067695771-GenotypeGvcfs-has-formatting-issues-in-both-v4-1-6-0-as-v4-1-7-0):
gatk ValidateVariants -V /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1718_L1_pe_aln_Amilleporav3.g.vcf.gz
Using GATK jar /home/hdenis/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/hdenis/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar ValidateVariants -V /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1718_L1_pe_aln_Amilleporav3.g.vcf.gz
08:45:42.630 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/hdenis/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
08:45:42.761 INFO ValidateVariants - ------------------------------------------------------------
08:45:42.763 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.5.0.0
08:45:42.763 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
08:45:42.763 INFO ValidateVariants - Executing as hdenis@R740xd on Linux v5.14.0-362.13.1.el9_3.x86_64 amd64
08:45:42.763 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v17.0.2+8-86
08:45:42.764 INFO ValidateVariants - Start Date/Time: June 21, 2024 at 8:45:42 AM NCT
08:45:42.764 INFO ValidateVariants - ------------------------------------------------------------
08:45:42.764 INFO ValidateVariants - ------------------------------------------------------------
08:45:42.764 INFO ValidateVariants - HTSJDK Version: 4.1.0
08:45:42.765 INFO ValidateVariants - Picard Version: 3.1.1
08:45:42.765 INFO ValidateVariants - Built for Spark Version: 3.5.0
08:45:42.765 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:45:42.765 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:45:42.766 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:45:42.766 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:45:42.766 INFO ValidateVariants - Deflater: IntelDeflater
08:45:42.766 INFO ValidateVariants - Inflater: IntelInflater
08:45:42.766 INFO ValidateVariants - GCS max retries/reopens: 20
08:45:42.766 INFO ValidateVariants - Requester pays: disabled
08:45:42.766 INFO ValidateVariants - Initializing engine
08:45:42.839 INFO FeatureManager - Using codec VCFCodec to read file file:///nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1718_L1_pe_aln_Amilleporav3.g.vcf.gz
08:45:42.941 INFO ValidateVariants - Done initializing engine
08:45:42.942 WARN ValidateVariants - IDS validation cannot be done because no DBSNP file was provided
08:45:42.942 WARN ValidateVariants - Other possible validations will still be performed
08:45:42.942 WARN ValidateVariants - REF validation cannot be done because no reference file was provided
08:45:42.942 WARN ValidateVariants - Other possible validations will still be performed
08:45:42.942 INFO ProgressMeter - Starting traversal
08:45:42.943 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
08:45:42.952 INFO ValidateVariants - Shutting down engine
[June 21, 2024 at 8:45:42 AM NCT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=285212672
***********************************************************************
A USER ERROR has occurred: Input /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1718_L1_pe_aln_Amilleporav3.g.vcf.gz fails strict validation of type ALL: one or more of the ALT allele(s) for the record at position scaffold_1:75 are not observed at all in the sample genotypes
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Here is the content of my map file, in case you notice something wrong.
RRAP-ECT01-2022-Aspat-CBHE-1718_L1_pe_aln_Amilleporav3 /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1718_L1_pe_aln_Amilleporav3.g.vcf.gz
RRAP-ECT01-2022-Aspat-CBHE-1719_L1_pe_aln_Amilleporav3 /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1719_L1_pe_aln_Amilleporav3.g.vcf.gz
RRAP-ECT01-2022-Aspat-CBHE-1720_L2_pe_aln_Amilleporav3 /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1720_L2_pe_aln_Amilleporav3.g.vcf.gz
RRAP-ECT01-2022-Aspat-CBHE-1721_L2_pe_aln_Amilleporav3 /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/RRAP-ECT01-2022-Aspat-CBHE-1721_L2_pe_aln_Amilleporav3.g.vcf.gz
The java version I am using:
java --version
openjdk 17.0.2 2022-01-18
OpenJDK Runtime Environment (build 17.0.2+8-86)
OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing)
Is there something else I could try?
Cheers,
Hugo
-
Hi again,
I have tried running the same code and inputs on a different machine and it worked, which suggests that the issue is indeed related to a dependency conflict.
I have seen this post that suggests a problem when htslib is used in conjunction with zlib-ng:
Although zlib-ng is installed on the cluster, it is not loaded. I have also tried to install and load a version of vanilla zlib, but that did not solve the problem.
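One way to test whether a non-stock zlib ends up in the process even when no module is loaded is to ask the glibc dynamic loader to log every shared library it resolves (`LD_DEBUG=libs`) and keep only the zlib lines. This is a sketch under assumptions: the helper name `zlib_lines` is hypothetical, and the trace is only informative if GenomicsDB's native library resolves zlib dynamically rather than bundling its own copy, which this thread does not establish.

```shell
#!/usr/bin/env bash
# Hypothetical filter: keep only loader-trace lines that mention a zlib
# shared object (libz.so*, libz-ng.so*). "|| true" keeps the exit status
# clean when nothing matches.
zlib_lines() {
    grep -E 'libz(-ng)?\.so' || true
}

# Environment variables that can change which libraries get picked up.
echo "LD_PRELOAD=${LD_PRELOAD:-<unset>}"
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"

# Usage with the failing GenotypeGVCFs command (LD_DEBUG writes to stderr):
# LD_DEBUG=libs gatk GenotypeGVCFs -R "$REF_3" -V "gendb://${CONTIG}" \
#     -O "${OUTDIR}aspat_clean_${CONTIG}.vcf.gz" 2>&1 | zlib_lines
```

Comparing the paths printed on the failing machine with those on the machine where the run succeeds would show whether the two resolve different zlib builds.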
Is there a specific way to install gatk that would solve this conflict ?
I have downloaded the latest gatk version zip here:
https://github.com/broadinstitute/gatk/releases
and installed samtools separately, following the GATK installation recommendations:
samtools --version
samtools 1.20
Using htslib 1.20
Copyright (C) 2024 Genome Research Ltd.
Samtools compilation details:
Features: build=configure curses=yes
CC: gcc
CPPFLAGS:
CFLAGS: -Wall -g -O2
LDFLAGS:
HTSDIR: htslib-1.20
LIBS:
CURSES_LIB: -lncursesw
HTSlib compilation details:
Features: build=configure libcurl=yes S3=yes GCS=yes libdeflate=no lzma=yes bzip2=yes plugins=no htscodecs=1.6.0
CC: gcc
CPPFLAGS:
CFLAGS: -Wall -g -O2 -fvisibility=hidden
LDFLAGS: -fvisibility=hidden
HTSlib URL scheme handlers present:
built-in: preload, data, file
S3 Multipart Upload: s3w, s3w+https, s3w+http
Amazon S3: s3+https, s3+http, s3
Google Cloud Storage: gs+http, gs+https, gs
libcurl: imaps, pop3, gophers, http, smb, gopher, ftps, imap, smtp, smtps, rtsp, ftp, telnet, mqtt, https, smbs, tftp, pop3s, dict
crypt4gh-needed: crypt4gh
mem: mem
Any help would be greatly appreciated,
Thank you very much !
-
I am not sure that GATK depends on any system-installed libraries. It uses the deflater and inflater from Intel GKL (IntelDeflater and IntelInflater in your logs), so whether zlib-ng or vanilla zlib is installed on the system should have nothing to do with this issue.
I will consult with the devs about this issue. In the meantime, you may try using our docker image as an alternative way of installing GATK, or you may want to try running the same commands using the master branch compiled from our GitHub source. Beware that the latter is only for checking whether the issue persists in our latest code. We do not recommend running our master branch directly for production purposes unless we tell you that it is OK to do so.
I hope this helps.
-
Hi,
Thank you for your responsiveness.
Docker is not installed on the cluster I am working with; I have asked the administrators about it. In the meantime I tried the latest GitHub master branch, which reproduced the same error (see below).
Thank you for your help
[hdenis@R740xd GenomicDB]$ /home/hdenis/gatk/gatk --java-options "-Xmx4g" GenotypeGVCFs -R /nvme/disk0/lecellier_data/WGS_GBR_data/Ref_genomes/Amil_scaffolds_final_v3.fa -V "gendb://scaffold_1" -O "/nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/Vcf_files/aspat_clean_scaffold_1.vcf.gz" --include-non-variant-sites --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp
Using GATK jar /home/hdenis/gatk/build/libs/gatk-package-4.5.0.0-40-g948cd4f-SNAPSHOT-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -jar /home/hdenis/gatk/build/libs/gatk-package-4.5.0.0-40-g948cd4f-SNAPSHOT-local.jar GenotypeGVCFs -R /nvme/disk0/lecellier_data/WGS_GBR_data/Ref_genomes/Amil_scaffolds_final_v3.fa -V gendb://scaffold_1 -O /nvme/disk0/lecellier_data/WGS_GBR_data/GATK_files/Vcf_files/aspat_clean_scaffold_1.vcf.gz --include-non-variant-sites --tmp-dir /nvme/disk0/lecellier_data/WGS_GBR_data/tmp
08:35:44.538 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/hdenis/gatk/build/libs/gatk-package-4.5.0.0-40-g948cd4f-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
08:35:44.675 INFO GenotypeGVCFs - ------------------------------------------------------------
08:35:44.678 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.5.0.0-40-g948cd4f-SNAPSHOT
08:35:44.678 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
08:35:44.679 INFO GenotypeGVCFs - Executing as hdenis@R740xd on Linux v5.14.0-362.13.1.el9_3.x86_64 amd64
08:35:44.679 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v17.0.2+8-86
08:35:44.679 INFO GenotypeGVCFs - Start Date/Time: June 25, 2024 at 8:35:44 AM NCT
08:35:44.679 INFO GenotypeGVCFs - ------------------------------------------------------------
08:35:44.679 INFO GenotypeGVCFs - ------------------------------------------------------------
08:35:44.680 INFO GenotypeGVCFs - HTSJDK Version: 4.1.0
08:35:44.680 INFO GenotypeGVCFs - Picard Version: 3.1.1
08:35:44.680 INFO GenotypeGVCFs - Built for Spark Version: 3.5.0
08:35:44.680 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:35:44.680 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:35:44.680 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:35:44.680 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:35:44.681 INFO GenotypeGVCFs - Deflater: IntelDeflater
08:35:44.681 INFO GenotypeGVCFs - Inflater: IntelInflater
08:35:44.681 INFO GenotypeGVCFs - GCS max retries/reopens: 20
08:35:44.681 INFO GenotypeGVCFs - Requester pays: disabled
08:35:44.681 INFO GenotypeGVCFs - Initializing engine
08:35:45.024 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.5.3-b586a26
java: /build/GenomicsDB/dependencies/htslib/bgzf.c:449: bgzf_open: Assertion `compressBound(0xff00) < 0x10000' failed.
-
Dear Gökalp,
I managed to install Docker and the GATK Docker image, and it works perfectly now.
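For readers hitting the same assertion, a Docker invocation along the following lines avoids the host's native libraries. This is a sketch, not the exact command used in this thread: the image tag `broadinstitute/gatk:4.5.0.0` and the choice to bind-mount the data directory at the same path inside the container are assumptions. The function only prints the command so that it can be inspected before running.

```shell
#!/usr/bin/env bash
# Assumptions: image tag and mount layout are illustrative, not taken
# from the thread. DATA is the host data directory seen in the logs.
DATA=/nvme/disk0/lecellier_data/WGS_GBR_data
IMAGE=broadinstitute/gatk:4.5.0.0
CONTIG=scaffold_1

# Print (do not run) a docker command for joint genotyping of one contig.
print_docker_cmd() {
    local cmd=(
        docker run --rm
        -v "${DATA}:${DATA}"                 # same path inside the container
        -w "${DATA}/GATK_files/GenomicDB"    # gendb:// path is relative to this
        "${IMAGE}"
        gatk --java-options -Xmx4g GenotypeGVCFs
        -R "${DATA}/Ref_genomes/Amil_scaffolds_final_v3.fa"
        -V "gendb://${CONTIG}"
        -O "${DATA}/GATK_files/Vcf_files/aspat_clean_${CONTIG}.vcf.gz"
        --include-non-variant-sites
        --tmp-dir "${DATA}/tmp"
    )
    printf '%s ' "${cmd[@]}"
    echo
}

print_docker_cmd
```

Mounting the data directory at the same path inside the container keeps every path from the earlier commands valid without rewriting them.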
Thank you for your help,
Best
-
Hi Hugo DENIS
We are happy to hear that docker worked well for you. I am in contact with the main GenomicsDB developer and waiting for their response, but our team also suggested that zlib-ng could be the actual culprit here, given that htslib is known to have issues with this library. Normally systems come with zlib1g installed by default, and GATK works without any issues on any default system installation.
I will update here once I get the definitive answer from the GenomicsDB developers.
Regards.