GATK GenomicsDBImport Error with TileDB
REQUIRED for all errors and issues:
a) GATK version used: gatk 4.2.4.1
b) Exact command used:
I ran two commands.
The first is below (with about 800 samples):
gatk GenomicsDBImport --tmp-dir ./TMP -L /interval_directory/Homo_sapiens_assembly38.chr(1...22, X,Y,MT).intervals -R Homo_sapiens_assembly38.fa -V sample1.g.vcf.gz .. -V sample800.g.vcf.gz --batch-size 50 --consolidate true --genomicsdb-workspace-path sample_data/ --max-num-intervals-to-import-in-parallel 3 --genomicsdb-shared-posixfs-optimizations
09:05:18.986 INFO GenomicsDBImport - Importing batch 80 with 10 samples
10:21:17.311 INFO GenomicsDBImport - Done importing batch 80/82
10:21:21.109 INFO GenomicsDBImport - Importing batch 81 with 10 samples
11:45:03.739 INFO GenomicsDBImport - Done importing batch 81/82
11:45:06.764 INFO GenomicsDBImport - Importing batch 82 with 8 samples
[TileDB::ReadState] Error: Cannot read tile from file; File opening error.
06:19:38.517 erro NativeGenomicsDB - pid=46344 tid=48705 VariantStorageManagerException exception : Error while consolidating TileDB array chr1$1$248956422
TileDB error message :
terminate called after throwing an instance of 'VariantStorageManagerException'
what(): VariantStorageManagerException exception : Error while consolidating TileDB array chr1$1$248956422
TileDB error message :
The second command is:
gatk --java-options "-Xmx30G -Xms30G" GenomicsDBImport --tmp-dir ./TMP -L /interval_directory/Homo_sapiens_assembly38.chr(1...22, X,Y,MT).intervals -R Homo_sapiens_assembly38.fa -V sample1.g.vcf.gz .. -V sample200.g.vcf.gz --batch-size 50 --consolidate true --genomicsdb-workspace-path sample_data/ --max-num-intervals-to-import-in-parallel 3 --genomicsdb-shared-posixfs-optimizations
(The 800 samples were split into 4 groups of 200.)
Unfortunately, this run emits the same TileDB error:
09:21:22.875 INFO GenomicsDBImport - Done importing batch 2/4
09:21:43.369 INFO GenomicsDBImport - Importing batch 3 with 50 samples
00:25:11.291 INFO GenomicsDBImport - Done importing batch 3/4
00:25:30.785 INFO GenomicsDBImport - Importing batch 4 with 50 samples
[TileDB::ReadState] Error: Cannot read tile from file; File opening error.
02:59:39.173 erro NativeGenomicsDB - pid=22578 tid=23756 VariantStorageManagerException exception : Error while consolidating TileDB array chr1$1$248956422
TileDB error message :
terminate called after throwing an instance of 'VariantStorageManagerException'
what(): VariantStorageManagerException exception : Error while consolidating TileDB array chr1$1$248956422
TileDB error message : (Empty)
How do I solve this problem?
I found suggested fixes such as upgrading to version 4.2.0 (but I am already on GATK 4.2.4.1), adding the --genomicsdb-shared-posixfs-optimizations option, and reducing --batch-size (50 -> 20 -> 10), but none of them worked for me.
c) Entire program log:
gatk GenomicsDBImport --tmp-dir ./TMP -L /interval_directory/Homo_sapiens_assembly38.chr(1...22, X,Y,MT).intervals -R Homo_sapiens_assembly38.fa -V sample1.g.vcf.gz .. -V sample800.g.vcf.gz --batch-size 50 --consolidate true --genomicsdb-workspace-path sample_data/ --max-num-intervals-to-import-in-parallel 3 --genomicsdb-shared-posixfs-optimizations
09:45:50.551 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/apps/gatk/4.2.4.1/gatk-package-4.2.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 10, 2023 9:45:56 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
09:45:56.714 INFO GenomicsDBImport - ------------------------------------------------------------
09:45:56.715 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.2.4.1
09:45:56.715 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
09:45:56.716 INFO GenomicsDBImport - Executing as jhk0709@12tb on Linux v3.10.0-1160.11.1.el7.x86_64 amd64
09:45:56.716 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-b01
09:45:56.716 INFO GenomicsDBImport - Start Date/Time: March 10, 2023 9:45:50 AM KST
09:45:56.716 INFO GenomicsDBImport - ------------------------------------------------------------
09:45:56.716 INFO GenomicsDBImport - ------------------------------------------------------------
09:45:56.718 INFO GenomicsDBImport - HTSJDK Version: 2.24.1
09:45:56.718 INFO GenomicsDBImport - Picard Version: 2.25.4
09:45:56.718 INFO GenomicsDBImport - Built for Spark Version: 2.4.5
09:45:56.718 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:45:56.718 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:45:56.718 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:45:56.718 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:45:56.719 INFO GenomicsDBImport - Deflater: IntelDeflater
09:45:56.719 INFO GenomicsDBImport - Inflater: IntelInflater
09:45:56.719 INFO GenomicsDBImport - GCS max retries/reopens: 20
09:45:56.719 INFO GenomicsDBImport - Requester pays: disabled
09:45:56.719 INFO GenomicsDBImport - Initializing engine
09:49:59.976 INFO IntervalArgumentCollection - Processing 248956422 bp from intervals
09:50:00.024 INFO GenomicsDBImport - Done initializing engine
09:50:00.337 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.4.3-6069e4a
09:50:00.339 INFO GenomicsDBImport - Vid Map JSON file will be written to
sample_data/vidmap.json
09:50:00.339 INFO GenomicsDBImport - Callset Map JSON file will be written to
sample_data/callset.json
09:50:00.339 INFO GenomicsDBImport - Complete VCF Header will be written to
sample_data/vcfheader.vcf
09:50:00.339 INFO GenomicsDBImport - Importing to workspace -
sample_data/
09:50:01.205 INFO GenomicsDBImport - Importing batch 1 with 10 samples
...
9:05:15.147 INFO GenomicsDBImport - Done importing batch 79/82
09:05:18.986 INFO GenomicsDBImport - Importing batch 80 with 10 samples
10:21:17.311 INFO GenomicsDBImport - Done importing batch 80/82
10:21:21.109 INFO GenomicsDBImport - Importing batch 81 with 10 samples
11:45:03.739 INFO GenomicsDBImport - Done importing batch 81/82
11:45:06.764 INFO GenomicsDBImport - Importing batch 82 with 8 samples
[TileDB::ReadState] Error: Cannot read tile from file; File opening error.
06:19:38.517 erro NativeGenomicsDB - pid=46344 tid=48705 VariantStorageManagerException exception : Error while consolidating TileDB array chr1$1$248956422
TileDB error message :
terminate called after throwing an instance of 'VariantStorageManagerException'
what(): VariantStorageManagerException exception : Error while consolidating TileDB array chr1$1$248956422
TileDB error message :
-
I added the "--genomicsdb-shared-posixfs-optimizations true" option, but it did not help with either command.
-
OK, I found the answer. The temporary directory did not have enough resources for a large GenomicsDBImport run. Anyone running this job must be careful when setting the TMP directory!
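For anyone hitting the same error, here is a rough pre-flight check I would run before starting the import. It is only a sketch, and it assumes the missing "resource" was free disk space on the filesystem behind --tmp-dir. The function names (free_kib, has_free_gib) and the 500 GiB threshold are made up for illustration; pick a threshold that fits your own data.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check before running GenomicsDBImport.
# Assumption: the temporary directory ran out of free disk space.

# Print the free space (in KiB) on the filesystem holding directory $1.
# Prints nothing if the directory cannot be inspected.
free_kib() {
    df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }'
}

# Succeed only if directory $1 has at least $2 GiB available.
has_free_gib() {
    local dir=$1 min_gib=$2
    local avail
    avail=$(free_kib "$dir")
    [ -n "$avail" ] || return 1
    [ "$avail" -ge $((min_gib * 1024 * 1024)) ]
}

TMP_DIR=${TMP_DIR:-./TMP}            # same directory passed to --tmp-dir
MIN_FREE_GIB=${MIN_FREE_GIB:-500}    # illustrative threshold, not a GATK recommendation

if has_free_gib "$TMP_DIR" "$MIN_FREE_GIB"; then
    echo "OK: $TMP_DIR has at least ${MIN_FREE_GIB} GiB free"
else
    echo "WARNING: $TMP_DIR is missing or has less than ${MIN_FREE_GIB} GiB free" >&2
fi
```

If the check warns, point --tmp-dir at a directory on a larger filesystem and rerun. Running the same check against the --genomicsdb-workspace-path directory may also be worthwhile, since consolidation writes there as well.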