GenomicsDBImport works in interactive shell, but not in a script
Hello,
I encountered a strange error using GenomicsDBImport with GATK v4.2.2.0.
Creating a GenomicsDB workspace as part of a script fails, while executing the exact same command interactively works.
Running the following from the script (some paths redacted):
/xxx/gatk --java-options "-Xmx50g -Xms50g -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" GenomicsDBImport --genomicsdb-workspace-path data/genomicsdb --overwrite-existing-genomicsdb-workspace -V GSM2410670/GSM2410670_merged.g.vcf.gz -V GSM2410680/GSM2410680_merged.g.vcf.gz -V GSM2410679/GSM2410679_merged.g.vcf.gz -V GSM2410676/GSM2410676_merged.g.vcf.gz -V GSM2410671/GSM2410671_merged.g.vcf.gz -V GSM2410672/GSM2410672_merged.g.vcf.gz -V GSM2410674/GSM2410674_merged.g.vcf.gz -L GRCm38.bed --interval-padding 100 --merge-input-intervals --max-num-intervals-to-import-in-parallel 20 --tmp-dir /xxx/tmp/ --genomicsdb-segment-size 10485760 --genomicsdb-vcf-buffer-size 163840
results in the following error:
Using GATK jar /xxx/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx50g -Xms50g -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /xxx/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar GenomicsDBImport --genomicsdb-workspace-path /xxx/data/genomicdb --overwrite-existing-genomicsdb-workspace -V GSM2410670/GSM2410670_merged.g.vcf.gz -V GSM2410680/GSM2410680_merged.g.vcf.gz -V GSM2410679/GSM2410679_merged.g.vcf.gz -V GSM2410676/GSM2410676_merged.g.vcf.gz -V GSM2410671/GSM2410671_merged.g.vcf.gz -V GSM2410672/GSM2410672_merged.g.vcf.gz -V GSM2410674/GSM2410674_merged.g.vcf.gz -L /xxx/GRCm38.bed --interval-padding 100 --merge-input-intervals --max-num-intervals-to-import-in-parallel 20 --tmp-dir /xxx/tmp/ --genomicsdb-segment-size 10485760 --genomicsdb-vcf-buffer-size 163840
00:42:19.182 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/xxx/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
00:42:19.486 INFO GenomicsDBImport - ------------------------------------------------------------
00:42:19.487 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.2.6.1
00:42:19.487 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
00:42:19.487 INFO GenomicsDBImport - Executing as guyshapira@compute-0-3 on Linux v3.10.0-1160.45.1.el7.x86_64 amd64
00:42:19.487 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_302-b08
00:42:19.487 INFO GenomicsDBImport - Start Date/Time: July 22, 2022 12:42:19 AM IDT
00:42:19.487 INFO GenomicsDBImport - ------------------------------------------------------------
00:42:19.487 INFO GenomicsDBImport - ------------------------------------------------------------
00:42:19.488 INFO GenomicsDBImport - HTSJDK Version: 2.24.1
00:42:19.488 INFO GenomicsDBImport - Picard Version: 2.27.1
00:42:19.488 INFO GenomicsDBImport - Built for Spark Version: 2.4.5
00:42:19.488 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
00:42:19.488 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
00:42:19.488 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
00:42:19.488 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
00:42:19.488 INFO GenomicsDBImport - Deflater: IntelDeflater
00:42:19.488 INFO GenomicsDBImport - Inflater: IntelInflater
00:42:19.488 INFO GenomicsDBImport - GCS max retries/reopens: 20
00:42:19.488 INFO GenomicsDBImport - Requester pays: disabled
00:42:19.488 INFO GenomicsDBImport - Initializing engine
00:42:20.082 INFO FeatureManager - Using codec BEDCodec to read file file:///xxx/GRCm38.bed
00:42:20.089 INFO IntervalArgumentCollection - Processing 2725537669 bp from intervals
00:42:20.097 INFO GenomicsDBImport - Done initializing engine
00:42:20.386 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.4.3-6069e4a
[TileDB::FileSystem] Error: (create_dir) Cannot create directory; Directory already exists path=/xxx/data/genomicdb
00:42:20.388 INFO GenomicsDBImport - Shutting down engine
[July 22, 2022 12:42:20 AM IDT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=51450478592
***********************************************************************
A USER ERROR has occurred: Error creating GenomicsDB workspace: /xxx/data/genomicdb
***********************************************************************
org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$UnableToCreateGenomicsDBWorkspace: Error creating GenomicsDB workspace: /xxx/data/genomicdb
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.overwriteCreateOrCheckWorkspace(GenomicsDBImport.java:1070)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.onTraversalStart(GenomicsDBImport.java:708)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1083)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
The log claims that the target directory already exists, but it does not. I tried switching --genomicsdb-workspace-path, but it didn't help.
Running the exact same command interactively works just fine, which is strange.
Any ideas?
- Guy
Hi Guy,
Thanks for writing in to the GATK forum with this issue. I hope we can help you figure out what is going wrong here!
First, I noticed that you wrote you were using GATK 4.2.2.0, but the program log shows it running 4.2.6.1. Could you verify that you are running the same GATK version in both the interactive shell and the script?
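One way to compare the two environments is sketched below. It assumes the launcher accepts `--version`, as the standard `gatk` wrapper does; the function name `describe_gatk` is only an illustrative helper, not part of GATK.

```shell
# Sketch: print the resolved path and reported version of a gatk launcher.
# describe_gatk is a hypothetical helper name.
# Argument: the launcher exactly as your script calls it (defaults to "gatk" on PATH).
describe_gatk() {
    launcher="${1:-gatk}"
    # command -v resolves both bare names on PATH and explicit paths to executables
    resolved="$(command -v "$launcher" 2>/dev/null)" || {
        echo "launcher not found: $launcher"
        return 1
    }
    echo "launcher: $resolved"
    "$resolved" --version
}
```

Calling `describe_gatk /xxx/gatk` once at the top of the script and once in the interactive shell should show whether both resolve to the same installation.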
Which steps in your script run in a different order than in the interactive shell? Are you creating the GenomicsDB workspace first and then trying to add to it, or do you want to create a new workspace?
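If it helps to narrow this down, a small diagnostic placed in the script immediately before the GenomicsDBImport call can show what the script actually sees at that moment. This is only a sketch; `check_workspace` is a hypothetical helper name, and the path you pass should be whatever your script gives to --genomicsdb-workspace-path.

```shell
# Sketch: report the working directory and the state of a workspace path
# right before the import step runs. check_workspace is a hypothetical helper.
check_workspace() {
    ws="$1"
    echo "cwd: $(pwd)"
    if [ -e "$ws" ]; then
        # Something already occupies the path (directory, file, or symlink target)
        echo "workspace path exists: $ws"
        ls -ld "$ws"
    else
        echo "workspace path absent: $ws"
    fi
}
```

For example, `check_workspace data/genomicsdb` would show whether an earlier step in the script has already created the directory, and whether the relative path resolves against the working directory you expect.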
It would also be helpful for us to see more of the script you are running, in addition to your command and program log from the shell command.
Best,
Genevieve