Using GenotypeGVCFs with GenomicsDB
Hello, I am using Terra to run GenotypeGVCFs on a GenomicsDB workspace that lives in Google Cloud and is therefore addressed by a `gs://` URI. I generated it with `GenomicsDBImport`, specifying the output as `gs://BUCKET_URL/genomics_db/chr22`, as per the specification stating that users can update a GenomicsDB in place at a GCP URI. I have successfully created the GenomicsDB and am now wondering how to use it as an input for GenotypeGVCFs. Thanks!
a) GATK version used: 4.1.9.0
b) Exact command used:
GATK --java-options -Xmx8G GenotypeGVCFs -R /cromwell_root/fc-47de7dae-e8e6-429c-b760-b4ba49136eee/t2t-chm13.20200921.withGRCh38chrY.chrEBV.chrYKI270740v1r.fasta -O chr22.10000001_11000001.margined.genotyped.vcf -L chr22:10000001-11000001 -V gendb:///gs://fc-1f86e464-457c-4114-a07d-268cb41f9efe/genomics_db/chr22
c) Entire error log:
2021/03/19 18:56:28 Starting container setup.
2021/03/19 18:56:30 Done container setup.
2021/03/19 18:56:31 Starting localization.
2021/03/19 18:56:37 Localization script execution started...
2021/03/19 18:56:37 Localizing input gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/t2t-chm13.20200921.withGRCh38chrY.chrEBV.chrYKI270740v1r.fasta.fai -> /cromwell_root/fc-47de7dae-e8e6-429c-b760-b4ba49136eee/t2t-chm13.20200921.withGRCh38chrY.chrEBV.chrYKI270740v1r.fasta.fai
2021/03/19 18:56:38 Localizing input gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/t2t-chm13.20200921.withGRCh38chrY.chrEBV.chrYKI270740v1r.dict -> /cromwell_root/fc-47de7dae-e8e6-429c-b760-b4ba49136eee/t2t-chm13.20200921.withGRCh38chrY.chrEBV.chrYKI270740v1r.dict
2021/03/19 18:56:38 Localizing input gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/t2t-chm13.20200921.withGRCh38chrY.chrEBV.chrYKI270740v1r.fasta -> /cromwell_root/fc-47de7dae-e8e6-429c-b760-b4ba49136eee/t2t-chm13.20200921.withGRCh38chrY.chrEBV.chrYKI270740v1r.fasta
Copying gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/t2t-chm13.20200921.withGRCh38chrY.chrEBV.chrYKI270740v1r.dict...
/ [0/2 files][ 0.0 B/ 3.0 GiB] 0% Done
Copying gs://fc-47de7dae-e8e6-429c-b760-b4ba49136eee/t2t-chm13.20200921.withGRCh38chrY.chrEBV.chrYKI270740v1r.fasta...
/ [0/2 files][ 0.0 B/ 3.0 GiB] 0% Done
/ [1/2 files][ 5.0 KiB/ 3.0 GiB] 0% Done
- - [1/2 files][353.7 MiB/ 3.0 GiB] 11% Done
\ | | [1/2 files][848.2 MiB/ 3.0 GiB] 28% Done
/ - - [1/2 files][ 1.3 GiB/ 3.0 GiB] 44% Done
\ \ [1/2 files][ 1.8 GiB/ 3.0 GiB] 60% Done
| / / [1/2 files][ 2.2 GiB/ 3.0 GiB] 74% Done
- \ \ [1/2 files][ 2.6 GiB/ 3.0 GiB] 87% Done
| | [1/2 files][ 2.9 GiB/ 3.0 GiB] 98% Done
/ / [2/2 files][ 3.0 GiB/ 3.0 GiB] 100% Done
Operation completed over 2 objects/3.0 GiB.
2021/03/19 18:56:46 Localizing input gs://fc-1f86e464-457c-4114-a07d-268cb41f9efe/d677897d-9e4b-49f4-a7bf-13d0eeb1aa9d/t2t_interval_calling/5059d6fe-a94b-4523-8844-aae424726e0a/call-genotypeInterval/script -> /cromwell_root/script
2021/03/19 18:56:48 Localization script execution complete.
2021/03/19 18:58:08 Done localization.
2021/03/19 18:58:09 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint=/bin/bash szarate/t2t_variants@sha256:96fe8147b4a946ae5b687c4c32b2004da1ded105091fab8fb5f6eac0f5ac89dd /cromwell_root/script
Using GATK jar /gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx8G -jar /gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar GenotypeGVCFs -R /cromwell_root/fc-47de7dae-e8e6-429c-b760-b4ba49136eee/t2t-chm13.20200921.withGRCh38chrY.chrEBV.chrYKI270740v1r.fasta -O chr22.10000001_11000001.margined.genotyped.vcf -L chr22:10000001-11000001 -V gendb:///gs://fc-1f86e464-457c-4114-a07d-268cb41f9efe/genomics_db/chr22
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.ae432eaf
18:58:12.985 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
18:58:13.148 INFO GenotypeGVCFs - ------------------------------------------------------------
18:58:13.148 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.1.9.0
18:58:13.148 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
18:58:13.148 INFO GenotypeGVCFs - Executing as root@24a30c1028db on Linux v5.4.89+ amd64
18:58:13.148 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
18:58:13.149 INFO GenotypeGVCFs - Start Date/Time: March 19, 2021 6:58:12 PM GMT
18:58:13.149 INFO GenotypeGVCFs - ------------------------------------------------------------
18:58:13.149 INFO GenotypeGVCFs - ------------------------------------------------------------
18:58:13.149 INFO GenotypeGVCFs - HTSJDK Version: 2.23.0
18:58:13.149 INFO GenotypeGVCFs - Picard Version: 2.23.3
18:58:13.149 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
18:58:13.149 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
18:58:13.149 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
18:58:13.149 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
18:58:13.149 INFO GenotypeGVCFs - Deflater: IntelDeflater
18:58:13.150 INFO GenotypeGVCFs - Inflater: IntelInflater
18:58:13.150 INFO GenotypeGVCFs - GCS max retries/reopens: 20
18:58:13.150 INFO GenotypeGVCFs - Requester pays: disabled
18:58:13.150 INFO GenotypeGVCFs - Initializing engine
18:58:13.516 INFO GenotypeGVCFs - Shutting down engine
[March 19, 2021 6:58:13 PM GMT] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=438304768
java.nio.file.ProviderNotFoundException: Provider "gendb" not found
    at java.nio.file.FileSystems.newFileSystem(FileSystems.java:341)
    at org.broadinstitute.hellbender.engine.GATKPath.toPath(GATKPath.java:57)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getCodecForFeatureInput(FeatureDataSource.java:352)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:334)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:282)
    at org.broadinstitute.hellbender.engine.VariantLocusWalker.initializeDrivingVariants(VariantLocusWalker.java:76)
    at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:709)
    at org.broadinstitute.hellbender.engine.VariantLocusWalker.onStartup(VariantLocusWalker.java:63)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
2021/03/19 18:58:14 Starting delocalization.
2021/03/19 18:58:15 Delocalization script execution started...
2021/03/19 18:58:15 Delocalizing output /cromwell_root/memory_retry_rc -> gs://fc-1f86e464-457c-4114-a07d-268cb41f9efe/d677897d-9e4b-49f4-a7bf-13d0eeb1aa9d/t2t_interval_calling/5059d6fe-a94b-4523-8844-aae424726e0a/call-genotypeInterval/memory_retry_rc
2021/03/19 18:58:15 Delocalizing output /cromwell_root/rc -> gs://fc-1f86e464-457c-4114-a07d-268cb41f9efe/d677897d-9e4b-49f4-a7bf-13d0eeb1aa9d/t2t_interval_calling/5059d6fe-a94b-4523-8844-aae424726e0a/call-genotypeInterval/rc
2021/03/19 18:58:16 Delocalizing output /cromwell_root/stdout -> gs://fc-1f86e464-457c-4114-a07d-268cb41f9efe/d677897d-9e4b-49f4-a7bf-13d0eeb1aa9d/t2t_interval_calling/5059d6fe-a94b-4523-8844-aae424726e0a/call-genotypeInterval/stdout
2021/03/19 18:58:18 Delocalizing output /cromwell_root/stderr -> gs://fc-1f86e464-457c-4114-a07d-268cb41f9efe/d677897d-9e4b-49f4-a7bf-13d0eeb1aa9d/t2t_interval_calling/5059d6fe-a94b-4523-8844-aae424726e0a/call-genotypeInterval/stderr
2021/03/19 18:58:19 Delocalizing output /cromwell_root/chr22.10000001_11000001.margined.genotyped.vcf -> gs://fc-1f86e464-457c-4114-a07d-268cb41f9efe/d677897d-9e4b-49f4-a7bf-13d0eeb1aa9d/t2t_interval_calling/5059d6fe-a94b-4523-8844-aae424726e0a/call-genotypeInterval/chr22.10000001_11000001.margined.genotyped.vcf
Required file output '/cromwell_root/chr22.10000001_11000001.margined.genotyped.vcf' does not exist.
-
Update: I resolved this error by replacing `gendb:///gs://` with `gendb.gs://`. If this could be added to the documentation, that would be very helpful!
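For anyone hitting the same `Provider "gendb" not found` error, here is a minimal shell sketch of the prefix change described above. The helper name `fix_gendb_uri` is hypothetical (it is not part of GATK), and the sketch only rewrites the string passed to `-V`; it assumes the workspace itself already exists at the `gs://` location.

```shell
# Hypothetical helper (not part of GATK): rewrite a workspace URI that was
# written with the local-style prefix "gendb:///gs://" into the
# "gendb.gs://" form that resolved the error in this thread.
fix_gendb_uri() {
  case "$1" in
    gendb:///gs://*) printf '%s\n' "gendb.gs://${1#gendb:///gs://}" ;;
    *)               printf '%s\n' "$1" ;;   # leave every other URI untouched
  esac
}

# Workspace URI from the failing command in the original post.
workspace=$(fix_gendb_uri 'gendb:///gs://fc-1f86e464-457c-4114-a07d-268cb41f9efe/genomics_db/chr22')
echo "$workspace"
# prints gendb.gs://fc-1f86e464-457c-4114-a07d-268cb41f9efe/genomics_db/chr22

# The rewritten value then replaces the -V argument of the original command:
#   ... GenotypeGVCFs -R <reference.fasta> -O <output.vcf> -L chr22:10000001-11000001 -V "$workspace"
```

Inside a WDL task it is probably simpler to hard-code the `gendb.gs://` prefix, but a helper like this can be handy when the same script has to accept both local and cloud workspaces.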
-
Hi Samantha Zarate,
Thank you for the update! I have put in a request for the information to be added so it is easier for users in the future.
Best,
Genevieve