Issues while running GenotypeGVCFs
Dear GATK Experts,
I am having the following error while running GenotypeGVCFs:
"java.nio.file.ProviderNotFoundException: Provider "ST4.03ch01" not found"
The required details are provided below:
a) GATK version used: Version:4.2.0.0
b) Exact command used: gatk --java-options "-Xmx8G -XX:+UseSerialGC" GenotypeGVCFs -R $refSequence -L $4 -O $4.$1.$3.final.variants.vcf.gz -V $1.$2.final.variants.g.vcf.gz --tmp-dir ./temp.$1.$2.$3.$4
The variables in the command correspond to:
$1: divPanelWEC
$2: ST4.03ch01
$3: WEC
$4: ST4.03ch01:1-1000000
c) Entire error log:
===============================================================
script_6b_GenotypeGVCFs_performJointGenotyping_plusConvertGVCFs2VCFs_eachChromSeparately_GVCFsProducedAcrossSingleOrMultipleLib.sh STARTED
START TIME Thu 24 Jun 19:32:17 BST 2021
=================================================================
Hostname: n19-32-192-mandarin
=============================================
Running the GenotypeGVCFs
start time Thu 24 Jun 19:32:27 BST 2021
===============================================
Using GATK jar /mnt/shared/scratch/ssharma/apps/conda/envs/gatk4tools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx8G -XX:+UseSerialGC -jar /mnt/shared/scratch/ssharma/apps/conda/envs/gatk4tools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar GenotypeGVCFs -R /home/ssharma/reference_DM_PM4.03_G3_all_bowtie2/refForNGExomeCaptureOID42180_without_myb73_like/DM_v4.03_G3_allplusMyb73.fasta -L ST4.03ch01:1-1000000 -O ST4.03ch01:1-1000000.divPanelWEC.WEC.final.variants.vcf.gz -V divPanelWEC.ST4.03ch01.final.variants.g.vcf.gz --tmp-dir ./temp.divPanelWEC.ST4.03ch01.WEC.ST4.03ch01:1-1000000
19:32:34.407 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/shared/scratch/ssharma/apps/conda/envs/gatk4tools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jun 24, 2021 7:32:34 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
19:32:34.760 INFO GenotypeGVCFs - ------------------------------------------------------------
19:32:34.760 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.2.0.0
19:32:34.761 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
19:32:34.762 INFO GenotypeGVCFs - Executing as ssharma@n19-32-192-mandarin.hpc.hutton.ac.uk on Linux v4.18.0-240.22.1.el8_3.x86_64 amd64
19:32:34.762 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_282-b08
19:32:34.762 INFO GenotypeGVCFs - Start Date/Time: 24 June 2021 19:32:34 BST
19:32:34.763 INFO GenotypeGVCFs - ------------------------------------------------------------
19:32:34.763 INFO GenotypeGVCFs - ------------------------------------------------------------
19:32:34.764 INFO GenotypeGVCFs - HTSJDK Version: 2.24.0
19:32:34.765 INFO GenotypeGVCFs - Picard Version: 2.25.0
19:32:34.765 INFO GenotypeGVCFs - Built for Spark Version: 2.4.5
19:32:34.765 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
19:32:34.765 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
19:32:34.766 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
19:32:34.766 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
19:32:34.766 INFO GenotypeGVCFs - Deflater: IntelDeflater
19:32:34.767 INFO GenotypeGVCFs - Inflater: IntelInflater
19:32:34.767 INFO GenotypeGVCFs - GCS max retries/reopens: 20
19:32:34.767 INFO GenotypeGVCFs - Requester pays: disabled
19:32:34.768 INFO GenotypeGVCFs - Initializing engine
19:32:35.968 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/shared/scratch/ssharma/diversityCaptureAnalysis_scratch/chromosomewiseCombinedGvcfs.divPanelWEC/divPanelWEC.ST4.03ch01.final.variants.g.vcf.gz
19:32:36.052 INFO IntervalArgumentCollection - Processing 1000000 bp from intervals
19:32:36.056 INFO GenotypeGVCFs - Done initializing engine
19:32:36.069 INFO GenotypeGVCFs - Shutting down engine
[24 June 2021 19:32:36 BST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=194707456
java.nio.file.ProviderNotFoundException: Provider "ST4.03ch01" not found
at java.nio.file.FileSystems.newFileSystem(FileSystems.java:341)
at org.broadinstitute.hellbender.engine.GATKPath.toPath(GATKPath.java:77)
at org.broadinstitute.hellbender.engine.GATKTool.createVCFWriter(GATKTool.java:862)
at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.onTraversalStart(GenotypeGVCFs.java:266)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1056)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
=============================================
GenotypeGVCFs run complete
end time Thu 24 Jun 19:32:36 BST 2021
===============================================
cp: cannot stat './ST4.03ch01:1-1000000.divPanelWEC.WEC.final.variants.vcf.gz.tbi': No such file or directory
mv: cannot stat './ST4.03ch01:1-1000000.divPanelWEC.WEC.final.variants.vcf.gz': No such file or directory
mv: cannot stat './ST4.03ch01:1-1000000.divPanelWEC.WEC.final.variants.vcf.gz.tbi': No such file or directory
===============================================================
script_6b_GenotypeGVCFs_performJointGenotyping_plusConvertGVCFs2VCFs_eachChromSeparately_GVCFsProducedAcrossSingleOrMultipleLib.sh STARTED
END TIME Thu 24 Jun 19:32:36 BST 2021
=================================================================
###################################################################
The GVCF files (for 96 whole-exome capture potato samples) were generated using HaplotypeCaller and combined - separately for each chromosome - using CombineGVCFs. The commands used are provided below:
gatk HaplotypeCaller -R $refSequence -I $1.recalibrated.bam -O $1.final.variants.g.vcf.gz -ERC GVCF -ploidy $ploidy --dont-use-soft-clipped-bases --native-pair-hmm-threads $3 --native-pair-hmm-use-double-precision true --disable-read-filter MappingQualityReadFilter
gatk --java-options "-Xmx4G -XX:+UseSerialGC" CombineGVCFs -R $refSequence -L $3 -O ../chromosomewiseCombinedGvcfs.$2/$1.$3.final.variants.g.vcf.gz -V ./temp.$1.$2.$3/$1.CombineGVCFsInputFile.list --tmp-dir ./temp.$1.$2.$3
These commands worked absolutely fine with the previous GATK version (v4.1.7.0).
I've tried various permutations and combinations of possible causes, but nothing worked. This includes trying different genomic intervals, providing absolute and relative paths for the -V argument, and increasing memory to 96 G.
I can also send a subset of the combined GVCF file for the specified genomic interval so that you can test the issue at your end. Please let me know if this would be needed.
Kind regards,
Sanjeev
-
Hi sanjeevksh,
Most likely this issue is coming from your file names: the colons confuse the system into looking for a URI scheme. Try adding file:// in front of any of your file names involving colons and hyphens, and let us know if it works.
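To see why the colon matters, here is a minimal sketch. The `uri_scheme` helper is hypothetical and not part of GATK; it only prints whatever precedes the first colon of an argument, which is roughly what a URI-style parser would read as the scheme. For the output name in the failing command, that text is the same string reported in the "Provider ... not found" error.

```shell
# Hypothetical helper: print the text a URI-style parser would take as the
# scheme, i.e. everything before the first colon (unless a slash comes first).
uri_scheme() {
  head=${1%%:*}
  if [ "$head" = "$1" ]; then
    echo ""            # no colon at all: nothing looks like a scheme
    return
  fi
  case "$head" in
    */*) echo "" ;;    # a slash before the first colon: cannot be a scheme
    *)   echo "$head" ;;
  esac
}

# Output name from the failing command: the contig name looks like a scheme.
uri_scheme "ST4.03ch01:1-1000000.divPanelWEC.WEC.final.variants.vcf.gz"
# Input name from the same command: no colon, so nothing to misread.
uri_scheme "divPanelWEC.ST4.03ch01.final.variants.g.vcf.gz"
```

The first call prints the contig name; the second prints an empty line.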
Best,
Genevieve
-
Hi Genevieve,
Thank you for your feedback. There are no colons or hyphens in the input file names. The output file names do include colons and hyphens, but these files never get produced. Nevertheless, I have tried adding 'file://' to (a) the input file name only, (b) the output file name only, and (c) both the input and output file names. None of these options worked; the error logs from all three are provided below this message.
As I mentioned in my previous message, this whole analysis is an exact repeat of what I did using GATK version v4.1.7.0, including the command structure, input and output file names, starting data, etc., and it worked absolutely fine then.
The only difference is that the previous analysis was done on our old cluster, which scheduled jobs through SGE, whereas the current analysis is on our new cluster, which uses the SLURM job scheduler, but I don't think this should make any difference.
I don't wish to go back to the older version, but if I revert to GATK version v4.1.7.0, would the GVCF files produced using the latest version be backward compatible with GenotypeGVCFs in the older version?
Kind regards,
Sanjeev
################################
adding file:// for '-V' argument input files
################################
=============================================
Running the GenotypeGVCFs
start time Tue 29 Jun 00:07:24 BST 2021
===============================================
Using GATK jar /mnt/shared/scratch/ssharma/apps/conda/envs/gatk4tools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx8G -XX:+UseSerialGC -jar /mnt/shared/scratch/ssharma/apps/conda/envs/gatk4tools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar GenotypeGVCFs -R /home/ssharma/reference_DM_PM4.03_G3_all_bowtie2/refForNGExomeCaptureOID42180_without_myb73_like/DM_v4.03_G3_allplusMyb73.fasta -L ST4.03ch01:1000001-2000000 -O ST4.03ch01:1000001-2000000.divPanelWEC.WEC.final.variants.vcf.gz -V file://divPanelWEC.ST4.03ch01.final.variants.g.vcf.gz --tmp-dir ./temp.divPanelWEC.ST4.03ch01.WEC.ST4.03ch01:1000001-2000000
00:07:36.001 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/shared/scratch/ssharma/apps/conda/envs/gatk4tools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jun 29, 2021 12:07:36 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
00:07:36.623 INFO GenotypeGVCFs - ------------------------------------------------------------
00:07:36.628 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.2.0.0
00:07:36.632 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
00:07:36.637 INFO GenotypeGVCFs - Executing as ssharma@n19-32-192-spiderman.hpc.hutton.ac.uk on Linux v4.18.0-240.22.1.el8_3.x86_64 amd64
00:07:36.641 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_282-b08
00:07:36.645 INFO GenotypeGVCFs - Start Date/Time: 29 June 2021 00:07:35 BST
00:07:36.649 INFO GenotypeGVCFs - ------------------------------------------------------------
00:07:36.653 INFO GenotypeGVCFs - ------------------------------------------------------------
00:07:36.660 INFO GenotypeGVCFs - HTSJDK Version: 2.24.0
00:07:36.666 INFO GenotypeGVCFs - Picard Version: 2.25.0
00:07:36.670 INFO GenotypeGVCFs - Built for Spark Version: 2.4.5
00:07:36.675 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
00:07:36.680 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
00:07:36.685 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
00:07:36.689 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
00:07:36.695 INFO GenotypeGVCFs - Deflater: IntelDeflater
00:07:36.700 INFO GenotypeGVCFs - Inflater: IntelInflater
00:07:36.705 INFO GenotypeGVCFs - GCS max retries/reopens: 20
00:07:36.709 INFO GenotypeGVCFs - Requester pays: disabled
00:07:36.713 INFO GenotypeGVCFs - Initializing engine
00:07:38.005 INFO GenotypeGVCFs - Shutting down engine
[29 June 2021 00:07:38 BST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=194707456
java.lang.IllegalArgumentException: URI has an authority component
at sun.nio.fs.UnixUriUtils.fromUri(UnixUriUtils.java:53)
at sun.nio.fs.UnixFileSystemProvider.getPath(UnixFileSystemProvider.java:98)
at java.nio.file.Paths.get(Paths.java:138)
at htsjdk.io.HtsPath.toPath(HtsPath.java:158)
at org.broadinstitute.hellbender.engine.GATKPath.toPath(GATKPath.java:70)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getCodecForFeatureInput(FeatureDataSource.java:354)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:336)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:284)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.initializeDrivingVariants(VariantLocusWalker.java:76)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:707)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.onStartup(VariantLocusWalker.java:63)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
=============================================
GenotypeGVCFs run complete
end time Tue 29 Jun 00:07:38 BST 2021
===============================================
############################################################################
################################
adding file:// for '-O' argument output files
################################
=============================================
Running the GenotypeGVCFs
start time Tue 29 Jun 00:23:06 BST 2021
===============================================
Using GATK jar /mnt/shared/scratch/ssharma/apps/conda/envs/gatk4tools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx8G -XX:+UseSerialGC -jar /mnt/shared/scratch/ssharma/apps/conda/envs/gatk4tools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar GenotypeGVCFs -R /home/ssharma/reference_DM_PM4.03_G3_all_bowtie2/refForNGExomeCaptureOID42180_without_myb73_like/DM_v4.03_G3_allplusMyb73.fasta -L ST4.03ch01:3000001-4000000 -O file://ST4.03ch01:3000001-4000000.divPanelWEC.WEC.final.variants.vcf.gz -V divPanelWEC.ST4.03ch01.final.variants.g.vcf.gz --tmp-dir ./temp.divPanelWEC.ST4.03ch01.WEC.ST4.03ch01:3000001-4000000
00:23:17.998 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/shared/scratch/ssharma/apps/conda/envs/gatk4tools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jun 29, 2021 12:23:18 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
00:23:18.593 INFO GenotypeGVCFs - ------------------------------------------------------------
00:23:18.598 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.2.0.0
00:23:18.604 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
00:23:18.610 INFO GenotypeGVCFs - Executing as ssharma@n19-32-192-thor.hpc.hutton.ac.uk on Linux v4.18.0-240.22.1.el8_3.x86_64 amd64
00:23:18.614 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_282-b08
00:23:18.619 INFO GenotypeGVCFs - Start Date/Time: 29 June 2021 00:23:17 BST
00:23:18.623 INFO GenotypeGVCFs - ------------------------------------------------------------
00:23:18.628 INFO GenotypeGVCFs - ------------------------------------------------------------
00:23:18.635 INFO GenotypeGVCFs - HTSJDK Version: 2.24.0
00:23:18.639 INFO GenotypeGVCFs - Picard Version: 2.25.0
00:23:18.645 INFO GenotypeGVCFs - Built for Spark Version: 2.4.5
00:23:18.650 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
00:23:18.654 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
00:23:18.659 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
00:23:18.663 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
00:23:18.668 INFO GenotypeGVCFs - Deflater: IntelDeflater
00:23:18.673 INFO GenotypeGVCFs - Inflater: IntelInflater
00:23:18.678 INFO GenotypeGVCFs - GCS max retries/reopens: 20
00:23:18.683 INFO GenotypeGVCFs - Requester pays: disabled
00:23:18.688 INFO GenotypeGVCFs - Initializing engine
00:23:20.575 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/shared/scratch/ssharma/diversityCaptureAnalysis_scratch/chromosomewiseCombinedGvcfs.divPanelWEC/divPanelWEC.ST4.03ch01.final.variants.g.vcf.gz
00:23:20.700 INFO IntervalArgumentCollection - Processing 1000000 bp from intervals
00:23:20.710 INFO GenotypeGVCFs - Done initializing engine
00:23:20.726 INFO GenotypeGVCFs - Shutting down engine
[29 June 2021 00:23:20 BST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.05 minutes.
Runtime.totalMemory()=194707456
java.lang.IllegalArgumentException: URI has an authority component
at sun.nio.fs.UnixUriUtils.fromUri(UnixUriUtils.java:53)
at sun.nio.fs.UnixFileSystemProvider.getPath(UnixFileSystemProvider.java:98)
at java.nio.file.Paths.get(Paths.java:138)
at htsjdk.io.HtsPath.toPath(HtsPath.java:158)
at org.broadinstitute.hellbender.engine.GATKPath.toPath(GATKPath.java:70)
at org.broadinstitute.hellbender.engine.GATKTool.createVCFWriter(GATKTool.java:862)
at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.onTraversalStart(GenotypeGVCFs.java:266)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1056)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
=============================================
GenotypeGVCFs run complete
end time Tue 29 Jun 00:23:20 BST 2021
===============================================
###########################################################################
########################################################
adding file:// for '-V' argument input files and '-O' argument output files
########################################################
=============================================
Running the GenotypeGVCFs
start time Tue 29 Jun 00:31:03 BST 2021
===============================================
Using GATK jar /mnt/shared/scratch/ssharma/apps/conda/envs/gatk4tools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx8G -XX:+UseSerialGC -jar /mnt/shared/scratch/ssharma/apps/conda/envs/gatk4tools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar GenotypeGVCFs -R /home/ssharma/reference_DM_PM4.03_G3_all_bowtie2/refForNGExomeCaptureOID42180_without_myb73_like/DM_v4.03_G3_allplusMyb73.fasta -L ST4.03ch01:1-1000000 -O file://ST4.03ch01:1-1000000.divPanelWEC.WEC.final.variants.vcf.gz -V file://divPanelWEC.ST4.03ch01.final.variants.g.vcf.gz --tmp-dir ./temp.divPanelWEC.ST4.03ch01.WEC.ST4.03ch01:1-1000000
00:31:12.114 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/shared/scratch/ssharma/apps/conda/envs/gatk4tools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jun 29, 2021 12:31:12 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
00:31:12.436 INFO GenotypeGVCFs - ------------------------------------------------------------
00:31:12.436 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.2.0.0
00:31:12.437 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
00:31:12.438 INFO GenotypeGVCFs - Executing as ssharma@n19-32-192-spiderman.hpc.hutton.ac.uk on Linux v4.18.0-240.22.1.el8_3.x86_64 amd64
00:31:12.439 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_282-b08
00:31:12.440 INFO GenotypeGVCFs - Start Date/Time: 29 June 2021 00:31:12 BST
00:31:12.441 INFO GenotypeGVCFs - ------------------------------------------------------------
00:31:12.442 INFO GenotypeGVCFs - ------------------------------------------------------------
00:31:12.444 INFO GenotypeGVCFs - HTSJDK Version: 2.24.0
00:31:12.445 INFO GenotypeGVCFs - Picard Version: 2.25.0
00:31:12.446 INFO GenotypeGVCFs - Built for Spark Version: 2.4.5
00:31:12.446 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
00:31:12.447 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
00:31:12.448 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
00:31:12.448 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
00:31:12.449 INFO GenotypeGVCFs - Deflater: IntelDeflater
00:31:12.450 INFO GenotypeGVCFs - Inflater: IntelInflater
00:31:12.450 INFO GenotypeGVCFs - GCS max retries/reopens: 20
00:31:12.451 INFO GenotypeGVCFs - Requester pays: disabled
00:31:12.452 INFO GenotypeGVCFs - Initializing engine
00:31:13.611 INFO GenotypeGVCFs - Shutting down engine
[29 June 2021 00:31:13 BST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=194707456
java.lang.IllegalArgumentException: URI has an authority component
at sun.nio.fs.UnixUriUtils.fromUri(UnixUriUtils.java:53)
at sun.nio.fs.UnixFileSystemProvider.getPath(UnixFileSystemProvider.java:98)
at java.nio.file.Paths.get(Paths.java:138)
at htsjdk.io.HtsPath.toPath(HtsPath.java:158)
at org.broadinstitute.hellbender.engine.GATKPath.toPath(GATKPath.java:70)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getCodecForFeatureInput(FeatureDataSource.java:354)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:336)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:284)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.initializeDrivingVariants(VariantLocusWalker.java:76)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:707)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.onStartup(VariantLocusWalker.java:63)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
=============================================
GenotypeGVCFs run complete
end time Tue 29 Jun 00:31:13 BST 2021
===============================================
###########################################################################
-
Hi sanjeevksh,
Thank you for testing these scenarios and following up with examples. The error message changed even though the command still failed, which gives us more information to solve the problem. Previous GATK versions most likely did not have this issue because we have been working to comply with multiple file standards, and our file IO code has been changing behind the scenes. Even though the output files are never produced, their names can still cause these errors at this point in the program.
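The new message ("URI has an authority component") is consistent with generic URI syntax, where the text between `//` and the next `/` is the authority (host) rather than part of the path. A rough illustration follows; the `uri_authority` helper is hypothetical and only mimics that splitting rule, not GATK's actual parser.

```shell
# Hypothetical helper: print the authority part of a URI, i.e. the text
# between "scheme://" and the next slash. Prints an empty line if there is none.
uri_authority() {
  case "$1" in
    *://*) ;;                 # has a "//" part, keep going
    *) echo ""; return ;;     # plain path: no authority
  esac
  rest=${1#*://}              # drop "scheme://"
  echo "${rest%%/*}"          # keep everything up to the next slash
}

# file:// followed directly by a relative name: the name becomes the authority.
uri_authority "file://divPanelWEC.ST4.03ch01.final.variants.g.vcf.gz"
# file:// followed by an absolute path: the authority is empty.
uri_authority "file:///mnt/shared/scratch/file.vcf.gz"
```

The first call prints the whole file name, which is why a relative name after `file://` is rejected; the second prints an empty line.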
We did some testing and found that you'll need to update any file names containing colons with one of these two solutions:
- Use -O file:////full/path/to/file:name.vcf (4 slashes because the full path starts with a slash)
- Use -O ./relative/path/to/file:name.vcf
The previous suggestion of file:// was not quite right when the file names contain colons. For your command, you need to do this for your -O and --tmp-dir paths; it is not necessary for your -V file.
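Applied to the command from the original post, the second option might look like the sketch below. It is a dry run that only prints the command; `refSequence` is set to a placeholder path (an assumption, not the real reference), and the `./` prefix on `-O` is the only change from the original invocation, since `--tmp-dir` already started with `./`.

```shell
# Positional values as listed in the original post.
set -- divPanelWEC ST4.03ch01 WEC ST4.03ch01:1-1000000
refSequence=/path/to/reference.fasta   # placeholder, not the real reference path

# Prefix relative paths with ./ so the colon is not read as a URI scheme.
out="./$4.$1.$3.final.variants.vcf.gz"
tmp="./temp.$1.$2.$3.$4"

# Dry run: print the command instead of executing it.
echo gatk --java-options '"-Xmx8G -XX:+UseSerialGC"' GenotypeGVCFs \
  -R "$refSequence" -L "$4" -O "$out" \
  -V "$1.$2.final.variants.g.vcf.gz" --tmp-dir "$tmp"
```

Removing the `echo` would run the command for real, assuming `gatk` is on the `PATH` and the reference path has been filled in.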
Let me know if this solves the issue!
Genevieve
-
Hi Genevieve,
Thank you so much for investigating this issue at your end and suggesting the possible solutions. I have tried option 2 and it worked absolutely fine.
As a backup plan I also started combining my GVCFs using GenomicsDBImport. Previously this tool was not compatible with data from polyploids but I see this note removed from the tool description. Does this mean GenomicsDBImport works fine for all ploidy levels now?
Thanks for your help again,
Kind regards,
Sanjeev
-
Hi sanjeevksh,
So glad that the solution worked! Thanks for letting us know. As far as I know, GenomicsDBImport still only supports diploid data. I'll follow up with the developers to make sure I haven't missed a release, though. It might be a few weeks until I follow up, since this weekend is a holiday.
Best,
Genevieve
-
Hi Genevieve,
It is good that I checked this with you, because I see no mention in the tool description of GenomicsDBImport being compatible with diploids only.
Hopefully the next steps will also work fine; as you may remember, last time I had issues with the GatherVcfs step. Due to the major changes on our cluster and several other things, I just decided to start afresh.
Thanks again for your help,
Have a nice holiday,
Kind regards,
Sanjeev
-
There is some mention here in our glossary article: https://gatk.broadinstitute.org/hc/en-us/articles/360035891051-GenomicsDB
Best of luck!
-
Thanks again for confirming this, I will stick to CombineGVCFs then :-)
Kind regards,
Sanjeev
-
Hi sanjeevksh,
I spoke with the developer team regarding this issue and I found that there has been a change so that GenomicsDB should work with non-diploid data. Here are the updates to GenomicsDB where this was implemented: one and two. I am going to put in a request to have that GenomicsDB article updated as well.
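For reference, a GenomicsDB-based route would replace CombineGVCFs with GenomicsDBImport and pass the resulting workspace to GenotypeGVCFs through a `gendb://` URI. The sketch below is a dry run that only prints the commands. The sample file names, the workspace name and the reference path are all placeholders, and the commands have not been checked here against polyploid data.

```shell
refSequence=/path/to/reference.fasta   # placeholder reference path
interval=ST4.03ch01                    # one chromosome, as in the original workflow
workspace=genomicsdb.$interval         # hypothetical workspace directory name

# Dry run: print the import step (one -V per per-sample GVCF; names are placeholders).
echo gatk GenomicsDBImport -V sample1.g.vcf.gz -V sample2.g.vcf.gz \
  --genomicsdb-workspace-path "$workspace" -L "$interval"

# Dry run: print the joint-genotyping step, reading from the workspace.
echo gatk GenotypeGVCFs -R "$refSequence" -V "gendb://$workspace" \
  -L "$interval" -O "./$interval.final.variants.vcf.gz"
```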
I'm sorry for leading you astray on this question but I'm glad to have it clarified for myself too!
Best,
Genevieve
-
Hi Genevieve,
That's excellent! Thank you for chasing this up and for the update.
Kind regards,
Sanjeev
-
No problem! Thank you for your patience while I looked into it.