Illegal character in hostname at index 9
Hi GATK team,
I tried running the following command, but it throws the error listed below. I suspect the cause is the underscores in the Google Cloud bucket name, but I wanted to confirm whether this is a known issue with GATK.
a) GATK version used:
The Genome Analysis Toolkit (GATK) v4.2.6.1
b) Exact command used:
gatk --java-options "-Xms63g -XX:+UseParallelGC -XX:ParallelGCThreads=3" \
  VariantRecalibrator \
  -V gs://daly_schema2_gnomad_subset_vds/rye-test/outputs/subset_50_2000-vqsr-ready.vcf.bgz \
  -O ${BATCH_TMPDIR}/VQSR__SNPsVariantRecalibratorScattered-CkEY1/recalibration \
  --tranches-file ${BATCH_TMPDIR}/VQSR__SNPsVariantRecalibratorScattered-CkEY1/tranches \
  --trust-all-polymorphic \
  -tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.8 -tranche 99.6 -tranche 99.5 -tranche 99.4 -tranche 99.3 -tranche 99.0 -tranche 98.0 -tranche 97.0 -tranche 90.0 \
  -an AS_QD -an AS_MQRankSum -an AS_ReadPosRankSum -an AS_FS -an AS_MQ \
  -mode SNP \
  --max-gaussians 6 \
  -resource:hapmap,known=false,training=true,truth=true,prior=15 gs://gcp-public-data--broad-references/hg38/v0/hapmap_3.3.hg38.vcf.gz \
  -resource:omni,known=false,training=true,truth=true,prior=12 gs://gcp-public-data--broad-references/hg38/v0/1000G_omni2.5.hg38.vcf.gz \
  -resource:1000G,known=false,training=true,truth=false,prior=10 gs://gcp-public-data--broad-references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz \
  -resource:dbsnp,known=true,training=false,truth=false,prior=7 gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.gz \
  --use-allele-specific-annotations
c) Entire program log:
15:55:52.774 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
15:55:52.797 INFO VariantRecalibrator - ------------------------------------------------------------
15:55:52.798 INFO VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.2.6.1
15:55:52.798 INFO VariantRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
15:55:52.798 INFO VariantRecalibrator - Executing as root@hostname-8809556844 on Linux v5.4.0-1042-gcp amd64
15:55:52.798 INFO VariantRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
15:55:52.798 INFO VariantRecalibrator - Start Date/Time: November 21, 2022 3:55:52 PM GMT
15:55:52.798 INFO VariantRecalibrator - ------------------------------------------------------------
15:55:52.798 INFO VariantRecalibrator - ------------------------------------------------------------
15:55:52.798 INFO VariantRecalibrator - HTSJDK Version: 2.24.1
15:55:52.799 INFO VariantRecalibrator - Picard Version: 2.27.1
15:55:52.799 INFO VariantRecalibrator - Built for Spark Version: 2.4.5
15:55:52.799 INFO VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:55:52.799 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:55:52.799 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:55:52.799 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:55:52.799 INFO VariantRecalibrator - Deflater: IntelDeflater
15:55:52.799 INFO VariantRecalibrator - Inflater: IntelInflater
15:55:52.799 INFO VariantRecalibrator - GCS max retries/reopens: 20
15:55:52.799 INFO VariantRecalibrator - Requester pays: disabled
15:55:52.799 INFO VariantRecalibrator - Initializing engine
15:55:54.309 INFO FeatureManager - Using codec VCFCodec to read file gs://gcp-public-data--broad-references/hg38/v0/hapmap_3.3.hg38.vcf.gz
15:55:56.132 INFO FeatureManager - Using codec VCFCodec to read file gs://gcp-public-data--broad-references/hg38/v0/1000G_omni2.5.hg38.vcf.gz
15:55:58.182 INFO FeatureManager - Using codec VCFCodec to read file gs://gcp-public-data--broad-references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz
15:56:00.610 INFO FeatureManager - Using codec VCFCodec to read file gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.gz
15:56:02.051 INFO VariantRecalibrator - Shutting down engine
[November 21, 2022 3:56:02 PM GMT] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 0.15 minutes.
Runtime.totalMemory()=64827162624
Exception in thread "main" java.lang.AssertionError: java.net.URISyntaxException: Illegal character in hostname at index 9: gs://daly_schema2_gnomad_subset_vds/rye-test/outputs/subset_50_2000-vqsr-ready.vcf.bgz
    at com.google.cloud.storage.contrib.nio.CloudStoragePath.toUri(CloudStoragePath.java:374)
    at org.broadinstitute.hellbender.exceptions.UserException$CouldNotReadInputFile.<init>(UserException.java:79)
    at org.broadinstitute.hellbender.utils.io.IOUtils.assertFileIsReadable(IOUtils.java:857)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getCodecForFeatureInput(FeatureDataSource.java:396)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:373)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
    at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:225)
    at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$initializeDrivingVariants$0(MultiVariantWalker.java:86)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
    at org.broadinstitute.hellbender.engine.MultiVariantWalker.initializeDrivingVariants(MultiVariantWalker.java:76)
    at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
    at org.broadinstitute.hellbender.engine.MultiVariantWalker.onStartup(MultiVariantWalker.java:49)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: java.net.URISyntaxException: Illegal character in hostname at index 9: gs://daly_schema2_gnomad_subset_vds/rye-test/outputs/subset_50_2000-vqsr-ready.vcf.bgz
    at java.net.URI$Parser.fail(URI.java:2848)
    at java.net.URI$Parser.parseHostname(URI.java:3387)
    at java.net.URI$Parser.parseServer(URI.java:3236)
    at java.net.URI$Parser.parseAuthority(URI.java:3155)
    at java.net.URI$Parser.parseHierarchical(URI.java:3097)
    at java.net.URI$Parser.parse(URI.java:3053)
    at java.net.URI.<init>(URI.java:673)
    at java.net.URI.<init>(URI.java:774)
    at com.google.cloud.storage.contrib.nio.CloudStoragePath.toUri(CloudStoragePath.java:372)
    ... 20 more
Using GATK jar /gatk/gatk-package-4.2.6.1-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms63g -XX:+UseParallelGC -XX:ParallelGCThreads=3 -jar /gatk/gatk-package-4.2.6.1-local.jar VariantRecalibrator -V gs://daly_schema2_gnomad_subset_vds/rye-test/outputs/subset_50_2000-vqsr-ready.vcf.bgz -O /io/batch/54f332/VQSR__SNPsVariantRecalibratorScattered-CkEY1/recalibration --tranches-file /io/batch/54f332/VQSR__SNPsVariantRecalibratorScattered-CkEY1/tranches --trust-all-polymorphic -tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.8 -tranche 99.6 -tranche 99.5 -tranche 99.4 -tranche 99.3 -tranche 99.0 -tranche 98.0 -tranche 97.0 -tranche 90.0 -an AS_QD -an AS_MQRankSum -an AS_ReadPosRankSum -an AS_FS -an AS_MQ -mode SNP --max-gaussians 6 -resource:hapmap,known=false,training=true,truth=true,prior=15 gs://gcp-public-data--broad-references/hg38/v0/hapmap_3.3.hg38.vcf.gz -resource:omni,known=false,training=true,truth=true,prior=12 gs://gcp-public-data--broad-references/hg38/v0/1000G_omni2.5.hg38.vcf.gz -resource:1000G,known=false,training=true,truth=false,prior=10 gs://gcp-public-data--broad-references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz -resource:dbsnp,known=true,training=false,truth=false,prior=7 gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.gz --use-allele-specific-annotations
Hi Bob Ye Lindo Nkambule,
Thanks for writing in to the forum with this issue! Hopefully we can help get it resolved for you.
You're right that the underscores are causing the issue. Underscores are not allowed in URI host names, and the bucket name in gs://daly_schema2_gnomad_subset_vds is parsed as the URI host. The "index 9" in the error message is the position of the first underscore in that path. GATK follows the URI spec here; you can read more about the restriction at https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names.
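It's a legal Google Cloud bucket name, but it causes problems once the bucket is referenced in a URI. Ideally, you can contact the owner of the bucket and ask them to move the data to a bucket whose name uses hyphens instead of underscores. Note that Cloud Storage buckets cannot be renamed in place, so this means creating a new bucket and copying the objects over.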
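To make the rule concrete, here is a minimal Python sketch of the hostname check. It is not GATK or Java code, and it only approximates the parser that raised the error: it accepts ASCII letters, digits, hyphens, and dots in the host and ignores the finer rules about where hyphens may appear. The function names first_illegal_host_index and hyphenated_bucket_uri are hypothetical, and the hyphenated bucket it suggests does not exist unless someone creates it.

```python
from urllib.parse import urlsplit


def first_illegal_host_index(uri):
    """Return the index in `uri` of the first character that is not allowed
    in a hostname, or None if the host part looks valid.

    Simplified rule (an assumption of this sketch): only ASCII letters,
    digits, '-' and '.' are accepted.
    """
    host = urlsplit(uri).netloc
    host_start = uri.index("//") + 2  # the host begins right after "scheme://"
    for offset, ch in enumerate(host):
        if not (ch.isascii() and (ch.isalnum() or ch in "-.")):
            return host_start + offset
    return None


def hyphenated_bucket_uri(uri):
    """Return the same URI with underscores in the bucket (host) part
    replaced by hyphens. The object path is left untouched."""
    parts = urlsplit(uri)
    return parts._replace(netloc=parts.netloc.replace("_", "-")).geturl()


failing = "gs://daly_schema2_gnomad_subset_vds/rye-test/outputs/subset_50_2000-vqsr-ready.vcf.bgz"
working = "gs://gcp-public-data--broad-references/hg38/v0/hapmap_3.3.hg38.vcf.gz"

print(first_illegal_host_index(failing))  # 9, matching the error message
print(first_illegal_host_index(working))  # None
print(hyphenated_bucket_uri(failing))
```

Underscores in the object path (for example subset_50_2000) are fine; only the bucket portion is treated as a host name, which is why the public reference paths in the same command were read without trouble.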
Let me know if you have any other questions about this.
Best,
Genevieve