MergeVcfs issue: Illegal character in fragment at index 1: ##fileformat=VCFv4.2
GATK version: 4.2.5.0
Command used:
gatk --java-options "-Xmx128G" \
MergeVcfs \
I=${interval_list} \
O=${output_Mutect}
${interval_list} is a .list file containing the paths of all the interval files. All of the split VCFs were created by Mutect2 and belong to the same sample.
Running:
16:00:38.794 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/dssg/home/acct-medkwf/medkwf4/.conda/envs/dna/share/picard-2.27.4-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Aug 03 16:00:38 CST 2022] MergeVcfs INPUT=[/dssg/home/acct-medkwf/medkwf4/results/MRD/CC_data/CC-H029C/Muetct2_test/interval.list] OUTPUT=/dssg/home/acct-medkwf/medkwf4/results/MRD/CC_data/CC-H029C/Muetct2_test/CC-H029C.mutect2.vcf.gz VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=true CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Aug 03 16:00:38 CST 2022] Executing as medkwf4@node030.pi.sjtu.edu.cn on Linux 4.18.0-240.el8.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_152-release-1056-b12; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.27.4-SNAPSHOT
[Wed Aug 03 16:00:38 CST 2022] picard.vcf.MergeVcfs done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=514850816
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: Illegal character in fragment at index 1: ##fileformat=VCFv4.2
at java.net.URI.create(URI.java:852)
at htsjdk.samtools.util.IOUtil.getPath(IOUtil.java:1228)
at htsjdk.samtools.util.IOUtil.lambda$unrollPaths$1(IOUtil.java:1182)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at htsjdk.samtools.util.IOUtil.unrollPaths(IOUtil.java:1179)
at picard.vcf.MergeVcfs.doWork(MergeVcfs.java:171)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.net.URISyntaxException: Illegal character in fragment at index 1: ##fileformat=VCFv4.2
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parse(URI.java:3067)
at java.net.URI.<init>(URI.java:588)
at java.net.URI.create(URI.java:850)
... 18 more
To troubleshoot, I picked two specific VCF files from the list and reran the command:
picard MergeVcfs I=/dssg/home/acct-medkwf/medkwf4/results/MRD/CC_data/CC-H029C/Muetct2_test/CC-H029C.mutect2.vcf.gz.0001-scattered.interval I=/dssg/home/acct-medkwf/medkwf4/results/MRD/CC_data/CC-H029C/Muetct2_test/CC-H029C.mutect2.vcf.gz.0002-scattered.interval O=test.txt
Running:
INFO 2022-08-03 15:26:49 MergeVcfs
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
**********
https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
********** MergeVcfs -I /dssg/home/acct-medkwf/medkwf4/results/MRD/CC_data/CC-H029C/Muetct2_test/CC-H029C.mutect2.vcf.gz.0001-scattered.interval -I /dssg/home/acct-medkwf/medkwf4/results/MRD/CC_data/CC-H029C/Muetct2_test/CC-H029C.mutect2.vcf.gz.0002-scattered.interval -O test.txt
**********
15:26:50.431 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/dssg/home/acct-medkwf/medkwf4/.conda/envs/dna/share/picard-2.27.4-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Aug 03 15:26:50 CST 2022] MergeVcfs INPUT=[/dssg/home/acct-medkwf/medkwf4/results/MRD/CC_data/CC-H029C/Muetct2_test/CC-H029C.mutect2.vcf.gz.0001-scattered.interval, /dssg/home/acct-medkwf/medkwf4/results/MRD/CC_data/CC-H029C/Muetct2_test/CC-H029C.mutect2.vcf.gz.0002-scattered.interval] OUTPUT=test.txt VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=true CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Aug 03 15:26:50 CST 2022] Executing as medkwf4@sylogin1.pi.sjtu.edu.cn on Linux 4.18.0-240.el8.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_152-release-1056-b12; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.27.4-SNAPSHOT
[Wed Aug 03 15:26:50 CST 2022] picard.vcf.MergeVcfs done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=514850816
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: Illegal character in fragment at index 1: ##fileformat=VCFv4.2
at java.net.URI.create(URI.java:852)
at htsjdk.samtools.util.IOUtil.getPath(IOUtil.java:1228)
at htsjdk.samtools.util.IOUtil.lambda$unrollPaths$1(IOUtil.java:1182)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at htsjdk.samtools.util.IOUtil.unrollPaths(IOUtil.java:1179)
at picard.vcf.MergeVcfs.doWork(MergeVcfs.java:171)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.net.URISyntaxException: Illegal character in fragment at index 1: ##fileformat=VCFv4.2
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parse(URI.java:3067)
at java.net.URI.<init>(URI.java:588)
at java.net.URI.create(URI.java:850)
... 18 more
It looks like there is an issue with my VCF file header. Any idea how to solve this problem?
-
Hi Liyang Zhang,
Thank you for writing to the GATK forum! I hope that we can help you sort this out.
To start, it looks like the .list file contains multiple file names. One of the VCF paths in this interval list file likely contains a character that is not legal in a URI. I suggest searching the paths for any unusual characters. GATK seems to be choking on the file name rather than on the actual contents of the file.
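As a rough way to run that search, the sketch below walks a list file line by line and flags entries that look suspicious as input paths. `check_vcf_list` is a hypothetical helper, not part of GATK or Picard, and both the "safe" character set and the list of accepted extensions are assumptions chosen for illustration. The extension check is included because the string rejected in the stack trace, `##fileformat=VCFv4.2`, looks like the first line of a VCF rather than a file name. That may mean an input whose name does not end in a VCF extension is itself being read as a list of paths. This has not been confirmed in this thread.

```shell
# Hypothetical helper (not part of GATK or Picard): report entries in a
# list file that look suspicious as MergeVcfs input paths.
check_vcf_list() {
    local list_file=$1
    local line found=0
    # The "|| [ -n ... ]" part also handles a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip blank lines.
        [ -z "$line" ] && continue
        # Assumption: anything outside this conservative set is worth a look.
        case $line in
            *[!A-Za-z0-9_./-]*)
                printf 'ILLEGAL_CHAR\t%s\n' "$line"
                found=1
                continue ;;
        esac
        # Assumption: inputs are expected to end in a VCF-style extension.
        case $line in
            *.vcf|*.vcf.gz|*.bcf) ;;
            *)
                printf 'UNEXPECTED_EXTENSION\t%s\n' "$line"
                found=1
                continue ;;
        esac
        # Finally, check that the path points at an existing regular file.
        if [ ! -f "$line" ]; then
            printf 'MISSING\t%s\n' "$line"
            found=1
        fi
    done < "$list_file"
    return "$found"
}
```

Calling `check_vcf_list "${interval_list}"` prints one tab-separated line per flagged entry and returns a non-zero status if anything was flagged. An empty output only means that none of these three checks fired, not that the list is guaranteed to be accepted.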
I hope this helps! Please let me know what you find. Feel free to reach out with any further questions in the meantime.
Best,
Anthony
-
Hi Liyang, how did you resolve this? Thanks