A USER ERROR has occurred: af-only-gnomad.hg38.vcf.gz because no suitable codecs found
I am trying to run the recently published Mutect2 joint calling in GATK4 and I get the following error.
If you are seeing an error, please provide (REQUIRED):
a) GATK version used: gatk4
b) Exact command used:
gatk Mutect2 -R reference/hg38.fa -I R-HT72-p0_merged.sorted.bam -I R-HT72-p1_merged.sorted.bam -I R-HT72-P2_merged.sorted.bam -I R-HT77-p0_merged.sorted.bam -I R-HT77-p1_merged.sorted.bam -I R-HT77-p2_merged.sorted.bam -I R-HT72-at-diagnosis-FFPE_merged.sorted.bam -I R-HT77_merged.sorted.bam -I R-HT72_merged.sorted.bam -normal R-HT72_merged.sorted.bam -normal R-HT77_merged.sorted.bam --germline-resource af-only-gnomad.hg38.vcf.gz --panel-of-normals 1000g_pon.hg38.vcf.gz -O somatic.vcf.gz
c) Entire error log:
Using GATK jar /geode2/soft/hps/rhel7/gatk/4.1.7.0/gatk-package-4.1.7.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /geode2/soft/hps/rhel7/gatk/4.1.7.0/gatk-package-4.1.7.0-local.jar Mutect2 -R ../../../reference/hg38.fa -I /N/project/phi_nygcpdx/WGS_results/Sentieon_VariantCalling/align/Merged_BAMs/R-HT72-p0_merged.sorted.bam -I /N/project/phi_nygcpdx/WGS_results/Sentieon_VariantCalling/align/Merged_BAMs/R-HT72-p1_merged.sorted.bam -I /N/project/phi_nygcpdx/WGS_results/Sentieon_VariantCalling/align/Merged_BAMs/R-HT72-P2_merged.sorted.bam -I /N/project/phi_nygcpdx/WGS_results/Sentieon_VariantCalling/align/Merged_BAMs/R-HT77-p0_merged.sorted.bam -I /N/project/phi_nygcpdx/WGS_results/Sentieon_VariantCalling/align/Merged_BAMs/R-HT77-p1_merged.sorted.bam -I /N/project/phi_nygcpdx/WGS_results/Sentieon_VariantCalling/align/Merged_BAMs/R-HT77-p2_merged.sorted.bam -I /N/project/phi_nygcpdx/WGS_results/Sentieon_VariantCalling/align/Merged_BAMs/R-HT77-p3_merged.sorted_reheadered.bam -I /N/project/phi_nygcpdx/WGS_results/Sentieon_VariantCalling/align/Merged_BAMs/HT77-P3-Cell-line_merged.sorted.bam -I /N/project/phi_nygcpdx/WGS_results/Sentieon_VariantCalling/align/Merged_BAMs/R-HT72-at-diagnosis-FFPE_merged.sorted.bam -I /N/project/phi_nygcpdx/WGS_results/Sentieon_VariantCalling/align/Merged_BAMs/R-HT77_merged.sorted.bam -I /N/project/phi_nygcpdx/WGS_results/Sentieon_VariantCalling/align/Merged_BAMs/R-HT72_merged.sorted.bam -normal /N/project/phi_nygcpdx/WGS_results/Sentieon_VariantCalling/align/Merged_BAMs/R-HT72_merged.sorted.bam -normal /N/project/phi_nygcpdx/WGS_results/Sentieon_VariantCalling/align/Merged_BAMs/R-HT77_merged.sorted.bam --germline-resource /N/project/phi_nygcpdx/PON/af-only-gnomad.hg38.vcf.gz --panel-of-normals /N/project/phi_nygcpdx/PON/1000g_pon.hg38.vcf.gz -O somatic.vcf.gz
19:02:57.035 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/geode2/soft/hps/rhel7/gatk/4.1.7.0/gatk-package-4.1.7.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Oct 01, 2020 7:02:57 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
19:02:57.200 INFO Mutect2 - ------------------------------------------------------------
19:02:57.201 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.1.7.0
19:02:57.201 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
19:02:57.201 INFO Mutect2 - Executing as asjannu@som8.carbonate.uits.iu.edu on Linux v3.10.0-1127.19.1.el7.x86_64 amd64
19:02:57.201 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_262-b10
19:02:57.201 INFO Mutect2 - Start Date/Time: October 1, 2020 7:02:57 PM EDT
19:02:57.201 INFO Mutect2 - ------------------------------------------------------------
19:02:57.201 INFO Mutect2 - ------------------------------------------------------------
19:02:57.202 INFO Mutect2 - HTSJDK Version: 2.21.2
19:02:57.202 INFO Mutect2 - Picard Version: 2.21.9
19:02:57.202 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
19:02:57.202 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
19:02:57.202 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
19:02:57.202 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
19:02:57.202 INFO Mutect2 - Deflater: IntelDeflater
19:02:57.202 INFO Mutect2 - Inflater: IntelInflater
19:02:57.202 INFO Mutect2 - GCS max retries/reopens: 20
19:02:57.202 INFO Mutect2 - Requester pays: disabled
19:02:57.202 INFO Mutect2 - Initializing engine
19:02:57.655 INFO Mutect2 - Shutting down engine
[October 1, 2020 7:02:57 PM EDT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2296905728
***********************************************************************
A USER ERROR has occurred: Cannot read file:///N/project/phi_nygcpdx/PON/1000g_pon.hg38.vcf.gz because no suitable codecs found
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
I have the tbi index files for both PON vcf and gnomad vcf. The indexes are in the same folder as the vcfs. Is anything wrong with my command?
-
Hi Genevieve,
I tried what Field -Ye Tian did, but that alone wasn't enough to solve the problem. I believe I have found the solution.
First I did what Field did: I renamed the file to .vcf.gz and then extracted the VCF file. Next, I installed 'tabix' in order to recompress the VCF file using bgzip instead of gzip:
$ sudo apt install tabix
$ bgzip somatic-hg38_1000g_pon.hg38.vcf
Finally, I used bcftools to re-index the compressed VCF file using the -t argument:
$ bcftools index -t somatic-hg38_1000g_pon.hg38.vcf.gz
I am currently running Mutect2. Hopefully this will work.
-
Asha, the file may be corrupted or may have been overwritten in a different format. Try re-downloading it and see if it works.
-
Hi Genevieve,
I ran into a rather similar problem running Mutect2.
I found from other threads that the files 1000g_pon.hg38.vcf.gz and af-only-gnomad.hg38.vcf.gz are not strictly required but will help.
I have downloaded both files from the link
First, I see nothing but meaningless characters when I open them in Excel.
Second, I got the following error message:
Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: /home/field/shared/GATK_files/somatic-hg38_1000g_pon.hg38.vcf
I wonder if I've downloaded an encrypted version of the files? If so, what is the proper link to download them?
Thank you so much.
-
Hi Field -Ye Tian, did you unzip the 1000g_pon.hg38.vcf.gz file before trying to view it? Files ending in .gz are compressed and are not readable with Excel.
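A file's extension does not always match its contents, so one quick check is to look at the first bytes of the download. The sketch below is a minimal example, not an official GATK check: detect_compression is a hypothetical helper name, and it assumes the BGZF 'BC' extra subfield comes first in the gzip header, which is how bgzip writes it.

```shell
# Classify a file by its leading bytes: bgzf, gzip, or other.
# detect_compression is a hypothetical helper, not part of GATK or htslib.
detect_compression() {
    # First 16 bytes as one lowercase hex string with no separators.
    hex=$(od -An -v -tx1 -N 16 "$1" | tr -d ' \n')
    case "$hex" in
        # gzip magic, deflate, FEXTRA flag set, then 'BC' at bytes 12-13.
        1f8b0804????????????????4243*) echo "bgzf" ;;
        # gzip magic only: ordinary gzip, not block-compressed.
        1f8b*) echo "gzip" ;;
        # Plain text or some other format.
        *) echo "other" ;;
    esac
}
```

For example, if `detect_compression somatic-hg38_1000g_pon.hg38.vcf` prints gzip or bgzf, the file is still compressed despite its .vcf name, which would be consistent with the unreadable characters described above.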
-
Hi Genevieve,
For some weird reason, I saw the file listed under the name "1000g_pon.hg38.vcf.gz" but when I downloaded it, I automatically got the file "somatic-hg38_1000g_pon.hg38.vcf".
A similar thing happened when I downloaded "af-only-gnomad.hg38.vcf.gz".
My apologies for forgetting to mention this.
I will also ask a few friends to check this. Would you please take a look as well?
Thank you very much.
Field
-
Hi Genevieve,
The problem I posted can be solved by changing the downloaded file's suffix to .vcf.gz and then unzipping it.
However, I have since encountered another issue, namely:
Input files reference and features have incompatible contigs: No overlapping contigs found.
That would be a separate problem.
Best.
-
Field -Ye Tian great, thank you for the update and glad you were able to solve the issue!
You can post the separate problem in a different post for support, though I believe that same issue has been solved on the forum before, so please search the forum and see if the solution already exists.
-
Hi all,
I am running into similar issues using the somatic-hg38_1000g_pon.hg38.vcf.gz file.
Using GATK jar /home/svu/phaei/.conda/miniconda/envs/biotools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/svu/phaei/.conda/miniconda/envs/biotools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar Mutect2 -R /hpctmp/biodata/igenomes/references/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa -I BAM_files/CHC1885_sorted_rmdup2.bam -I BAM_files/CHC1884_sorted_rmdup2.bam -normal 2850_N --germline-resource gnomad/somatic-hg38_af-only-gnomad.hg38.vcf.gz --panel-of-normals gnomad/somatic-hg38_1000g_pon.hg38.vcf.gz -O vcf_files/2850_somatic.vcf.gz
15:10:52.504 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/svu/phaei/.conda/miniconda/envs/biotools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 26, 2021 3:10:52 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
15:10:52.746 INFO Mutect2 - ------------------------------------------------------------
15:10:52.747 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.2.0.0
15:10:52.747 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
15:10:52.748 INFO Mutect2 - Executing as phaei@tiger2-c36.hpc.local on Linux v3.10.0-862.el7.x86_64 amd64
15:10:52.748 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_282-b08
15:10:52.748 INFO Mutect2 - Start Date/Time: April 26, 2021 3:10:52 PM SGT
15:10:52.749 INFO Mutect2 - ------------------------------------------------------------
15:10:52.749 INFO Mutect2 - ------------------------------------------------------------
15:10:52.750 INFO Mutect2 - HTSJDK Version: 2.24.0
15:10:52.750 INFO Mutect2 - Picard Version: 2.25.0
15:10:52.750 INFO Mutect2 - Built for Spark Version: 2.4.5
15:10:52.750 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:10:52.750 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:10:52.750 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:10:52.750 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:10:52.750 INFO Mutect2 - Deflater: IntelDeflater
15:10:52.750 INFO Mutect2 - Inflater: IntelInflater
15:10:52.751 INFO Mutect2 - GCS max retries/reopens: 20
15:10:52.751 INFO Mutect2 - Requester pays: disabled
15:10:52.751 INFO Mutect2 - Initializing engine
15:10:53.386 INFO FeatureManager - Using codec VCFCodec to read file file:///hpctmp/phaei/MUX11418/EXOME/gnomad/somatic-hg38_1000g_pon.hg38.vcf.gz
15:10:53.531 INFO Mutect2 - Shutting down engine
[April 26, 2021 3:10:53 PM SGT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=984088576
org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path gnomad/somatic-hg38_1000g_pon.hg38.vcf.gz
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:385)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:337)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:284)
at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:246)
at org.broadinstitute.hellbender.engine.FeatureManager.initializeFeatureSources(FeatureManager.java:209)
at org.broadinstitute.hellbender.engine.FeatureManager.<init>(FeatureManager.java:156)
at org.broadinstitute.hellbender.engine.GATKTool.initializeFeatures(GATKTool.java:486)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:707)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:79)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: /hpctmp/phaei/MUX11418/EXOME/gnomad/somatic-hg38_1000g_pon.hg38.vcf.gz has invalid uncompressedLength: -2141253336, for input source: gnomad/somatic-hg38_1000g_pon.hg38.vcf.gz
at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:97)
at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:82)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:382)
... 14 more
Caused by: htsjdk.samtools.util.RuntimeIOException: /hpctmp/phaei/MUX11418/EXOME/gnomad/somatic-hg38_1000g_pon.hg38.vcf.gz has invalid uncompressedLength: -2141253336
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:543)
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:331)
at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:257)
at htsjdk.tribble.readers.PositionalBufferedStream.fill(PositionalBufferedStream.java:132)
at htsjdk.tribble.readers.PositionalBufferedStream.read(PositionalBufferedStream.java:84)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at htsjdk.tribble.readers.LongLineBufferedReader.fill(LongLineBufferedReader.java:140)
at htsjdk.tribble.readers.LongLineBufferedReader.readLine(LongLineBufferedReader.java:300)
at htsjdk.tribble.readers.LongLineBufferedReader.readLine(LongLineBufferedReader.java:356)
at htsjdk.tribble.readers.SynchronousLineReader.readLine(SynchronousLineReader.java:51)
at htsjdk.tribble.readers.LineIteratorImpl.advance(LineIteratorImpl.java:24)
at htsjdk.tribble.readers.LineIteratorImpl.advance(LineIteratorImpl.java:11)
at htsjdk.samtools.util.AbstractIterator.hasNext(AbstractIterator.java:44)
at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:89)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:95)
... 17 more
It seems to be an issue with the VCF file. I ran ValidateVariants and got a similar error.
Any suggestions would be very much appreciated.
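An "invalid uncompressedLength" message may point to a damaged or incomplete download, although that is only one possible cause. The sketch below shows a way to test the file before handing it to GATK, using only gzip, tail and od. check_vcf_gz is a hypothetical helper name; the 28-byte constant is the empty end-of-file block that bgzip appends, as described in the SAM/BAM specification.

```shell
# Hex of the 28-byte empty BGZF block that bgzip writes as an end-of-file marker.
BGZF_EOF_HEX="1f8b08040000000000ff0600424302001b0003000000000000000000"

# check_vcf_gz is a hypothetical helper, not part of GATK or htslib.
# Prints "damaged", "no-bgzf-eof", or "ok".
check_vcf_gz() {
    # gzip -t decompresses every member without writing output;
    # a truncated or corrupted file fails this step.
    if ! gzip -t "$1" 2>/dev/null; then
        echo "damaged"
        return 1
    fi
    # Compare the last 28 bytes with the BGZF end-of-file marker.
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    if [ "$tail_hex" = "$BGZF_EOF_HEX" ]; then
        echo "ok"
    else
        echo "no-bgzf-eof"
    fi
}
```

A result of damaged suggests re-downloading the file. A result of no-bgzf-eof means the file decompresses cleanly but does not end like bgzip output, so it may be plain gzip and need recompressing with bgzip.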
-
Hi elhadi iich,
What troubleshooting steps have you tried so far?
It looks like Field -Ye Tian was able to get it to work by renaming the .vcf file to .vcf.gz after downloading.
I tried to replicate this issue on my own by downloading both the 1000g_pon.hg38.vcf.gz and 1000g_pon.hg38.vcf.gz.tbi files. I had to rename the downloaded file so that it ends in .gz (as Field -Ye Tian suggested) and rename the .tbi file to match the name of the 1000g_pon.hg38.vcf.gz file. Once I had done those two steps, ValidateVariants worked fine.
Let me know if there is something else going on.
Best,
Genevieve
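The index-renaming step above can be sketched as a small shell helper. This is only an illustration: match_index_name is a hypothetical name, and it assumes the index should sit next to the VCF as the VCF file name plus .tbi, which is where tabix-style readers look for it.

```shell
# Rename a tabix index so it sits next to the VCF as <vcf>.tbi.
# match_index_name is a hypothetical helper, not part of GATK.
# Usage: match_index_name <vcf path> <current index path>
match_index_name() {
    vcf="$1"
    index="$2"
    expected="${vcf}.tbi"
    # Only move the file when the name does not already match.
    if [ "$index" != "$expected" ]; then
        mv -- "$index" "$expected"
    fi
    # Print the final index path so callers can confirm it.
    echo "$expected"
}
```

After renaming, the file can be checked the same way as in the reply above, for example with `gatk ValidateVariants -V 1000g_pon.hg38.vcf.gz`.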