Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

A USER ERROR has occurred: af-only-gnomad.hg38.vcf.gz because no suitable codecs found

Answered
9 comments

  • Genevieve Brandt (she/her)

    Asha, the file may be corrupted or may have been overwritten in a different format. Try re-downloading it and see if it works.

  • Field -Ye Tian

    Hi Genevieve, 

    I ran into a very similar problem running Mutect2.

    I found from other threads that the two files

    1000g_pon.hg38.vcf.gz and

    af-only-gnomad.hg38.vcf.gz

    are not strictly required but will help.

    I downloaded both files from this link:

    https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-hg38;tab=objects?prefix=&forceOnObjectsSortingFiltering=false

    First, I see nothing but meaningless characters when I open them in Excel.

    Second, I got the following error message:

    Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: /home/field/shared/GATK_files/somatic-hg38_1000g_pon.hg38.vcf

    I wonder whether I downloaded an encrypted version of the files. If so, what is the proper link to download them from?

     

    Thank you so much.

     

  • Genevieve Brandt (she/her)

    Hi Field -Ye Tian, did you unzip the 1000g_pon.hg38.vcf.gz file before trying to view it? Files ending in .gz are compressed and cannot be read in Excel.
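
One way to check whether a downloaded file is gzip-compressed, regardless of what its name says, is to look at its first two bytes (gzip streams start with 0x1f 0x8b) and then read the first few header lines accordingly. The following is a minimal Python sketch, not part of GATK; the helper names `is_gzip` and `read_header_lines` are hypothetical.

```python
import gzip


def is_gzip(path):
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


def read_header_lines(path, limit=5):
    """Return up to `limit` leading lines, decompressing only if needed."""
    opener = gzip.open if is_gzip(path) else open
    lines = []
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            lines.append(line.rstrip("\n"))
            if len(lines) >= limit:
                break
    return lines
```

For a readable VCF, the first returned line would normally start with `##fileformat=VCF`. If `is_gzip` returns True for a file whose name ends in `.vcf`, the file is still compressed even though the extension suggests plain text.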

  • Field -Ye Tian

    Hi Genevieve,

    For some weird reason, the file was listed under the name "1000g_pon.hg38.vcf.gz", but when I downloaded it, I automatically got a file named "somatic-hg38_1000g_pon.hg38.vcf".

    The same thing happened when I downloaded "af-only-gnomad.hg38.vcf.gz".

    My apologies for forgetting to mention this.

    I will also ask a few friends to check this. Would you please take a look as well?

    Thank you very much.

    Field

  • Field -Ye Tian

    Hi Genevieve,

    The problem I posted can be solved by changing the downloaded file's suffix to .vcf.gz and then unzipping it.

    However, I have now run into another issue, namely:

    Input files reference and features have incompatible contigs: No overlapping contigs found.

    That would be a separate problem.

    Best.
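
The "No overlapping contigs found" message indicates that none of the contig names in the resource file match those in the reference. A frequent cause is a naming mismatch such as `chr1` versus `1`, though that is an assumption here because the thread does not show either header. The Python sketch below compares contig names taken from the `##contig=<ID=...>` lines of a VCF header with those in the first column of a FASTA `.fai` index; all function names are hypothetical, and a VCF without `##contig` lines would simply yield an empty list.

```python
import re

# Matches the ID field of a VCF "##contig=<ID=...>" header line.
CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def vcf_contigs(header_lines):
    """Collect contig IDs from the ##contig header lines of a VCF."""
    names = []
    for line in header_lines:
        match = CONTIG_ID.match(line)
        if match:
            names.append(match.group(1))
    return names


def fai_contigs(fai_lines):
    """Collect contig names from the first tab-separated column of a .fai index."""
    return [line.split("\t", 1)[0] for line in fai_lines if line.strip()]


def shared_contigs(reference_names, feature_names):
    """Return the names present in both lists, in reference order."""
    feature_set = set(feature_names)
    return [name for name in reference_names if name in feature_set]
```

An empty result from `shared_contigs` would be consistent with the error above and suggests comparing the two naming schemes by eye.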

     

  • Genevieve Brandt (she/her)

    Field -Ye Tian, great, thank you for the update, and glad you were able to solve the issue!

    You can post the separate problem in a new post for support. I believe that same issue has been solved on the forum before, though, so please search the forum first to see if a solution already exists.

  • elhadi iich

    Hi all,

    I am running into a similar issue when using the somatic-hg38_1000g_pon.hg38.vcf.gz file.

    Using GATK jar /home/svu/phaei/.conda/miniconda/envs/biotools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/svu/phaei/.conda/miniconda/envs/biotools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar Mutect2 -R /hpctmp/biodata/igenomes/references/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa -I BAM_files/CHC1885_sorted_rmdup2.bam -I BAM_files/CHC1884_sorted_rmdup2.bam -normal 2850_N --germline-resource gnomad/somatic-hg38_af-only-gnomad.hg38.vcf.gz --panel-of-normals gnomad/somatic-hg38_1000g_pon.hg38.vcf.gz -O vcf_files/2850_somatic.vcf.gz
    15:10:52.504 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/svu/phaei/.conda/miniconda/envs/biotools/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Apr 26, 2021 3:10:52 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    15:10:52.746 INFO Mutect2 - ------------------------------------------------------------
    15:10:52.747 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.2.0.0
    15:10:52.747 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
    15:10:52.748 INFO Mutect2 - Executing as phaei@tiger2-c36.hpc.local on Linux v3.10.0-862.el7.x86_64 amd64
    15:10:52.748 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_282-b08
    15:10:52.748 INFO Mutect2 - Start Date/Time: April 26, 2021 3:10:52 PM SGT
    15:10:52.749 INFO Mutect2 - ------------------------------------------------------------
    15:10:52.749 INFO Mutect2 - ------------------------------------------------------------
    15:10:52.750 INFO Mutect2 - HTSJDK Version: 2.24.0
    15:10:52.750 INFO Mutect2 - Picard Version: 2.25.0
    15:10:52.750 INFO Mutect2 - Built for Spark Version: 2.4.5
    15:10:52.750 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    15:10:52.750 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    15:10:52.750 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    15:10:52.750 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    15:10:52.750 INFO Mutect2 - Deflater: IntelDeflater
    15:10:52.750 INFO Mutect2 - Inflater: IntelInflater
    15:10:52.751 INFO Mutect2 - GCS max retries/reopens: 20
    15:10:52.751 INFO Mutect2 - Requester pays: disabled
    15:10:52.751 INFO Mutect2 - Initializing engine
    15:10:53.386 INFO FeatureManager - Using codec VCFCodec to read file file:///hpctmp/phaei/MUX11418/EXOME/gnomad/somatic-hg38_1000g_pon.hg38.vcf.gz
    15:10:53.531 INFO Mutect2 - Shutting down engine
    [April 26, 2021 3:10:53 PM SGT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.02 minutes.
    Runtime.totalMemory()=984088576
    org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path gnomad/somatic-hg38_1000g_pon.hg38.vcf.gz
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:385)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:337)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:284)
    at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:246)
    at org.broadinstitute.hellbender.engine.FeatureManager.initializeFeatureSources(FeatureManager.java:209)
    at org.broadinstitute.hellbender.engine.FeatureManager.<init>(FeatureManager.java:156)
    at org.broadinstitute.hellbender.engine.GATKTool.initializeFeatures(GATKTool.java:486)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:707)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:79)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
    Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: /hpctmp/phaei/MUX11418/EXOME/gnomad/somatic-hg38_1000g_pon.hg38.vcf.gz has invalid uncompressedLength: -2141253336, for input source: gnomad/somatic-hg38_1000g_pon.hg38.vcf.gz
    at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:97)
    at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:82)
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:382)
    ... 14 more
    Caused by: htsjdk.samtools.util.RuntimeIOException: /hpctmp/phaei/MUX11418/EXOME/gnomad/somatic-hg38_1000g_pon.hg38.vcf.gz has invalid uncompressedLength: -2141253336
    at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:543)
    at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
    at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
    at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
    at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
    at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:331)
    at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:257)
    at htsjdk.tribble.readers.PositionalBufferedStream.fill(PositionalBufferedStream.java:132)
    at htsjdk.tribble.readers.PositionalBufferedStream.read(PositionalBufferedStream.java:84)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at htsjdk.tribble.readers.LongLineBufferedReader.fill(LongLineBufferedReader.java:140)
    at htsjdk.tribble.readers.LongLineBufferedReader.readLine(LongLineBufferedReader.java:300)
    at htsjdk.tribble.readers.LongLineBufferedReader.readLine(LongLineBufferedReader.java:356)
    at htsjdk.tribble.readers.SynchronousLineReader.readLine(SynchronousLineReader.java:51)
    at htsjdk.tribble.readers.LineIteratorImpl.advance(LineIteratorImpl.java:24)
    at htsjdk.tribble.readers.LineIteratorImpl.advance(LineIteratorImpl.java:11)
    at htsjdk.samtools.util.AbstractIterator.hasNext(AbstractIterator.java:44)
    at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:89)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
    at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:95)
    ... 17 more

    It seems to be an issue with the VCF file. I ran ValidateVariants and got a similar error.

    Any suggestions would be very much appreciated.

     

  • Genevieve Brandt (she/her)

    Hi elhadi iich,

    What troubleshooting steps have you tried so far?

    It looks like Field -Ye Tian was able to get it to work by renaming the .vcf file to .vcf.gz after downloading.

    I tried to replicate this issue myself by downloading both the 1000g_pon.hg38.vcf.gz and 1000g_pon.hg38.vcf.gz.tbi files. I had to rename the downloaded VCF so that it ends in .gz (as Field -Ye Tian suggested) and rename the .tbi file to match the name of the 1000g_pon.hg38.vcf.gz file. Once I had done those two steps, ValidateVariants worked fine.

    Let me know if there is something else going on.

    Best,

    Genevieve
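
The second renaming step matters because a tabix index is conventionally looked up next to the data file under the same name with `.tbi` appended. A small Python sketch of that check follows; the helper names `expected_tabix_index` and `has_matching_index` are hypothetical and are not GATK functions.

```python
from pathlib import Path


def expected_tabix_index(vcf_path):
    """Return the sibling path '<vcf file name>.tbi' for a compressed VCF."""
    path = Path(vcf_path)
    return path.with_name(path.name + ".tbi")


def has_matching_index(vcf_path):
    """Return True if the expected .tbi file exists next to the VCF."""
    return expected_tabix_index(vcf_path).is_file()
```

For example, `expected_tabix_index("1000g_pon.hg38.vcf.gz")` gives `1000g_pon.hg38.vcf.gz.tbi`, so an index still carrying a different prefix from the download would not be found by this check.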

  • elhadi iich

    Hi Genevieve,

    I tried what Field -Ye Tian did, but that alone wasn't enough to solve the problem. I believe I have found the solution.

    First, I did what Field did: I renamed the file to .vcf.gz and then extracted the VCF file.

    Next, I installed 'tabix' in order to recompress the VCF file using bgzip instead of gzip:

    $ sudo apt install tabix
    $ bgzip somatic-hg38_1000g_pon.hg38.vcf

    Finally, I used bcftools to re-index the compressed VCF file using the -t argument, which writes a tabix (.tbi) index:

    $ bcftools index -t somatic-hg38_1000g_pon.hg38.vcf.gz

    I am currently running Mutect2. Hopefully this will work.
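
The distinction between gzip and bgzip can be checked directly. bgzip writes BGZF, a series of gzip-compatible blocks whose header carries an extra subfield with the identifier bytes `B` and `C`, and tabix indexing relies on that block structure. The Python sketch below inspects only the first block of a file; `is_bgzf` is a hypothetical helper name, and whether a plain-gzip file explains the "invalid uncompressedLength" error in this thread is an assumption rather than something the log confirms.

```python
import struct


def is_bgzf(path):
    """Return True if the first gzip member carries the BGZF 'BC' extra subfield."""
    with open(path, "rb") as handle:
        header = handle.read(12)
        if len(header) < 12:
            return False
        magic1, magic2, method, flags = header[0], header[1], header[2], header[3]
        # Require gzip magic bytes, deflate compression, and the FEXTRA flag.
        if (magic1, magic2, method) != (0x1F, 0x8B, 8) or not flags & 0x04:
            return False
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = handle.read(xlen)
    # Walk the extra subfields looking for SI1='B', SI2='C', SLEN=2.
    offset = 0
    while offset + 4 <= len(extra):
        si1, si2, slen = struct.unpack("<BBH", extra[offset:offset + 4])
        if (si1, si2, slen) == (66, 67, 2):
            return True
        offset += 4 + slen
    return False
```

A file for which the gzip magic bytes are present but `is_bgzf` returns False was most likely written by plain gzip, in which case recompressing with bgzip and re-indexing, as described above, is a reasonable thing to try.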
