Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Error initializing feature reader for path somatic-hg38_1000g_pon.hg38.vcf.gz

Answered

8 comments

  • Bhanu Gandham

    Hi Tim, you are using a very old version of GATK, which we no longer support. Could you try running with the latest version, 4.1.9.0?

  • Tim Bishop

    Hi Bhanu,

    I ran the same code with version 4.1.9.0 and received the same error message.

  • Bhanu Gandham

    Can you please post the new error log?

  • Tim Bishop

    Using GATK jar /opt/applications/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /opt/applications/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar Mutect2 -R /gpfs/home/michaelerb/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa -I /gpfs/home/tbishop/fastq/20200805_MOLM13_ER_WES/bamFolder/MOLM13_2R_pe.sorted.bam -I /gpfs/home/tbishop/fastq/20200805_MOLM13_ER_WES/bamFolder/MOLM13_WT_pe.sorted.bam -intervals /gpfs/home/tbishop/fastq/20200805_MOLM13_ER_WES/chr_int/chr2_int.bed -tumor MOLM13_2R -normal MOLM13_WT --panel-of-normals /gpfs/home/tbishop/fastq/20200805_MOLM13_ER_WES/somatic-hg38_1000g_pon.hg38.vcf.gz -O /gpfs/home/tbishop/fastq/20200805_MOLM13_ER_WES/MOLM13_2R_sommut_chr2.vcf.gz -bamout /gpfs/home/tbishop/fastq/20200805_MOLM13_ER_WES/MOLM13_2R_sommut_chr2.bam
    10:02:09.935 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/applications/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Feb 12, 2021 10:02:10 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    10:02:10.542 INFO Mutect2 - ------------------------------------------------------------
    10:02:10.542 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.1.9.0
    10:02:10.542 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
    10:02:10.543 INFO Mutect2 - Executing as tbishop@emb0725.cluster.net on Linux v3.10.0-1127.10.1.el7.x86_64 amd64
    10:02:10.543 INFO Mutect2 - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_261-b12
    10:02:10.543 INFO Mutect2 - Start Date/Time: February 12, 2021 10:02:09 AM PST
    10:02:10.543 INFO Mutect2 - ------------------------------------------------------------
    10:02:10.543 INFO Mutect2 - ------------------------------------------------------------
    10:02:10.544 INFO Mutect2 - HTSJDK Version: 2.23.0
    10:02:10.544 INFO Mutect2 - Picard Version: 2.23.3
    10:02:10.544 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    10:02:10.544 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    10:02:10.544 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    10:02:10.544 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    10:02:10.544 INFO Mutect2 - Deflater: IntelDeflater
    10:02:10.544 INFO Mutect2 - Inflater: IntelInflater
    10:02:10.544 INFO Mutect2 - GCS max retries/reopens: 20
    10:02:10.545 INFO Mutect2 - Requester pays: disabled
    10:02:10.545 INFO Mutect2 - Initializing engine
    10:02:11.356 INFO FeatureManager - Using codec VCFCodec to read file file:///gpfs/home/tbishop/fastq/20200805_MOLM13_ER_WES/somatic-hg38_1000g_pon.hg38.vcf.gz
    10:02:11.635 INFO Mutect2 - Shutting down engine
    [February 12, 2021 10:02:11 AM PST] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.03 minutes.
    Runtime.totalMemory()=2041511936
    org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path /gpfs/home/tbishop/fastq/20200805_MOLM13_ER_WES/somatic-hg38_1000g_pon.hg38.vcf.gz
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:383)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:335)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:282)
    at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:246)
    at org.broadinstitute.hellbender.engine.FeatureManager.initializeFeatureSources(FeatureManager.java:209)
    at org.broadinstitute.hellbender.engine.FeatureManager.<init>(FeatureManager.java:156)
    at org.broadinstitute.hellbender.engine.GATKTool.initializeFeatures(GATKTool.java:488)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:709)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:79)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
    Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: /gpfs/home/tbishop/fastq/20200805_MOLM13_ER_WES/somatic-hg38_1000g_pon.hg38.vcf.gz has invalid uncompressedLength: -2141253336, for input source: /gpfs/home/tbishop/fastq/20200805_MOLM13_ER_WES/somatic-hg38_1000g_pon.hg38.vcf.gz
    at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:97)
    at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:82)
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:380)
    ... 14 more
    Caused by: htsjdk.samtools.util.RuntimeIOException: /gpfs/home/tbishop/fastq/20200805_MOLM13_ER_WES/somatic-hg38_1000g_pon.hg38.vcf.gz has invalid uncompressedLength: -2141253336
    at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:543)
    at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
    at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
    at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
    at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
    at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:331)
    at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:257)
    at htsjdk.tribble.readers.PositionalBufferedStream.fill(PositionalBufferedStream.java:132)
    at htsjdk.tribble.readers.PositionalBufferedStream.read(PositionalBufferedStream.java:84)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at htsjdk.tribble.readers.LongLineBufferedReader.fill(LongLineBufferedReader.java:140)
    at htsjdk.tribble.readers.LongLineBufferedReader.readLine(LongLineBufferedReader.java:300)
    at htsjdk.tribble.readers.LongLineBufferedReader.readLine(LongLineBufferedReader.java:356)
    at htsjdk.tribble.readers.SynchronousLineReader.readLine(SynchronousLineReader.java:51)
    at htsjdk.tribble.readers.LineIteratorImpl.advance(LineIteratorImpl.java:24)
    at htsjdk.tribble.readers.LineIteratorImpl.advance(LineIteratorImpl.java:11)
    at htsjdk.samtools.util.AbstractIterator.hasNext(AbstractIterator.java:44)
    at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:89)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
    at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:95)
    ... 17 more

  • Bhanu Gandham

    Tim Bishop

    It looks like the somatic-hg38_1000g_pon.hg38.vcf.gz file is malformed. How was this file generated? Can you run ValidateVariants, following the directions provided here, to help identify the cause of the error? Can you also post the header of the VCF?
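
The "invalid uncompressedLength" message in the log above is consistent with a .vcf.gz file that was compressed with ordinary gzip rather than bgzip, because the tabix reader expects BGZF blocks. One quick way to tell the two apart is to inspect the first bytes of the file. The following is a minimal sketch, assuming a POSIX shell with head, od, and tr available; the function name gz_flavor is made up for illustration and is not part of GATK or htslib. It only looks at the first block and assumes the "BC" extra subfield comes first, which is how bgzip writes it.

```shell
# Classify a file as BGZF, plain gzip, or neither by inspecting its first 14 bytes.
# A BGZF block starts with: 1f 8b (gzip magic), 08 (deflate), 04 (FEXTRA flag),
# six bytes of MTIME/XFL/OS, two bytes of XLEN, then the extra subfield ID "BC" (42 43).
# gz_flavor is a hypothetical helper name used only for this sketch.
gz_flavor() {
    # Dump the first 14 bytes as one continuous lowercase hex string.
    header=$(head -c 14 "$1" | od -An -v -tx1 | tr -d ' \n')
    case "$header" in
        1f8b0804????????????????4243) echo "bgzf" ;;
        1f8b*)                        echo "gzip" ;;
        *)                            echo "not-gzip" ;;
    esac
}
```

For example, `gz_flavor somatic-hg38_1000g_pon.hg38.vcf.gz` printing `gzip` would suggest the file needs to be recompressed with bgzip before GATK can read it through its tabix index.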

  • elhadi iich

    Running against the same issue. Does anyone have a solution yet?

  • elhadi iich

    I believe I have found the solution.

    First, I did what Field did: I renamed the file to vcf.gz and then extracted the VCF file. Next, I installed 'tabix' in order to recompress the VCF file using bgzip instead of gzip:

    $ sudo apt install tabix
    $ bgzip somatic-hg38_1000g_pon.hg38.vcf

    Finally, I used bcftools to re-index the compressed VCF file using the -t argument:

    $ bcftools index -t somatic-hg38_1000g_pon.hg38.vcf.gz

    I am currently running Mutect2. Hopefully this will work.
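
The steps described in this comment can be collected into one small function. This is only a sketch under stated assumptions: gunzip, bgzip, and tabix are on the PATH, the function name recompress_as_bgzf is made up for illustration, and `tabix -p vcf` is swapped in for `bcftools index -t` (both write a .tbi index) so that only the tools from the tabix package are needed.

```shell
# Re-compress a gzip-compressed VCF as BGZF and build a tabix (.tbi) index for it.
# recompress_as_bgzf is a hypothetical helper name; gunzip, bgzip, and tabix are
# assumed to be installed. tabix -p vcf stands in for the bcftools index -t step.
recompress_as_bgzf() {
    vcf_gz=$1                 # e.g. somatic-hg38_1000g_pon.hg38.vcf.gz
    vcf=${vcf_gz%.gz}         # same path with the .gz suffix stripped

    # Decompress to a plain VCF; the original file is left in place for now.
    gunzip -c "$vcf_gz" > "$vcf" || return 1

    # Compress with bgzip; -f lets it overwrite the old gzip copy at "$vcf.gz".
    bgzip -f "$vcf" || return 1

    # Index the BGZF file; this writes "$vcf_gz.tbi" next to it.
    tabix -f -p vcf "$vcf_gz"
}
```

Calling `recompress_as_bgzf somatic-hg38_1000g_pon.hg38.vcf.gz` replaces the file in place, so keeping a backup copy of the original download first is a reasonable precaution.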

  • Genevieve Brandt (she/her)

    Thanks for posting your solution, elhadi iich!
