Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

"htsjdk.samtools.SAMFormatException: Invalid GZIP header" in VariantRecalibrator

Answered

  • Genevieve Brandt (she/her)

    Hello zhaol, thank you for running ValidateVariants already!

    I wonder whether this is an indexing problem. Could you try viewing the file with samtools view and then indexing it with samtools index?

    Also, you may want to give the file named test1.gz a proper extension, for example test1.vcf.gz.

  • zhaol

    Sorry, I don't quite understand. Does "view the file with samtools view then samtools index" mean that I need to run the steps that convert SAM to BAM and index the BAM?

    The extension may not be affecting the result, because the error remains when I use a proper extension. Still, not using proper extensions is a real problem in my files, so thanks for the reminder.

    Finally, I used just six samples to test my workflow with VariantRecalibrator, to make sure there were no errors in the workflow after VCF files had been generated for all of the samples. I don't know whether this information is helpful to you.

    Thanks for your reply!

  • Genevieve Brandt (she/her)

    Hi zhaol, I apologize for the confusion. I have some more questions:

    • Is this the entire stack trace, or is there more information? Could you post the stack trace as text?
    • Did you run ValidateVariants with both the zipped and unzipped versions?
    • Once everything is properly named and you have confirmed that the files do not have issues, re-index the file with IndexFeatureFile so that GATK is not using a different index left over from a previous step. Make sure to delete the old index files first.
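A minimal sketch of the last bullet's advice, assuming a local `.vcf.gz` whose tabix index sits next to it as `<name>.vcf.gz.tbi`. The helper names `index_status` and `remove_old_indexes`, and the set of suffixes treated as "old index files", are hypothetical choices for illustration, not part of GATK. The sketch treats an index that is older than its data file as stale; the replacement index would still be built with IndexFeatureFile afterwards.

```python
from pathlib import Path

# Hypothetical choice of which sibling files count as "old index files".
INDEX_SUFFIXES = (".tbi", ".csi", ".idx")


def index_status(vcf_gz):
    """Classify the .tbi index next to a .vcf.gz as 'missing', 'stale' or 'ok'.

    'stale' means the index is older than the data file (by modification time),
    a common sign that the VCF was rewritten after it was indexed.
    """
    data = Path(vcf_gz)
    tbi = data.with_name(data.name + ".tbi")
    if not tbi.exists():
        return "missing"
    if tbi.stat().st_mtime < data.stat().st_mtime:
        return "stale"
    return "ok"


def remove_old_indexes(vcf_gz):
    """Delete sibling index files so they can be rebuilt from scratch.

    Returns the names of the files that were removed.
    """
    data = Path(vcf_gz)
    removed = []
    for suffix in INDEX_SUFFIXES:
        idx = data.with_name(data.name + suffix)
        if idx.exists():
            idx.unlink()
            removed.append(idx.name)
    return removed
```

Running `index_status` over every `.vcf.gz` passed to a tool (the main input and each resource) gives a quick list of files whose index should be rebuilt. A modification-time check is only a heuristic: it cannot catch an index that is newer than its file but was built from a different version of it.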
  • zhaol

    Using GATK jar /home/zhaol/biosoft/gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/zhaol/biosoft/gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar VariantRecalibrator -R /data2/zhaol/ref/hg38.fasta -V /data1/zhaol/work/pca_test/f1_result/f1.HC.vcf.gz --resource:hapmap,known=false,training=true,truth=true,prior=15.0 /data2/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz --resource:omni,known=false,training=true,truth=false,prior=12.0 /data2/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz --resource:1000G,known=false,training=true,truth=false,prior=10.0 /data2/zhaol/annotation/1000G_phase1.snps.high_confidence.hg38.vcf.gz --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /data2/zhaol/annotation/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz -an QD -an DP -an MQankSum -an ReadPosRankSum -an FS -an SOR -mode SNP -O /data1/zhaol/work/pca_test/f1_result/output.recal --tranches-file /data1/zhaol/work/pca_test/f1_result/output.tranches --rscript-file /data1/zhaol/work/pca_test/f1_result/output.plots.R
    08:12:35.946 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/zhaol/biosoft/gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Oct 11, 2020 8:12:46 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    08:12:46.767 INFO VariantRecalibrator - ------------------------------------------------------------
    08:12:46.767 INFO VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.1.8.0
    08:12:46.767 INFO VariantRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
    08:12:51.781 INFO VariantRecalibrator - Executing as zhaol@node01 on Linux v3.10.0-693.5.2.el7.x86_64 amd64
    08:12:51.781 INFO VariantRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_151-b12
    08:12:51.782 INFO VariantRecalibrator - Start Date/Time: October 11, 2020 8:12:35 AM EDT
    08:12:51.782 INFO VariantRecalibrator - ------------------------------------------------------------
    08:12:51.782 INFO VariantRecalibrator - ------------------------------------------------------------
    08:12:51.782 INFO VariantRecalibrator - HTSJDK Version: 2.22.0
    08:12:51.782 INFO VariantRecalibrator - Picard Version: 2.22.8
    08:12:51.782 INFO VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    08:12:51.783 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    08:12:51.783 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    08:12:51.783 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    08:12:51.783 INFO VariantRecalibrator - Deflater: IntelDeflater
    08:12:51.783 INFO VariantRecalibrator - Inflater: IntelInflater
    08:12:51.783 INFO VariantRecalibrator - GCS max retries/reopens: 20
    08:12:51.783 INFO VariantRecalibrator - Requester pays: disabled
    08:12:51.783 INFO VariantRecalibrator - Initializing engine
    08:12:52.508 INFO FeatureManager - Using codec VCFCodec to read file file:///data2/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz
    08:12:52.893 INFO FeatureManager - Using codec VCFCodec to read file file:///data2/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz
    08:12:53.154 INFO FeatureManager - Using codec VCFCodec to read file file:///data2/zhaol/annotation/1000G_phase1.snps.high_confidence.hg38.vcf.gz
    08:12:53.377 INFO FeatureManager - Using codec VCFCodec to read file file:///data2/zhaol/annotation/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz
    08:12:53.675 INFO FeatureManager - Using codec VCFCodec to read file file:///data1/zhaol/work/pca_test/f1_result/f1.HC.vcf.gz
    08:12:54.327 INFO VariantRecalibrator - Done initializing engine
    08:12:54.353 INFO TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q15.0
    08:12:54.353 INFO TrainingSet - Found omni track: Known = false Training = true Truth = false Prior = Q12.0
    08:12:54.353 INFO TrainingSet - Found 1000G track: Known = false Training = true Truth = false Prior = Q10.0
    08:12:54.354 INFO TrainingSet - Found dbsnp track: Known = true Training = false Truth = false Prior = Q2.0
    08:12:54.379 WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "recal". Defaulting to VCF.
    08:12:54.442 INFO ProgressMeter - Starting traversal
    08:12:54.442 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    08:12:55.778 INFO VariantRecalibrator - Shutting down engine
    [October 11, 2020 8:12:55 AM EDT] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 0.33 minutes.
    Runtime.totalMemory()=3858759680
    htsjdk.samtools.SAMFormatException: Invalid GZIP header
    at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:121)
    at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
    at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:550)
    at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
    at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
    at htsjdk.samtools.util.BlockCompressedInputStream.seek(BlockCompressedInputStream.java:380)
    at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:427)
    at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:46)
    at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:170)
    at htsjdk.tribble.TabixFeatureReader$FeatureIterator.<init>(TabixFeatureReader.java:159)
    at htsjdk.tribble.TabixFeatureReader.query(TabixFeatureReader.java:133)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.refillQueryCache(FeatureDataSource.java:567)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.queryAndPrefetch(FeatureDataSource.java:536)
    at org.broadinstitute.hellbender.engine.FeatureManager.getFeatures(FeatureManager.java:352)
    at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:173)
    at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:125)
    at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:240)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantDataManager.parseTrainingSets(VariantDataManager.java:392)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.addDatum(VariantRecalibrator.java:614)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.addVariantDatum(VariantRecalibrator.java:571)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.lambda$consumeQueuedVariants$0(VariantRecalibrator.java:542)
    at java.util.ArrayList.forEach(ArrayList.java:1255)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.consumeQueuedVariants(VariantRecalibrator.java:542)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.apply(VariantRecalibrator.java:521)
    at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$traverse$1(MultiVariantWalker.java:120)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.MultiVariantWalker.traverse(MultiVariantWalker.java:118)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)

    This is the output from one of my VariantRecalibrator runs, so it differs from the screenshot, but the error message is the same except for the file name.

    So far I have only run ValidateVariants on the zipped file; I will try running it on the unzipped file too. I will also try re-indexing the file with IndexFeatureFile.

    Thanks!

  • zhaol

    When I run ValidateVariants on the zipped and unzipped versions, both give the same result: A USER ERROR has occurred: Input f1.hc.g.vcf.genotyped.gz fails strict validation of type ALL: one or more of the ALT allele(s) for the record at position chr3:138889149 are not observed at all in the sample genotypes

    I then re-indexed the input file for VariantRecalibrator, but the error remains: Invalid GZIP header.

    After running HaplotypeCaller with the -L parameter and combining the VCF files with GatherVcfs, may I index the gathered VCF with tabix?

  • zhaol

    Hi, Genevieve Brandt.

    Thank you for your help. I just solved the error. As you mentioned above, the problem was my index: I had used tabix to index the VCF files from HaplotypeCaller, but they should have been indexed with IndexFeatureFile.

    Thank you very much!
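Whichever tool produced the index, it can help to confirm that a `.tbi` file really is a tabix index. The tabix format is itself BGZF-compressed, and its decompressed content begins with the magic bytes `TBI\1`. The sketch below (hypothetical helper `looks_like_tabix_index`) checks only that, using Python's `gzip` module, which can read BGZF because BGZF is valid multi-member gzip. It cannot tell whether the offsets stored in the index still match the `.vcf.gz`, which is what matters for the error in this thread.

```python
import gzip
import zlib

TABIX_MAGIC = b"TBI\x01"  # first four bytes of a decompressed .tbi file


def looks_like_tabix_index(path):
    """Return True if `path` decompresses to data starting with the tabix magic.

    Checks only the file type; it does NOT verify that the offsets in the
    index still match the .vcf.gz it was built from.
    """
    try:
        with gzip.open(path, "rb") as fh:
            return fh.read(4) == TABIX_MAGIC
    except (OSError, EOFError, zlib.error):
        # Not gzip/BGZF at all, truncated, or corrupt compressed data.
        return False
```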

  • Genevieve Brandt (she/her)

    Hi zhaol, thank you for the update, and glad you fixed the problem! Thank you as well for posting the solution for other users.

  • zhaol

    Hi Genevieve Brandt!

    When I reran this workflow, I encountered the same error, htsjdk.samtools.SAMFormatException: Invalid GZIP header, again while running VariantRecalibrator. I thought I had solved this error a month ago, but this time I got it again. Maybe I never really solved it?

    - I ran ValidateVariants on the zipped and unzipped VCF files output by HaplotypeCaller and GenotypeGVCFs, and it did not report any errors.

    - I ran IndexFeatureFile to index the VCF files output by HaplotypeCaller and GenotypeGVCFs, and still got the error: Invalid GZIP header.

    My stack trace is:

    [zhaol@node04 387]$ gatk VariantRecalibrator -R /home/zhaol/ref/hg38.fasta -V 387.g.vcf.gz --resource:hapmap,known=false,training=true,truth=true,prior=15.0 /home/zhaol/annotation/hapmap_3.3.hg38.vcf.gz --resource:omni,known=false,training=true,truth=false,prior=12.0 /home/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz --resource:1000G,known=false,training=true,truth=false,prior=10.0 /home/zhaol/annotation/1000G_phase1.snps.high_confidence.hg38.vcf.gz --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /home/zhaol/annotation/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz -an QD -an DP -an MQankSum -an ReadPosRankSum -an FS -an SOR -mode SNP -O output.recal --tranches-file output.tranches --rscript-file output.plots.R
    Using GATK jar /home/zhaol/biosoft/gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/zhaol/biosoft/gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar VariantRecalibrator -R /home/zhaol/ref/hg38.fasta -V 387.g.vcf.gz --resource:hapmap,known=false,training=true,truth=true,prior=15.0 /home/zhaol/annotation/hapmap_3.3.hg38.vcf.gz --resource:omni,known=false,training=true,truth=false,prior=12.0 /home/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz --resource:1000G,known=false,training=true,truth=false,prior=10.0 /home/zhaol/annotation/1000G_phase1.snps.high_confidence.hg38.vcf.gz --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /home/zhaol/annotation/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz -an QD -an DP -an MQankSum -an ReadPosRankSum -an FS -an SOR -mode SNP -O output.recal --tranches-file output.tranches --rscript-file output.plots.R
    16:13:09.591 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/zhaol/biosoft/gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Nov 11, 2020 4:13:09 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    16:13:09.975 INFO VariantRecalibrator - ------------------------------------------------------------
    16:13:09.975 INFO VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.1.8.0
    16:13:09.975 INFO VariantRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
    16:13:09.975 INFO VariantRecalibrator - Executing as zhaol@node04 on Linux v3.10.0-862.el7.x86_64 amd64
    16:13:09.975 INFO VariantRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_212-b04
    16:13:09.975 INFO VariantRecalibrator - Start Date/Time: November 11, 2020 4:13:09 PM
    16:13:09.975 INFO VariantRecalibrator - ------------------------------------------------------------
    16:13:09.975 INFO VariantRecalibrator - ------------------------------------------------------------
    16:13:09.976 INFO VariantRecalibrator - HTSJDK Version: 2.22.0
    16:13:09.976 INFO VariantRecalibrator - Picard Version: 2.22.8
    16:13:09.976 INFO VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    16:13:09.976 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    16:13:09.976 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    16:13:09.976 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    16:13:09.976 INFO VariantRecalibrator - Deflater: IntelDeflater
    16:13:09.976 INFO VariantRecalibrator - Inflater: IntelInflater
    16:13:09.976 INFO VariantRecalibrator - GCS max retries/reopens: 20
    16:13:09.976 INFO VariantRecalibrator - Requester pays: disabled
    16:13:09.977 INFO VariantRecalibrator - Initializing engine
    16:13:10.635 INFO FeatureManager - Using codec VCFCodec to read file file:///home/zhaol/annotation/hapmap_3.3.hg38.vcf.gz
    16:13:11.125 INFO FeatureManager - Using codec VCFCodec to read file file:///home/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz
    16:13:11.511 INFO FeatureManager - Using codec VCFCodec to read file file:///home/zhaol/annotation/1000G_phase1.snps.high_confidence.hg38.vcf.gz
    16:13:11.873 INFO FeatureManager - Using codec VCFCodec to read file file:///home/zhaol/annotation/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz
    16:13:12.000 INFO FeatureManager - Using codec VCFCodec to read file file:///home/zhaol/work/387/387.g.vcf.gz
    16:13:12.248 INFO VariantRecalibrator - Done initializing engine
    16:13:12.252 INFO TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q15.0
    16:13:12.252 INFO TrainingSet - Found omni track: Known = false Training = true Truth = false Prior = Q12.0
    16:13:12.252 INFO TrainingSet - Found 1000G track: Known = false Training = true Truth = false Prior = Q10.0
    16:13:12.252 INFO TrainingSet - Found dbsnp track: Known = true Training = false Truth = false Prior = Q2.0
    16:13:12.260 WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "recal". Defaulting to VCF.
    16:13:12.285 INFO ProgressMeter - Starting traversal
    16:13:12.285 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    16:13:12.746 INFO VariantRecalibrator - Shutting down engine
    [November 11, 2020 4:13:12 PM] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 0.05 minutes.
    Runtime.totalMemory()=2164260864
    htsjdk.samtools.SAMFormatException: Invalid GZIP header
    at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:121)
    at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
    at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:550)
    at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
    at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
    at htsjdk.samtools.util.BlockCompressedInputStream.seek(BlockCompressedInputStream.java:380)
    at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:427)
    at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:46)
    at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:170)
    at htsjdk.tribble.TabixFeatureReader$FeatureIterator.<init>(TabixFeatureReader.java:159)
    at htsjdk.tribble.TabixFeatureReader.query(TabixFeatureReader.java:133)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.refillQueryCache(FeatureDataSource.java:567)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.queryAndPrefetch(FeatureDataSource.java:536)
    at org.broadinstitute.hellbender.engine.FeatureManager.getFeatures(FeatureManager.java:352)
    at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:173)
    at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:125)
    at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:240)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantDataManager.parseTrainingSets(VariantDataManager.java:392)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.addDatum(VariantRecalibrator.java:614)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.addVariantDatum(VariantRecalibrator.java:571)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.lambda$consumeQueuedVariants$0(VariantRecalibrator.java:542)
    at java.util.ArrayList.forEach(ArrayList.java:1257)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.consumeQueuedVariants(VariantRecalibrator.java:542)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.apply(VariantRecalibrator.java:521)
    at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$traverse$1(MultiVariantWalker.java:120)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.MultiVariantWalker.traverse(MultiVariantWalker.java:118)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)

  • zhaol

    I don't know if this information is helpful to you.

    I ran VQSR directly on the VCF output by HaplotypeCaller, and no errors occurred. So I guess the error is caused by GenotypeGVCFs?

    I re-indexed the VCF file output by GenotypeGVCFs, and the error remained. Then I decompressed the VCF file, recompressed it with bgzip, and re-indexed it with IndexFeatureFile, and the error still remained.

    Thank you!
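A quick way to tell a bgzip-compressed file from a plain gzip one: both start with the gzip magic bytes, but every BGZF block also carries an extra `BC` subfield holding the block size, and a complete BGZF file ends with a fixed 28-byte empty block. The sketch below checks both properties. The function names are hypothetical, and it assumes `BC` is the only extra subfield, which is what bgzip and htsjdk write. In both stack traces above, the exception is raised from `BlockCompressedInputStream.seek`, called by `TabixReader` during a query, that is, while jumping to an offset read from an index. A file can therefore pass these checks and still trigger the error if its index, or the index of another input, points at the wrong offsets.

```python
import os
import struct

# The standard 28-byte BGZF end-of-file marker: an empty BGZF block.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff"   # gzip magic, CM=8, FLG=FEXTRA, MTIME, XFL, OS
    "0600" "4243" "0200" "1b00"    # XLEN=6, 'B','C' subfield, SLEN=2, BSIZE=27
    "0300"                         # empty deflate stream
    "00000000" "00000000"          # CRC32=0, ISIZE=0
)


def is_bgzf(path):
    """True if the file starts with a BGZF block header (gzip + 'BC' extra field).

    Simplification: assumes 'BC' is the only extra subfield.
    """
    with open(path, "rb") as fh:
        head = fh.read(18)
    if len(head) < 18:
        return False
    xlen, = struct.unpack_from("<H", head, 10)
    slen, = struct.unpack_from("<H", head, 14)
    return (head[:4] == b"\x1f\x8b\x08\x04"   # gzip magic, deflate, FEXTRA set
            and xlen == 6
            and head[12:14] == b"BC"
            and slen == 2)


def has_bgzf_eof(path):
    """True if the file ends with the 28-byte BGZF EOF marker."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        if size < len(BGZF_EOF):
            return False
        fh.seek(size - len(BGZF_EOF))
        return fh.read() == BGZF_EOF
```

A file written by plain `gzip` fails `is_bgzf`, while a file recompressed with bgzip, as described above, should pass both checks.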

  • Genevieve Brandt (she/her)

    zhaol, did you delete the old index files?

  • zhaol

    Yes, I first deleted the old index files and then re-indexed with IndexFeatureFile. But as I mentioned above, the error remains.

  • Genevieve Brandt (she/her)

    Hi zhaol, could you run the GATK tool PrintBGZFBlockInformation and paste the output here to find more information about the issue?
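PrintBGZFBlockInformation reports, for each BGZF block, its file offset, compressed size and uncompressed size, and whether the final empty terminator block is present. The Python sketch below illustrates the same kind of walk; it is not the tool's implementation. It reads each 18-byte block header, derives the total block size from the BSIZE field (BSIZE + 1), reads the uncompressed size (ISIZE) from the last four bytes of the block, and raises at the first block that does not begin with a BGZF header. The function names are hypothetical, and its block numbering starts at 0, which may differ from the tool's.

```python
import struct

BGZF_HEADER_LEN = 18  # fixed header length when 'BC' is the only extra subfield


def iter_bgzf_blocks(path):
    """Yield (block_number, file_offset, compressed_size, uncompressed_size).

    Streams one block at a time, so it also works on very large files.
    Raises ValueError at the first block that is not a valid BGZF block.
    """
    with open(path, "rb") as fh:
        number, offset = 0, 0
        while True:
            header = fh.read(BGZF_HEADER_LEN)
            if not header:
                return  # clean end of file
            if (len(header) < BGZF_HEADER_LEN
                    or header[:4] != b"\x1f\x8b\x08\x04"
                    or header[12:14] != b"BC"):
                raise ValueError(f"invalid BGZF header at offset {offset} (block #{number})")
            # BSIZE stores the total block size minus one.
            block_size = struct.unpack_from("<H", header, 16)[0] + 1
            if block_size < BGZF_HEADER_LEN + 8:  # must hold header + CRC32 + ISIZE
                raise ValueError(f"implausible block size at offset {offset}")
            rest = fh.read(block_size - BGZF_HEADER_LEN)
            if len(rest) != block_size - BGZF_HEADER_LEN:
                raise ValueError(f"truncated block at offset {offset} (block #{number})")
            # ISIZE (uncompressed length) is the last 4 bytes of the block.
            uncompressed_size = struct.unpack_from("<I", rest, len(rest) - 4)[0]
            yield number, offset, block_size, uncompressed_size
            number += 1
            offset += block_size


def ends_with_terminator(path):
    """True if the last block is the 28-byte empty BGZF terminator."""
    last = None
    for last in iter_bgzf_blocks(path):
        pass
    return last is not None and last[2:] == (28, 0)
```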

  • zhaol

    Hi Genevieve Brandt, the output is too long for this comment, so I have pasted part of the output file here.

    For my vcf.gz created by HaplotypeCaller:

    Block #742905 at file offset 9660186303
    - compressed size: 19375
    - uncompressed size: 65498

    Block #742906 at file offset 9660205678
    - compressed size: 20054
    - uncompressed size: 65498

    Block #742907 at file offset 9660225732
    - compressed size: 20320
    - uncompressed size: 65498

    Block #742908 at file offset 9660246052
    - compressed size: 20181
    - uncompressed size: 65498

    Block #742909 at file offset 9660266233
    - compressed size: 17637
    - uncompressed size: 65498

    Block #742910 at file offset 9660283870
    - compressed size: 17759
    - uncompressed size: 65498

    Block #742911 at file offset 9660301629
    - compressed size: 4676
    - uncompressed size: 14985

    Block #742912 at file offset 9660306305
    - compressed size: 3436
    - uncompressed size: 12779

    Block #742913 at file offset 9660309741
    - compressed size: 28
    - uncompressed size: 0

    ***************************************************************************
    Final BGZF 0-byte terminator block FOUND as expected at block number 742913
    ***************************************************************************

     

     

    For my vcf.gz created by GenotypeGVCFs:

    Block #15747 at file offset 257598249
    - compressed size: 20430
    - uncompressed size: 65498

    Block #15748 at file offset 257618679
    - compressed size: 20369
    - uncompressed size: 65498

    Block #15749 at file offset 257639048
    - compressed size: 19019
    - uncompressed size: 65498

    Block #15750 at file offset 257658067
    - compressed size: 19407
    - uncompressed size: 65498

    Block #15751 at file offset 257677474
    - compressed size: 19218
    - uncompressed size: 65498

    Block #15752 at file offset 257696692
    - compressed size: 19834
    - uncompressed size: 65498

    Block #15753 at file offset 257716526
    - compressed size: 19648
    - uncompressed size: 65498

    Block #15754 at file offset 257736174
    - compressed size: 19754
    - uncompressed size: 65498

    Block #15755 at file offset 257755928
    - compressed size: 19961
    - uncompressed size: 65498

    Block #15756 at file offset 257775889
    - compressed size: 13389
    - uncompressed size: 43439

    Block #15757 at file offset 257789278
    - compressed size: 28
    - uncompressed size: 0

    ***************************************************************************
    Final BGZF 0-byte terminator block FOUND as expected at block number 15757
    ***************************************************************************

     

    I hope this information will help you.

    Thank you.
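The two excerpts can be checked by hand. In a BGZF file each block begins where the previous one ended, so the offset of block n+1 should equal the offset of block n plus its compressed size (for example 9660186303 + 19375 = 9660205678), and the last block should be the 28-byte block with an uncompressed size of 0. The sketch below automates that check on pasted output; `check_block_listing` is a hypothetical helper, and its regular expression assumes exactly the line format shown above.

```python
import re

# Assumes the line format of the excerpts above:
#   Block #N at file offset X
#   - compressed size: C
#   - uncompressed size: U
BLOCK_RE = re.compile(
    r"Block #(\d+) at file offset (\d+)\s*"
    r"-\s*compressed size: (\d+)\s*"
    r"-\s*uncompressed size: (\d+)"
)


def check_block_listing(text):
    """Parse a pasted block listing and return (blocks, problems)."""
    blocks = [tuple(int(g) for g in m.groups()) for m in BLOCK_RE.finditer(text)]
    problems = []
    for (n1, off1, csize1, _), (n2, off2, _, _) in zip(blocks, blocks[1:]):
        if n2 != n1 + 1:
            problems.append(f"block #{n2} does not directly follow #{n1}")
        elif off2 != off1 + csize1:
            # Each block should start exactly where the previous one ended.
            problems.append(f"block #{n2} starts at {off2}, expected {off1 + csize1}")
    if blocks and blocks[-1][2:] != (28, 0):
        problems.append("last block is not the 28-byte empty terminator")
    return blocks, problems
```

Applied to the two excerpts above, every offset equals the previous offset plus its compressed size, and both listings end with the 28-byte terminator, so the tail of each data file looks structurally intact in the region shown.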

  • Genevieve Brandt (she/her)

    Hi zhaol,

    I spoke with one of the developers about this issue, and we believe it may be coming from one of your VCF side inputs (the resource files), since VariantRecalibrator has multiple VCF inputs (for example, /data2/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz). The stack trace does not specify which file has the GZIP header issue, so it could be any of them. To solve the issue, try these steps:

    1. Re-generate the indexes for all of the .vcf.gz files that are input to VariantRecalibrator.
    2. If step 1 does not solve the problem, re-run VariantRecalibrator, removing one VCF input at a time. The run in which the error message disappears tells you which file is the problem. You can then download that file again and generate a new index.
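Step 2 amounts to a leave-one-out search over the resource inputs. The Python sketch below only builds the argument lists and does not run GATK; the helper `leave_one_out` and the `(name, tags, path)` tuple layout are assumptions for illustration, and the paths and resource tags are copied from the second command line in this thread. Each list could be passed to `subprocess.run`. One caveat: hapmap is the only resource tagged truth=true here, so the run without it may fail for an unrelated reason (VariantRecalibrator expects a truth set) and may not tell you anything about the GZIP error.

```python
import shlex


def leave_one_out(base_args, resources):
    """Yield (omitted_name, argv) pairs, one per resource.

    base_args: the VariantRecalibrator command without any --resource options.
    resources: list of (name, tags, path) tuples (a layout assumed for this sketch).
    """
    for skip, (skip_name, _, _) in enumerate(resources):
        argv = list(base_args)  # fresh copy, so base_args is never modified
        for i, (name, tags, path) in enumerate(resources):
            if i != skip:
                argv += [f"--resource:{name},{tags}", path]
        yield skip_name, argv


if __name__ == "__main__":
    # Paths copied from the second command line in this thread;
    # --rscript-file is omitted for brevity.
    base = ["gatk", "VariantRecalibrator", "-R", "/home/zhaol/ref/hg38.fasta",
            "-V", "387.g.vcf.gz", "-mode", "SNP", "-O", "output.recal",
            "--tranches-file", "output.tranches"]
    for ann in ["QD", "DP", "MQRankSum", "ReadPosRankSum", "FS", "SOR"]:
        base += ["-an", ann]  # standard SNP annotations for VQSR
    annotation = "/home/zhaol/annotation/"
    resources = [
        ("hapmap", "known=false,training=true,truth=true,prior=15.0", annotation + "hapmap_3.3.hg38.vcf.gz"),
        ("omni", "known=false,training=true,truth=false,prior=12.0", annotation + "1000G_omni2.5.hg38.vcf.gz"),
        ("1000G", "known=false,training=true,truth=false,prior=10.0", annotation + "1000G_phase1.snps.high_confidence.hg38.vcf.gz"),
        ("dbsnp", "known=true,training=false,truth=false,prior=2.0", annotation + "resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz"),
    ]
    for omitted, argv in leave_one_out(base, resources):
        print(f"# without {omitted}")
        print(" ".join(shlex.quote(a) for a in argv))
```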
  • zhaol

    Hi, Genevieve Brandt.

    I regenerated the indexes for all of the vcf.gz files that are input to VariantRecalibrator, and the issue has been solved.

    Many thanks to you and the developer.

  • Genevieve Brandt (she/her)

    Great news, zhaol, glad it has been solved!

  • Aziz

    Hi Genevieve,

    I want to generate VCF files from several BAM files, and I got this error. Could you please let me know how to fix this problem?

    htsjdk.samtools.SAMFormatException: Invalid GZIP header

    Thanks,

    Aziz

  • Genevieve Brandt (she/her)

    Hi Aziz,

    Please try the steps outlined in the thread above this message and see if that solves your problem. If it does not solve the issue, please provide more details so that I know more about your issue.

    Best,

    Genevieve
