"htsjdk.samtools.SAMFormatException: Invalid GZIP header" in VariantRecalibrator
I am getting this error when I run VariantRecalibrator (my GATK version is 4.1.8.0):
htsjdk.samtools.SAMFormatException: Invalid GZIP header
My Java version is 1.8.0_151.
I decompressed and recompressed my input VCF file, but the error remains.
To make sure my input VCF is not malformed, I ran ValidateVariants with only the format checks and got "Traversal complete. Processed 9596403 total variants in 2.7 minutes." So I guess my input VCF is not malformed.
I then ran ValidateVariants with all strict validations enabled and got this: Input 101.hc.vcf.gz fails strict validation of type ALL: one or more of the ALT allele(s) for the record at position chr1:****** (asterisks stand in for the actual position) are not observed at all in the sample genotypes.
So, is that the reason my GZIP header is invalid?
If not, how can I solve it?
Thanks!
-
Hello zhaol, thank you for already running ValidateVariants!
I am wondering if this is a samtools index problem. Could you try viewing the file with samtools view and then indexing it with samtools index?
Also, the file named test1.gz should be given a proper extension, for example test1.vcf.gz.
-
Sorry, I don't quite understand. Does "view the file with samtools view then samtools index" mean I need to run the steps that convert SAM to BAM and index the BAM?
The extension may not be affecting the result, because the error remains when I use a proper extension. Still, not using a proper extension was a real problem on my part, so thanks for the reminder.
Finally, I used just six samples to run VariantRecalibrator as a test of my workflow, to make sure it runs without errors once VCF files have been generated for all the samples. I don't know if this information is helpful to you.
Thanks for your reply!
-
Hi zhaol, I apologize for the confusion. I have some more questions:
- Is this the entire stack trace, or is there more information? Could you post the stack trace as text?
- Did you run ValidateVariants on both the zipped and unzipped versions?
- Once everything is properly named and you have confirmed that the files have no issues, re-index the file with IndexFeatureFile so that GATK is not using a leftover index from a previous step. Make sure to delete the old index files first.
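Finding leftover index files before re-indexing can be partly scripted. The sketch below is a minimal illustration, assuming each index sits next to its data file with the conventional `.tbi` (tabix) or `.idx` (Tribble) suffix; `find_stale_indexes` is a hypothetical helper, not a GATK tool. It only flags indexes whose modification time is older than their data file, so an index that is newer but was built from a different file will not be caught; deleting and regenerating the index remains the safe fix.

```python
import os
import sys
from pathlib import Path

# Conventional index suffixes next to a VCF (assumed naming):
# .tbi = tabix index for bgzipped files, .idx = Tribble index for plain VCFs.
INDEX_SUFFIXES = (".tbi", ".idx")


def find_stale_indexes(directory):
    """Return (data_file, index_file) pairs where the index is older than
    the data file, i.e. it was probably left over from an earlier run."""
    stale = []
    for data in sorted(Path(directory).iterdir()):
        if not data.name.endswith((".vcf", ".vcf.gz")):
            continue
        for suffix in INDEX_SUFFIXES:
            index = data.with_name(data.name + suffix)
            if index.exists() and os.path.getmtime(index) < os.path.getmtime(data):
                stale.append((data, index))
    return stale


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for data, index in find_stale_indexes(target):
        print(f"{index} is older than {data}: delete it and re-run IndexFeatureFile")
```

A file that is not reported here can still have a mismatched index; the timestamp test is only a quick first filter.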
-
Using GATK jar /home/zhaol/biosoft/gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/zhaol/biosoft/gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar VariantRecalibrator -R /data2/zhaol/ref/hg38.fasta -V /data1/zhaol/work/pca_test/f1_result/f1.HC.vcf.gz --resource:hapmap,known=false,training=true,truth=true,prior=15.0 /data2/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz --resource:omni,known=false,training=true,truth=false,prior=12.0 /data2/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz --resource:1000G,known=false,training=true,truth=false,prior=10.0 /data2/zhaol/annotation/1000G_phase1.snps.high_confidence.hg38.vcf.gz --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /data2/zhaol/annotation/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz -an QD -an DP -an MQankSum -an ReadPosRankSum -an FS -an SOR -mode SNP -O /data1/zhaol/work/pca_test/f1_result/output.recal --tranches-file /data1/zhaol/work/pca_test/f1_result/output.tranches --rscript-file /data1/zhaol/work/pca_test/f1_result/output.plots.R
08:12:35.946 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/zhaol/biosoft/gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Oct 11, 2020 8:12:46 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
08:12:46.767 INFO VariantRecalibrator - ------------------------------------------------------------
08:12:46.767 INFO VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.1.8.0
08:12:46.767 INFO VariantRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
08:12:51.781 INFO VariantRecalibrator - Executing as zhaol@node01 on Linux v3.10.0-693.5.2.el7.x86_64 amd64
08:12:51.781 INFO VariantRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_151-b12
08:12:51.782 INFO VariantRecalibrator - Start Date/Time: October 11, 2020 8:12:35 AM EDT
08:12:51.782 INFO VariantRecalibrator - ------------------------------------------------------------
08:12:51.782 INFO VariantRecalibrator - ------------------------------------------------------------
08:12:51.782 INFO VariantRecalibrator - HTSJDK Version: 2.22.0
08:12:51.782 INFO VariantRecalibrator - Picard Version: 2.22.8
08:12:51.782 INFO VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:12:51.783 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:12:51.783 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:12:51.783 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:12:51.783 INFO VariantRecalibrator - Deflater: IntelDeflater
08:12:51.783 INFO VariantRecalibrator - Inflater: IntelInflater
08:12:51.783 INFO VariantRecalibrator - GCS max retries/reopens: 20
08:12:51.783 INFO VariantRecalibrator - Requester pays: disabled
08:12:51.783 INFO VariantRecalibrator - Initializing engine
08:12:52.508 INFO FeatureManager - Using codec VCFCodec to read file file:///data2/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz
08:12:52.893 INFO FeatureManager - Using codec VCFCodec to read file file:///data2/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz
08:12:53.154 INFO FeatureManager - Using codec VCFCodec to read file file:///data2/zhaol/annotation/1000G_phase1.snps.high_confidence.hg38.vcf.gz
08:12:53.377 INFO FeatureManager - Using codec VCFCodec to read file file:///data2/zhaol/annotation/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz
08:12:53.675 INFO FeatureManager - Using codec VCFCodec to read file file:///data1/zhaol/work/pca_test/f1_result/f1.HC.vcf.gz
08:12:54.327 INFO VariantRecalibrator - Done initializing engine
08:12:54.353 INFO TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q15.0
08:12:54.353 INFO TrainingSet - Found omni track: Known = false Training = true Truth = false Prior = Q12.0
08:12:54.353 INFO TrainingSet - Found 1000G track: Known = false Training = true Truth = false Prior = Q10.0
08:12:54.354 INFO TrainingSet - Found dbsnp track: Known = true Training = false Truth = false Prior = Q2.0
08:12:54.379 WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "recal". Defaulting to VCF.
08:12:54.442 INFO ProgressMeter - Starting traversal
08:12:54.442 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
08:12:55.778 INFO VariantRecalibrator - Shutting down engine
[October 11, 2020 8:12:55 AM EDT] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 0.33 minutes.
Runtime.totalMemory()=3858759680
htsjdk.samtools.SAMFormatException: Invalid GZIP header
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:121)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:550)
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
at htsjdk.samtools.util.BlockCompressedInputStream.seek(BlockCompressedInputStream.java:380)
at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:427)
at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:46)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:170)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.<init>(TabixFeatureReader.java:159)
at htsjdk.tribble.TabixFeatureReader.query(TabixFeatureReader.java:133)
at org.broadinstitute.hellbender.engine.FeatureDataSource.refillQueryCache(FeatureDataSource.java:567)
at org.broadinstitute.hellbender.engine.FeatureDataSource.queryAndPrefetch(FeatureDataSource.java:536)
at org.broadinstitute.hellbender.engine.FeatureManager.getFeatures(FeatureManager.java:352)
at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:173)
at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:125)
at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:240)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantDataManager.parseTrainingSets(VariantDataManager.java:392)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.addDatum(VariantRecalibrator.java:614)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.addVariantDatum(VariantRecalibrator.java:571)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.lambda$consumeQueuedVariants$0(VariantRecalibrator.java:542)
at java.util.ArrayList.forEach(ArrayList.java:1255)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.consumeQueuedVariants(VariantRecalibrator.java:542)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.apply(VariantRecalibrator.java:521)
at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$traverse$1(MultiVariantWalker.java:120)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.MultiVariantWalker.traverse(MultiVariantWalker.java:118)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
This is the output of one of my VariantRecalibrator runs, so it differs from the picture, but the error message is the same except for the file name.
So far I have only run ValidateVariants on the zipped file; I will also try it on the unzipped file, and I will try re-indexing the file with IndexFeatureFile.
Thanks!
-
When I run ValidateVariants on both the zipped and unzipped versions, both give the same result: A USER ERROR has occurred: Input f1.hc.g.vcf.genotyped.gz fails strict validation of type ALL: one or more of the ALT allele(s) for the record at position chr3:138889149 are not observed at all in the sample genotypes
I then re-indexed the input file for VariantRecalibrator, but the error remains: Invalid GZIP header.
Could the problem be that, after running HaplotypeCaller with the -L parameter and merging the VCF files with GatherVcfs, I indexed the gathered VCF with tabix?
-
Hi, Genevieve Brandt.
Thank you for your help. I just solved the error. As you suggested above, the problem was with my index: I used tabix to index the VCF files from HaplotypeCaller, when they should have been indexed with IndexFeatureFile.
Thank you very much!
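Why a mismatched index produces this particular error: in both stack traces in this thread the exception is thrown from `BlockGunzipper.unzipBlock` while `BlockCompressedInputStream.seek` is being called from `TabixReader$IteratorImpl.next`, that is, while jumping to a position read from the index. Per the SAM/BGZF specification, a tabix index stores BGZF virtual offsets: 64-bit values whose upper 48 bits give the byte offset of a compressed block and whose lower 16 bits give a position inside that block's uncompressed data. If the index was built for a different version of the file, the compressed offset can land somewhere that is not a block start, and the bytes there fail the header check. The sketch below illustrates that reading only; the helper names are hypothetical, and it checks just the fixed BGZF header bytes, assuming the `BC` subfield comes first, as bgzip writes it.

```python
# Gzip magic (1f 8b), CM=8 (deflate), FLG=4 (FEXTRA): the fixed BGZF prefix.
BGZF_PREFIX = b"\x1f\x8b\x08\x04"


def make_virtual_offset(coffset, uoffset):
    """Pack a compressed-block offset and an in-block offset into one value."""
    return (coffset << 16) | uoffset


def split_virtual_offset(voffset):
    """Return (byte offset of the compressed block, offset in its uncompressed data)."""
    return voffset >> 16, voffset & 0xFFFF


def offset_hits_block_start(path, voffset):
    """True if the compressed part of `voffset` points at a BGZF block header.

    An index that does not match its data file can yield offsets for which
    this is False; reading there is where an "Invalid GZIP header" can arise."""
    coffset, _ = split_virtual_offset(voffset)
    with open(path, "rb") as fh:
        fh.seek(coffset)
        header = fh.read(18)  # a BGZF header is 18 bytes when BC is the only subfield
    return len(header) == 18 and header[:4] == BGZF_PREFIX and header[12:14] == b"BC"
```

Parsing the `.tbi` file itself to pull out every offset is left out here; the point is only what a valid offset must look like.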
-
Hi zhaol, thank you for the update, and glad you fixed the problem! Thank you as well for posting the solution for other users.
-
Hi Genevieve Brandt!
When I reran this workflow, I encountered a similar error, htsjdk.samtools.SAMFormatException: Invalid GZIP header (again when running VariantRecalibrator). I thought I had solved this error a month ago, but this time I got it again. Maybe I never actually solved it?
- I ran ValidateVariants on the zipped and unzipped VCF files output by HaplotypeCaller and GenotypeGVCFs, and it did not report an error.
- I ran IndexFeatureFile to index the VCF files output by HaplotypeCaller and GenotypeGVCFs, and still got the error: Invalid GZIP header.
My stack trace is:
[zhaol@node04 387]$ gatk VariantRecalibrator -R /home/zhaol/ref/hg38.fasta -V 387.g.vcf.gz --resource:hapmap,known=false,training=true,truth=true,prior=15.0 /home/zhaol/annotation/hapmap_3.3.hg38.vcf.gz --resource:omni,known=false,training=true,truth=false,prior=12.0 /home/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz --resource:1000G,known=false,training=true,truth=false,prior=10.0 /home/zhaol/annotation/1000G_phase1.snps.high_confidence.hg38.vcf.gz --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /home/zhaol/annotation/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz -an QD -an DP -an MQankSum -an ReadPosRankSum -an FS -an SOR -mode SNP -O output.recal --tranches-file output.tranches --rscript-file output.plots.R
Using GATK jar /home/zhaol/biosoft/gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/zhaol/biosoft/gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar VariantRecalibrator -R /home/zhaol/ref/hg38.fasta -V 387.g.vcf.gz --resource:hapmap,known=false,training=true,truth=true,prior=15.0 /home/zhaol/annotation/hapmap_3.3.hg38.vcf.gz --resource:omni,known=false,training=true,truth=false,prior=12.0 /home/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz --resource:1000G,known=false,training=true,truth=false,prior=10.0 /home/zhaol/annotation/1000G_phase1.snps.high_confidence.hg38.vcf.gz --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /home/zhaol/annotation/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz -an QD -an DP -an MQankSum -an ReadPosRankSum -an FS -an SOR -mode SNP -O output.recal --tranches-file output.tranches --rscript-file output.plots.R
16:13:09.591 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/zhaol/biosoft/gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Nov 11, 2020 4:13:09 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:13:09.975 INFO VariantRecalibrator - ------------------------------------------------------------
16:13:09.975 INFO VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.1.8.0
16:13:09.975 INFO VariantRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
16:13:09.975 INFO VariantRecalibrator - Executing as zhaol@node04 on Linux v3.10.0-862.el7.x86_64 amd64
16:13:09.975 INFO VariantRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_212-b04
16:13:09.975 INFO VariantRecalibrator - Start Date/Time: November 11, 2020 4:13:09 PM
16:13:09.975 INFO VariantRecalibrator - ------------------------------------------------------------
16:13:09.975 INFO VariantRecalibrator - ------------------------------------------------------------
16:13:09.976 INFO VariantRecalibrator - HTSJDK Version: 2.22.0
16:13:09.976 INFO VariantRecalibrator - Picard Version: 2.22.8
16:13:09.976 INFO VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:13:09.976 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:13:09.976 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:13:09.976 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:13:09.976 INFO VariantRecalibrator - Deflater: IntelDeflater
16:13:09.976 INFO VariantRecalibrator - Inflater: IntelInflater
16:13:09.976 INFO VariantRecalibrator - GCS max retries/reopens: 20
16:13:09.976 INFO VariantRecalibrator - Requester pays: disabled
16:13:09.977 INFO VariantRecalibrator - Initializing engine
16:13:10.635 INFO FeatureManager - Using codec VCFCodec to read file file:///home/zhaol/annotation/hapmap_3.3.hg38.vcf.gz
16:13:11.125 INFO FeatureManager - Using codec VCFCodec to read file file:///home/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz
16:13:11.511 INFO FeatureManager - Using codec VCFCodec to read file file:///home/zhaol/annotation/1000G_phase1.snps.high_confidence.hg38.vcf.gz
16:13:11.873 INFO FeatureManager - Using codec VCFCodec to read file file:///home/zhaol/annotation/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz
16:13:12.000 INFO FeatureManager - Using codec VCFCodec to read file file:///home/zhaol/work/387/387.g.vcf.gz
16:13:12.248 INFO VariantRecalibrator - Done initializing engine
16:13:12.252 INFO TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q15.0
16:13:12.252 INFO TrainingSet - Found omni track: Known = false Training = true Truth = false Prior = Q12.0
16:13:12.252 INFO TrainingSet - Found 1000G track: Known = false Training = true Truth = false Prior = Q10.0
16:13:12.252 INFO TrainingSet - Found dbsnp track: Known = true Training = false Truth = false Prior = Q2.0
16:13:12.260 WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "recal". Defaulting to VCF.
16:13:12.285 INFO ProgressMeter - Starting traversal
16:13:12.285 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
16:13:12.746 INFO VariantRecalibrator - Shutting down engine
[November 11, 2020 4:13:12 PM] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 0.05 minutes.
Runtime.totalMemory()=2164260864
htsjdk.samtools.SAMFormatException: Invalid GZIP header
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:121)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:550)
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
at htsjdk.samtools.util.BlockCompressedInputStream.seek(BlockCompressedInputStream.java:380)
at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:427)
at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:46)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:170)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.<init>(TabixFeatureReader.java:159)
at htsjdk.tribble.TabixFeatureReader.query(TabixFeatureReader.java:133)
at org.broadinstitute.hellbender.engine.FeatureDataSource.refillQueryCache(FeatureDataSource.java:567)
at org.broadinstitute.hellbender.engine.FeatureDataSource.queryAndPrefetch(FeatureDataSource.java:536)
at org.broadinstitute.hellbender.engine.FeatureManager.getFeatures(FeatureManager.java:352)
at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:173)
at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:125)
at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:240)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantDataManager.parseTrainingSets(VariantDataManager.java:392)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.addDatum(VariantRecalibrator.java:614)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.addVariantDatum(VariantRecalibrator.java:571)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.lambda$consumeQueuedVariants$0(VariantRecalibrator.java:542)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.consumeQueuedVariants(VariantRecalibrator.java:542)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.apply(VariantRecalibrator.java:521)
at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$traverse$1(MultiVariantWalker.java:120)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.MultiVariantWalker.traverse(MultiVariantWalker.java:118)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
-
I don't know if this information is helpful to you.
I ran VQSR directly on the VCF output by HaplotypeCaller, and no errors occurred. So I guess the error comes from GenotypeGVCFs?
I also re-indexed the VCF file output by GenotypeGVCFs, and the error remained; then I decompressed the VCF file, recompressed it with bgzip, and re-indexed it with IndexFeatureFile, and the error still remained.
Thank you!
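On the recompression step: htsjdk reads `.vcf.gz` files as BGZF, a series of small gzip members each carrying a `BC` extra field that records the block size, rather than as one ordinary gzip stream. A file recompressed with plain `gzip` still starts with the gzip magic bytes but lacks that field, so using `bgzip` (as done here) is the right choice. A minimal sketch for telling the two apart; `is_bgzf` is a hypothetical helper that inspects only the first block header and assumes `BC` is the only extra subfield, so a True result does not prove the rest of the file, or its index, is intact.

```python
import struct

# Gzip magic (1f 8b), CM=8 (deflate), FLG=4 (FEXTRA set): required by BGZF.
BGZF_PREFIX = b"\x1f\x8b\x08\x04"


def is_bgzf(path):
    """Return True if the file starts with a BGZF block header.

    Plain gzip output normally has FLG=0 or FLG=8 (file name stored),
    so it fails the prefix comparison."""
    with open(path, "rb") as fh:
        header = fh.read(18)
    if len(header) < 18 or header[:4] != BGZF_PREFIX:
        return False
    (xlen,) = struct.unpack_from("<H", header, 10)  # total extra-field length
    (slen,) = struct.unpack_from("<H", header, 14)  # BC subfield payload length
    return xlen == 6 and header[12:14] == b"BC" and slen == 2
```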
-
zhaol, did you delete the old index files?
-
Yes, I first deleted the old index files and then re-indexed with IndexFeatureFile. But as I mentioned above, the error remains.
-
Hi zhaol, could you run the GATK tool PrintBGZFBlockInformation and paste the output here, so we can find more information about the issue?
-
Hi Genevieve Brandt, the output is too long for a comment, so I have pasted only part of it here.
For my vcf.gz created by HaplotypeCaller:
Block #742905 at file offset 9660186303
- compressed size: 19375
- uncompressed size: 65498
Block #742906 at file offset 9660205678
- compressed size: 20054
- uncompressed size: 65498
Block #742907 at file offset 9660225732
- compressed size: 20320
- uncompressed size: 65498
Block #742908 at file offset 9660246052
- compressed size: 20181
- uncompressed size: 65498
Block #742909 at file offset 9660266233
- compressed size: 17637
- uncompressed size: 65498
Block #742910 at file offset 9660283870
- compressed size: 17759
- uncompressed size: 65498
Block #742911 at file offset 9660301629
- compressed size: 4676
- uncompressed size: 14985
Block #742912 at file offset 9660306305
- compressed size: 3436
- uncompressed size: 12779
Block #742913 at file offset 9660309741
- compressed size: 28
- uncompressed size: 0
***************************************************************************
Final BGZF 0-byte terminator block FOUND as expected at block number 742913
***************************************************************************
For my vcf.gz created by GenotypeGVCFs:
Block #15747 at file offset 257598249
- compressed size: 20430
- uncompressed size: 65498
Block #15748 at file offset 257618679
- compressed size: 20369
- uncompressed size: 65498
Block #15749 at file offset 257639048
- compressed size: 19019
- uncompressed size: 65498
Block #15750 at file offset 257658067
- compressed size: 19407
- uncompressed size: 65498
Block #15751 at file offset 257677474
- compressed size: 19218
- uncompressed size: 65498
Block #15752 at file offset 257696692
- compressed size: 19834
- uncompressed size: 65498
Block #15753 at file offset 257716526
- compressed size: 19648
- uncompressed size: 65498
Block #15754 at file offset 257736174
- compressed size: 19754
- uncompressed size: 65498
Block #15755 at file offset 257755928
- compressed size: 19961
- uncompressed size: 65498
Block #15756 at file offset 257775889
- compressed size: 13389
- uncompressed size: 43439
Block #15757 at file offset 257789278
- compressed size: 28
- uncompressed size: 0
***************************************************************************
Final BGZF 0-byte terminator block FOUND as expected at block number 15757
***************************************************************************
I hope this information helps.
Thank you.
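A listing like the one above can be approximated without GATK by walking the block chain directly. This is a minimal sketch, not the implementation of PrintBGZFBlockInformation: it reads each block's total size from the BSIZE header field (stored as size minus 1) and the uncompressed size from the ISIZE trailer, numbers blocks from 1 (an assumption; the tool's own numbering convention is not confirmed here), assumes `BC` is the only extra subfield, and reports whether the file ends with the 28-byte empty EOF block. It raises `ValueError` at the first offset that does not start with a BGZF header.

```python
import struct

BGZF_PREFIX = b"\x1f\x8b\x08\x04"  # gzip magic, deflate, FEXTRA flag


def bgzf_blocks(path):
    """Yield (block_number, file_offset, compressed_size, uncompressed_size)."""
    with open(path, "rb") as fh:
        number, offset = 0, 0
        while True:
            header = fh.read(18)
            if not header:
                return  # clean end of file
            if len(header) < 18 or header[:4] != BGZF_PREFIX or header[12:14] != b"BC":
                raise ValueError(f"invalid BGZF header at file offset {offset}")
            bsize = struct.unpack_from("<H", header, 16)[0] + 1  # BSIZE = block size - 1
            fh.seek(offset + bsize - 4)  # ISIZE is the last 4 bytes of the block
            trailer = fh.read(4)
            if len(trailer) < 4:
                raise ValueError(f"truncated block at file offset {offset}")
            number += 1
            yield number, offset, bsize, struct.unpack("<I", trailer)[0]
            offset += bsize
            fh.seek(offset)


def summarize(path):
    """Print a per-block listing and return (blocks, has_eof_block)."""
    blocks = list(bgzf_blocks(path))
    for number, offset, csize, usize in blocks:
        print(f"Block #{number} at file offset {offset}")
        print(f"- compressed size: {csize}")
        print(f"- uncompressed size: {usize}")
    has_eof = bool(blocks) and blocks[-1][2] == 28 and blocks[-1][3] == 0
    print("Final 28-byte empty EOF block", "FOUND" if has_eof else "MISSING")
    return blocks, has_eof
```

As in the output posted above, a walk that covers the whole file and ends at a terminator block suggests the data file itself is intact, which points the search toward its index or another input.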
-
Hi zhaol,
I spoke with one of the developers about this issue, and we believe it may be coming from one of your side inputs: VariantRecalibrator takes multiple VCF resource inputs (for example, /data2/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz). This also fits the stack trace, where the failure happens inside VariantDataManager.parseTrainingSets while the resource files are being queried. The stack trace does not say which file has the GZIP header issue, so it could be any of them. To solve the issue, try these steps:
- Re-generate the indexes for all of the .vcf.gz files that are inputs to VariantRecalibrator.
- If step 1 does not solve the problem, re-run VariantRecalibrator, removing one resource input at a time; when the error message no longer appears, the file you removed is the problem. You could then download that file again and generate a new index.
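The elimination approach in the second step can be scripted so the commands differ only in the omitted resource. A minimal sketch that only builds argument lists and runs nothing; the resource tags and paths are copied from the command posted earlier in this thread, while the output names (`test.recal`, `test.tranches`) and the shortened annotation list are placeholders, and `leave_one_out_commands` is a hypothetical helper. Removing the only `truth=true` resource (hapmap here) may make VariantRecalibrator fail for unrelated reasons, so the thing to watch is whether the Invalid GZIP header error disappears.

```python
import shlex

# Resource tags and paths copied from the VariantRecalibrator command in this thread.
RESOURCES = [
    ("hapmap,known=false,training=true,truth=true,prior=15.0",
     "/home/zhaol/annotation/hapmap_3.3.hg38.vcf.gz"),
    ("omni,known=false,training=true,truth=false,prior=12.0",
     "/home/zhaol/annotation/1000G_omni2.5.hg38.vcf.gz"),
    ("1000G,known=false,training=true,truth=false,prior=10.0",
     "/home/zhaol/annotation/1000G_phase1.snps.high_confidence.hg38.vcf.gz"),
    ("dbsnp,known=true,training=false,truth=false,prior=2.0",
     "/home/zhaol/annotation/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz"),
]


def leave_one_out_commands(reference, variants, resources, extra_args):
    """Map each resource path to a VariantRecalibrator argv that omits it."""
    commands = {}
    for skip, (_, skipped_path) in enumerate(resources):
        argv = ["gatk", "VariantRecalibrator", "-R", reference, "-V", variants]
        for i, (tag, path) in enumerate(resources):
            if i != skip:
                argv += [f"--resource:{tag}", path]
        commands[skipped_path] = argv + list(extra_args)
    return commands


if __name__ == "__main__":
    extra = ["-an", "QD", "-an", "FS", "-an", "SOR", "-mode", "SNP",
             "-O", "test.recal", "--tranches-file", "test.tranches"]
    cmds = leave_one_out_commands("/home/zhaol/ref/hg38.fasta", "387.g.vcf.gz", RESOURCES, extra)
    for skipped, argv in cmds.items():
        print(f"# without {skipped}")
        print(shlex.join(argv))
```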
-
Hi, Genevieve Brandt.
I re-generated the indexes for all of the vcf.gz files that are inputs to VariantRecalibrator, and the issue has been solved.
Thank you and the developer very much.
-
Great news zhaol, glad it has been solved!
-
Hi Genevieve,
I want to generate VCF files from several BAM files, and I got this error. Could you please let me know how to fix this problem?
htsjdk.samtools.SAMFormatException: Invalid GZIP header
Thanks,
Aziz
-
Hi Aziz,
Please try the steps outlined earlier in this thread and see if they solve your problem. If they do not, please provide more details about your issue.
Best,
Genevieve