BaseRecalibrator raised htsjdk.tribble.TribbleException$MalformedFeatureFile
a) GATK version used:
gatk-4.1.6.0
b) Exact GATK commands used :
/export/servers/tools/gatk-4.1.6.0/gatk BaseRecalibrator -R /export/servers/wenhao/gene_test/BRCA/input/fasta/GRCh38_latest_genomic.fna -I /export/servers/wenhao/gene_test/BRCA/output/BRCA_test.sorted.markdup.bam --known-sites /export/servers/data/gene/gatk_bundle/resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf --known-sites /export/servers/data/gene/gatk_bundle/resources_broad_hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf -O /export/servers/wenhao/gene_test/BRCA/output/BRCA_test.sorted.markdup.recal_data.table
c) The entire error log:
Using GATK jar /export/servers/tools/gatk-4.1.6.0/gatk-package-4.1.6.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /export/servers/tools/gatk-4.1.6.0/gatk-package-4.1.6.0-local.jar BaseRecalibrator -R /export/servers/wenhao/gene_test/BRCA/input/fasta/GRCh38_latest_genomic.fna -I /export/servers/wenhao/gene_test/BRCA/output/BRCA_test.sorted.markdup.bam --known-sites /export/servers/data/gene/gatk_bundle/resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf --known-sites /export/servers/data/gene/gatk_bundle/resources_broad_hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf -O /export/servers/wenhao/gene_test/BRCA/output/BRCA_test.sorted.markdup.recal_data.table
15:49:28.631 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/export/servers/tools/gatk-4.1.6.0/gatk-package-4.1.6.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 28, 2020 3:49:28 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
15:49:28.821 INFO BaseRecalibrator - ------------------------------------------------------------
15:49:28.822 INFO BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.1.6.0
15:49:28.822 INFO BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
15:49:28.822 INFO BaseRecalibrator - Executing as root@A01-R02-I159-24-2JY3F22.JD.LOCAL on Linux v3.10.0-327.28.3.el7.x86_64 amd64
15:49:28.822 INFO BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-b08
15:49:28.822 INFO BaseRecalibrator - Start Date/Time: March 28, 2020 3:49:28 PM
15:49:28.823 INFO BaseRecalibrator - ------------------------------------------------------------
15:49:28.823 INFO BaseRecalibrator - ------------------------------------------------------------
15:49:28.823 INFO BaseRecalibrator - HTSJDK Version: 2.21.2
15:49:28.823 INFO BaseRecalibrator - Picard Version: 2.21.9
15:49:28.823 INFO BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:49:28.823 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:49:28.824 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:49:28.824 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:49:28.824 INFO BaseRecalibrator - Deflater: IntelDeflater
15:49:28.824 INFO BaseRecalibrator - Inflater: IntelInflater
15:49:28.824 INFO BaseRecalibrator - GCS max retries/reopens: 20
15:49:28.824 INFO BaseRecalibrator - Requester pays: disabled
15:49:28.824 INFO BaseRecalibrator - Initializing engine
15:49:29.218 INFO FeatureManager - Using codec VCFCodec to read file file:///export/servers/data/gene/gatk_bundle/resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf
15:49:29.230 INFO BaseRecalibrator - Shutting down engine
[March 28, 2020 3:49:29 PM] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1726480384
org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path /export/servers/data/gene/gatk_bundle/resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:383)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:335)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:282)
at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:247)
at org.broadinstitute.hellbender.engine.FeatureManager.initializeFeatureSources(FeatureManager.java:210)
at org.broadinstitute.hellbender.engine.FeatureManager.<init>(FeatureManager.java:157)
at org.broadinstitute.hellbender.engine.ReadWalker.initializeFeatures(ReadWalker.java:68)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:706)
at org.broadinstitute.hellbender.engine.ReadWalker.onStartup(ReadWalker.java:50)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: /export/servers/data/gene/gatk_bundle/resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:263)
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:102)
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:127)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:121)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:380)
... 14 more
Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:115)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:261)
... 18 more
-
I've found the problem myself.
The .vcf.gz files I downloaded (on my laptop, Windows 10, Chrome 80) from https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/ ended up with a .vcf extension instead of .vcf.gz.
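For anyone who wants to check their own download: gzip data (including bgzip output) starts with the two bytes 1f 8b, so the first bytes of the file show whether it is compressed regardless of its name. A minimal sketch; `is_gzip` is a hypothetical helper name, not part of GATK, and the path in the usage comment is only illustrative:

```shell
# Hypothetical helper: succeeds if the file starts with the gzip magic bytes 1f 8b.
is_gzip() {
    # Read the first two bytes and print them as hex without spaces.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Example usage (illustrative path):
# is_gzip known_sites.vcf && echo "compressed despite the .vcf name"
```

A plain-text VCF starts with "##fileformat", so the check fails for it.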
-
Hi wenh06,
Thank you for posting your solution for the benefit of the community!
-
Hi all, my issue is that when I use BaseRecalibrator, I get this error message:
resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf must support random access to enable queries by interval. If it's a file, please index it using the bundled tool IndexFeatureFile
I also noticed that this file does not come with an .idx file but rather a .tbi file. Could that be the reason?
-
Please start a new post in a new thread with the exact command you are using, the version info, and the entire error log.
-
You have to compress the VCF file with bgzip (tabix cannot index plain gzip output) if that has not been done already (and if it is uncompressed, why?), and then use
tabix <file.vcf.gz>
or
bcftools index --tbi <filename>
The tbi index is perfectly fine. You may download it from the resource bundle along with the vcf.gz files.
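A minimal sketch of those two steps for a plain-text VCF, assuming bgzip and tabix from htslib are installed and on the PATH. `compress_and_index` is a hypothetical helper name and the file name in the usage comment is a placeholder:

```shell
# Hypothetical helper: write a bgzip-compressed copy of a plain-text VCF
# and build a .tbi index next to it. Assumes bgzip and tabix are on PATH.
compress_and_index() {
    vcf=$1
    # -c writes to stdout, so the original uncompressed file is kept.
    bgzip -c "$vcf" > "$vcf.gz" || return 1
    # -p vcf selects the VCF preset; the index is written to "$vcf.gz.tbi".
    tabix -p vcf "$vcf.gz"
}

# Example usage (placeholder file name):
# compress_and_index known_sites.vcf
```

The resulting .vcf.gz can then be passed to --known-sites, with the .tbi file in the same directory.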
-
I had the same error and I fixed it by downloading files for --known-sites from ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/
-
Thank you for posting your solution Mohd Khairul Nizam Mohd Khalid!
-
wenh06 can you be more specific?
I also downloaded from https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/.
You said they became .vcf files, but they should be .vcf files, no? How did you make it work?
Thanks in advance!
-
Felipe Padilla what worked for me was to unzip the files
-
I have the same error using the GATK 4.1.9.0 Docker image on an HPC cluster.
I did not completely understand the previous comments, so please let me make sure.
My reference directory has:
hg38.fasta with its index and dict files
hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf
hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi
The command I used is like this:
gatk BaseRecalibrator \
-I Sample1.bam \
-R hg38.fasta \
--known-sites hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf \
-O Sample1.recal_data.table
Should I make a .vcf.tbi file (for the unzipped file)?
Or should I make a .vcf.gz file (paired with the .vcf.gz.tbi)?
-
Hi Ashi, these users have reported that this vcf file (hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf) is actually zipped. Change the name to hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf.gz and see if that works for you.
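A minimal sketch of that rename, which first checks that the file really is gzip data (first two bytes 1f 8b) and refuses to overwrite an existing file. `restore_gz_suffix` is a hypothetical helper name, not a GATK tool:

```shell
# Hypothetical helper: if FILE is really gzip data, rename it to FILE.gz
# so the name matches a FILE.gz.tbi index sitting next to it.
restore_gz_suffix() {
    f=$1
    # Read the first two bytes as hex; gzip and bgzip files start with 1f8b.
    magic=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" != "1f8b" ]; then
        echo "$f is not gzip-compressed; leaving it alone" >&2
        return 1
    fi
    if [ -e "$f.gz" ]; then
        echo "$f.gz already exists; not overwriting" >&2
        return 1
    fi
    mv "$f" "$f.gz"
}

# Example usage:
# restore_gz_suffix hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf
```

After the rename, point --known-sites at the .vcf.gz path.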
-
Hi,
Thank you for the clarification.
Now, it works!!