I do not understand why the #CHROM line is not seen. After reading previous discussions of this error, I have ensured that every header line before the #CHROM line starts with two pound symbols (##), and I have tried the command with the VCF files both gzipped and uncompressed.
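The header check described above can be sketched in Python. This is only a minimal sketch and not part of GATK: `open_text` and `check_vcf_header` are hypothetical helper names, and the check is limited to confirming that every line before #CHROM starts with ## and that a #CHROM line exists, for plain or gzipped input.

```python
import gzip


def open_text(path):
    """Open a plain or gzip/bgzip-compressed file as text (hypothetical helper)."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic bytes (bgzip files share them)
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def check_vcf_header(path):
    """Return (ok, message) describing whether a #CHROM line follows the ## lines."""
    try:
        with open_text(path) as fh:
            for lineno, line in enumerate(fh, start=1):
                if line.startswith("##"):
                    continue  # ordinary meta-information line
                if line.startswith("#CHROM"):
                    return True, f"#CHROM found on line {lineno}"
                # Anything else before #CHROM means the header is malformed
                return False, f"line {lineno} is neither a ## line nor #CHROM"
    except UnicodeDecodeError:
        return False, "file does not decode as text (binary content?)"
    return False, "no #CHROM line found"
```

Calling `check_vcf_header` on each input file gives a quick yes/no per file before running the import. A file that is not a VCF at all would also fail this check.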
REQUIRED for all errors and issues:
a) GATK version used: 4.1.9.0
b) Exact command used:
gatk --java-options "-Xmx4g -Xms4g -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" \
GenomicsDBImport \
--genomicsdb-workspace-path my_database/ \
--batch-size 60 \
--sample-name-map bamDatabase.sample_map \
--reader-threads 5 \
--exclude-intervals "../exclusions.bed"
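Since this command takes its input files from the sample map rather than from the command line, it may be worth confirming what the map actually points to. The following is a minimal sketch, assuming the usual tab-delimited layout of sample name followed by file path; `non_vcf_entries` is a hypothetical helper name, and the extension test is only a heuristic.

```python
def non_vcf_entries(sample_map_path):
    """List sample map rows whose path does not end in .vcf or .vcf.gz.

    Assumes one "sample<TAB>path" row per line; any extra columns are ignored.
    Returns a list of (line_number, sample, path) tuples.
    """
    suspicious = []
    with open(sample_map_path, "rt", encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            fields = line.split("\t")
            if len(fields) < 2:
                # Row is not tab-delimited as assumed; report it as-is
                suspicious.append((lineno, line, ""))
                continue
            sample, path = fields[0], fields[1]
            # .g.vcf and .g.vcf.gz are covered by these two suffixes
            if not path.lower().endswith((".vcf", ".vcf.gz")):
                suspicious.append((lineno, sample, path))
    return suspicious
```

Running `non_vcf_entries("bamDatabase.sample_map")` would return any rows whose path does not look like a VCF; an empty list means every listed path has a VCF extension.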
c) Entire program log:
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/scratch/local/55759398
11:54:17.907 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/apps/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 23, 2023 11:54:19 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
11:54:19.943 INFO GenomicsDBImport - ------------------------------------------------------------
11:54:19.943 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.9.0
11:54:19.943 INFO GenomicsDBImport - Executing as {redacted}@c0702a-s3.ufhpc on Linux v3.10.0-1160.80.1.el7.x86_64 amd64
11:54:19.943 INFO GenomicsDBImport - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_31-b13
11:54:19.944 INFO GenomicsDBImport - Start Date/Time: January 23, 2023 11:54:17 AM EST
11:54:19.944 INFO GenomicsDBImport - ------------------------------------------------------------
11:54:19.944 INFO GenomicsDBImport - ------------------------------------------------------------
11:54:19.944 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
11:54:19.944 INFO GenomicsDBImport - Picard Version: 2.23.3
11:54:19.944 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:54:19.944 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:54:19.944 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:54:19.944 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:54:19.944 INFO GenomicsDBImport - Deflater: IntelDeflater
11:54:19.944 INFO GenomicsDBImport - Inflater: IntelInflater
11:54:19.944 INFO GenomicsDBImport - GCS max retries/reopens: 20
11:54:19.944 INFO GenomicsDBImport - Requester pays: disabled
11:54:19.944 INFO GenomicsDBImport - Initializing engine
11:54:20.048 INFO GenomicsDBImport - Shutting down engine
[January 23, 2023 11:54:20 AM EST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=4116185088
***********************************************************************
A USER ERROR has occurred: Failed to create reader from file:///{redacted}/nextflowscripts/final/test/272b_S94_L002-aln-pe-sorted-marked.bam
***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException: Failed to create reader from file:///blue/nseraphin/zubayrahmad/nextflowscripts/final/test/272b_S94_L002-aln-pe-sorted-marked.bam
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:880)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getHeaderFromPath(GenomicsDBImport.java:521)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.initializeHeaderAndSampleMappings(GenomicsDBImport.java:489)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.onStartup(GenomicsDBImport.java:420)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: file:///blue/nseraphin/zubayrahmad/nextflowscripts/final/test/272b_S94_L002-aln-pe-sorted-marked.bam
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:263)
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:102)
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:127)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:121)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:833)
... 9 more
Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:115)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:261)
... 13 more
Using GATK jar /apps/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -Xms4g -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /apps/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar GenomicsDBImport --genomicsdb-workspace-path my_database/ --batch-size 60 --sample-name-map bamDatabase.sample_map --reader-threads 5 --exclude-intervals ../exclusions.bed
VCF file in question:
##fileformat=VCFv4.2
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele not already represented at this locat
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order list
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the a
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as define
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's
##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller --emit-ref-confidence GVCF --output 272b_S94
##GVCFBlock0-1=minGQ=0(inclusive),maxGQ=1(exclusive)
##GVCFBlock1-2=minGQ=1(inclusive),maxGQ=2(exclusive)
##GVCFBlock10-11=minGQ=10(inclusive),maxGQ=11(exclusive)
##GVCFBlock11-12=minGQ=11(inclusive),maxGQ=12(exclusive)
##GVCFBlock12-13=minGQ=12(inclusive),maxGQ=13(exclusive)
##GVCFBlock13-14=minGQ=13(inclusive),maxGQ=14(exclusive)
##GVCFBlock14-15=minGQ=14(inclusive),maxGQ=15(exclusive)
##GVCFBlock15-16=minGQ=15(inclusive),maxGQ=16(exclusive)
##GVCFBlock16-17=minGQ=16(inclusive),maxGQ=17(exclusive)
..
##GVCFBlock9-10=minGQ=9(inclusive),maxGQ=10(exclusive)
##GVCFBlock90-99=minGQ=90(inclusive),maxGQ=99(exclusive)
##GVCFBlock99-100=minGQ=99(inclusive),maxGQ=100(exclusive)
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref bas
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygos
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genoty
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read m
##INFO=<ID=RAW_MQandDP,Number=2,Type=Integer,Description="Raw data (sum of squared MQ and total depth) for impr
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref r
##contig=<ID=NC_000962.3,length=4411532>
##source=HaplotypeCaller
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SM_272b_S94_L002
NC_000962.3 1 . T <NON_REF> . . END=29 GT:DP:GQ:MIN_DP:PL 0/0:59:
NC_000962.3 30 . C G,<NON_REF> 0 . BaseQRankSum=-1.054;DP=21;ExcessHet=3.0