BQSR and VQSR resource sequences
Hi,
When I try to run BQSR and VQSR, both fail with the same error message.
The command for BQSR:
java -jar $gatk BaseRecalibrator \
-R ./Homo_sapiens_assembly38.fasta \
-I ./germline_analysis/germline_markdups.bam \
--known-sites Homo_sapiens_assembly38.dbsnp138.vcf \
-O ./germline_analysis/recal_data.table
The command for VQSR:
java -jar $gatk VariantRecalibrator \
-R ./data/germline/ref/ref.fasta \
-V ./germline_analysis/trio_jointcalls_hc.vcf.gz \
--resource:hapmap,known=false,training=true,truth=true,prior=15.0 \
./hapmap_3.3.hg38.vcf \
--resource:1000G,known=false,training=true,truth=false,prior=12.0 \
./1000G_omni2.5.hg38.vcf.gz \
--resource:1000G,known=false,training=true,truth=false,prior=10.0 \
./high_confidence.hg38.vcf.gz \
--resource:dbsnp,known=true,training=false,truth=false,prior=2.0 \
./Homo_sapiens_assembly38.dbsnp138.vcf \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP \
-O ./germline_analysis/output.recal \
--tranches-file ./germline_analysis/output.tranches
The error message is
12:40:31.578 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/jingchun/gatk-4.2.2.0/gatk-package-4.2.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Dec 14, 2021 12:40:32 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
12:40:32.466 INFO VariantRecalibrator - ------------------------------------------------------------
12:40:32.467 INFO VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.2.2.0
12:40:32.467 INFO VariantRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
12:40:32.472 INFO VariantRecalibrator - Executing as jingchun@jingchun-VirtualBox on Linux v5.11.0-41-generic amd64
12:40:32.473 INFO VariantRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v17.0.1+12-Ubuntu-120.04
12:40:32.478 INFO VariantRecalibrator - Start Date/Time: December 14, 2021 at 12:40:31 PM EST
12:40:32.478 INFO VariantRecalibrator - ------------------------------------------------------------
12:40:32.479 INFO VariantRecalibrator - ------------------------------------------------------------
12:40:32.480 INFO VariantRecalibrator - HTSJDK Version: 2.24.1
12:40:32.480 INFO VariantRecalibrator - Picard Version: 2.25.4
12:40:32.481 INFO VariantRecalibrator - Built for Spark Version: 2.4.5
12:40:32.481 INFO VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:40:32.481 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:40:32.482 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:40:32.482 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:40:32.482 INFO VariantRecalibrator - Deflater: IntelDeflater
12:40:32.482 INFO VariantRecalibrator - Inflater: IntelInflater
12:40:32.484 INFO VariantRecalibrator - GCS max retries/reopens: 20
12:40:32.487 INFO VariantRecalibrator - Requester pays: disabled
12:40:32.488 INFO VariantRecalibrator - Initializing engine
12:40:33.017 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jingchun/GATK/./hapmap_3.3.hg38.vcf
12:40:33.271 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jingchun/GATK/./1000G_omni2.5.hg38.vcf.gz
12:40:33.938 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jingchun/GATK/./high_confidence.hg38.vcf.gz
12:40:34.717 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jingchun/GATK/./Homo_sapiens_assembly38.dbsnp138.vcf
12:40:34.882 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jingchun/GATK/./germline_analysis/trio_jointcalls_hc.vcf.gz
12:40:34.920 WARN IntelInflater - Zero Bytes Written : 0
12:40:34.937 WARN IntelInflater - Zero Bytes Written : 0
12:40:38.909 INFO VariantRecalibrator - Shutting down engine
[December 14, 2021 at 12:40:38 PM EST] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 0.13 minutes.
Runtime.totalMemory()=248512512
***********************************************************************
A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
reference contigs = [20]
I guess both errors are caused by the resource files, which I downloaded from
'genomics-public-data/resources/broad/hg38/v0'.
Which files should I use? I would also like to know why this error occurs so that I can understand the problem better.
Thanks,
Jingchun
GATK version used: 4.2.2.0
-
Hi jingchun liu,
This type of error generally occurs when the contigs are named differently across your input files (for example, chr20 vs. 20). The line "reference contigs = [20]" in your log shows that the reference passed with -R names its only contig 20. Could you take a look at this article about the issue, as well as the suggestions in this previous forum post, to see whether they help you adjust your contig names?
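To see the mismatch directly, you can list the contig names in the reference and in each resource VCF and check whether any are shared. The following is only a minimal sketch using standard Unix tools. The function names (fasta_contigs, vcf_contigs, shared_contigs) are made up for illustration and are not part of GATK, and the sketch assumes the VCF header declares its contigs with ##contig=<ID=...> lines.

```shell
#!/bin/sh
# Sketch: compare contig names between a FASTA reference and a VCF header.
# The function names are illustrative helpers, not GATK commands.

# Print contig names from FASTA header lines (">name optional description").
fasta_contigs() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

# Print contig IDs from "##contig=<ID=...>" header lines.
# Handles plain .vcf and gzip/bgzip-compressed .vcf.gz; stops at the #CHROM line.
vcf_contigs() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | sed -n -e '/^#CHROM/q' -e 's/^##contig=<ID=\([^,>]*\).*/\1/p' | sort -u
}

# Print contig names present in both files; empty output means no overlap.
shared_contigs() {
    tmp=$(mktemp)
    fasta_contigs "$1" > "$tmp"
    vcf_contigs "$2" | grep -Fx -f "$tmp"
    rm -f "$tmp"
}
```

For example, running shared_contigs ./data/germline/ref/ref.fasta ./hapmap_3.3.hg38.vcf and getting no output would be consistent with the "No overlapping contigs found" error. Note that scanning a full-genome FASTA this way can take a while, and a VCF without ##contig header lines will simply produce an empty list.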
Kind regards,
Pamela