Difference in contigs between hg19_v0_Homo_sapiens_assembly19.fasta and hg19_v0_1000G_omni2.5.b37.vcf.gz and Mills_and_1000G_gold_standard.indels.hg19.sites.vcf files?
a) GATK version used: v4.2.5
b) Exact command used: BaseRecalibrator
I am receiving an error about incompatible contigs. I think both files used for known variants, hg19_v0_1000G_omni2.5.b37.vcf.gz and Mills_and_1000G_gold_standard.indels.hg19.sites.vcf, are incompatible with the reference FASTA file. Is my observation about the incompatibility of both files correct? How can I fix it? Thank you.
File source: GCP bundle
c) Entire program log:
ref='./reference_genome/hg19_v0_Homo_sapiens_assembly19.fasta'
ref_snps='./reference_genome/hg19_v0_1000G_omni2.5.b37.vcf.gz'
ref_indels='./reference_genome/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf'
sample="./data/exome_alignment/*.bam"
sample_name=`echo $sample | grep -P 'HG(\d+)(?=.chrom)' -o`
bam_marked_dup="./data/exome_alignment/$sample_name.marked_duplicates.bam"
bam_marked_dup_sorted="./data/exome_alignment/$sample_name.marked_dup_sorted.bam"
recal_table="./data/exome_alignment/$sample_name.recal_data.table"
# Mark Duplicates
./gatk MarkDuplicates \
-I $sample \
-O $bam_marked_dup \
-M "$d/exome_alignment/$sample_name.marked_dup_metrics.txt"

# Sort by coordinate
./gatk SortSam \
-I $bam_marked_dup \
-O $bam_marked_dup_sorted \
-SO coordinate

# Base Quality Score Recalibration
./gatk BaseRecalibrator \
-I $bam_marked_dup \
-R $ref \
--known-sites $ref_indels \
--known-sites $ref_snps \
-O $recal_table
Error:
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Using GATK jar /home/user/gatk_project/gatk-4.2.5.0/gatk-package-4.2.5.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/user/gatk_project/gatk-4.2.5.0/gatk-package-4.2.5.0-local.jar BaseRecalibrator -I ./data/HG00337/exome_alignment/HG00337.marked_duplicates.bam -R ./reference_genome/hg19_v0_Homo_sapiens_assembly19.fasta --known-sites ./reference_genome/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf --known-sites ./reference_genome/hg19_v0_1000G_omni2.5.b37.vcf.gz -O ./data/HG00337/exome_alignment/HG00337.recal_data.table
11:14:03.594 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/user/gatk_project/gatk-4.2.5.0/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Feb 09, 2022 11:14:03 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
11:14:03.744 INFO BaseRecalibrator - ------------------------------------------------------------
11:14:03.744 INFO BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.2.5.0
11:14:03.744 INFO BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
11:14:03.744 INFO BaseRecalibrator - Executing as user@uoa-bi-training-2021-2022-5 on Linux v5.4.0-90-generic amd64
11:14:03.744 INFO BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v11.0.13+8-Ubuntu-0ubuntu1.20.04
11:14:03.745 INFO BaseRecalibrator - Start Date/Time: February 9, 2022 at 11:14:03 AM UTC
11:14:03.745 INFO BaseRecalibrator - ------------------------------------------------------------
11:14:03.745 INFO BaseRecalibrator - ------------------------------------------------------------
11:14:03.746 INFO BaseRecalibrator - HTSJDK Version: 2.24.1
11:14:03.746 INFO BaseRecalibrator - Picard Version: 2.25.4
11:14:03.746 INFO BaseRecalibrator - Built for Spark Version: 2.4.5
11:14:03.746 INFO BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:14:03.746 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:14:03.746 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:14:03.746 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:14:03.746 INFO BaseRecalibrator - Deflater: IntelDeflater
11:14:03.746 INFO BaseRecalibrator - Inflater: IntelInflater
11:14:03.746 INFO BaseRecalibrator - GCS max retries/reopens: 20
11:14:03.746 INFO BaseRecalibrator - Requester pays: disabled
11:14:03.746 INFO BaseRecalibrator - Initializing engine
11:14:04.027 INFO FeatureManager - Using codec VCFCodec to read file file:///home/user/gatk_project/gatk-4.2.5.0/./reference_genome/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf
11:14:04.042 INFO FeatureManager - Using codec VCFCodec to read file file:///home/user/gatk_project/gatk-4.2.5.0/./reference_genome/hg19_v0_1000G_omni2.5.b37.vcf.gz
11:14:04.174 INFO BaseRecalibrator - Shutting down engine
[February 9, 2022 at 11:14:04 AM UTC] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1193279488
***********************************************************************
A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1, GL000201.1, GL000247.1, GL000245.1, GL000197.1, GL000203.1, GL000246.1, GL000249.1, GL000196.1, GL000248.1, GL000244.1, GL000238.1, GL000202.1, GL000234.1, GL000232.1, GL000206.1, GL000240.1, GL000236.1, GL000241.1, GL000243.1, GL000242.1, GL000230.1, GL000237.1, GL000233.1, GL000204.1, GL000198.1, GL000208.1, GL000191.1, GL000227.1, GL000228.1, GL000214.1, GL000221.1, GL000209.1, GL000218.1, GL000220.1, GL000213.1, GL000211.1, GL000199.1, GL000217.1, GL000216.1, GL000215.1, GL000205.1, GL000219.1, GL000224.1, GL000223.1, GL000195.1, GL000212.1, GL000222.1, GL000200.1, GL000193.1, GL000194.1, GL000225.1, GL000192.1, NC_007605]
features contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]
-
Hi Rea Kalampaliki,
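Yes, you're correct that the contigs are incompatible. The feature contigs in your error message use chr-prefixed names (chr1, chr2, ...), while your reference uses b37-style names (1, 2, ...), so you'll need known-sites files that match your reference. You can read more about the reference versions in our article here: https://gatk.broadinstitute.org/hc/en-us/articles/360035890951-Human-genome-reference-builds-GRCh38-or-hg38-b37-hg19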
There are a lot of forum posts on here discussing which files to use, so I would recommend searching around to see what other users are doing. All of our resources are outlined in this resource bundle article: https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle
Best,
Genevieve
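For anyone hitting the same error, one way to see which known-sites file is mismatched is to compare the contig names in the reference index with the ##contig lines in each VCF header. The following is only a minimal sketch, not an official GATK procedure. The function names and the tiny demo files are made up for illustration. It assumes a .fai index exists for the reference and that the VCF is uncompressed (a .vcf.gz would have to be decompressed first).

```shell
# Sketch: find contig names shared by a reference index (.fai) and a VCF header.
# All function names and demo files here are hypothetical, for illustration only.

# Contig names are the first tab-separated column of a .fai index.
fai_contigs() { cut -f1 "$1" | sort -u; }

# Contig names declared in a VCF header look like: ##contig=<ID=name,...>
vcf_contigs() {
  grep '^##contig=' "$1" | sed -E 's/^##contig=<ID=([^,>]+).*/\1/' | sort -u
}

# Print names present in both files; empty output means no overlapping contigs.
shared_contigs() {
  ref_names=$(mktemp)
  vcf_names=$(mktemp)
  fai_contigs "$1" > "$ref_names"
  vcf_contigs "$2" > "$vcf_names"
  comm -12 "$ref_names" "$vcf_names"
  rm -f "$ref_names" "$vcf_names"
}

# Tiny demo inputs so the sketch runs without the real bundle files.
# The offset columns in the .fai are placeholders.
demo_dir=$(mktemp -d)
printf '1\t249250621\t52\t60\t61\nMT\t16569\t100\t60\t61\n' > "$demo_dir/ref.fai"
printf '##fileformat=VCFv4.1\n##contig=<ID=chr1,length=249250621>\n##contig=<ID=chrM,length=16571>\n' > "$demo_dir/ucsc_style.vcf"
printf '##fileformat=VCFv4.1\n##contig=<ID=1,length=249250621>\n##contig=<ID=MT,length=16569>\n' > "$demo_dir/b37_style.vcf"

ucsc_shared=$(shared_contigs "$demo_dir/ref.fai" "$demo_dir/ucsc_style.vcf")
b37_shared=$(shared_contigs "$demo_dir/ref.fai" "$demo_dir/b37_style.vcf")

echo "shared with chr-prefixed VCF: ${ucsc_shared:-none}"
echo "shared with b37-style VCF: $(echo $b37_shared)"

rm -rf "$demo_dir"
```

With real data, the same functions could be pointed at the reference .fai and at each --known-sites VCF in turn. A file that shares no names with the reference is the one to replace.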
-
Genevieve Brandt (she/her) thank you very much,
I face no contig incompatibility issues when I set:
$ref_indels: hg19_v0_Mills_and_1000G_gold_standard.indels.b37.sites.vcf
$ref_snps: the file produced by running the UpdateVCFSequenceDictionary tool on hg19_v0_Homo_sapiens_assembly19.dbsnp.vcf
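The exact commands were not posted, so the following is only a hedged sketch of what the fix might look like. The input paths mirror the ones in the thread. The .dict file name and the output file name are assumptions, and the run helper is a hypothetical wrapper that only prints each command unless DRY_RUN=0 is set.

```shell
# Sketch only: the exact invocation used in the thread was not shown.
ref='./reference_genome/hg19_v0_Homo_sapiens_assembly19.fasta'
ref_dict='./reference_genome/hg19_v0_Homo_sapiens_assembly19.dict'   # assumed file name
ref_indels='./reference_genome/hg19_v0_Mills_and_1000G_gold_standard.indels.b37.sites.vcf'
dbsnp_in='./reference_genome/hg19_v0_Homo_sapiens_assembly19.dbsnp.vcf'
dbsnp_out='./reference_genome/dbsnp.updated_dict.vcf'                # hypothetical output name

# Hypothetical helper: print the command by default; set DRY_RUN=0 to execute it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi; }

# Rewrite the VCF header's contig lines from the reference sequence dictionary.
run ./gatk UpdateVCFSequenceDictionary \
    -V "$dbsnp_in" \
    --source-dictionary "$ref_dict" \
    --replace=true \
    -O "$dbsnp_out"

# Re-run recalibration with known-sites files that match the reference contigs.
# The BAM and table paths below are taken from the log in the question.
run ./gatk BaseRecalibrator \
    -I ./data/HG00337/exome_alignment/HG00337.marked_duplicates.bam \
    -R "$ref" \
    --known-sites "$ref_indels" \
    --known-sites "$dbsnp_out" \
    -O ./data/HG00337/exome_alignment/HG00337.recal_data.table
```

Printing the commands first makes it easy to check the paths before anything is run.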
-
Great! Glad you solved the issue, thanks for posting your solution!