Genome Analysis Toolkit

Niranjan Shirgaonkar · May 23, 2020 17:03

Hello

I have been struggling with GATK 4 and feeling an acute lack of the command line pipelines available in previous versions. I took a look at the Tool Index and tried to make out the commands available from the https://github.com/gatk-workflows/gatk4-rnaseq-germline-snps-indels/blob/master/gatk4-rna-best-practices.wdl section

I have RNASeq data and I have used a GrCh38.p13 genomic reference not in the GATK resource bucket and currently I have processed around 600 samples with this same reference and followed till the Split n Cigar step with no problem

In the BaseRecalibrator step I get the following error :

a) GATK version

The Genome Analysis Toolkit (GATK) v4.1.7.0
HTSJDK Version: 2.21.2
Picard Version: 2.21.9

b) Exact GATK commands used

GATK BaseRecalibrator
-I input.bam
-R reference.fasta
--known-sites resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf
--known-sites resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf
--known-sites resources_broad_hg38_v0_Homo_sapiens_assembly38.known_indels.vcf
--known-sites resources_broad_hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf
-O recal_data.table

c) The entire error log if applicable.

A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
reference contigs = [NC_000001.11, NT_187361.1, NT_187362.1, NT_187363.1, NT_187364.1, NT_187365.1, NT_187366.1, NT_187367.1, NT_187368.1, NT_187369.1, NC_000002.12, NT_187370.1, NT_187371.1, NC_000003.12, NT_167215.1, NC_000004.12, NT_113793.3

I understand that the Reference is not matching to the dbSNP features

features contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM, chr1_KI270706v1_random, chr1_KI270707v1_random, chr1_KI270708v1_random, chr1_KI270709v1_random, chr1_KI270710v1_random, .....

The dbSNP files were downloaded from the Resource Bundle.Since re-mapping at this stage is not preferable I am feeling hopeless is there a way out of this?

Please help. Thank you very much

Genome Analysis Toolkit

Need Help?

Community Forum

BaseRecalibrator SNP databases

2 comments

Welcome

Didn't find what you were looking for?

Quick Links

Recent GATK News

About the GATK community