I have several questions about running BQSR. I use GATK184.108.40.206, the reference genome used for alignment is GRCh38Decoy, and my command is below:
Path/to/gatk BaseRecalibrator \
-R /iGenomes/Homo_sapiens/NCBI/GRCh38Decoy/Sequence/WholeGenomeFasta/genome.fa \
-I Sample.markdup.bam \
--known-sites Path/to/Know-Site/dbsnp_146.b38.vcf.gz \
--known-sites Path/to/Know-Site/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
--known-sites Path/to/Know-Site/resources_broad_hg38_v0_1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf \
a) When I run the command listed above, I get an error: "A USER ERROR has occurred: Input files reference and features have incompatible contigs: Found contigs with the same name but different lengths:
contig reference = chr15 / 101991189
contig features = chr15 / 90338345."
and a warning: " IndexUtils - Feature file "/Input/Know-Site/dbsnp_146.hg38.vcf.gz" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file"
What are the possible causes of the error, and do I need to deal with the warning?
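For reference, one way to find out which known-sites file carries the conflicting chr15 length might be to compare the ##contig header lines of each VCF against the reference FASTA index. This is only a minimal sketch: it assumes a genome.fa.fai index exists next to genome.fa, and compare_contigs is a hypothetical helper name, not a GATK tool. A VCF without ##contig lines (as the warning suggests for the dbSNP file) produces no output here, because its dictionary would come from the index file, which this sketch does not inspect.

```shell
# Hypothetical helper: list contigs whose length in a VCF header differs
# from the length recorded in a FASTA index (.fai).
# Usage: compare_contigs genome.fa.fai sites.vcf[.gz]
compare_contigs() {
    fai="$1"
    vcf="$2"
    # Decompress only when the file name ends in .gz
    case "$vcf" in
        *.gz) gzip -cd "$vcf" ;;
        *)    cat "$vcf" ;;
    esac | awk -F'\t' '
        # First input (the .fai): column 1 is the contig name, column 2 its length
        NR == FNR { ref[$1] = $2; next }
        # Second input (the VCF on stdin): parse ID and length from ##contig lines
        /^##contig=/ {
            if ($0 !~ /length=/) next
            id = $0;  sub(/.*[<,]ID=/, "", id);      sub(/[,>].*/, "", id)
            len = $0; sub(/.*[<,]length=/, "", len); sub(/[,>].*/, "", len)
            if ((id in ref) && (ref[id] + 0 != len + 0))
                print id ": reference=" ref[id] " features=" len
            next
        }
        # The header ends at the #CHROM line, so stop reading there
        /^#CHROM/ { exit }
    ' "$fai" -
}
```

It would be called once per --known-sites file, with the .fai path as the first argument and the VCF path as the second. Any line it prints names a contig whose header length disagrees with the reference.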
b) What are the recommended known-sites input files for the reference genome I used (GRCh38Decoy)?
c) The latest dbSNP release is dbsnp153, but I only found dbsnp146 on the FTP site of the GATK resource bundle. Could you provide the latest dbSNP reference?
d) For 1000G, I found "1000G_phase1.snps.high_confidence.hg38.vcf" in the Broad Institute's Genomics-public-data on Google Cloud Platform. Is there a corresponding reference for phase 3, which I believe is the latest version?
Thank you very much in advance!