Base Quality Score Recalibration question
I have 35 samples from different genotypes, some of them from different sequencing batches. My question concerns this command:
gatk BaseRecalibrator \
-R ref.fa \
-I sorted_dedup_reads.bam \
--known-sites bqsr_snps.vcf \
--known-sites bqsr_indels.vcf \
-O recal_data.table
When creating the recalibration table for each sample, should I pass the --known-sites of all samples, or only those corresponding to that sample?
Thank you for your attention,
Paulo
-
The known-sites file must be a variant file containing representative variants for the species you are working on. For human samples, our resource bundle contains known sites for BQSR, VQSR, and other tools that require them. For a non-model organism, we recommend bootstrapping BQSR: collect high-quality variant sites from many of your samples, use that collection as the known sites for all your BAM files, and repeat the BQSR and variant calling steps until the base quality scores and variant calls converge. If you recalibrate each sample using only its own variants, the process will not converge properly, and each sample will end up with a different level of recalibration because it was recalibrated against itself.
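As a rough illustration of one bootstrapping round, the sketch below calls variants per sample, keeps the sites that pass a hard filter, and then recalibrates every BAM against the pooled set. The sample names, file names, and the single QD filter are assumptions made for illustration, and the `run` helper is a hypothetical dry-run wrapper that only prints the commands unless DRY_RUN=0 is set. It is not an official GATK workflow.

```shell
#!/usr/bin/env bash
# Sketch only: one BQSR bootstrapping round for a non-model organism.
# Sample names, file names and the filter threshold are hypothetical.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"      # 1 = print the commands, 0 = execute them
REF="ref.fa"
SAMPLES=(sampleA sampleB)    # placeholder for the real sample list

# Hypothetical helper: prints the command in dry-run mode (quoting is not
# preserved in the printout), otherwise runs it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

bootstrap_round() {
  local known_args=()
  local s

  # 1) Call variants per sample and keep only sites that pass the filter.
  for s in "${SAMPLES[@]}"; do
    run gatk HaplotypeCaller -R "$REF" -I "$s.bam" -O "$s.raw.vcf.gz"
    run gatk VariantFiltration -R "$REF" -V "$s.raw.vcf.gz" \
      -filter "QD < 2.0" --filter-name "QD2" -O "$s.flagged.vcf.gz"
    run gatk SelectVariants -V "$s.flagged.vcf.gz" --exclude-filtered \
      -O "$s.pass.vcf.gz"
    known_args+=(--known-sites "$s.pass.vcf.gz")
  done

  # 2) Recalibrate every BAM against the pooled sites from all samples.
  for s in "${SAMPLES[@]}"; do
    run gatk BaseRecalibrator -R "$REF" -I "$s.bam" "${known_args[@]}" \
      -O "$s.recal.table"
    run gatk ApplyBQSR -R "$REF" -I "$s.bam" \
      --bqsr-recal-file "$s.recal.table" -O "$s.recal.bam"
  done
}

bootstrap_round
```

Further rounds would repeat the variant calling on the recalibrated BAM files and compare the resulting recalibration tables between rounds. A real run would also use the full set of hard filters rather than QD alone.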
I hope this helps.
-
Hi Gökalp Çelik,
And to detect somatic variants, how do I perform the bootstrapping? Do I first call the variants with HaplotypeCaller or Mutect2? I called with Mutect2 and tried hard-filtering to keep the most reliable variants. However, it didn't work using these parameters, since the tumor-only calls had no QD, etc. So my question is: do I recalibrate the BAM files with variants called by HaplotypeCaller, and only then run Mutect2 on the recalibrated BAM files to obtain the somatic variants?
-V snps.vcf.gz \
-filter "QD < 2.0" --filter-name "QD2" \
-filter "QUAL < 30.0" --filter-name "QUAL30" \
-filter "SOR > 3.0" --filter-name "SOR3" \
-filter "FS > 60.0" --filter-name "FS60" \
-filter "MQ < 40.0" --filter-name "MQ40" \
-filter "MQRankSum < -12.5" --filter-name "MQRankSum-12.5" \
-filter "ReadPosRankSum < -8.0" --filter-name "ReadPosRankSum-8" \
-O snps_filtered.vcf.gz
-
Hi again.
The zygosity of your variants is not important. What matters is obtaining high-quality germline sites, and you can use HaplotypeCaller for that.
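The order discussed in this thread could look roughly like the sketch below: recalibrate the tumor BAM against germline sites obtained with HaplotypeCaller, then run Mutect2 on the recalibrated BAM. The file names (including germline_pass.vcf.gz) are assumptions, `run` is a hypothetical dry-run helper, and filtering the somatic calls with FilterMutectCalls is added here as the usual Mutect2 companion step rather than something stated above.

```shell
#!/usr/bin/env bash
# Sketch only: BQSR with germline known sites, then tumor-only Mutect2.
# All file names are hypothetical placeholders.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"            # 1 = print the commands, 0 = execute them
REF="ref.fa"
TUMOR="tumor"                      # prefix of the tumor BAM
GERMLINE="germline_pass.vcf.gz"    # filtered HaplotypeCaller sites (assumed)

# Hypothetical helper: prints the command in dry-run mode, otherwise runs it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

somatic_after_bqsr() {
  # Recalibrate base qualities using the germline sites as known sites.
  run gatk BaseRecalibrator -R "$REF" -I "$TUMOR.bam" \
    --known-sites "$GERMLINE" -O "$TUMOR.recal.table"
  run gatk ApplyBQSR -R "$REF" -I "$TUMOR.bam" \
    --bqsr-recal-file "$TUMOR.recal.table" -O "$TUMOR.recal.bam"

  # Call somatic variants on the recalibrated BAM, then filter them.
  run gatk Mutect2 -R "$REF" -I "$TUMOR.recal.bam" \
    -O "$TUMOR.unfiltered.vcf.gz"
  run gatk FilterMutectCalls -R "$REF" -V "$TUMOR.unfiltered.vcf.gz" \
    -O "$TUMOR.filtered.vcf.gz"
}

somatic_after_bqsr
```

The germline hard filters (QD, FS, MQ and so on) rely on annotations that HaplotypeCaller emits, which is why they are applied to the HaplotypeCaller calls in this sketch and not to the Mutect2 output.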
-
Thank you very much for your help!