Quality Drop in BAM Files After BaseRecalibrator
I've noticed an issue in my Nextflow pipeline, and I'm hoping to get some insights from the community. After running FastQC on a BAM file generated by the splitNCigarReads process, the quality metrics look good. However, when I run FastQC again after the baseRecalibrator process, the quality seems to drop significantly.
Does anyone know why this might be happening? Is this an expected outcome of the recalibration process, or could there be an issue with how I'm applying BQSR? I'd really appreciate any advice or suggestions on how to interpret this change in quality or how to troubleshoot it.
Thanks in advance for your help!
REQUIRED for all errors and issues:
a) GATK version used: gatk4-4.5.0.0
b) Exact command used:
gatk BaseRecalibrator \
    -R ${params.ref_genome} \
    -I ${bam} \
    --known-sites ${params.dbsnp} \
    --known-sites ${params.known_indels} \
    --known-sites ${params.gold_standard_indels} \
    -O ${sample_id}.recal_data.table

gatk ApplyBQSR \
    -R ${params.ref_genome} \
    -I ${bam} \
    --bqsr-recal-file ${recal_table} \
    -O ${sample_id}.recal.bam
c) Entire program log:
[Image: FastQC base quality BEFORE base recalibration and after splitNCigarReads]
[Image: FastQC base quality AFTER base recalibration]
-
Hi Rez1
The purpose of BaseRecalibrator is not to increase the base quality scores but to bring them closer to their empirical values. You can check the result of recalibration with the AnalyzeCovariates tool: after recalibration, the reported qualities should be close to the empirical ones.
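As a rough sketch of that check, one common approach is to run BaseRecalibrator a second time on the recalibrated BAM and then compare the two tables with AnalyzeCovariates. All file names below are placeholders I made up for illustration, and the `run` helper and `DRY_RUN` switch are not part of GATK; they only print the commands so you can review them before executing. Producing the plots also assumes a working R installation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder paths (assumptions): replace with your own files.
REF="${REF:-ref_genome.fasta}"
KNOWN_SITES="${KNOWN_SITES:-dbsnp.vcf.gz}"
RECAL_BAM="${RECAL_BAM:-sample.recal.bam}"                 # output of ApplyBQSR
BEFORE_TABLE="${BEFORE_TABLE:-sample.recal_data.table}"    # first BaseRecalibrator pass
AFTER_TABLE="${AFTER_TABLE:-sample.post_recal_data.table}" # second pass, written below
PLOTS="${PLOTS:-sample.AnalyzeCovariates.pdf}"

# Hypothetical helper: dry run by default, set DRY_RUN=0 to actually execute.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Second pass: model the covariates again, this time on the recalibrated BAM.
run gatk BaseRecalibrator \
  -R "$REF" \
  -I "$RECAL_BAM" \
  --known-sites "$KNOWN_SITES" \
  -O "$AFTER_TABLE"

# Compare reported vs. empirical qualities before and after recalibration.
run gatk AnalyzeCovariates \
  -before "$BEFORE_TABLE" \
  -after "$AFTER_TABLE" \
  -plots "$PLOTS"
```

In the resulting plots, the "after" points should sit closer to the diagonal (reported quality equal to empirical quality) than the "before" points if recalibration behaved as intended.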
You can also check our archived documentation on how BQSR works.
Current Illumina instruments bin base quality scores into 4 bins, which can sometimes overestimate the actual quality of a base. The drop you observe here is therefore what can be expected from BQSR.
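To make the "reported vs. empirical" idea concrete, here is a small arithmetic sketch using the standard Phred definition, Q = -10 * log10(p). The counts are invented for illustration, Q37 is only an example of a top-bin value, and the function names are mine. GATK's actual estimator is more involved (it works per covariate and is not a plain ratio), so treat this as intuition only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Phred quality Q -> error probability p = 10^(-Q/10)
phred_to_prob() {
  awk -v q="$1" 'BEGIN { printf "%.6f\n", 10 ^ (-q / 10) }'
}

# Mismatch count and observation count -> empirical Phred quality
empirical_q() {
  awk -v e="$1" -v n="$2" 'BEGIN { printf "%.1f\n", -10 * log(e / n) / log(10) }'
}

# A base reported in a Q37 bin claims roughly 2 errors per 10,000 bases.
phred_to_prob 37            # prints 0.000200

# Hypothetical counts: 1,000 mismatches at non-variant sites in 1,000,000 bases.
empirical_q 1000 1000000    # prints 30.0
```

With these made-up numbers, bases reported as Q37 actually behave like Q30, so a recalibrated BAM would show lower scores for them in FastQC even though nothing went wrong.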
I hope this helps.
Regards.