options used during base recalibration
Dear GATK team,
In the generic data preprocessing pipeline that you published on github you use the --use-original-qualities option in both the BaseRecalibrator and the ApplyBQSR command.
I'm confused about this option:
Either it means "use the original qualities for recalibration", in which case I don't understand what else you would use,
or it means "use the original qualities after recalibration", in which case I don't see why you do the recalibration at all.
In ApplyBQSR I also do not understand the --static-quantized-quals option.
The documentation states: Use static quantized quality scores to a given number of levels.
I really don't understand what it stands for, nor how its values (10, 20, 30 in the generic pipeline) were chosen.
Kind regards,
Janick
Can you please provide
a) GATK version used: GATK 4.1.4
b) Exact GATK commands used
gatk BaseRecalibrator -I bwa_mappings/HG001_chr22_mrkdup_srt_22only.bam \
-R reference/Homo_sapiens_assembly38_chr22.fa \
--known-sites reference/dbsnp_138.hg38.vcf.gz \
--known-sites reference/Homo_sapiens_assembly38.known_indels.vcf.gz \
-O gatk_preprocessing/recal_data.table \
--use-original-qualities
gatk ApplyBQSR -I bwa_mappings/HG001_chr22_mrkdup_srt_22only.bam \
-R reference/Homo_sapiens_assembly38_chr22.fa \
-bqsr gatk_preprocessing/recal_data.table \
-O gatk_preprocessing/HG001_chr22_mrkdup_srt_recal.bam \
--add-output-sam-program-record \
--create-output-bam-md5 \
--use-original-qualities \
--static-quantized-quals 10 \
--static-quantized-quals 20 \
--static-quantized-quals 30
c) The entire error log if applicable.
-
--use-original-qualities is for samples that have already been processed by BQSR: when we re-run BQSR on them, we want the tool to use the original qualities (those stored in the OQ tag) rather than the previously recalibrated ones.
--static-quantized-quals determines which values the quals should be rounded off to. For example, if you set these values to 10, 20, 30, it will round off all the quals to one of these three values.
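To make the rounding concrete, here is a minimal Python sketch of nearest-level rounding as described above. It is an illustration only and not GATK's code: the function name `quantize_quals` is hypothetical, and the handling of ties and of values below the lowest level is an assumption that may differ from what ApplyBQSR actually does.

```python
def quantize_quals(quals, levels=(10, 20, 30)):
    """Round each base quality to the nearest of the given static levels.

    Assumption for illustration: ties go to the lower level, and values
    outside the range of levels snap to the closest end.
    """
    ordered = sorted(levels)
    # min() returns the first minimal element, so with ascending levels
    # a tie (e.g. 15 between 10 and 20) resolves to the lower level.
    return [min(ordered, key=lambda level: abs(level - q)) for q in quals]


example = [2, 14, 15, 26, 37]
print(quantize_quals(example))
```

With the levels 10, 20, 30 from the pipeline command, every quality in the output is one of those three values.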
-
Hello,
Thank you for your reply. What is the purpose of rounding off the quals? Why are we doing this? Thank you.
-
Hi mk
You can read more about this argument and other arguments in the tool docs here: https://gatk.broadinstitute.org/hc/en-us/articles/360040507871-ApplyBQSR#--static-quantized-quals
-
Hi, thank you. I read the docs before posting. It explains what the parameter does. My question was why should we do this (as is done in the implementation of the best practices). Thank you.
-
Hello,
I found a page in Legacy GATK forum that may be related to this option.
I cited a paragraph in this page below:
----
Static binning of base quality scores. In a nutshell, binning (or quantizing) the base qualities in a BAM file means that instead of recording all possible quality values separately, we group them into bins represented by a single value (by default, 10, 20, 30 or 40). By doing this we end up having to record fewer separate numbers, which through the magic of BAM compression yields substantially smaller files. The idea is that we don’t actually need to be able to differentiate between quality scores at a very high resolution -- if the binning scheme is set up appropriately, it doesn’t make any difference to the variant discovery process downstream. This is not a new concept, but now the GATK engine has an argument to enable binning quality scores during the base recalibration (BQSR) process using a static binning scheme that we have determined produces optimal results in our hands. The level of compression is of course adjustable if you’d like to set your own tradeoff between compression and base quality resolution. We have validated that this type of binning (with our chosen default parameters) does not have any noticeable adverse effect on germline variant discovery. However we are still looking into some possible effects on somatic variant discovery, so we can’t yet recommend binning for that application.
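The compression argument in the quoted paragraph can be illustrated with a small Python sketch. It is a toy under stated assumptions, not a measurement on real data: the qualities are uniformly random values between 2 and 40, `zlib` stands in for the DEFLATE-based compression used in BAM files, and the helper names `bin_quals` and `compressed_size` are hypothetical.

```python
import random
import zlib


def bin_quals(quals, levels=(10, 20, 30, 40)):
    """Map each quality to the nearest bin value (ties go to the lower bin)."""
    ordered = sorted(levels)
    return [min(ordered, key=lambda level: abs(level - q)) for q in quals]


def compressed_size(quals):
    """Size in bytes of the qualities after zlib (DEFLATE) compression."""
    return len(zlib.compress(bytes(quals)))


# Synthetic qualities; real base qualities are not uniformly distributed.
rng = random.Random(0)
raw = [rng.randint(2, 40) for _ in range(100_000)]
binned = bin_quals(raw)

raw_size = compressed_size(raw)
binned_size = compressed_size(binned)
print(f"raw: {raw_size} bytes, binned: {binned_size} bytes")
```

Because the binned stream contains only four distinct values, it carries less information per base and compresses to fewer bytes than the unbinned stream. The actual saving on a real BAM depends on the data.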
-
Thanks for posting the info you found, Shinichi Namba and helping out other GATK users!