I am having problems running BaseRecalibrator
I am analyzing whole-genome sequencing data from saffron (C. cartwrightianus) plants. However, GATK's BaseRecalibrator requires a VCF file of known variant sites, and no such file is available for this plant. Can you please advise how to proceed?
Thanks in advance
-
BQSR is still possible without any public variant dataset, using a method we call bootstrapping.
Our BQSR documentation summarizes this method in the "No excuses" section:
https://gatk.broadinstitute.org/hc/en-us/articles/360035890531-Base-Quality-Score-Recalibration-BQSR
Basically, you call variants on your original, unrecalibrated BAM file and keep only the highest-confidence variants. Using these variants as known sites, you perform a first round of BQSR, then check the covariates to see whether you are reaching convergence, that is, whether the reported quality scores are getting closer to the empirical quality scores. You can repeat the calling and recalibration steps multiple times until they converge. Keep in mind that if your samples are highly heterogeneous, convergence may be harder to reach.
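The bootstrapping loop described above can be sketched in Python as a dry run that only prints GATK4 command lines. This is a minimal sketch, not an official GATK workflow. The following are all illustrative assumptions: the file names (`ref.fa`, `sample.bam`, `roundN.*`), the hard-filter thresholds used to pick "very high quality" variants, recalibrating the original BAM in every round, and the simplified convergence check. The convergence helpers expect (reported quality, mismatches, observations) bins that you would take from the quality-score table of a recalibration report. The sketch does not parse that report format.

```python
import math
import shlex


def bootstrap_round_commands(ref, original_bam, call_bam, round_no):
    """Build (but do not run) the GATK4 commands for one bootstrapping round.

    Assumption: variants are called on `call_bam` (the original BAM in round 1,
    the previous round's recalibrated BAM afterwards), while recalibration is
    always applied to `original_bam`. All file names are hypothetical.
    """
    prefix = f"round{round_no}"
    raw_vcf = f"{prefix}.raw.vcf.gz"
    flagged_vcf = f"{prefix}.flagged.vcf.gz"
    sites_vcf = f"{prefix}.sites.vcf.gz"
    table = f"{prefix}.recal.table"
    recal_bam = f"{prefix}.recal.bam"
    return [
        # 1. Call variants on the current BAM.
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", call_bam, "-O", raw_vcf],
        # 2. Flag low-confidence calls. These thresholds are assumptions
        #    borrowed from common SNP hard filters; tune them for your data.
        ["gatk", "VariantFiltration", "-R", ref, "-V", raw_vcf, "-O", flagged_vcf,
         "--filter-expression", "QD < 2.0", "--filter-name", "QD2",
         "--filter-expression", "FS > 60.0", "--filter-name", "FS60",
         "--filter-expression", "MQ < 40.0", "--filter-name", "MQ40"],
        # 3. Keep only the calls that passed every filter.
        ["gatk", "SelectVariants", "-R", ref, "-V", flagged_vcf,
         "--exclude-filtered", "-O", sites_vcf],
        # 4. Use those calls as the known-sites resource for BQSR.
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", original_bam,
         "--known-sites", sites_vcf, "-O", table],
        # 5. Write the recalibrated BAM used for the next round of calling.
        ["gatk", "ApplyBQSR", "-R", ref, "-I", original_bam,
         "--bqsr-recal-file", table, "-O", recal_bam],
    ]


def empirical_phred(mismatches, observations):
    """Simplified empirical quality, -10*log10(error rate).

    Uses a +1/+2 pseudocount to avoid log(0); this is not GATK's exact model.
    """
    rate = (mismatches + 1) / (observations + 2)
    return -10 * math.log10(rate)


def mean_quality_gap(bins):
    """Observation-weighted mean |reported - empirical| over (q, mm, obs) bins."""
    total = sum(obs for _, _, obs in bins)
    gap = sum(obs * abs(q - empirical_phred(mm, obs)) for q, mm, obs in bins)
    return gap / total


def converged(gap, tol=1.0):
    """Hypothetical stopping rule: reported and empirical agree within tol."""
    return gap < tol


if __name__ == "__main__":
    # Dry run of two rounds: round 2 calls variants on round 1's output BAM.
    call_bam = "sample.bam"
    for n in (1, 2):
        cmds = bootstrap_round_commands("ref.fa", "sample.bam", call_bam, n)
        for cmd in cmds:
            print(shlex.join(cmd))
        call_bam = cmds[-1][-1]  # this round's recalibrated BAM
```

GATK's AnalyzeCovariates tool plots reported against empirical quality for you; the numeric gap here only makes that comparison explicit so it can serve as a stopping rule.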
I hope this helps.