Scatter / Gather for BaseRecalibrator on a single human WES dataset
Hello everyone! Please excuse me if this question is naïve: I'm still new to bioinformatics and GATK.
I am using the GATK4 suite to ultimately call germline variants on whole exome sequencing data, obtained from an Illumina NextSeq 550 sequencer. For a variety of reasons I cannot use the WDL/Cromwell setup recommended by the Best Practices, so I am trying to replicate the recommended workflow in Bash.
I would like to speed up the BQSR step by employing the Scatter / Gather strategy. However, studying this article (https://gatk.broadinstitute.org/hc/en-us/articles/360035890531-Base-Quality-Score-Recalibration-BQSR-), I've realized that BaseRecalibrator requires a lot of data to build a proper statistical model.
My question: is it okay to scatter the BaseRecalibrator job by chromosome if I analyze just one WES sample at a time? (I know that downstream I will need to perform joint genotyping with 30+ samples, but at the moment I'm preparing single-sample BAM files one by one.)
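The following minimal Python sketch shows what scattering BaseRecalibrator by chromosome could look like. It only builds the command lines and does not run them. Several things are assumptions and are not taken from the post: the chromosome names (`chr1` to `chr22` plus `chrX`, giving the 23 shards used in the naive estimate), all file names, and the gather step. That step is assumed to merge the per-chromosome tables with GATK's `GatherBQSRReports` tool, which is also what the Best Practices data-processing WDL does.

```python
import shlex

# Assumption: 22 autosomes plus chrX, i.e. 23 shards. Rename to match your
# reference (e.g. "1" instead of "chr1").
CHROMS = [f"chr{i}" for i in range(1, 23)] + ["chrX"]


def scatter_commands(ref, bam, known_sites, sample):
    """Scatter step: one BaseRecalibrator command per chromosome."""
    cmds = []
    for chrom in CHROMS:
        cmd = ["gatk", "BaseRecalibrator",
               "-R", ref,
               "-I", bam,
               "-L", chrom,  # restrict this shard to a single chromosome
               "-O", f"{sample}.{chrom}.recal.table"]
        for vcf in known_sites:
            cmd += ["--known-sites", vcf]
        cmds.append(cmd)
    return cmds


def gather_command(sample):
    """Gather step: merge the per-chromosome tables into one report."""
    cmd = ["gatk", "GatherBQSRReports"]
    for chrom in CHROMS:
        cmd += ["-I", f"{sample}.{chrom}.recal.table"]
    cmd += ["-O", f"{sample}.recal.table"]
    return cmd


if __name__ == "__main__":
    known = ["dbsnp.vcf.gz", "mills_indels.vcf.gz"]  # hypothetical file names
    for cmd in scatter_commands("ref.fasta", "sample.bam", known, "sample"):
        print(shlex.join(cmd))
    print(shlex.join(gather_command("sample")))
```

The printed lines can be pasted into a Bash script or dispatched in parallel. The single merged table from the gather step is what would then be passed to ApplyBQSR.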
The article above says that BaseRecalibrator expects each read group to have at least 100M bases. Naively dividing PF_HQ_ALIGNED_BASES (from the CollectAlignmentSummaryMetrics output) by 23 chromosomes gives more than 215 megabases per chromosome.
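The naive per-chromosome estimate can be written as a short Python check. This is a sketch under stated assumptions. The 100M threshold is quoted from the article, which states it per read group, while here it is applied per shard as in the post. The total used in the example is hypothetical and was chosen only to reproduce the ~215 Mb figure. An even split is optimistic because exome targets are unevenly distributed across chromosomes, so `shards_below` shows how real per-chromosome counts could be checked instead. The counts in that dictionary are ones you would have to supply yourself.

```python
MIN_BASES = 100_000_000  # "at least 100M bases" per read group, per the BQSR article


def naive_bases_per_shard(pf_hq_aligned_bases: int, n_shards: int = 23) -> float:
    """Even split of aligned bases over shards, as in the post's estimate."""
    return pf_hq_aligned_bases / n_shards


def shards_below(bases_by_shard: dict, minimum: int = MIN_BASES) -> list:
    """Sorted names of shards whose base count is under the threshold."""
    return sorted(name for name, n in bases_by_shard.items() if n < minimum)


if __name__ == "__main__":
    # Hypothetical total, chosen only to reproduce the ~215 Mb figure in the post.
    total = 215_000_000 * 23
    per_shard = naive_bases_per_shard(total)
    print(f"{per_shard / 1e6:.0f} Mb per shard; meets 100M? {per_shard >= MIN_BASES}")
    # prints "215 Mb per shard; meets 100M? True"
```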
Thank you!
— Alex.
-
Hi ,
The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.
Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.
We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.
For context, check out our support policy.
-
Bhanu, thank you for the response.
If anyone from the community has any insight or guidance to offer here, I'd appreciate it.