BQSR: Do read filters used by BQSR tools need to be applied to data prior to assessing the number of bases per read group?
Dear GATK Team,
In the Base Quality Score Recalibration (BQSR) documentation, the following is described:
We usually expect to see more than 100M bases per read group; as a rule of thumb, larger numbers will work better.
Given that default read filters (listed below) are applied when running BaseRecalibrator and ApplyBQSR, should the number of bases per read group be assessed after these filters have been applied to the data? If so, how do I calculate this figure, taking the read filters into account, to check that enough data is present to run BQSR?
BaseRecalibrator:
- NotSecondaryAlignmentReadFilter
- PassesVendorQualityCheckReadFilter
- MappedReadFilter
- MappingQualityAvailableReadFilter
- NotDuplicateReadFilter
- MappingQualityNotZeroReadFilter
- WellformedReadFilter
ApplyBQSR:
Thank you for your time and help.
Kind regards.
Official comment
Hi ISmolicz,
The 100M bases per read group rule is not a hard cutoff. If you have around 100M bases per read group but then lose many reads to filtering, you might see issues; if most of your reads pass the filters, BQSR should run fine. If you are worried about your read counts, you can check the recalibration plots to see how well the recalibration worked.
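If you want a ballpark figure for the post-filter base count per read group, one rough approach (a sketch only, not an official GATK recipe) is samtools plus awk. The flag mask 1796 excludes unmapped (4), secondary (256), vendor-QC-failed (512), and duplicate (1024) reads, -q 1 drops mapping quality 0, and the awk check drops mapping quality 255 ("unavailable"); WellformedReadFilter has no simple samtools equivalent, so the result is approximate. input.bam is a placeholder for your file:

    samtools view -F 1796 -q 1 input.bam \
      | awk '{
          rg = "none"
          # the RG:Z: tag, if present, sits among the optional fields (column 12 onward)
          for (i = 12; i <= NF; i++)
            if ($i ~ /^RG:Z:/) { rg = substr($i, 6); break }
          if ($5 != 255)               # MappingQualityAvailableReadFilter: MAPQ 255 means "unavailable"
            bases[rg] += length($10)   # column 10 is the read sequence
        }
        END { for (rg in bases) print rg "\t" bases[rg] }'

To look at the plots, you can run a second pass of BaseRecalibrator on the recalibrated BAM and then compare the two tables with AnalyzeCovariates (the table and PDF names below are placeholders):

    gatk AnalyzeCovariates \
        -before recal_pass1.table \
        -after recal_pass2.table \
        -plots recalibration_plots.pdf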
Best,
Genevieve
Hi ISmolicz,
The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.
Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.
We cannot guarantee a reply; however, we encourage other community members to help out if they know the answer.
For context, check out our support policy.