Hello, I’m confused about whether an interval list should be used in the data pre-processing workflow for whole exome sequencing data.
According to the GATK documentation article “When should I restrict my analysis to specific intervals?”, exome analysis should include a list of targets with padding, so as to exclude off-target noise. The documentation also lists BQSR as one of the steps that should be run with an interval list, because “the off-target sequencing data is uninformative and is a source of noise, therefore it should be eliminated.”
However, when I provided an interval list (from the manufacturer, with 100bp padding) for both BaseRecalibrator and ApplyBQSR, the resulting BAM failed to pass ValidateSamFile and threw multiple errors like this:
ERROR::MATE_NOT_FOUND:Read name (Some read name), Mate not found for paired read
While searching for a solution, I found this post in which someone ran into a similar problem. I’m still confused after reading the answers there. What I took from them is:
- Providing an interval list to BQSR is only for scatter-gather parallelism, not for subsetting specific genomic regions.
- BQSR should be run on all reads that contribute to the model.
If I understood them correctly, don’t these points contradict the GATK documentation on interval restriction?
Additional questions regarding this issue:
- If the off-target reads are merely noise, should I still include them in the BQSR model?
- If a read falls within the intervals while its mate does not, can I eliminate both reads, as “--SANITIZE true” does in RevertSam?
- Is there any tool other than RevertSam that can eliminate reads with missing mates? I’ve tried PrintReads with “ProperlyPairedReadFilter”, followed by FixMateInformation, but the output of these two steps still reports the missing mate error.
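To make the last two questions concrete, this is the behaviour I am after, written as a minimal pure-Python sketch rather than a real GATK or Picard invocation. The `Read` record and the `drop_orphans` helper are hypothetical names of mine, and the sketch assumes each paired template has exactly two records (no secondary or supplementary alignments).

```python
from collections import Counter
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical stand-in for a BAM record (not a GATK or htsjdk type)."""
    name: str     # query name shared by both mates of a pair
    paired: bool  # True if the read is flagged as paired


def drop_orphans(reads):
    """Keep unpaired reads, and paired reads whose mate is also present."""
    # Count how many paired records carry each query name.
    paired_counts = Counter(r.name for r in reads if r.paired)
    # A paired read survives only if both mates made it into the input.
    return [r for r in reads if not r.paired or paired_counts[r.name] == 2]


# Toy input: q2 lost its mate when the BAM was restricted to intervals.
reads = [
    Read("q1", True),
    Read("q1", True),
    Read("q2", True),
    Read("q3", False),
]
kept = drop_orphans(reads)
kept_names = [r.name for r in kept]
```

On a real BAM the same logic would have to take the query name and the paired flag from each record; I have not verified whether any existing tool does exactly this.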
When I ran BQSR and ApplyBQSR without intervals, no error occurred, but the processing time became extremely long.
Thank you for your time. Any help will be much appreciated!