How do adapter sequences not contribute to alignment if the base quality is set to 2 with SamToFastq but BWA-MEM does not consider base quality scores?
Dear GATK Team,
In the tutorial (How to) Map and clean up short read sequence data efficiently, the documentation states that changing the base quality scores to 2 for a clipped region 'effectively removes the adapter portion of sequences from contributing to downstream read alignment and alignment scoring metrics'. However, from my understanding, BWA-MEM does not take into account base quality scores. Therefore, how does this work?
Thank you for your time and help.
Kind regards.
-
Hi ISmolicz,
Yes, it looks like BWA-MEM does not take into account base quality scores.
In the tutorial, there is this wording:
By specifying CLIPPING_ATTRIBUTE=XT and CLIPPING_ACTION=2, SamToFastq changes the quality scores of bases marked by XT to two--a rather low score in the Phred scale. This effectively removes the adapter portion of sequences from contributing to downstream read alignment and alignment scoring metrics.
A quality score of 2 does not affect BWA, but it does affect the downstream HaplotypeCaller realignment and variant calling. You can read more about why this works in the BWA forum: https://sourceforge.net/p/bio-bwa/mailman/message/34410817/.
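To make the difference between the CLIPPING_ACTION settings concrete, here is a minimal Python sketch of what each one does to a single read. It is a toy model, not Picard's code: the function name `apply_clipping_action` is hypothetical, and it assumes the XT tag holds the 1-based position where the adapter starts.

```python
def apply_clipping_action(seq, quals, xt, action):
    """Toy model of SamToFastq clipping for one read (hypothetical helper).

    seq    -- read bases as a string
    quals  -- list of Phred quality scores, one per base
    xt     -- assumed 1-based position where the adapter starts
    action -- "X" (trim), "N" (mask bases), or an integer string such as "2"
    """
    start = xt - 1  # convert to a 0-based index
    if action == "X":
        # Trim bases and qualities at the clipped position.
        return seq[:start], quals[:start]
    if action == "N":
        # Replace adapter bases with N; qualities are left alone.
        return seq[:start] + "N" * (len(seq) - start), list(quals)
    # Any integer: keep the bases, overwrite qualities in the clipped region.
    q = int(action)
    return seq, quals[:start] + [q] * (len(quals) - start)


read = "ACGTACGTAGAT"   # last four bases stand in for adapter
quals = [30] * len(read)

trimmed = apply_clipping_action(read, quals, 9, "X")
masked = apply_clipping_action(read, quals, 9, "N")
lowered = apply_clipping_action(read, quals, 9, "2")
```

With CLIPPING_ACTION=2 the adapter bases are still in the read that goes to the aligner; only their qualities change, which is why a quality-blind aligner sees no difference.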
Best,
Genevieve
-
Thank you for your reply Genevieve Brandt (she/her).
I think it would be useful to clarify in the documentation that the adapters would not be removed when processing data through BWA-MEM when applying CLIPPING_ACTION = 2 and specify the GATK tools where base quality is considered.
From the options available, it appears that there is no option in the workflow (How to) Map and clean up short read sequence data efficiently to prevent adapters from affecting both alignment to the reference genome and downstream steps.
If one removes adapters with CLIPPING_ACTION = X, this will prevent interference with alignment to the reference genome but not downstream steps, as hard-clips are changed to soft-clips with MergeBamAlignment and original sequences are restored. However, if base quality scores for adapters are reduced with CLIPPING_ACTION = 2, adapters will not be removed during alignment to the reference but the lower scores will affect downstream analyses (if lower scores are maintained post-MergeBamAlignment - awaiting confirmation in a separate post).
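The trade-off described above can be sketched with a toy scoring function that, like BWA-MEM, ignores base qualities. This is only an illustration under stated assumptions: the function `quality_blind_score` and the example sequences are hypothetical, the scoring is ungapped and end-to-end, and the match score of 1 and mismatch penalty of 4 simply mirror BWA-MEM's defaults. A real local aligner may soft-clip a mismatching tail instead of penalising it.

```python
def quality_blind_score(read, ref, match=1, mismatch=4):
    """Ungapped alignment score that never looks at base qualities."""
    return sum(match if a == b else -mismatch for a, b in zip(read, ref))


ref = "ACGTACGTTTGA"    # hypothetical reference window
read = "ACGTACGTAGAT"   # 8 genomic bases followed by 4 adapter bases
adapter_start = 8       # 0-based start of the adapter (assumed)

# CLIPPING_ACTION=2: qualities drop to 2, but the bases are unchanged,
# so a quality-blind score is the same as for the untouched read.
score_quality_2 = quality_blind_score(read, ref)

# CLIPPING_ACTION=X: the adapter bases are removed before alignment.
score_trimmed = quality_blind_score(read[:adapter_start], ref)

print(score_quality_2, score_trimmed)  # -8 8
```

In this toy case only trimming changes the score, which matches the point that lowering qualities affects quality-aware downstream tools rather than the alignment itself.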
It seems that one would need to remove adapters using an external tool prior to commencing the above workflow and generating the unmapped BAM to fully remove adapters in all steps.
-
Thanks for the feedback ISmolicz. I'm still trying to track down more details about why the options you have brought up are recommended and I will respond to your other post when I am able to get more answers.
However, the MergeBamAlignment step of the tutorial mentions how to clip adapters for the final clean BAM. By default, MergeBamAlignment sets CLIP_ADAPTERS=true, which clips adapters rather than just adjusting the quality. So if you want to clip the adapters, you should not change CLIP_ADAPTERS to false, as the tutorial does.
-
Thank you for your reply Genevieve Brandt (she/her). I completely understand it is taking time to answer my other query so will await an update once information is available.
Although MergeBamAlignment has the option CLIP_ADAPTERS=true as you have mentioned, from my understanding this is only soft-clipping and therefore, ultimately the adapters would still be present and not removed? Or is CLIP_ADAPTERS=true in fact specifying hard-clipping? The MergeBamAlignment documentation states:
- CLIP_ADAPTERS -- Whether to (soft-)clip the ends of the reads that are identified as belonging to adapters
Thank you again.
-
Yes, MergeBamAlignment soft clips adapters with CLIP_ADAPTERS=true.
In our best practices pipeline for data pre-processing, we do not recommend changing the CLIP_ADAPTERS parameter, though what you ultimately decide depends on your research and your data.
I am looking into the tutorial, but unfortunately the original author is not around to explain why exactly that parameter was changed, so I'm not sure when I will be able to find out. I can recommend that our team look over the tutorial sometime in the future to verify that the methods are up to date.
Our most up-to-date best practices are in WDL (Workflow Description Language) format and can be found on our gatk-workflows GitHub.
Best,
Genevieve