Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

How do adapter sequences not contribute to alignment if the base quality is set to 2 with SamToFastq but BWA-MEM does not consider base quality scores?


  • Genevieve Brandt (she/her)

    Hi ISmolicz,

    Yes, it looks like BWA-MEM does not take into account base quality scores.

    In the tutorial, there is this wording:

    By specifying CLIPPING_ATTRIBUTE=XT and CLIPPING_ACTION=2, SamToFastq changes the quality scores of bases marked by XT to two--a rather low score in the Phred scale. This effectively removes the adapter portion of sequences from contributing to downstream read alignment and alignment scoring metrics.

    A quality score of 2 does not affect BWA, but it does affect the downstream HaplotypeCaller realignment and variant calling. You can read more about why this works in the BWA forum: https://sourceforge.net/p/bio-bwa/mailman/message/34410817/.
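For a sense of scale, the Phred scale maps a quality score Q to an error probability of 10^(-Q/10). The short Python sketch below (the function name is illustrative only) shows that a score of 2 corresponds to roughly a 63% chance that the base call is wrong, so a tool that weighs base qualities has little reason to trust such bases.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


# Compare the adapter score of 2 with some more typical quality values.
for q in (2, 10, 20, 30):
    print(f"Q{q}: error probability = {phred_to_error_probability(q):.4f}")
```

A tool that ignores base qualities, as BWA-MEM is described as doing in this thread, would see the same bases regardless of this value.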

    Best,

    Genevieve

  • ISmolicz

    Thank you for your reply Genevieve Brandt (she/her).

    I think it would be useful to clarify in the documentation that adapters are not removed when data are processed through BWA-MEM with CLIPPING_ACTION=2, and to specify the GATK tools in which base quality is considered.

    From the options available, it appears that there is no option in the workflow "(How to) Map and clean up short read sequence data efficiently" to prevent adapters from affecting both alignment to the reference genome and downstream steps.

    If one removes adapters with CLIPPING_ACTION = X, this will prevent interference with alignment to the reference genome but not downstream steps, as hard-clips are changed to soft-clips with MergeBamAlignment and original sequences are restored. However, if base quality scores for adapters are reduced with CLIPPING_ACTION = 2, adapters will not be removed during alignment to the reference but the lower scores will affect downstream analyses (if lower scores are maintained post-MergeBamAlignment - awaiting confirmation in a separate post).
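The difference between the clipping actions can be made concrete with a toy model. The Python sketch below is not Picard's implementation. It follows the SamToFastq documentation as I understand it, where "X" trims the read, "N" masks bases, and an integer overwrites base qualities. It also assumes that the XT value is the 1-based position of the first adapter base. The read, qualities, and function name are hypothetical.

```python
def apply_clipping_action(bases: str, quals: list[int], xt: int, action: str) -> tuple[str, list[int]]:
    """Toy model of the SamToFastq clipping actions on a single read.

    xt is ASSUMED to be the 1-based position of the first adapter base.
    """
    start = xt - 1  # convert to a 0-based index
    if action == "X":
        # Trim bases and qualities from the adapter start onward.
        return bases[:start], quals[:start]
    if action == "N":
        # Keep the read length but mask the adapter bases as N.
        return bases[:start] + "N" * (len(bases) - start), list(quals)
    # Any integer: keep the bases, overwrite the adapter qualities with that value.
    new_qual = int(action)
    return bases, quals[:start] + [new_qual] * (len(quals) - start)


# Hypothetical 16-base read whose last 6 bases stand in for adapter sequence.
read = "ACGTACGTAGATCGGA"
quals = [30] * len(read)

for action in ("X", "N", "2"):
    new_bases, new_quals = apply_clipping_action(read, quals, xt=11, action=action)
    print(action, new_bases, new_quals)
```

With action "2" the adapter bases are still present in the output, so an aligner that ignores qualities sees them unchanged. With "X" they are absent from the output altogether.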

    It seems that one would need to remove adapters using an external tool prior to commencing the above workflow and generating the unmapped BAM to fully remove adapters in all steps.

  • Genevieve Brandt (she/her)

    Thanks for the feedback ISmolicz. I'm still trying to track down more details about why the options you have brought up are recommended and I will respond to your other post when I am able to get more answers.

    However, the MergeBamAlignment step of the tutorial mentions how to clip adapters for the final clean BAM. By default, MergeBamAlignment has CLIP_ADAPTERS=true, which clips adapters rather than just adjusting their quality scores. So if you want to clip the adapters, you should not change CLIP_ADAPTERS to false, as the tutorial does.
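As a rough illustration of leaving CLIP_ADAPTERS at its default versus overriding it, the Python sketch below only builds a Picard-style argument list and prints it, without running anything. The file names, the `picard.jar` location, and the helper function are hypothetical. Only the MergeBamAlignment argument names are taken from Picard.

```python
import shlex


def merge_bam_alignment_args(unmapped_bam, aligned_bam, reference, output, clip_adapters=None):
    """Build a MergeBamAlignment command line (legacy KEY=value syntax).

    clip_adapters=None leaves CLIP_ADAPTERS at the tool's default (true).
    """
    args = [
        "java", "-jar", "picard.jar", "MergeBamAlignment",  # jar path is an assumption
        f"UNMAPPED_BAM={unmapped_bam}",
        f"ALIGNED_BAM={aligned_bam}",
        f"REFERENCE_SEQUENCE={reference}",
        f"OUTPUT={output}",
    ]
    if clip_adapters is not None:
        # Only emit the flag when the caller explicitly overrides the default.
        args.append(f"CLIP_ADAPTERS={'true' if clip_adapters else 'false'}")
    return args


# Hypothetical file names, for illustration only.
default_cmd = merge_bam_alignment_args("unmapped.bam", "aligned.bam", "ref.fasta", "merged.bam")
tutorial_cmd = merge_bam_alignment_args(
    "unmapped.bam", "aligned.bam", "ref.fasta", "merged.bam", clip_adapters=False
)
print(shlex.join(default_cmd))
print(shlex.join(tutorial_cmd))
```

The first command relies on the default and so soft-clips adapters. The second mirrors the tutorial's override and leaves them unclipped.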

  • ISmolicz

    Thank you for your reply Genevieve Brandt (she/her). I completely understand it is taking time to answer my other query so will await an update once information is available.

    Although MergeBamAlignment has the option CLIP_ADAPTERS=true, as you mentioned, my understanding is that this is only soft-clipping, so ultimately the adapters would still be present rather than removed. Or does CLIP_ADAPTERS=true in fact specify hard-clipping? The MergeBamAlignment documentation states:

    • CLIP_ADAPTERS -- Whether to (soft-)clip the ends of the reads that are identified as belonging to adapters

    Thank you again.

  • Genevieve Brandt (she/her)

    Yes, MergeBamAlignment soft clips adapters with CLIP_ADAPTERS=true.
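The practical difference between soft- and hard-clipping comes from the SAM format: soft-clipped bases (CIGAR operator S) remain in the record's SEQ field, whereas hard-clipped bases (operator H) are absent from it. The Python sketch below is a toy model with a hypothetical read and function name. It assumes that the non-adapter part of the read aligns as a simple run of matches.

```python
def clip_read(seq: str, adapter_start: int, hard: bool) -> tuple[str, str]:
    """Return (SEQ, CIGAR) after clipping from adapter_start (0-based) to the read end."""
    kept = adapter_start
    clipped = len(seq) - adapter_start
    if clipped == 0:
        # Nothing to clip: the whole read is treated as aligned.
        return seq, f"{kept}M"
    if hard:
        # Hard clip: adapter bases are dropped from SEQ and recorded as H.
        return seq[:adapter_start], f"{kept}M{clipped}H"
    # Soft clip: adapter bases stay in SEQ and are recorded as S.
    return seq, f"{kept}M{clipped}S"


# Hypothetical 16-base read whose last 6 bases stand in for adapter sequence.
read = "ACGTACGTAGATCGGA"
print(clip_read(read, 10, hard=False))
print(clip_read(read, 10, hard=True))
```

In the soft-clipped record the adapter bases are still stored and can be recovered later, which matches the concern raised above that they are "still present".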

    In our best practices pipeline for data pre-processing, we do not recommend changing the CLIP_ADAPTERS parameter, though what you ultimately decide depends on your research and your data.

    I am looking into the tutorial. Unfortunately, the original author is not around to explain exactly why that parameter was changed, so I'm not sure when I will be able to find out. I can recommend that our team look over the tutorial sometime in the future to verify that the methods are up to date.

    Our most up-to-date best practices are in WDL (Workflow Description Language) format and can be found on our gatk-workflows GitHub.

    Best,

    Genevieve
