Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

How do adapter sequences not contribute to alignment if the base quality is set to 2 with SamToFastq but BWA-MEM does not consider base quality scores?

Answered

  • Genevieve Brandt (she/her)

    Hi ISmolicz,

    Yes, it looks like BWA-MEM does not take into account base quality scores.

    In the tutorial, there is this wording:

    By specifying CLIPPING_ATTRIBUTE=XT and CLIPPING_ACTION=2, SamToFastq changes the quality scores of bases marked by XT to two--a rather low score in the Phred scale. This effectively removes the adapter portion of sequences from contributing to downstream read alignment and alignment scoring metrics.

    A quality score of 2 does not affect BWA, but it does affect the downstream HaplotypeCaller realignment and variant calling. You can read more about why this works in the BWA forum: https://sourceforge.net/p/bio-bwa/mailman/message/34410817/.
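
    For context on why Q2 is "a rather low score": a Phred quality Q encodes an estimated base-call error probability P = 10^(-Q/10), so Q2 means roughly a 63% chance that the call is wrong. A minimal Python sketch of that conversion (the function names are illustrative, not part of any GATK or Picard API):

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call is wrong, given its Phred quality q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion: Phred quality for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (2, 10, 20, 30):
        print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

    Q2 works out to an error probability of about 0.63, versus 0.001 for Q30, which is why quality-aware tools such as HaplotypeCaller give such bases very little weight while a quality-blind aligner treats them like any other base.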

    Best,

    Genevieve

  • ISmolicz

    Thank you for your reply Genevieve Brandt (she/her).

    I think it would be useful for the documentation to clarify that, when CLIPPING_ACTION = 2 is applied, the adapters are not removed from consideration during alignment with BWA-MEM, and to specify which GATK tools do take base quality into account.

    Of the options available, it appears that the workflow in (How to) Map and clean up short read sequence data efficiently offers no option that prevents adapters from affecting both alignment to the reference genome and downstream steps.

    If one removes adapters with CLIPPING_ACTION = X, this will prevent interference with alignment to the reference genome but not downstream steps, as hard-clips are changed to soft-clips with MergeBamAlignment and original sequences are restored. However, if base quality scores for adapters are reduced with CLIPPING_ACTION = 2, adapters will not be removed during alignment to the reference but the lower scores will affect downstream analyses (if lower scores are maintained post-MergeBamAlignment - awaiting confirmation in a separate post).
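
    The difference between the two options can be modelled on a single read. This is a toy Python sketch based on my reading of the SamToFastq CLIPPING_ACTION description ("X" trims bases and qualities at the clip position, "N" changes clipped bases to N, an integer sets the clipped qualities to that value); it assumes the clipping attribute (XT here) holds the 1-based position where the adapter starts, and it is not Picard's implementation:

```python
def apply_clipping_action(seq: str, quals: list[int], clip_pos: int, action: str) -> tuple[str, list[int]]:
    """Toy model of a CLIPPING_ACTION-style operation on a single read.

    clip_pos: assumed 1-based position where the clipped (adapter) region starts.
    action:   "X" trims bases and qualities, "N" turns clipped bases into N,
              a number such as "2" overwrites the clipped qualities with that value.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    start = max(clip_pos - 1, 0)   # 0-based index of the first clipped base
    n_clipped = len(seq[start:])   # 0 if clip_pos lies past the end of the read

    if action == "X":
        # Bases and qualities are removed, so the aligner never sees the adapter.
        return seq[:start], quals[:start]
    if action == "N":
        # Bases are masked with N; this sketch leaves the qualities unchanged.
        return seq[:start] + "N" * n_clipped, list(quals)
    if action.isdigit():
        # Bases are kept and only the qualities drop, so an aligner that
        # ignores qualities (such as BWA-MEM) still sees the adapter bases.
        return seq, quals[:start] + [int(action)] * n_clipped
    raise ValueError(f"unsupported action: {action!r}")


if __name__ == "__main__":
    read, quals = "ACGTAGATCG", [30] * 10
    for action in ("X", "N", "2"):
        print(action, apply_clipping_action(read, quals, clip_pos=5, action=action))
```

    Only the "X" branch changes what the aligner is given; the integer branch changes nothing that BWA-MEM reads.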

    It seems that one would need to remove adapters using an external tool prior to commencing the above workflow and generating the unmapped BAM to fully remove adapters in all steps.
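
    Trimming before building the unmapped BAM amounts to cutting each read where the adapter begins. A simplified Python sketch of 3' adapter trimming with exact matching only, assuming a known adapter sequence (AGATCGGAAGAGC, the common start of the Illumina TruSeq adapters, used purely as an example) and a hypothetical minimum overlap of 3 bases; dedicated tools such as Trimmomatic use error-tolerant matching and are what you would actually run:

```python
def find_adapter_start(read: str, adapter: str, min_overlap: int = 3) -> int:
    """Return the 0-based index where the adapter begins in read, or len(read) if none.

    Scans start positions left to right; a hit is either the full adapter inside
    the read or a prefix of the adapter running off the 3' end. Exact matches only.
    """
    for i in range(len(read)):
        overlap = min(len(adapter), len(read) - i)
        if overlap < min_overlap:
            break  # later positions only have shorter overlaps
        if read[i:i + overlap] == adapter[:overlap]:
            return i
    return len(read)


def trim_adapter(read: str, quals: str, adapter: str, min_overlap: int = 3) -> tuple[str, str]:
    """Hard-trim the read and its FASTQ quality string at the detected adapter start."""
    cut = find_adapter_start(read, adapter, min_overlap)
    return read[:cut], quals[:cut]


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"
    read = "ACGTACGTAC" + "AGATCGG"  # 10 insert bases followed by a partial adapter
    print(trim_adapter(read, "I" * len(read), adapter))
```

    Because the bases are gone from the FASTQ, neither the aligner nor any later tool can see them, which is the "fully remove" behaviour described above.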

  • Genevieve Brandt (she/her)

    Thanks for the feedback ISmolicz. I'm still trying to track down more details about why the options you have brought up are recommended and I will respond to your other post when I am able to get more answers.

    However, the MergeBamAlignment step of the tutorial does mention how to clip adapters for the final clean BAM. By default, MergeBamAlignment has CLIP_ADAPTERS=true, which clips adapters rather than just adjusting their quality. So if you want the adapters clipped, you should not set CLIP_ADAPTERS to false as the tutorial does.

  • ISmolicz

    Thank you for your reply Genevieve Brandt (she/her). I completely understand it is taking time to answer my other query so will await an update once information is available.

    Although MergeBamAlignment has the option CLIP_ADAPTERS=true, as you mentioned, my understanding is that this only soft-clips, so the adapters would ultimately still be present rather than removed. Or does CLIP_ADAPTERS=true in fact specify hard-clipping? The MergeBamAlignment documentation states:

    • CLIP_ADAPTERS -- Whether to (soft-)clip the ends of the reads that are identified as belonging to adapters

    Thank you again.

  • Genevieve Brandt (she/her)

    Yes, MergeBamAlignment soft clips adapters with CLIP_ADAPTERS=true.
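
    To make the soft-clip versus removal distinction concrete: soft-clipping keeps the adapter bases in the record's SEQ and QUAL and only marks them with S operations in the CIGAR, whereas hard-clipping deletes them from SEQ and records them with H. A rough Python sketch of soft-clipping the last n bases of an alignment, assuming a forward-strand read (so the adapter lies at the right-hand end in reference orientation and POS does not change); the function names are hypothetical and this is not MergeBamAlignment's code:

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '60M2D40M' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def soft_clip_3prime(cigar: str, n_clip: int) -> str:
    """Soft-clip the last n_clip read bases of a forward-strand alignment.

    SEQ and QUAL are untouched; only the CIGAR changes, so the adapter bases stay
    in the record but are no longer treated as aligned.
    """
    if n_clip <= 0:
        return cigar
    ops = parse_cigar(cigar)
    trailing_hard = []
    while ops and ops[-1][1] == "H":          # hard clips must stay outermost
        trailing_hard.insert(0, ops.pop())

    remaining, soft = n_clip, 0
    while remaining > 0 and ops:
        length, op = ops.pop()
        if op == "S":                          # bases that are already soft-clipped
            soft += length
            remaining = max(remaining - length, 0)
        elif op in "MI=X":                     # ops that consume read bases
            take = min(length, remaining)
            soft += take
            remaining -= take
            if take < length:
                ops.append((length - take, op))
        # D, N and P consume no read bases, so inside the clipped region they are dropped.

    # A clip should not sit next to a deletion or skip; a trailing insertion is
    # folded into the soft clip because its bases are read bases.
    while ops and ops[-1][1] in "DNPI":
        length, op = ops.pop()
        if op == "I":
            soft += length

    if not any(op in "M=X" for _, op in ops):
        raise ValueError("clipping would leave no aligned bases")
    if soft:
        ops.append((soft, "S"))
    return "".join(f"{n}{op}" for n, op in ops + trailing_hard)


if __name__ == "__main__":
    print(soft_clip_3prime("100M", 20))
```

    For a reverse-strand read the adapter would sit at the left-hand end of SEQ instead, and clipping there would also shift POS; the sketch deliberately leaves that case out.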

    In our best practices pipeline for data pre-processing, we do not recommend changing the CLIP_ADAPTERS parameter, though what you ultimately decide depends on your research and your data.

    I am looking into the tutorial; unfortunately, the original author is not available to explain why exactly that parameter was changed, so I'm not sure when I will have an answer. I can ask our team to review the tutorial in the future to verify that the methods are up to date.

    Our most up-to-date best practices are in WDL (Workflow Description Language) format and can be found on our gatk-workflows GitHub.

    Best,

    Genevieve

  • kescullator

    Hello,

    Sorry to revive an old discussion, but I wondered whether this has been looked into in the years since. I note that the tutorial was updated on June 25 this year, but its advice still seems to partly disagree with the best practices WDL (if I'm reading the WDL correctly; it is much harder to read than the tutorial...). ISmolicz's point about BWA-MEM ignoring base quality scores still seems pertinent: the WDL still doesn't seem to set CLIPPING_ACTION = 2, although I think the WDL now agrees that CLIP_ADAPTERS should be set to false. That means the adapters are not even soft-clipped; could this not cause problems downstream?

    Or are we supposed to hard clip adapters prior to these steps? It's not clear to me whether GATK expects us to have trimmed adapters with something like Trimmomatic before inputting the data to the GATK workflows. This may be a simpler solution to the adapter problem, unless there's some reason not to do this? 

    The other potential issue I see with the CLIPPING_ACTION = 2 strategy is that BaseRecalibrator uses the quality scores to build a model or some other statistical magic... couldn't messing with the quality score interfere with this? 

    I'd greatly appreciate any help figuring this out.

    Thanks heaps,

    Kate 

  • Gökalp Çelik

    Hi kescullator

    Adapter cleanup is more a matter of perspective than a hard-and-fast rule. Some prefer to keep adapter sequences, albeit with low base call qualities, so that they do not count in downstream calculations; others prefer to remove them completely, either during the instrument/vendor demultiplexing stage or in a pre-processing step using dedicated adapter-trimming tools.

    All approaches are welcome and have their own merits. It is true that BWA-MEM does not use base call qualities; only the downstream applications take them into account. BaseRecalibrator may be affected by the presence of low-quality adapter contaminants in the data, but that remains for the researcher to investigate. BaseRecalibrator applies its own covariates to determine what does and does not count towards a successful recalibration to reach convergence. BQSR is also optional and is removed entirely from our DRAGEN-GATK Functional Equivalence workflows.
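
    One way to think about the BaseRecalibrator concern: the reported quality score is itself one of the recalibration covariates, so bases that were set to Q2 are tallied separately from genuinely high-quality bases. A toy Python sketch of that binning idea with made-up counts; the (mismatches + 1) / (total + 2) smoothing and the data are assumptions for illustration, and GATK's actual model and its other covariates (read group, sequence context, machine cycle) are more involved:

```python
import math
from collections import defaultdict


def empirical_quality_by_bin(observations):
    """observations: iterable of (reported_q, is_mismatch) pairs, one per base.

    Returns {reported_q: empirical_phred}, using a smoothed (mismatches + 1) /
    (total + 2) error rate so error-free bins stay finite. Toy illustration only.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [mismatches, total]
    for q, is_mismatch in observations:
        counts[q][0] += int(is_mismatch)
        counts[q][1] += 1
    return {
        q: -10 * math.log10((mm + 1) / (total + 2))
        for q, (mm, total) in sorted(counts.items())
    }


if __name__ == "__main__":
    # Made-up counts: 1,000 genuine Q30 bases with 1 mismatch, plus 50 adapter
    # bases whose qualities were set to 2, 40 of which mismatch the reference.
    obs = [(30, i == 0) for i in range(1000)] + [(2, i < 40) for i in range(50)]
    for q, emp in empirical_quality_by_bin(obs).items():
        print(f"reported Q{q}: empirical Q{emp:.1f}")
```

    In this toy, the Q30 bin's estimate is identical with or without the Q2 adapter bases, because they never share a bin; pooling all bases together would drag the estimate down considerably.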

    I hope it helps. 
