Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

(new user) Question about BAM preparation and MUTECT 2 for special use

5 comments

  • Gökalp Çelik

    Hi Tanya Sarkin Jain

    By default, we recommend running BQSR on all kinds of samples. However, if you are concerned that Mutect2 may undercall your genome edits, we recommend keeping both the pre- and post-recalibration BAM files, if you have the resources, and running your comparison on both sets of files to get a sense of what, if anything, is missed.
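A minimal sketch of the two-track workflow described above, written as a Python helper that only builds the GATK command lines and does not run them. The file names (ref.fasta, sample.bam, known_sites.vcf.gz) and the helper name are hypothetical, and the Mutect2 call is deliberately bare: a real run would likely add further arguments and a filtering step, which are omitted here.

```python
from pathlib import Path


def build_comparison_commands(reference, raw_bam, known_sites, out_dir):
    """Return GATK command lines for calling on pre- and post-BQSR BAMs.

    Nothing is executed here; each command is a list of arguments that
    could be passed to subprocess.run(cmd, check=True).
    """
    out = Path(out_dir)
    recal_table = out / "recal.table"
    recal_bam = out / "post_bqsr.bam"

    commands = [
        # 1. Build the recalibration model from the original BAM.
        ["gatk", "BaseRecalibrator", "-R", str(reference), "-I", str(raw_bam),
         "--known-sites", str(known_sites), "-O", str(recal_table)],
        # 2. Write a recalibrated copy; the original BAM is left untouched.
        ["gatk", "ApplyBQSR", "-R", str(reference), "-I", str(raw_bam),
         "--bqsr-recal-file", str(recal_table), "-O", str(recal_bam)],
    ]
    # 3. Call variants on both BAMs with otherwise identical settings.
    for label, bam in (("pre_bqsr", raw_bam), ("post_bqsr", recal_bam)):
        commands.append(["gatk", "Mutect2", "-R", str(reference),
                         "-I", str(bam), "-O", str(out / f"{label}.vcf.gz")])
    return commands


if __name__ == "__main__":
    # Hypothetical inputs, printed as shell-style command lines.
    for cmd in build_comparison_commands(
            "ref.fasta", "sample.bam", "known_sites.vcf.gz", "out"):
        print(" ".join(cmd))
```

Keeping the two Mutect2 invocations identical apart from the input BAM means any difference between the resulting call sets can be attributed to recalibration.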

    I hope this helps. 

  • Tanya Sarkin Jain

    Thank you. Do you have any recommendations on what to look for when doing the comparison?

    Best,

    Tanya

  • Gökalp Çelik

    Hi again. 

    You may want to check the concordance of the resulting call sets to see whether calls are affected around the edited region and/or elsewhere. GATK has concordance-checking tools for this purpose.
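To illustrate what such a comparison looks at, here is a small self-contained Python sketch that compares two call sets by site and allele, optionally restricted to a region such as the edited locus. It is a toy stand-in, not GATK's concordance tooling: the function names, the region tuple, and the example records are all made up for illustration, and it reads only the first five VCF columns.

```python
def load_calls(vcf_lines, region=None):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines.

    region is an optional (chrom, start, end) tuple, 1-based inclusive.
    """
    calls = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        pos = int(pos)
        if region and not (chrom == region[0] and region[1] <= pos <= region[2]):
            continue
        for alt in alts.split(","):  # one key per alternate allele
            calls.add((chrom, pos, ref, alt))
    return calls


def compare_calls(pre, post):
    """Split two call sets into shared, pre-BQSR-only and post-BQSR-only."""
    return {"shared": pre & post, "pre_only": pre - post, "post_only": post - pre}


if __name__ == "__main__":
    # Made-up records: the chr1:250 call is lost after recalibration.
    header = "#CHROM\tPOS\tID\tREF\tALT"
    pre_vcf = [header, "chr1\t100\t.\tA\tG", "chr1\t250\t.\tC\tT", "chr2\t50\t.\tG\tA"]
    post_vcf = [header, "chr1\t100\t.\tA\tG", "chr2\t50\t.\tG\tA"]
    result = compare_calls(load_calls(pre_vcf), load_calls(post_vcf))
    for name, calls in result.items():
        print(name, sorted(calls))
```

Calls that appear only in the pre-recalibration set and fall inside the edited region are the ones worth inspecting first, since they are the candidates for edits lost after BQSR.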

    I hope this helps. 

  • Tanya Sarkin Jain

    Thank you. The documentation says: "The known variants are used to mask out bases at sites of real (expected) variation, to avoid counting real variants as errors. Outside of the masked sites, every mismatch is counted as an error. The rest is mostly accounting." From this, it seems that BQSR will certainly get rid of novel variations. Is this interpretation correct?


    https://gatk.broadinstitute.org/hc/en-us/articles/360035890531-Base-Quality-Score-Recalibration-BQSR#:~:text=The%20base%20recalibration%20process%20involves,producing%20a%20new%20BAM%20file. 

  • Gökalp Çelik

    Hi Tanya Sarkin Jain

    That is not correct. BQSR is not meant to get rid of novel variants. Its purpose is to reduce the noise caused by erroneous mismatches within reads, all of which are recalibrated by the generated model.

    BQSR relies on a set of known variant sites as input so that it can distinguish the covariate signals of real variation from those of noise. When the article says "mismatch", it does not mean mismatching variants but all mismatched base calls within individual reads. When BQSR is run, certain base-call measures and other metrics, such as positional bias and nucleotide composition bias, are accounted for across all real sites before the model is generated. Once the model is applied, it reduces the quality scores of base calls affected by those biases and smooths the signal from real variant sites, so your calls end up with much more uniform coverage and AD and DP values.
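The "mask known sites, then count every remaining mismatch as an error" idea from the documentation can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not GATK's implementation: it uses a single covariate (the reported quality score), whereas the real model also considers covariates such as position in the read and sequence context, and the +1/+2 smoothing is an arbitrary choice made here. The function name and the example data are hypothetical.

```python
import math
from collections import defaultdict


def empirical_qualities(observations, known_sites):
    """Toy recalibration table: reported quality -> empirical quality.

    observations: iterable of (chrom, pos, reported_qual, is_mismatch),
                  one entry per base call.
    known_sites:  set of (chrom, pos) positions to mask out.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_qual -> [bases, mismatches]
    for chrom, pos, qual, is_mismatch in observations:
        if (chrom, pos) in known_sites:
            continue  # expected variation: not counted as an error
        counts[qual][0] += 1
        counts[qual][1] += int(is_mismatch)

    table = {}
    for qual, (n_bases, n_errors) in counts.items():
        # Smoothed error rate (an assumption of this sketch), then Phred scale.
        error_rate = (n_errors + 1) / (n_bases + 2)
        table[qual] = round(-10 * math.log10(error_rate), 1)
    return table


if __name__ == "__main__":
    # 998 base calls reported as Q30, 9 of them mismatching the reference,
    # plus one mismatch at a known variant site that gets masked.
    obs = [("chr1", i, 30, i < 9) for i in range(998)]
    obs.append(("chr1", 5000, 30, True))
    print(empirical_qualities(obs, known_sites={("chr1", 5000)}))
```

In this made-up example the bases were reported as Q30 but mismatch roughly once in a hundred calls, so the empirical quality comes out near Q20. Note that the adjustment is applied to base qualities in general; it does not single out the mismatching base at a novel site for removal.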

    Regards. 

