(new user) Question about BAM preparation and Mutect2 for special use
gatk/4.2.3.0
I want to use Mutect2 to compare a cell line that has been edited with its control from before editing. I was thinking that Mutect2 could be used for this purpose. I understand that, according to the GATK Best Practices, it is recommended to do Base Quality Score Recalibration (BQSR). My concern is that this will lower the quality scores at edited loci, since these loci are not common variants in the population and so are unlikely to appear in the dbSNP file I'd have to supply.
From the documentation of BQSR: "For each bin, we count the number of bases within the bin and how often such bases mismatch the reference base, excluding loci known to vary in the population, according to the known variants resource (typically dbSNP). This information is output to a recalibration file in GATKReport format."
Should I perform BQSR? I'd appreciate any suggestions for preparing my BAM files for use with the tool!
-
By default, we recommend performing BQSR for all kinds of samples. However, if you are concerned that Mutect2 will undercall your edits after recalibration, we recommend keeping both the pre- and post-recalibration BAM files, if you have the resources, and running your comparison with both sets of files to get a sense of which calls, if any, go missing.
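As a rough sketch of what running both sets could look like, the Python snippet below only assembles the GATK command lines for the two runs and prints them; it does not execute anything. All file names and the control sample name are hypothetical placeholders, and the options shown are the basic ones for BaseRecalibrator, ApplyBQSR and Mutect2, so please check them against the tool documentation for your GATK version.

```python
from typing import Dict, List


def recal_name(bam: str, suffix: str) -> str:
    """Derive a hypothetical output name, e.g. edited.bam -> edited.recal.bam."""
    stem = bam[:-4] if bam.endswith(".bam") else bam
    return stem + suffix


def bqsr_commands(ref: str, known_sites: str, bam: str) -> List[List[str]]:
    """BaseRecalibrator followed by ApplyBQSR for a single BAM."""
    table = recal_name(bam, ".recal.table")
    return [
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", bam,
         "--known-sites", known_sites, "-O", table],
        ["gatk", "ApplyBQSR", "-R", ref, "-I", bam,
         "--bqsr-recal-file", table, "-O", recal_name(bam, ".recal.bam")],
    ]


def mutect2_command(ref: str, edited_bam: str, control_bam: str,
                    control_sample: str, out_vcf: str) -> List[str]:
    """Mutect2 with the edited line as 'tumor' and the control as 'normal'."""
    return ["gatk", "Mutect2", "-R", ref, "-I", edited_bam, "-I", control_bam,
            "-normal", control_sample, "-O", out_vcf]


def build_plan(ref: str, known_sites: str, edited_bam: str,
               control_bam: str, control_sample: str) -> Dict[str, List[List[str]]]:
    """Command lines for one run without BQSR and one run with BQSR."""
    without = [mutect2_command(ref, edited_bam, control_bam,
                               control_sample, "calls.no_bqsr.vcf.gz")]
    with_bqsr = (
        bqsr_commands(ref, known_sites, edited_bam)
        + bqsr_commands(ref, known_sites, control_bam)
        + [mutect2_command(ref,
                           recal_name(edited_bam, ".recal.bam"),
                           recal_name(control_bam, ".recal.bam"),
                           control_sample, "calls.bqsr.vcf.gz")]
    )
    return {"without_bqsr": without, "with_bqsr": with_bqsr}


if __name__ == "__main__":
    # All paths and the sample name below are placeholders.
    plan = build_plan("ref.fasta", "dbsnp.vcf.gz",
                      "edited.bam", "control.bam", "CONTROL")
    for label, commands in plan.items():
        print(f"# {label}")
        for cmd in commands:
            print(" ".join(cmd))
```

The value passed to `-normal` has to match the sample name in the control BAM's read group, and the raw Mutect2 output would normally still go through FilterMutectCalls before the two call sets are compared.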
I hope this helps.
-
Thank you. Do you have any recommendations on what to look for when doing the comparison?
Best,
Tanya
-
Hi again.
You may want to check the concordance of your resulting files to see if calls get affected around the region of editing and/or elsewhere. GATK has Concordance checking tools for this purpose.
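If you would like a quick look before running GATK's own Concordance tool, a site-level comparison can be sketched in a few lines of Python. This is only an illustration: it assumes plain-text VCF lines with the standard tab-separated columns, keys each call on chromosome, position, REF and ALT, and counts only records whose FILTER is PASS. The example records are made up.

```python
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def load_sites(vcf_lines: Iterable[str], pass_only: bool = True) -> Set[Site]:
    """Collect (chrom, pos, ref, alt) keys from VCF text lines."""
    sites: Set[Site] = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alts, _qual, flt = line.rstrip("\n").split("\t")[:7]
        if pass_only and flt != "PASS":
            continue
        for alt in alts.split(","):  # one key per ALT allele
            sites.add((chrom, int(pos), ref, alt))
    return sites


def compare(no_bqsr: Set[Site], bqsr: Set[Site]) -> Dict[str, Set[Site]]:
    """Split the calls into shared and run-specific sets."""
    return {
        "shared": no_bqsr & bqsr,
        "only_without_bqsr": no_bqsr - bqsr,
        "only_with_bqsr": bqsr - no_bqsr,
    }


if __name__ == "__main__":
    # Made-up records standing in for the two filtered Mutect2 outputs.
    no_bqsr_vcf = [
        "##fileformat=VCFv4.2",
        "chr1\t1000\t.\tA\tG\t.\tPASS\t.",
        "chr2\t2000\t.\tC\tT\t.\tPASS\t.",
    ]
    bqsr_vcf = [
        "##fileformat=VCFv4.2",
        "chr1\t1000\t.\tA\tG\t.\tPASS\t.",
        "chr2\t2000\t.\tC\tT\t.\tweak_evidence\t.",
    ]
    result = compare(load_sites(no_bqsr_vcf), load_sites(bqsr_vcf))
    for label, sites in result.items():
        print(label, sorted(sites))
```

For real, compressed outputs you could pass `gzip.open(path, "rt")` to `load_sites`. Calls that appear in only one set, especially near your edited region, are the ones worth inspecting more closely.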
I hope this helps.
-
Thank you. The documentation says: "The known variants are used to mask out bases at sites of real (expected) variation, to avoid counting real variants as errors. Outside of the masked sites, every mismatch is counted as an error. The rest is mostly accounting." From this, it seems like BQSR will for sure get rid of novel variants. Is this interpretation correct?
-
That is not correct. BQSR is not meant to get rid of novel variants. Its purpose is to reduce the noise caused by erroneous mismatches within reads, by recalibrating the base quality scores with the model it generates.
BQSR relies on the input of known variant sites so that it can distinguish the covariate signals of real variation from noise. When the article says "mismatch", it does not mean mismatching variants; it means every mismatched base call within individual reads. When BQSR is run, base call measures and other covariates, such as positional bias and nucleotide composition bias, are tabulated across all sites outside the known variants before the model is generated. Once the model is applied, it lowers the quality scores of base calls that show those biases, wherever they occur, rather than penalizing a site for being novel. This smooths the signal at real variant sites, so the evidence supporting your calls, such as the AD and DP values, should be more consistent in the end.
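To give a feel for the numbers, here is a deliberately simplified Python sketch of the per-bin bookkeeping. The formula below, with +1 and +2 smoothing, is an assumption made for illustration and is not GATK's exact estimator, and the counts are invented. It shows that the mismatching reads from a single unmasked novel site barely move the empirical quality of a bin that holds a million bases.

```python
import math


def empirical_quality(n_bases: int, n_mismatches: int) -> float:
    """Phred-scaled empirical quality for one covariate bin.

    Simplified, smoothed estimate: Q = -10 * log10((mismatches + 1) / (bases + 2)).
    This is an illustrative assumption, not GATK's actual implementation.
    """
    error_rate = (n_mismatches + 1) / (n_bases + 2)
    return -10.0 * math.log10(error_rate)


if __name__ == "__main__":
    # Invented counts for one bin (e.g. one reported quality and read position).
    bin_bases = 1_000_000
    bin_mismatches = 1_000      # sequencing errors outside known sites

    # Suppose a novel edit adds 60 mismatching bases to this same bin.
    edit_bases = 60

    q_before = empirical_quality(bin_bases, bin_mismatches)
    q_after = empirical_quality(bin_bases + edit_bases,
                                bin_mismatches + edit_bases)

    print(f"empirical Q without the edit: {q_before:.2f}")
    print(f"empirical Q with the edit:    {q_after:.2f}")
    print(f"shift:                        {q_before - q_after:.2f}")
```

The recalibrated quality is assigned to every base in the bin, not specifically to the bases at the edited site, so the edit is not singled out.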
Regards.