Mutect2 overlapping reads behavior with --pcr-snv-qual parameter
Hello!
Could you help me understand how does the "--pcr-snv-qual" parameter work?
I have a dataset with 150nt paired reads with most of quality scores equal 37. However based on bam file QC almost all inserts sizes are less than 300nt so majority of pairs have a significant overlap. In result Mutect2 4.2.0.0 that I'm using drops base quality of overlapped nucleotides to BQ=20 and majority of variants have MBQ=20.
I was playing with "--pcr-snv-qual" parameter and found quite strange behavior:
1) values from 36 to 74 (although I did not test every integer) give majority of variants with MBQ=half of pcr-snv-qual.
2) values <= 34 and >= 80 give me majority of variants with MBQ=37.
I was looking in the code, GitHub issues and forum pages and found that Mutect2 set base quality as half of pcr-snv-qual for bases that are the same in both reads from the overlapping pair and 0 if they are different.
1) However why using pcr-snv-qual <= 34 results in MBQ=37?
2) Is it safe to override overlapping reads base quality adjustment by setting pcr-snv-qual very high? I don't see why it is reasonable to drop quality of base that was supported by two high-quality reads (e.g. two A bases with BQ=37).
Thanks!
-
Official comment
Hi Semen Leyn
Here is the official response from our team.
The goal of this parameter is to reduce overstating the amount of evidence from overlapping reads because while they are independent (hence error probabilities multiply and quals i.e. log space probabilities add) as far as sequencing error is concerned they are not independent regarding PCR errors. That is, the two error models of overlapping reads that agree is two independent sequencing errors OR a single PCR error upstream that propagated to both reads.
As a convenient way to make the math work out, we represent the possibility of PCR error by modifying the base quals, which generally works okay but has the undesirable effect of polluting the MBQ.
Setting
--pcr-snv-qual
too low (34 is highly unrealistic) causes Mutect2 to consider overlapping reads as not supporting the variant at all, yielding an MBQ of 37 that comes only from non-overlapping reads. Increasing the parameter past a certain point essentially disables it because the base quals, not the PCR qual, become the dominant error possibility.The reason this is correct is because Mutect2's genotyping model is smart and combines the evidence from overlapping reads into evidence from the fragment as a whole, since fragments are the true physical independent unit of evidence.
You may set the parameter extremely high if you want, but beware that if two overlapping reads have BQ = 37 and we do not use the correction we are saying that the probability of a PCR error is less than 1 in 25 million. PCR is usually more noisier than that.
We hope this helps.
Comment actions -
Thank you, now I understand the issue much better. However it complicates some filters as you can't tell where MBQ20 came from - from low base quality or PCR error adjustments.
Please sign in to leave a comment.
2 comments