Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 overlapping reads behavior with --pcr-snv-qual parameter

2 comments

  • Official comment
    Gökalp Çelik

    Hi Semen Leyn

    Here is the official response from our team. 

    The goal of this parameter is to avoid overstating the evidence from overlapping reads. As far as sequencing error is concerned, the two reads are independent (hence their error probabilities multiply and their quals, i.e. log-space probabilities, add), but they are not independent with respect to PCR errors. That is, two overlapping reads that agree can be explained either by two independent sequencing errors OR by a single upstream PCR error that propagated to both reads.
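
    The two competing explanations can be put into numbers. The sketch below is an illustrative model, not Mutect2's actual code: the function names are hypothetical, and it assumes both error events are rare enough that their probabilities can simply be summed. It shows why a finite PCR qual caps the evidence from an agreeing overlapping pair, and why a very large PCR qual lets the summed base quals dominate instead.

```python
import math


def phred_to_prob(q: float) -> float:
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)


def prob_to_phred(p: float) -> float:
    """Convert an error probability to a Phred-scaled quality."""
    return -10 * math.log10(p)


def fragment_error_qual(q1: float, q2: float, pcr_qual: float) -> float:
    """Phred-scaled probability that an allele seen in both reads of an
    overlapping pair is an error (hypothetical illustrative model).

    Explanation 1: two independent sequencing errors (probabilities multiply,
    so Phred quals add). Explanation 2: one upstream PCR error copied into
    both reads. Both are assumed rare, so their probabilities are summed.
    """
    p_seq = phred_to_prob(q1) * phred_to_prob(q2)  # equals phred_to_prob(q1 + q2)
    p_pcr = phred_to_prob(pcr_qual)
    return prob_to_phred(p_seq + p_pcr)


# Sweep the PCR qual for a pair of BQ 37 bases: low values are dominated by
# the PCR term, high values approach the sequencing-only limit of 37 + 37.
for pcr in (20, 40, 60, 80, 100):
    print(pcr, round(fragment_error_qual(37, 37, pcr), 1))
```

    Once the PCR qual is well above the summed base quals (here 74), the PCR term stops mattering, which matches the point that raising the parameter far enough essentially disables it.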

    As a convenient way to make the math work out, we represent the possibility of a PCR error by modifying the base quals. This generally works reasonably well, but it has the undesirable side effect of polluting the MBQ (median base quality) annotation.
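
    To see how modifying base quals can pollute the MBQ, here is a minimal sketch under an explicit assumption: the exact adjustment rule is not given in this thread, so the hypothetical cap_overlapping_quals function simply caps each base of an agreeing overlapping pair at half of --pcr-snv-qual, so that the pair's summed qual cannot exceed the PCR qual. The pileup values are made up for illustration.

```python
from statistics import median


def cap_overlapping_quals(quals, overlapping, pcr_snv_qual):
    """Hypothetical adjustment: cap each base from an overlapping pair at
    pcr_snv_qual // 2 so the pair's summed qual stays at or below pcr_snv_qual.
    Bases from non-overlapping reads are left unchanged."""
    cap = pcr_snv_qual // 2
    return [min(q, cap) if ov else q for q, ov in zip(quals, overlapping)]


# Made-up pileup of alt-supporting bases, all sequenced at BQ 37:
# six bases come from three overlapping read pairs, two from non-overlapping reads.
quals = [37] * 8
overlapping = [True] * 6 + [False] * 2

adjusted = cap_overlapping_quals(quals, overlapping, pcr_snv_qual=40)
print(median(quals), median(adjusted))
```

    Under this assumption the sequenced qualities are all 37, yet the median of the adjusted quals drops to 20, so a low MBQ alone cannot tell you whether the bases were genuinely poor or were capped by the PCR correction.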

    Setting --pcr-snv-qual too low (34 is highly unrealistic) causes Mutect2 to treat overlapping reads as not supporting the variant at all, so the reported MBQ of 37 comes only from non-overlapping reads. Raising the parameter beyond a certain point essentially disables it, because the base quals, not the PCR qual, become the dominant source of possible error.

    This is correct because Mutect2's genotyping model combines the evidence from overlapping reads into evidence for the fragment as a whole, since fragments, not individual reads, are the true physically independent units of evidence.

    You may set the parameter extremely high if you want, but beware: if two overlapping reads both have BQ = 37 and we do not apply the correction, we are effectively saying that the probability of a PCR error is less than 1 in 25 million. PCR is usually noisier than that.
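
    The "1 in 25 million" figure follows from the Phred arithmetic described earlier in this answer: quals of independent errors add, so two BQ 37 bases correspond to a combined qual of 74. A quick check of that arithmetic:

```python
bq = 37
combined_qual = bq + bq                  # independent errors: Phred quals add
p_error = 10 ** (-combined_qual / 10)    # 10^-7.4, roughly 4e-8
one_in = 1 / p_error                     # about 25.1 million

print(combined_qual, p_error, round(one_in / 1e6, 1))
```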

    We hope this helps. 

  • Semen Leyn

    Thank you, now I understand the issue much better. However, it complicates some filters, as you can't tell whether an MBQ20 came from low base quality or from the PCR error adjustments.
