Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

BQSR output issue

  • Bhanu Gandham

    Hi,

    It looks like the vast majority of bases in the original data had quality 38, which is really strange and probably something to look into on your end. We think that's why there are so many recalibration errors. This sounds like a sequencer issue to me. You can use Picard tools such as QualityScoreDistribution or MeanQualityByCycle to confirm the base quality distribution.

    The plots don't look wrong to me: even though the diagonal line runs off the edge of the plot area, all the points are shown, so all the data seems to be there. From what I can tell, everything is fine. The recalibrated data matches the diagonal much better than the original data, most of which is reported at Q38 but whose actual (empirical) quality is closer to Q27.

    Note: the "errors" are positions in the genome where the reads disagree with the reference (and that are not sites listed in the provided VCF of known variants).

  • John Denton

    Thanks for the clarification! A colleague and I were wondering about those uncalibrated scores. So the post-BQSR output appears to be fine to use, and the cause of the pre-recalibration quality score anomaly is worth exploring further but is not critical in this case?

  • Bhanu Gandham

    That is correct.

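For readers following this thread, here is a minimal Python sketch of the comparison discussed above: how often bases at each reported quality actually disagree with the reference, converted back to a Phred score. It is not GATK's BaseRecalibrator implementation. It assumes hypothetical per-base observations given as (position, reported quality, mismatch flag) tuples and a plain set of positions standing in for the known-sites VCF, and it uses simple add-one smoothing instead of GATK's actual model, which also accounts for covariates such as read group, machine cycle and sequence context. The function names are illustrative only.

```python
import math
from collections import Counter, defaultdict


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


def quality_histogram(qual_strings, offset=33):
    """Tally base qualities from FASTQ-style quality strings (Phred+33 assumed)."""
    hist = Counter()
    for s in qual_strings:
        hist.update(ord(c) - offset for c in s)
    return hist


def empirical_quality_by_bin(observations, known_sites):
    """Empirical quality per reported-quality bin (hypothetical, simplified model).

    observations: iterable of (position, reported_quality, mismatches_reference)
    known_sites: set of positions standing in for the known-variants VCF
    Returns {reported_quality: (n_bases, n_mismatches, empirical_q)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for pos, q, is_mismatch in observations:
        if pos in known_sites:
            continue  # known variant sites are skipped, so they never count as errors
        counts[q][0] += 1
        if is_mismatch:
            counts[q][1] += 1
    result = {}
    for q, (n, e) in sorted(counts.items()):
        p = (e + 1) / (n + 2)  # add-one smoothing keeps the quality finite when e == 0
        result[q] = (n, e, round(error_to_phred(p), 1))
    return result


# Made-up example: 10,000 bases reported at Q38 with 20 mismatches,
# plus 10 mismatching bases that fall on known variant sites.
known = set(range(10_000, 10_010))
obs = [(i, 38, i < 20) for i in range(10_000)]
obs += [(i, 38, True) for i in range(10_000, 10_010)]
table = empirical_quality_by_bin(obs, known)
print(table)  # {38: (10000, 20, 26.8)}
```

In this made-up example the bases are labelled Q38 but behave like roughly Q27, which is the pattern described above. The gap is larger than it looks: an error rate of 10^(-2.7) versus 10^(-3.8) means about 12.6 times more errors than the reported quality claims.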
