Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

BQSR output issue

8 comments

  • Bhanu Gandham

    Hi,

    It looks like the vast majority of bases in the original data had quality 38, which is strange and probably worth looking into on your end. We think that is why there are so many recalibration errors. This sounds like a sequencer issue to me. You can use Picard tools to confirm the base quality distribution, for example QualityScoreDistribution or MeanQualityByCycle (CollectBaseDistributionByCycle, by contrast, charts nucleotide composition per cycle rather than qualities).

    The plots don't look wrong to me: all the points are shown even where the diagonal line goes off the "screen", so all the data seems to be there. From what I can tell, everything is fine. The recalibrated data matches the diagonal much better than the original, which has most of its data reported at Q38 although its actual (empirical) quality is closer to Q27.

    Note: The "errors" are places in the genome where the reads disagree with the reference but that are not in the provided VCF.
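
    To make the Q38 versus Q27 gap concrete, here is a minimal Python sketch of the Phred arithmetic involved. It assumes the standard definition Q = -10 * log10(p). The function names are illustrative, the mismatch counts are made up, and the plain mismatch ratio is a simplification rather than the estimator GATK itself uses.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def empirical_quality(mismatches, observations):
    """Phred-scaled quality from a raw mismatch ratio.

    Simplification for illustration only: this is the plain ratio,
    not the smoothed estimate a recalibration tool would report.
    """
    if observations <= 0 or mismatches <= 0:
        raise ValueError("need positive mismatch and observation counts")
    return -10 * math.log10(mismatches / observations)


# Error rates implied by the reported and the empirical quality in this thread.
reported_error = phred_to_error_prob(38)
empirical_error = phred_to_error_prob(27)
ratio = empirical_error / reported_error

print(f"Q38 implies an error rate of {reported_error:.6f}")
print(f"Q27 implies an error rate of {empirical_error:.6f}")
print(f"ratio of empirical to reported error rate: {ratio:.1f}")

# Made-up counts: 2 mismatches in 1000 observed bases is close to Q27.
print(f"empirical quality: {empirical_quality(2, 1000):.2f}")
```

    In other words, bases reported at Q38 that behave like Q27 are wrong roughly 12.6 times more often than their scores claim, which is the kind of gap recalibration is meant to correct.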

     

  • John Denton

    Thanks for the clarification on this! A colleague of mine and I were wondering about those uncalibrated scores. So the post-BQSR outputs appear to be okay to use, and the cause of the pre-calibration quality score anomaly is something to be explored further, but is not critical, in this case?

  • Bhanu Gandham

    That is correct.

  • TAYYABA ALVI

    Hello, I applied BQSR to data from a single sample, and I am having difficulty understanding a few things.

    The BQSR report shows the read group as 'None', but I have already added read groups to my file (addreadgroup) and can see the @RG line in my BAM files, both before and after recalibration.

    The base quality scores range from 11 to 38, which seems very low. I also have no idea why a lot of bases have a quality score below 20 when I have already filtered out bases with a quality score below 20.
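
    A quick way to see how many bases fall under a threshold is to decode a quality string directly. The following is a minimal sketch, assuming Phred+33 encoding (the encoding used in SAM/BAM text output and in current FASTQ files). The function names and the example quality string are made up for illustration. Keep in mind that BQSR rewrites base qualities, so bases that passed a Q20 filter beforehand can end up below 20 afterwards.

```python
from collections import Counter


def decode_phred33(quality_string):
    """Decode a Phred+33 quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def fraction_below(quality_string, threshold=20):
    """Fraction of bases whose quality is strictly below the threshold."""
    scores = decode_phred33(quality_string)
    if not scores:
        return 0.0
    return sum(1 for s in scores if s < threshold) / len(scores)


# Made-up quality string: 'G' decodes to 38, ',' to 11, '5' to 20.
example = "GGGG,,5G"

histogram = Counter(decode_phred33(example))
for score, count in sorted(histogram.items()):
    print(f"Q{score}: {count} base(s)")

print(f"fraction below Q20: {fraction_below(example):.2f}")
```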

     

    I am a beginner and this is my first time using any tool in practice, so I apologize if my questions are very naive.

    Thank you

  • Genevieve Brandt (she/her)

    Hello TAYYABA ALVI, could you give an example of a read group in your file that does not match this plot?
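
    One way to pull out such an example is to list the @RG lines from the file header, which `samtools view -H` prints as text. The sketch below parses header text of that form in Python. It assumes the tab-separated TAG:VALUE layout of SAM header lines; the function name and the example header (sample, library and unit names) are hypothetical.

```python
def read_groups_from_header(header_text):
    """Return one dict of TAG -> value for each @RG line in SAM header text."""
    groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Skip the leading "@RG" token, then split each field at the first colon.
        fields = line.split("\t")[1:]
        groups.append(dict(f.split(":", 1) for f in fields if ":" in f))
    return groups


# Hypothetical header text, shaped like the output of `samtools view -H`.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@RG\tID:sample1.lane1\tSM:sample1\tPL:ILLUMINA\tLB:lib1\tPU:unit1\n"
)

for rg in read_groups_from_header(example_header):
    print(rg)
```

    Comparing the ID, SM and PU values printed here with what the report shows is a simple way to spot a mismatch.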

  • Maggie Sudo Pui San

    Hello all, my output is very similar to John Denton's. However, both of my cycle covariate plots look a bit off: they show low log10(observations) values on the negative side of the x-axis (-1 to -100++). In the other plots, the log10 values also seem very low/faint. Is that normal?

    Thank you in advance. I would really appreciate it if you could also look through the other plots/readings and let me know if there is anything else I need to be concerned about.

  • Bhanu Gandham

    Hi Maggie Sudo Pui San,

    Could you please start a new thread with your question? Please post the exact commands and the versions of the tools you are using. Could you also please explain what you are trying to do and the challenge you are facing?

  • Maggie Sudo Pui San

    Hi Bhanu Gandham,

    Okay, I will do that.

