BQSR output issue
I have BQSR plots that do not seem to make sense, and I am wondering what troubleshooting can be done.
I'm using GATK 4.1.4.1 to call variants for a project on a non-model organism, and I'm at the first BQSR step. Following a pipeline from a 2018 forum post, I used the commands below to generate the pre-recalibration and post-recalibration tables needed to examine the effects of quality score recalibration. The input is a duplicate-marked, indexed BAM file. The SNP and INDEL known sites were filtered with SelectVariants and VariantFiltration from the output of a GenotypeGVCFs call on four per-sample GVCFs merged with CombineGVCFs. The commands I used for BQSR were:
gatk BaseRecalibrator -I $BAMFILE -R $REF --known-sites $SNPVARIANTS --known-sites $INDELVARIANTS -O $PRERECAL.table
gatk ApplyBQSR --bqsr-recal-file $PRERECAL.table -I $BAMFILE -O $OUTFILE.bam
gatk BaseRecalibrator -I $OUTFILE.bam -R $REF --known-sites $SNPVARIANTS --known-sites $INDELVARIANTS -O $POSTRECAL.table
gatk ApplyBQSR --bqsr-recal-file $POSTRECAL.table -I $OUTFILE.bam -O $OUTFILE-2.bam
gatk AnalyzeCovariates -before $PRERECAL.table -after $POSTRECAL.table -plots $PLOTNAME.pdf
and I get the attached plots. The first plot seems unusually restricted: the correction does not linearize very well, and the regression line runs off the plot. The final plots and tables also appear to contain a high number of errors.
-
Hi,
It looks like the vast majority of bases in the original data had quality 38, which is really strange and probably worth looking into on your end. We think that's why there are so many recalibration errors. This sounds like a sequencer issue to me. You can use Picard tools such as QualityScoreDistribution or MeanQualityByCycle to confirm the base quality distribution.
The plots don't look wrong to me: all the points are shown even though the diagonal line runs off the "screen", so all the data seems to be there. From what I can tell, everything is fine. The recalibrated data matches the diagonal much better than the original, which has most of its data at Q38 even though its actual quality is closer to Q27.
Note: The "errors" are places in the genome that disagree with the reference (but are not in the provided vcf).
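For a sense of scale, here is a minimal sketch of the Phred arithmetic behind the Q38 versus Q27 comparison, assuming the standard definition Q = -10 * log10(p). The helper name phred_to_prob is made up for illustration and is not part of GATK or Picard.

```shell
#!/bin/sh
# Convert a Phred quality score Q to an error probability p = 10^(-Q/10).
# exp() and log() are used instead of the ^ operator for portability.
phred_to_prob() {
    awk -v q="$1" 'BEGIN { printf "%.6f\n", exp(-q / 10 * log(10)) }'
}

echo "Q38 error probability: $(phred_to_prob 38)"   # 0.000158
echo "Q27 error probability: $(phred_to_prob 27)"   # 0.001995

# How many times more errors Q27 implies than Q38: 10^((38 - 27) / 10).
awk 'BEGIN { printf "ratio: %.1f\n", exp((38 - 27) / 10 * log(10)) }'   # ratio: 12.6
```

In other words, bases reported at Q38 that behave like Q27 carry roughly twelve to thirteen times more errors than the reported score claims, which is the kind of gap recalibration is meant to correct.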
-
Thanks for the clarification on this! A colleague of mine and I were wondering about those uncalibrated scores. So the post-BQSR outputs appear to be okay to use, and the cause of the pre-calibration quality score anomaly is something to be explored further, but is not critical, in this case?
-
That is correct.
-
Hello, I applied BQSR to data from a single sample, and I am having difficulty understanding a few things.
The BQSR report shows the read group as 'None', but I have already run addreadgroup on my file and can see the @RG line in my BAM files (both before and after recalibration).
The base quality scores are between 11 and 38, which seems very low. I also have no idea why a lot of bases have a quality score below 20 when I have already filtered out bases with a quality score less than 20.
I am a beginner and this is my first time using any of these tools in practice, so I apologize if my questions are very naive.
Thank you
-
Hello TAYYABA ALVI, could you give an example of a read group in your file that does not match this plot?
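One possible way to pull out the read-group lines to post here is sketched below. It assumes samtools is installed, and the function name list_read_groups and the file name in the comment are placeholders. The sample header at the end is invented purely to show the output format.

```shell
#!/bin/sh
# Print the ID, PU and SM tags of every @RG line in a SAM header read from stdin.
# Tags that are absent from a line are reported as "missing".
list_read_groups() {
    awk -F '\t' '$1 == "@RG" {
        id = "missing"; pu = "missing"; sm = "missing"
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^ID:/) id = substr($i, 4)
            if ($i ~ /^PU:/) pu = substr($i, 4)
            if ($i ~ /^SM:/) sm = substr($i, 4)
        }
        print "ID=" id "\tPU=" pu "\tSM=" sm
    }'
}

# Typical use on a real file (placeholder name, needs samtools on the PATH):
#   samtools view -H recalibrated.bam | list_read_groups

# Demonstration on an invented two-line header:
printf '@HD\tVN:1.6\tSO:coordinate\n@RG\tID:rg1\tSM:sample1\tPL:ILLUMINA\n' | list_read_groups
```

Posting that output next to the ReadGroup column from the recalibration table should make any mismatch easy to spot.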
-
Hello all, I have very similar output to John Denton's. However, both of my cycle covariate plots look a bit off. They have low log10(observations) values on the negative side (-1 to -100++) of the x-axis... Even in the other plots, the log10 values seem very low/faint too. Is that normal?
Thank you in advance. I would really appreciate it if you could also look through the other plots/readings and let me know if there's anything else I need to be concerned about.
-
Can you please start a new thread with your question? Please post the exact commands and the versions of the tools you are using. Can you also please explain what you are trying to do and the challenge you are facing?
-
Okay, I will do that.