My BQSR plots do not seem to make sense, and I am wondering what troubleshooting can be done.
I'm using gatk 18.104.22.168 to call variants for a project on a non-model organism, and I'm at the first BQSR step. Following a pipeline from a 2018 forum post, I used the commands below to generate the pre-recalibration and post-recalibration tables needed to examine the effects of base quality score recalibration. The input is a duplicate-marked, indexed BAM file. The known-sites SNP and INDEL sets were filtered with SelectVariants and FilterVariants from the output of a GenotypeGVCFs run on a set of 4 separately called sample files merged with CombineGVCFs. The commands I used for BQSR were:
gatk BaseRecalibrator -I $BAMFILE -R $REF --known-sites $SNPVARIANTS --known-sites $INDELVARIANTS -O $PRERECAL.table
gatk ApplyBQSR --bqsr-recal-file $PRERECAL.table -I $BAMFILE -O $OUTFILE.bam
gatk BaseRecalibrator -I $OUTFILE.bam -R $REF --known-sites $SNPVARIANTS --known-sites $INDELVARIANTS -O $POSTRECAL.table
gatk ApplyBQSR --bqsr-recal-file $POSTRECAL.table -I $OUTFILE.bam -O $OUTFILE-2.bam
gatk AnalyzeCovariates -before $PRERECAL.table -after $POSTRECAL.table -plots $PLOTNAME.pdf
With these commands I get the attached plots. The first plot seems unusually restricted: the correction does not linearize well, and the regression runs off the plot. The final plots and tables also appear to contain a high number of errors.
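One check I can think of is how many records actually ended up in the two known-sites files. As far as I understand, BaseRecalibrator counts any mismatch outside the known sites as an error, so a very sparse filtered set might inflate the error counts. Below is a minimal sketch for that check. The function name count_vcf_records is my own and not part of GATK, and it assumes the known-sites files are plain-text or gzip/bgzip-compressed VCFs.

```shell
# Hypothetical helper (not a GATK tool): count the non-header records in a VCF.
# Assumes a plain-text VCF, or a gzip/bgzip-compressed one with a .gz suffix.
count_vcf_records() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;   # bgzip output is gzip-compatible
        *)    cat "$1" ;;
    esac | grep -vc '^#'         # header lines start with '#'; count the rest
}

# Example usage with the variables from the commands above:
#   count_vcf_records "$SNPVARIANTS"
#   count_vcf_records "$INDELVARIANTS"
```

If either count is very small relative to the genome size, that would be one candidate explanation to rule out before looking elsewhere.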