BQSR returns lower scores for male samples
Hi,
We've applied BQSR to our dataset following best practice recipes. When inspecting the results, we've noticed that BQSR returns lower Q scores for male samples (slide 1). We don't see such a split between genders in the original qualities and the trend is consistent across all of our muxes.
We've relaunched the analysis using the latest GATK version for a subset of the data (n=30 random samples) and we still see the same trends (slide 2).
Have you come across similar observations before? And what next steps would you recommend to troubleshoot this further?
Here are the details of our setup:
GATK version used: v4.0.6.0 and v4.1.6.0
Shared folder: https://drive.google.com/drive/folders/1NEfvraUxPIvl7Vv0jZHESBQquyFhxY1Z?usp=sharing
Exact GATK commands used: launch_bqsr.sh
Slide deck with figures: troubleshooting_bqsr.pdf
Dataset details:
- N = 1,534 samples
- Library prep: PCR-Free (most common) + Nano (first mux)
- Sequencing: HiSeq 4000 2x150bp
- Analysis: GATK4 best practices (germline)
Many thanks.
-
Can you please remove the sex chromosomes when calling BaseRecalibrator (e.g. -XL chrX -XL chrY) and compare the results?
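A minimal sketch of what that call could look like. The file names (ref.fasta, sample.bam, known_sites.vcf.gz) and the `run` dry-run helper are placeholders made up for illustration, not taken from launch_bqsr.sh. By default the script only prints the command; set DRY_RUN=0 to execute it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholder paths; replace with your own files.
REF="ref.fasta"
BAM="sample.bam"
KNOWN_SITES="known_sites.vcf.gz"

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build the recalibration table while excluding chrX and chrY via -XL.
recal_without_sex_chroms() {
  run gatk BaseRecalibrator \
    -R "$REF" \
    -I "$BAM" \
    --known-sites "$KNOWN_SITES" \
    -XL chrX -XL chrY \
    -O "${BAM%.bam}.no_sex_chroms.recal.table"
}

recal_without_sex_chroms
```

The contig names assume a reference that uses the "chr" prefix, as in the -XL example above.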
-
Hi Bhanu Gandham,
See slide 3 here: https://drive.google.com/open?id=1bun1U9OgNQfD8FQpmR75fWahcMeBFeer
In a nutshell, I've re-launched BQSR with the following combinations:
- Keep input bam as is, but subset known sites VCF to include only autosomes (i.e. subset_vcf)
- Subset input bams to include autosomes only, keep all chr in known sites VCF (i.e. subset_bam)
- Subset both
Interestingly, gender differences disappear for sets #2 and #3, which suggests that there are no systematic differences between male and female samples. The differences in %Q30 only arise when applying BQSR to sex chromosomes, which brings the overall quality down even further. Also, including/excluding masking of known sites in allosomes has no major impact.
Would you have any thoughts on why the BQSR model is so heavily influenced by chrX/Y?
Thanks
-
Hi Bhanu Gandham,
Have you had the chance to follow up on this?
I wonder whether this trend is specific to our dataset. Have you seen similar behaviour elsewhere?
Many thanks,
-
Hi,
We think this could happen because reads on the X and Y chromosomes may have lower MAPQ: the two chromosomes share similar sequence, so reads can be mismapped between them.
You could exclude the sex chromosomes from the BQSR step.
Or, if you want to recalibrate the sex chromosomes, you could try one of these two options:
- Run BQSR on only the autosomal chromosomes and apply the resulting model to the sex chromosomes, OR
- Run BQSR on only the sex chromosomes and apply the resulting model to the sex chromosomes.
These are suggestions that we think might work, but we haven't tested them ourselves. We would be very interested to know how well they work for you.
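As a rough, untested sketch, the model-building half of the two options could be expressed with -L intervals on BaseRecalibrator. It assumes a human reference with contigs named chr1..chr22, chrX and chrY. The file names, the `run` dry-run helper and the `build_model` function are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholder paths; replace with your own files.
REF="ref.fasta"
BAM="sample.bam"
KNOWN_SITES="known_sites.vcf.gz"

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Contig lists (assumed naming: chr1..chr22, chrX, chrY).
AUTOSOMES=()
for i in {1..22}; do AUTOSOMES+=("chr${i}"); done
SEX_CHROMS=(chrX chrY)

# build_model <label> <contig>...: restrict BaseRecalibrator to the given contigs.
build_model() {
  local label="$1"; shift
  local args=() contig
  for contig in "$@"; do args+=(-L "$contig"); done
  run gatk BaseRecalibrator \
    -R "$REF" -I "$BAM" --known-sites "$KNOWN_SITES" \
    "${args[@]}" \
    -O "${BAM%.bam}.${label}.recal.table"
}

build_model autosomes "${AUTOSOMES[@]}"   # option 1: model from autosomes only
build_model allosomes "${SEX_CHROMS[@]}"  # option 2: model from chrX/chrY only
```

Each resulting table would then be passed to ApplyBQSR through --bqsr-recal-file.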
-
Hi Bhanu,
Thanks for the feedback.
Here are the results of the follow-up analyses that you suggested (slide 4):
https://drive.google.com/open?id=1FPu6Fcq1-twjT_1ts2Di0ebizWvDUmBs
Labels for items 1 and 2 are as follows:
- Run BQSR on only the autosomal chromosomes and apply the model constructed to the sex chromosomes = model_autosomes_apply_all
- Run BQSR on only the sex chromosomes and apply the model constructed to the sex chromosomes = model_allosomes_apply_allosomes
Altogether, the gender bias disappears when calibrating the BQSR model with autosomes only, even when the sex chromosomes are included later when applying the correction. So it seems that BQSR is only applied on the chromosomes used to build the model. Is this correct?
Regarding the hypothesis that lower MAPQ reads in chrX and Y could drive the trend, could you expand on how MAPQ is taken into account when generating and/or applying the BQSR model?
Thanks
-
Hi,
I am glad to hear that the gender bias disappears when calibrating the BQSR model with autosomes. As I mentioned above, we have not tested this on our end, so unfortunately I cannot comment on this further. We will add this to the list of things we will test and benchmark on our end, but it might take some time.
-
Hi Bhanu,
Thanks for the response. Please keep us updated on testing outcomes on your end. Since this is such an early step within the analysis pipeline, it has the potential to have a major impact on the final results downstream.
Cheers,
-
Since calibrating the BQSR model with autosomes and applying it to the sex chromosomes seems like a good workaround for now, I suggest you proceed with that. Because we are currently working on other, higher-priority items, I cannot promise a timeline for testing and benchmarking results.
-
What do the labels "model_autosomes_apply_all" and "model_allosomes_apply_allosomes" mean? Searching for them only brings me back to this thread. Are they parameters of ApplyBQSR?
-
Brian Wiley, it looks like that is just a label being used by Mar Gonzàlez-Porta, as we do not have that option in BQSR.
-
Thanks. @Mar Gonzàlez-Porta, can you advise how you applied to all chromosomes after running BQSR on the autosomes?
-
Mar Gonzàlez-Porta, sorry the @ didn't work from Firefox.
-
Hi Brian Wiley, I used different input BAMs for the modeling vs. recalibration steps: (1) run BaseRecalibrator on a subsetted BAM with only autosomes; (2) run ApplyBQSR on the full BAM. Cheers
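For anyone who wants to reproduce this, the two steps might look roughly like the sketch below. The post does not say which tool was used to subset the BAM, so `samtools view` is an assumption. The file names, the chr1..chr22 contig naming and the `run` dry-run helper are also hypothetical. The script prints the commands by default; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholder paths; replace with your own files.
REF="ref.fasta"
FULL_BAM="sample.bam"                 # coordinate-sorted and indexed
AUTO_BAM="sample.autosomes.bam"
KNOWN_SITES="known_sites.vcf.gz"
RECAL_TABLE="sample.autosomes.recal.table"

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Assumed contig naming: chr1..chr22.
AUTOSOMES=()
for i in {1..22}; do AUTOSOMES+=("chr${i}"); done

autosome_model_apply_all() {
  # (1a) Subset the full BAM to autosomes (samtools is an assumption here).
  run samtools view -b -o "$AUTO_BAM" "$FULL_BAM" "${AUTOSOMES[@]}"
  run samtools index "$AUTO_BAM"
  # (1b) Build the recalibration model from the autosome-only BAM.
  run gatk BaseRecalibrator \
    -R "$REF" -I "$AUTO_BAM" --known-sites "$KNOWN_SITES" -O "$RECAL_TABLE"
  # (2) Apply that model to the full BAM, sex chromosomes included.
  run gatk ApplyBQSR \
    -R "$REF" -I "$FULL_BAM" --bqsr-recal-file "$RECAL_TABLE" \
    -O "${FULL_BAM%.bam}.recal.bam"
}

autosome_model_apply_all
```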