Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data



BQSR returns lower scores for male samples


13 comments

  • Bhanu Gandham

    Hi Mar Gonzàlez-Porta,

    Could you please exclude the sex chromosomes when calling BaseRecalibrator (e.g. -XL chrX -XL chrY) and compare the results?
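
For readers who want to try this, here is a minimal sketch of the suggested call. It only assembles the argument list and prints it, so it runs without GATK installed. The file names (sample.bam, ref.fasta, known_sites.vcf.gz, the output table) are placeholders and not taken from this thread; the contig names assume a "chr"-prefixed reference.

```python
import shlex


def base_recalibrator_cmd(bam, reference, known_sites, out_table, exclude=()):
    """Build a `gatk BaseRecalibrator` argument list; nothing is executed here."""
    cmd = ["gatk", "BaseRecalibrator", "-I", bam, "-R", reference, "-O", out_table]
    for vcf in known_sites:
        cmd += ["--known-sites", vcf]
    for contig in exclude:
        # -XL excludes an interval (here, a whole contig) from the analysis
        cmd += ["-XL", contig]
    return cmd


# Placeholder inputs (assumptions, not from the thread)
cmd = base_recalibrator_cmd(
    "sample.bam",
    "ref.fasta",
    ["known_sites.vcf.gz"],
    "sample.noXY.recal.table",
    exclude=["chrX", "chrY"],
)
print(shlex.join(cmd))
```

Running the same builder with an empty `exclude` gives the baseline command to compare against.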

  • Mar Gonzàlez-Porta

    Hi Bhanu Gandham,

    See slide 3 here: https://drive.google.com/open?id=1bun1U9OgNQfD8FQpmR75fWahcMeBFeer

    In a nutshell, I've re-launched BQSR with the following combinations:

    1. Keep the input BAM as is, but subset the known-sites VCF to autosomes only (i.e. subset_vcf)
    2. Subset the input BAMs to autosomes only, but keep all chromosomes in the known-sites VCF (i.e. subset_bam)
    3. Subset both

    Interestingly, gender differences disappear for sets #2 and #3, which suggests that there are no systematic differences between male and female samples. The differences in %Q30 only arise when applying BQSR to sex chromosomes, which brings the overall quality down even further. Also, including/excluding masking of known sites in allosomes has no major impact.
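
As a reminder of what the %Q30 metric measures, here is a minimal sketch assuming the usual definition (the percentage of bases with Phred quality of at least 30, where Q = -10 * log10(P_error)). The thread does not say which tool produced the %Q30 values, and the quality values below are toy numbers for illustration only.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_quals(qual_string, offset=33):
    """Decode an ASCII quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in qual_string]


def percent_q30(quals):
    """Percentage of base qualities that are >= 30."""
    quals = list(quals)
    if not quals:
        raise ValueError("no base qualities given")
    return 100.0 * sum(q >= 30 for q in quals) / len(quals)


# Toy values, not data from this thread
toy_quals = [35, 30, 29, 12]
print(percent_q30(toy_quals))
print(decode_quals("I?5"))
```

Because BQSR rewrites base qualities, a recalibration model that lowers scores around the Q30 boundary shifts this percentage even when the underlying reads are unchanged.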

    Would you have any thoughts on why the BQSR model is so heavily influenced by chrX/Y?

    Thanks

  • Mar Gonzàlez-Porta

    Hi Bhanu Gandham,

    Have you had a chance to follow up on this?

    I wonder whether this trend is specific to our dataset. Have you seen similar behaviour elsewhere?

    Many thanks,

  • Bhanu Gandham

    Hi,

    We think this could happen because reads on the X and Y chromosomes may have lower MAPQ: the two chromosomes share similar sequence, so reads may be getting mismapped between them.

    You could either exclude the sex chromosomes from the BQSR step or, if you want to recalibrate them, try one of these two options:

    1. Run BQSR on only the autosomal chromosomes and apply the model constructed to the sex chromosomes, OR
    2. Run BQSR on only the sex chromosomes and apply the model constructed to the sex chromosomes.
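
The two options above could be sketched as follows. This is an untested illustration, not a recommended pipeline: it restricts each step with -L intervals, assumes "chr"-prefixed human contig names, and uses placeholder file names. The code only builds and prints the commands. Note that, as far as I understand, running ApplyBQSR with -L writes out only the reads overlapping those intervals.

```python
import shlex

# Assumption: human reference with "chr"-prefixed contig names
AUTOSOMES = [f"chr{i}" for i in range(1, 23)]
ALLOSOMES = ["chrX", "chrY"]


def interval_args(contigs):
    """Turn a list of contigs into repeated -L arguments."""
    args = []
    for contig in contigs:
        args += ["-L", contig]
    return args


def base_recalibrator(bam, ref, known_sites, table, model_contigs):
    """Build the model only from reads on `model_contigs`."""
    return ["gatk", "BaseRecalibrator", "-I", bam, "-R", ref,
            "--known-sites", known_sites, "-O", table] + interval_args(model_contigs)


def apply_bqsr(bam, ref, table, out_bam, apply_contigs):
    """Apply a recalibration table to reads on `apply_contigs`."""
    return ["gatk", "ApplyBQSR", "-I", bam, "-R", ref,
            "--bqsr-recal-file", table, "-O", out_bam] + interval_args(apply_contigs)


# Placeholder file names (assumptions)
BAM, REF, SITES = "sample.bam", "ref.fasta", "known_sites.vcf.gz"

# Option 1: model from autosomes, applied to the sex chromosomes
option1 = [
    base_recalibrator(BAM, REF, SITES, "autosomes.recal.table", AUTOSOMES),
    apply_bqsr(BAM, REF, "autosomes.recal.table", "sample.XY.opt1.bam", ALLOSOMES),
]

# Option 2: model from the sex chromosomes, applied to the sex chromosomes
option2 = [
    base_recalibrator(BAM, REF, SITES, "allosomes.recal.table", ALLOSOMES),
    apply_bqsr(BAM, REF, "allosomes.recal.table", "sample.XY.opt2.bam", ALLOSOMES),
]

for step in option1 + option2:
    print(shlex.join(step))
```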

    These are suggestions that we think might work, but we haven't tested them ourselves. We are very interested to know how well they work for you.

  • Mar Gonzàlez-Porta

    Hi Bhanu,

    Thanks for the feedback.

    Here are the results of the follow-up analyses that you suggested (slide 4):

    https://drive.google.com/open?id=1FPu6Fcq1-twjT_1ts2Di0ebizWvDUmBs

    Labels for items 1 and 2 are as follows:

    1. Run BQSR on only the autosomal chromosomes and apply the model constructed to the sex chromosomes = model_autosomes_apply_all

    2. Run BQSR on only the sex chromosomes and apply the model constructed to the sex chromosomes = model_allosomes_apply_allosomes

    Altogether, the gender bias disappears when the BQSR model is calibrated with autosomes only, even when the sex chromosomes are included later when applying the correction. So it seems that BQSR is only applied to the chromosomes used to build the model. Is this correct?

    Regarding the hypothesis that lower MAPQ reads in chrX and Y could drive the trend, could you expand on how MAPQ is taken into account when generating and/or applying the BQSR model?

    Thanks

  • Bhanu Gandham

    Hi,

    I am glad to hear that the gender bias disappears when calibrating the BQSR model with autosomes. As I mentioned above, we have not tested this on our end, so unfortunately I cannot comment on this further. We will add this to the list of things to test and benchmark on our end, but it might take some time.

  • Mar Gonzàlez-Porta

    Hi Bhanu,

    Thanks for the response. Please keep us updated on testing outcomes on your end. Since this is such an early step within the analysis pipeline, it has the potential to have a major impact on the final results downstream.

    Cheers,

  • Bhanu Gandham

    Hi Mar Gonzàlez-Porta,

    Since calibrating the BQSR model with autosomes and applying it to the sex chromosomes seems like a good workaround for now, I suggest you proceed with that. We are currently working on higher-priority items, so I cannot promise a timeline for testing and benchmarking results.

  • Brian Wiley

    What do "model_autosomes_apply_all" and "model_allosomes_apply_allosomes" mean? Searching for them only brings me back to this thread. Are they parameters of ApplyBQSR?

  • Genevieve Brandt (she/her)

    Brian Wiley, it looks like those are just labels used by Mar Gonzàlez-Porta, as we do not have such options in BQSR.

  • Brian Wiley

    Thanks. @Mar Gonzàlez-Porta, can you advise how you applied the model to all chromosomes after running BQSR on the autosomes?

  • Brian Wiley

    Mar Gonzàlez-Porta, sorry the @ didn't work from Firefox.

  • Mar Gonzàlez-Porta

    Hi Brian Wiley, I used different input BAMs for the modeling vs. recalibration steps: (1) run BaseRecalibrator on a subsetted BAM with only autosomes; (2) run ApplyBQSR on the full BAM. Cheers
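
A minimal sketch of that two-BAM approach, for anyone trying to reproduce it. The comment above does not say how the BAM was subsetted, so the samtools step here is an assumption, as are the file names and the "chr"-prefixed contig names. By default the script only prints the commands (dry run); nothing is executed unless `dry_run=False` is passed and the tools are installed.

```python
import shlex
import subprocess

# Assumption: human reference with "chr"-prefixed contig names
AUTOSOMES = [f"chr{i}" for i in range(1, 23)]


def autosome_model_pipeline(full_bam, ref, known_sites, prefix):
    """Return the command list: model on an autosome-only BAM, apply to the full BAM."""
    auto_bam = f"{prefix}.autosomes.bam"
    table = f"{prefix}.autosomes.recal.table"
    return [
        # Assumed subsetting method: region query on an indexed BAM
        ["samtools", "view", "-b", "-o", auto_bam, full_bam, *AUTOSOMES],
        ["samtools", "index", auto_bam],
        # (1) build the recalibration model from the autosome-only BAM
        ["gatk", "BaseRecalibrator", "-I", auto_bam, "-R", ref,
         "--known-sites", known_sites, "-O", table],
        # (2) apply that model to the full BAM, sex chromosomes included
        ["gatk", "ApplyBQSR", "-I", full_bam, "-R", ref,
         "--bqsr-recal-file", table, "-O", f"{prefix}.recal.bam"],
    ]


def run(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for step in steps:
        print(shlex.join(step))
        if not dry_run:
            subprocess.run(step, check=True)


# Placeholder inputs (assumptions, not from the thread)
steps = autosome_model_pipeline("sample.bam", "ref.fasta", "known_sites.vcf.gz", "sample")
run(steps)
```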
