Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

ApplyBQSR for somatic mutation detection in HLA regions

Answered

7 comments

  • Pamela Bretscher

    Hi Elisabetta Manduchi,

    I am going to move your post into our Community Discussions -> New User Advice topic, as the somatic topic is for reporting bugs and issues with GATK.

    You can read more about our forum guidelines and the topics here: Forum Guidelines.

    Best,

    Pamela

  • Elisabetta Manduchi

    Thank you Pamela and sorry for originally posting under the incorrect topic.

    Elisabetta

  • Genevieve Brandt (she/her)

    Hi Elisabetta Manduchi,

    Thanks for writing in to our forum! Yes, it would be a good idea to use your recalibration tables from the non-HLA regions for your HLA analysis. We wouldn't recommend building a BQSR recalibration model on the HLA regions, because the BQSR algorithm assumes that mismatches outside known variant sites are sequencing errors. Since the HLA regions are variant dense, many true variants would be counted as errors and the BQSR model would not be built appropriately.

    In terms of your overall analysis, we would recommend upgrading, if possible, to hg38 or something newer than b37, because b37 does not have good HLA alternate contigs.

    Please let us know if you have further questions.

    Best,

    Genevieve
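
The recommendation above (reuse the recalibration table built on the non-HLA data rather than running BaseRecalibrator on the HLA alignments) could be sketched roughly as follows. This is only an illustration: the helper name and every file path are hypothetical, and the snippet merely assembles a GATK4 ApplyBQSR command line without running it. It also assumes that the read group IDs in the HLA-aligned BAM match those in the original BAM, since the recalibration table is organized by read group.

```python
from typing import List


def apply_bqsr_cmd(reference: str, input_bam: str,
                   recal_table: str, output_bam: str) -> List[str]:
    """Assemble a GATK4 ApplyBQSR command that reuses an existing table.

    No BaseRecalibrator step is run here: the table is assumed to have
    been produced earlier, on the non-HLA (standard pipeline) alignments.
    """
    return [
        "gatk", "ApplyBQSR",
        "-R", reference,                    # HLA allele reference FASTA
        "-I", input_bam,                    # reads aligned to the HLA alleles
        "--bqsr-recal-file", recal_table,   # table from the non-HLA run
        "-O", output_bam,
    ]


# All paths below are placeholders, not files from the thread.
cmd = apply_bqsr_cmd(
    reference="hla_alleles.fasta",
    input_bam="sample.hla.bam",
    recal_table="sample.nonhla.recal.table",
    output_bam="sample.hla.bqsr.bam",
)
print(" ".join(cmd))
```

The printed line can be inspected before being passed to a shell or to `subprocess.run(cmd, check=True)` on a machine where `gatk` is installed.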

  • Elisabetta Manduchi

    Thank you so much for your response Genevieve, which confirms my initial intuition. If I may tap into your experience with this type of analysis, I do have a couple of related questions. 

    First, I should clarify that I'm using b37 because that is the build recommended at https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle for exome data, which is the type of data I'm working with. As mentioned, for somatic mutation calling in non-HLA regions I've followed the Mutect2 pipeline recommendations, so I wanted to use the same build for the HLA regions. HLA typing for the individuals in my data was previously done elsewhere, so for each individual I'm aligning to that individual's specific HLA sequences. I'm also trying to roughly follow the mutation-calling approach of Polysolver (in particular its pre- and post-mutation-calling processing), but I want to use a Mutect2 pipeline as close as possible to the one used for the non-HLA regions, augmenting the Polysolver pre- and post-processing accordingly. My two additional questions:

    1. For the post-Mutect2 filtering with FilterMutectCalls, it seems to me that (similarly to BQSR) I would use the same contamination table previously computed for these samples in the standard pipeline for non-HLA mutation calling. Is this correct?

    2. For the --ob-priors argument of FilterMutectCalls on the HLA, on the other hand, I imagine that I have to learn a specific model by first running Mutect2 on the HLA regions with the --f1r2-tar-gz option and then running LearnReadOrientationModel on the resulting f1r2.tar.gz. Is this correct?

    In short, for BQSR and contamination I would use what was previously computed for these samples in the standard pipeline for non-HLA mutation calling, but for --ob-priors I have to learn the model specifically on the HLA.

    I'd appreciate your feedback on this. 

    Thanks again,
    Elisabetta

  • Genevieve Brandt (she/her)

    Hi Elisabetta,

    Yes, for both of your questions (1 and 2), that sounds like it would get the best results. Just a caveat though - I am not an expert, so don't take my recommendations as more important than what you see in your own research! I just know a lot about how GATK works and runs from working with our many users.

    Regarding hg38 vs b37: I see why you are running b37. However, if you have hg38 interval lists available, I think it would be best to use hg38, especially if the HLA regions are important for your analysis. b37 has much worse HLA contigs available, and you will miss variants in the HLA regions.

    Let me know if you have any further questions.

    Best,

    Genevieve
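
The plan confirmed in this exchange (reuse the existing contamination table, but learn the read-orientation model on the HLA alignments themselves) might look roughly like the three-step sketch below. The function name, the `prefix` convention, and all file names are hypothetical. The snippet only builds GATK4 command lines and does not execute them, and a tumor-only call is assumed for brevity.

```python
from typing import List


def hla_mutect2_cmds(reference: str, tumor_bam: str,
                     contamination_table: str, prefix: str) -> List[List[str]]:
    """Build Mutect2 -> LearnReadOrientationModel -> FilterMutectCalls.

    The contamination table is assumed to come from the earlier non-HLA
    run; the orientation-bias priors are learned from the HLA F1R2 counts.
    """
    f1r2 = f"{prefix}.f1r2.tar.gz"
    priors = f"{prefix}.read-orientation-model.tar.gz"
    unfiltered = f"{prefix}.unfiltered.vcf.gz"
    filtered = f"{prefix}.filtered.vcf.gz"

    mutect2 = [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "--f1r2-tar-gz", f1r2,          # collect F1R2 counts on the HLA
        "-O", unfiltered,
    ]
    learn = [
        "gatk", "LearnReadOrientationModel",
        "-I", f1r2,
        "-O", priors,                   # HLA-specific orientation priors
    ]
    filt = [
        "gatk", "FilterMutectCalls",
        "-R", reference,
        "-V", unfiltered,
        "--contamination-table", contamination_table,  # reused, non-HLA run
        "--ob-priors", priors,                         # learned on the HLA
        "-O", filtered,
    ]
    return [mutect2, learn, filt]


# Placeholder inputs for illustration only.
cmds = hla_mutect2_cmds(
    reference="hla_alleles.fasta",
    tumor_bam="tumor.hla.bqsr.bam",
    contamination_table="tumor.nonhla.contamination.table",
    prefix="tumor.hla",
)
for c in cmds:
    print(" ".join(c))
```

For a matched tumor/normal design, the Mutect2 step would additionally take the normal BAM with a second `-I` and the normal sample name with `-normal`.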

  • Elisabetta Manduchi

    Thanks Genevieve, I really appreciate your feedback.

    Regarding the build, just to clarify, I'm only aligning to b37 for the non-HLA analyses. For HLA I'm aligning to the type-specific allele sequences from Polysolver (I believe derived from IMGT-HLA), not to the contigs from b37.

    Thanks again!

    Elisabetta

  • Genevieve Brandt (she/her)

    Oh I see! That makes sense :)

