Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 somatic mutation filtering

4 comments

  • Gökalp Çelik

    Hi microbiome

    FilterMutectCalls already uses the exact criteria that you proposed for filtering Mutect2 calls. Mutect2 is quite naive in that it calls almost any variation as a variant, but it provides a detailed list of parameters along with each variant. Given proper modeling, a PoN, and normals, those variants are then tagged with various filters that indicate why they were not called as PASS. Depending on your study, it may be normal to have that many variants called as PASS.

    Can you provide more details about your samples? Are they whole genome sequencing data or panel data?

    Regards. 

  • microbiome

    Hello Gökalp Çelik,

    Thank you so much for your response and clarification! To answer your question, my data is whole genome sequencing (WGS). The samples I am working with are from tumor tissues, and I am comparing variants before and after treatment. Since the sample size is large and the number of variants that passed the filters is still quite high, I wanted to make sure that I am using the right criteria for filtering.

    Would you recommend any specific thresholds for parameters like DP, ECNT, GERMQ, MBQ, and MMQ in the context of WGS? Or should I consider any other post-filtering strategies specific to WGS data to reduce the number of false positives?

    I appreciate any further insights or recommendations you might have!

    Best regards.

  • Gökalp Çelik

    Hi microbiome

    Since your samples are whole genome sequencing data, it is normal to have so many calls. You may be able to limit the calls to regions close to coding segments. That reduces the number of variants you have to deal with and keeps those that can be functionally annotated.

    Other than that, we don't have definitive numbers for the parameters that you indicated. If your samples are pre- and post-treatment data, you may be able to use the pre- and post-treatment samples as pseudo-normals for each other in separate runs and look for differences between the calls in the two states. It is not a simple task to completely delineate all variants; however, it may be a good start.

    I hope this helps.

    Regards. 

  • microbiome

    Hi Gökalp Çelik,

    Thank you for the explanation! I’ll try focusing on coding regions and using the pre-treatment data as pseudo-normals to narrow down the variants.

    I appreciate the advice!

    Best regards.

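For anyone landing on this thread, the two suggestions above (limiting calls to regions near coding segments, and looking for differences between the pre- and post-treatment calls) can be roughly sketched in plain Python. This is only an illustration under several assumptions, not a GATK workflow: it assumes each sample already has a VCF filtered by FilterMutectCalls, it reads only the standard tab-separated VCF columns, and it treats each record's ALT field as a single string (multi-allelic sites are not split). The helper names, the interval list, the `padding` parameter, and the toy records are all made up for the example. It does not perform the pseudo-normal Mutect2 runs themselves; it only compares call sets after the fact.

```python
from collections import Counter


def parse_vcf_records(lines):
    """Yield (chrom, pos, ref, alt, filter) from VCF text lines, skipping headers."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, filt = fields[:7]
        yield chrom, int(pos), ref, alt, filt


def filter_counts(records):
    """Count how often each FILTER value (PASS or a filter name) occurs."""
    counts = Counter()
    for _chrom, _pos, _ref, _alt, filt in records:
        counts.update(filt.split(";"))
    return counts


def in_intervals(chrom, pos, intervals, padding=0):
    """True if pos lies inside any (chrom, start, end) interval, 1-based inclusive."""
    return any(
        c == chrom and start - padding <= pos <= end + padding
        for c, start, end in intervals
    )


def passing_keys(records, intervals=None, padding=0):
    """Collect (chrom, pos, ref, alt) keys for PASS records, optionally restricted to intervals."""
    keys = set()
    for chrom, pos, ref, alt, filt in records:
        if filt != "PASS":
            continue
        if intervals is not None and not in_intervals(chrom, pos, intervals, padding):
            continue
        keys.add((chrom, pos, ref, alt))
    return keys


def compare_calls(pre_keys, post_keys):
    """Split two call sets into pre-only, post-only, and shared variants."""
    return {
        "pre_only": pre_keys - post_keys,
        "post_only": post_keys - pre_keys,
        "shared": pre_keys & post_keys,
    }


# Toy records (hypothetical, not real data): CHROM POS ID REF ALT QUAL FILTER INFO
pre_lines = [
    "##fileformat=VCFv4.2",
    "chr1\t100\t.\tA\tT\t.\tPASS\t.",
    "chr1\t200\t.\tG\tC\t.\tgermline;weak_evidence\t.",
    "chr1\t5000\t.\tC\tT\t.\tPASS\t.",
]
post_lines = [
    "chr1\t100\t.\tA\tT\t.\tPASS\t.",
    "chr1\t150\t.\tG\tA\t.\tPASS\t.",
]
# Hypothetical "coding" interval; in practice this would come from an annotation file.
coding = [("chr1", 90, 210)]

pre_counts = filter_counts(parse_vcf_records(pre_lines))
pre_keys = passing_keys(parse_vcf_records(pre_lines), intervals=coding)
post_keys = passing_keys(parse_vcf_records(post_lines), intervals=coding)
diff = compare_calls(pre_keys, post_keys)

print(pre_counts)
print(diff)
```

Tallying the FILTER column first is a quick way to see which filters remove most calls before thinking about any extra thresholds. For real WGS data, a dedicated VCF library and an interval index would scale much better than the linear scan used here.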
