Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutations in contiguous sites within a sample

  • Genevieve Brandt (she/her)

    Thank you for your post, ashgorden! I want to let you know we have received your question and will be moving it to the Community Discussions -> General Discussion topic, as the Somatic topic is for reporting bugs and issues with GATK.

    We'll get back to you if we have any updates or follow up questions. Please see our Support Policy for more details about how we prioritize responding to questions. 

  • Genevieve Brandt (she/her)

    Hi ashgorden,

    The reason Mutect2 does not combine adjacent variants is that most users prefer not to have MNPs in their VCF files. MNPs are much less commonly supported by downstream analysis tools and make the analysis more difficult. So we keep these records separate unless you change the --max-mnp-distance argument.
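
    As a rough illustration of what a maximum MNP distance means, the Python sketch below groups substitutions that lie within `max_mnp_distance` bases of each other into a single MNP record, filling any gap with reference bases. It is a hypothetical simplification, not Mutect2's implementation: Mutect2 merges substitutions on its assembled haplotypes, whereas the function `merge_snvs`, its tuple layout, and the assumption that all input SNVs sit on the same haplotype are illustrative only. A distance of 0 disables merging.

```python
from typing import List, Tuple

# Hypothetical record layout: (1-based POS, REF, ALT)
Variant = Tuple[int, str, str]


def _collapse(group: List[Variant], reference: str) -> Variant:
    """Turn a run of nearby SNVs into one REF/ALT pair spanning first..last position."""
    start, end = group[0][0], group[-1][0]
    ref = reference[start - 1:end]          # reference bases covering the whole run
    alt = list(ref)
    for pos, ref_base, alt_base in group:
        if reference[pos - 1] != ref_base:
            raise ValueError(f"REF {ref_base} does not match reference at {pos}")
        alt[pos - start] = alt_base         # substitute the ALT base in place
    return (start, ref, "".join(alt))


def merge_snvs(snvs: List[Variant], reference: str,
               max_mnp_distance: int) -> List[Variant]:
    """Merge SNVs separated by at most max_mnp_distance bases into MNP records.

    Assumes (hypothetically) that all SNVs are on the same haplotype and are
    sorted by position; reference[0] is position 1.
    """
    records: List[Variant] = []
    group: List[Variant] = []
    for snv in snvs:
        # Start a new record when the gap to the previous SNV is too large.
        if group and snv[0] - group[-1][0] > max_mnp_distance:
            records.append(_collapse(group, reference))
            group = []
        group.append(snv)
    if group:
        records.append(_collapse(group, reference))
    return records
```

    For example, with reference `ACGTACGT` and SNVs at positions 2, 3 and 6, a distance of 1 yields a `CG>TA` MNP at position 2 plus a separate `C>G` SNV at position 6, while a distance of 0 keeps all three records separate.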

    The output you have shown here looks standard and fine to us; we don't see any problems with the format of these variant sites.

    Let us know if you have any other questions.

    Best,

    Genevieve

  • Philipp Hähnel

    Dear Genevieve and ashgorden,

    I'm following this question because I was also curious about what's going on. Mutect2 DOES combine adjacent single-nucleotide substitutions, since --max-mnp-distance is 1 by default. I think the output above needs a more detailed explanation:

    Most of the cases above (e.g. M54) result from the irreducible representation Mutect2 chooses for indels, either [N>N...N] or [N...N>N], followed by an adjacent MNV. In principle they can be combined into one record if the indel and the MNV have the same phasing (0|1 vs 1|0).
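
    A minimal sketch of that contiguity-plus-phase check, assuming simplified records from a single-sample VCF. The `Record` type and the `merge_adjacent` and `same_phase` functions are hypothetical, not part of GATK. Phase is read from a `|`-separated GT together with a phase-set ID (the VCF `PS` FORMAT field); records without a PS value are assumed, for simplicity, to share a phase set. Two records are merged only if the first record's REF span ends immediately before the second record starts and both carry the same phased genotype in the same phase set.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical, simplified single-sample VCF record."""
    pos: int            # 1-based VCF POS
    ref: str            # REF allele
    alt: str            # single ALT allele
    gt: str             # GT, e.g. "0|1" (phased) or "0/1" (unphased)
    ps: Optional[int]   # phase-set ID from the PS FORMAT field, if any


def same_phase(a: Record, b: Record) -> bool:
    """True only if both genotypes are phased, identical, and in the same phase set."""
    return "|" in a.gt and "|" in b.gt and a.gt == b.gt and a.ps == b.ps


def merge_adjacent(first: Record, second: Record) -> Optional[Record]:
    """Merge two records into one complex allele, or return None if not justified."""
    # The first record's REF must end on the base just before the second starts.
    if first.pos + len(first.ref) != second.pos:
        return None
    # Opposite (0|1 vs 1|0) or unknown (0/1) phase means the alleles may sit on
    # different haplotypes, so they must stay as separate records.
    if not same_phase(first, second):
        return None
    return Record(first.pos, first.ref + second.ref, first.alt + second.alt,
                  first.gt, first.ps)
```

    For example, a deletion `ACG>A` at position 10 followed by an MNV `TC>GA` at position 13, both `0|1` in the same phase set, merge into the single record `ACGTC>AGA` at position 10; if the MNV were `1|0` or unphased `0/1`, the function returns `None`.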

    The variants at M35 chr1 have different phasing, so they represent two distinct variants and cannot be combined.

    Cases with an insertion next to a deletion represent two distinct variants, since each is supported by its own set of reads.

    My guess as to why cases like M46 chr14 are not combined is that the phasing is unknown (unphased 0/1), so the first variant could be on one chromosome copy and the second on the other, making them two distinct variants. If both are on the same copy they would form an actual MNV, but from these annotations we just don't know.
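
    The three situations discussed in this reply (same phase, opposite phase as at M35 chr1, and unphased as at M46 chr14) can be sketched as a small classifier over two heterozygous diploid GT strings. `phase_relation` is a hypothetical helper, not a GATK function; it assumes biallelic genotypes without missing alleles and treats records that carry no PS value as sharing a phase set.

```python
from typing import Optional


def phase_relation(gt_a: str, gt_b: str,
                   ps_a: Optional[int] = None, ps_b: Optional[int] = None) -> str:
    """Classify two heterozygous diploid genotypes as 'cis', 'trans' or 'unknown'.

    'cis'     -> ALT alleles on the same haplotype (could form one MNV)
    'trans'   -> ALT alleles on opposite haplotypes (two distinct variants)
    'unknown' -> phase cannot be determined from these fields
    """
    # '/' marks an unphased genotype: either arrangement is possible.
    if "|" not in gt_a or "|" not in gt_b:
        return "unknown"
    # Phased genotypes are only comparable within the same phase set.
    if ps_a != ps_b:
        return "unknown"
    # Which haplotype (first or second) carries a non-reference allele.
    hap_a = [allele != "0" for allele in gt_a.split("|")]
    hap_b = [allele != "0" for allele in gt_b.split("|")]
    if hap_a == hap_b:
        return "cis"
    if hap_a == [not carried for carried in hap_b]:
        return "trans"
    return "unknown"
```

    For example, `phase_relation("0|1", "1|0")` returns `'trans'`, while `phase_relation("0/1", "0/1")` returns `'unknown'`.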

    This is how I understand the annotations.

    Best,

    Philipp

  • Genevieve Brandt (she/her)

    Thank you, Philipp, for taking the time to write out this explanation! Complex sites like these are hard to represent.
