Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Haplotype calls deletion followed by insertion instead of indel

2 comments

  • Louis Bergelson

    Unless I'm misunderstanding, I think this is essentially a representation issue: a choice between several essentially equivalent variants. As you say, if HaplotypeCaller's scoring indicated that 4 adjacent SNPs were more likely than a deletion + insertion, then it would potentially be written out that way (either as 4 separate variants or as a single MNP, depending on the exact details and parameters). There are many possible ways to represent the same variants in a VCF. We do our best to output a coherent one, but we may not always pick the ideal version.

    What James is saying about local realignment is that many of the tools you would use downstream to compare variants against each other will not look at the VCF's representation in isolation. Instead, they will attempt to realign the variants in order to produce a comparison that depends less on how each variant happens to be written.
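
To make the equivalence concrete, here is a minimal Python sketch of the idea behind representation-independent comparison: apply each set of variants to the reference and compare the resulting haplotype sequences. The reference string, the coordinates, and the helper name `apply_variants` are all made up for illustration. This is not how HaplotypeCaller or any particular comparison tool is implemented.

```python
def apply_variants(reference, variants):
    """Apply non-overlapping VCF-style variants (1-based pos, ref, alt) to a reference string."""
    seq = reference
    # Work from right to left so that earlier coordinates stay valid after each edit.
    for pos, ref_allele, alt_allele in sorted(variants, reverse=True):
        start = pos - 1
        end = start + len(ref_allele)
        if seq[start:end] != ref_allele:
            raise ValueError(f"REF allele mismatch at position {pos}")
        seq = seq[:start] + alt_allele + seq[end:]
    return seq


# Hypothetical 10-base reference, used only for this example.
reference = "GATTACTGGC"

# Three ways of writing the same change.
deletion_plus_insertion = [(2, "ATTAC", "A"), (7, "T", "TCCGA")]
four_snps = [(4, "T", "C"), (5, "A", "C"), (6, "C", "G"), (7, "T", "A")]
single_mnp = [(4, "TACT", "CCGA")]

haplotypes = {
    "deletion + insertion": apply_variants(reference, deletion_plus_insertion),
    "four SNPs": apply_variants(reference, four_snps),
    "single MNP": apply_variants(reference, single_mnp),
}

for name, haplotype in haplotypes.items():
    print(f"{name}: {haplotype}")

# All three representations yield the same haplotype sequence.
print(len(set(haplotypes.values())) == 1)
```

Comparing the reconstructed haplotypes rather than the VCF records is what makes the result independent of how the caller chose to write the variants.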

    If your downstream analysis needs your variants to be expressed in a specific format, I would recommend using a standardization tool that can put them into the format you expect. I don't have a great suggestion for which tools are available, but I'm sure there are a number that produce different formats.
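
As a rough illustration of what such standardization can involve, the Python sketch below trims bases shared by the REF and ALT alleles and splits an equal-length substitution into individual SNPs. The function names `trim_alleles` and `split_mnp` and the example record are hypothetical. The sketch does not left-align indels against the reference, which dedicated normalization tools (for example `bcftools norm` or `vt normalize`) also handle, so treat it as a simplified picture rather than a replacement for them.

```python
def trim_alleles(pos, ref, alt):
    """Remove bases shared by REF and ALT, keeping at least one base in each allele."""
    # Trim shared trailing bases first; the position is unaffected.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim shared leading bases, shifting the position to the right.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def split_mnp(pos, ref, alt):
    """Split an equal-length substitution into SNPs; leave indels untouched."""
    if len(ref) != len(alt):
        return [(pos, ref, alt)]
    return [(pos + i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


# Hypothetical record: a 5-base substitution with one redundant leading base.
pos, ref, alt = trim_alleles(3, "TTACT", "TCCGA")
print((pos, ref, alt))
print(split_mnp(pos, ref, alt))
```

Whether the trimmed MNP or the four separate SNPs is the "right" output depends entirely on what the downstream analysis expects.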

  • Rolf Schröder

    Hey Louis,

    Thanks a lot. This really helps my understanding!
