Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Slippage filter and multi-tool indel consensus calling


3 comments

  • Gökalp Çelik

    Hi Luka Culibrk

    We have quite thorough documentation on Mutect2 and FilterMutectCalls at the link below.

    https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf 

    That document explains the slippage filter as follows:

    For indels in short tandem repeats (STRs) FilterMutectCalls uses a simple model 
    for the possibility that alt reads are due to polymerase slippage.  The prior 
    $\pi_L$ for a real variant of length 'L' comes from the allele fraction clustering
    model. FilterMutectCalls assumes that polymerase slippage only occurs in STRs of 
    8 bases or more and only results in insertions or deletions of a single repeat unit.
    The likelihood of 'a' alt reads out of 'd' total reads in the case of a real somatic
    variant is given by the allele fraction clustering model.  The likelihood in the case 
    of polymerase slippage is the marginal of binomial likelihoods over a slippage rate 
    with a uniform prior from 0 to 0.1, which is a regularized Beta function.  
    Given priors and likelihoods, the error probability follows.
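
As an illustration of the slippage likelihood quoted above, here is a minimal Python sketch; it is not the GATK implementation. For $a$ alt reads out of $d$ total reads and a slippage rate uniform on $[0, r]$, the marginal is $\frac{1}{r}\int_0^r \binom{d}{a} p^a (1-p)^{d-a}\,dp = \frac{I_r(a+1,\,d-a+1)}{r\,(d+1)}$, where $I_r$ is the regularized incomplete Beta function. The function names are hypothetical. The two-hypothesis posterior (slippage versus real variant, with slippage prior $1 - \pi_L$) is a simplifying assumption, and the real-variant likelihood, which would come from the allele fraction clustering model, is passed in as a plain number rather than computed.

```python
from math import exp, lgamma, log, log1p


def slippage_likelihood(alt_reads: int, total_reads: int, max_rate: float = 0.1) -> float:
    """Binomial likelihood of alt_reads out of total_reads, marginalized
    over a slippage rate drawn uniformly from [0, max_rate]."""
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("need 0 <= alt_reads <= total_reads")
    if not 0.0 < max_rate < 1.0:
        raise ValueError("need 0 < max_rate < 1")
    n = total_reads + 1
    # For integer arguments, I_r(a + 1, d - a + 1) equals the upper tail
    # P(X >= a + 1) of X ~ Binomial(d + 1, r). Terms are computed in log
    # space so that large read depths do not overflow.
    upper_tail = 0.0
    for k in range(alt_reads + 1, n + 1):
        log_term = (
            lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(max_rate) + (n - k) * log1p(-max_rate)
        )
        upper_tail += exp(log_term)
    return upper_tail / (max_rate * n)


def slippage_error_probability(prior_real: float, lik_real: float, lik_slippage: float) -> float:
    """Posterior probability that the alt reads are slippage.

    Assumption: only two hypotheses compete (real variant vs. slippage),
    so the slippage prior is taken to be 1 - prior_real.
    """
    slippage_weight = (1.0 - prior_real) * lik_slippage
    real_weight = prior_real * lik_real
    total = slippage_weight + real_weight
    if total <= 0.0:
        raise ValueError("both hypotheses have zero weight")
    return slippage_weight / total


# Placeholder numbers for illustration only; lik_real stands in for the
# output of the allele fraction clustering model.
lik_slip = slippage_likelihood(alt_reads=3, total_reads=60)
posterior = slippage_error_probability(prior_real=0.01, lik_real=0.02, lik_slippage=lik_slip)
print(f"slippage likelihood = {lik_slip:.4g}, error probability = {posterior:.4g}")
```

Because the rate is capped at `max_rate`, the slippage likelihood drops off quickly once the alt fraction clearly exceeds that cap, which is why a well-supported indel in an STR can still pass the filter.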

    FilterMutectCalls exposes two parameters for this purpose:

    --min-slippage-length <Integer>
    Minimum number of reference bases in an STR to suspect polymerase slippage.
    Default value: 8.

    and

    --pcr-slippage-rate <Double>
    The frequency of polymerase slippage in contexts where it is suspected.
    Default value: 0.1.
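
For concreteness, the two options above could be passed on the command line roughly as follows. This is a sketch that only assembles and prints the command; the helper name and file names are placeholders, and the non-default values (10 and 0.05) are arbitrary examples, not recommendations.

```python
import shlex


def build_filter_mutect_calls_command(
    reference: str,
    unfiltered_vcf: str,
    filtered_vcf: str,
    min_slippage_length: int = 8,
    pcr_slippage_rate: float = 0.1,
) -> list[str]:
    """Assemble a FilterMutectCalls invocation with explicit slippage settings."""
    return [
        "gatk", "FilterMutectCalls",
        "-R", reference,
        "-V", unfiltered_vcf,
        "-O", filtered_vcf,
        "--min-slippage-length", str(min_slippage_length),
        "--pcr-slippage-rate", str(pcr_slippage_rate),
    ]


# Placeholder paths; substitute your own reference and Mutect2 output.
command = build_filter_mutect_calls_command(
    "reference.fasta",
    "unfiltered.vcf.gz",
    "filtered.vcf.gz",
    min_slippage_length=10,
    pcr_slippage_rate=0.05,
)
print(shlex.join(command))
```

Raising `--min-slippage-length` narrows the set of STRs in which slippage is suspected, so fewer indels are candidates for this filter.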

    You should be able to adjust these values so that the filter's behavior matches a known truth set. Keep in mind, however, that some variants listed in databases may themselves have been filtered in the original data, so how much weight to give the findings of other variant callers versus Mutect2 is a judgment call, and the risk is yours.
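
One hedged way to compare parameter settings against a truth set is to score each filtered call set by precision and recall. The sketch below assumes variants have already been reduced to `(chrom, pos, ref, alt)` tuples; the function name and the example variants are made up for illustration.

```python
def score_against_truth(called, truth):
    """Return true-positive count, precision, and recall for a call set.

    Both arguments are iterables of hashable variant keys, for example
    (chrom, pos, ref, alt) tuples.
    """
    called_set, truth_set = set(called), set(truth)
    true_positives = len(called_set & truth_set)
    precision = true_positives / len(called_set) if called_set else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    return {"true_positives": true_positives, "precision": precision, "recall": recall}


# Made-up variant keys for illustration only.
truth_calls = [("chr1", 1000, "A", "AT"), ("chr1", 2000, "GTT", "G"), ("chr2", 500, "C", "CA")]
passing_calls = [("chr1", 1000, "A", "AT"), ("chr2", 500, "C", "CA"), ("chr3", 42, "T", "TA")]

scores = score_against_truth(passing_calls, truth_calls)
print(scores)
```

Running this once per parameter combination shows whether a looser slippage filter recovers truth-set indels (higher recall) at the cost of extra calls outside the truth set (lower precision), subject to the caveat that the truth set itself may be incomplete.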

    To answer the question of whether we have validation data for this particular filter: the short answer is no. The longer answer is that the Mutect2 bioRxiv preprint linked below reports the performance of Mutect2 and FilterMutectCalls for SNVs and indels, and may serve as a reference for both tools' performance metrics.

    https://www.biorxiv.org/content/10.1101/861054v1.full.pdf 

    I hope this helps. 

  • Luka Culibrk

    Hi Gökalp Çelik, thank you for the information. I had read the documentation on this previously; I was hoping for more detail on the classes of indels that concern me in this case. In our data, Mutect2 does appear to capture a more accurate picture of the biological context than a consensus-intersect strategy does. I was simply wondering whether there was more information to validate this specific filter, particularly with respect to the false negatives it might introduce. I believe you have answered my question, so thanks! Thank you also for the preprint link.

     

  • Gökalp Çelik

    Homopolymer errors are the major source of indel errors, and using homopolymer indels to assess microsatellite instability is a delicate matter. Without a proper matched normal, it is almost impossible to tell whether a variant is really there or whether we are simply observing errors introduced by PCR and/or the sequencing technology.

    If you sequence germline samples with a PCR-based sample preparation, you will almost always observe a low-allele-fraction indel in one or more of those homopolymer regions, so such calls carry less weight as true positives than other variants do.

    One way to check whether such a variant is really present is to run Mutect2 on matched tumor-normal data prepared in the same way.
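
A matched tumor-normal run could be assembled along these lines. This is a sketch that only builds and prints the command; the helper name, file names, and normal sample name are placeholders, and the normal sample name must match the sample name recorded in the normal BAM's read group.

```python
import shlex


def build_mutect2_tumor_normal_command(
    reference: str,
    tumor_bam: str,
    normal_bam: str,
    normal_sample_name: str,
    output_vcf: str,
) -> list[str]:
    """Assemble a Mutect2 invocation in matched tumor-normal mode."""
    return [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "-I", normal_bam,
        "-normal", normal_sample_name,
        "-O", output_vcf,
    ]


# Placeholder inputs; both BAMs are assumed to come from the same
# library preparation and sequencing protocol.
mutect2_command = build_mutect2_tumor_normal_command(
    "reference.fasta",
    "tumor.bam",
    "normal.bam",
    "normal_sample_name",
    "unfiltered.vcf.gz",
)
print(shlex.join(mutect2_command))
```

With a normal prepared the same way, slippage artifacts in homopolymers tend to appear in both samples, which gives Mutect2 evidence to reject them as somatic.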

    I hope this helps. 
