Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

documentation for Mutect2 theory

Answered
4 comments

  • Andrew Uzilov

    On a related note, the table in the section "B. Hard Filters" doesn't seem to match my Mutect2 output VCFs in GATK v4.1.8.0. Here are some examples:

    "fragment_length" is called "fragment" in the VCF. Is that the same thing?

    There is no "duplicate_evidence" in my VCF. Is it the same as "duplicates" in the whitepaper?

    Should "base_quality" be "base_qual"?

    And so forth.

    So I am wondering whether the whitepaper is up to date or whether something is off in my workflow.

  • Mark Fleharty

    The whitepaper is the most up-to-date documentation on the theory behind Mutect2 (M2). We have not made significant changes to that theory in the last 10 months.

    Could you be more specific about which variables you are trying to understand? Perhaps we need to make the documentation clearer.

    I recommend using this Terra workspace:

    https://app.terra.bio/#workspaces/help-gatk/Somatic-SNVs-Indels-GATK4

  • Andrew Uzilov

    It would be really useful if the whitepaper had a table listing the names of all the INFO and FORMAT fields in the Mutect2 post-filter VCF, cross-referenced to the names of the corresponding variables in the whitepaper. The specific fields I am interested in, for Mutect2 in GATK v4.1.8.0, are:

    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">

    ##INFO=<ID=CONTQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to contamination">
    ##INFO=<ID=GERMQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not germline variants">
    ##INFO=<ID=ROQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to read orientation artifact">
    ##INFO=<ID=SEQQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not sequencing errors">
    ##INFO=<ID=STRANDQ,Number=1,Type=Integer,Description="Phred-scaled quality of strand bias artifact">
    ##INFO=<ID=STRQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles in STRs are not polymerase slippage errors">
    ##INFO=<ID=TLOD,Number=A,Type=Float,Description="Log 10 likelihood ratio score of variant existing versus not existing">
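As a starting point for the cross-reference table requested above, the field names and descriptions can be pulled straight out of the VCF header. The following is a minimal sketch in plain Python, assuming only the standard `##INFO=<...>`, `##FORMAT=<...>`, and `##FILTER=<...>` header syntax; the function names `parse_header_line` and `field_table` are hypothetical helpers written for this illustration and are not part of GATK. The sample lines are copied from the header lines quoted in this comment.

```python
import re

# Matches structured header lines such as ##INFO=<ID=...,Description="...">
HEADER_RE = re.compile(r'^##(INFO|FORMAT|FILTER)=<(.*)>$')
# Matches key=value pairs; a quoted value may itself contain commas.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_header_line(line):
    """Return a dict of the key=value pairs in one header line, or None."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    kind, body = match.groups()
    fields = {key: value.strip('"') for key, value in PAIR_RE.findall(body)}
    fields["kind"] = kind
    return fields


def field_table(lines):
    """Collect (kind, ID, Description) rows from an iterable of header lines."""
    rows = []
    for line in lines:
        record = parse_header_line(line)
        if record is not None:
            rows.append((record["kind"], record["ID"], record.get("Description", "")))
    return rows


# Header lines copied from the VCF excerpt quoted in this thread.
SAMPLE_HEADER = [
    '##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">',
    '##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">',
    '##INFO=<ID=TLOD,Number=A,Type=Float,Description="Log 10 likelihood ratio score of variant existing versus not existing">',
]

if __name__ == "__main__":
    for kind, field_id, description in field_table(SAMPLE_HEADER):
        print(f"{kind}\t{field_id}\t{description}")
```

Running the same functions over the header of a filtered VCF lists every FILTER, INFO, and FORMAT name the tool actually emitted, which can then be compared by hand against the variable names in the whitepaper. The sketch does not attempt that mapping itself, since the correspondence is exactly what this thread is asking about.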

  • Mark Fleharty

    I agree, it seems the whitepaper needs to be updated.

    I've created a GitHub ticket at:

    https://github.com/broadinstitute/gatk/issues/6965

    Please feel free to add to it if I didn't capture everything.

