Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

10 comments

  • David Benjamin

    The Mutect2 pipeline can already segment the germline allele fraction and use it to distinguish germline variants from somatic variants by their allele fractions.  FilterMutectCalls goes a step further by clustering somatic allele fractions.

    For details, please refer to our documentation: https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf. See section III, subsection B, for the relevant GetPileupSummaries and CalculateContamination commands, and the beginning of section II for the FilterMutectCalls command.

  • that girl

    thanks a lot.

    But SGZ takes CNV into account, which Mutect2 does not seem to do. SGZ considers more than just the germline allele fraction.

  • David Benjamin

    Segmenting the germline allele fraction is equivalent to accounting for CNV, at least as far as tumor-only calling is concerned.  That is, if FilterMutectCalls knows the allele fraction of germline hets is 1/3 and 2/3 in some region, this tells it to be less confident of somatic calls with similar allele fractions even if it never explicitly states that the copy number is 3.
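
    As a rough illustration of the arithmetic in this reply (not GATK code), the expected allele fractions of a germline het can be derived from the copy number of a segment. The function name, its parameters, and the assumption that normal cells are diploid are all hypothetical choices made for this sketch.

```python
from fractions import Fraction


def germline_het_fractions(total_copies, minor_copies, purity=Fraction(1)):
    """Expected allele fractions (minor, major) of a germline het site.

    Illustrative assumptions, not taken from GATK:
    - tumor cells carry `total_copies` copies of the segment, of which
      `minor_copies` belong to the less-amplified haplotype;
    - normal cells in the sample are diploid, one copy of each haplotype;
    - `purity` is the fraction of cells in the sample that are tumor cells.
    """
    purity = Fraction(purity)
    if not 0 <= purity <= 1:
        raise ValueError("purity must be between 0 and 1")
    if total_copies < 0 or not 0 <= minor_copies <= total_copies:
        raise ValueError("need 0 <= minor_copies <= total_copies")

    # Total allele copies per average cell, tumor and normal mixed.
    denominator = purity * total_copies + (1 - purity) * 2
    if denominator == 0:
        raise ValueError("no DNA copies in this segment")

    minor = (purity * minor_copies + (1 - purity) * 1) / denominator
    return minor, 1 - minor


# Copy number 3 in a pure tumor: germline hets sit at 1/3 and 2/3.
print(germline_het_fractions(3, 1))
# A normal diploid segment: germline hets sit at 1/2.
print(germline_het_fractions(2, 1))
```

    A segment whose germline hets sit at 1/3 and 2/3 therefore carries the same information as "copy number 3", which is why a somatic call near either value deserves less confidence in tumor-only mode.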

  • that girl

    Does "germline allele fraction" here refer to the gnomAD database or to something else?

    How does Mutect2 work with the gnomAD database? If a variant appears in gnomAD, is it tagged germline_risk in the FILTER column? Or does FilterMutectCalls use gnomAD in some other way?

    By the way, gnomAD is updated often. Has GATK updated the gnomAD file it uses?

    Thanks a lot.

  • David Benjamin

    "Allele frequency" refers to the prevalence of an alternate allele in a population and is used for FilterMutectCalls' statistical model for germline filtering (it is much more sophisticated than hard filtering any variant in gnomAD).  "Allele fraction" refers to the prevalence of an allele in a sample's DNA and is used for CNV segmentation.  Please refer to our documentation for more details.

    We have not updated the gnomAD version of our official germline resource recently.
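
    To make the two terms concrete, here is a toy calculation that combines a population allele frequency (as a prior) with a sample allele fraction (as a likelihood). It is only a sketch of the general idea, not the FilterMutectCalls model: the function names, the Hardy-Weinberg prior, the flat somatic allele-fraction assumption, and the `somatic_prior` value are all illustrative assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of k alt reads out of n when the allele fraction is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def toy_germline_posterior(alt_reads, depth, population_af,
                           minor_allele_fraction=0.5, somatic_prior=1e-6):
    """Toy posterior probability that a tumor-only variant is a germline het.

    population_af         -- allele FREQUENCY in a population (e.g. an AF field)
    minor_allele_fraction -- germline het allele FRACTION in this segment of
                             the sample, e.g. 1/3 where the copy number is 3
    somatic_prior         -- assumed prior probability of a somatic variant
    """
    # Prior: chance of being heterozygous, assuming Hardy-Weinberg equilibrium.
    germline_prior = 2 * population_af * (1 - population_af)

    # Likelihood if germline: the alt allele is equally likely to sit on the
    # minor or the major haplotype of the segment.
    germline_likelihood = 0.5 * (
        binomial_pmf(alt_reads, depth, minor_allele_fraction)
        + binomial_pmf(alt_reads, depth, 1 - minor_allele_fraction)
    )

    # Likelihood if somatic: allele fraction unknown, taken as uniform on
    # [0, 1], which integrates the binomial to 1 / (depth + 1).
    somatic_likelihood = 1 / (depth + 1)

    germline = germline_prior * germline_likelihood
    somatic = somatic_prior * somatic_likelihood
    return germline / (germline + somatic)


# Same population allele frequency, same segment, different allele fractions.
near_het = toy_germline_posterior(33, 100, 0.01, minor_allele_fraction=1 / 3)
far_from_het = toy_germline_posterior(10, 100, 0.01, minor_allele_fraction=1 / 3)
print(near_het > far_from_het)
```

    In this toy, presence in a population resource alone does not decide the call: a variant whose allele fraction is far from what a germline het would show in that segment gets a much lower germline probability.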

  • that girl

    "Allele fraction" refers to the prevalence of an allele in a sample's DNA and is used for CNV segmentation.

    What does "sample" mean here? A population database such as gnomAD, or something else? In fact, gnomAD is the only one I see used in GATK.

    I understand it may not be just hard filtering, but it is really hard for us to read the original Java source code. So can you give the main points of how it does soft filtering? I think many people are interested in this.

  • David Benjamin

    "Sample" means a single extraction of DNA that was sequenced.  Usually, but not always, one BAM file contains one sample.  For example, if we have blood normal DNA, the primary tumor, and a metastasis, that's three samples.  Mutect2 can use any -germline-resource VCF that contains an AF field, but there's no good reason not to use gnomAD for human samples.

    ". . .so can you give the main points of how it does soft filter, I think many people are interested in this. . ."

    This is described in our documentation: https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf.

  • that girl

    Thanks a lot.

    I think most people cannot easily handle the confusing math formulas, so let me put it another way.

    In tumor-only mode, if GATK is used with the gnomAD database, will it give the same results as FoundationOne CDx, which uses SGZ? Have you ever tested this? Because FoundationOne CDx is FDA approved, I think its variant calls may be a gold standard.

  • David Benjamin

    Results will be different because, while both account for aneuploidy, they are different methods.  I don't think they will be all that different, because SGZ appears to be mathematically sound.

  • that girl

    Thanks a lot.

    Of course the SGZ method is sound; it has been FDA approved. I think the GATK developers should consider making the soft filtering more readily comprehensible for us common users. Most of us are not statisticians, but we are eager for a vivid explanation of soft filtering. We do not want GATK to be a black box.
