Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Mutect2 output - resources?

  • Jason Cerrato

    Hi Mia,

    Thanks for writing in. Is there anything specific you are curious about that isn't answered by the Mutect2 GATK article, or the articles that link out from that page regarding specific tasks?

    Kind regards,

    Jason

  • MPetlj

    Hi Jason,

    Thanks. I have come across that page, but I have not found answers to my questions there.

    1) I am trying to understand the structure of the various output files. For example, the top-level folders generated are 'call-Filter', 'call-Funcotate', 'call-LearnReadRation', 'call-M2', 'call-MergeStats', 'call-MergeVCFs' and 'call-SplitIntervals'.

    1a) What kinds of outputs do these folders contain? Each one holds multiple files, but I cannot find any documentation describing them.

    1b) How do the VCF/MAF files produced across these folders differ, in particular in the number of mutations they contain (e.g. filtered.annotated.maf vs. filtered.vcf)?

    1c) Some files have abbreviated column names (e.g. the filtering_stats file in the call-Filter folder). Where can I find out what these columns mean?

    2) How does flagging and filtering of the VCF files work, i.e. how is a mutation marked as 'PASS' or as something else (e.g. germline)? Which mutations are also Funcotated (this may relate to the differences raised in question 1b)?

    Thanks,

    Mia
  • Jason Cerrato

    Hi Mia,

    This paper may help in answering some of your questions: https://www.biorxiv.org/content/10.1101/861054v1.full.pdf

    Here is some more general information about somatic short variant discovery: https://gatk.broadinstitute.org/hc/en-us/articles/360035894731-Somatic-short-variant-discovery-SNVs-Indels-

    A member of the GATK team will review your questions and fill in any gaps these two documents leave, but feel free to read them in the meantime and come back with any remaining questions.

    Kind regards,

    Jason

  • Genevieve Brandt (she/her)

    Hi MPetlj, many of the folders correspond to different steps of the workflow. This article goes over all of the steps performed: https://gatk.broadinstitute.org/hc/en-us/articles/360035894731-Somatic-short-variant-discovery-SNVs-Indels-
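    To see which files each step left behind, here is a minimal Python sketch (not a GATK or Cromwell tool). It assumes the workflow was run with Cromwell, which creates one `call-<TaskName>` folder per task, with `shard-N` and `execution` subfolders underneath for scattered tasks. The function name `list_call_outputs` is hypothetical.

```python
from pathlib import Path


def list_call_outputs(workflow_root):
    """Return {call folder name: sorted file paths under it, relative to that folder}."""
    root = Path(workflow_root)
    outputs = {}
    # Only top-level folders named call-<TaskName> are treated as task folders.
    call_dirs = sorted(p for p in root.iterdir() if p.is_dir() and p.name.startswith("call-"))
    for call_dir in call_dirs:
        # rglob("*") also descends into shard-N/ and execution/ subfolders.
        files = sorted(
            f.relative_to(call_dir).as_posix() for f in call_dir.rglob("*") if f.is_file()
        )
        outputs[call_dir.name] = files
    return outputs
```

    Calling `list_call_outputs("path/to/workflow-run")` (a placeholder path) returns a mapping from names such as `call-Filter` to the files under each folder. The files themselves are still best interpreted against the workflow steps in the article linked above.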

    We also have a Funcotator tutorial that may help you understand these outputs: https://gatk.broadinstitute.org/hc/en-us/articles/360035889931-Funcotator-Information-and-Tutorial
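    For questions 1b and 2, one way to compare outputs directly is to count VCF records by FILTER value and count the data rows in the MAF. The Python sketch below has hypothetical function names and is not part of GATK. It assumes a plain or gzipped VCF whose FILTER column holds either PASS or semicolon-separated filter IDs, as the VCF specification describes. It also assumes a tab-delimited MAF whose first non-comment line is the column header. It additionally reads the `##FILTER` header lines, which give a description for each filter ID.

```python
import gzip
import re
from collections import Counter

# Matches header lines such as ##FILTER=<ID=germline,Description="...">
FILTER_HEADER = re.compile(r'^##FILTER=<ID=([^,>]+),Description="([^"]*)"')


def summarize_filters(vcf_path):
    """Count records per filter ID and collect filter descriptions from the header."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    descriptions = {}
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                match = FILTER_HEADER.match(line)
                if match:
                    descriptions[match.group(1)] = match.group(2)
                continue
            if line.startswith("#") or not line.strip():  # #CHROM header or blank line
                continue
            fields = line.rstrip("\n").split("\t")
            # FILTER is the 7th column; a record failing several filters counts under each.
            for filter_id in fields[6].split(";"):
                counts[filter_id] += 1
    return counts, descriptions


def count_maf_records(maf_path):
    """Count MAF data rows, skipping '#' comment lines and the column-header row."""
    with open(maf_path) as handle:
        rows = [line for line in handle if line.strip() and not line.startswith("#")]
    return max(len(rows) - 1, 0)
```

    Comparing `counts["PASS"]` with `count_maf_records(...)` shows whether the annotated MAF tracks only the passing calls. Why it does or does not depends on how the Funcotate step was configured, which this sketch does not check.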

