Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

af-only-gnomad.hg38.vcf.gz details

8 comments

  • lmose

    Additionally, the gnomAD files contain non_cancer allele counts, which I believe exclude cancer samples. Do we know whether the cancer samples are included in the AF values in af-only-gnomad.hg38.vcf.gz?

  • Genevieve Brandt (she/her)

    Hi lmose,

    Please see this Mutect2 FAQ; question 19 has information about the af-only-gnomad.hg38.vcf.gz file. There are also many discussions on the forum regarding that file. For example, this one: https://gatk.broadinstitute.org/hc/en-us/community/posts/360058276951-Which-file-is-af-only-gnomad-hg38-vcf-gz-

    We are not the group that creates the gnomAD files, so we only have specific information about how the file was modified for GATK use.

    Genevieve

  • lmose

    Hi Genevieve,

    Thanks for responding.  Yes, I spent time looking at the various threads including the one you mentioned.  I could not find the specific details in any of them or in the FAQ.  My apologies if I'm missing something.

    My questions above are specifically about the details of how this file was created. The details of the original gnomAD files are already well documented by the gnomAD team.

    To clarify, from what I understand, af-only-gnomad.hg38.vcf.gz appears to have AF values re-computed from the gnomAD exomes and genomes (from some version of gnomAD 2). It would be good to understand exactly how these values were re-computed and exactly which gnomAD version was used.

    Thanks again.

  • Genevieve Brandt (she/her)

    I don't believe that the AF values have been recomputed. According to our documentation, the only change is the removal of the INFO lines.

    If this is not the case, please provide more information so that I can thoroughly look into it.

  • lmose

    The gnomAD datasets are provided separately for exomes and genomes, with AF computed separately for each. My understanding is that af-only-gnomad.hg38.vcf.gz includes both genome and exome data, so building it would require either recomputing the AF values or picking one from the exome or genome dataset. It would be good to have clarification on what was done either way.

    Also, there are multiple versions of gnomAD v2. It would be good to know which one is used here. The latest is v2.1.1.
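
To make the "recompute" option in this comment concrete, here is a minimal sketch. It assumes each callset reports an allele count (AC) and allele number (AN) for the site, and pools them as AF = (AC_exome + AC_genome) / (AN_exome + AN_genome). The helper name `pooled_af` and all counts are made up for illustration; nothing in this thread confirms that af-only-gnomad.hg38.vcf.gz was built this way.

```python
from typing import Optional


def pooled_af(ac_exome: int, an_exome: int,
              ac_genome: int, an_genome: int) -> Optional[float]:
    """Pool allele counts from two callsets and recompute AF.

    Hypothetical helper for illustration only.
    """
    total_an = an_exome + an_genome
    if total_an == 0:
        return None  # no called alleles at this site in either callset
    return (ac_exome + ac_genome) / total_an


# Made-up counts for a single site.
exome_af = 10 / 200000    # AF as it would appear in the exome callset
genome_af = 5 / 30000     # AF as it would appear in the genome callset
combined = pooled_af(10, 200000, 5, 30000)  # 15 alleles out of 230000
```

With these counts the pooled value falls between the two per-callset values and sits closer to the exome AF, because the exome callset contributes far more alleles. Simply picking one of the two AF values would give a different number, which is why the distinction raised here matters.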

     

  • Genevieve Brandt (she/her)

    Hi lmose,

    We don't have this information currently available but I can submit a documentation request.

    Our first priority is resolving questions about GATK tool-specific errors and abnormal results from the tools. For more information, you can view our support policy. We are not able to guarantee a solution for this request. If other community members know this answer and can help out, please do so!

    Genevieve

  • Genevieve Brandt (she/her)

    Hi lmose,

    I found a WDL script that is used by our developers to make the Mutect2 resources. You can view it for more detailed information about how the file is made.

    https://github.com/broadinstitute/gatk/blob/master/scripts/mutect2_wdl/mutect_resources.wdl

    Genevieve

  • lmose

    Thanks.  That script appears to just remove the non-AF fields.
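
As an illustration of what "removing the non-AF fields" means for a single record, the sketch below keeps only the `AF=` entry in the INFO column of a VCF data line. It is not the command used in the WDL script; `keep_only_af` is a hypothetical helper, the sample record is made up, and header lines (starting with `#`) are not handled.

```python
def keep_only_af(vcf_line: str) -> str:
    """Return a VCF data line with every INFO entry except AF removed.

    Hypothetical helper for illustration only.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    info_entries = fields[7].split(";")  # INFO is the 8th tab-separated column
    # Match the key exactly, so entries such as non_cancer_AF are dropped too.
    af_entries = [entry for entry in info_entries if entry.startswith("AF=")]
    fields[7] = ";".join(af_entries) if af_entries else "."  # "." = empty INFO
    return "\t".join(fields)


# Made-up record with several INFO entries.
line = ("chr1\t10067\t.\tT\tTA\t30.35\tPASS\t"
        "AC=3;AF=7.384e-05;AN=40628;non_cancer_AF=8.0e-05")
stripped = keep_only_af(line)
```

Note that the AF value is carried over unchanged: a step like this does not recompute anything, and it would not by itself explain how exome and genome records ended up in one file.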
