Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Panel of Normals Documentation

Answered

16 comments

  • David Benjamin

    Ryan Gimple "Mutect2-WGS-panel-b37.vcf" and "Mutect2-exome-panel.vcf" are a whole-genome and an exome panel, respectively, each generated from several hundred normals sequenced with standard Broad Genomics Platform protocols 4-5 years ago. Because most errors caught by the panel of normals are mapping artifacts, these are still useful despite changes in sequencing technology. "1000g_pon.hg38.vcf" is an hg38 panel of normals for both exomes and whole genomes, generated from 1000 Genomes Project samples. Finally, "af-only-gnomad.hg38.vcf" is a copy of the gnomAD VCF stripped of all unnecessary INFO fields. It is used for the --germline-resource argument.

  • Ryan Gimple

    Thanks for the information! I am using hg38 for my analyses. To be able to use the "Mutect2-WGS-panel-b37.vcf" file for my pipeline, do I need to perform a liftover step to hg38, or does this file already exist somewhere in either of the resource bundles? Or should I only use the "1000g_pon.hg38.vcf" file?

  • David Benjamin

    You should use the 1000 Genomes hg38 panel. Since hg38 is superior to hg19 and has fewer alignment artifacts, lifting over an hg19 panel would mean lifting over mapping artifacts that don't exist in hg38. I'm sure there are other reasons, too.
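
    As a rough illustration of this advice, the sketch below assembles a Mutect2 command line that passes the hg38 panel and the gnomAD resource. The flags `-R`, `-I`, `-O`, `-normal`, `--panel-of-normals` and `--germline-resource` are GATK4 Mutect2 arguments; the helper name `build_mutect2_command`, the reference and BAM file names, and the sample name are hypothetical placeholders, and it assumes a `gatk` launcher is on the PATH. It only builds and prints the command, it does not run it.

```python
import shlex


def build_mutect2_command(reference, tumor_bam, output_vcf,
                          panel_of_normals, germline_resource,
                          normal_bam=None, normal_sample=None):
    """Return the argument list for a GATK4 Mutect2 run (hypothetical helper)."""
    cmd = ["gatk", "Mutect2", "-R", reference, "-I", tumor_bam]
    if normal_bam is not None:
        if normal_sample is None:
            raise ValueError("normal_sample is required when normal_bam is given")
        # Matched normal: add its BAM and tell Mutect2 which sample is the normal.
        cmd += ["-I", normal_bam, "-normal", normal_sample]
    cmd += [
        "--panel-of-normals", panel_of_normals,
        "--germline-resource", germline_resource,
        "-O", output_vcf,
    ]
    return cmd


if __name__ == "__main__":
    # Reference and BAM names are placeholders; the two resource names are
    # the ones mentioned in this thread and should point to local copies.
    command = build_mutect2_command(
        reference="reference.hg38.fasta",
        tumor_bam="tumor.bam",
        output_vcf="unfiltered.vcf.gz",
        panel_of_normals="1000g_pon.hg38.vcf",
        germline_resource="af-only-gnomad.hg38.vcf",
    )
    print(shlex.join(command))
```

    Leaving out `normal_bam` gives a tumor-only command; the resulting calls would still need FilterMutectCalls afterwards.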

  • Jenifer

    Hi @David Benjamin!

    I am trying to use Mutect2-exome-panel.vcf from the somatic-b37 directory, but I have a question:

    You said that "Mutect2-WGS-panel-b37.vcf" and "Mutect2-exome-panel.vcf" are a whole-genome and exome panel, respectively, each generated from several hundred normals. When I read the files, however, I see that the INFO field says SOMATIC for every variant in them. Why are they classified as somatic? Shouldn't they be errors?

  • David Benjamin

    This is a relic of how those panels were generated.  Mutect2 does not do anything with INFO fields in the panel of normals, so it won't affect variant calls.
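
    To make the point concrete, here is a conceptual sketch of a panel lookup in which only the site matters and the INFO column is never read. It is not Mutect2's implementation: keying the lookup on chromosome and position alone is an assumption made for illustration, the function names are hypothetical, and the two panel records are invented.

```python
import io


def load_panel_sites(handle):
    """Collect (chrom, pos) pairs from a VCF-like text stream, ignoring INFO."""
    sites = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        sites.add((fields[0], int(fields[1])))
    return sites


def in_panel(sites, chrom, pos):
    """True if the given site is present in the panel."""
    return (chrom, pos) in sites


# Invented records: one carries a SOMATIC flag in INFO, the other has no INFO.
EXAMPLE_PANEL = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]),
    "\t".join(["1", "12345", ".", "A", "G", ".", ".", "SOMATIC"]),
    "\t".join(["1", "67890", ".", "C", "T", ".", ".", "."]),
]) + "\n"

if __name__ == "__main__":
    panel = load_panel_sites(io.StringIO(EXAMPLE_PANEL))
    # Both sites are found regardless of what their INFO column says.
    print(in_panel(panel, "1", 12345), in_panel(panel, "1", 67890))
```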

  • Felix

    Hi David Benjamin,

    I saw that the 1000g_pon.hg38.vcf.gz file is almost three years old. Is it still safe to use with the current version of Mutect2?

    Best,

    Felix

  • David Benjamin

    Yes, it is safe to use with the current version of Mutect2.

  • Felix

    Thank you, David!

  • Jose Camacho

    Hi!

    I have two questions about the germline resource files. I want to run Mutect2 on my exome samples (tumor and normal, aligned to the hg19 reference). After searching the Google Cloud files (https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-b37%2F;tab=objects?prefix=&forceOnObjectsSortingFiltering=false), I found these two files:

    af-only-gnomad.raw.sites.vcf

    Mutect2-exome-panel.vcf

    First, which one should I use? Second, given that I am working with hg19, will these b37 files work for me, or should I do a liftover?

    Thank you very much in advance!
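
    One thing that can be checked directly is whether the contig names in a resource VCF match those of the reference used for alignment: b37 files name chromosomes `1`, `2`, ... `MT`, whereas UCSC hg19 uses `chr1`, `chr2`, ... `chrM`. The sketch below compares `##contig` header lines against a FASTA index (`.fai`). The function names and the toy inputs are hypothetical, and matching names alone would not show that two builds are interchangeable.

```python
import io
import re

CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def vcf_contigs(handle):
    """Contig IDs declared in the ##contig lines of a VCF header."""
    names = []
    for line in handle:
        if not line.startswith("#"):
            break  # header is over
        match = CONTIG_ID.match(line)
        if match:
            names.append(match.group(1))
    return names


def fai_contigs(handle):
    """Contig names from a FASTA index: first tab-separated column per line."""
    return [line.rstrip("\n").split("\t")[0] for line in handle if line.strip()]


def missing_from_reference(vcf_names, ref_names):
    """VCF contigs that the reference does not contain, in header order."""
    ref = set(ref_names)
    return [name for name in vcf_names if name not in ref]


if __name__ == "__main__":
    # Toy inputs: a b37-style VCF header and an hg19-style .fai index.
    vcf_header = io.StringIO(
        "##fileformat=VCFv4.2\n"
        "##contig=<ID=1,length=249250621>\n"
        "##contig=<ID=MT,length=16569>\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    )
    fai = io.StringIO("chr1\t249250621\t6\t50\t51\nchrM\t16571\t254235646\t50\t51\n")
    print(missing_from_reference(vcf_contigs(vcf_header), fai_contigs(fai)))
```

    A non-empty result means the resource and the reference disagree on naming and cannot be used together as they are.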

  • Genevieve Brandt (she/her)

    Hi, please note that your question was posted while the GATK Team was Out of Office.

    Please repost any outstanding GATK issues and we will get to them if possible. Our first priority is solving GATK issues and abnormal results, see our support policy for more details.

  • Alex Blain

    Following on from Ryan's question:

    I am currently running exome data for two cohorts, one from the UK and one from Africa. The UK cohort has ~60 samples with a matched normal, but even after I created my panel of normals, the tumour-only samples still have several hundred variants called after stringent downstream filtering based on read depth, number of variant reads, etc. Conversely, I have only eight matched normal samples for the African cohort, which is definitely not enough to create a robust PON.

    Because of this, I would like to use the GATK Mutect2-exome-panel.vcf to see whether it would aid my analysis. My question is: does Mutect2-exome-panel.vcf contain sequencing artefacts only, or does it also contain SNPs, as a normal PON would? I'm a bit worried that applying this PON to my cohorts may be inappropriate because of differences in SNPs between populations, assuming this PON was generated from a US population.

    Thank you in advance for your help.

    Alex

  • ming hu

    Hi, David Benjamin

    When I open the file "1000g_pon.hg38.vcf", I find 'filtering_status=Warning: unfiltered Mutect 2 calls. Please run FilterMutectCalls to remove false positives.' and 'tumor_sample=HG02775'. Is "1000g_pon.hg38.vcf" an hg38 panel of normals or a VCF file of one normal sample (HG02775)?

    Thanks for your consideration. I look forward to hearing from you.

    Best,

    Ming

  • Pamela Bretscher

    Hi Alex Blain,

    Yes, the Mutect2-exome-panel.vcf also contains SNPs, like a typical panel of normals. With regard to applying this PON to a cohort from a different region, this should work fine. Here is a resource about the improvements to the genome files that make them more representative of regional haplotype differences:

    https://gatk.broadinstitute.org/hc/en-us/articles/360035890951-Human-genome-reference-builds-GRCh38-or-hg38-b37-hg19 

    I hope this answered your question.

    Best,

    Pamela

  • Pamela Bretscher

    Hi ming hu,

    The 1000g_pon.hg38.vcf file is an hg38 panel of normals, not just a VCF for one sample. I hope this helps.

    Best, 

    Pamela
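
    A header entry such as `tumor_sample=...` is only metadata; whether a VCF carries per-sample genotype columns is determined by its `#CHROM` line. The sketch below reads a header and reports both. The function name is hypothetical and the toy header is invented for illustration rather than copied from 1000g_pon.hg38.vcf, so it says nothing about how that file was actually built.

```python
import io


def summarize_vcf_header(handle):
    """Return (metadata dict, list of sample column names) from a VCF header."""
    meta = {}
    samples = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##"):
            key, _, value = line[2:].partition("=")
            meta.setdefault(key, []).append(value)
        elif line.startswith("#CHROM"):
            # Columns 1-8 are fixed, column 9 is FORMAT, the rest are samples.
            samples = line.split("\t")[9:]
            break
    return meta, samples


if __name__ == "__main__":
    toy_header = io.StringIO(
        "##fileformat=VCFv4.2\n"
        "##tumor_sample=HG02775\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    )
    meta, samples = summarize_vcf_header(toy_header)
    # An empty sample list means a sites-only file, whatever the metadata says.
    print(meta.get("tumor_sample"), samples)
```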

  • Yannick Gansemans

    Hi GATK team,

    Like some users above, I am using 1000g_pon.hg38.vcf as a panel of normals for tumor samples for which I have no 'normal' sample to analyse with Mutect2. It would be very useful to have some more background information on this panel:

    * Is this a whole-genome or whole-exome panel?

    * Data from how many individuals were used to build it?

    * If this panel were derived from whole-exome variant calling, what would be the effect of using it with whole-genome tumor samples? Do you lose any variants outside the exome, or does Mutect2 just assume the panel has no mutations outside the exome?

    Thank you for dealing with my questions!

    Greetings,

    Yannick

  • Genevieve Brandt (she/her)

    Hi Yannick Gansemans,

    The 1000g_pon.hg38.vcf file was built from the 1000 Genomes Project samples, which you can read more about here: https://www.internationalgenome.org/. We did not collect the data ourselves, but the data is publicly available.

    It's from WGS, not WES, which is why the panel of normals can be used for both WES and WGS somatic analysis.

    Hope this helps!

    Genevieve
