Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

How do I prepare a PoN from my own data?

Answered
8 comments

  • Brian Haas

    Hi,

    To make your own PoN, you should have at least 40 normals to pass into the first step. The command structure for all three steps is given below.

     

    1) Run Mutect2 in tumor-only mode on each normal BAM individually:

    gatk Mutect2 -R reference.fasta -I normal1.bam --max-mnp-distance 0 -O normal1.vcf.gz 
    gatk Mutect2 -R reference.fasta -I normal2.bam --max-mnp-distance 0 -O normal2.vcf.gz 
    ...
    gatk Mutect2 -R reference.fasta -I normal40.bam --max-mnp-distance 0 -O normal40.vcf.gz 
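
    With 40 or more normals, the per-sample commands above are easier to generate than to type. A minimal sketch, assuming each BAM is named <sample>.bam and each output should be <sample>.vcf.gz in the current directory (the helper name mutect2_normal_cmd is hypothetical, not part of GATK). It only prints the commands so they can be reviewed first:

```shell
set -eu

# Hypothetical helper: print the tumor-only Mutect2 command for one normal BAM.
# $1 = reference FASTA, $2 = normal BAM (assumed to be named <sample>.bam).
mutect2_normal_cmd() {
  sample="$(basename "$2" .bam)"
  printf 'gatk Mutect2 -R %s -I %s --max-mnp-distance 0 -O %s.vcf.gz\n' \
    "$1" "$2" "$sample"
}

# Dry run: print one command per normal. Extend the list to all of your normals.
for bam in normal1.bam normal2.bam; do
  mutect2_normal_cmd reference.fasta "$bam"
done
```

    Piping the printed lines to sh would run them one after another. Paths containing spaces would need extra quoting, which this sketch does not handle.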
     

    2) Create a GenomicsDB from the normal Mutect2 calls:

    gatk GenomicsDBImport -R reference.fasta -L intervals.interval_list \
      --genomicsdb-workspace-path pon_db \
      -V normal1.vcf.gz \
      -V normal2.vcf.gz \
      ...
      -V normal40.vcf.gz
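
    The repeated -V arguments can be assembled from a list of VCFs rather than written by hand. A minimal sketch, assuming the per-normal VCFs from step 1 already exist (the helper name genomicsdb_import_cmd and its argument order are hypothetical). Like the commands above, it only builds and prints the command line:

```shell
set -eu

# Hypothetical helper: print a GenomicsDBImport command with one -V per VCF.
# $1 = reference FASTA, $2 = intervals file, $3 = workspace path, rest = VCFs.
genomicsdb_import_cmd() {
  cmd="gatk GenomicsDBImport -R $1 -L $2 --genomicsdb-workspace-path $3"
  shift 3
  for vcf in "$@"; do
    cmd="$cmd -V $vcf"
  done
  printf '%s\n' "$cmd"
}

# Dry run with two normals; pass all of your per-normal VCFs in practice.
genomicsdb_import_cmd reference.fasta intervals.interval_list pon_db \
  normal1.vcf.gz normal2.vcf.gz
```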
     

    3) Combine the normal calls using CreateSomaticPanelOfNormals:

    gatk CreateSomaticPanelOfNormals -R reference.fasta \
      --germline-resource af-only-gnomad.vcf.gz \
      -V gendb://pon_db \
      -O pon.vcf.gz
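
    Since the germline resource is the one argument that differs between versions of this command, here is a minimal sketch that makes it an optional parameter (the helper name create_pon_cmd and its argument order are hypothetical). It prints the command rather than running it:

```shell
set -eu

# Hypothetical helper: print a CreateSomaticPanelOfNormals command.
# $1 = reference FASTA, $2 = GenomicsDB workspace, $3 = output VCF,
# $4 = germline resource VCF (optional; omitted from the command if empty).
create_pon_cmd() {
  cmd="gatk CreateSomaticPanelOfNormals -R $1"
  if [ -n "${4:-}" ]; then
    cmd="$cmd --germline-resource $4"
  fi
  printf '%s -V gendb://%s -O %s\n' "$cmd" "$2" "$3"
}

# Dry run with the germline resource included.
create_pon_cmd reference.fasta pon_db pon.vcf.gz af-only-gnomad.vcf.gz
```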



    With respect to disk space, I don't see a way around this. If you're doing this on the cloud, you can increase your space allocation fairly easily.

    Also, I don't think you'll want to merge your PoN with other available PoNs. The only purpose of the PoN is to weed out likely artifacts that are specific to your library prep and sequencing. If you do decide to merge different PoNs to exclude more sites, consider curating the merged PoN to make sure you're not inadvertently removing sites of particular interest. That's the only downside: reduced sensitivity at the specific sites in the PoN.

  • Tingwen Chen

    Hi Brian,

    Thank you for your reply.

    I have another question. I'm following the instructions here: https://gatk.broadinstitute.org/hc/en-us/articles/360046224491-CreateSomaticPanelOfNormals-BETA-

    The third step on that webpage doesn't have the argument "--germline-resource af-only-gnomad.vcf.gz". Should I add it?

     

    Tingwen 

     

  • Brian Haas

    Yes, I believe that should be added.

     

    The official PoN workflow commands are here:

    https://github.com/broadinstitute/gatk/blob/master/scripts/mutect2_wdl/mutect2_pon.wdl

     

    I'll put in a request that the documentation gets updated.

  • Tingwen Chen

    OK. 

    I'll add it to my analysis workflow.

    Thank you. 

  • Ian Yi-Feng Chang

    If I have paired tumor and PBMC samples, can I use the PBMC samples to create a PoN, and then use that PoN to analyze the paired tumor and PBMC samples?

     

  • Mark Fleharty

    Yes, PBMCs are good for constructing PoNs.
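
    For illustration, a paired call that uses such a PoN might look like the sketch below. It assumes the PoN was built as pon.vcf.gz and that the normal's sample name (the SM tag in the PBMC BAM) is known; the file names, the sample name pbmc_sample, and the helper mutect2_paired_cmd are all hypothetical. It prints the command rather than running it:

```shell
set -eu

# Hypothetical helper: print a tumor-normal Mutect2 command that applies a PoN.
# $1 = reference FASTA, $2 = tumor BAM, $3 = normal (PBMC) BAM,
# $4 = normal sample name (SM tag), $5 = PoN VCF, $6 = output VCF.
mutect2_paired_cmd() {
  printf 'gatk Mutect2 -R %s -I %s -I %s -normal %s --panel-of-normals %s -O %s\n' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

# Dry run for one tumor/PBMC pair.
mutect2_paired_cmd reference.fasta tumor.bam pbmc.bam pbmc_sample \
  pon.vcf.gz somatic.vcf.gz
```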

  • mina ming

    Sorry, I only have tumour samples and no matched normals.

    Can I still create a PoN with my tumour samples?

  • Mark Fleharty

    You can still create a PoN from tumor samples, but you run the risk of constructing a PoN that filters out common driver events in your panel.
