Genome Analysis Toolkit

Creating Panel of Normals (PoN) from Blood Sample of cancer patients

Answered

9 comments

  • Philipp Hähnel

    Dear Ranjan,

    you can use all blood samples, provided that they were sequenced on the same sequencing platform. The tumor samples should also be sequenced on that platform, because the Panel of Normals is used to filter sequencing artifacts out of the variant calls, and those artifacts are platform-specific.

    You can also combine the VCF from your panel of normals with the one provided in the Best Practices (the one linked is the exome PoN, but if you have WGS data, you will want to use the genome PoN), as the latter contains plenty of artifacts that occur across platforms.

    Best,

    Philipp

  • Anthony DiCi

    Hi Ranjan,

    Thank you for writing to the GATK Community Forum. We hope we can clarify your question.

    When creating your Panel of Normals, please reference the selection criteria in the technical documentation linked below. 

    1. Panel of Normals

    Philipp Hähnel, thank you for your note on sequencing platform standardization. Sounds good!

    Please let us know if you have any further questions. 

    Best,

    Anthony

  • Ranjan J. Sarma

    Anthony DiCi Thank you for the essential suggestions.

    Philipp Hähnel Thank you for the quick response. I forgot to mention that 37 of these samples were whole-exome sequenced (WES) on an Illumina HiSeq 4000 and 21 new samples were whole-exome sequenced on an Illumina NovaSeq 6000.

  • Philipp Hähnel

    Then you will want to create two panels of normals, one for each of those two subsets. For the most accurate variant calling, use the PoN from the matching platform for each tumor sample. But you should also be fine with just combining the two PoNs, as that is essentially what the provided Best Practices Mutect2 PoN is.

    A GATK dev could maybe second that opinion.
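
As a rough illustration of the per-platform approach described above, the following minimal Python sketch picks the PoN that matches each tumor sample's sequencing platform and falls back to a combined panel otherwise. The platform labels, sample names, and file names are hypothetical placeholders and do not come from the thread.

```python
# Hypothetical PoN files, one per sequencing platform, plus a merged fallback.
PON_BY_PLATFORM = {
    "HiSeq4000": "pon_hiseq4000.vcf.gz",
    "NovaSeq6000": "pon_novaseq6000.vcf.gz",
}
COMBINED_PON = "pon_combined.vcf.gz"


def select_pon(platform, prefer_matching=True):
    """Return the PoN path to use for a tumor sample sequenced on `platform`."""
    if prefer_matching and platform in PON_BY_PLATFORM:
        return PON_BY_PLATFORM[platform]
    # Unknown platform, or the caller asked for the combined panel.
    return COMBINED_PON


# Hypothetical tumor samples and the platform each was sequenced on.
tumors = {"tumor_A": "HiSeq4000", "tumor_B": "NovaSeq6000", "tumor_C": "unknown"}
for sample, platform in tumors.items():
    print(sample, select_pon(platform))
```

Calling `select_pon(platform, prefer_matching=False)` for every sample corresponds to the simpler option of using the combined PoN throughout, which makes it easy to run both variants and compare the results.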

  • Ranjan J. Sarma

    Thank you! I will try both ways and compare.

  • Ranjan J. Sarma

    Philipp Hähnel

    I have another question. Suppose I run gatk Mutect2 with a tumor and a matched normal. In that case, can I use the adjacent normal tissue samples as the matched normals, together with a PoN constructed from VCFs generated by running HaplotypeCaller on the blood samples?

  • Philipp Hähnel

    Ranjan, 

    This is almost correct. You'd run Mutect2 in paired tumor-normal mode with the adjacent normal tissue as the normal sample. Using the matched normal sample will filter rare germline variants that are not in gnomAD (I also recommend passing gnomAD as the germline resource to filter common SNPs).

    The PoN is created by running Mutect2 on the blood samples in tumor-only mode (no matched normal), without filtering, rather than from HaplotypeCaller VCFs. Creating the panel also filters or annotates germline variants found in gnomAD if you provide that resource, so the panel should end up containing only sequencing artifacts.

    Feel free to use my workflows for that, which are based on the GATK Best Practices but are a bit more up to date and optimized for compute resources.

    Best, Philipp
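
The procedure described above can be sketched as a short Python script that only assembles the gatk command lines, without running them. It is a sketch under stated assumptions, not a replacement for the workflows mentioned: the tool names and flags follow the PoN example in the GATK Mutect2 documentation, while every file name, sample name, and the helper functions `pon_commands` and `paired_call_command` are hypothetical.

```python
import shlex

# Every file and sample name below is a hypothetical placeholder.
REF = "reference.fasta"
GNOMAD = "af-only-gnomad.vcf.gz"           # germline resource
INTERVALS = "exome_targets.interval_list"  # capture targets for WES data


def pon_commands(normal_bams, workspace="pon_db", pon_vcf="pon.vcf.gz"):
    """Return the command lines for the three PoN-building steps."""
    commands, normal_vcfs = [], []
    for bam in normal_bams:
        vcf = bam.rsplit(".", 1)[0] + ".pon.vcf.gz"
        normal_vcfs.append(vcf)
        # Step 1: Mutect2 in tumor-only mode on each blood normal, unfiltered.
        commands.append(["gatk", "Mutect2", "-R", REF, "-I", bam,
                         "--max-mnp-distance", "0", "-O", vcf])
    # Step 2: gather the per-normal calls into one GenomicsDB workspace.
    import_cmd = ["gatk", "GenomicsDBImport", "-R", REF, "-L", INTERVALS,
                  "--genomicsdb-workspace-path", workspace]
    for vcf in normal_vcfs:
        import_cmd += ["-V", vcf]
    commands.append(import_cmd)
    # Step 3: build the panel, passing gnomAD as the germline resource.
    commands.append(["gatk", "CreateSomaticPanelOfNormals", "-R", REF,
                     "--germline-resource", GNOMAD,
                     "-V", f"gendb://{workspace}", "-O", pon_vcf])
    return commands


def paired_call_command(tumor_bam, normal_bam, normal_sample, pon_vcf, out_vcf):
    """Return a paired tumor-normal Mutect2 call that uses gnomAD and the PoN."""
    return ["gatk", "Mutect2", "-R", REF, "-I", tumor_bam, "-I", normal_bam,
            "-normal", normal_sample, "--germline-resource", GNOMAD,
            "--panel-of-normals", pon_vcf, "-O", out_vcf]


for cmd in pon_commands(["blood_01.bam", "blood_02.bam"]):
    print(shlex.join(cmd))
print(shlex.join(paired_call_command(
    "tumor_01.bam", "adjacent_01.bam", "adjacent_01", "pon.vcf.gz",
    "tumor_01.unfiltered.vcf.gz")))
```

Note that the value given to `-normal` is the sample name from the normal BAM's read group, not the file name, and that the paired call produces unfiltered output that still needs a filtering step afterwards.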

  • Ranjan J. Sarma

    Thank you for clearing my doubts. I will definitely go through the workflow now.

  • Anthony DiCi

    Philipp Hähnel, thank you for your valued expertise and guidance. We appreciate our GATK Guides!

    Ranjan J. Sarma, I'm glad to hear that Philipp's suggestions have helped make things clearer.

    I wanted to follow up on my last response with an additional comment: we recommend that a PoN contain at least 40 samples.

    Please feel free to reach out any time with any additional questions!

    Best,

    Anthony
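
To make the 40-sample recommendation concrete, here is a small Python check applied to the sample counts mentioned earlier in the thread. It assumes that all 37 HiSeq 4000 samples and all 21 NovaSeq 6000 samples are blood normals usable for a PoN, which the thread does not state explicitly; the function name `meets_minimum` is hypothetical.

```python
MIN_PON_SAMPLES = 40  # minimum PoN size recommended in the thread

# Counts from the thread, assumed to be usable blood normals.
normals_per_platform = {"HiSeq 4000": 37, "NovaSeq 6000": 21}


def meets_minimum(n_samples, minimum=MIN_PON_SAMPLES):
    """Return True if a panel built from n_samples reaches the minimum size."""
    return n_samples >= minimum


for platform, count in normals_per_platform.items():
    print(platform, count, meets_minimum(count))

combined = sum(normals_per_platform.values())
print("combined", combined, meets_minimum(combined))
```

Under these assumptions, neither per-platform panel reaches 40 samples on its own, while the combined panel of 58 does, which is one more factor to weigh when choosing between platform-specific and combined PoNs.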

     
