Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Creating Panel of Normals (PoN) from Blood Sample of cancer patients

Answered

14 comments

  • Philipp Hähnel

    Dear Ranjan,

    You can use all blood samples, provided that they were sequenced on the same sequencing platform. The tumor samples should also be sequenced on that platform, because the Panel of Normals is used to filter sequencing artifacts from the variant calls, and those artifacts are platform-specific.

    You can also combine the VCF from your panel of normals with the one provided in the best practices (the one linked is for exomes, but if you have WGS data, you will want to use the genome PoN), as the latter contains plenty of artifacts that occur across platforms.

    Best,

    Philipp

  • Anthony DiCi

    Hi Ranjan,

    Thank you for writing to the GATK Community Forum. We hope we can clarify your question.

    When creating your Panel of Normals, please reference the selection criteria in the technical documentation linked below. 

    1. Panel of Normals

    Philipp Hähnel, thank you for your note on sequencing platform standardization. Sounds good!

    Please let us know if you have any further questions. 

    Best,

    Anthony

  • Ranjan J. Sarma

    Anthony DiCi Thank you for the essential suggestions.

    Philipp Hähnel Thank you for the quick response. I forgot to mention that 37 of these samples underwent whole-exome sequencing (WES) on an Illumina HiSeq 4000 and 21 new samples underwent WES on an Illumina NovaSeq 6000.

  • Philipp Hähnel

    Then you will want to create two panels of normals, one for each of those two subsets. If you want the variant calling step to be as accurate as possible, use the PoN of the matching platform for each tumor sample. But you should also be fine with just combining the two PoNs, as that is essentially what the Mutect2 PoN provided with the best practices is.

    A GATK dev could maybe second that opinion.
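
The bookkeeping this implies can be sketched in Python: group the normals by sequencing platform, then look up the platform-matched PoN for each tumor sample. This is only an illustration. The sample names, platform labels, and PoN file names are hypothetical placeholders, not files from this thread.

```python
from collections import defaultdict

# Hypothetical sample sheet: (sample_name, platform). All names are placeholders.
normals = [
    ("blood_001", "HiSeq4000"),
    ("blood_002", "HiSeq4000"),
    ("blood_101", "NovaSeq6000"),
]


def group_by_platform(samples):
    """Return {platform: [sample names]} so each platform gets its own PoN."""
    groups = defaultdict(list)
    for name, platform in samples:
        groups[platform].append(name)
    return dict(groups)


def pon_for_tumor(tumor_platform, pon_paths):
    """Pick the platform-matched PoN and fail loudly if none was built."""
    if tumor_platform not in pon_paths:
        raise KeyError(f"no PoN built for platform {tumor_platform!r}")
    return pon_paths[tumor_platform]


groups = group_by_platform(normals)
# Assumed naming scheme for the per-platform PoN files.
pon_paths = {platform: f"pon_{platform}.vcf.gz" for platform in groups}

print(groups)
print(pon_for_tumor("NovaSeq6000", pon_paths))
```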

  • Ranjan J. Sarma

    Thank you! I will try both ways and compare.

  • Ranjan J. Sarma

    Philipp Hähnel

    So, I have another question. Suppose I run gatk Mutect2 using a tumor and a matched normal. In that case, can I use the adjacent normal tissue samples as the matched normal, and a PoN constructed from VCFs generated by running HaplotypeCaller on the blood samples?

  • Philipp Hähnel

    Ranjan, 

    This is almost correct. You'd run Mutect2 in paired tumor-normal mode with the normal sample from the adjacent normal tissue. Using the matched normal sample will filter rare germline variants that are not in gnomAD (I also recommend supplying gnomAD as the germline resource to filter common SNPs).
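
As a rough sketch of such a paired call, the Python snippet below only assembles the Mutect2 command line and prints it; it does not run GATK. All file names and the normal sample name are hypothetical placeholders. Note that `-normal` expects the sample name from the normal BAM's read group, not a file name.

```python
import shlex


def mutect2_paired_command(reference, tumor_bam, normal_bam, normal_sample,
                           germline_resource, pon, output):
    """Build a paired tumor-normal Mutect2 command line as a list of arguments."""
    return [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "-I", normal_bam,
        # Read-group sample name (SM tag) of the normal BAM, not its file name.
        "-normal", normal_sample,
        "--germline-resource", germline_resource,
        "--panel-of-normals", pon,
        "-O", output,
    ]


# Every path and name below is a placeholder chosen for illustration.
cmd = mutect2_paired_command(
    reference="reference.fasta",
    tumor_bam="tumor.bam",
    normal_bam="adjacent_normal.bam",
    normal_sample="adjacent_normal",
    germline_resource="af-only-gnomad.vcf.gz",
    pon="pon.vcf.gz",
    output="somatic_unfiltered.vcf.gz",
)
print(shlex.join(cmd))
```

The resulting calls are unfiltered; in the GATK workflow they are then passed through FilterMutectCalls.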

    The PoN is created by running Mutect2 on the blood samples in tumor-only mode (no matched normal), without filtering. Creating the panel also filters or annotates germline variants found in gnomAD if you provide that resource, so the panel should end up containing only sequencing artifacts.
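
A minimal sketch of those PoN-building steps, assuming the usual three-stage GATK sequence (tumor-only Mutect2 per normal, GenomicsDBImport, CreateSomaticPanelOfNormals). The Python below only builds the command lines; the BAM names, interval list, workspace path, and output names are hypothetical.

```python
import shlex
from pathlib import Path


def pon_commands(normal_bams, reference, intervals, germline_resource,
                 workspace="pon_db", pon_out="pon.vcf.gz"):
    """Return the GATK command lines (as argument lists) for building a Mutect2 PoN."""
    commands, vcfs = [], []
    for bam in normal_bams:
        # Output is named after the BAM and written to the working directory.
        vcf = Path(bam).stem + ".pon.vcf.gz"
        vcfs.append(vcf)
        # Step 1: tumor-only Mutect2 on each normal, left unfiltered.
        commands.append(["gatk", "Mutect2", "-R", reference, "-I", bam,
                         "--max-mnp-distance", "0", "-O", vcf])
    # Step 2: gather the per-normal calls in a GenomicsDB workspace.
    import_cmd = ["gatk", "GenomicsDBImport", "-R", reference, "-L", intervals,
                  "--genomicsdb-workspace-path", workspace]
    for vcf in vcfs:
        import_cmd += ["-V", vcf]
    commands.append(import_cmd)
    # Step 3: create the panel, passing gnomAD as the germline resource.
    commands.append(["gatk", "CreateSomaticPanelOfNormals", "-R", reference,
                     "--germline-resource", germline_resource,
                     "-V", f"gendb://{workspace}", "-O", pon_out])
    return commands


# Placeholder inputs for illustration only.
commands = pon_commands(
    normal_bams=["blood_001.bam", "blood_002.bam"],
    reference="reference.fasta",
    intervals="exome_targets.interval_list",
    germline_resource="af-only-gnomad.vcf.gz",
)
for command in commands:
    print(shlex.join(command))
```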

    Feel free to use my workflows for that, which are based on the GATK Best Practices but are a little more up to date and optimized for compute resources.

    Best, Philipp

  • Ranjan J. Sarma

    Thank you for clearing my doubts. I will definitely go through the workflow now.

  • Anthony DiCi

    Philipp Hähnel, thank you for your valued expertise and guidance. We appreciate our GATK Guides!

    Ranjan J. Sarma, I'm glad to hear that Philipp's suggestions have helped make things clearer.

    I wanted to follow up on my last response with an additional comment: we recommend that a PoN contain at least 40 samples.
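
As a quick check against the sample counts reported earlier in this thread (37 on the HiSeq 4000, 21 on the NovaSeq 6000), here is a small Python sketch. The thread does not say whether the 40-sample recommendation applies to each platform-specific panel or to a combined one, so both cases are computed.

```python
MIN_PON_SAMPLES = 40  # minimum PoN size recommended in this thread

# Normal-sample counts per platform, as reported earlier in the thread.
cohorts = {"HiSeq 4000": 37, "NovaSeq 6000": 21}


def meets_minimum(n_samples, minimum=MIN_PON_SAMPLES):
    """True if a panel built from n_samples normals reaches the recommended size."""
    return n_samples >= minimum


per_platform = {name: meets_minimum(n) for name, n in cohorts.items()}
combined = meets_minimum(sum(cohorts.values()))

print(per_platform)  # {'HiSeq 4000': False, 'NovaSeq 6000': False}
print(combined)      # True (37 + 21 = 58)
```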

    Please feel free to reach out any time with any additional questions!

    Best,

    Anthony

  • Garima Thakur

    Hello,

    I am trying to create a panel of normals to use with the Somatic-CNVs-GATK4 workflow in the future, but I only have blood samples from the patients.

    I am not sure how to proceed with that, as I don't have any normal sample. 

    Should I use HaplotypeCaller with blood samples?

  • Garima Thakur

    Philipp Hähnel Would really appreciate your inputs for this query:

    The following lines about the PoN in the description here are confusing:

    (1) they are made from normal samples (in this context, "normal" means derived from healthy tissue

    Normals are typically derived from blood samples.

    I want to create a PoN for CNV calling in somatic samples (lung cancer cohort) and would like to know whether the samples should be derived from the blood or the tissue of normal, healthy individuals. In another section, tissue samples were not recommended for the PoN.

    (1) Can I use the negative samples from the lung cancer cohort, which are not positive for CNVs?

    (2) Can you please confirm whether blood samples from healthy individuals can count toward the 40-sample criterion for a CNV PoN?

    (3) To reach 40 samples, can a combination of (1) and (2) be used for the CNV PoN if the samples are sequenced in a single run on the same sequencing platform?

  • David Benjamin

    All the samples in the panel must be sequenced with the same technology (sequencing platform, target capture, library prep, etc.) as each other and as the somatic samples for CNV calling. It is fine to mix different sample types, such as blood and healthy tissue, in the panel, provided that they are all non-somatic and the different types of DNA extraction do not greatly affect the sequencing. For example, blood and healthy fresh tissue should not be a problem, but blood and FFPE samples, even if sequenced on the same platform, might differ too much due to sample degradation in FFPE.

    Mixing your blood samples and negative samples to reach 40 samples for your panel is probably fine.
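
To illustrate, the Python sketch below first checks that every panel sample shares the same technology and then assembles the command lines for a CNV panel of normals (CollectReadCounts per sample, then CreateReadCountPanelOfNormals). It only builds the commands and does not run GATK. The metadata fields, sample files, kit names, and interval list are hypothetical.

```python
import shlex
from pathlib import Path


def check_same_technology(samples):
    """Raise ValueError unless all samples share platform, capture kit and library prep."""
    technologies = {(s["platform"], s["capture"], s["library_prep"]) for s in samples}
    if len(technologies) != 1:
        raise ValueError(f"mixed technologies in panel: {sorted(technologies)}")


def cnv_pon_commands(samples, intervals, pon_out="cnv.pon.hdf5"):
    """Return GATK command lines (as argument lists) for a CNV panel of normals."""
    check_same_technology(samples)
    commands, count_files = [], []
    for sample in samples:
        # Counts file is named after the BAM and written to the working directory.
        counts = Path(sample["bam"]).stem + ".counts.hdf5"
        count_files.append(counts)
        commands.append(["gatk", "CollectReadCounts", "-I", sample["bam"],
                         "-L", intervals,
                         "--interval-merging-rule", "OVERLAPPING_ONLY",
                         "-O", counts])
    create = ["gatk", "CreateReadCountPanelOfNormals"]
    for counts in count_files:
        create += ["-I", counts]
    create += ["-O", pon_out]
    commands.append(create)
    return commands


# Placeholder metadata: one blood sample and one negative sample, same technology.
samples = [
    {"bam": "blood_001.bam", "platform": "NovaSeq6000",
     "capture": "lung_panel_v1", "library_prep": "kitA"},
    {"bam": "negative_001.bam", "platform": "NovaSeq6000",
     "capture": "lung_panel_v1", "library_prep": "kitA"},
]
commands = cnv_pon_commands(samples, "targets.preprocessed.interval_list")
for command in commands:
    print(shlex.join(command))
```

The technology check covers only what the metadata records; differences such as FFPE degradation would still need to be judged separately.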

  • Garima Thakur

    David Benjamin Thank you for the quick response and for clarifying that I can use blood samples + clear negative non-somatic samples to create the PON for CNV calling.

    1) The targeted lung cancer panel I am setting up the PoN for contains 12 genes. Is there a minimum number of genes a panel must contain for CNVs to be called correctly?

    2) There are no genes on X or Y chromosome in this panel. As gender-based sample selection makes sense only to call out CNVs correctly on these chromosomes, I assume selecting both male and female samples (for the blood+negative batch) will not affect the PON or CNV calling in any way. Please correct me if I am wrong.

    3) What would the criteria be for selecting individuals to take blood samples from? Can the 40 individuals be presumed-normal people without any known cancer diagnosis?

  • Samuel Lee

    Hi Garima Thakur,

    1) 12 genes is a rather small panel. How many genomic bins does the panel yield? The segmentation methods in the somatic CNV pipeline are intended to be used in scenarios in which copy number events span many adjacent genomic bins.

    2) Correct, I think you can safely assume that patterns of sequencing bias and noise will be very similar on the autosomal chromosomes across both male and female samples.

    3) Yes, it would be fine to take the samples from healthy individuals. The primary goal of the PoN is to capture the same patterns of sequencing noise and bias that are present in your case samples, but unobscured by any confounding copy-number activity. So any relatively quiet samples should suffice.
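
For the bin-count question in point 1), a rough Python estimate can be made from the panel's target intervals. This is a simplified count under stated assumptions: it ignores the padding and interval merging that GATK's PreprocessIntervals can apply, it treats a bin length of 0 as "one bin per target", and the coordinates below are made up.

```python
import math


def count_bins(targets, bin_length):
    """Count bins when each target is cut into pieces of at most bin_length bases.

    targets: list of (contig, start, end) with 1-based inclusive coordinates.
    bin_length: maximum bin size in bases; 0 keeps each target as a single bin.
    """
    total = 0
    for _contig, start, end in targets:
        length = end - start + 1
        total += 1 if bin_length == 0 else math.ceil(length / bin_length)
    return total


# Made-up target intervals with lengths 250, 1000 and 150 bases.
targets = [("chr7", 1000, 1249), ("chr7", 5000, 5999), ("chr12", 200, 349)]

print(count_bins(targets, 0))    # 3: one bin per target
print(count_bins(targets, 250))  # 6: 1 + 4 + 1
```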
