Creating a Panel of Normals (PoN) from Blood Samples of Cancer Patients
Hello,
I am investigating somatic mutations in stomach cancer in 31 patients. I have each patient's tumor and adjacent normal tissue. I also have their blood samples, plus blood samples from other stomach cancer patients from the same population, for a total of 58 patients. I have whole exome sequencing data for all of the samples.
In that case, can I use all the blood samples to create the Panel of Normals?
-
Dear Ranjan,
you can use all the blood samples, provided that they were sequenced on the same platform. The tumor samples should also be sequenced on that platform, because the Panel of Normals is used to filter platform-specific sequencing artifacts from the variant calls.
You can also combine the VCF from your panel of normals with the one provided in the best practices (the linked one is for exomes; if you have WGS data, you will want to use the genome PoN), as the latter contains many artifacts that occur across platforms.
Best,
Philipp
-
Hi Ranjan,
Thank you for writing to the GATK Community Forum. We hope we can clarify your question.
When creating your Panel of Normals, please reference the selection criteria in the technical documentation linked below.
Philipp Hähnel, thank you for your note on sequencing platform standardization. Sounds good!
Please let us know if you have any further questions.
Best,
Anthony
-
Anthony DiCi, thank you for the essential suggestions.
Philipp Hähnel, thank you for the quick response. I forgot to mention that 37 of these samples were whole exome sequenced (WES) on an Illumina HiSeq 4000, and 21 newer samples were whole exome sequenced on an Illumina NovaSeq 6000.
-
Then you will want to create two panels of normals, one for each of those two subsets. If you want the most accurate variant calls, use the PoN from the matching platform for each tumor sample. But you should also be fine just combining the two PoNs, as that is essentially what the PoN provided with the Mutect2 best practices is.
Perhaps a GATK developer could confirm this.
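The per-platform setup suggested above can be sketched as a small bookkeeping step: group the blood samples by platform, build one PoN per group, and look up the matching PoN for each tumor. This is a minimal sketch, assuming a hypothetical sample sheet of `(sample_id, platform)` pairs; the sample IDs and the `pon.<platform>.vcf.gz` file names are made up for illustration, and only the counts (37 HiSeq 4000, 21 NovaSeq 6000) come from the thread.

```python
from collections import defaultdict

# Hypothetical sample sheet of (sample_id, platform) pairs. The counts match
# the thread: 37 blood samples on HiSeq 4000 and 21 on NovaSeq 6000.
BLOOD_SAMPLES = (
    [(f"blood_{i:02d}", "HiSeq4000") for i in range(37)]
    + [(f"blood_{i:02d}", "NovaSeq6000") for i in range(37, 58)]
)


def group_by_platform(samples):
    """Return {platform: [sample_id, ...]}: one PoN cohort per platform."""
    groups = defaultdict(list)
    for sample_id, platform in samples:
        groups[platform].append(sample_id)
    return dict(groups)


def pon_for_tumor(tumor_platform, pon_paths):
    """Return the PoN built on the tumor's platform; fail loudly otherwise."""
    if tumor_platform not in pon_paths:
        raise KeyError(f"no PoN built for platform {tumor_platform!r}")
    return pon_paths[tumor_platform]


cohorts = group_by_platform(BLOOD_SAMPLES)
# Hypothetical output names for the two per-platform panels.
pon_paths = {platform: f"pon.{platform}.vcf.gz" for platform in cohorts}

for platform, ids in sorted(cohorts.items()):
    print(platform, len(ids), pon_paths[platform])
print(pon_for_tumor("NovaSeq6000", pon_paths))
```

Raising on an unknown platform is deliberate: silently falling back to the wrong panel would defeat the point of platform matching.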
-
Thank you! I will try both ways and compare.
-
I have another question. Suppose I run GATK Mutect2 with a tumor and a matched normal. In that case, can I use the adjacent normal tissue samples as the matched normals, together with a PoN constructed from VCFs generated by HaplotypeCaller on the blood samples?
-
Ranjan,
This is almost correct. You'd run Mutect2 in paired tumor-normal mode with the adjacent normal tissue as the normal sample. Using the matched normal will filter rare germline variants that are not in gnomAD (I recommend also supplying gnomAD as the germline resource to filter common SNPs).
The PoN is created by running Mutect2 on the blood samples in tumor-only mode (no matched normal), without filtering. If you provide gnomAD as a resource, panel creation also filters or annotates germline variants that are in gnomAD, so the panel should end up containing only sequencing artifacts.
Feel free to use my workflows for this; they are based on the GATK best practices but are somewhat more up to date and optimized for compute resources.
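The paired tumor-normal call described above can be sketched as a command line. This is a minimal sketch that only builds the argument list and does not run GATK. It assumes GATK 4 Mutect2 arguments (`-normal` for the matched-normal sample name, `--germline-resource`, `--panel-of-normals`) followed by the usual `FilterMutectCalls` step. All file and sample names are hypothetical.

```python
import shlex


def mutect2_paired_cmd(ref, tumor_bam, normal_bam, normal_sample,
                       gnomad_vcf, pon_vcf, out_vcf):
    """Build (but do not run) a tumor/matched-normal Mutect2 command."""
    return [
        "gatk", "Mutect2",
        "-R", ref,
        "-I", tumor_bam,
        "-I", normal_bam,
        # Sample name (read-group SM tag) of the adjacent-normal-tissue BAM.
        "-normal", normal_sample,
        # gnomAD allele frequencies help Mutect2 flag likely germline SNPs.
        "--germline-resource", gnomad_vcf,
        # PoN built from the blood samples (platform-matched if possible).
        "--panel-of-normals", pon_vcf,
        "-O", out_vcf,
    ]


def filter_cmd(ref, raw_vcf, out_vcf):
    """Raw Mutect2 calls are then filtered with FilterMutectCalls."""
    return ["gatk", "FilterMutectCalls", "-R", ref, "-V", raw_vcf, "-O", out_vcf]


# Hypothetical file and sample names for one patient.
call = mutect2_paired_cmd("ref.fasta", "P01_tumor.bam", "P01_adjnormal.bam",
                          "P01_adjnormal", "af-only-gnomad.vcf.gz",
                          "pon.vcf.gz", "P01.unfiltered.vcf.gz")
filt = filter_cmd("ref.fasta", "P01.unfiltered.vcf.gz", "P01.filtered.vcf.gz")
print(shlex.join(call))
print(shlex.join(filt))
```

Building argv lists instead of shell strings keeps file names with unusual characters safe; `shlex.join` is used only for display.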
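The PoN construction described above (Mutect2 in tumor-only mode on each blood sample, no HaplotypeCaller and no filtering, then panel creation with a gnomAD resource) can be sketched as three steps. This is a minimal sketch that follows the command pattern in the GATK `CreateSomaticPanelOfNormals` documentation as best I recall it: per-sample Mutect2 with `--max-mnp-distance 0`, consolidation with `GenomicsDBImport`, then `CreateSomaticPanelOfNormals` reading `gendb://`. It only builds the commands. The reference, interval list, BAM, and workspace names are hypothetical.

```python
import shlex
from pathlib import Path


def pon_commands(ref, intervals, normal_bams, gnomad_vcf, workspace, pon_out):
    """Build (but do not run) the three PoN steps as argv lists."""
    cmds, normal_vcfs = [], []
    for bam in normal_bams:
        vcf = str(Path(bam).with_suffix(".vcf.gz"))
        # Step 1: Mutect2 in tumor-only mode on each blood normal: no -normal
        # argument and no FilterMutectCalls afterwards.
        cmds.append(["gatk", "Mutect2", "-R", ref, "-I", bam,
                     "--max-mnp-distance", "0", "-O", vcf])
        normal_vcfs.append(vcf)
    # Step 2: consolidate the per-sample VCFs into a GenomicsDB workspace.
    import_cmd = ["gatk", "GenomicsDBImport", "-R", ref, "-L", intervals,
                  "--genomicsdb-workspace-path", workspace]
    for vcf in normal_vcfs:
        import_cmd += ["-V", vcf]
    cmds.append(import_cmd)
    # Step 3: build the panel; the gnomAD resource lets germline sites be
    # filtered out so mostly sequencing artifacts remain.
    cmds.append(["gatk", "CreateSomaticPanelOfNormals", "-R", ref,
                 "--germline-resource", gnomad_vcf,
                 "-V", f"gendb://{workspace}", "-O", pon_out])
    return cmds


# Hypothetical inputs: two of the blood-sample BAMs from one platform.
cmds = pon_commands("ref.fasta", "exome.interval_list",
                    ["blood_01.bam", "blood_02.bam"],
                    "af-only-gnomad.vcf.gz", "pon_db", "pon.vcf.gz")
for cmd in cmds:
    print(shlex.join(cmd))
```

In practice you would run this once per platform cohort, passing every blood BAM from that platform rather than two.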
Best, Philipp
-
Thank you for clearing up my doubts. I will definitely go through the workflow now.
-
Philipp Hähnel, thank you for your valued expertise and guidance. We appreciate our GATK Guides!
Ranjan J. Sarma, I'm glad to hear that Philipp's suggestions have helped make things clearer.
I wanted to follow up on my last response with an additional comment: we recommend that a PoN contain at least 40 samples.
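Applying the 40-sample minimum to the blood-sample counts given earlier in this thread is simple arithmetic, sketched below. It shows that neither platform cohort reaches 40 on its own (37 and 21), while all 58 samples together do. The cohort labels are hypothetical, and the check does not decide which trade-off to take.

```python
MIN_PON_SIZE = 40  # minimum PoN size recommended in the reply above


def pon_shortfalls(cohort_sizes, minimum=MIN_PON_SIZE):
    """Return {cohort: missing_samples} for cohorts below the minimum."""
    return {name: minimum - n for name, n in cohort_sizes.items() if n < minimum}


# Blood-sample counts given earlier in this thread.
per_platform = {"HiSeq4000": 37, "NovaSeq6000": 21}
combined = {"both platforms": sum(per_platform.values())}  # 58 samples

print(pon_shortfalls(per_platform))  # {'HiSeq4000': 3, 'NovaSeq6000': 19}
print(pon_shortfalls(combined))      # {}
```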
Please feel free to reach out any time with any additional questions!
Best,
Anthony
-
Hello,
I am trying to create a Panel of Normals to use with the Somatic-CNVs-GATK4 workflow in the future, but I only have blood samples from the patients.
I am not sure how to proceed, as I don't have any normal samples.
Should I use HaplotypeCaller on the blood samples?
-
Philipp Hähnel, I would really appreciate your input on this query:
The following lines from the PoN description here are confusing:
(1) they are made from normal samples (in this context, "normal" means derived from healthy tissue)
Normals are typically derived from blood samples.
I want to create a PoN for CNV calling in somatic samples (a lung cancer cohort) and would like to know whether the PoN samples should be blood or tissue from healthy individuals. Another section recommended against using tissue samples for the PoN.
(1) Can I use the negative samples from the lung cancer cohort, i.e. those that are not positive for CNVs?
(2) Can you please confirm whether blood samples from healthy individuals can be used to meet the 40-sample criterion for a CNV PoN?
(3) To reach 40 samples, can a combination of (1) and (2) be used for the CNV PoN if all the samples are sequenced in a single run on the same platform?
-
All the samples in the panel must be sequenced with the same technology (sequencing platform, target capture, library prep, etc.) as each other and as the somatic samples used for CNV calling. It is fine to mix different sample types, such as blood and healthy tissue, in the panel, provided that they are all non-somatic and the different DNA extraction methods do not greatly affect the sequencing. For example, blood and healthy fresh tissue should not be a problem, but blood and FFPE samples, even if sequenced on the same platform, might differ too much due to sample degradation in FFPE.
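The selection rules above can be sketched as a screening step over sample metadata. The sketch rejects candidates that may carry somatic alterations and candidates whose sequencing technology differs from the case samples. The sample source (blood vs. fresh tissue) is allowed to vary. This is a minimal sketch under stated assumptions: the `Tech`/`Candidate` fields, kit names, and sample IDs are hypothetical. Folding FFPE status into "technology" is my simplification of the advice that blood and FFPE samples "might differ too much".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tech:
    """Hypothetical description of how a sample was sequenced."""
    platform: str
    capture_kit: str
    library_prep: str
    ffpe: bool  # FFPE degradation can make otherwise-matching runs differ


@dataclass(frozen=True)
class Candidate:
    sample_id: str
    source: str    # "blood", "fresh tissue", ...; mixing sources is allowed
    tech: Tech
    somatic: bool  # True if the sample may carry somatic alterations


def screen_pon_candidates(candidates, case_tech):
    """Split candidates into accepted ids and (id, reason) rejections."""
    accepted, rejected = [], []
    for c in candidates:
        if c.somatic:
            rejected.append((c.sample_id, "may carry somatic alterations"))
        elif c.tech != case_tech:
            rejected.append((c.sample_id, "technology differs from case samples"))
        else:
            accepted.append(c.sample_id)
    return accepted, rejected


# Hypothetical case technology and candidate samples.
case = Tech("NovaSeq6000", "kitA", "prepA", ffpe=False)
candidates = [
    Candidate("B1", "blood", case, somatic=False),
    Candidate("T1", "fresh tissue", case, somatic=False),
    Candidate("F1", "tissue", Tech("NovaSeq6000", "kitA", "prepA", ffpe=True),
              somatic=False),
    Candidate("N1", "tumor tissue", case, somatic=True),
]
accepted, rejected = screen_pon_candidates(candidates, case)
print(accepted)  # ['B1', 'T1']
print(rejected)
```

Frozen dataclasses compare field by field, so `c.tech != case_tech` flags any mismatch in platform, capture kit, library prep, or FFPE status.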
Mixing your blood samples and negative samples to reach 40 samples for your panel is probably fine.
-
David Benjamin, thank you for the quick response and for clarifying that I can use blood samples plus clearly negative, non-somatic samples to create the PoN for CNV calling.
1) The targeted lung cancer panel I am setting up the PoN for has 12 genes. Is there a minimum number of genes a panel needs in order to call CNVs correctly?
2) There are no genes on the X or Y chromosomes in this panel. Since sex-based sample selection only matters for calling CNVs correctly on these chromosomes, I assume that including both male and female samples (in the blood + negative batch) will not affect the PoN or CNV calling in any way. Please correct me if I am wrong.
3) What criteria should be used to select the individuals to take blood samples from? Can the 40 individuals be presumed-healthy people without any known cancer diagnosis?
-
Hi Garima Thakur,
1) 12 genes is a rather small panel. How many genomic bins does the panel yield? The segmentation methods in the somatic CNV pipeline are intended for scenarios in which copy-number events span many adjacent genomic bins.
2) Correct. I think you can safely assume that the patterns of sequencing bias and noise on the autosomes will be essentially the same in male and female samples.
3) Yes, it would be fine to take the samples from healthy individuals. The primary goal of the PoN is to capture the same patterns of sequencing noise and bias that are present in your case samples, but unobscured by any confounding copy-number activity. So any relatively quiet samples should suffice.
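The answers above can be checked in practice by building the read-count PoN and inspecting the bins it is built on. The sketch below generates the CNV PoN commands in the pattern of the GATK CNV tutorial for targeted data: `PreprocessIntervals` with `--bin-length 0`, `CollectReadCounts` per normal, then `CreateReadCountPanelOfNormals`. It also counts bins per contig in the resulting Picard-style interval list, which answers "how many bins" and confirms that no bins fall on chrX/chrY. This is a minimal sketch: the file names are hypothetical, the three-line interval list is made up, and the code makes no judgment about how many bins are enough.

```python
import shlex
from collections import Counter
from pathlib import Path


def cnv_pon_commands(ref, targets, normal_bams,
                     preprocessed="targets.preprocessed.interval_list",
                     pon_out="cnv.pon.hdf5"):
    """Build (but do not run) the read-count PoN steps as argv lists."""
    # Targeted data: --bin-length 0 keeps the (padded) targets as the bins.
    cmds = [["gatk", "PreprocessIntervals", "-R", ref, "-L", targets,
             "--bin-length", "0", "--interval-merging-rule", "OVERLAPPING_ONLY",
             "-O", preprocessed]]
    count_files = []
    for bam in normal_bams:
        counts = str(Path(bam).with_suffix(".counts.hdf5"))
        cmds.append(["gatk", "CollectReadCounts", "-I", bam, "-L", preprocessed,
                     "--interval-merging-rule", "OVERLAPPING_ONLY", "-O", counts])
        count_files.append(counts)
    pon_cmd = ["gatk", "CreateReadCountPanelOfNormals"]
    for counts in count_files:
        pon_cmd += ["-I", counts]
    cmds.append(pon_cmd + ["-O", pon_out])
    return cmds


def bins_per_contig(interval_list_text):
    """Count records per contig in a Picard-style interval list
    ('@' header lines, then contig, start, end, strand, name; tab-separated)."""
    counts = Counter()
    for line in interval_list_text.splitlines():
        if line and not line.startswith("@"):
            counts[line.split("\t", 1)[0]] += 1
    return counts


cmds = cnv_pon_commands("ref.fasta", "lung_panel.interval_list",
                        ["blood_01.bam", "negative_01.bam"])
for cmd in cmds:
    print(shlex.join(cmd))

# Made-up three-bin interval list standing in for the preprocessed output.
example = ("@HD\tVN:1.6\n"
           "chr1\t1000\t1200\t+\ttarget_a\n"
           "chr1\t5000\t5150\t+\ttarget_b\n"
           "chr2\t800\t990\t+\ttarget_c\n")
bins = bins_per_contig(example)
sex_bins = {c: n for c, n in bins.items() if c in {"chrX", "chrY", "X", "Y"}}
print(sum(bins.values()), sex_bins)  # 3 {}
```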