Creating Panel of Normals (PoN) from Blood Sample of cancer patients
I am investigating somatic mutations in stomach cancer from 31 patients. I have the patients' tumor and adjacent normal tissues. Moreover, I have their blood samples as well as those of other patients (stomach cancer, from sample population), 58 patients in total. I have whole exome sequencing data for all the samples.
In that case can I use all the blood samples to create the Panel of Normals?
You can use all blood samples, provided that they were sequenced on the same sequencing platform. The tumor samples should also be sequenced on that platform, because the Panel of Normals is used to filter sequencing artifacts from the variant calls, and those artifacts are platform-specific.
You can also combine the VCF from your panel of normals with the one provided in the best practices (the linked one is for exomes, but if you have WGS data, you will want to use the genome PoN), as the latter contains plenty of artifacts that occur across platforms.
Thank you for writing to the GATK Community Forum. We hope we can clarify your question.
When creating your Panel of Normals, please reference the selection criteria in the technical documentation linked below.
Philipp Hähnel, thank you for your note on sequencing platform standardization. Sounds good!
Please let us know if you have any further questions.
Anthony DiCi Thank you for the essential suggestions.
Philipp Hähnel Thank you for the quick response. I missed mentioning that 37 of these samples were sequenced (whole exome) on an Illumina HiSeq 4000 and 21 new samples were sequenced (whole exome) on an Illumina NovaSeq 6000.
Then you will want to create two panels of normals, one for each of those two subsets. If you want the variant calling step to be as accurate as possible, use the PoN of the matching platform for each tumor sample. But you should also be fine with just combining the two PoNs, as that is essentially what the provided best practice Mutect2 PoN is.
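To illustrate the platform-matched option, here is a minimal sketch of how each tumor sample could be paired with the PoN built from normals of the same platform. The PoN file names and the sample sheet are hypothetical; only the two platform names come from this thread.

```python
# Hypothetical PoN file names, one per sequencing platform mentioned in the thread.
PON_BY_PLATFORM = {
    "HiSeq 4000": "pon_hiseq4000.vcf.gz",
    "NovaSeq 6000": "pon_novaseq6000.vcf.gz",
}


def pon_for_tumor(platform: str) -> str:
    """Return the PoN built from normals sequenced on the same platform."""
    try:
        return PON_BY_PLATFORM[platform]
    except KeyError:
        raise ValueError(f"no PoN available for platform {platform!r}") from None


# Hypothetical sample sheet: tumor sample name -> sequencing platform.
tumors = {"T01": "HiSeq 4000", "T02": "NovaSeq 6000"}

# Each tumor is assigned the PoN of its own platform.
assignments = {sample: pon_for_tumor(platform) for sample, platform in tumors.items()}

for sample, pon in assignments.items():
    print(sample, pon)
```

Failing loudly on an unknown platform avoids silently calling a tumor against a PoN from a different instrument.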
A gatk dev could maybe second that opinion.
Thank you! I will try both ways and compare.
So, I have another doubt. Suppose I run gatk Mutect2 using tumor and matched normal. In that case, can I use the adjacent normal tissue samples as the matched normal, together with a PoN constructed from VCFs generated using HaplotypeCaller on the blood samples?
This is almost correct. You'd run Mutect2 in paired tumor-normal mode with the normal sample from the adjacent normal tissue. Using the matched normal sample will filter rare germline variants that are not in gnomAD (I recommend also using the gnomAD resource as a germline resource to filter common SNPs).
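As a rough sketch of the paired call described above, the following builds (but does not run) a Mutect2 command line with a matched normal, a germline resource, and a PoN. All file and sample names are placeholders. The value passed to `-normal` is assumed to be the sample name (the SM tag) in the normal BAM's read group, not a file name.

```python
import shlex


def mutect2_paired_cmd(reference, tumor_bam, normal_bam, normal_sample,
                       gnomad_vcf, pon_vcf, out_vcf):
    """Build a paired tumor-normal Mutect2 command as an argument list."""
    return [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,                    # tumor reads
        "-I", normal_bam,                   # adjacent normal tissue reads
        "-normal", normal_sample,           # SM tag of the normal sample
        "--germline-resource", gnomad_vcf,  # population allele frequencies
        "--panel-of-normals", pon_vcf,      # PoN built from the blood samples
        "-O", out_vcf,
    ]


# Placeholder inputs for illustration only.
cmd = mutect2_paired_cmd(
    reference="reference.fasta",
    tumor_bam="P01_tumor.bam",
    normal_bam="P01_adjacent_normal.bam",
    normal_sample="P01_adjacent_normal",
    gnomad_vcf="af-only-gnomad.vcf.gz",
    pon_vcf="pon.vcf.gz",
    out_vcf="P01_unfiltered.vcf.gz",
)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.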
The PoN is created by running Mutect2 on the blood samples in tumor-only mode (no matched normal), without filtering the calls afterwards. Creating the panel also filters or annotates germline variants in gnomAD if you provide it with that resource, so that the panel should end up containing only sequencing artifacts.
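The PoN steps above can be sketched as three stages: Mutect2 in tumor-only mode on each blood sample, GenomicsDBImport to collect the per-sample calls, and CreateSomaticPanelOfNormals with the gnomAD resource. This sketch only builds the command lines. The file names, the interval list, and the workspace name are placeholders, and it follows my understanding of the GATK4 PoN procedure, so please check the flags against your GATK version.

```python
import shlex


def pon_commands(reference, intervals, gnomad_vcf, normal_bams,
                 workspace="pon_db", pon_vcf="pon.vcf.gz"):
    """Return the GATK command lines for building a PoN, in execution order."""
    cmds = []
    normal_vcfs = []

    # Step 1: tumor-only Mutect2 on every normal (blood) sample, unfiltered.
    for bam in normal_bams:
        vcf = bam.rsplit(".", 1)[0] + ".pon_input.vcf.gz"
        normal_vcfs.append(vcf)
        cmds.append(["gatk", "Mutect2", "-R", reference, "-I", bam,
                     "--max-mnp-distance", "0", "-O", vcf])

    # Step 2: gather all per-sample call sets into one GenomicsDB workspace.
    import_cmd = ["gatk", "GenomicsDBImport", "-R", reference, "-L", intervals,
                  "--genomicsdb-workspace-path", workspace]
    for vcf in normal_vcfs:
        import_cmd += ["-V", vcf]
    cmds.append(import_cmd)

    # Step 3: build the panel, using gnomAD to handle known germline sites.
    cmds.append(["gatk", "CreateSomaticPanelOfNormals", "-R", reference,
                 "--germline-resource", gnomad_vcf,
                 "-V", f"gendb://{workspace}", "-O", pon_vcf])
    return cmds


# Placeholder inputs for illustration only.
commands = pon_commands(
    reference="reference.fasta",
    intervals="exome_targets.interval_list",
    gnomad_vcf="af-only-gnomad.vcf.gz",
    normal_bams=["blood_01.bam", "blood_02.bam"],
)
for cmd in commands:
    print(shlex.join(cmd))
```

With two sequencing platforms, the same function could be called once per platform subset to produce one PoN each.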
Feel free to use my workflows for that, which are based on gatk best practices, but a little bit more up to date and optimized for compute resources.
Thank you for clearing my doubts. I will definitely go through the workflow now.
Philipp Hähnel, thank you for your valued expertise and guidance. We appreciate our GATK Guides!
Ranjan J. Sarma, I'm glad to hear that Philipp's suggestions have helped make things clearer.
I wanted to follow up on my last response with an additional comment: we recommend that a PoN contain at least 40 samples.
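As a quick check of this recommendation against the sample counts given earlier in the thread, here is a small sketch. It assumes that the 37 HiSeq 4000 and 21 NovaSeq 6000 samples are the blood normals that would go into the panels.

```python
MIN_PON_SAMPLES = 40  # minimum PoN size recommended in this thread

# Counts from the thread, assumed to be the blood normals per platform.
normals_per_platform = {"HiSeq 4000": 37, "NovaSeq 6000": 21}


def meets_minimum(n_samples: int, minimum: int = MIN_PON_SAMPLES) -> bool:
    """True if a panel with n_samples normals reaches the recommended size."""
    return n_samples >= minimum


per_platform_ok = {p: meets_minimum(n) for p, n in normals_per_platform.items()}
combined_total = sum(normals_per_platform.values())
combined_ok = meets_minimum(combined_total)

print(per_platform_ok)
print(combined_total, combined_ok)
```

Under that assumption, neither platform-specific panel reaches 40 samples on its own, while the combined panel of 58 does, which is worth weighing when comparing the two approaches.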
Please feel free to reach out any time with any additional questions!