How do I prepare a PoN from my own data?
GATK version used: 18.104.22.168
I'm trying to prepare a customized PoN file from my 19 WES samples. I followed the instructions at https://gatk.broadinstitute.org/hc/en-us/articles/360046224491-CreateSomaticPanelOfNormals-BETA- but with all the exon regions (for the NimbleGen ExomeV3+UTR kit), gatk GenomicsDBImport was really slow and failed to complete because it ran out of disk space. Can I just split the region BED file into multiple BED files and then merge all the resulting VCF files for Mutect2 variant calling? Also, should I merge the PoN derived from my own data with the public GATK panel of normals (1000g_pon.hg38.vcf.gz)?
To make your own PoN, you will need at least 40 normals to pass into the first step. The command structure for all three steps is given below.
1) Run Mutect2 in tumor-only mode on each normal BAM individually:
gatk Mutect2 -R reference.fasta -I normal1.bam --max-mnp-distance 0 -O normal1.vcf.gz
gatk Mutect2 -R reference.fasta -I normal2.bam --max-mnp-distance 0 -O normal2.vcf.gz
...
gatk Mutect2 -R reference.fasta -I normal40.bam --max-mnp-distance 0 -O normal40.vcf.gz
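Step 1 can be written as a loop so you don't have to type one command per normal. This is a minimal sketch, assuming (hypothetically) that every normal BAM sits in one directory and is named <sample>.bam. The helper names call_normals and run and the DRY_RUN switch are illustrative only and are not part of GATK. The sketch also warns when it finds fewer than 40 BAMs, in line with the recommendation above.

```shell
#!/usr/bin/env bash
# Sketch only: run Mutect2 in tumor-only mode on every normal BAM in one directory.
# Hypothetical layout: one BAM per normal, named <sample>.bam.
set -eo pipefail
shopt -s nullglob  # an empty directory yields zero BAMs instead of a literal "*.bam"

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: call_normals REFERENCE BAM_DIR OUT_DIR
call_normals() {
  local ref="$1" bam_dir="$2" out_dir="$3"
  local bams=("$bam_dir"/*.bam)
  # At least 40 normals are recommended; warn on stderr if there are fewer.
  if [ "${#bams[@]}" -lt 40 ]; then
    echo "warning: found ${#bams[@]} normal BAMs; at least 40 are recommended" >&2
  fi
  mkdir -p "$out_dir"  # created even in dry-run mode
  local bam sample
  for bam in "${bams[@]}"; do
    sample=$(basename "$bam" .bam)
    # Same per-normal command as above: tumor-only mode, MNPs disabled.
    run gatk Mutect2 -R "$ref" -I "$bam" --max-mnp-distance 0 \
      -O "$out_dir/$sample.vcf.gz"
  done
}
```

For example, DRY_RUN=1 call_normals reference.fasta normals pon_calls only prints the Mutect2 commands (the directory names here are made up); drop DRY_RUN=1 to actually run them. The loop runs the normals one after another, which is the simplest option; on a cluster you would more likely submit each command as its own job.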
2) Create a GenomicsDB from the normal Mutect2 calls:
gatk GenomicsDBImport -R reference.fasta -L intervals.interval_list \
  --genomicsdb-workspace-path pon_db \
  -V normal1.vcf.gz \
  -V normal2.vcf.gz \
  ...
  -V normal40.vcf.gz
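With 40 or more normals the list of -V arguments gets long. This is a minimal sketch, assuming (hypothetically) that the step 1 VCFs are the only *.vcf.gz files in one directory. The illustrative function import_normals collects one -V per file into a bash array and passes them all to GenomicsDBImport in a single call; DRY_RUN=1 prints the command instead of running it. The function name and directory layout are assumptions, not part of GATK.

```shell
#!/usr/bin/env bash
# Sketch only: pass every per-normal Mutect2 VCF to GenomicsDBImport
# without typing one -V per file. Hypothetical layout: all *.vcf.gz in one directory.
set -eo pipefail
shopt -s nullglob  # no matches -> empty list, not a literal "*.vcf.gz"

# Usage: import_normals REFERENCE INTERVALS VCF_DIR WORKSPACE
import_normals() {
  local ref="$1" intervals="$2" vcf_dir="$3" workspace="$4"
  local -a v_args=()
  local vcf
  # "*.vcf.gz" does not match the .tbi index or .stats files Mutect2 writes alongside.
  for vcf in "$vcf_dir"/*.vcf.gz; do
    v_args+=(-V "$vcf")  # one "-V <file>" pair per normal
  done
  if [ "${#v_args[@]}" -eq 0 ]; then
    echo "error: no *.vcf.gz files in $vcf_dir" >&2
    return 1
  fi
  local -a cmd=(gatk GenomicsDBImport -R "$ref" -L "$intervals"
                --genomicsdb-workspace-path "$workspace" "${v_args[@]}")
  # Print the command when DRY_RUN=1; otherwise execute it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example: DRY_RUN=1 import_normals reference.fasta intervals.interval_list pon_calls pon_db. Building the arguments as an array rather than as one long string keeps file paths that contain spaces intact.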
3) Combine the normal calls using CreateSomaticPanelOfNormals:
gatk CreateSomaticPanelOfNormals -R reference.fasta \
  --germline-resource af-only-gnomad.vcf.gz \
  -V gendb://pon_db \
  -O pon.vcf.gz
Regarding disk space, I don't see a way around this. If you're running this on the cloud, you can increase your disk allocation fairly easily.
Also, I don't think you'll want to merge your PoN with other available PoNs. The only purpose of the PoN is to weed out likely artifacts that are specific to your library prep and sequencing. If you do decide to merge different PoNs to exclude more sites, you may want to curate the merged PoN to make sure you're not inadvertently removing sites of particular interest. That's the only downside: losing sensitivity at the specific sites in the PoN.
Thank you for your reply.
I have another question. I'm following the instructions here: https://gatk.broadinstitute.org/hc/en-us/articles/360046224491-CreateSomaticPanelOfNormals-BETA-
The third step on that webpage doesn't have the argument "--germline-resource af-only-gnomad.vcf.gz". Should I add it?
Yes, I believe that should be added.
The official PoN workflow commands are here:
I'll put in a request for the documentation to be updated.
I'll add it to my analysis workflow.
If I have paired tumor and PBMC samples, can I use the PBMC samples to create a PoN, and then use that PoN to analyze the paired tumor and PBMC samples?
Yes, PBMCs are good for constructing PoNs.
Sorry, I only have tumour samples, not any matched normals.
Can I still create a PoN from my tumour samples?
You can still create a PoN with tumor samples, but you run the risk of constructing a PoN that will filter common driver events in your panel.