Panel of Normals - inconsistent filtering behavior
Hello everyone,
My team is developing an oncology panel and wants to incorporate a Panel of Normals (PON) into our assay. The PON we generated (following this documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360042479112) is not filtering some variants in the normal sample, even though all metrics seem to indicate they should be filtered. Can you advise on why these variants are filtered from some samples but not from others? The variants that are not getting filtered out are known recurrent technical artifacts that we want removed.
For example, every sample (12 total) had this variant filtered out successfully:
sampleID genomic_coords position_coverage variant_coverage VAF
NGS142-D009 chr10:g.43615633C>G 1921 911 0.474
NGS142-D010 chr10:g.43615633C>G 3360 3345 0.996
NGS142-D026 chr10:g.43615633C>G 3787 1902 0.502
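For reference, the VAF column in the table above appears to be variant_coverage divided by position_coverage. A minimal sketch of that arithmetic follows; the helper name `vaf` is hypothetical and not part of GATK or Sentieon.

```python
def vaf(variant_coverage, position_coverage):
    """Variant allele fraction: variant-supporting reads over total reads at the site."""
    if position_coverage <= 0:
        raise ValueError("position_coverage must be positive")
    return variant_coverage / position_coverage


# (sampleID, position_coverage, variant_coverage) copied from the table above
rows = [
    ("NGS142-D009", 1921, 911),
    ("NGS142-D010", 3360, 3345),
    ("NGS142-D026", 3787, 1902),
]

for sample, depth, alt in rows:
    # Format to three decimals, matching the precision used in the table
    print(sample, f"{vaf(alt, depth):.3f}")
```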
The PON value of this variant was:
chr10 43615633 . C G . . BETA=117213.66,98.07;FRACTION=0.038
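To compare PON entries across sites, the INFO string can be split into its numeric fields. The sketch below assumes that BETA holds the two parameters of a beta distribution fitted to allele fractions in the normals and that FRACTION is the fraction of normals showing the site; that reading should be checked against the header of pon.vcf.gz. The function names `parse_pon_info` and `beta_mean` are hypothetical helpers, not GATK APIs.

```python
def parse_pon_info(info):
    """Parse a PON INFO string such as 'BETA=a,b;FRACTION=f' into a dict.

    Only numeric KEY=VALUE entries are handled; flag-style entries are not.
    """
    parsed = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        numbers = [float(v) for v in value.split(",")]
        parsed[key] = numbers if len(numbers) > 1 else numbers[0]
    return parsed


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution (assumed meaning of BETA)."""
    return alpha / (alpha + beta)


pon = parse_pon_info("BETA=117213.66,98.07;FRACTION=0.038")
alpha, beta = pon["BETA"]
print("fraction of normals:", pon["FRACTION"])
print("mean allele fraction under the assumed beta fit:", beta_mean(alpha, beta))
```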
Other variants are filtered in some samples but not in others:
sampleID genomic_coords position_coverage variant_coverage VAF Filtered
NGS142-D025 chr11:g.125525195A>G 7837 7834 1.000 filtered out
NGS142-D003 chr11:g.125525195A>G 13202 13199 1.000 remaining
And the PON value of this variant was:
chr11:g.125525195A>G BETA=2352.48,0.622;FRACTION=0.830 40 25
For the variant that didn't get filtered (for example, chr11:g.125525195A>G), there is an additional tag named "panel_of_normals" in the VCF file. We would like to know:
1) Does the PON remove matching variants from a sample, tag and keep them, or do both?
2) How is this behaviour coordinated? In short, how does the PON know when to remove a variant versus when to only tag and keep it?
GATK version: 4.1.7.0
`java -version` output:
openjdk version "11.0.9" 2020-10-20
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.9+11)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.9+11, mixed mode)
Reference used: hg19.fa
Commands used:
# Mutect2 run on R1 and R2 .bam files for the normal sample
gatk Mutect2 -R ${reference} -I ${bamfile} --max-mnp-distance 0 -O ${outdir}
# variants are indexed
gatk SortVcf -I ${sample}_mutect2.vcf -O ${sample}_mutect2_sorted.vcf
# GenomicsDBImport is used to generate PON db
gatk --java-options "-Xmx4g -Xms4g" GenomicsDBImport -R ${reference} -L ${targetbedfile} --genomicsdb-workspace-path ${out_dir}/pon_db -V PON{ID}_final.vcf.gz
# af-only gnomAD resource is indexed
gatk IndexFeatureFile -I af-only-gnomad.raw.sites_chr.vcf
# PON VCF is generated
gatk CreateSomaticPanelOfNormals -R ${reference} --germline-resource ${af_gnomead} -V $pon_db -O pon.vcf.gz
I appreciate any insight on this behavior of the PON filter.
-
Hi Paula Berry,
Thank you for writing to the GATK forum! I hope that we can help you sort this out.
I brought your inquiry to our developers and received some feedback and next steps to share with you.
Could you please clarify whether or not you are using FilterMutectCalls? We don’t have any info from your post about FilterMutectCalls, where the filtering occurs.
Once you clarify this, hopefully, we can help you further. I look forward to hearing back from you.
Best,
Anthony
-
Hi Anthony DiCi, thank you for taking a look at this. We use Sentieon v.202010 for filtering: TNhaplotyper2 is the module that corresponds to Mutect2, and TNfilter corresponds to FilterMutectCalls.
The two commands below are used for variant calling and PON filtering:
sentieon/202010/libexec/driver -t 10 -r hs37d5_pms2_cl_ex12-end_masked.fa -i A015-F.bam --algo TNhaplotyper2 --pon pon.vcf.gz --min_base_qual 20 --tumor_sample A015-F A015-F_unfiltered.vcf
sentieon/202010/libexec/driver -r hs37d5_pms2_cl_ex12-end_masked.fa -t 10 --algo TNfilter --tumor_sample A015-F -v A015-F_unfiltered.vcf A015-F.vcf
Sample where variant got filtered out:
before filter:
chr1 11181327 . C T . PASS GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0/1:57,2504:0.978:2561:28,1255:29,1188:0|1:11181327_C_T:11181327:30,27,1208,1296
after filter:
N/A
Sample where variant is kept:
before filter:
chr1 11181327 . C T . PASS GT:AD:AF:DP:F1R2:F2R1:SB 0/1:342,1323:0.801:1665:157,644:184,658:155,187,651,672
After filter:
chr1 11181327 . C T . panel_of_normals GT:AD:AF:DP:F1R2:F2R1:SB 0/1:342,1323:0.801:1665:157,644:184,658:155,187,651,672
Example Summary:
chr1 11181327 . C T . panel_of_normals BETA=0.333,0.244;FRACTION=0.075
Total number before PON filtering:
23
Number filtered out:
21
The ones that were left in were tagged "panel_of_normals" instead of being filtered out.
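The records above differ only in the FILTER column (PASS before, panel_of_normals after), while other sites are absent from the output altogether. A minimal sketch for separating the two cases is shown here: it tallies FILTER values and lists sites present before filtering but missing afterwards. It assumes plain-text, tab-delimited VCF lines; the function names are hypothetical and the example records are truncated to the first seven columns.

```python
from collections import Counter


def site_key(line):
    """Return (CHROM, POS, REF, ALT) for a tab-delimited VCF data line."""
    f = line.rstrip("\n").split("\t")
    return (f[0], f[1], f[3], f[4])


def data_lines(vcf_lines):
    """Yield only non-empty, non-header lines."""
    for line in vcf_lines:
        if line.strip() and not line.startswith("#"):
            yield line


def tally_filters(vcf_lines):
    """Count FILTER-column values; multiple filters are ';'-separated."""
    counts = Counter()
    for line in data_lines(vcf_lines):
        for name in line.rstrip("\n").split("\t")[6].split(";"):
            counts[name] += 1
    return counts


def removed_sites(before_lines, after_lines):
    """Sites present before filtering but absent from the filtered output."""
    after = {site_key(l) for l in data_lines(after_lines)}
    return [site_key(l) for l in data_lines(before_lines) if site_key(l) not in after]


# Truncated, illustrative records based on the 'variant is kept' example above
before = ["chr1\t11181327\t.\tC\tT\t.\tPASS"]
after = ["chr1\t11181327\t.\tC\tT\t.\tpanel_of_normals"]

print(tally_filters(after))          # tagged records show up under their filter name
print(removed_sites(before, after))  # empty list: the record was tagged, not removed
```

Running the same two functions on the real unfiltered and filtered VCFs would show how many of the 23 calls were tagged in place and how many are missing from the output file.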
-
Hi Paula Berry,
Thank you for responding with this information! Regrettably, we do not support Sentieon; therefore, we cannot help you in that regard. I’d recommend locating the entity that supports the filter and submitting a help request there.
To re-emphasize, I genuinely regret not being able to provide more helpful information on this issue. Thank you for being a valued member of the GATK community.
Please do not hesitate to reach back out in the future for any other GATK-related questions/problems that might arise.
Best,
Anthony
-
Hi Paula Berry,
We haven't heard from you in a while so we're going to close out this ticket. If you still require assistance, simply respond to this email and we'll be happy to pick up where we left off!
Kind regards,
Anthony