Mutect2 calls germline variants not present in germline resource
a) GATK version used: 4.5.0.0
I am seeing differences between two tumor states
b) Exact command used:
Mutect2 -R /data/genome/hg38ucsc/hg38_no_alt.fa -I Tumor2.bam -I Tumor1.bam -normal LH00280_109_22KKWNLT3_3_GAGTTACC --intervals intervals.bed.gz --germline-resource af-only-gnomad.hg38.vcf.gz --panel-of-normals 1000g_pon.hg38.vcf.gz --genotype-germline-sites true --f1r2-tar-gz tumor.f1r2.tar.gz -O tumor_out.vcf.gz -bamout tumor_out.bam
LearnReadOrientationModel
LearnReadOrientationModel -I tumor.f1r2.tar.gz -O tumor.readorientationmodel.tar.gz
GetPileupSummaries from both
GetPileupSummaries -I Tumor1.bam -R /data/genome/hg38ucsc/hg38_no_alt.fa -V small_exac_common_3.hg38.vcf.gz -L Twist_Exome_RefSeq_targets_hg38.bed.gz -O Tumor1.getpileupsummaries.table
GetPileupSummaries -I Tumor2.bam -R /data/genome/hg38ucsc/hg38_no_alt.fa -V small_exac_common_3.hg38.vcf.gz -L Twist_Exome_RefSeq_targets_hg38.bed.gz -O Tumor2.getpileupsummaries.table
CalculateContamination
CalculateContamination -I Tumor2.getpileupsummaries.table -matched Tumor1.getpileupsummaries.table --tumor-segmentation difference.segments.table -O difference.contamination.table
FilterMutectCalls
FilterMutectCalls -R /data/genome/hg38ucsc/hg38_no_alt.fa -V tumor_out.vcf.gz --tumor-segmentation difference.segments.table --contamination-table difference.contamination.table --ob-priors tumor.readorientationmodel.tar.gz --stats variants.dir/C404T_Tissue.ALI.first.vcf.gz.stats -L Twist_Exome_RefSeq_targets_hg38.bed.gz --filtering-stats tumor_out.vcf.gz.stats -O filtered.vcf.gz 2
I see filtered variants that do not carry the "germline" filter even though the germline population frequency resource contains an entry for exactly the same alternative allele. For example, I have the following filtered variants:
chr1 1495668 . A G . normal_artifact;panel_of_normals AS_FilterStatus=SITE;AS_SB_TABLE=52,65|4,2;DP=123;ECNT=2;GERMQ=93;MBQ=28,20;MFRL=215,144;MMQ=60,49;MPOS=4;NALOD=-7.272e-01;NLOD=10.11;PON;POPAF=3.27;ROQ=21;TLOD=5.36 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:65,2:0.054:67:23,1:19,1:50,2:30,35,2,0 0/1:52,4:0.070:56:11,1:22,1:39,2:22,30,2,2
chr1 1632349 . C T . weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=9,17|1,1;DP=30;ECNT=2;GERMQ=24;MBQ=20,20;MFRL=188,157;MMQ=60,60;MPOS=52;NALOD=1.13;NLOD=3.56;POPAF=4.61;ROQ=4;TLOD=3.20 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:17,0:0.070:17:1,0:6,0:12,0:6,11,0,0 0/1:9,2:0.222:11:4,0:0,1:6,1:3,6,1,1
chr1 6253798 . C T . orientation;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=69,71|2,1;DP=148;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=204,144;MMQ=60,60;MPOS=14;NALOD=1.73;NLOD=15.64;POPAF=4.01;ROQ=1;TLOD=4.37 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:70,0:0.018:70:30,0:20,0:52,0:36,34,0,0 0/1:70,3:0.059:73:22,0:24,2:46,2:33,37,2,1
The matching alternative alleles in the germline population frequency resource (af-only-gnomad.hg38.vcf.gz) are:
chr1 1495668 rs201429000 A G 2144.90 PASS AC=22;AF=0.0005416
chr1 1632349 . C T 317.47 PASS AC=1;AF=2.463e-05
chr1 6253798 rs189356842 C T 1230.03 PASS AC=4;AF=9.845e-05
Why are these variants not tagged as germline despite coinciding exactly with the germline population resource?
-
It comes down to the population allele frequency of the variant, which enters the probability calculation that decides whether a site is counted as germline or not.
TL;DR: The higher the allele frequency, the higher the probability that a site is considered germline. The three alleles in your example are all rare in the resource (AF of about 5e-4 or lower), so their presence in the resource alone adds little weight toward a germline call.
A long description of how the germline filter works can be found in the Mutect2 documentation (pages 7 and 8):
https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf
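As a quick check of the numbers in the question: the POPAF INFO field that Mutect2 writes is the negative log10 of the population allele frequency, so it can be compared directly with the AF values in the germline resource. The sketch below is only an illustration in Python; the function names are made up for this example and are not part of GATK.

```python
import math

def af_to_popaf(af: float) -> float:
    """Convert an allele frequency to the -log10 scale used by POPAF."""
    return -math.log10(af)

def popaf_to_af(popaf: float) -> float:
    """Convert a POPAF value back to a plain allele frequency."""
    return 10.0 ** (-popaf)

# (position, AF from af-only-gnomad.hg38.vcf.gz, POPAF from the Mutect2 VCF),
# all taken from the records quoted in the question.
sites = [
    ("chr1:1495668", 0.0005416, 3.27),
    ("chr1:1632349", 2.463e-05, 4.61),
    ("chr1:6253798", 9.845e-05, 4.01),
]

for name, resource_af, reported_popaf in sites:
    # Recompute POPAF from the resource AF and show it next to the reported one.
    print(name, round(af_to_popaf(resource_af), 2), reported_popaf)
```

The recomputed values agree with the reported POPAF to two decimals, which suggests the resource was matched correctly for these alleles and that the missing "germline" filter is a consequence of how low these frequencies are, not of a lookup failure.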
I hope this helps.
-
Thank you, Gökalp Çelik, it does help a lot.
What about the opposite case? Variants filtered as "germline" but not found in the germline population frequency resource. I am assuming this is because the matched normal provides "strong" evidence of being a germline variant. However, if the matched normal is a tumor and not a healthy tissue, would it make sense to consider these variants as "non-germline"?
Thank you.
-
Hi again.
If a variant is not found in the germline resource, the algorithm still uses an estimated allele frequency for it, so being absent from the germline resource does not guarantee that a variant will not be marked as germline.
A matched normal always provides the best evidence for a possible germline event, but Mutect2 cannot tell whether the normal is actually a pseudonormal. You need to make that distinction yourself and change the way those variants are filtered by other post-processing means.
I am sure David Benjamin, the author of this tool, has mentioned in the past that a large number of the variants Mutect2 detects that are not found in the germline resource are actual germline variants, so the estimates and assumptions about the germline resource are already coded with this prior probability in mind.
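One possible starting point for that kind of post-processing is to list the records whose only filter is "germline" and whose POPAF equals the value assigned to alleles absent from the resource, so they can be reviewed separately when the "normal" is really a second tumor. This is a rough Python sketch, not a GATK feature. The function names are hypothetical, the example record is synthetic, and the 6.0 default for absent alleles is an assumption that depends on the mode and on `--af-of-alleles-not-in-resource`, so check it against your own output.

```python
import math

def parse_vcf_record(line: str) -> dict:
    """Split one tab-separated VCF data line into the fields used here."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, filt, info = fields[:8]
    # FILTER is "PASS", ".", or a semicolon-separated list of filter names.
    filters = set() if filt in ("PASS", ".") else set(filt.split(";"))
    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_map = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_map[key] = value
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "filters": filters, "info": info_map}

def germline_only_and_absent(record: dict, absent_popaf: float = 6.0) -> bool:
    """True if 'germline' is the only filter and POPAF matches absent_popaf.

    absent_popaf is an assumed value for alleles not in the resource.
    Only the first ALT allele's POPAF is examined.
    """
    if record["filters"] != {"germline"}:
        return False
    first_popaf = record["info"].get("POPAF", "nan").split(",")[0]
    return math.isclose(float(first_popaf), absent_popaf, abs_tol=0.005)

# Synthetic record for illustration only; it is not from the thread.
example = "chrX\t100\t.\tA\tG\t.\tgermline\tDP=40;POPAF=6.00;TLOD=30.00"
print(germline_only_and_absent(parse_vcf_record(example)))
```

Records selected this way are candidates for manual review, not automatically somatic calls: whether to rescue them still depends on what is known about the pseudonormal.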
-
Thank you very much for the help Gökalp Çelik