GERMQ value varies between the samples for the same mutation
REQUIRED for all errors and issues:
a) GATK version used: 4.4.0.0
b) Exact command used:
gatk Mutect2 -R hg38.fa --intervals exome_target_intervals.bed --interval-padding 100 -I tumor_sample.bam --germline-resource af-only-gnomad.hg38.vcf.gz --panel-of-normals pon.vcf.gz -O tumor_sample.vcf.gz
gatk GetPileupSummaries -R hg38.fa --intervals exome_target_intervals.bed --interval-padding 100 -I tumor_sample.bam -V af-only-gnomad.hg38.vcf.gz -O tumor_sample.pileups.table
gatk CalculateContamination -R hg38.fa --intervals exome_target_intervals.bed --interval-padding 100 -I tumor_sample.pileups.table -O tumor_sample.contamination.table --tumor-segmentation tumor_sample.segments.tsv
gatk FilterMutectCalls -V tumor_sample.vcf.gz -R hg38.fa --contamination-table tumor_sample.contamination.table --tumor-segmentation tumor_sample.segments.tsv -O tumor_sample.filtered.vcf.gz
c) Entire program log: not able to provide because I work in a closed environment where copy-pasting information out is not possible
Hi,
I have been using GATK's Mutect2 to perform tumor-only somatic variant calling. I have a cohort of samples that were all prepared in the same way, and I built a Panel of Normals from 45 samples. As the germline resource I used the gnomAD allele-frequency file (af-only-gnomad.hg38.vcf.gz), which I downloaded from the gcp-public-data--broad-references Google Cloud bucket.
When I look at the FILTER column of the VCF files, obtained after FilterMutectCalls, I see that the same mutation can have germline or PASS status depending on the tumor sample. Sometimes the GERMQ value is very low (1), but sometimes it's very high (90). This means that I am not able to reliably detect somatic mutations based on the PASS/germline filtering. Is there something I could do to fix this? The pipeline is exactly the same for all tumor samples.
-
Although you mentioned that you cannot copy and paste information here, is there anything that strikes you about that variant across the different samples, such as VAF or tumor purity?
To further assess this issue we may need further information about the variant and call metrics that are produced by Mutect2.
Regards.
-
Hi,
Here's the information for two samples of the same mutation. I don't know if there's anything else that's different besides GERMQ.
PASS AS_FilterStatus=SITE;AS_SB_TABLE=174,177|54,58;DP=502;ECNT=2;GERMQ=93;MBQ=20,37;MFRL=143,152;MMQ=60,60;MPOS=25;POPAF=7.30;RPA=1,2;RU=TCTG;STR;STRQ=93;TLOD=398.68 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:351,112:0.250:463:86,29:82,22:264,88:174,177,54,58
germline AS_FilterStatus=SITE;AS_SB_TABLE=82,77|74,74;DP=324;ECNT=2;GERMQ=1;MBQ=37,37;MFRL=162,164;MMQ=60,60;MPOS=32;POPAF=7.30;RPA=1,2;RU=TCTG;STR;STRQ=93;TLOD=435.00 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:159,148:0.481:307:47,38:35,28:129,120:82,77,74,74
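For a side-by-side look at records like these two, individual INFO fields can be pulled out with standard shell tools. This is only a minimal sketch: `info_get` is a hypothetical helper, not part of GATK, and it assumes the usual semicolon-separated `KEY=value` INFO layout. The two strings are shortened copies of the INFO columns above.

```shell
# Hypothetical helper: print the value of one key from a VCF INFO string.
# $1 = INFO string, $2 = key name (for example GERMQ).
info_get() {
  printf '%s\n' "$1" | tr ';' '\n' | awk -F= -v key="$2" '$1 == key { print $2 }'
}

# Shortened INFO strings taken from the two records in this post.
rec1='DP=502;ECNT=2;GERMQ=93;POPAF=7.30;TLOD=398.68'
rec2='DP=324;ECNT=2;GERMQ=1;POPAF=7.30;TLOD=435.00'

# Print key, value in record 1, value in record 2 (tab-separated).
for key in DP GERMQ POPAF TLOD; do
  printf '%s\t%s\t%s\n' "$key" "$(info_get "$rec1" "$key")" "$(info_get "$rec2" "$key")"
done
```

Among these fields, GERMQ is the one that differs sharply between the two samples, while POPAF is identical.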
-
Looking at those call metrics, these differences are not too surprising. GERMQ is a Phred-scaled quality, that is, -10*log10 of the probability that the alt allele is germline, so the lower the value, the more likely the variant is germline.
##INFO=<ID=GERMQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not germline variants">
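To get a feel for the scale, a Phred value can be turned back into a probability. The snippet below is a rough sketch that assumes GERMQ = -10*log10(P(germline)), as the header description suggests; `germq_to_prob` is a hypothetical helper name, not a GATK tool.

```shell
# Hypothetical helper: convert a Phred-scaled GERMQ value into the implied
# probability that the alt allele is germline, assuming
# GERMQ = -10 * log10(P(germline)).
germq_to_prob() {
  awk -v q="$1" 'BEGIN { printf "%.3g\n", 10 ^ (-q / 10) }'
}

germq_to_prob 93   # first record  -> 5.01e-10
germq_to_prob 1    # second record -> 0.794
```

Under that assumption, GERMQ=1 corresponds to roughly a 79% germline probability, while GERMQ=93 corresponds to a vanishingly small one. Since STRQ is also 93 in both records, 93 may simply be the largest value the tool reports.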
In your case, the variant with GERMQ 93 has an allele fraction of 0.25, which is consistent with a somatic origin, whereas the one with GERMQ 1 has an allele fraction of 0.48, which looks almost like a heterozygous call if you look at the ref/alt balance. The second variant is therefore flagged as most likely germline.
From this perspective, it does not look like you are doing anything wrong, or that the pipeline is failing to capture rare variants.
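The ref/alt balance mentioned above can be checked directly from the AD field. A small sketch, where `alt_fraction` is a hypothetical helper and AD is assumed to hold exactly two comma-separated counts (ref,alt):

```shell
# Hypothetical helper: raw alt-read fraction from an AD value "ref,alt".
alt_fraction() {
  printf '%s\n' "$1" | awk -F, '{ printf "%.3f\n", $2 / ($1 + $2) }'
}

alt_fraction 351,112   # PASS record     -> 0.242
alt_fraction 159,148   # germline record -> 0.482
```

The raw ratio need not match the AF field exactly (0.250 and 0.481 here), since AF is Mutect2's own estimate, but both tell the same story: one sample sits well below 0.5 and the other is close to it.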
I hope this helps.
-
Hello,
Thanks for the quick response. This mutation is a known pathogenic driver mutation in cancer, and in all of the samples it is definitely somatic. So, if I understood correctly, mutations whose VAF is close to or above 0.5 are tagged as germline by Mutect2 in tumor-only mode? And the only way to fix this would be to use a matched germline DNA sample?
-
The answer is most likely yes. Although these mutations are mostly known to be of somatic origin (according to current knowledge and the literature), the possibilities below should always be kept in mind because of the complex nature of human genetics and embryogenesis.
- The sample could truly carry that mutation in the germline, with the onset of the cancer being late due to the genetic background of the patient.
- The individual could be highly chimeric (even more so in certain tissues) for that mutation, with a baseline chimerism that has not been detected by regular methods.
- The individual could have acquired this mutation during very early embryogenesis, which still makes it somatic (depending on the stage and the germ layer that carries it).
Calculating this parameter from a single tumor sample may not account for these possibilities without additional input, so what you observe is the best the tool and filter can provide with the available data.
I hope this helps.
-
Also, here are the standard public service announcements regarding homemade panels of normals and tumor-only calling:
1) Using the panels of normals in our public Google bucket gs://gatk-best-practices/ is almost always superior to creating your own. Unless you are working with non-human data, or you have at least 100 normals and a very good reason, you are better off using one of our panels.
2) Tumor-only calling is inherently and unavoidably unreliable. Some back-of-the-envelope population genetics calculations (which I really did at some point, I promise!) reveal that the average human genome contains something like 30,000-50,000 rare germline mutations that are absent from gnomAD. I mean, gnomAD is a huge and extremely useful resource, but there is a very long tail of allele frequencies (if you want to be pedantic and point out that the range of 0 to 0.01 is not a long tail, we can say log allele frequencies). Without a matched normal, those 30,000 rare germline mutations are very hard to distinguish from somatic mutations. FilterMutectCalls does its best by modeling allele frequencies, and of course a knowledge of common driver mutations is helpful too, but there is no hope of perfection.
-
These are some very interesting points. Can you link the panel of normals? Is the germline resource I mentioned the best one?
-
You can find the google cloud links to those PON files in the article below
https://gatk.broadinstitute.org/hc/en-us/articles/360035890631-Panel-of-Normals-PON
hg38
gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz
hg19
gs://gatk-best-practices/somatic-b37/Mutect2-exome-panel.vcf
gs://gatk-best-practices/somatic-b37/Mutect2-WGS-panel-b37.vcf
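As a rough sketch of how the hg38 panel could be swapped into the Mutect2 call from the original post: the script below only prints the commands unless RUN=1 is set. The `run` wrapper is a hypothetical helper, and the presence of a `.tbi` index next to the VCF in the bucket is an assumption; the file names otherwise come from the original command.

```shell
# Public hg38 panel of normals listed above.
PON_URI="gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz"
PON_LOCAL="$(basename "$PON_URI")"

# Hypothetical wrapper: print each command instead of running it
# unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Assumption: a tabix index (.tbi) sits next to the VCF in the bucket.
run gsutil cp "$PON_URI" "$PON_URI.tbi" .

# Same Mutect2 call as in the original post, with only the panel swapped.
run gatk Mutect2 -R hg38.fa --intervals exome_target_intervals.bed \
  --interval-padding 100 -I tumor_sample.bam \
  --germline-resource af-only-gnomad.hg38.vcf.gz \
  --panel-of-normals "$PON_LOCAL" -O tumor_sample.vcf.gz
```

The contig names in the panel need to match those in your reference FASTA, so it is worth checking that before running the real commands.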