Mutect2 genotype-germline-sites filtering discrepancy
Hi, I have a few technical questions about changes in filtered variants when running Mutect2 with --genotype-germline-sites.
I ran Mutect2 on matched tumor-normal data with and without --genotype-germline-sites. Everything else about these runs was the same.
When I compared the output VCFs, I noticed differences in which variants pass all filters between the two runs. Each run had variants that passed only in that run, i.e. some variants were marked PASS when Mutect2 was run with --genotype-germline-sites but failed with standard settings, and vice versa.
When I looked through these variants I noticed two different patterns of unique variants:
Variants that PASS only with --genotype-germline-sites: these variants were rejected in the standard run because of the "strand_bias" filter. The "strand_bias" filter marks more variants in the standard run than in the --genotype-germline-sites run. Looking through these variants in IGV, they look like false positives, so for some reason running Mutect2 with --genotype-germline-sites prevents this filter from working accurately.
Variants that PASS only in the standard run: these variants were all rejected in the --genotype-germline-sites run because of the "haplotype" or "clustered_events" filters. I believe this is a potential problem with --genotype-germline-sites: bona fide somatic variants that happen to be close to germline sites get filtered. When you genotype germline sites, an active region is created around each germline variant in addition to each somatic variant, so a germline variant is more likely to be included in the active region of a somatic variant. It seems that running with --genotype-germline-sites will produce false negatives, because these somatic variants get filtered.
This is not an insubstantial number of variants: the --genotype-germline-sites run returned 3910 PASS variants, and 123 variants passed in standard Mutect2 but failed in the --genotype-germline-sites run just because of the haplotype or clustered_events filters (likely false negatives).
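For anyone who wants to reproduce this kind of comparison, the tally can be sketched roughly as follows. This is only a minimal sketch, not the script used for the numbers in this post: the function names and the file paths in the usage comment are hypothetical, it assumes both VCFs have already been through FilterMutectCalls, and it only compares sites present in both files, keyed on CHROM, POS, REF and ALT.

```python
import gzip
from collections import Counter


def read_filters(path):
    """Map (chrom, pos, ref, alt) -> set of FILTER values for each VCF record."""
    opener = gzip.open if path.endswith(".gz") else open
    filters = {}
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):  # skip header lines
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, pos, _id, ref, alt, _qual, flt = fields[:7]
            # FILTER is either PASS or a semicolon-separated list of filter names
            filters[(chrom, int(pos), ref, alt)] = set(flt.split(";"))
    return filters


def discordant_pass(standard, germline):
    """Count the filters on sites that PASS in exactly one of the two runs.

    Returns (failed_in_standard, failed_in_germline): Counters of the filter
    names attached to the failing record of each discordant pair. Sites that
    appear in only one VCF are ignored.
    """
    failed_in_standard = Counter()
    failed_in_germline = Counter()
    for key in standard.keys() & germline.keys():
        std, ggs = standard[key], germline[key]
        if ggs == {"PASS"} and std != {"PASS"}:
            failed_in_standard.update(std)
        elif std == {"PASS"} and ggs != {"PASS"}:
            failed_in_germline.update(ggs)
    return failed_in_standard, failed_in_germline


# Hypothetical usage (file names are placeholders):
# std = read_filters("standard.filtered.vcf.gz")
# ggs = read_filters("genotype_germline.filtered.vcf.gz")
# print(discordant_pass(std, ggs))
```

A record carrying several filters contributes to each of their counts, so the totals can exceed the number of discordant sites.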
Do you have any suggestions for how to get around these two issues? One way I can think of to get around the second issue is to ignore the haplotype and clustered_events filters when running with --genotype-germline-sites, but this would introduce false positives into the variant calls. Is there a way to increase the number of nearby events needed to trigger the haplotype/clustered_events filters? Changing this could also recover the false negatives. I am not sure how to solve the first issue, in which the strand_bias filter stops working as well when running with --genotype-germline-sites.
Thank you!
-
Hi TA,
What version of GATK did you run with this comparison?
Thank you,
Genevieve
-
Hi,
I used GATK 4.2.0.0.
We ran on Terra using the Broad GATK docker image us.gcr.io/broad-gatk/gatk@sha256:f2602e0bbc0117c30d23d8d626eb8d0a21ca672bb71180b5cf25425603a0ae09
-
TA, the issue with the clustered events and haplotype filters when running with --genotype-germline-sites is a real problem. These filters should only be triggered by a cluster of technical artifacts or somatic variants, not by germline variants. However, since the default mode of Mutect2 ignores most germline sites, we overlooked this possibility. We need to fix those filters so that they work as intended.
I can't think of a reason for the strand bias filter discrepancy. For the calls that have the strand bias filter with default settings and then pass when genotyping germline sites, are there any other filters applied in default mode?
-
Hi TA, I have created a GitHub issue so that our developer team can continue to look into the filtering discrepancy that you brought up. You can follow the progress here: https://github.com/broadinstitute/gatk/issues/7391.
Could you submit example data demonstrating these discrepancies to our ftp site following these instructions: https://gatk.broadinstitute.org/hc/en-us/articles/360035889671?
Thank you,
Genevieve
-
Hi David Benjamin,
I'm sorry to bump a 3-year-old thread, but I've noticed the same behavior on GATK 4.4.0, where the strand_bias filter behaves inconsistently with --genotype-germline-sites, and I thought it more appropriate to mention it here than to start a new thread.
e.g. on one VCF I'm working with:
1 636733 . A T . strand_bias AS_FilterStatus=strand_bias;AS_SB_TABLE=39,28|4,0;DP=73;ECNT=1;GERMQ=93;MBQ=30,30;MFRL=342,570;MMQ=48,40;MPOS=16;NALOD=1.53;NLOD=9.62;POPAF=6.00;ROQ=38;TLOD=7.16 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:34,0:0.029:34:14,0:18,0:32,0:21,13,0,0 0/1:33,4:0.143:37:18,1:11,3:29,4:18,15,4,0
with --genotype-germline-sites:
1 636733 . A T . PASS AS_FilterStatus=SITE;AS_SB_TABLE=39,28|4,0;DP=73;ECNT=1;GERMQ=93;MBQ=30,30;MFRL=342,570;MMQ=48,40;MPOS=16;NALOD=1.53;NLOD=9.62;POPAF=6.00;ROQ=38;TLOD=7.16 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:34,0:0.029:34:14,0:18,0:32,0:21,13,0,0 0/1:33,4:0.143:37:18,1:11,3:29,4:18,15,4,0
Is there a recommendation of which filter call to take in these cases of inconsistent behavior?
In this callset there appear to be 126 instances of this occurring out of about 5500 PASS variants.
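To make the strand imbalance in the record above easier to see, here is a minimal sketch that pulls the per-sample SB counts out of a VCF line. It is only an illustration: the function name is hypothetical, the INFO column is abbreviated, the tumor is assumed to be the second sample column (the one genotyped 0/1), and the SB field is assumed to be ordered ref-forward, ref-reverse, alt-forward, alt-reverse.

```python
def sample_strand_counts(record, sample_index=1):
    """Return the SB field of one sample as a tuple of four ints.

    Assumed order: (ref_fwd, ref_rev, alt_fwd, alt_rev).
    sample_index counts sample columns from 0; 1 is assumed to be the tumor.
    """
    fields = record.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    sample_values = fields[9 + sample_index].split(":")
    sb = dict(zip(format_keys, sample_values))["SB"]
    return tuple(int(x) for x in sb.split(","))


# The record from this post, rebuilt with tabs; INFO is shortened for brevity.
RECORD = "\t".join([
    "1", "636733", ".", "A", "T", ".", "strand_bias",
    "AS_SB_TABLE=39,28|4,0;DP=73;ECNT=1;TLOD=7.16",
    "GT:AD:AF:DP:F1R2:F2R1:FAD:SB",
    "0/0:34,0:0.029:34:14,0:18,0:32,0:21,13,0,0",
    "0/1:33,4:0.143:37:18,1:11,3:29,4:18,15,4,0",
])

ref_fwd, ref_rev, alt_fwd, alt_rev = sample_strand_counts(RECORD)
print(f"ref: {ref_fwd} fwd / {ref_rev} rev, alt: {alt_fwd} fwd / {alt_rev} rev")
```

Under those assumptions, all four alt-supporting reads in the tumor are on one strand while the ref reads are split between both, and the counts are identical in the two runs even though the FILTER column differs.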
-
Hi Luka Culibrk
We are currently working on fixing these issues, and hopefully the fixes will land in an upcoming point release. You can check the PRs later to see whether any of these fixes make it into the release. In the meantime, the branches are active and receiving commits such as
https://github.com/broadinstitute/gatk/commit/1549d88f37dcfbd0215f3ce52619f1ee85358792
Thank you for your patience.