Mutect2 Somatic-vs-Somatic calling
Hi GATK Team,
I want to perform variant calling from serially subcloned cell cultures, where we don't have a matched normal but we do have parental DNA for samples. I'm interested in identifying:
1) The de novo mutations in the daughter cells
2) Variants which were subclonal in parent culture and became clonal in the daughter culture
1) is simple. Mutect2 does this out of the box. 2) is the problem.
The problem is that by default Mutect2 classifies many parent-subclonal mutations as germline with high confidence and does not emit them, which prevents us from testing them for enrichment in the daughter and rescuing them. Furthermore, turning on --genotype-germline-sites to overcome this runs into previously documented issues with the clustered_events filter: https://github.com/broadinstitute/gatk/issues/7391
I'm inclined to postprocess the filtered VCF by modifying the ECNT values based on whether the filtered clustered events are technical or not and passing it back to FilterMutectCalls, but I expect there to be issues with this approach. I've also tried separate single-sample calling on the pairs of cultures, with the expected results: lots and lots of issues with processing variants tagged with the germline filter, as well as the haplotype/clustered_events filter.
I'd like to reach out to the community here to see whether there are other ways of achieving this that Mutect2 options enable directly, and that I might have missed in the documentation or not fully understood and dismissed. Thanks in advance.
-
Just to follow up, one specific question I have is this:
If I want to "undo" the effect of --genotype-germline-sites on the ECNT field, I would want to identify the germline variants that Mutect2 normally would not emit. I could run Mutect2 twice, once with --genotype-germline-sites on and once with it off and do a set diff, but alternatively if there's a way to reverse engineer the identity of the germline variants which normally would be not emitted based on values in the VCF, that would be very helpful.
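For reference, the set-diff route can be sketched in a few lines of Python. This is only a minimal sketch: the function names are hypothetical, it keys variants on (CHROM, POS, REF, ALT), and it assumes both runs represent each variant identically (same normalization and multiallelic splitting), which may not hold in practice.

```python
import gzip
from typing import Iterable, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def open_vcf(path: str):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def variant_keys(lines: Iterable[str]) -> Set[VariantKey]:
    """Collect one key per ALT allele from VCF text lines, skipping headers."""
    keys: Set[VariantKey] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def germline_only_sites(with_flag: Iterable[str],
                        without_flag: Iterable[str]) -> Set[VariantKey]:
    """Variants emitted only when --genotype-germline-sites was on."""
    return variant_keys(with_flag) - variant_keys(without_flag)
```

With two real outputs (placeholder file names), this would be called as `germline_only_sites(open_vcf("with_ggs.vcf.gz"), open_vcf("without_ggs.vcf.gz"))`.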
-
Hi Luka Culibrk
To disable the clustered-events filter, or rather to allow more variants in a single assembly region, you can modify the following parameter, which may overcome one of the problems you expect to face. Both parameters mentioned here belong to FilterMutectCalls:
--max-events-in-region <Integer> Maximum events in a single assembly region. Filter all variants if exceeded. Default value: 2.
Also, if you are concerned about variants on the same haplotype as another filtered variant, you can modify the following parameter to remove that restriction as well:
--distance-on-haplotype <Integer> On second filtering pass, variants with same PGT and PID tags as a filtered variant within this distance are filtered. Default value: 100.
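As a rough illustration, the two options could be passed to FilterMutectCalls like this. It is a sketch only: the file names are placeholders, the helper name is hypothetical, and the values 4 and 50 are arbitrary examples rather than recommendations.

```python
from typing import List


def filter_mutect_calls_cmd(reference: str,
                            unfiltered_vcf: str,
                            output_vcf: str,
                            max_events_in_region: int = 2,
                            distance_on_haplotype: int = 100) -> List[str]:
    """Build a FilterMutectCalls command line; defaults mirror the tool's defaults."""
    return [
        "gatk", "FilterMutectCalls",
        "-R", reference,
        "-V", unfiltered_vcf,
        "-O", output_vcf,
        "--max-events-in-region", str(max_events_in_region),
        "--distance-on-haplotype", str(distance_on_haplotype),
    ]


# Placeholder paths and illustrative (not recommended) thresholds.
cmd = filter_mutect_calls_cmd("ref.fasta", "unfiltered.vcf.gz", "filtered.vcf.gz",
                              max_events_in_region=4, distance_on_haplotype=50)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.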
Your proposed solution of running Mutect2 twice and taking the diff between the two VCF files could certainly help in your case.
Running the parent clone as a tumor-only may help you identify possible subclonal variants underneath and you may pick them as sites only to re-call them from the experimental clones to see if their allelic fractions change to a higher value.
We would like to hear how you follow up on this experimental setup, as it is quite interesting to observe the effects of cell culture clonality on somatic variant calling.
Regards.
-
Hi Gökalp Çelik
Thank you very much for your reply!
Firstly, I'm hesitant to simply raise the threshold for clustered events, because I assume the default of 2 was set for very good reason, and from what I've seen that filter does catch a ton of junk. Likewise with the haplotype filter.
"Running the parent clone as a tumor-only may help you identify possible subclonal variants underneath and you may pick them as sites only to re-call them from the experimental clones to see if their allelic fractions change to a higher value."
I attempted something like this: I ran tumor-only on parents and daughters and used the -alleles argument to pass daughter alleles to the parent run, so that Mutect2 would genotype and estimate VAFs for these variants and allow the comparison you're referring to. The major obstacle to using tumor-only mode is that it makes Mutect2 perform much worse at identifying true germline mutations, and I find that many "germline" variants are false negatives.
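The parent-versus-daughter VAF comparison can be sketched as follows. This is a minimal sketch that assumes each sample column carries an AD (allelic depths) entry, as Mutect2 output does; the function names are hypothetical.

```python
def allele_fraction(format_field: str, sample_field: str, alt_index: int = 1) -> float:
    """VAF of one ALT allele computed from the AD entry of a VCF sample column.

    alt_index is the position within AD (0 is REF, 1 is the first ALT).
    """
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    depths = [int(d) for d in values["AD"].split(",")]
    total = sum(depths)
    return depths[alt_index] / total if total else 0.0


def vaf_shift(format_field: str, parent_sample: str, daughter_sample: str) -> float:
    """Daughter VAF minus parent VAF; positive values suggest enrichment in the daughter."""
    return (allele_fraction(format_field, daughter_sample)
            - allele_fraction(format_field, parent_sample))


# Made-up sample columns: parent at 5/50 alt reads, daughter at 25/50.
print(vaf_shift("GT:AD:AF:DP", "0/1:45,5:0.1:50", "0/1:25,25:0.5:50"))
```

Raw read-count fractions ignore depth and sampling noise, so a shift like this is only a first-pass screen.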
So far my most promising approach is to run Mutect2 in paired mode and to rescue variants tagged with "germline" or "normal_artifact" based on the likelihood of the variant being subclonal in the parent and enriched in the daughter. However, this still runs afoul of the clustered events and haplotype filters, because --genotype-germline-sites inflates the mutation count and thereby the ECNT values.
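To make the rescue idea concrete, here is a rough sketch. It is not the likelihood model I actually use: it swaps in a one-sided Fisher exact test on alt/ref read counts, and the thresholds (alpha and max_parent_vaf) and function names are hypothetical choices for illustration. It also assumes a variant is only rescuable when every filter applied to it is "germline" or "normal_artifact".

```python
from math import comb

RESCUABLE = {"germline", "normal_artifact"}


def fisher_enrichment_p(d_alt: int, d_ref: int, p_alt: int, p_ref: int) -> float:
    """One-sided Fisher exact p-value that alt reads are enriched in the daughter."""
    n_daughter = d_alt + d_ref
    n_total = n_daughter + p_alt + p_ref
    k_alt = d_alt + p_alt
    # Hypergeometric tail: P(X >= d_alt) when drawing n_daughter reads.
    tail = sum(comb(k_alt, x) * comb(n_total - k_alt, n_daughter - x)
               for x in range(d_alt, min(n_daughter, k_alt) + 1))
    return tail / comb(n_total, n_daughter)


def rescue_candidate(filters: str, d_alt: int, d_ref: int, p_alt: int, p_ref: int,
                     alpha: float = 0.01, max_parent_vaf: float = 0.3) -> bool:
    """True if the only filters are rescuable ones, the parent VAF looks subclonal,
    and the daughter shows significant alt-read enrichment."""
    tags = set(filters.split(";"))
    if not tags <= RESCUABLE:
        return False
    parent_depth = p_alt + p_ref
    parent_vaf = p_alt / parent_depth if parent_depth else 0.0
    if parent_vaf > max_parent_vaf:
        return False
    return fisher_enrichment_p(d_alt, d_ref, p_alt, p_ref) < alpha
```

For example, `rescue_candidate("germline", 40, 10, 5, 45)` passes (parent near 10% VAF, daughter near 80%), whereas a site with equal counts in both cultures does not.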
Running Mutect2 twice is the simplest solution but also doubles the computational footprint of this process. I've tried to check the source code to see the logic of when a mutation is not emitted, but my Java is too lacking to comprehend Mutect2's majesty.
-
Luka Culibrk This new PR: https://github.com/broadinstitute/gatk/pull/8717 removes germline events from the accounting of the ECNT annotation and should fix the unfortunate interaction between the clustered events filter and the --genotype-germline-sites option. Please let us know if the issue is not resolved.
-
Thank you David Benjamin! I will test this in the coming days - do you have any estimate on when this may get into a versioned release?
-
Probably 1-4 days for code review and merging into the master branch, followed by roughly a month between minor releases.