Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Mutect2 Somatic-vs-Somatic calling

6 comments

  • Luka Culibrk

    Just to follow up, one specific question I have is this:

    If I want to "undo" the effect of --genotype-germline-sites on the ECNT field, I need to identify the germline variants that Mutect2 normally would not emit. I could run Mutect2 twice, once with --genotype-germline-sites on and once with it off, and do a set diff. Alternatively, if there's a way to reverse engineer, from values in the VCF, which germline variants normally would not be emitted, that would be very helpful.
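The set diff described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`iter_variant_keys`, `read_variant_keys`, `germline_only_keys`) are hypothetical, variants are compared by (CHROM, POS, REF, ALT) after splitting multiallelic ALT fields, and no normalization is done, so a variant that the two runs represent differently (for example with a different padded REF) would not match.

```python
import gzip
from typing import Iterable, Iterator, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def iter_variant_keys(lines: Iterable[str]) -> Iterator[VariantKey]:
    """Yield one (CHROM, POS, REF, ALT) key per ALT allele from VCF text lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            yield (chrom, int(pos), ref, allele)


def read_variant_keys(path: str) -> Set[VariantKey]:
    """Read all variant keys from a plain-text or gzip/bgzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return set(iter_variant_keys(handle))


def germline_only_keys(with_flag: Set[VariantKey],
                       without_flag: Set[VariantKey]) -> Set[VariantKey]:
    """Variants present only in the run that used --genotype-germline-sites."""
    return with_flag - without_flag


# Tiny in-memory demonstration with made-up records.
run_with_flag = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t.\t.\t.\n",
    "chr1\t200\t.\tC\tT,G\t.\t.\t.\n",
]
run_without_flag = ["chr1\t100\t.\tA\tG\t.\t.\t.\n"]

extra = germline_only_keys(set(iter_variant_keys(run_with_flag)),
                           set(iter_variant_keys(run_without_flag)))
print(sorted(extra))
```

For real files, `read_variant_keys` would be called once per VCF and the two resulting sets passed to `germline_only_keys`.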

  • Gökalp Çelik

    Hi Luka Culibrk

    To relax the clustered-events filter, that is, to allow more events in a single assembly region, you can raise the following parameter, which may address one of the problems you expect to face. Both parameters mentioned here belong to FilterMutectCalls.

    --max-events-in-region <Integer> Maximum events in a single assembly region.  Filter all variants if exceeded.  Default value: 2.

    Also, if you are concerned about variants being filtered because they lie on the same haplotype as another filtered variant, you can modify the following parameter to relax that restriction as well:

    --distance-on-haplotype <Integer> On second filtering pass, variants with same PGT and PID tags as a filtered variant within this distance are filtered.  Default value: 100. 
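As a rough illustration of how the two parameters above would be passed, the following Python sketch assembles a FilterMutectCalls command line without running it. The function name, the file names, and the value 4 are placeholders chosen for the example, not recommendations; the defaults (2 and 100) are the ones quoted above.

```python
import shlex
from typing import List


def filter_mutect_calls_command(reference: str,
                                unfiltered_vcf: str,
                                filtered_vcf: str,
                                max_events_in_region: int = 2,
                                distance_on_haplotype: int = 100) -> List[str]:
    """Build (but do not execute) a FilterMutectCalls argument list."""
    return [
        "gatk", "FilterMutectCalls",
        "-R", reference,
        "-V", unfiltered_vcf,
        "-O", filtered_vcf,
        "--max-events-in-region", str(max_events_in_region),
        "--distance-on-haplotype", str(distance_on_haplotype),
    ]


# Hypothetical file names; 4 is an arbitrary illustrative threshold.
cmd = filter_mutect_calls_command("reference.fasta",
                                  "unfiltered.vcf.gz",
                                  "filtered.vcf.gz",
                                  max_events_in_region=4)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.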

    Your possible solution of running Mutect2 twice to get a diff between 2 VCF files could most certainly help in your case. 

    Running the parent clone as a tumor-only may help you identify possible subclonal variants underneath and you may pick them as sites only to re-call them from the experimental clones to see if their allelic fractions change to a higher value. 

    We would like to hear how you follow up on this experimental setup, as it is quite interesting to observe the effects of cell culture clonality on somatic variant calling.

    Regards. 

  • Luka Culibrk

    Hi Gökalp Çelik

    Thank you very much for your reply!

    Firstly, I'm hesitant to simply raise the threshold for clustered events, because I assume the default of 2 was chosen for very good reason, and from what I've seen that filter does catch a ton of junk. Likewise with the haplotype filter.

    "Running the parent clone as a tumor-only may help you identify possible subclonal variants underneath and you may pick them as sites only to re-call them from the experimental clones to see if their allelic fractions change to a higher value."

    I attempted something like this: I ran tumor-only mode on parents and daughters and used the -alleles argument to pass daughter alleles to the parent run, so that Mutect2 would genotype and estimate VAFs for these variants and allow the comparison you're referring to. The major obstacle to using tumor-only mode is that Mutect2 performs much worse at identifying true germline mutations, and I find that many "germline" variants are false negatives.

    So far, my most promising approach is to run Mutect2 in paired mode and to rescue variants tagged with "germline" or "normal_artifact" based on the likelihood of the variant being subclonal in the parent and enriched in the daughter. However, this still runs afoul of the clustered mutations and haplotype filters, because --genotype-germline-sites inflates the mutation count and thereby the ECNT values.
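The rescue idea above can be sketched as follows. This is a simplified stand-in, not the likelihood-based method described: it replaces the likelihood calculation with plain thresholds on the per-sample AF value in the FORMAT column. The function name, the sample names, and both thresholds (`max_parent_af`, `min_enrichment`) are hypothetical, multiallelic records are skipped for simplicity, and other filters on the same record (such as clustered_events) are ignored.

```python
from typing import Dict, Iterable, List, Tuple

RESCUE_FILTERS = {"germline", "normal_artifact"}
Candidate = Tuple[str, int, str, str, float, float]


def sample_fields(format_col: str, sample_col: str) -> Dict[str, str]:
    """Map FORMAT keys (e.g. GT, AD, AF) to one sample's values."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))


def rescue_candidates(lines: Iterable[str], parent: str, daughter: str,
                      max_parent_af: float = 0.2,
                      min_enrichment: float = 2.0) -> List[Candidate]:
    """Return filtered records that look subclonal in parent, enriched in daughter."""
    parent_idx = daughter_idx = None
    kept: List[Candidate] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Raises ValueError if a sample name is missing from the header.
            parent_idx = fields.index(parent)
            daughter_idx = fields.index(daughter)
            continue
        if parent_idx is None or daughter_idx is None:
            raise ValueError("#CHROM header line must precede the records")
        if not set(fields[6].split(";")) & RESCUE_FILTERS:
            continue  # only consider germline / normal_artifact records
        if "," in fields[4]:
            continue  # skip multiallelic records in this sketch
        try:
            parent_af = float(sample_fields(fields[8], fields[parent_idx])["AF"])
            daughter_af = float(sample_fields(fields[8], fields[daughter_idx])["AF"])
        except (KeyError, ValueError):
            continue  # AF missing or not a single number
        if (parent_af <= max_parent_af
                and daughter_af > parent_af
                and daughter_af >= min_enrichment * parent_af):
            kept.append((fields[0], int(fields[1]), fields[3], fields[4],
                         parent_af, daughter_af))
    return kept


# Made-up records for illustration.
demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tparent\tdaughter",
    "chr1\t100\t.\tA\tG\t.\tgermline\tECNT=1\tGT:AD:AF\t0/1:90,10:0.10\t0/1:50,50:0.50",
    "chr1\t200\t.\tC\tT\t.\tgermline\tECNT=3\tGT:AD:AF\t0/1:50,50:0.50\t0/1:50,50:0.50",
    "chr1\t300\t.\tG\tA\t.\tPASS\tECNT=1\tGT:AD:AF\t0/0:99,1:0.01\t0/1:50,50:0.50",
]
rescued = rescue_candidates(demo, parent="parent", daughter="daughter")
print(rescued)
```

Only the first demo record is returned: the second has a high parent AF, and the third was never tagged "germline" or "normal_artifact".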

    Running Mutect2 twice is the simplest solution, but it also doubles the computational footprint of this process. I've tried to check the source code to see the logic that decides when a mutation is not emitted, but my Java is too lacking to comprehend Mutect2's majesty.

  • David Benjamin

    Luka Culibrk, this new PR (https://github.com/broadinstitute/gatk/pull/8717) removes germline events from the accounting of the ECNT annotation and should fix the unfortunate interaction between the clustered events filter and the --genotype-germline-sites option. Please let us know if the issue is not resolved.

  • Luka Culibrk

    Thank you David Benjamin! I will test this in the coming days - do you have any estimate on when this may get into a versioned release?

  • David Benjamin

    Probably 1-4 days for code review and merging into the master branch, followed by roughly a month between minor releases.
