Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 not calling a 4-bp deletion in BRCA1 with 50% AF

7 comments

  • Bhanu Gandham

    Hi,

    Based on the screenshots, it is possible that this is the same issue described in this ticket: https://github.com/broadinstitute/gatk/pull/6113.

    As for disabling the tool's default read filters, that option is a little misleading, because it doesn't actually mean that every read will make it into your bamout and be seen for processing. What the default read filters do is filter out really bad reads with obvious issues (like not having mapping qualities or being otherwise invalid), as well as getting rid of duplicate-marked reads. This all happens before the tool sees the reads. Before genotyping, we also filter reads that mismatch too drastically against their best haplotypes. Usually this is a small number of reads on a good day, but in a situation like this it's possible that the 4-base deletion caused all of the variant reads to be disqualified if the correct haplotype wasn't found.

    Options for you to try:

    1. Try running with --linked-de-bruijn-graph enabled.
    2. Try running with --recover-all-dangling-branches enabled.
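For readers who want to try the two options above, here is a minimal dry-run sketch of what the invocations might look like in tumor-only mode. The reference, BAM, interval list, and output file names are hypothetical placeholders, not values from this thread; only the two assembly flags come from the advice above.

```shell
# Hypothetical input paths (assumptions); replace with your own files.
REF=reference.fasta
BAM=tumor_amplicon.bam
INTERVALS=brca1_amplicons.interval_list

# Option 1: assemble with the linked de Bruijn graph.
CMD_LINKED="gatk Mutect2 -R $REF -I $BAM -L $INTERVALS --linked-de-bruijn-graph -O linked.vcf.gz"

# Option 2: recover all dangling branches during assembly.
CMD_DANGLING="gatk Mutect2 -R $REF -I $BAM -L $INTERVALS --recover-all-dangling-branches -O dangling.vcf.gz"

# Dry run: only print the commands so they can be reviewed first.
echo "$CMD_LINKED"
echo "$CMD_DANGLING"
```

Once the printed commands look right for your data, each one can be pasted into a shell where gatk is on the PATH.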

    Let me know if these options work for you or not.

  • mack812

    Hi Bhanu Gandham,

    Sorry for the delay and thank you for your reply.

    I tried both options, and indeed --linked-de-bruijn-graph made the tool detect the 4 bp deletion (--recover-all-dangling-branches did not).

    I think this is a very similar situation to an issue I posted on the old forum, which was kindly solved by David Benjamin:

    https://gatkforums.broadinstitute.org/gatk/discussion/24507/mutect2-repeatedly-not-detecting-somatic-variant-idh2-r172k-with-solid-read-support-and-5-af

    I see now that there is microhomology at the beginning and end of the reads from this amplicon (GCTTT) and that the variant is less than 25 bp from the beginning of the read (as in that previous issue), so I guess that, as in that previous case, the variant again ended up on a dangling end of the graph. This time I was running M2 with MT-mode on (which solved the previous case), but it seems that was not enough to recover the dangling end. Hence, I guess that the new --linked-de-bruijn-graph argument works better than MT-mode at recovering dangling ends, as anticipated by David Benjamin in his response at the link above.

    Anyway, I would greatly appreciate it if you could confirm that this is indeed the case: is the --linked-de-bruijn-graph argument a better approach than MT-mode for recovering variants from dangling ends, especially in the context of high-depth amplicon sequencing?

    BTW: I have two posts pending approval that describe other interesting situations (this time germline HaplotypeCaller issues). Could you please release them (if they are judged appropriate, of course)? Thanks in advance.

  • David Benjamin

    mack812, currently neither approach is better. Both will miss some variants. I would tentatively suggest using both at the same time for amplicon sequencing.

    We ought to do much better soon once an improvement to M2's graph pruning algorithm (currently) goes in and after some more work on dangling ends.

  • mack812

    Hi David Benjamin,

    Thanks for replying.

    I tried both arguments together, but that did not work. I also tried combining all three arguments (MT-mode, linked de Bruijn, and recover all dangling), but no luck with that either.

    Anyway, thanks for letting me know that new approaches to this issue are on their way. In the meantime, I think I will run M2 twice, each time with one of the current solutions, and merge the results into a single VCF without redundancies. Maybe I will even try --recover-all-dangling-branches as well.
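The "merge without redundancies" idea above can be illustrated with a small sketch. It is not the workflow used in this thread: the two toy VCF files, their records, and the choice of awk are all assumptions made for illustration. It keeps the header of the first file and emits each CHROM/POS/REF/ALT combination only once.

```shell
# Two toy call sets standing in for the per-run outputs (hypothetical records).
tmpdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$tmpdir/run_linked.vcf"
printf 'chr17\t100\t.\tGCTTT\tG\t.\t.\t.\nchr17\t250\t.\tA\tT\t.\t.\t.\n' >> "$tmpdir/run_linked.vcf"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$tmpdir/run_mt.vcf"
printf 'chr17\t250\t.\tA\tT\t.\t.\t.\nchr17\t300\t.\tC\tG\t.\t.\t.\n' >> "$tmpdir/run_mt.vcf"

# Print header lines from the first file only, then every record whose
# CHROM/POS/REF/ALT key has not been seen before.
MERGED=$(awk -F'\t' '
  /^#/ { if (FNR == NR) print; next }
  !seen[$1 FS $2 FS $4 FS $5]++
' "$tmpdir/run_linked.vcf" "$tmpdir/run_mt.vcf")

echo "$MERGED"
rm -rf "$tmpdir"
```

When two runs report the same site, this sketch keeps the record from the first file and discards the second, even if their annotations differ. It does not re-sort the output, and real Mutect2 call sets would still need filtering and a coordinate sort, so treat it only as an outline of the de-duplication step.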

     

  • janrehker

    Hi Bhanu Gandham,

    Your advice helped me solve one of my issues. :)

    While working with deep-sequencing amplicon data, I came across a 17 bp deletion that I believe to be real, although its AF is below 1%.

    The options I needed were:

    '--linked-de-bruijn-graph --max-reads-per-alignment-start 0'

    I also switched to '--dont-use-soft-clipped-bases true', as it reduced calls that were probably false positives.

    '--recover-all-dangling-branches' did not help with my variant, similar to what mack812 reported.
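Put together, the options described above might be combined as in the following dry-run sketch. The input and output file names are hypothetical placeholders; only the three flags are taken from this comment.

```shell
# Hypothetical input paths (assumptions); replace with your own files.
REF=reference.fasta
BAM=amplicon_deep.bam

# --max-reads-per-alignment-start 0 turns off downsampling per start position;
# --dont-use-soft-clipped-bases true excludes soft-clipped bases from analysis.
CMD_DEEP="gatk Mutect2 -R $REF -I $BAM --linked-de-bruijn-graph --max-reads-per-alignment-start 0 --dont-use-soft-clipped-bases true -O deep_amplicon.vcf.gz"

# Dry run: only print the command so it can be reviewed first.
echo "$CMD_DEEP"
```

Restricting the run to the amplicon regions with an interval list (-L) may help offset the extra runtime that comes from disabling downsampling.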

     

    Best regards,

    Jan

  • Genevieve Brandt (she/her)

    Glad you were able to solve your issue, janrehker! Thank you for providing your solution so that other members of the GATK community can find it if they are having the same issue.

  • janrehker

    That's what a forum is made for :)

    I should mention that '--max-reads-per-alignment-start 0' drastically increases the time needed for variant calling, at least on my data with deep coverage of up to 32k reads.
