Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

mutect2 realignment

3 comments

  • David Benjamin

    If you run Mutect2 (or HaplotypeCaller, for that matter) with the -debug option, your stdout will include lines like:

    INFO Mutect2Engine - Assembling 1:21924496-21924625 with 454 reads: (with overlap region = 1:21924396-21924725)

    (translation: M2 has detected possible somatic variation in the interval 1:21924496-21924625, and will assemble reads over the padded interval 1:21924396-21924725)

    INFO AssemblyResultSet - Trimmed region to 1:21924519-21924610 and reduced number of haplotypes from 10 to only 10

    (translation: Mutect2 has trimmed the assembled haplotypes and reads to 1:21924519-21924610, which is the interval that would show up in a bamout, before running Pair-HMM alignment)

     

    For your second question, the best practices WDL is correct. Using the bamout used to be the right approach, but we overhauled FilterAlignmentArtifacts and now the original BAM is necessary. I must have forgotten to rewrite the javadoc (from which the online tool documentation is generated) when making this change, and I apologize for the confusion.
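
For readers who want to collect these intervals from a saved debug log, here is a minimal Python sketch. It assumes the two message formats are exactly as quoted above (they may differ between GATK versions) and that each "Trimmed region" line follows the "Assembling" line it belongs to. The names `AssemblyEvent` and `parse_debug_log` are made up for this illustration and are not part of GATK.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class AssemblyEvent(NamedTuple):
    """One local-assembly event reconstructed from the debug log (hypothetical type)."""
    active: str             # interval where possible variation was detected
    padded: str             # padded interval over which reads were assembled
    n_reads: int            # number of reads reported for the assembly
    trimmed: Optional[str]  # trimmed interval, if a matching line was seen


# Patterns modeled on the two log lines quoted in the comment above.
ASSEMBLING = re.compile(
    r"Assembling (\S+) with (\d+) reads:\s+\(with overlap region = (\S+?)\)"
)
TRIMMED = re.compile(r"Trimmed region to (\S+) and reduced number of haplotypes")


def parse_debug_log(lines: Iterable[str]) -> List[AssemblyEvent]:
    """Pair each 'Assembling' line with the next 'Trimmed region' line, if any."""
    events: List[AssemblyEvent] = []
    for line in lines:
        match = ASSEMBLING.search(line)
        if match:
            events.append(
                AssemblyEvent(match.group(1), match.group(3), int(match.group(2)), None)
            )
            continue
        match = TRIMMED.search(line)
        # Assumption: a trimmed line belongs to the most recent untrimmed event.
        if match and events and events[-1].trimmed is None:
            events[-1] = events[-1]._replace(trimmed=match.group(1))
    return events


# The two lines quoted in the comment above.
SAMPLE_LOG = [
    "INFO Mutect2Engine - Assembling 1:21924496-21924625 with 454 reads: "
    "(with overlap region = 1:21924396-21924725)",
    "INFO AssemblyResultSet - Trimmed region to 1:21924519-21924610 and "
    "reduced number of haplotypes from 10 to only 10",
]

for event in parse_debug_log(SAMPLE_LOG):
    print(event.active, event.padded, event.n_reads, event.trimmed)
```

To use it on a real run, pass an open file handle for the captured stdout in place of `SAMPLE_LOG`. Regions that were assembled but never trimmed keep `trimmed=None`.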

  • Qing Zhang

    Thanks, David, this is very helpful.

     

    Just to double-check, if I did not run M2 with --debug, there is no way that I can retrieve those realigned intervals?

     

    Thanks!

  • David Benjamin

    That's correct. There will be some evidence of where local assembly occurred in the form of variants that failed the weak_evidence filter, but this won't tell you about places where the evidence was too weak for any output (i.e. below the -emit-lod threshold), and it won't give you the bounds of the assembled intervals.
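
As a rough illustration of that partial evidence, the sketch below lists the sites in a filtered VCF whose FILTER column includes weak_evidence. It assumes an uncompressed, tab-delimited VCF that has already been through filtering; the function name `weak_evidence_sites` is hypothetical. As noted above, these sites only hint at where assembly happened and say nothing about the assembled interval bounds.

```python
from typing import Iterable, List, Tuple


def weak_evidence_sites(vcf_lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (chrom, pos) for records whose FILTER field contains weak_evidence."""
    sites: List[Tuple[str, int]] = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a complete VCF record
        chrom, pos, filters = fields[0], int(fields[1]), fields[6]
        # Multiple failed filters are separated by semicolons in the FILTER column.
        if "weak_evidence" in filters.split(";"):
            sites.append((chrom, pos))
    return sites
```

For a real file, something like `with open(path) as handle: weak_evidence_sites(handle)` would work; a gzip-compressed VCF would need `gzip.open(path, "rt")` instead.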
