mutect2 realignment
Hi GATK Team,
I wonder if there is any indication in the outputs / intermediate files for regions that triggered M2 realignment if I did not use -bamout in M2?
My another question is, what should be the input bam for FilterAlignmentArtifact? I saw "The bam input to this tool should be the reassembly bamout produced by HaplotypeCaller or Mutect2 in the process of generating the input callset. " in the document (https://gatk.broadinstitute.org/hc/en-us/articles/360037226112-FilterAlignmentArtifacts-EXPERIMENTAL-), however in best-pratice WDL, I saw the input bam is tumor bam. (https://github.com/gatk-workflows/gatk4-somatic-snvs-indels/blob/master/mutect2.wdl#L361). Could you please clarify?
Thanks!
-
If you run Mutect2 (or HaplotypeCaller, for that matter) with the -debug option your stdout will include lines like:
INFO Mutect2Engine - Assembling 1:21924496-21924625 with 454 reads: (with overlap region = 1:21924396-21924725)
(translation: M2 has detected possible somatic variation in the interval 1:21924496-21924625, and will assemble reads over the padded interval 1:21924396-21924725)
INFO AssemblyResultSet - Trimmed region to 1:21924519-21924610 and reduced number of haplotypes from 10 to only 10
(translation: Mutect2 has trimmed the assembled haplotypes and reads to 1:21924519-21924610 — which is the interval that would show up in a bamout —before running Pair-HMM alignment)
For your second question, the best practices WDL is correct. Using the bamout used to be the right approach, but we overhauled FilterAlignmentArtifacts and now the original bam is necessary. I must have forgotten to rewrite the javadoc (from which the online tool documentation is generated) when making this change and apologize for the confusion.
-
Thanks, David, this is very helpful.
Just to double-check, if I did not run M2 with --debug, there is no way that I can retrieve those realigned intervals?
Thanks!
-
That's correct. There will be some evidence of where local assembly occurred in the form of variants that failed the weak_evidence filter, but this won't tell you about places where the evidence was too weak for any output(i.e. below the -emit-lod threshold) and it won't give you the bounds of the assembled intervals.
Please sign in to leave a comment.
3 comments