Mutect2 with and without UMI information
Hello, I recently saw this post: https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant
- HaplotypeCaller and Mutect2 are optimized to expect UMIs (Unique Molecular Identifiers). If your data does not have UMIs, then they will not play nice with the callers.
I am wondering if someone can explain how Mutect2 and HaplotypeCaller use UMI information, and what we need to do if we don't have UMI-duplexed sequence data.
Hi Davy Deng,
This statement only applies to running GATK on amplicon data. HaplotypeCaller and Mutect2 do not use UMIs themselves, but if you have UMIs, we recommend running the UMI-aware MarkDuplicates step. If you have amplicon data and do not have UMIs, you have to skip MarkDuplicates.
Hope this helps!
Hello Genevieve Brandt (she/her),
If HaplotypeCaller and Mutect2 don't directly use UMIs, perhaps the wording of the linked article could be adjusted, as this confused me as well. Just to be sure, is there anything that Picard's UmiAwareMarkDuplicatesWithMateCigar does specifically that is required by HaplotypeCaller/Mutect2 and that differs from other methods of UMI-aware read deduplication, such as umi-tools and gencore? Or are you just saying in general that the data needs to be deduplicated in a UMI-aware fashion prior to variant calling?
Hi Matthew Lueder,
Yes, if your data has UMIs, the only step this will change is the MarkDuplicates step. You will need to perform UMI-aware read deduplication. I am not familiar with the other tools you mentioned, but you do not have to use our GATK tool specifically (UmiAwareMarkDuplicatesWithMateCigar).
The section of the article you are confused about only applies to amplicon data. Normal MarkDuplicates uses positional information to deduplicate reads. With amplicon data, reads from the same amplicon start and end at the same positions, so many of them will be marked as duplicates even though they come from different original molecules. That is why with amplicon data, it is imperative to either do read deduplication with a UMI-aware tool or skip marking duplicates altogether.
I'll have our documentation team update that article! Thanks for the suggestion.
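To make the amplicon point above concrete, here is a minimal toy sketch in Python. It is not how MarkDuplicates or UmiAwareMarkDuplicatesWithMateCigar is actually implemented (real tools also consider mate position, and UMI-aware tools can tolerate UMI sequencing errors); the `Read` tuple and both function names are hypothetical and exist only for illustration. It just shows why a position-only key collapses amplicon reads that a position-plus-UMI key keeps apart.

```python
from collections import namedtuple

# Hypothetical, simplified read record (illustration only, not a real BAM record).
Read = namedtuple("Read", ["name", "chrom", "start", "strand", "umi"])


def dedup_by_position(reads):
    """Keep the first read seen at each (chrom, start, strand)."""
    seen = set()
    kept = []
    for r in reads:
        key = (r.chrom, r.start, r.strand)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def dedup_by_position_and_umi(reads):
    """Keep the first read seen at each (chrom, start, strand, umi)."""
    seen = set()
    kept = []
    for r in reads:
        key = (r.chrom, r.start, r.strand, r.umi)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


# Four reads from one amplicon: identical positions, three distinct molecules.
reads = [
    Read("r1", "chr1", 100, "+", "AAGT"),
    Read("r2", "chr1", 100, "+", "AAGT"),  # true PCR duplicate of r1
    Read("r3", "chr1", 100, "+", "CCTA"),
    Read("r4", "chr1", 100, "+", "GGAC"),
]

position_only = dedup_by_position(reads)
umi_aware = dedup_by_position_and_umi(reads)

print(len(position_only))  # 1: every read shares the same position
print(len(umi_aware))      # 3: one read per distinct UMI
```

In this toy case, position-only deduplication throws away two independent molecules (r3 and r4), which is the loss of real evidence that makes plain MarkDuplicates unsuitable for amplicon data without UMIs.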