Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Mutect2 with and without UMI information

Answered

3 comments

  • Genevieve Brandt (she/her)

    Hi Davy Deng,

    This statement applies only to running GATK with amplicon data. HaplotypeCaller and Mutect2 do not use UMIs, but if you have UMIs, we recommend running the UMI-aware MarkDuplicates step. If you have amplicon data and do not have UMIs, you have to skip MarkDuplicates.

    Hope this helps!

    Genevieve

  • Matthew Lueder

    Hello Genevieve Brandt (she/her),

    If HaplotypeCaller and Mutect2 don't directly use UMIs, perhaps the wording of the linked article could be adjusted, as this confused me as well. Just to be sure, is there anything specific that Picard's UmiAwareMarkDuplicatesWithMateCigar does that HaplotypeCaller/Mutect2 require and that differs from other methods of UMI-aware read deduplication, such as umi-tools and gencore? Or are you saying more generally that the data needs to be deduplicated in a UMI-aware fashion prior to variant calling?

    Thanks!

  • Genevieve Brandt (she/her)

    Hi Matthew Lueder,

    Yes, if your data has UMIs, the only step that changes is the MarkDuplicates step. You will need to perform UMI-aware read deduplication. I am not familiar with the other tools you mentioned, but you do not have to use our GATK tool specifically (UmiAwareMarkDuplicatesWithMateCigar).

    The section of the article you are confused about only applies to amplicon data. Normal MarkDuplicates uses positional information to deduplicate reads, so with amplicon data, where reads from different molecules share the same start and end coordinates, many of the reads will be marked as duplicates. That is why, with amplicon data, it is imperative to either perform read deduplication with a UMI-aware tool or skip marking duplicates altogether.

    I'll have our documentation team update that article! Thanks for the suggestion.

    Best,

    Genevieve

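To illustrate the point made in the thread about positional versus UMI-aware deduplication, here is a minimal Python sketch. It is not how MarkDuplicates or UmiAwareMarkDuplicatesWithMateCigar are implemented. The `Read` type, the `count_duplicates` function, and the example reads are all hypothetical, and the sketch assumes exact UMI matching on a single read end, whereas real tools also consider mate positions and may tolerate UMI sequencing errors.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Read(NamedTuple):
    """Hypothetical, simplified read record (not a real BAM record)."""
    name: str
    chrom: str
    start: int          # leftmost aligned position
    strand: str
    umi: Optional[str]  # UMI sequence, if the library has one


def count_duplicates(reads, use_umi):
    """Group reads by a key and count all but one read per group as duplicates.

    With use_umi=False the key is purely positional; with use_umi=True the
    UMI is added to the key, so reads at the same position but with
    different UMIs are treated as distinct molecules.
    """
    groups = defaultdict(list)
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        if use_umi:
            key += (read.umi,)
        groups[key].append(read)
    return sum(len(group) - 1 for group in groups.values())


# Six amplicon reads with identical coordinates, from three original molecules.
reads = [
    Read("r1", "chr1", 1000, "+", "AACG"),
    Read("r2", "chr1", 1000, "+", "AACG"),
    Read("r3", "chr1", 1000, "+", "TTGC"),
    Read("r4", "chr1", 1000, "+", "TTGC"),
    Read("r5", "chr1", 1000, "+", "GGAT"),
    Read("r6", "chr1", 1000, "+", "GGAT"),
]

positional = count_duplicates(reads, use_umi=False)
umi_aware = count_duplicates(reads, use_umi=True)
print(f"positional duplicates: {positional}")  # 5 of 6 reads flagged
print(f"UMI-aware duplicates:  {umi_aware}")   # 3 of 6 reads flagged
```

In this toy example, positional grouping keeps only one of the six reads, discarding evidence from two independent molecules, while UMI-aware grouping keeps one read per molecule. This is the over-marking that makes plain positional duplicate marking unsuitable for amplicon data.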
