Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 in multisample mode slows at HLA Loci in hg38 alt-aware alignments


1 comment

  • David Benjamin

    This davidben fellow's advice is solid*, but you might need to scale up some values for multi-sample mode.  The stride can stay at 20, but the reads-per-alignment-start limit applies to the total depth over all samples.  For example, the current value of 6 will downsample to 6*20 = 120 reads starting in every 20-base window.

    The combination of many samples with a highly polymorphic region such as the HLA is inevitably going to push Mutect2 to its limits.  I would also try to rein in the complexity of the local assembly by setting the mapping quality read filter to a higher threshold like 40 or 50.  You should also experiment with the --linked-de-bruijn-graph argument.

    Congratulations, by the way, on posing a question where the answer is not to stick with the defaults!  And please let us know how these settings go.  This is uncharted territory for us.

    * He's the lead developer of Mutect2.

    ** He's also me.
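
For anyone trying the settings discussed in the comment above, here is a minimal sketch of the downsampling arithmetic and of how the suggested options might be assembled into a command line. Several things in it are assumptions rather than statements from the comment: the helper names are hypothetical, scaling the per-start read limit linearly with the number of samples is only one plausible reading of "scale up", and the flag spellings other than --linked-de-bruijn-graph (--downsampling-stride, --max-reads-per-alignment-start, --minimum-mapping-quality) are my understanding of the GATK4 argument names and should be checked against `gatk Mutect2 --help` for your version. Required inputs such as the reference, BAMs, and output are deliberately left out.

```python
import shlex


def reads_per_window(max_reads_per_start: int, stride: int) -> int:
    """Upper bound on reads starting in one stride-sized window.

    With a per-start limit of 6 and a stride of 20 this is 6 * 20 = 120,
    the figure quoted in the comment.
    """
    return max_reads_per_start * stride


def scaled_reads_per_start(single_sample_value: int, n_samples: int) -> int:
    """ASSUMPTION: scale the single-sample limit linearly with sample count.

    The comment only says the limit applies to total depth over all samples;
    linear scaling is an illustrative choice, not a GATK recommendation.
    """
    return single_sample_value * n_samples


def mutect2_tuning_args(max_reads_per_start: int,
                        stride: int = 20,
                        min_mapq: int = 40,
                        linked_graph: bool = True) -> list[str]:
    """Build only the tuning-related part of a Mutect2 command line.

    Flag names are assumed GATK4 spellings; -R, -I and -O are omitted.
    """
    args = [
        "gatk", "Mutect2",
        "--downsampling-stride", str(stride),
        "--max-reads-per-alignment-start", str(max_reads_per_start),
        "--minimum-mapping-quality", str(min_mapq),
    ]
    if linked_graph:
        args.append("--linked-de-bruijn-graph")
    return args


if __name__ == "__main__":
    n_samples = 4  # hypothetical cohort size
    per_start = scaled_reads_per_start(6, n_samples)
    print("reads per 20-base window:", reads_per_window(per_start, 20))
    print(shlex.join(mutect2_tuning_args(per_start)))
```

Treat the numbers as starting points for experimentation: raising the per-start limit restores depth per sample but also increases assembly cost, which is why it is paired here with a stricter mapping quality threshold.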
