Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 in multisample mode slows at HLA Loci in hg38 alt-aware alignments


    David Benjamin

    This davidben fellow's advice is solid* but you might need to scale up some values for multi-sample mode. The stride can stay at 20, but the reads-per-alignment-start limit pertains to the total depth over all samples. For example, the current value will downsample to 6*20 = 120 reads starting in every 20-base window.
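The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and scaling the limit linearly with the number of samples is an assumption, not something the post prescribes.

```python
def reads_per_window(max_reads_per_alignment_start: int, stride: int) -> int:
    """Upper bound on reads starting in one stride-sized window,
    following the 6 * 20 = 120 example from the post."""
    return max_reads_per_alignment_start * stride


def scaled_reads_per_start(per_sample_value: int, n_samples: int) -> int:
    """ASSUMPTION: scale the per-start limit linearly with sample count,
    since the limit applies to total depth over all samples."""
    return per_sample_value * n_samples


if __name__ == "__main__":
    # The example from the post: a limit of 6 with a stride of 20.
    print(reads_per_window(6, 20))  # 120

    # Hypothetical 4-sample run with the stride left at 20.
    limit = scaled_reads_per_start(6, 4)
    print(limit, reads_per_window(limit, 20))
```

Whether linear scaling is the right choice depends on the depth per sample and on how much runtime you can afford, so treat the scaled value as a starting point for experiments.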

    The combination of many samples with a highly polymorphic region such as the HLA is inevitably going to push Mutect2 to its limits. I would also try to rein in the complexity of the local assembly by setting the mapping quality read filter to a higher threshold like 40 or 50. You should also experiment with the --linked-de-bruijn-graph argument.
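As a rough sketch of how these suggestions might be combined, the Python helper below assembles a Mutect2 argument list. The helper name, file names, and default values are hypothetical. The flag spellings `--minimum-mapping-quality`, `--downsampling-stride`, and `--max-reads-per-alignment-start` are given as I understand them in recent GATK4 releases and should be checked against `gatk Mutect2 --help` for your version. Only `--linked-de-bruijn-graph` is named in the post itself.

```python
from typing import List, Optional


def build_mutect2_args(
    reference: str,
    bams: List[str],
    output: str,
    min_mapq: int = 40,                        # post suggests 40 or 50
    stride: int = 20,                          # post: the stride can stay at 20
    max_reads_per_start: Optional[int] = None, # scale up for multi-sample runs
    linked_graph: bool = True,
) -> List[str]:
    """Build a Mutect2 command line as a list of arguments (sketch only)."""
    args = ["gatk", "Mutect2", "-R", reference]
    for bam in bams:
        args += ["-I", bam]  # one -I per sample BAM
    args += ["-O", output]
    # Flag names below are assumptions; verify them for your GATK version.
    args += ["--minimum-mapping-quality", str(min_mapq)]
    args += ["--downsampling-stride", str(stride)]
    if max_reads_per_start is not None:
        args += ["--max-reads-per-alignment-start", str(max_reads_per_start)]
    if linked_graph:
        args.append("--linked-de-bruijn-graph")
    return args


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_mutect2_args(
        "hg38.fasta", ["s1.bam", "s2.bam"], "out.vcf.gz", min_mapq=50
    )
    print(" ".join(cmd))
```

Building the command as a list keeps each setting easy to vary between runs, which suits the trial-and-error tuning the post recommends.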

    Congratulations, by the way, on posing a question where the answer is not to stick with the defaults!  And please let us know how these settings go.  This is uncharted territory for us.

    * He's the lead developer of Mutect2.

    ** He's also me.

