Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Some MarkDuplicates chunks take a very long time to run

1 comment

    Tiffany Miller

    Hi there! The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.

    That being said, I'll offer a few quick thoughts, though they don't answer your main question. Have you tried MarkDuplicatesSpark to increase speed? On the same data, it typically runs about 15% faster than MarkDuplicates plus SortSam at 2 cores, and it scales linearly up to 16 cores or more.

    This thread offers thoughts on settings.
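
As a rough illustration of the suggestion above, here is a minimal sketch of how a local MarkDuplicatesSpark run with a fixed core count might be assembled from Python. The helper name `build_markduplicates_spark_cmd` and the file names are hypothetical. The `-I`, `-O`, and `-M` arguments and the `--spark-runner LOCAL --spark-master 'local[N]'` form after the `--` separator follow the usage shown in the GATK documentation and README as I understand them, so please check them against the documentation for your GATK version before relying on this.

```python
import shlex


def build_markduplicates_spark_cmd(input_bam, output_bam, metrics_file=None, cores=None):
    """Build an argument list for a local MarkDuplicatesSpark run.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    cmd = ["gatk", "MarkDuplicatesSpark", "-I", input_bam, "-O", output_bam]

    # Optional duplication metrics file.
    if metrics_file is not None:
        cmd += ["-M", metrics_file]

    # Assumption: Spark arguments go after a bare "--", and local[N]
    # limits the run to N threads on the local machine.
    if cores is not None:
        if cores < 1:
            raise ValueError("cores must be a positive integer")
        cmd += ["--", "--spark-runner", "LOCAL", "--spark-master", f"local[{cores}]"]

    return cmd


if __name__ == "__main__":
    # Placeholder file names; substitute your own.
    cmd = build_markduplicates_spark_cmd(
        "sample.bam", "sample.markdup.bam", "sample.metrics.txt", cores=8
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

To actually launch the job, the returned list could be passed to `subprocess.run(cmd, check=True)`. Trying a few core counts (for example 2, 8, and 16) on one of the slow chunks would show whether the scaling described above holds for your data.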
