Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 wdl interval list and scatter count

  • David Benjamin

    Sheryl, that is a good intervals file to use. The WDL splits the calling region into smaller interval files that each cover approximately the same number of bases. As far as accuracy is concerned there is no optimum scatter count, although weird effects might come up with an absurdly large scatter count (above 10,000 or so). The only point of scattering is to split the job so it can run in parallel across multiple computers; it doesn't do multi-core parallel processing on a single machine.
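    To make the splitting step concrete, here is a minimal Python sketch of dividing a list of intervals into `scatter_count` shards that each cover roughly the same number of bases. It is not GATK's own splitting code: the helper names `split_intervals` and `interval_length`, the choice to cut intervals at shard boundaries, and the toy intervals are assumptions for illustration. Each shard would then go to a separate Mutect2 job, ideally on its own machine, which is why a larger scatter count shortens wall-clock time only when there are machines available to run the shards.

```python
from typing import List, Tuple

# Hypothetical representation: (contig, start, end), 1-based and inclusive,
# like the records in a Picard/GATK interval list.
Interval = Tuple[str, int, int]


def interval_length(iv: Interval) -> int:
    """Number of bases covered by a 1-based inclusive interval."""
    _, start, end = iv
    return end - start + 1


def split_intervals(intervals: List[Interval], scatter_count: int) -> List[List[Interval]]:
    """Split intervals into scatter_count shards of nearly equal base count.

    Assumption: intervals may be cut at shard boundaries, so shard sizes
    differ by at most one base.
    """
    if scatter_count < 1:
        raise ValueError("scatter_count must be at least 1")
    total = sum(interval_length(iv) for iv in intervals)
    # Shard k covers cumulative bases [boundaries[k], boundaries[k + 1]).
    boundaries = [total * k // scatter_count for k in range(scatter_count + 1)]
    shards: List[List[Interval]] = [[] for _ in range(scatter_count)]
    consumed = 0  # bases already assigned to shards
    shard = 0
    for contig, start, end in intervals:
        pos = start
        while pos <= end:
            # Move on once the current shard has reached its base quota.
            while boundaries[shard + 1] <= consumed:
                shard += 1
            take = min(end - pos + 1, boundaries[shard + 1] - consumed)
            shards[shard].append((contig, pos, pos + take - 1))
            pos += take
            consumed += take
    return shards


if __name__ == "__main__":
    toy_regions = [("chr1", 1, 100), ("chr2", 1, 50)]  # hypothetical toy intervals
    for i, shard in enumerate(split_intervals(toy_regions, 3)):
        print(i, shard, sum(interval_length(iv) for iv in shard))
```

    With three shards the toy regions split into `chr1:1-50`, `chr1:51-100` and `chr2:1-50`, 50 bases each. Nothing in this step makes a single job faster; it only creates independent pieces of work.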

  • Sheryl

    Thanks for your reply, David Benjamin.

    Oh, right. I'm not sure that is the message that comes across from the documentation, e.g.:

    • Mutect2.scatter_count -- Number of executions to split the Mutect2 task into. The more you put here, the faster Mutect2 will return results, but at a higher cost of resources.

    So apart from giving Mutect2 the interval list of callable regions, are there any other ways to increase the speed of Mutect2 on a single machine?

  • David Benjamin

    Unfortunately, not really. You can get away with raising -initial-lod a bit above the default, but raising it too much causes false negatives.
