Mutect2 wdl interval list and scatter count
Hi, I am using GATK v4.3.0 and the mutect2.wdl for whole genome analysis. I have found a wgs_calling_regions.hg38.interval_list from here:
Can you confirm that this is the file to give in mutect2.wdl.inputs.json for the "Mutect2.intervals": "File? (optional)" option?
Can I also check: as there are 356 intervals in this file, is it optimal to set the scatter count to 356? That is, will the WDL split over the intervals listed, or does it just divide the total number of bases in the interval list by the scatter count?
-
Sheryl, that is a good intervals file to use. The WDL splits the calling region into smaller interval files, each totaling approximately the same number of bases, so the scatter count does not need to match the number of intervals in the file. There is no optimal scatter count as far as accuracy is concerned, although weird effects might come up with an absurdly large scatter count, above 10,000 or so. The only point of scattering is to split the job in parallel over multiple computers. Scattering does not do multi-core parallel processing on a single machine.
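To illustrate what splitting by bases rather than by interval means, here is a rough Python sketch. It is not the code the WDL runs (as far as I know the WDL delegates this step to GATK's SplitIntervals tool); the function name split_by_bases and the toy intervals are made up for illustration, and real interval lists have headers and extra columns that are ignored here.

```python
from typing import List, Tuple

# (contig, start, end) with 1-based inclusive coordinates, as in an interval_list
Interval = Tuple[str, int, int]


def split_by_bases(intervals: List[Interval], scatter_count: int) -> List[List[Interval]]:
    """Hypothetical helper: divide intervals into scatter_count shards of
    roughly equal total bases, cutting an interval in two where needed."""
    total = sum(end - start + 1 for _, start, end in intervals)
    shards: List[List[Interval]] = [[] for _ in range(scatter_count)]
    shard = 0
    filled = 0  # bases assigned so far, summed over all shards

    for contig, start, end in intervals:
        while start <= end:
            # Cumulative number of bases that should be assigned once this shard is full.
            target = total * (shard + 1) // scatter_count
            room = target - filled
            if room <= 0 and shard < scatter_count - 1:
                shard += 1  # current shard is full, move to the next one
                continue
            take = min(end - start + 1, room)
            shards[shard].append((contig, start, start + take - 1))
            start += take
            filled += take
    return shards


# Toy example: three intervals of unequal size, scatter count of 3.
toy = [("chr1", 1, 100), ("chr2", 1, 50), ("chr3", 1, 150)]
for i, shard in enumerate(split_by_bases(toy, 3)):
    print(i, shard)
```

With these toy intervals each shard ends up with 100 bases, and chr3 is cut across two shards. The number of intervals in the file and the scatter count are independent of each other.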
-
Thanks for your reply, David Benjamin.
Oh right. I'm not sure this is the message that comes across from the documentation, e.g.:
Mutect2.scatter_count -- Number of executions to split the Mutect2 task into. The more you put here, the faster Mutect2 will return results, but at a higher cost of resources.
So apart from giving Mutect2 the interval list of callable regions, are there any other ways to increase the speed of Mutect2 on a single machine?
-
Unfortunately, not really. You can get away with raising the initial tumor LOD threshold (--initial-tumor-lod, short form -init-lod) somewhat above the default, but raising it too much causes false negatives.