I'm looking into parallelizing my somatic whole-exome pipeline using the scatter strategy (though I'm recreating it in Bash and Python rather than WDL). I've been looking through the WDL scripts on the GATK GitHub page, and I've seen the variable scatter_count declared, but never actually given a value.
I was wondering what the considerations are for choosing how many intervals (i.e. the scatter_count value) to parallelize over. I've arbitrarily chosen 10 splits for my pipeline to start, but I'm curious whether there is a better way to pick this number for scaling purposes, and I'd like to hear what values some of the Broad teams use in their own analyses.
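For concreteness, the scatter step in question can be sketched in Python roughly as below. This is a minimal sketch under assumptions, not the logic the GATK WDL scripts use: the function name scatter_intervals, the (chrom, start, end) tuple layout with BED-style half-open coordinates, and the greedy largest-first balancing are all hypothetical choices made for illustration.

```python
import heapq
from typing import List, Tuple

# Hypothetical interval layout: (chrom, start, end), BED-style half-open.
Interval = Tuple[str, int, int]


def scatter_intervals(intervals: List[Interval], scatter_count: int) -> List[List[Interval]]:
    """Greedily assign whole intervals to shards so bases per shard are roughly even."""
    if scatter_count < 1:
        raise ValueError("scatter_count must be >= 1")
    # Never create more shards than there are intervals (at least one shard).
    n_shards = min(scatter_count, len(intervals)) or 1
    shards: List[List[Interval]] = [[] for _ in range(n_shards)]
    # Min-heap of (bases assigned so far, shard index).
    heap = [(0, i) for i in range(n_shards)]
    heapq.heapify(heap)
    # Place the largest intervals first, each into the currently lightest shard.
    for iv in sorted(intervals, key=lambda iv: iv[2] - iv[1], reverse=True):
        assigned, idx = heapq.heappop(heap)
        shards[idx].append(iv)
        heapq.heappush(heap, (assigned + (iv[2] - iv[1]), idx))
    return shards


# Toy targets, not real exome coordinates.
targets = [("chr1", 0, 100), ("chr1", 200, 250), ("chr2", 0, 50), ("chr2", 100, 200)]
shards = scatter_intervals(targets, scatter_count=2)
shard_bases = [sum(end - start for _, start, end in shard) for shard in shards]
```

With a split like this, scatter_count only controls how many shards the targets are divided into; the open question is how to weigh per-shard runtime against the fixed overhead of starting each job.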