Most resource allocation problems you run into will be associated with either Spark multithreading or Java. We detail the most common issues and the recommended solutions below. These solutions typically involve adding Spark or Java arguments to your GATK command line; see the GATK command line documentation for instructions on how to supply them, as they must be provided differently from regular GATK arguments.
Too many threads?
GATK will not use more threads than you allow. If you're running one of the tools that can use Spark multithreading, you can control the number of threads it uses with the Spark-specific arguments --num-executors and --executor-cores.
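As a rough sketch, a Spark-enabled tool invocation with these limits might look like the following (the tool name, file paths, and executor counts are placeholders, and any cluster connection settings your environment requires are omitted):

# Sketch only: Spark-specific arguments go after the standalone "--" separator,
# not alongside the regular GATK arguments.
gatk MarkDuplicatesSpark \
    -I input.bam \
    -O marked_duplicates.bam \
    -- \
    --num-executors 4 \
    --executor-cores 2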
In addition to the threads used by GATK itself, Java may run threads of its own for garbage collection. If that causes you problems, you can limit the maximum number of garbage collection threads used by Java with the Java argument -XX:ConcGCThreads=1 (shown here with the limit set to a single thread).
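Java arguments are passed through the gatk wrapper's --java-options argument; a minimal sketch (the tool name and file paths here are placeholders) might look like this:

# Sketch only: cap Java garbage collection at a single thread for this run.
gatk --java-options "-XX:ConcGCThreads=1" HaplotypeCaller \
    -R reference.fasta \
    -I input.bam \
    -O output.vcf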
Too much memory?
You can set an upper limit on how much memory Java can use to run your command with the Java argument -Xmx.
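For example, a sketch that caps the Java heap at 8 GB (the value, tool name, and file paths are placeholders; pick a limit that fits your machine):

# Sketch only: limit the Java heap to 8 GB for this run.
gatk --java-options "-Xmx8G" HaplotypeCaller \
    -R reference.fasta \
    -I input.bam \
    -O output.vcf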
Too much CPU?
This is usually related to garbage collection, just like the threads issue mentioned above. The solution is the same; limit the maximum number of garbage collection threads used by Java with the Java argument -XX:ConcGCThreads.
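If you are also limiting memory, both Java options can go in the same --java-options string; a sketch (with placeholder tool name, values, and paths) might look like this:

# Sketch only: combine a heap limit and a GC thread limit in one command.
gatk --java-options "-Xmx8G -XX:ConcGCThreads=1" HaplotypeCaller \
    -R reference.fasta \
    -I input.bam \
    -O output.vcf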