Getting CNNScoreVariants to run faster
I was wondering if anything can be done to get CNNScoreVariants to run faster. There doesn't seem to be a Spark version of it yet. The documentation mentions the arguments "--inter-op-threads" and "--intra-op-threads" - could these help on a multi-core system?
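As a rough sketch of how those two arguments might be passed, the snippet below assembles a CNNScoreVariants command line in Python. The file names are hypothetical placeholders, the helper `build_cnn_command` is invented for illustration, and the default split (one inter-op thread, all detected cores for intra-op) is only an assumption to experiment with, not a recommended setting. Whether this actually shortens the run time would have to be measured.

```python
import os
import shlex


def build_cnn_command(reference, variants, output, inter_op=None, intra_op=None):
    """Return a `gatk CNNScoreVariants` command as a list of arguments.

    Hypothetical helper: the thread defaults below are assumptions chosen
    for experimentation, not values taken from the GATK documentation.
    """
    cores = os.cpu_count() or 1
    inter_op = 1 if inter_op is None else inter_op      # assumed default
    intra_op = cores if intra_op is None else intra_op  # assumed default
    return [
        "gatk", "CNNScoreVariants",
        "-R", reference,
        "-V", variants,
        "-O", output,
        "--inter-op-threads", str(inter_op),
        "--intra-op-threads", str(intra_op),
    ]


# Placeholder file names; substitute the tutorial's actual inputs.
cmd = build_cnn_command("ref.fasta", "input.vcf.gz", "annotated.vcf", intra_op=4)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, and timing a few runs with different thread counts would show whether the settings make any difference on a given machine.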
At present I am running GATK 4.1.4.1 on a multi-core laptop and working through the CNNScoreVariants materials from the Costa Rica workshop. (I will most likely run the production CNNScoreVariants job on a cluster with multi-core nodes.) I am also running the related 3-gatk-cnn-tutorial notebook on Terra. It has already been running for at least half an hour on that platform, so it would be useful to have a typical run-time estimate for the first "run the default 1D model" example there (e.g. in a computer lab, would this have been left to run during the lunch break or overnight?).
William
-
Hi,
The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.
Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.
We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.
For context, check out our support policy.
-
The long run time on Terra seems to have been due to a particular issue with Terra and/or Google Cloud at the time. When I eventually got the tutorial example to run on my laptop, it took just a few minutes. There were a lot of problems getting CNNScoreVariants to run successfully on the laptop, but that belongs in another thread, I think.
William
-
Thank you for the update, WVNicholson!