Documentation for running GATK and Cromwell on IBM Cloud
Hi,
I work at IBM Research and am testing GATK/Cromwell on IBM Cloud. It would be nice if IBM Cloud and/or OpenShift could be added to the pages on running GATK in the cloud and on Computing Platforms. Is that possible? I can write and contribute documentation covering all the steps needed to enable them.
Regards,
Takeshi Yoshimura
-
Official comment
Hi Takeshi Yoshimura, I am the writer responsible for those documents. Thank you very much for your interest in helping us improve our documentation!
Those two articles are actually in serious need of revision, so I'd be happy to integrate the information about IBM Cloud and OpenShift that we are missing.
It would be fantastic if you are willing to contribute to the documentation, thank you for offering! Once I have verified the information, I can integrate it into the articles. Just send me the body text or a link to the information you think we should add (as well as any relevant hyperlinks where people can find more comprehensive information) and I'll get it done.
You can paste that information here, or if you'd rather move the conversation to email then you can get in contact with me using the Broad Institute's contact directory by searching for my name "Derek Caetano Anolles".
Hi Derek Caetano Anolles,
This is great. I will start writing and send it to you when it is ready. Thank you!
Here is the document for GATK on IBM Cloud.
---
GATK on IBM Cloud
IBM Cloud offers an automation tool that deploys an LSF cluster with a shared filesystem (SFS) pre-installed, so you can quickly test Cromwell on it and run GATK pipelines. The detailed steps for deploying LSF on IBM Cloud are described in the IBM Cloud documentation. With a typical configuration, LSF on IBM Cloud is deployed with the following architecture.
[Figure: architecture of a typical LSF deployment on IBM Cloud]
You can securely access the LSF manager in a virtual private cloud (VPC) using SSH. From there, you can install the required software, such as Java and Docker, and then download and start Cromwell. SFS is mounted at /home/lsfadmin/shared, so the "backend.providers.LSF.config.root" parameter in cromwell.conf must point to a directory under that path. You also need to add a "-v" option to the docker run command in the "submit-docker" parameter so that ${cwd} is mounted at ${docker_cwd} inside the container. This way, submitted jobs can read and write files on SFS. An example cromwell.conf is shown below.
backend {
  default = LSF
  providers {
    LSF {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        runtime-attributes = """
        Int cpu = 1
        Int memory = 1048576
        String? docker
        """
        root = "/home/lsfadmin/shared/cromwell/"
        submit = "bsub -J ${job_name} -cwd ${cwd} -o ${out} -e ${err} /usr/bin/env bash ${script}"
        submit-docker = "bsub -J ${job_name} -cwd ${cwd} -o ${out} -e ${err} docker run -v ${cwd}:${docker_cwd} --memory=${memory} --cpus=${cpu} ${docker} ${job_shell} ${docker_script}"
        kill = "bkill ${job_id}"
        check-alive = "bjobs ${job_id}"
        job-id-regex = "Job <(\\d+)>.*"
      }
    }
  }
}
For more advanced use, LSF on IBM Cloud can use its auto-scaling feature to scale the cluster according to its load. LSF can also connect your on-premises cluster to a cloud cluster through job forwarding. For more information about these features, see the IBM Cloud documentation.
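To make the two LSF-specific settings in the cromwell.conf example more concrete, the Python sketch below mimics how the ${...} placeholders in "submit-docker" expand into the final bsub command (showing the host directory on SFS mounted into the container with "-v"), and how "job-id-regex" extracts the LSF job ID from bsub's output. This is an illustration only, not Cromwell's own code, and Cromwell's substitution may differ in details. The execution paths, job name, and image tag are hypothetical values; Cromwell generates the real ones.

```python
import re
from string import Template

# submit-docker copied from the cromwell.conf example above.
SUBMIT_DOCKER = (
    "bsub -J ${job_name} -cwd ${cwd} -o ${out} -e ${err} "
    "docker run -v ${cwd}:${docker_cwd} --memory=${memory} --cpus=${cpu} "
    "${docker} ${job_shell} ${docker_script}"
)

# In HOCON the regex is written "Job <(\\d+)>.*"; the pattern itself is Job <(\d+)>.*
JOB_ID_REGEX = r"Job <(\d+)>.*"


def render_submit_docker(values: dict) -> str:
    """Fill the ${...} placeholders, roughly as Cromwell does before running the command."""
    return Template(SUBMIT_DOCKER).substitute(values)


def parse_job_id(bsub_stdout: str) -> str:
    """Return the LSF job ID captured by the first group of job-id-regex."""
    match = re.search(JOB_ID_REGEX, bsub_stdout)
    if match is None:
        raise ValueError(f"no job ID in bsub output: {bsub_stdout!r}")
    return match.group(1)


if __name__ == "__main__":
    # Hypothetical paths for one call: host side lives under root on SFS,
    # container side is where docker_cwd points inside the container.
    host_dir = "/home/lsfadmin/shared/cromwell/wf/1234/call-hello/execution"
    container_dir = "/cromwell-executions/wf/1234/call-hello/execution"
    command = render_submit_docker({
        "job_name": "cromwell_1234_hello",  # hypothetical job name
        "cwd": host_dir,
        "out": host_dir + "/stdout",
        "err": host_dir + "/stderr",
        "docker_cwd": container_dir,
        "memory": 1048576,  # default from runtime-attributes
        "cpu": 1,           # default from runtime-attributes
        "docker": "broadinstitute/gatk:latest",  # hypothetical image tag
        "job_shell": "/bin/bash",
        "docker_script": container_dir + "/script",
    })
    print(command)
    # A typical bsub acknowledgement line; prints 4711
    print(parse_job_id("Job <4711> is submitted to default queue <normal>."))
```

Running the script prints a bsub command whose "-v" option maps the execution directory on SFS to the container path, followed by the job ID that Cromwell would then pass to "bjobs" and "bkill".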
-
I have also published our Kubernetes backend for Cromwell in my GitHub repositories: https://github.com/takeshi-yoshimura/cromwell-k8s and https://github.com/takeshi-yoshimura/cromwell/tree/k8s-backend-78. The former contains a README.md for "GATK on Kubernetes" (I have tested it only on OpenShift, but I believe it also works on Kubernetes in general). The latter contains the complete code for the backend. I will also open a PR for it against the upstream Cromwell repository.
-
Hi,
Thank you very much for creating the "GATK on IBM Cloud" page. I would like to correct a small part of it.
> IBM Cloud offers automation tools for deploying an LSF cluster with a shared filesystem (SFS) pre-installed.
SFS means NFS, GPFS, and other distributed filesystems, not a specific IBM product. Could you please update the paragraph to read as follows?
---
IBM Cloud offers automation tools for deploying an LSF cluster with NFS or GPFS pre-installed.
-
Hi again Takeshi Yoshimura,
Thanks for the recommendation! I've just updated the document.
-
Great. Thank you!