We aim to provide the research community with a range of options for running our Best Practices workflows exactly the same way we do in-house at the Broad Institute. To that end, we make all our workflow scripts publicly available on GitHub under a dedicated organization called gatk-workflows, and we provide Docker containers for all versions of GATK (since 2018) on Docker Hub. See the Best Practices to browse the pipelines by use case.
GATK's preferred pipelining solution: WDL + Cromwell
Our workflows are written in WDL (Workflow Description Language), a user-friendly scripting language maintained by the OpenWDL community.
Cromwell is an open-source workflow execution engine that supports WDL as well as CWL (the Common Workflow Language) and can be run on a variety of platforms, both local and cloud-based. We take advantage of Cromwell's flexibility in our own work: we do some of our initial development on the Broad's UGER cluster, then run at scale on Google Cloud, using exactly the same scripts in both compute environments.
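To make the "same scripts, different compute environment" idea concrete, here is a minimal Python sketch that assembles a command line for Cromwell's one-off run mode. It assumes Java and a Cromwell JAR are available locally; the file names (cromwell.jar, my_workflow.wdl, my_workflow.inputs.json, local.conf, google.conf) and the helper cromwell_run_command are hypothetical placeholders, not files or functions shipped with GATK. Only the backend configuration file changes between environments; the workflow and its inputs stay the same.

```python
import shlex
from typing import List, Optional


def cromwell_run_command(
    cromwell_jar: str,
    workflow: str,
    inputs_json: Optional[str] = None,
    backend_conf: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `java -jar cromwell.jar run ...`.

    All paths are supplied by the caller; nothing here is GATK-specific.
    """
    cmd = ["java"]
    if backend_conf is not None:
        # Cromwell reads its backend settings from the file named in this
        # JVM system property; swapping the file swaps the platform.
        cmd.append(f"-Dconfig.file={backend_conf}")
    cmd += ["-jar", cromwell_jar, "run", workflow]
    if inputs_json is not None:
        cmd += ["--inputs", inputs_json]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, used purely for illustration.
    for conf in ("local.conf", "google.conf"):
        cmd = cromwell_run_command(
            "cromwell.jar",
            "my_workflow.wdl",
            "my_workflow.inputs.json",
            backend_conf=conf,
        )
        # Print the command instead of executing it, so the sketch has no
        # side effects; pass `cmd` to subprocess.run to launch it for real.
        print(shlex.join(cmd))
```

The function returns a list rather than a single string so it can be handed directly to subprocess.run without shell quoting concerns.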
Cromwell with Azure
Cromwell on Azure configures all Azure resources needed to run workflows through Cromwell on the Azure cloud. The installation sets up a VM host to run the Cromwell server and uses Azure Batch to spin up virtual machines that run each task in a workflow.
Cromwell workflows can be written in either WDL or CWL. Examples of WDL and CWL scripts are located here and here, respectively.
More information about deploying your own instance of Cromwell on Azure is located in the Microsoft CromwellOnAzure repository.
The Azure GATK Resource Bundle page also catalogs the standard files used for working with human resequencing data in GATK, with instructions on how to access them.
More information about pipelining features is located on the Broad GATK Resource Bundle page.