A year ago this month, we announced the start of our collaboration with the DRAGEN team at Illumina, which aims to combine the respective strengths of GATK and DRAGEN as well as promote the standardization of secondary analysis pipelines used in genomics (see original blog post).
TL;DR: Tune in to the DRAGEN-GATK webinar hosted by GenomeWeb on September 29 to learn more about the status of the project and the technical wizardry involved.
The first jointly developed DRAGEN-GATK pipeline (for germline short variants) is already available in its proprietary hardware-accelerated form from Illumina -- specifically, in version 3.4 of the DRAGEN Bio-IT Platform, as I mentioned in our last update. Meanwhile, we've been working hard on the open-source software implementation of this pipeline, which involves porting the algorithms responsible for key accuracy improvements in Illumina's DRAGEN pipeline into GATK and associated tools.
After weathering some delays due to the COVID-19 pandemic, we now expect to release the full open-source software version of this first DRAGEN-GATK pipeline in early November of this year (just in time for my birthday, woohoo). This new pipeline implementation will replace the current GATK Best Practices for germline short variant calling, and will produce results that are *functionally equivalent* to the results produced by the proprietary accelerated DRAGEN pipeline.
We care deeply about making this pipeline as accessible, portable and reproducible as possible, so in addition to releasing all the relevant software on GitHub, we'll provide a set of WDL workflows and Docker container images containing the precompiled executables with all dependencies correctly installed. We'll also make the workflows available for import into popular analysis platforms through the Dockstore tool repository, and we'll publish a Terra workspace containing the workflows in a fully configured state along with example genomic data for testing.
We realize that the prospect of a major pipeline update raises a lot of questions, so we plan to roll out a set of blog posts and documentation articles that will provide all the necessary technical details about what's new in the pipeline -- and what you need to know to apply it to your data. Most excitingly, we're currently finalizing the content for a webinar that will be co-presented by Séverine Catreux from the DRAGEN team and Eric Banks from the GATK team. Séverine and Eric will provide an in-depth look at the key methodological improvements in DRAGEN-GATK, and will be available for Q&A after their presentation, so don't miss this opportunity to get the lowdown straight from the experts. The webinar will be hosted by GenomeWeb on September 29; registration is already open, so be sure to register today.
Don't want to miss out on any updates? Subscribe to this blog by clicking the "Follow" button in the top right corner, and don't hesitate to leave a comment below -- or ask any burning questions that you feel can't wait until September 29.