I always wince a bit at the phrase "we are thrilled to announce blah blah blah" because it's such a hackneyed sentence starter. Nevertheless, I do have something to announce, and I am in fact quite thrilled to be doing so. I'd been waiting for the hard copies to be ready, and here they are in their unadorned (and amateurishly photographed) glory, as delivered today by the incomparable US Postal Service.
This, dear (potential) reader, is the first edition of Genomics in the Cloud, an O'Reilly animal book co-authored by Brian O'Connor of the University of California, Santa Cruz and yours truly (Geraldine of GATK).
Even though GATK only appears in second position in the subtitle, it's fair to say that the bulk of the book is in fact about running GATK appropriately and efficiently, with style and grace. For the first half of the book, the cloud aspect is largely one of convenience (pre-configured environment, no fiddling required!). Then in the second half things get a bit more cloud-heavy as we get into thornier topics, like running pipelines at the kind of scale where cloud computing really does help quite a bit. It's all command-line based until the third act, where we move to the web-UI environment of Terra in order to demonstrate some advanced cloud features without having to put you through a server administration course (because nobody wants that; ok, maybe 15 people want that). All of this happens on Google Cloud, but to be clear, the key concepts apply on other platforms as well, including local HPC/clusters (we’re planning some supplemental blog posts to demonstrate that). Throughout the book we put a lot of emphasis on principles and practical methods for making your analyses portable and reproducible across platforms, which is a thing Brian and I both care about enormously.
Here is the table of contents:
- Foreword by Dr. Eric Lander, Founding Director of the Broad Institute
- Preface: Purpose, Audience and Scope of this book
- Genomics in a Nutshell: A Primer for Newcomers to the Field
- Computing Technology Basics for Life Scientists
- First Steps in the Cloud
- First Steps with GATK
- GATK Best Practices for Germline Short Variant Discovery
- GATK Best Practices for Somatic Variant Discovery
- Automating Analysis Execution with Workflows
- Deciphering Real Genomics Workflows
- Running Single Workflows at Scale with Pipelines API
- Running Many Workflows Conveniently in Terra
- Interactive Analysis in Jupyter Notebook
- Assembling Your Own Workspace in Terra
- Making a Fully Reproducible Paper
As you can see, we've included intro chapters to both genomics and computing stuff (that's a technical term), to make the topic accessible to people with a wide range of backgrounds. To make the book newcomer-friendly, we tried to assume as little command-line experience as possible, and we don't assume any cloud experience at all; all necessary concepts are explained from scratch. But we also don't skimp on accuracy and technical depth. It's quite a balancing act; you should definitely check it out and then let us know on which side you think we erred the most. I look forward to the spread of opinions :)
If you have access to the O'Reilly Learning Library (sometimes called Safari) through your institution, you can start reading right away at https://oreil.ly/genomics-cloud. To get a sense of the writing and technical level, you can also browse several chapters in the Kindle version preview on Amazon. The paperback version is also previewable, though it's printed in grayscale so the images don't pop quite as much (especially screenshots).
In my next blog post I'll write a bit more about where this book came from and what it means to me from a personal perspective.