Terra is provided as an accessible service by the Broad Institute's Data Sciences Platform, where GATK itself is also developed. It provides a user-friendly graphical interface for setting up, running, and sharing your pipelines, along with interactive Jupyter notebooks. This makes it easier to collaborate and publish your data, figures, and workflows without having to worry about the underlying computational infrastructure. You pay only for what you use, which makes it more economical as well as more user-friendly.
You can read more about how and why you can get started with GATK in Terra in this series of blog posts:
- Getting started with GATK? Terra can make it easier
- Test drive GATK Best Practices workflows on Terra
- The future of GATK tutorials is written in Jupyter Notebooks
- From Python Magic to embedded IGV: A closer look at GATK tutorial notebooks
- Learn GATK through workshop tutorials
GATK Best Practice Workspaces
If you're interested in using Terra to analyze your data, take a look at our GATK Best Practice workspaces. These collections of online workflows are preconfigured for common use cases, along with example data that is suitable for testing and benchmarking (both at small scale and at full scale).
It should take just a few clicks to run any pipeline you like on the preloaded example datasets or, with a few more (simple) steps, on your own data.
Here’s a list of the workspaces we currently offer — take a look and see if there’s a workspace that suits your needs.
Germline variant discovery
Best practices for germline SNPs & indels - A series of workflows that cover pre-processing, SNP and indel variant calling, and joint calling. Outputs from each workflow are designed to become inputs to the next, so the entire pipeline can be run all together or in parts. Try it now at GATK4-Germline-Preprocessing-VariantCalling-JointCalling
Best practices for germline CNVs - Use GATK's GermlineCNVCaller to call a cohort of samples, build a model for denoising further case samples, then call case samples using a previously built model for denoising. This analysis will detect germline CNVs in exome sequence data. Try it now at Germline-CNVs-GATK4
Best practices for germline variant calling in RNAseq - This workflow calls germline short variants from RNAseq data, using an unmapped BAM and a corresponding GTF annotation file. As output, the workflow produces a recalibrated BAM file, a VCF file, and a filtered VCF file, along with an index file for each output. Try it now at GATK4-RNA-Germline-VariantCalling
Somatic variant discovery
Best practices for somatic SNVs & indels using Mutect2 - A Mutect2 workflow crafted to be used for variant discovery of SNVs and indels in somatic data. This workspace is plug-n-play — just add in your data, and the outputs will be created. A detailed notebook tutorial will walk you through the necessary steps. Try it now at Somatic-SNVs-Indels-GATK4
Best practices for Funcotator - Use Funcotator to analyze variants for their function and write the analysis to a specified output file. This workspace uses the default set of data sources for the human somatic use case. You can also modify annotations to use with germline data sources, or with your own custom data sources. Try it now at Variant-Functional-Annotation-With-Funcotator
Best practices for CNN variant filtration - Filter variants using GATK’s CNN tool, with additional options for advanced users to generate and evaluate their own training model. Create variant evaluations and summary plots from input VCF and BED files, using our data or yours. Try it now at cnn-variant-filter
Best practices for SNP & indel variant calling in mitochondria - Use whole genome sequencing data to call mitochondrial variants, even those with low allele frequencies between 1% and 5%. Mitochondrial DNA has several characteristics that can make this kind of analysis difficult (circular DNA, nuclear mitochondrial DNA segments, etc.), but this workflow will walk you through how to process your data and reliably call variants. Try it now at Mitochondria-SNPs-Indels-hg38
Best practices for variant calling with Spark (on multicore machines) - Call variants from aligned input data on a single multicore machine using the ReadsPipelineSpark pipeline. Try it now at Variant_Calling_Spark_Multicore
Broad production workflows
Best practices for germline SNPs & indels (as used at the Broad) - This workflow takes unmapped paired-end sequencing BAMs, pre-processes the data for germline short variant discovery, and returns a GVCF, ready for joint genotyping, along with other metrics. This workspace holds the Broad's production sequence processing pipeline, which contains several quality control tasks within the workflow in addition to regular data processing tasks. Try it now at Whole-Genome-Analysis-Pipeline
Best practices for exome germline SNPs & indels (as used at the Broad) - Pre-process exome sequence data and then conduct germline short variant discovery. Input unmapped human exome sequencing BAMs to produce CRAM files, indices, md5 checksums, GVCFs, and report metrics. Try it now at Exome-Analysis-Pipeline
Getting started with Terra
We are constantly working to empower researchers with resources that will help them to spend less time figuring out how to run GATK, and more time doing interesting science using their results. Ultimately, we believe this will boost the portability and reproducibility of genomic analysis.
For introductory materials on how to use Terra, watch the videos in our Getting Started With Terra video playlist, which will walk you through the platform’s main features and demo how to use them.
If you can’t find what you’re looking for here, visit the Terra Showcase & Tutorials page directly, where we are continuing to add more workspaces for different use cases.