Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Glossary

  • GATK4 command-line syntax
    Contents: Java command basics; Using the gatk wrapper script (recommended); Ad...
  • Phred-scaled quality scores
    You may have noticed that a lot of the scores that are output by the GATK are...
  • Biallelic vs Multiallelic sites
    A biallelic site is a specific locus in a genome that contains two observed a...
  • Intervals and interval lists
    Interval lists define subsets of genomic regions, sometimes even just individ...
  • Jumping libraries
    Jumping libraries are created to bypass difficult-to-align/map regions, such ...
  • Bisulfite sequencing - Cytosine methylation
    Cytosine methylation is a key component in epigenetic regulation of gene expr...
  • GVCF - Genomic Variant Call Format
    GVCF stands for Genomic VCF. A GVCF is a kind of VCF, so the basic format spe...
  • Likelihoods and Probabilities
    There are several instances in the GATK documentation where you will encounte...
  • Paired-end or mate-pair
    In paired-end sequencing, the library preparation yields a set of fragments, ...
  • Docker - container - image - registry
    A container is something quite similar to a virtual machine, which can be use...
  • Google Dataproc - Spark cluster service
    Dataproc is Google's Spark cluster service, which you can use to run GATK too...
  • GRCh37 hg19 b37 humanG1Kv37 - Human Reference Discrepancies
    This page explains the discrepancies between the different "h...
  • Read groups
    There is no formal definition of what a 'read group' is; however, in practice ...
  • Version numbers
    GATK4 version numbers are based on semantic versioning. If that term doesn't ...
  • Funcotator Annotation Specifications
    This page details the specification of the annotations that F...
  • Panel of Normals (PON)
    A Panel of Normals (PON) is a type of resource used in somatic variant analys...
  • Spark
    In a nutshell, Spark is a piece of software that GATK4 uses to do multithread...
  • HDF5 format
    A number of GATK tools produce or take in HDF5 format data (1; 2), e.g. Coll...
  • VCF - Variant Call Format
    This document describes "regular" VCF files produced for GERMLINE short varia...
  • OxoG oxidative artifacts
    Oxidation of guanine to 8-oxoguanine is one of the most common pre-adapter ar...
  • Haplotype map format
    Some Picard tools require a haplotype map that maps SNPs to LD (linkage diseq...
  • FASTA - Reference genome format
    The GATK requires the reference sequence to be provided as a single file in FA...
  • Hardware - optimizations - SSD - CPU - GPU - FPGA - TPU
    This article covers concepts and terminology frequently used when discussing ...
© Broad Institute