Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Glossary

  • Structural Variants
    Structural variants are key players in human evolution and disease, but they ...
  • How to interpret SV VCFs
    Introduction: The GATK-SV pipeline outputs structural variant records in VCF ...
  • Functional equivalence in DRAGEN-GATK
    DRAGEN-GATK is an open-source, GATK-based pipeline that aims to produce resul...
  • Reference Genome Components
    This document defines several components of a reference genome. We use the ...
  • GATKReport and gsalib
    A GATKReport is simply a text document that contains well-formatted, easy to ...
  • Fisher’s Exact Test
    Overview: Fisher’s Exact Test is a statistical test that is used to analyze c...
  • Reference genome
    This document covers the general motivation behind the use of genome referenc...
  • GenomicsDB
    GenomicsDB is a datastore format developed by our collaborators at Intel to s...
  • uBAM - Unmapped BAM Format
    uBAM is a variant form of the BAM file format in which the read data does not...
  • Mate unmapped records
    Mate unmapped records are identifiable by the 8 SAM flag. It is possible for...
  • Coverage - Read depth metrics
    Coverage (a.k.a. read depth) describes the amount of sequence data that is av...
  • JEXL filtering expressions
    JEXL stands for Java EXpression Language. It's not a part of the GATK as such...
  • PF reads - Illumina chastity filter
    Illumina sequencers perform an internal quality filtering procedure called ch...
  • Read filters
    Read filters are internal filters that can be applied by the GATK engine when...
  • GitHub basics for researchers
    This guide introduces select elements of the broadinstitute/gatk GitHub repos...
  • Jar caching
    Jar caching is a thing you can do to speed up the process of running Spark to...
  • RefSeq - gene list format
    From the NCBI RefSeq website: The Reference Sequence (RefSeq) collection a...
  • Human genome reference builds - GRCh38 or hg38 - b37 - hg19
    This document covers the specifics of human genome reference assemblies. For ...
  • Hybrid selection (exome preparation)
    Hybrid selection is a method that enables selection of specific sequences fro...
  • Lane - Library - Sample - Cohort
    There are four major organizational units for sequencing data that we use thr...
  • Parallelism - Multithreading - Scatter Gather
    Contents: The concept of parallelism; Parallel computing in practice (sort of...
  • Inbreeding Coefficient
    Overview: Although the name Inbreeding Coefficient suggests it is a measure o...
  • Heterozygosity
    Heterozygosity in population genetics: In the context of population genetics,...
  • PED - Pedigree format
    A pedigree is a structured description of the familial relationships between ...
  • Rank Sum Test
    Overview: The Rank Sum Test, also known as Mann-Whitney-Wilcoxon U-test after...
  • Variant annotations
    Variant annotations can be produced by HaplotypeCaller, Mutect2, VariantAnnot...
  • Known variants - Training resources - Truth sets
    Many GATK tools require sets of known variant sites to operate correctly. Eac...
  • Resource bundle
    The GATK resource bundle is a collection of standard files for working with h...
  • Pre-adapter artifacts (in hybrid selection) - Bait bias
    Various sources of error affect the hybrid selection (HS) process. Pre-adapte...
  • SAM or BAM or CRAM - Mapped sequence data formats
    SAM, BAM and CRAM are all different forms of the original SAM format that was...

© Broad Institute