Glossary
-
GATK4 command-line syntax
Contents: Java command basics; Using the gatk wrapper script (recommended); Ad... -
Phred-scaled quality scores
You may have noticed that many of the scores output by the GATK are... -
Biallelic vs Multiallelic sites
A biallelic site is a specific locus in a genome that contains two observed a... -
Intervals and interval lists
Interval lists define subsets of genomic regions, sometimes even just individ... -
Jumping libraries
Jumping libraries are created to bypass regions that are difficult to align or map, such ... -
Bisulfite sequencing - Cytosine methylation
Cytosine methylation is a key component in epigenetic regulation of gene expr... -
GVCF - Genomic Variant Call Format
GVCF stands for Genomic VCF. A GVCF is a kind of VCF, so the basic format spe... -
Likelihoods and Probabilities
There are several instances in the GATK documentation where you will encounte... -
Paired-end or mate-pair
In paired-end sequencing, the library preparation yields a set of fragments, ... -
Docker - container - image - registry
A container is something quite similar to a virtual machine, which can be use... -
Google Dataproc - Spark cluster service
Dataproc is Google's Spark cluster service, which you can use to run GATK too... -
GRCh37 hg19 b37 humanG1Kv37 - Human Reference Discrepancies
Introduction: This page explains the discrepancies between the different "h... -
Read groups
There is no formal definition of what a 'read group' is; however, in practice ... -
Version numbers
GATK4 version numbers are based on semantic versioning. If that term doesn't ... -
Funcotator Annotation Specifications
Introduction: This page details the specification of the annotations that F... -
Panel of Normals (PON)
A Panel of Normals, or PON, is a type of resource used in somatic variant analys... -
Spark
In a nutshell, Spark is a piece of software that GATK4 uses to do multithread... -
HDF5 format
A number of GATK tools produce or take in HDF5 format data (1; 2), e.g. Coll... -
VCF - Variant Call Format
This document describes "regular" VCF files produced for GERMLINE short varia... -
OxoG oxidative artifacts
Oxidation of guanine to 8-oxoguanine is one of the most common pre-adapter ar... -
Haplotype map format
Some Picard tools require a haplotype map that maps SNPs to LD (linkage diseq... -
FASTA - Reference genome format
The GATK requires the reference sequence as a single file in FA... -
Hardware - optimizations - SSD - CPU - GPU - FPGA - TPU
This article covers concepts and terminology frequently used when discussing ...