Glossary
Structural Variants
Structural variants are key players in human evolution and disease, but they ...
How to interpret SV VCFs
Introduction: The GATK-SV pipeline outputs structural variant records in VCF ...
Functional equivalence in DRAGEN-GATK
DRAGEN-GATK is an open-source, GATK-based pipeline that aims to produce resul...
Reference Genome Components
This document defines several components of a reference genome. We use the ...
GATKReport and gsalib
A GATKReport is simply a text document that contains well-formatted, easy to ...
Fisher’s Exact Test
Overview: Fisher’s Exact Test is a statistical test that is used to analyze c...
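A minimal sketch of a two-sided Fisher's exact test on a 2x2 contingency table, written from the textbook definition rather than taken from the article: with the row and column totals held fixed, the top-left cell follows a hypergeometric distribution, and the p-value sums the probabilities of every table that is no more likely than the observed one. The function names are illustrative. In GATK, the FisherStrand (FS) annotation applies this kind of test to reference/alternate read counts split by strand.

```python
from math import comb


def hypergeom_pmf(a: int, row1: int, col1: int, n: int) -> float:
    """Probability that the top-left cell equals a, given fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    # Feasible values of the top-left cell once the margins are fixed.
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_value = 0.0
    for x in range(lo, hi + 1):
        p = hypergeom_pmf(x, row1, col1, n)
        # Small relative tolerance guards against floating-point round-off.
        if p <= p_obs * (1 + 1e-7):
            p_value += p
    return min(p_value, 1.0)
```

For the table [[3, 1], [1, 3]] this gives 34/70, about 0.486.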
Reference genome
This document covers the general motivation behind the use of genome referenc...
GenomicsDB
GenomicsDB is a datastore format developed by our collaborators at Intel to s...
uBAM - Unmapped BAM Format
uBAM is a variant form of the BAM file format in which the read data does not...
Mate unmapped records
Mate unmapped records are identifiable by the 0x8 (decimal 8) SAM flag. It is possible for...
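Because FLAG is a bit field, testing for a mate-unmapped record is a bitwise AND. A minimal sketch that follows the SAM specification's rule that bit 0x8 is only meaningful when bit 0x1 (read paired) is also set; the constant and function names are illustrative.

```python
# FLAG bits as defined in the SAM specification.
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8


def is_mate_unmapped(flag: int) -> bool:
    """True if the record is paired and its mate is flagged as unmapped (bit 0x8)."""
    # Without 0x1 set, the spec says no assumptions can be made about 0x8.
    return bool(flag & FLAG_PAIRED) and bool(flag & FLAG_MATE_UNMAPPED)
```

For example, flag 73 (0x40 + 0x8 + 0x1) is a first-in-pair read whose mate is unmapped, while flag 99 is a properly paired read with a mapped mate.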
Coverage - Read depth metrics
Coverage (a.k.a. read depth) describes the amount of sequence data that is av...
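As a simplified illustration of per-base read depth (a sketch under assumptions, not the article's own definition): each read is reduced to a half-open reference interval [start, end), and the depth at a position is the number of intervals covering it. A difference array keeps this linear in the number of reads plus positions. Real coverage tools also apply read filters and account for deletions, clipping and base quality, none of which this sketch models; the function names are illustrative.

```python
def per_base_depth(intervals: list, length: int) -> list:
    """Depth at each 0-based position in [0, length) from half-open read intervals."""
    diff = [0] * (length + 1)
    for start, end in intervals:
        # Clip each read to the region of interest.
        s, e = max(0, start), min(length, end)
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


def mean_depth(depth: list) -> float:
    """Average depth over the positions given."""
    return sum(depth) / len(depth) if depth else 0.0
```

For reads [(0, 5), (2, 8), (4, 6)] over 10 positions, the depth profile is [1, 1, 2, 2, 3, 2, 1, 1, 0, 0], a mean depth of 1.3x.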
JEXL filtering expressions
JEXL stands for Java EXpression Language. It's not a part of the GATK as such...
PF reads - Illumina chastity filter
Illumina sequencers perform an internal quality filtering procedure called ch...
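The excerpt stops before defining chastity. Illumina's documentation describes it as the brightest channel intensity divided by the sum of the brightest and second-brightest intensities, with a cluster passing filter (PF) when no more than one base call in the first 25 cycles has a chastity below 0.6. A minimal sketch under those assumptions, modeling each cycle as a list of channel intensities (a simplification that does not reflect two-channel chemistries); the function names and defaults are hypothetical.

```python
def chastity(intensities: list) -> float:
    """Brightest intensity divided by (brightest + second brightest)."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def passes_chastity_filter(cycles: list, threshold: float = 0.6,
                           n_cycles: int = 25, max_failures: int = 1) -> bool:
    """cycles: one list of channel intensities per sequencing cycle for a single cluster."""
    # Only the first n_cycles are inspected; later cycles never cause a failure.
    failures = sum(1 for c in cycles[:n_cycles] if chastity(c) < threshold)
    return failures <= max_failures
```

A call with intensities [50, 50, 0, 0] has chastity 0.5, so two such calls in the first 25 cycles would mark the cluster as non-PF under these assumptions.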
Read filters
Read filters are internal filters that can be applied by the GATK engine when...
GitHub basics for researchers
This guide introduces select elements of the broadinstitute/gatk GitHub repos...
Jar caching
Jar caching is a thing you can do to speed up the process of running Spark to...
RefSeq - gene list format
From the NCBI RefSeq website: The Reference Sequence (RefSeq) collection a...
Human genome reference builds - GRCh38 or hg38 - b37 - hg19
This document covers the specifics of human genome reference assemblies. For ...
Hybrid selection (exome preparation)
Hybrid selection is a method that enables selection of specific sequences fro...
Lane - Library - Sample - Cohort
There are four major organizational units for sequencing data that we use thr...
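In SAM/BAM headers these units are commonly recorded per read group on @RG lines: SM names the sample, LB the library, and ID (often together with PU) identifies the lane-level read group, while a cohort is simply a collection of samples with no @RG tag of its own. A minimal sketch that nests read groups as sample -> library -> read-group IDs; this mapping of units to tags is an assumption here, and the function names are illustrative.

```python
from collections import defaultdict


def parse_read_groups(header_text: str) -> dict:
    """Map each @RG ID to a dict of its TAG:VALUE fields (e.g. SM, LB, PU)."""
    groups = {}
    for line in header_text.splitlines():
        if not line.startswith("@RG\t"):
            continue
        # Split on the first ':' only, since values may themselves contain ':'.
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        groups[tags["ID"]] = tags
    return groups


def sample_library_tree(groups: dict) -> dict:
    """Nest read-group IDs as sample -> library -> [read-group IDs]."""
    tree = defaultdict(lambda: defaultdict(list))
    for rg_id, tags in groups.items():
        tree[tags.get("SM", "unknown")][tags.get("LB", "unknown")].append(rg_id)
    return tree
```

Two read groups that share SM and LB but differ in ID and PU would then represent one library sequenced on two lanes.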
Parallelism - Multithreading - Scatter Gather
Contents: The concept of parallelism; Parallel computing in practice (sort of...
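A minimal scatter-gather sketch, assuming the work can be split by genomic intervals: the interval list is scattered into contiguous shards, each shard is processed independently, and the per-shard results are gathered into a single answer. The per-shard work here (summing interval lengths) is a hypothetical placeholder, and threads stand in for what a real pipeline would more typically run as separate jobs or processes.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(intervals: list, n_shards: int) -> list:
    """Split an ordered interval list into at most n_shards contiguous chunks."""
    if not intervals:
        return []
    size = -(-len(intervals) // n_shards)  # ceiling division
    return [intervals[i:i + size] for i in range(0, len(intervals), size)]


def process_shard(shard: list) -> int:
    # Hypothetical placeholder work: total bases spanned by the shard's intervals.
    return sum(end - start for _contig, start, end in shard)


def gather(results: list) -> int:
    # Combine per-shard results; here a simple sum.
    return sum(results)


def scatter_gather(intervals: list, n_shards: int = 4) -> int:
    shards = scatter(intervals, n_shards)
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        results = list(pool.map(process_shard, shards))
    return gather(results)
```

Keeping shards contiguous preserves genomic order within each shard, which makes the gather step a straightforward concatenation or sum.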
Inbreeding Coefficient
Overview: Although the name Inbreeding Coefficient suggests it is a measure o...
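The excerpt is cut off before the definition. As a reference point, the textbook inbreeding coefficient at a biallelic site compares observed heterozygotes with the Hardy-Weinberg expectation, F = 1 - O(het)/E(het), so positive values indicate a heterozygote deficit and negative values an excess. A minimal hard-call sketch; GATK's InbreedingCoeff annotation may differ in detail (for example, by working from genotype likelihoods), and the function name is illustrative.

```python
import math


def inbreeding_coefficient(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Textbook F = 1 - observed/expected heterozygotes at a biallelic site (hard calls)."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    expected_het = 2 * p * (1 - p) * n     # Hardy-Weinberg expectation
    if expected_het == 0:
        return math.nan                    # monomorphic site: F is undefined
    return 1 - n_het / expected_het
```

Genotype counts of 25/50/25 match Hardy-Weinberg proportions exactly and give F = 0, while a site where every sample is heterozygous gives F = -1.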
Heterozygosity
Heterozygosity in population genetics: In the context of population genetics,...
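For a locus with allele frequencies p_i, the standard population-genetics quantity is expected heterozygosity H = 1 - sum(p_i^2), the probability that two alleles drawn at random differ; observed heterozygosity is simply the fraction of heterozygous individuals. A minimal sketch, assuming diploid genotypes given as allele pairs; the function names are illustrative.

```python
from collections import Counter


def expected_heterozygosity(genotypes: list) -> float:
    """1 - sum(p_i^2), with allele frequencies p_i counted from diploid genotypes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles")
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def observed_heterozygosity(genotypes: list) -> float:
    """Fraction of diploid genotypes whose two alleles differ."""
    if not genotypes:
        raise ValueError("no genotypes")
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)
```

Comparing the two values is the intuition behind heterozygosity-based statistics: fewer observed heterozygotes than expected suggests inbreeding or population structure.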
PED - Pedigree format
A pedigree is a structured description of the familial relationships between ...
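A minimal parser sketch, assuming the common six-column PED layout: family ID, individual ID, paternal ID, maternal ID, sex (1 = male, 2 = female, anything else unknown) and phenotype, whitespace-separated, with 0 marking a parent who is not in the file. Skipping lines that start with '#' is an assumption about comment handling, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Sex codes in the common six-column PED layout; anything else is unknown.
SEX_CODES = {"1": "male", "2": "female"}


@dataclass
class PedRecord:
    family_id: str
    individual_id: str
    paternal_id: str  # "0" means the father is not in the file
    maternal_id: str  # "0" means the mother is not in the file
    sex: str
    phenotype: str


def parse_ped(text: str) -> list:
    records = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#'-prefixed lines are skipped.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        fam, ind, pat, mat, sex, pheno = fields[:6]
        records.append(PedRecord(fam, ind, pat, mat, SEX_CODES.get(sex, "unknown"), pheno))
    return records


def trios(records: list) -> list:
    """(family, child, father, mother) for children whose parents both appear in the file."""
    present = {(r.family_id, r.individual_id) for r in records}
    return [
        (r.family_id, r.individual_id, r.paternal_id, r.maternal_id)
        for r in records
        if (r.family_id, r.paternal_id) in present and (r.family_id, r.maternal_id) in present
    ]
```

Parent lookups are keyed on (family ID, individual ID) because individual IDs only need to be unique within a family.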
Rank Sum Test
Overview: The Rank Sum Test, also known as Mann-Whitney-Wilcoxon U-test after...
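A minimal sketch of the Mann-Whitney U statistic: pool both samples, rank them (tied values share the mean of their ranks), and compute U = R1 - n1(n1 + 1)/2, where R1 is the rank sum of the first sample, so U ranges from 0 to n1*n2. The z-score uses the plain normal approximation with mean n1*n2/2 and variance n1*n2(n1 + n2 + 1)/12, without a tie correction. GATK's rank-sum annotations compare values such as mapping quality between reads supporting the reference and the alternate allele, but their exact computation may differ from this sketch; the function names are illustrative.

```python
import math


def average_ranks(values: list) -> list:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x: list, y: list) -> tuple:
    """U statistic for sample x versus y and its normal-approximation z-score."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, (u1 - mu) / sigma
```

When every value in x is smaller than every value in y, U is 0 and the z-score is negative; identical samples give U = n1*n2/2 and z = 0.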
Variant annotations
Variant annotations can be produced by HaplotypeCaller, Mutect2, VariantAnnot...
Known variants - Training resources - Truth sets
Many GATK tools require sets of known variant sites to operate correctly. Eac...
Resource bundle
The GATK resource bundle is a collection of standard files for working with h...
Pre-adapter artifacts (in hybrid selection) - Bait bias
Various sources of error affect the hybrid selection (HS) process. Pre-adapte...
SAM or BAM or CRAM - Mapped sequence data formats
SAM, BAM and CRAM are all different forms of the original SAM format that was...
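SAM is the plain-text form: after header lines beginning with '@', each alignment line carries 11 mandatory tab-separated fields followed by optional tags. A minimal sketch that parses one alignment line and leaves the optional tags unparsed; header lines are rejected rather than handled, and the class and function names are illustrative. BAM and CRAM are compressed encodings, so reading them calls for a dedicated library such as htsjdk or pysam rather than a hand-written parser.

```python
from typing import NamedTuple, Tuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int    # 1-based leftmost mapping position (0 if unavailable)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: Tuple[str, ...]  # optional TAG:TYPE:VALUE fields, left unparsed


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; header lines (starting with '@') are rejected."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment record")
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(f)}")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
                     f[6], int(f[7]), int(f[8]), f[9], f[10], tuple(f[11:]))
```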