Tools that analyze read coverage to detect copy number variants.
|Annotates intervals with GC content, mappability, and segmental-duplication content
|Calls copy-ratio segments as amplified, deleted, or copy-number neutral
|Creates a panel of normals for read-count denoising
|Denoises read counts to produce denoised copy ratios
|Determines the baseline contig ploidy for germline samples given counts data
|Filters intervals based on annotations and/or count statistics
|Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy
|Models segmented copy ratios from denoised copy ratios and segmented minor-allele fractions from allelic counts
|Creates plots of denoised copy ratios
|Creates plots of denoised and segmented copy-ratio and minor-allele-fraction estimates
|Postprocesses the output of GermlineCNVCaller and generates VCFs and denoised copy ratios
Tools that count coverage, e.g. depth per allele
|Generates table of filtered base counts at het sites for allele specific expression
|**BETA** (EXPERIMENTAL) Processes reads from a MITESeq or other saturation mutagenesis experiment.
|Collects reference and alternate allele counts at specified sites
|Collects read counts at specified intervals
|Count bases in a SAM/BAM/CRAM file
|Counts bases in the input SAM/BAM
|Count reads in a SAM/BAM/CRAM file
|Counts reads in the input SAM/BAM
|**BETA** Generate coverage summary information for reads data
|**BETA** Evaluate gene expression from RNA-seq reads aligned to genome.
|Collects data for training normal artifact filter
|Tabulates pileup metrics for inferring contamination
|**BETA** Local assembler for SVs
|Prints read alignments in samtools pileup format
|**BETA** Prints read alignments in samtools pileup format
Tools that collect sequencing quality related and comparative metrics
|Combines multiple QualityYieldMetrics files into a single file.
|Combines multiple Variant Calling Metrics files into a single file
|Evaluate and compare base quality score recalibration (BQSR) tables
|Generate index statistics from a BAM file
|**BETA** (Internal) Collects read metrics relevant to structural variant discovery
|Calculate the fraction of reads coming from cross-sample contamination
|Calculate statistics on fingerprints, checking their viability
|Creates a hash code based on the read groups (RG).
|Computes a fingerprint from the supplied input (SAM/BAM/CRAM or VCF) file and compares it to the provided genotypes
|Compare GATK's internal pileup to a reference Samtools mpileup
|Asserts the provided gzip file's (e.g., BAM) last block is well-formed; RC 100 otherwise
|Clusters the results of a CrosscheckFingerprints run by LOD score
|Produces a summary of alignment metrics from a SAM or BAM file.
|Collects summary and per-sample metrics from the provided arrays VCF file
|Chart the nucleotide distribution per cycle in a SAM or BAM file
|**BETA** Collects base distribution per cycle in SAM/BAM/CRAM file(s).
|Collect metrics regarding GC bias.
|Classify PF-Failing reads in a HiSeqX Illumina Basecalling directory into various categories.
|Collects hybrid-selection (HS) metrics for a SAM or BAM file.
|**EXPERIMENTAL** Estimates the rate of independent replication of reads within a BAM.
|Collect metrics about the insert size distribution of a paired-end library.
|**BETA** Collects insert size distribution information on alignment data
|Collect jumping library metrics.
|Collect multiple classes of metrics.
|**BETA** Runs multiple metrics collection modules for a given alignment file
|Collect metrics to assess oxidative artifacts.
|Collect metrics about reads that pass quality thresholds and Illumina-specific filters.
|**BETA** Collects quality yield metrics from SAM/BAM/CRAM file(s).
|Collect whole genome sequencing-related metrics.
|Produces RNA alignment metrics for a SAM or BAM file.
|Collects metrics from reduced representation bisulfite sequencing (Rrbs) data.
|Program to collect error metrics on bases stratified in various ways.
|Collect metrics to quantify single-base sequencing artifacts.
|Calculate PCR-related metrics from targeted sequencing data.
|Collects per-sample and aggregate (spanning all samples) metrics from the provided VCF file
|Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.
|**EXPERIMENTAL** Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.
|Compares the base qualities of two SAM/BAM/CRAM files
|**BETA** Determine if two potentially identical BAMs have the same duplicate reads
|Compare two metrics files.
|Compare two input SAM/BAM/CRAM files.
|Extract OxoG metrics from generalized artifacts metrics.
|Checks that all data in the input files appear to have come from the same individual
|Estimates the numbers of unique molecules in a sequencing library.
|Accumulate flag statistics given a BAM file
|Spark tool to accumulate flag statistics
|Emit a single sample name
|Collect mean quality by cycle.
|**BETA** MeanQualityByCycle on Spark
|Chart the distribution of quality scores.
|**BETA** QualityScoreDistribution on Spark
|Validates a SAM/BAM/CRAM file.
|Prints a SAM or BAM file to the screen
Tools that process genomic intervals in various formats
|Converts a BED file to a Picard Interval List.
|Converts a Picard IntervalList file to a BED file.
|A tool for performing various IntervalList manipulations
|Lifts over an interval list from one reference build to another.
|Prepares bins for coverage collection
|Split intervals into sub-interval files.
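
The BED and Picard interval-list formats handled by the tools above use different coordinate conventions: BED is 0-based and half-open, while interval lists are 1-based and closed. The following is a minimal Python sketch of the per-record conversion, not the implementation of any tool listed here. The function name `bed_to_interval_record` is hypothetical, and the sketch ignores the SAM-style header that a complete interval list also carries.

```python
def bed_to_interval_record(bed_line: str) -> str:
    """Convert one tab-separated BED record to one interval-list record.

    Illustrative only: real converters also validate contigs against a
    sequence dictionary and write a SAM-style header.
    """
    fields = bed_line.rstrip("\n").split("\t")
    contig, start, end = fields[0], int(fields[1]), int(fields[2])

    # Optional BED columns: name (4th column) and strand (6th column).
    name = fields[3] if len(fields) > 3 else "."
    strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "+"

    # BED starts are 0-based, interval-list starts are 1-based: shift by one.
    # The end is unchanged, because a 0-based half-open end equals a
    # 1-based closed end.
    return "\t".join([contig, str(start + 1), str(end), strand, name])
```

For example, the BED record `chr1 99 200` describes the same bases as the interval-list record `chr1 100 200`.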
Tools that perform metagenomic analysis, e.g. microbial community composition and pathogen detection
|Builds set of host reference k-mers
|Builds a taxonomy datafile of the microbe reference
|Step 2: Aligns reads to the microbe reference
|Step 1: Filters low quality, low complexity, duplicate, and host reads
|Combined tool that performs all steps: read filtering, microbe reference alignment, and abundance scoring
|Step 3: Classifies pathogen-aligned reads and generates abundance scores
Miscellaneous tools, e.g. those that aid in data streaming
|**BETA** Create a Hadoop BAM splitting index
|Provides a large, FIFO buffer that can be used to buffer input and output streams between programs.
|Gathers scattered BQSR recalibration reports into a single file
|**BETA** Gathers scattered VQSLOD tranches into a single file
|Creates an index for a feature file, e.g. VCF or BED file.
|**BETA** Parallel copy a file or directory from Google Cloud Storage into the HDFS file system used by Spark
|**EXPERIMENTAL** Replace bases in reads with reference bases.
|Condenses homRef blocks in a single-sample GVCF
Tools that manipulate read data in SAM, BAM or CRAM format
|Adds comments to the header of a BAM file.
|Record current alignment information to OA tag.
|Assigns all the reads in a file to a single new read-group.
|Apply base quality score recalibration
|**BETA** Apply base quality score recalibration on Spark
|**BETA** Both steps of BQSR (BaseRecalibrator and ApplyBQSR) on Spark
|Converts a BAM file into a BFQ (binary fastq formatted) file
|Generates recalibration table for Base Quality Score Recalibration (BQSR)
|**BETA** Generate recalibration table for Base Quality Score Recalibration (BQSR) on Spark
|Generates a BAM index ".bai" file.
|**BETA** Takes name-sorted file and runs BWA and MarkDuplicates.
|**BETA** Align reads to a given reference using BWA on Spark
|Cleans a SAM/BAM/CRAM file, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads
|Clip reads in a SAM/BAM/CRAM file
|Collect Duplicate metrics from marked file.
|**BETA** Convert a headerless BAM shard into a readable BAM
|Downsample a SAM or BAM file.
|**BETA** Subsets reads by name
|Converts a FASTQ file to an unaligned BAM or SAM file
|Subsets reads from a SAM/BAM/CRAM file by applying one of several filters.
|Verify mate-pair information between mates and fix if needed.
|Fix Illumina base quality scores in a SAM/BAM/CRAM file
|Efficiently concatenates BAM files that resulted from a scattered parallel analysis
|Left-aligns indels from reads in a SAM/BAM/CRAM file
|Identifies duplicate reads.
|MarkDuplicates on Spark
|Identifies duplicate reads, accounting for mate CIGAR.
|Merge alignment data from a SAM or BAM with data in an unmapped BAM file.
|Merges multiple SAM, BAM, and/or CRAM files into a single file.
|Downsample a SAM or BAM file to retain a subset of the reads based on the reads' location in each tile in the flowcell.
|**BETA** Reorder reads before running RSEM
|Unmaps reads with distant mates.
|Print reads in the SAM/BAM/CRAM file
|Print the header from a SAM/BAM/CRAM file
|PrintReads on Spark
|Reorders reads in a SAM or BAM file to match ordering in a second reference file.
|Replaces the SAMFileHeader in a SAM/BAM/CRAM file.
|Revert Quality Scores in a SAM/BAM/CRAM file
|Reverts the original base qualities and adds the mate cigar tag to read-group files
|Reverts SAM/BAM/CRAM files to a previous state.
|**BETA** Reverts SAM, BAM or CRAM files to a previous state.
|Convert a BAM file to a SAM file, or a SAM to a BAM
|Converts a SAM/BAM/CRAM file to FASTQ.
|DEPRECATED: Use SetNmMdAndUqTags instead.
|Fixes the NM, MD, and UQ tags in a SAM/BAM/CRAM file
|**EXPERIMENTAL** Examines aligned records in the supplied SAM or BAM file to locate duplicate molecules.
|Sorts a SAM, BAM or CRAM file.
|**BETA** SortSam on Spark (works on SAM/BAM/CRAM)
|Split Reads with N in Cigar
|Outputs reads from a SAM/BAM/CRAM by read group, sample and library name
|Splits a SAM/BAM/CRAM file into individual files by library
|Splits a SAM/BAM/CRAM file to multiple files.
|**EXPERIMENTAL** Incorporates read tags from one SAM file into a matching SAM file
|**EXPERIMENTAL** Identifies duplicate reads using information from read positions and UMIs.
|Clears the 0x400 duplicate SAM flag
Tools that analyze and manipulate FASTA format references
|Designs oligonucleotide baits for hybrid selection reactions.
|Create a BWA-MEM index image file for use with GATK BWA tools
|Composes a genome-wide STR location table used for DragSTR model auto-calibration
|Count the numbers of each base in a reference file
|Creates a sequence dictionary for a reference sequence.
|Subsets intervals from a reference sequence to a new FASTA file.
|Create an alternative reference by combining a fasta with a vcf.
|Create snippets of a fasta file
|**BETA** Identifies sequences that occur at high frequency in a reference
|Counts the number of non-N bases in a fasta file.
|Normalizes lines of sequence in a FASTA file to be of the same length.
|Writes an interval list created by splitting a reference at Ns.
|**BETA** Creates a shifted fasta file and shift_back file
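
Two of the entries above are defined in terms of N bases: counting the non-N bases in a FASTA file, and writing an interval list by splitting a reference at Ns. The Python sketch below illustrates both ideas on an in-memory sequence string. The function names are hypothetical, and the listed tools work on FASTA files and may offer options (for example, tolerating short N runs) that are not modeled here.

```python
from typing import List, Tuple


def count_non_n_bases(sequence: str) -> int:
    """Count bases that are not N (case-insensitive)."""
    return sum(1 for base in sequence if base not in ("N", "n"))


def split_at_ns(sequence: str) -> List[Tuple[int, int]]:
    """Return 1-based, closed (start, end) intervals of consecutive non-N bases."""
    intervals: List[Tuple[int, int]] = []
    start = None  # 1-based start of the current non-N run, if any
    for position, base in enumerate(sequence, start=1):
        if base in ("N", "n"):
            if start is not None:
                # An N closes the current run at the previous position.
                intervals.append((start, position - 1))
                start = None
        elif start is None:
            start = position
    if start is not None:
        # The sequence ended inside a non-N run.
        intervals.append((start, len(sequence)))
    return intervals
```

Every N run is treated as a split point in this sketch, so the intervals returned cover exactly the bases counted by `count_non_n_bases`.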
Tools that perform variant calling and genotyping for short variants (SNPs, SNVs and Indels)
|Estimates the parameters for the DRAGstr model
|Merges one or more HaplotypeCaller GVCF files into a single GVCF with appropriate annotations
|Import VCFs to GenomicsDB
|Perform joint genotyping on one or more samples pre-called with HaplotypeCaller
|**BETA** Perform "quick and dirty" joint genotyping on one or more samples pre-called with HaplotypeCaller
|Call germline SNPs and indels via local re-assembly of haplotypes
|**BETA** HaplotypeCaller on Spark
|Get the maximum likelihood estimates of artifact prior probabilities in the orientation bias mixture model filter
|Call somatic SNVs and indels via local assembly of haplotypes
|**BETA** Runs BWA (if specified), MarkDuplicates, BQSR, and HaplotypeCaller on unaligned or aligned reads to generate a VCF.
Tools that detect structural variants
|**BETA** (Internal) Tries to extract simple variants from a provided GATK-SV CPX.vcf
|**BETA** (Internal) Examines aligned contigs from local assemblies and calls structural variants
|**BETA** (Internal) Extracts evidence of structural variations from reads
|**BETA** (Internal) Produces local assemblies of genomic regions that may harbor structural variants
|**BETA** Clusters structural variants
|**BETA** Runs the structural variation discovery workflow on a single sample
|**BETA** (Internal) Examines aligned contigs from local assemblies and calls structural variants or their breakpoints
Tools that evaluate and refine variant calls, e.g. with annotations not offered by the engine
|(Internal) Annotate a vcf with a bam's read depth at each variant locus
|(Internal) Annotate a vcf with expected allele fractions in pooled sequencing
|Calculate genotype posterior probabilities given family and/or known population genotypes
|(Internal) Calculate proportions of different samples in a pooled bam
|Evaluate concordance of an input VCF against a validated truth VCF
|**BETA** Count PASS variants
|Counts variant records in a VCF file, regardless of filter status.
|CountVariants on Spark
|**BETA** Evaluate concordance of info fields in an input VCF against a validated truth VCF
|**EXPERIMENTAL** Filter variants based on clinically-significant Funcotations.
|Finds mendelian violations of all types within a VCF
|**BETA** Functional annotation for segment files. The output formats are not well-defined and subject to change.
|Data source downloader for Funcotator.
|Calculates the concordance between genotype data of one sample in each of two VCFs - truth (or reference) vs. calls.
|**EXPERIMENTAL** Check variants against tumor-normal bams representing the same samples, though not the ones from the actual calls.
|**BETA** General-purpose tool for variant evaluation (% in dbSNP, genotype concordance, Ti/Tv ratios, and a lot more)
|Extract fields from a VCF file to a tab-delimited table
Tools that filter variants by annotating the FILTER column
|Apply a score cutoff to filter variants based on a recalibration table
|Apply a Convolutional Neural Net to filter annotated variants
|**EXPERIMENTAL** Train a CNN model for filtering variants
|**EXPERIMENTAL** Write variant tensors for training a CNN to filter variants
|**BETA** Make a panel of normals for use with Mutect2
|**EXPERIMENTAL** Filter alignment artifacts from a vcf callset.
|Filter somatic SNVs and indels called by Mutect2
|Apply tranche filtering
|Hard filters a VCF.
|Filter variant calls based on INFO and/or FORMAT annotations
|Build a recalibration model to score variant quality for filtering purposes
Tools that manipulate variant call format (VCF) data
|Replaces or fixes a VCF header.
|Gathers multiple VCF files from a scatter operation into a single VCF file
|**BETA** Gathers multiple VCF files from a scatter operation into a single VCF file
|Left align and trim variants
|Lifts over a VCF file from one reference build to another.
|Creates a VCF that contains all the site-level information for all records in the input VCF but no genotype information.
|Combines multiple variant files into a single variant file
|Prints out variants from the input VCF.
|(Internal) Remove indels from the VCF file that are close to each other.
|Renames a sample within a VCF or BCF.
|Select a subset of variants from a VCF file
|Sorts one or more VCF files.
|Splits SNPs and INDELs into separate files.
|Updates the sequence dictionary in a variant file.
|Takes a VCF and a second file that contains a sequence dictionary and updates the VCF with the new sequence dictionary.
|Tool for adding annotations to VCF files
|Converts VCF to BCF or BCF to VCF.
|Converts a VCF or BCF file to a Picard Interval List
Tools that process sequencing machine data, e.g. Illumina base calls, and detect sequencing level attributes, e.g. adapters
|Asserts the validity for specified Illumina basecalling data.
|Collects Illumina Basecalling metrics for a sequencing run.
|Collects Illumina lane metrics for the given BaseCalling analysis directory.
|Tool determines the barcode for each read in an Illumina lane.
|Generate FASTQ file(s) from Illumina basecall read data.
|Transforms raw Illumina sequencing data into an unmapped SAM, BAM or CRAM file.
|Reads a SAM/BAM/CRAM file and rewrites it with new adapter-trimming tags.
Tools that manipulate data generated by Genotyping arrays
|Program to convert an Illumina bpm file into a bpm.csv file.
|Program to combine multiple genotyping array VCF files into one VCF.
|Program to generate a picard metrics file from the output of the bafRegress tool.
|Program to generate a picard metrics file from the output of the VerifyIDIntensity tool.
|Program to convert an Illumina GTC file to a VCF
|Program to merge a single-sample ped file from zCall into a single-sample VCF.
|Program to convert an Arrays VCF to an ADPC file.
Tools that perform methylation calling, processing bisulfite sequenced, methylation-aware aligned BAM
|**EXPERIMENTAL** Identify methylated bases from bisulfite sequenced, methylation-aware BAMs
Applied by engine to select reads for analysis
|Filters out reads where the alignment does not match the contents of the header
|Do not filter out any read
|Filters out reads that have greater than the threshold number of N bases
|Filter out reads with CIGAR containing N operator
|Filter out reads that have too many clipped bases on either end.
|Keep only reads that are first of pair
|Keep only read pairs with insert length less than or equal to the given value
|Keep only reads containing good CIGAR string
|Filter out reads without Read Group
|Filters out reads that don't overlap the specified region. NOTE: This approach to extracting overlapping reads is very slow compared to using PrintReads and -L on an indexed bam file.
|Keep only reads from the specified library
|Filter out unmapped reads
|Filter out reads without available mapping quality
|Filter out reads with mapping quality equal to zero
|Keep only reads with mapping qualities within a specified range
|Filter out reads where the bases and qualities do not match
|Keep only reads with mates mapped on the different strand
|Keep only paired reads with mates mapped >= mate-too-distant-length (default 1KB) apart or on different contigs
|Keep only reads whose mate maps to the same contig or is unmapped
|Filters reads whose mate is unmapped as well as unmapped reads.
|Filter out reads that fail platform quality checks, are unmapped, or represent secondary/supplementary alignments
|Filters reads whose original alignment was chimeric.
|Filter out reads with fragment length different from zero
|Filter out reads that do not align to the reference
|Filter out reads marked as duplicate
|Keep only paired reads that are not properly paired
|Filter out reads representing secondary alignments
|Filter out reads representing supplementary alignments
|Filter out reads that are over-soft-clipped
|Filter out unpaired reads
|Filter out reads failing platform/vendor quality checks
|Keep only reads with matching Read Group platform
|Filter out reads with matching platform unit attribute
|Keep only reads representing primary alignments (those that satisfy both the NotSecondaryAlignment and NotSupplementaryAlignment filters, or in terms of SAM flag values, must have neither of the 0x100 or 0x800 flags set).
|Keep only reads that are properly paired
|Keep records that don't match the specified filter string(s).
|Keep only reads from the specified read group
|Filter out reads where the read and CIGAR do not match in length
|Keep only reads whose length is within a certain range
|Keep only reads with this read name
|Keep only reads whose strand is as specified
|Keep only reads for a given sample
|Keep only paired reads that are second of pair
|Keep only reads with sequenced bases
|Filter out reads that are over-soft-clipped
|Keep only reads where the read end is properly aligned
|Keep only reads with a valid alignment start
|Keep only reads that are well-formed
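
Several of the read filters above are defined by bits in the SAM flag field. The primary-alignment entry, for instance, requires that neither 0x100 nor 0x800 is set. The Python sketch below shows such checks as plain bit tests. The flag values come from the SAM specification; the function names are hypothetical and are not GATK's own filter classes.

```python
# Flag bit values as defined in the SAM specification.
FLAG_PAIRED = 0x1
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800


def is_primary_line(flag: int) -> bool:
    """True when neither the secondary nor the supplementary bit is set."""
    return (flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY)) == 0


def is_not_duplicate(flag: int) -> bool:
    """True when the 0x400 duplicate bit is clear."""
    return (flag & FLAG_DUPLICATE) == 0


def is_properly_paired(flag: int) -> bool:
    """True when the read is paired and flagged as a proper pair."""
    required = FLAG_PAIRED | FLAG_PROPER_PAIR
    return (flag & required) == required


def passes_vendor_quality_check(flag: int) -> bool:
    """True when the 0x200 platform/vendor QC-fail bit is clear."""
    return (flag & FLAG_QC_FAIL) == 0
```

A read with flag 99 (paired, proper pair, mate reverse strand, first in pair) passes all four checks, while flag 2064 (supplementary, reverse strand) fails the primary-alignment check.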
Available to HaplotypeCaller, Mutect2, VariantAnnotator and GenotypeGVCFs. See https://software.broadinstitute.org/gatk/documentation/article?id=10836
|Allele-specific rank sum test of REF versus ALT base quality scores (AS_BaseQRankSum)
|Allele-specific strand bias estimated using Fisher's exact test (AS_FS)
|Allele-specific likelihood-based test for the consanguinity among samples (AS_InbreedingCoeff)
|Allele-specific rank sum test for mapping qualities of REF versus ALT reads (AS_MQRankSum)
|Allele-specific call confidence normalized by depth of sample reads supporting the allele (AS_QD)
|Allele-specific root-mean-square of the mapping quality of reads across all samples (AS_MQ)
|Allele-specific rank sum test for relative positioning of REF versus ALT allele within reads (AS_ReadPosRankSum)
|Allele-specific strand bias estimated by the symmetric odds ratio test (AS_SOR)
|Variant allele fraction for a genotype
|Total depth of coverage per sample and over all samples (DP)
|Describe the complexity of an assembly region
|Median base quality of bases supporting each allele (MBQ)
|Rank sum test of REF versus ALT base quality scores (BaseQRankSum)
|Counts and frequency of alleles in called genotypes (AC, AF, AN)
|Rank sum test for hard-clipped bases on REF versus ALT reads (ClippingRankSum)
|Number of Ns at the pileup
|Total depth of coverage per sample and over all samples (DP)
|Depth of coverage of each allele per sample (AD)
|Depth of informative coverage for each sample (DP)
|Phred-scaled p-value for exact test of excess heterozygosity (ExcessHet)
|Featurized read sets for Mutect3 training data
|Strand bias estimated using Fisher's exact test (FS)
|Depth of coverage of each allele per sample (AD)
|Median fragment length of reads supporting each allele (MFRL)
|Summary of genotype statistics from all samples (NCC, GQ_MEAN, GQ_STDDEV)
|Likelihood-based test for the consanguinity among samples (InbreedingCoeff)
|Rank sum test of per-read likelihoods of REF versus ALT reads (LikelihoodRankSum)
|Median mapping quality of reads supporting each allele (MMQ)
|Rank sum test for mapping qualities of REF versus ALT reads (MQRankSum)
|Count of all reads with MAPQ = 0 across all samples (MQ0)
|Count of read pairs in the F1R2 and F2R1 configurations supporting REF and ALT alleles (F1R2, F2R1)
|Number of alt reads with an OA tag that doesn't match the current alignment contig.
|Existence of a de novo mutation in at least one of the given families (hiConfDeNovo, loConfDeNovo)
|Variant confidence normalized by unfiltered depth of variant samples (QD)
|Root mean square of the mapping quality of reads across all samples (MQ)
|Rank sum test for relative positioning of REF versus ALT alleles within reads (ReadPosRankSum)
|Median distance of variant starts from ends of reads supporting each allele (MPOS)
|Annotate with local reference bases (REF_BASES)
|List of samples that are not homozygous reference at a variant site (Samples)
|Number of forward and reverse reads that support REF and ALT alleles (SB)
|Strand bias estimated by the symmetric odds ratio test (SOR)
|Tandem repeat unit composition and counts per allele (STR, RU, RPA)
|Number of non-duplicate-insert ALT reads (AS_UNIQ_ALT_READ_COUNT)
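
Some of the annotations above are simple arithmetic on per-site data. The Python sketch below illustrates the textbook form of four of them: allele counts and frequency (AC, AN, AF), root-mean-square mapping quality (MQ), confidence normalized by depth (QD), and a Phred-scaled Fisher's exact test on strand counts (FS). All function names and input layouts are assumptions made for illustration. The GATK implementations may apply further adjustments that are not modeled here, so values from this sketch should not be expected to match GATK output exactly.

```python
import math
from typing import Sequence, Tuple


def allele_counts(genotypes: Sequence[Sequence[int]]) -> Tuple[int, int, float]:
    """Return (AC, AN, AF) for a biallelic site.

    Each genotype is a sequence of allele indices (0 = REF, 1 = ALT);
    a no-call is represented by an empty sequence.
    """
    called = [allele for gt in genotypes for allele in gt]
    an = len(called)                      # total called alleles
    ac = sum(1 for a in called if a == 1)  # ALT alleles among them
    af = ac / an if an else 0.0
    return ac, an, af


def rms_mapping_quality(mapqs: Sequence[int]) -> float:
    """Root mean square of the given mapping qualities."""
    return math.sqrt(sum(q * q for q in mapqs) / len(mapqs))


def quality_by_depth(qual: float, depth: int) -> float:
    """Variant confidence divided by depth."""
    return qual / depth


def fisher_strand_phred(ref_fwd: int, ref_rev: int,
                        alt_fwd: int, alt_rev: int) -> float:
    """Phred-scaled two-sided Fisher's exact test p-value for a 2x2 strand table."""
    row_ref = ref_fwd + ref_rev
    row_alt = alt_fwd + alt_rev
    col_fwd = ref_fwd + alt_fwd
    total = row_ref + row_alt

    def table_probability(a: int) -> float:
        # Hypergeometric probability of a table whose REF/forward cell is a.
        return (math.comb(row_ref, a) * math.comb(row_alt, col_fwd - a)
                / math.comb(total, col_fwd))

    p_observed = table_probability(ref_fwd)
    lowest = max(0, col_fwd - row_alt)
    highest = min(row_ref, col_fwd)
    # Two-sided p-value: sum over tables no more probable than the observed one.
    p_value = sum(p for p in map(table_probability, range(lowest, highest + 1))
                  if p <= p_observed * (1 + 1e-7))
    p_value = min(1.0, p_value)
    return abs(-10.0 * math.log10(p_value))
```

A balanced strand table such as 10/10 REF and 10/10 ALT gives a score near zero, while a table where REF reads are all forward and ALT reads are all reverse gives a large score.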
GATK version 126.96.36.199-SNAPSHOT built at Wed, 13 Apr 2022 13:12:10 -0700.