Alignment metrics
Category Metrics
Overview
High-level metrics about the alignment of reads within a SAM file, produced by the CollectAlignmentSummaryMetrics program and usually stored in a file with the extension ".alignment_summary_metrics". This table summarizes the values that are specific to this metric.
Metric | Summary |
---|---|
CATEGORY | One of UNPAIRED (for a fragment run), FIRST_OF_PAIR (metrics for only the first read in a paired run), SECOND_OF_PAIR (metrics for only the second read in a paired run), or PAIR (metrics aggregated over both first and second reads in a pair). |
TOTAL_READS | The total number of reads including all PF and non-PF reads. When CATEGORY equals PAIR this value will be 2x the number of clusters. |
PF_READS | The number of PF reads where PF is defined as passing Illumina's filter. |
PCT_PF_READS | The fraction of reads that are PF (PF_READS / TOTAL_READS) |
PF_NOISE_READS | The number of PF reads that are marked as noise reads. A noise read is one which is composed entirely of A bases and/or N bases. These reads are marked as they are usually artifactual and are of no use in downstream analysis. |
PF_READS_ALIGNED | The number of PF reads that were aligned to the reference sequence. This includes reads that aligned with low quality (i.e. their alignments are ambiguous). |
PCT_PF_READS_ALIGNED | The fraction of PF reads that aligned to the reference sequence (PF_READS_ALIGNED / PF_READS). |
PF_ALIGNED_BASES | The total number of aligned bases, in all mapped PF reads, that are aligned to the reference sequence. |
PF_HQ_ALIGNED_READS | The number of PF reads that were aligned to the reference sequence with a mapping quality of Q20 or higher, signifying that the aligner estimates a 1/100 (or smaller) chance that the alignment is wrong. |
PF_HQ_ALIGNED_BASES | The number of bases aligned to the reference sequence in reads that were mapped at high quality. Will usually approximate PF_HQ_ALIGNED_READS * READ_LENGTH but may differ when either mixed read lengths are present or many reads are aligned with gaps. |
PF_HQ_ALIGNED_Q20_BASES | The subset of PF_HQ_ALIGNED_BASES where the base call quality was Q20 or higher. |
PF_HQ_MEDIAN_MISMATCHES | The median number of mismatches versus the reference sequence in reads that were aligned to the reference at high quality (i.e. PF_HQ_ALIGNED_READS). |
PF_MISMATCH_RATE | The rate of bases mismatching the reference for all bases aligned to the reference sequence. |
PF_HQ_ERROR_RATE | The fraction of bases that mismatch the reference in PF HQ aligned reads. |
PF_INDEL_RATE | The number of insertion and deletion events per 100 aligned bases. Uses the number of events as the numerator, not the number of inserted or deleted bases. |
MEAN_READ_LENGTH | The mean read length of the set of reads examined. When looking at the data for a single lane with equal length reads this number is just the read length. When looking at data for merged lanes with differing read lengths this is the mean read length of all reads. Computed using all read lengths including clipped bases. |
SD_READ_LENGTH | The standard deviation of the read lengths. Computed using all read lengths including clipped bases. |
MEDIAN_READ_LENGTH | The median read length of the set of reads examined. When looking at the data for a single lane with equal length reads this number is just the read length. When looking at data for merged lanes with differing read lengths this is the median read length of all reads. Computed using all bases in reads, including clipped bases. |
MAD_READ_LENGTH | The median absolute deviation of the distribution of all read lengths. If the distribution is essentially normal then the standard deviation can be estimated as ~1.4826 * MAD. Computed using all read lengths including clipped bases. |
MIN_READ_LENGTH | The minimum read length. Computed using all read lengths including clipped bases. |
MAX_READ_LENGTH | The maximum read length. Computed using all read lengths including clipped bases. |
MEAN_ALIGNED_READ_LENGTH | The mean aligned read length of the set of reads examined. When looking at the data for a single lane with equal length reads this number is just the read length. When looking at data for merged lanes with differing read lengths this is the mean read length of all reads. Clipped bases are not counted. |
READS_ALIGNED_IN_PAIRS | The number of aligned reads whose mate pair was also aligned to the reference. |
PCT_READS_ALIGNED_IN_PAIRS | The fraction of aligned reads whose mate pair was also aligned to the reference (READS_ALIGNED_IN_PAIRS / PF_READS_ALIGNED). |
PF_READS_IMPROPER_PAIRS | The number of (primary) aligned reads that are **not** "properly" aligned in pairs (as per SAM flag 0x2). |
PCT_PF_READS_IMPROPER_PAIRS | The fraction of (primary) reads that are **not** "properly" aligned in pairs (as per SAM flag 0x2) (PF_READS_IMPROPER_PAIRS / PF_READS_ALIGNED). |
BAD_CYCLES | The number of instrument cycles in which 80% or more of base calls were no-calls. |
STRAND_BALANCE | The number of PF reads aligned to the positive strand of the genome divided by the number of PF reads aligned to the genome. |
PCT_CHIMERAS | The fraction of reads that map outside of a maximum insert size (usually 100kb) or that have the two ends mapping to different chromosomes. |
PCT_ADAPTER | The fraction of PF reads that are unaligned or aligned with MQ0 and match to a known adapter sequence right from the start of the read (indication of adapter-dimer pairs). |
PCT_SOFTCLIP | The fraction of PF bases that are on (primary) aligned reads and are soft-clipped, expressed as a fraction of PF_ALIGNED_BASES (even though soft-clipped bases are not themselves aligned). |
PCT_HARDCLIP | The fraction of PF bases that are on (primary) aligned reads and are hard-clipped, expressed as a fraction of PF_ALIGNED_BASES (even though hard-clipped bases are not themselves aligned). |
AVG_POS_3PRIME_SOFTCLIP_LENGTH | The average length of the soft-clipped bases at the 3' end of reads. This could be used as an estimate for the amount by which the insert-size must be increased in order to obtain a significant reduction in bases lost due to reading off the end of the insert. |
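
Several of the fields above are defined as simple ratios of other fields, so they can be recomputed from the counts as a sanity check. The following is a minimal Python sketch, not part of Picard or GATK. It assumes the metrics file is tab-separated, that header lines start with `#`, and that a blank line ends the metrics table. The `SAMPLE` text, its numbers, and the helper names (`read_metrics`, `ratio`, `derived_fractions`, `phred_to_error_probability`, `sd_from_mad`) are hypothetical and exist only for illustration.

```python
import csv

# Hypothetical, trimmed-down metrics text: made-up numbers, most columns omitted.
SAMPLE = (
    "## METRICS CLASS\tpicard.analysis.AlignmentSummaryMetrics\n"
    "CATEGORY\tTOTAL_READS\tPF_READS\tPF_READS_ALIGNED\t"
    "READS_ALIGNED_IN_PAIRS\tPF_READS_IMPROPER_PAIRS\n"
    "FIRST_OF_PAIR\t1000\t900\t810\t790\t20\n"
    "SECOND_OF_PAIR\t1000\t900\t800\t790\t30\n"
    "PAIR\t2000\t1800\t1610\t1580\t50\n"
    "\n"
)


def read_metrics(text):
    """Return one dict per CATEGORY row, skipping '#' header lines."""
    lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        if not line.strip():
            if lines:
                break  # assumption: a blank line ends the metrics table
            continue
        lines.append(line)
    return list(csv.DictReader(lines, delimiter="\t"))


def ratio(numerator, denominator):
    """Plain fraction; returning 0.0 for a zero denominator is this sketch's choice."""
    return numerator / denominator if denominator else 0.0


def derived_fractions(row):
    """Recompute the ratio fields from the counts, using the formulas in the table."""
    pf = int(row["PF_READS"])
    aligned = int(row["PF_READS_ALIGNED"])
    return {
        "PCT_PF_READS": ratio(pf, int(row["TOTAL_READS"])),
        "PCT_PF_READS_ALIGNED": ratio(aligned, pf),
        "PCT_READS_ALIGNED_IN_PAIRS": ratio(int(row["READS_ALIGNED_IN_PAIRS"]), aligned),
        "PCT_PF_READS_IMPROPER_PAIRS": ratio(int(row["PF_READS_IMPROPER_PAIRS"]), aligned),
    }


def phred_to_error_probability(q):
    """Phred-scaled quality to error probability, e.g. Q20 corresponds to 1/100."""
    return 10 ** (-q / 10)


def sd_from_mad(mad):
    """Estimate the standard deviation from MAD_READ_LENGTH, assuming near-normal lengths."""
    return 1.4826 * mad


if __name__ == "__main__":
    for row in read_metrics(SAMPLE):
        print(row["CATEGORY"], derived_fractions(row))
```

To use this on a real file, pass the file's contents to `read_metrics` and compare the recomputed fractions with the reported `PCT_*` columns, allowing for rounding in the file.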
GATK version 4.6.0.0 built at Sat, 29 Jun 2024 20:47:29 -0400.