The mitochondrial genome poses several challenges to the identification and understanding of somatic variants. The circularity of the mitochondrial genome means that the breakpoint in the reference genome is at an arbitrary position in the non-coding control region, creating a challenge in analyzing variation. Additionally, insertions of mitochondrial DNA into the nuclear genome (NuMTs) complicate the mapping of the mitochondrial genome and the distinction between NuMTs and the mitochondrial contig of interest. Lastly, mitochondrial variants often have very low heteroplasmy. Such low allele fraction (AF) variants can thus be mistaken for inherent sequencer noise.
The pipeline for mitochondrial variant discovery, using Mutect2, uses a high sensitivity to low AF and separate alignments using opposite genome breakpoints to allow for the tracing of lineages of rare mitochondrial variants.
This workflow requires a WGS h38 CRAM or BAM file and the median of the coverage over the autosome (A reference fasta file is only required if the input is a CRAM file).
Subset to keep only the reads mapped to the mitochondrial genome
Tools involved: PrintReads -
This step filters the input WGS file to keep only the ChrM mapped reads.
Revert the ChrM mapped reads from an aligned BAM to an unaligned BAM file
Tools involved: RevertSam
This step reverts the aligned BAM file containing only the reads mapped to the mitochondria to remove all alignment information while retaining the recalibrated base qualities and original alignment tags.
Align the unmapped BAM file with the reference aligned BAM and shifted reference aligned BAM
Tools involved: MergeBamAlignment
This step merges the unaligned BAM file with the reference BAM file for the mitochondrial genome. The BAM file must also be aligned with the shifted mitochondrial BAM file. The shifted reference moves the breakpoint of the mitochondrial genome from the non-coding control region to the opposite side of the contig. This allows for sensitivity in the control region to account for variability across individuals.
Identify Duplicate Reads
Tools involved: MarkDuplicates
This step identifies and tags duplicate reads in the aligned BAM files.
Collect coverage and performance metrics for BAM file
Tools involved: CollectWgsMetrics
This step collects several metrics for evaluating the coverage and performance of the WGS experiment. This includes the fraction of bases that pass base and mapping quality filters as well as coverage levels.
Call variants in aligned BAM files
Tools involved: Mutect2 -
This step calls mitochondrial variants in the non-control region using the BAM file aligned to the mitochondrial reference and in the control region using the BAM file aligned to the shifted mitochondrial reference. Running Mutect2 in mitochondrial mode automatically sets parameters appropriately for calling on mitochondria with the
--mitochondria flag. Specifically, the mode sets
–-initial-tumor-lod to 0,
–-tumor-lod-to-emit to 0,
--af-of-alleles-not-in-resource to 4e-3, and the advanced parameter
--pruning-lod-threshold to -4.
Setting the advanced option
--median-autosomal-coverage argument (default 0) activates a recommended filter against likely erroneously mapped NuMTs (nuclear mitochondrial DNA segments). For the value, provide the median coverage expected in autosomal regions with coverage.
Liftover the output VCF files
Tools involved: LiftoverVcf
This step returns the variant calls back to the standard numbering system with the original alignment (OA) tags.
Combine the variant calls from the control region with the non-control region
Tools involved: MergeVcfs
This step merges the output VCF file for the control region (BAM aligned to shifted reference) with the VCF file for the non-control region into a single variant file.
Merge stats files for output VCFs
Tools Involved: Mutect2-
This step merges the stats file for the variant calls of the control region with the stats file for the variant calls of the non-control region.
Filter the Variant Calls by Parameters
Tools involved: FilterMutectCalls
This step filters the output VCF files based on specific parameters, such as a minimum allele fraction, maximum alternate allele count, and estimate of contamination. The
--autosomal-coverage parameter specifically filters out potential NuMTs. Specifying the
--mitochondrial-mode parameter automatically sets the filters to the mitochondrial defaults.
Filter out Blacklisted Sites
Tools involved: VariantFiltration
This step filters out blacklisted sites containing unwanted artifacts.
Is this pipeline for germline mutation calling or only somatic? What is the best way to call variants in the mitochondrial genome with whole genome data? Thank you in advance.
Hi GATK team, Thanks for generating this pipeline. I am wondering is this pipeline suitable for WES data which included the mitochondrial reads coverage(probe)?
Is this pipeline specific to human, or can it be used for other species such as mus musculus?
Thanks for this. I am using gatk4-4.2.0 and it seems like two flags are no longer supported :
--median-autosomal-coverage (CallMt step using Mutect2)
A USER ERROR has occurred: median-autosomal-coverage is not a recognized option
--autosomal-coverage (Filter the variant by parameter step using FilterMutectCalls)
A USER ERROR has occurred: autosomal-coverage is not a recognized option
They are not mentioned in the updates with the newer releases of GATK4. Will they be included in future release or should the best practices be updated? Thanks a lot for everything!
Hi, How can this process obtain gvcf files？ Thanks！
Please sign in to leave a comment.