The mitochondrial genome poses several challenges to the identification and understanding of somatic variants. The circularity of the mitochondrial genome means that the breakpoint in the reference genome is at an arbitrary position in the non-coding control region, creating a challenge in analyzing variation. Additionally, insertions of mitochondrial DNA into the nuclear genome (NuMTs) complicate the mapping of the mitochondrial genome and the distinction between NuMTs and the mitochondrial contig of interest. Lastly, mitochondrial variants often have very low heteroplasmy. Such low allele fraction (AF) variants can thus be mistaken for inherent sequencer noise.
The pipeline for mitochondrial variant discovery, using Mutect2, uses a high sensitivity to low AF and separate alignments using opposite genome breakpoints to allow for the tracing of lineages of rare mitochondrial variants.
Expected Input
This workflow requires a WGS h38 CRAM or BAM file and the median of the coverage over the autosome (A reference fasta file is only required if the input is a CRAM file).
Main Steps
Subset to keep only the reads mapped to the mitochondrial genome
Tools involved: PrintReads - SubsetBamtoChrM
This step filters the input WGS file to keep only the ChrM mapped reads.
Revert the ChrM mapped reads from an aligned BAM to an unaligned BAM file
Tools involved: RevertSam
This step reverts the aligned BAM file containing only the reads mapped to the mitochondria to remove all alignment information while retaining the recalibrated base qualities and original alignment tags.
Align the unmapped BAM file with the reference aligned BAM and shifted reference aligned BAM
Tools involved: MergeBamAlignment
This step merges the unaligned BAM file with the reference BAM file for the mitochondrial genome. The BAM file must also be aligned with the shifted mitochondrial BAM file. The shifted reference moves the breakpoint of the mitochondrial genome from the non-coding control region to the opposite side of the contig. This allows for sensitivity in the control region to account for variability across individuals.
Identify Duplicate Reads
Tools involved: MarkDuplicates
This step identifies and tags duplicate reads in the aligned BAM files.
Collect coverage and performance metrics for BAM file
Tools involved: CollectWgsMetrics
This step collects several metrics for evaluating the coverage and performance of the WGS experiment. This includes the fraction of bases that pass base and mapping quality filters as well as coverage levels.
Call variants in aligned BAM files
Tools involved: Mutect2 - CallMt
and CallShiftedMt
This step calls mitochondrial variants in the non-control region using the BAM file aligned to the mitochondrial reference and in the control region using the BAM file aligned to the shifted mitochondrial reference. Running Mutect2 in mitochondrial mode automatically sets parameters appropriately for calling on mitochondria with the --mitochondria
flag. Specifically, the mode sets –-initial-tumor-lod
to 0, –-tumor-lod-to-emit
to 0, --af-of-alleles-not-in-resource
to 4e-3, and the advanced parameter --pruning-lod-threshold
to -4.
Setting the advanced option --median-autosomal-coverage argument
(default 0) activates a recommended filter against likely erroneously mapped NuMTs (nuclear mitochondrial DNA segments). For the value, provide the median coverage expected in autosomal regions with coverage.
Liftover the output VCF files
Tools involved: LiftoverVcf
This step returns the variant calls back to the standard numbering system with the original alignment (OA) tags.
Combine the variant calls from the control region with the non-control region
Tools involved: MergeVcfs
This step merges the output VCF file for the control region (BAM aligned to shifted reference) with the VCF file for the non-control region into a single variant file.
Merge stats files for output VCFs
Tools Involved: Mutect2- MergeMutectStats
This step merges the stats file for the variant calls of the control region with the stats file for the variant calls of the non-control region.
Filter the Variant Calls by Parameters
Tools involved: FilterMutectCalls
This step filters the output VCF files based on specific parameters, such as a minimum allele fraction, maximum alternate allele count, and estimate of contamination. The --autosomal-coverage
parameter specifically filters out potential NuMTs. Specifying the --mitochondrial-mode
parameter automatically sets the filters to the mitochondrial defaults.
Filter out Blacklisted Sites
Tools involved: VariantFiltration
This step filters out blacklisted sites containing unwanted artifacts.
7 comments
Is this pipeline for germline mutation calling or only somatic? What is the best way to call variants in the mitochondrial genome with whole genome data? Thank you in advance.
Hi GATK team, Thanks for generating this pipeline. I am wondering is this pipeline suitable for WES data which included the mitochondrial reads coverage(probe)?
Thanks,
Is this pipeline specific to human, or can it be used for other species such as mus musculus?
Thanks!
Hi team!
Thanks for this. I am using gatk4-4.2.0 and it seems like two flags are no longer supported :
--median-autosomal-coverage (CallMt step using Mutect2)
A USER ERROR has occurred: median-autosomal-coverage is not a recognized option
--autosomal-coverage (Filter the variant by parameter step using FilterMutectCalls)
A USER ERROR has occurred: autosomal-coverage is not a recognized option
They are not mentioned in the updates with the newer releases of GATK4. Will they be included in future release or should the best practices be updated? Thanks a lot for everything!
Hi, How can this process obtain gvcf files? Thanks!
Hello - what is the recommended heteroplasmic variant cutoff for this pipeline version? In a 2022 benchmarking paper (PMID: 35350246) Haplotypecaller v4.2 had poor sensitivity for low frequency variants ( < ~15%). The authors cited Mutect2 in the methods as a recent GATK implementation but the study does not incorporate the algorithm. Do you have any data that compares this Mutect2-based pipeline to Haplotypecaller v4.2 or other callers (e.g., Mutserve)? Thank you.
Is it possible to run the mito pipeline in tumor/matched normal mode ? I imagine the way to do it would be to create a panel of normals using the tumor only mode option and then proceed with the rest of the GATK pipe.
Please sign in to leave a comment.