GATK Best Practices Mitochondrial Analysis
AnsweredI expect everything is a bit touch and go at the moment with the site migration, and that things will be rolled out slowly in batches, but when/if you do put together an official Best Practices for Mitochondrial Short Variants document, would it be possible for more comprehensive explanations / usage cases / maybe an algorithm/process flowchart? Something along these lines maybe: https://gatk.broadinstitute.org/hc/en-us/articles/360035894731-Somatic-short-variant-discovery-SNVs-Indels- ??
When I reverse engineered a command line pipe from the Terra vcfs, I had to pull from and piece together several different sources of info including but not limited to:
The actual GATK workflow and wdl scripts: https://github.com/gatk-workflows/gatk4-mitochondria-pipeline
The old announcement Mitochondrial Analysis discussion thread: https://gatkforums.broadinstitute.org/gatk/discussion/23598/new-mitochondrial-analysis-with-mutect2#latest
The usage docs for Mutect2, FilterMutectCalls, and VariantFiltration.
Mutect2 error reports: https://gatkforums.broadinstitute.org/gatk/discussion/23863/mutect2-mitochondria-flag-doesnt-work -- (P.S. This error is still being perpetuated in the newer Mutect2 tool index usage cases. Isn't the argument --mitochondria-mode not --mitochondria?).
And also the Terra input/output guides, which can be even more obtuse.
..
..
Sidenote: I've been working with this pipeline for several months now and I still don't quite understand how some of the bits and pieces work. Can someone explain in detail the algorithm/logic behind NuMT filtration using median autosomal coverage as a filtering statistic? Is it basically the same as https://gatk.broadinstitute.org/hc/en-us/articles/360036732951-PolymorphicNuMT in 4.1.2.0??
-
Hi Halima Alachram,
The SubsetBamtoChrM step in the mitochondrial pipeline is designed to subset only the reads that mapped to the mitochondrial reference, which is necessary for selectively calling mitochondrial variants. The strict qualifications for mapping are used to try to get around issues with nuclear mitochondrial DNA segments (NuMTs) and to ensure high precision in variant calling. You can read more about the purposes of the pipeline and steps here Does this help answer your question?
Kind regards,
Pamela
-
Hi Pamela Bretscher,
Thank you for your answer and the shared link.
I have another question. In the gnomAD documention of the pipeline it is stated that " the mtDNA VCF file denotes homoplasmic genotypes as “1/1,” indicating 95-100% alternate bases, and heteroplasmic genotypes as “0/1,” indicating < 95% alternative bases".
However, when I applied the pipeline to my samples, the final VCF includes variants which are present as heteroplasmic with genotype 0/1 when they have, however, a variant allele fraction > 95%.
Aren't variants with a variant allele fraction > 95% supposed to be homoplasmic with a genotpte 1/1 in the final VCF ?
Halima
-
Hi Halima Alachram,
I am not very familiar with the gnomAD documentation, but your result does seem to contradict what is stated in this document. It's possible that the current mitochondrial mode algorithms have changed slightly since this article was written. Could you provide some lines of your final VCF where you are seeing this discrepancy so I can look into it further?
Kind regards,
Pamela
-
Hi Pamela Bretscher,
Here is an example of two variants where the allele fractions of alternate alleles are almost 1 and the GT is 0/1 ( heteroplasmic).
chrM 152 . T C . PASS CONTQ=93;DP=4092;ECNT=2;MBQ=20,20;MFRL=111,129;MMQ=60,60;MPOS=25;OCM=0;POPAF=2.40;SEQQ=93;STRANDQ=93;TLOD=10457.72 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:4,3864:1.000:3868:4,1789:0,2052:2,2,1965,1899
chrM 263 . A G . PASS CONTQ=93;DP=3071;ECNT=8;MBQ=20,20;MFRL=155,137;MMQ=60,60;MPOS=31;OCM=0;POPAF=2.40;SEQQ=93;STRANDQ=93;TLOD=8657.99 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:3,3011:0.999:3014:3,1342:0,1621:1,2,1292,1719
I have also applied Mutect2 separately (not as a part of the pipeline) to the same samples, and obtained the same genotype for the same variants.
However, I am getting the same variants as homoplasmic using other variant callers.
Thank you
Halima
-
Hi Halima Alachram,
Thank you for providing these lines of your file. Have you run the full mitochondrial pipeline? The add_annotations.py step of the pipeline is the step where the genotypes are actually adjusted based on the homo/heteroplasmy. If you have run the full pipeline already, then your results do seem concerning, but I suspect this is why you are seeing these values in your vcf.
Kind regards,
Pamela
-
Hi Pamela Bretscher,
Thank you for the clarification!
I realized that the genotypes are adjusted in a further step of the pipeline.
Unfortunately, I didn't run the full pipeline since I was running the wdl script of the pipeline locally https://github.com/gatk-workflows/gatk4-mitochondria-pipeline and I wasn't aware of the second part of the pipeline. Apparently, this part doesn't work locally. Only on Terra.
Kind regards,
Halima
-
Hi Halima Alachram,
Okay, that makes sense. I'm glad your results now make more sense when running the full pipeline. Please let me know if you have any other questions or concerns.
Kind regards,
Pamela
-
Thank you Pamela Bretscher!
One more question. Is it possible somehow to run the second part of the pipeline locally?
Best regards
Halima
-
Hi Halima Alachram,
I was under the impression that the full pipeline could be run locally. You mentioned that it only worked on Terra, had you attempted to run it locally?
-
I haven't attempted yet. I presume it could be run somehow locally since the Python scripts have well defined arguments. However, it is a cloud-based framework and some of the input data are obtained through Terra such as Participant_data/id. I will try if some edits would help to get it working locally.
Any help from the GATK team would be appreciated, though.
Best regards
Halima
-
Hi Halima Alachram,
The workflows should be able to run on any platform that supports WDL execution and you should be able to download the data from Terra for use locally. You might find this document helpful on running pipelines locally versus in a Google Cloud Platform. I'm happy to help you if you run into any issues or concerns.
Kind regards,
Pamela
-
Hi Pamela Bretscher,
This is exactly what I did to run the Mutect2 WDL script available here https://github.com/gatk-workflows/gatk4-mitochondria-pipeline based on the described steps from the document you shared and I had no issues.
I was also trying to run the second part which is the gnomad-mitochondria script https://github.com/broadinstitute/gnomad-mitochondria without the Terra input by adjusting the inputs using the output data I obtained from the local script I mentioned above. It partially worked. However, I am still having some issues. Therefore, I am trying now to re-run the workflow on Terra.
Is there any comprehensive explanations or a clear documentation/flowchart about the steps of the full pipeline? I can't find everything that is needed to properly run this pipeline in one place.
Thank you
Halima
-
Hi Halima Alachram,
I was not able to find any comprehensive document that shows the flow of the first part of the pipeline to the second part, so I can put in a documentation request for this to possibly be created in the future. What sort of issues did you have when running the second part of the pipeline locally? I am relatively unfamiliar with the downstream portion of this pipeline, so may get some more insight by writing into the Terra Community Forum.
Kind regards,
Pamela
-
Hi Pamela Bretscher,
Thank you for your help.
The pipeline is working locally. I am just having issues with some inputs which I am trying to understand. I have written to Terra Community Forum and it is helping.
Just one minor question. "Variants with heteroplasmy < 10%" are variants with a VAF <10%?
Best regards,
Halima
-
Hi Halima Alachram,
Great, I'm glad to hear the Terra support team is able to help you out with the pipeline. Yes, you're correct the proportion of heteroplasmy is essentially the VAF. Please let me know if I can help you with anything else.
Kind regards,
Pamela
Please sign in to leave a comment.
45 comments