Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

GATK Best Practices Mitochondrial Analysis

Answered
1

45 comments

  • Avatar
    Pamela Bretscher

    Hi Halima Alachram,

    The SubsetBamtoChrM step in the mitochondrial pipeline is designed to subset only the reads that mapped to the mitochondrial reference, which is necessary for selectively calling mitochondrial variants. The strict qualifications for mapping are used to try to get around issues with nuclear mitochondrial DNA segments (NuMTs) and to ensure high precision in variant calling. You can read more about the purposes of the pipeline and steps here Does this help answer your question? 

    Kind regards,

    Pamela

    0
    Comment actions Permalink
  • Avatar
    Halima Alachram

    Hi Pamela Bretscher,

    Thank you for your answer and the shared link.

    I have another question. In the gnomAD documention of the pipeline it is stated that " the mtDNA VCF file denotes homoplasmic genotypes as “1/1,” indicating 95-100% alternate bases, and heteroplasmic genotypes as “0/1,” indicating < 95% alternative bases".

    However, when I applied the pipeline to my samples, the final VCF includes variants which are present as heteroplasmic with genotype 0/1 when they have, however, a variant allele fraction > 95%.

    Aren't variants with a variant allele fraction > 95% supposed to be homoplasmic with a genotpte 1/1 in the final VCF ?

     

    Halima

     

     

     
    0
    Comment actions Permalink
  • Avatar
    Pamela Bretscher

    Hi Halima Alachram,

    I am not very familiar with the gnomAD documentation, but your result does seem to contradict what is stated in this document. It's possible that the current mitochondrial mode algorithms have changed slightly since this article was written. Could you provide some lines of your final VCF where you are seeing this discrepancy so I can look into it further?

    Kind regards,

    Pamela

    0
    Comment actions Permalink
  • Avatar
    Halima Alachram

    Hi Pamela Bretscher,

    Here is an example of two variants where the allele fractions of alternate alleles are almost 1 and the GT is 0/1 ( heteroplasmic).

    chrM 152 . T   C . PASS CONTQ=93;DP=4092;ECNT=2;MBQ=20,20;MFRL=111,129;MMQ=60,60;MPOS=25;OCM=0;POPAF=2.40;SEQQ=93;STRANDQ=93;TLOD=10457.72 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:4,3864:1.000:3868:4,1789:0,2052:2,2,1965,1899

     

    chrM 263 . A   G . PASS CONTQ=93;DP=3071;ECNT=8;MBQ=20,20;MFRL=155,137;MMQ=60,60;MPOS=31;OCM=0;POPAF=2.40;SEQQ=93;STRANDQ=93;TLOD=8657.99 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:3,3011:0.999:3014:3,1342:0,1621:1,2,1292,1719

     

    I have also applied Mutect2 separately (not as a part of the pipeline) to the same samples, and obtained the same genotype for the same variants.

    However, I am getting the same variants as homoplasmic using other variant callers.

     

    Thank you

    Halima

     

     

    0
    Comment actions Permalink
  • Avatar
    Pamela Bretscher

    Hi Halima Alachram,

    Thank you for providing these lines of your file. Have you run the full mitochondrial pipeline? The add_annotations.py step of the pipeline is the step where the genotypes are actually adjusted based on the homo/heteroplasmy. If you have run the full pipeline already, then your results do seem concerning, but I suspect this is why you are seeing these values in your vcf.

    Kind regards,

    Pamela

    0
    Comment actions Permalink
  • Avatar
    Halima Alachram

    Hi Pamela Bretscher,

    Thank you for the clarification!

    I realized that the genotypes are adjusted in a further step of the pipeline.

    Unfortunately, I didn't run the full pipeline since I was running the wdl script of the pipeline locally https://github.com/gatk-workflows/gatk4-mitochondria-pipeline and I wasn't aware of the second part of the pipeline. Apparently, this part doesn't work locally. Only on Terra. 

    Kind regards,

    Halima

     

     

     

    0
    Comment actions Permalink
  • Avatar
    Pamela Bretscher

    Hi Halima Alachram,

    Okay, that makes sense. I'm glad your results now make more sense when running the full pipeline. Please let me know if you have any other questions or concerns.

    Kind regards,

    Pamela

    0
    Comment actions Permalink
  • Avatar
    Halima Alachram

    Thank you Pamela Bretscher

    One more question. Is it possible somehow to run the second part of the pipeline locally?

    Best regards 

    Halima

    0
    Comment actions Permalink
  • Avatar
    Pamela Bretscher

    Hi Halima Alachram,

    I was under the impression that the full pipeline could be run locally. You mentioned that it only worked on Terra, had you attempted to run it locally? 

    0
    Comment actions Permalink
  • Avatar
    Halima Alachram

    Hi Pamela Bretscher

    I haven't attempted yet. I presume it could be run somehow locally since the Python scripts have well defined arguments. However, it is a cloud-based framework and some of the input data are obtained through Terra such as Participant_data/id. I will try if some edits would help to get it working locally.

    Any help from the GATK team would be appreciated, though.

    Best regards

    Halima

     

    0
    Comment actions Permalink
  • Avatar
    Pamela Bretscher

    Hi Halima Alachram,

    The workflows should be able to run on any platform that supports WDL execution and you should be able to download the data from Terra for use locally. You might find this document helpful on running pipelines locally versus in a Google Cloud Platform. I'm happy to help you if you run into any issues or concerns.

    Kind regards,

    Pamela

    0
    Comment actions Permalink
  • Avatar
    Halima Alachram

    Hi Pamela Bretscher,

    This is exactly what I did to run the Mutect2 WDL script available here https://github.com/gatk-workflows/gatk4-mitochondria-pipeline based on the described steps from the document you shared and I had no issues.

    I was also trying to run the second part which is the gnomad-mitochondria script https://github.com/broadinstitute/gnomad-mitochondria without the Terra input by adjusting the inputs using the output data I obtained from the local script I mentioned above. It partially worked. However, I am still having some issues. Therefore, I am trying now to re-run the workflow on Terra.

    Is there any comprehensive explanations or a clear documentation/flowchart about the steps of the full pipeline? I can't find everything that is needed to properly run this pipeline in one place.

    Thank you

    Halima

    0
    Comment actions Permalink
  • Avatar
    Pamela Bretscher

    Hi Halima Alachram,

    I was not able to find any comprehensive document that shows the flow of the first part of the pipeline to the second part, so I can put in a documentation request for this to possibly be created in the future. What sort of issues did you have when running the second part of the pipeline locally? I am relatively unfamiliar with the downstream portion of this pipeline, so may get some more insight by writing into the Terra Community Forum.

    Kind regards,

    Pamela

    0
    Comment actions Permalink
  • Avatar
    Halima Alachram

    Hi Pamela Bretscher,

    Thank you for your help.

    The pipeline is working locally. I am just having issues with some inputs which I am trying to understand. I have written to Terra Community Forum and it is helping.

    Just one minor question. "Variants with heteroplasmy < 10%" are variants with a VAF <10%?

    Best regards,

    Halima

    0
    Comment actions Permalink
  • Avatar
    Pamela Bretscher

    Hi Halima Alachram,

    Great, I'm glad to hear the Terra support team is able to help you out with the pipeline. Yes, you're correct the proportion of heteroplasmy is essentially the VAF. Please let me know if I can help you with anything else.

    Kind regards,

    Pamela

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk