Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

Dosages with Mutect2

0

6 comments

  • Avatar
    Genevieve Brandt (she/her)

    Hi, there is an option to add annotations in Mutect2 with the option --annotations. Please see here: https://gatk.broadinstitute.org/hc/en-us/articles/360045800552-Mutect2

    Also see these links for other documentation that discuss Mutect2 annotations:

    1
    Comment actions Permalink
  • Avatar
    Leore Nehemie Bensabath

    Thanks a lot ! 

    0
    Comment actions Permalink
  • Avatar
    David Benjamin

    Leore Nehemie Bensabath Mutect2 does not output a dosage in the sense of copy number, but in my limited understanding I think that copy number is a fairly ill-defined concept for mitochondria (except for single-cell data, of course) anyway.  The allelic depth (AD) FORMAT field is probably the closest to what you need; it gives the supporting read count for each allele.  Is this at all adequate for your needs?

    0
    Comment actions Permalink
  • Avatar
    Leore Nehemie Bensabath

    David Benjamin thanks for your answer !  

    I am not sure if we are talking about the same kind of dosage, I am pretty new to the field. The genotype dosage I had in mind is the one defined as : 

    DS = P(genotype = 0/1 | data) + 2*P(genotype = 1/1 | data) 

    which has a value between 0 and 2. 

    So I guess besides the read count for each alleles, those likelihoods used to compute the dosage also takes into account the quality of the supporting reads. 

    I was also interesting in studying the copy number in my data though, so what you say is actually very useful. Could you please explain me why you it is ill-defined for mitochondria data and why the allelic depth is more suitable ? 

     

    About my initial question, the only annotation available with mutect2 regarding genotype likelihood that I see is the TLOD, which doesn't give enough information to compute the dosage. I am surprised that there is no way to get the genotypes likelihoods via the Phred-scaled likelihoods (as they are output by Haplotypecaller) or else, because from my understanding they are computed by Mutect2 algorithm anyway. 

    Genevieve Brandt I just want to make sure that I am not missing anything and that there is indeed no way to output the genotype likelihoods for each variant. Could you please confirm that ? Thanks ! 

    0
    Comment actions Permalink
  • Avatar
    David Benjamin

    I think we are talking about the same thing, though it's not apparent at first.  The definition of dosage,

    DS = P(genotype = 0/1 | data) + 2*P(genotype = 1/1 | data) 

    is the expected copy number of the alt allele, since it can be written explicitly as the average:

    DS = 0 * P(0 copies of alt allele | data) + 1 * P(1 copy of alt allele | data) + 2 * P(2 copies of alt allele | data)

     

    The difficulty is that this assumes everything is diploid, that is, that there are two copies of the genome per cell.  Mutect2 does not assume this because of the prevalence of both copy number variation and subclonal variation in cancer.  It is similarly impossible to assume for mitochondria because cells have hundreds to thousands of mitochondria, each with its own copy of the mitochondrial genome.

    The dosage would therefore have to be a sum SUM(n = 0 to 5000 or so, n * P(n copies of alt allele | data)), but even this is poorly-defined for bulk sequencing since (i) different cells have different numbers of mitochondria and (ii) even given the same numbers of mitochondria different cells may have different fractions of each allele.  This is why I think the only dosage you can use in this case is the overall fraction of mitochondria with a certain allele, which is roughly the allele depth divided by the total depth.

    If you end up using this non-diploid definition of dosage, you will still need to check whether your tools work with non-diploid data.

     

    0
    Comment actions Permalink
  • Avatar
    Leore Nehemie Bensabath

    Okay I understand. Thanks a lot, this is very helpful ! 

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk