Dosages with Mutect2
a) GATK version used : gatk-4.1.3.0
b) Exact GATK commands used :
gatk Mutect2 -R ref.fa -L chrM --mitochondria-mode -I input.bam -O output.vcf
gatk FilterMutectCalls -R ref.fa -autosomal-coverage 10 --mitochondria-mode -V output.vcf -O output_filtered.vcf
Hi GATK team,
I am using Mutect2 on rat mitochondrial DNA samples and I need to compute the genotype dosage information of the called variants for further analysis with Plink2.
In the output vcf file, I don't see any field that would allow me to do so.
The annotations I have are :
INFO : CONTQ=93;DP=207;ECNT=1;MBQ=20,20;MFRL=84,87;MMQ=60,60;MPOS=11;OCM=0;POPAF=2.40;SEQQ=1;STRANDQ=3;TLOD=0.923
FORMAT : GT:AD:AF:DP:F1R2:F2R1:SB
The only likelihood information I see is the TLOD but from my understanding it is not enough to get the dosage.
Is there a way to get the GP or DS information with mutect2, which are the fields compatible for Plink2 to import dosages ? Or more generally is there anyway to get dosage information from the output of Mutect2
Thanks!
Leore Bensabath
-
Hi, there is an option to add annotations in Mutect2 with the option --annotations. Please see here: https://gatk.broadinstitute.org/hc/en-us/articles/360045800552-Mutect2
Also see these links for other documentation that discuss Mutect2 annotations:
-
Thanks a lot !
-
Leore Nehemie Bensabath Mutect2 does not output a dosage in the sense of copy number, but in my limited understanding I think that copy number is a fairly ill-defined concept for mitochondria (except for single-cell data, of course) anyway. The allelic depth (AD) FORMAT field is probably the closest to what you need; it gives the supporting read count for each allele. Is this at all adequate for your needs?
-
David Benjamin thanks for your answer !
I am not sure if we are talking about the same kind of dosage, I am pretty new to the field. The genotype dosage I had in mind is the one defined as :
DS = P(genotype = 0/1 | data) + 2*P(genotype = 1/1 | data)
which has a value between 0 and 2.
So I guess besides the read count for each alleles, those likelihoods used to compute the dosage also takes into account the quality of the supporting reads.
I was also interesting in studying the copy number in my data though, so what you say is actually very useful. Could you please explain me why you it is ill-defined for mitochondria data and why the allelic depth is more suitable ?
About my initial question, the only annotation available with mutect2 regarding genotype likelihood that I see is the TLOD, which doesn't give enough information to compute the dosage. I am surprised that there is no way to get the genotypes likelihoods via the Phred-scaled likelihoods (as they are output by Haplotypecaller) or else, because from my understanding they are computed by Mutect2 algorithm anyway.
Genevieve Brandt I just want to make sure that I am not missing anything and that there is indeed no way to output the genotype likelihoods for each variant. Could you please confirm that ? Thanks !
-
I think we are talking about the same thing, though it's not apparent at first. The definition of dosage,
DS = P(genotype = 0/1 | data) + 2*P(genotype = 1/1 | data)
is the expected copy number of the alt allele, since it can be written explicitly as the average:
DS = 0 * P(0 copies of alt allele | data) + 1 * P(1 copy of alt allele | data) + 2 * P(2 copies of alt allele | data)
The difficulty is that this assumes everything is diploid, that is, that there are two copies of the genome per cell. Mutect2 does not assume this because of the prevalence of both copy number variation and subclonal variation in cancer. It is similarly impossible to assume for mitochondria because cells have hundreds to thousands of mitochondria, each with its own copy of the mitochondrial genome.
The dosage would therefore have to be a sum SUM(n = 0 to 5000 or so, n * P(n copies of alt allele | data)), but even this is poorly-defined for bulk sequencing since (i) different cells have different numbers of mitochondria and (ii) even given the same numbers of mitochondria different cells may have different fractions of each allele. This is why I think the only dosage you can use in this case is the overall fraction of mitochondria with a certain allele, which is roughly the allele depth divided by the total depth.
If you end up using this non-diploid definition of dosage, you will still need to check whether your tools work with non-diploid data.
-
Okay I understand. Thanks a lot, this is very helpful !
Please sign in to leave a comment.
6 comments