Joint calling with Mutect2 with mitochondria mode
Hello,
I am using Mutect2 in mitochondria mode to call mitochondrial variants in BAM files from related individuals (trios with a child and both parents). I would like to perform joint calling using all three samples; however, as per the Mutect2 documentation (https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2), it only allows samples from one individual. I was wondering if you had any advice on how to obtain a joint file in this case?
From this thread https://gatk.broadinstitute.org/hc/en-us/community/posts/360077166571-Mutect2-Joint-calling and this one https://github.com/broadinstitute/gatk/issues/4887 from a few years ago I can see that there was no definitive answer to the question and wanted to know if it was still the case.
Thank you,
Caroline
-
GVCF mode for Mutect2 is already wired up and, in theory and in practice, ready for this kind of analysis, but we do not have a definitive published way of running cohort or trio runs for mitochondrial joint calling. On the other hand, we have some questions for you, since this kind of work may not be what you are actually looking for:
1- Mitochondria are inherited from the mother, so we are not sure why joint genotyping the father makes sense in this kind of work. Can you elaborate on that?
2- The child's mitochondria can be considered a sub-tissue/sub-sample of the mother's mitochondria. Feeding both sets of aligned data to Mutect2 and generating calls, with the intention of separating events that differ significantly from the mother's heteroplasmy levels, could therefore be a better way of checking for variants that could affect the child's health. Would you consider such an approach a better way of pursuing your goal?
Regards
-
Thank you very much for your answer.
1- Yes, we are indeed interested in looking at the mother's data, sorry for not being clear about that.
2- I'm sorry but I'm not sure I fully understand the methodology you're proposing. By feeding both aligned data, do you mean giving both the mother and child data to Mutect2 as if they were samples from the same individual, like it is done in the "Tumor with matched normal" mode?
Thank you,
Caroline
-
Hi again.
Yes, you may think of it as tumor with matched normal, where the normal is the mother's data and the tumor is the child's data, or as pre- and post-treatment tumor samples, where the mother is the pre sample (acting as a pseudo-normal) and the child is the post sample. In either case, variants whose heteroplasmy levels differ significantly from the pre sample will be marked as PASS, and anything with similar heteroplasmy levels will be marked as germline (the filter is named that way).
You may be able to leverage this filtering to figure out differentiating variants for possible phenotype cause.
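As a rough sketch of what such a run could look like, the snippet below only assembles and prints the two command lines (Mutect2 with the mother as the matched normal, then FilterMutectCalls); it does not execute anything. The file names, the `chrM` contig name and the sample name `mother` are placeholders, and the `-normal` value is assumed to match the SM tag in the mother's BAM header. Any extra arguments your own mitochondria pipeline uses are not shown here.

```python
import shlex

# Placeholder inputs (assumptions for illustration, not from the thread).
REFERENCE = "chrM.fasta"
CHILD_BAM = "child.chrM.bam"
MOTHER_BAM = "mother.chrM.bam"
MOTHER_SAMPLE = "mother"  # assumed to equal the SM tag of the mother's BAM


def mutect2_child_vs_mother(reference, child_bam, mother_bam, mother_sample, out_vcf):
    """Build (not run) a Mutect2 call with the mother as the matched normal."""
    return [
        "gatk", "Mutect2",
        "-R", reference,
        "-L", "chrM",
        "--mitochondria-mode",
        "-I", child_bam,           # treated as the "tumor" sample
        "-I", mother_bam,          # treated as the matched "normal"
        "-normal", mother_sample,
        "-O", out_vcf,
    ]


def filter_mutect_calls(reference, in_vcf, out_vcf):
    """Build (not run) the filtering step that assigns PASS/germline filters."""
    return [
        "gatk", "FilterMutectCalls",
        "-R", reference,
        "-V", in_vcf,
        "--mitochondria-mode",
        "-O", out_vcf,
    ]


call_cmd = mutect2_child_vs_mother(
    REFERENCE, CHILD_BAM, MOTHER_BAM, MOTHER_SAMPLE, "child_vs_mother.vcf.gz"
)
filter_cmd = filter_mutect_calls(
    REFERENCE, "child_vs_mother.vcf.gz", "child_vs_mother.filtered.vcf.gz"
)

# Print the commands so they can be reviewed before being run by hand.
for cmd in (call_cmd, filter_cmd):
    print(shlex.join(cmd))
```

In the filtered output, the variants to look at would then be those marked PASS, as described above.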
I hope this helps.
Regards.
-
Hi again,
Thank you very much for your answer, this strategy does seem like it would be useful for what we want to do. I'd also like to know if you have any advice for the situation where we do not have the mother's data but we have siblings' data and would want to compare them. In that case, do you think joint calling would make sense and if so would you have any recommended way of doing it?
Thank you,
Caroline
-
Joint calling would make sense for siblings if no maternal data are available. As I mentioned before, the tools are all available and ready for that purpose.
I hope this helps.
-
Hi,
Thank you again for your answer. Just to clarify, could you let me know which tools you are referring to?
I did see the pipeline below which mentions joint calling but I'm not quite sure which tools are used for this step.
I've also looked at the --emit-ref-confidence option for Mutect2 which can produce gvcf, would that be an option?
Thank you in advance,
Caroline
-
GVCF mode of Mutect2 must be enabled for this purpose, and you may need to combine or import those GVCF files using CombineGVCFs or GenomicsDBImport (whichever you prefer).
When using GenotypeGVCFs, you may need to use our experimental parameter
--input-is-somatic <Boolean>  Finalize input GVCF according to somatic (i.e. Mutect2) TLODs (BETA feature). Default value: false. Possible values: {true, false}
to make sure that the final result will be of a somatic nature.
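The steps above could be strung together roughly as follows. This is a sketch only: it assembles and prints the command lines without running them, the sibling names, file names and `chrM` contig name are placeholders, and CombineGVCFs is used rather than GenomicsDBImport purely to keep the example short. Whether the combine step needs a somatic-specific option of its own is not covered in this thread, so check the tool documentation for your GATK version.

```python
import shlex

REFERENCE = "chrM.fasta"  # placeholder reference
SIBLING_BAMS = {          # placeholder sample name -> BAM path
    "sib1": "sib1.chrM.bam",
    "sib2": "sib2.chrM.bam",
}


def mutect2_gvcf(bam, out_gvcf):
    """Per-sample Mutect2 call in mitochondria mode, emitting a GVCF."""
    return [
        "gatk", "Mutect2",
        "-R", REFERENCE,
        "-L", "chrM",
        "--mitochondria-mode",
        "-I", bam,
        "--emit-ref-confidence", "GVCF",
        "-O", out_gvcf,
    ]


def combine_gvcfs(gvcfs, out_gvcf):
    """Merge the per-sample GVCFs into one multi-sample GVCF."""
    cmd = ["gatk", "CombineGVCFs", "-R", REFERENCE]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd + ["-O", out_gvcf]


def genotype_gvcfs(combined_gvcf, out_vcf):
    """Genotype the combined GVCF using the BETA somatic option named above."""
    return [
        "gatk", "GenotypeGVCFs",
        "-R", REFERENCE,
        "-V", combined_gvcf,
        "--input-is-somatic", "true",
        "-O", out_vcf,
    ]


gvcfs = [f"{name}.g.vcf.gz" for name in SIBLING_BAMS]
commands = [mutect2_gvcf(bam, f"{name}.g.vcf.gz") for name, bam in SIBLING_BAMS.items()]
commands.append(combine_gvcfs(gvcfs, "siblings.g.vcf.gz"))
commands.append(genotype_gvcfs("siblings.g.vcf.gz", "siblings.joint.vcf.gz"))

# Print the commands for review; nothing is executed here.
for cmd in commands:
    print(shlex.join(cmd))
```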
I hope this helps.
Regards.
-
Hi Gökalp Çelik,
I'm trying to produce gVCF files using the WDL pipeline suggested here https://github.com/broadinstitute/gatk/tree/master/scripts/mitochondria_m2_wdl
There is no problem producing gVCFs after calling Mutect2, but it seems that LiftoverVcf does not work very well with gVCF files. Here's a brief error message:
Badly formed variant context at location chrM:16069; getEnd() was 16069 but this VariantContext contains an END key with value 8069
I managed to bypass this by using a newer version of Picard, but the step that splits multiallelic variants and removes variants that didn't pass doesn't like the gVCF format. I'm not sure how to proceed from here. I think FilterMutectCalls only works for single-sample VCFs, so I can't joint call before the filtering step either.
Thanks!
Le
-
Hi Le Qi
Our mitochondria workflow does not include a joint calling flow, so you may be on your own in devising a methodology. If you are only interested in generating a list of sites for future studies, the best method would be to combine all samples into a single sample and process it with Mutect2 to get a VCF that you can filter and perhaps convert to a sites-only VCF file for future reference. You cannot calculate allele frequencies from Mutect2 VCF files, as only allele fractions are listed. Each sample will have a different allele fraction, so joint calling is not a very useful strategy for this purpose.
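One possible way to sketch the "single sample" route is shown below. It rests on several assumptions that are not spelled out in this thread: that giving every BAM the same SM tag (here via AddOrReplaceReadGroups, which collapses each file to a single read group) is an acceptable way to pool the samples, and that SelectVariants with `--exclude-filtered` and `--sites-only-vcf-output` is a suitable way to get the final sites list. File names, the pooled sample name and the read-group values are placeholders, and the snippet only prints the commands.

```python
import shlex

REFERENCE = "chrM.fasta"                      # placeholder reference
BAMS = ["s1.chrM.bam", "s2.chrM.bam"]         # placeholder inputs
POOLED_SAMPLE = "pooled"                      # hypothetical shared SM tag


def relabel(bam, index):
    """Rewrite read groups so every BAM carries the same sample name."""
    out_bam = f"relabelled_{index}.bam"
    cmd = [
        "gatk", "AddOrReplaceReadGroups",
        "-I", bam,
        "-O", out_bam,
        "--RGID", f"rg{index}",       # distinct ID per input file
        "--RGLB", "lib1",             # placeholder library
        "--RGPL", "ILLUMINA",         # placeholder platform
        "--RGPU", f"unit{index}",     # placeholder platform unit
        "--RGSM", POOLED_SAMPLE,
    ]
    return cmd, out_bam


relabel_cmds, relabelled = [], []
for i, bam in enumerate(BAMS, start=1):
    cmd, out_bam = relabel(bam, i)
    relabel_cmds.append(cmd)
    relabelled.append(out_bam)

# Call all relabelled BAMs together as one sample.
call_cmd = ["gatk", "Mutect2", "-R", REFERENCE, "-L", "chrM", "--mitochondria-mode"]
for bam in relabelled:
    call_cmd += ["-I", bam]
call_cmd += ["-O", "pooled.vcf.gz"]

filter_cmd = [
    "gatk", "FilterMutectCalls", "-R", REFERENCE,
    "-V", "pooled.vcf.gz", "--mitochondria-mode",
    "-O", "pooled.filtered.vcf.gz",
]

# Keep only unfiltered records and drop the genotype columns.
sites_cmd = [
    "gatk", "SelectVariants", "-R", REFERENCE,
    "-V", "pooled.filtered.vcf.gz",
    "--exclude-filtered",
    "--sites-only-vcf-output", "true",
    "-O", "pooled.sites_only.vcf.gz",
]

for cmd in relabel_cmds + [call_cmd, filter_cmd, sites_cmd]:
    print(shlex.join(cmd))
```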
I hope this helps.