How to use Mutect2 for somatic variant calling in mouse tumor-only tissue (mm39)
REQUIRED for all errors and issues:
a) GATK version used: 4.4.0
b) Exact command used:
gatk Mutect2 \
  -R reference.fa \
  -I sample.bam \
  --germline-resource af-only-gnomad.vcf.gz \
  --panel-of-normals pon.vcf.gz \
  -O single_sample.vcf.gz
I would like to use GATK Mutect2 for somatic variant calling on mouse tumor tissue. However, the posts I have seen are all about human data, and I cannot find any solution for Mutect2 variant calling in mouse. In tumor-only mode, do I need the PoN file and the germline resource? If so, where can I find them? Also, when I run BaseRecalibrator, where can I find the mm39 mouse known sites (indels + SNPs)?
-
Hi yb87625
I don't think there are any allele frequency resources for mouse strains comparable to those for human populations, since the strains used in the lab are not wild and are mostly inbred.
Your best bet would be to sequence a normal tissue and a tumor tissue from the same mouse to get somatic variants, unless the tumor was generated using xenograft models.
If you don't have a normal tissue sequenced from the same animal, you can probably use whole genome sequencing data from a normal sample of the same strain as the normal BAM file when calling somatic variants with Mutect2.
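A minimal sketch of what that tumor-normal call could look like, assuming a same-strain normal BAM is available. The names `strain_normal.bam`, `strain_normal` and the output file are hypothetical placeholders; the value passed to `-normal` has to match the SM tag in the normal BAM's read group. The script only prints the command unless `DRY_RUN=0` is set.

```shell
# Sketch only: all file and sample names are placeholders, not real resources.
set -eu

REF="reference.fa"                 # mm39 reference FASTA (as in the original command)
TUMOR_BAM="sample.bam"             # tumor BAM (as in the original command)
NORMAL_BAM="strain_normal.bam"     # HYPOTHETICAL same-strain normal BAM
NORMAL_SM="strain_normal"          # HYPOTHETICAL; must equal the SM tag of NORMAL_BAM
OUT_VCF="tumor_vs_strain_normal.vcf.gz"

# Print the command in dry-run mode (default), execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Mutect2 in tumor-normal mode: both BAMs go in with -I,
# and -normal names which sample is the normal.
call_with_strain_normal() {
  run gatk Mutect2 \
    -R "$REF" \
    -I "$TUMOR_BAM" \
    -I "$NORMAL_BAM" \
    -normal "$NORMAL_SM" \
    -O "$OUT_VCF"
}

call_with_strain_normal
```

The unfiltered output would still need to go through FilterMutectCalls afterwards.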
-
Thanks for your help! Unfortunately, we did not sequence the normal tissue.
So far, I have used the dbSNP file downloaded from The Mouse Genome Project as the known-sites file for BaseRecalibrator. Is that fine for this step?
Also, for the steps after Mutect2 calling, the GATK best practices workflow (https://gatk.broadinstitute.org/hc/en-us/articles/360035894731-Somatic-short-variant-discovery-SNVs-Indels-) says that GetPileupSummaries and CalculateContamination are needed. Are these steps necessary for non-FFPE tumor tissue? My samples are fresh tumor tissue. If they are needed, can I still use the dbSNP file as the germline resource for filtering?
-
Hi yb87625
The dbSNP file for mice may not necessarily reflect your strain specifically, and for that reason I would argue against it. However, it may well be worth trying before giving up on it. My best bet would be to use whole genome sequencing data from your strain to compare against your tumor BAMs while making somatic calls. Such whole genome sequencing data may be available in public databases.
My knowledge is limited on the differences between fresh and FFPE sample usage in Mutect2, but it would not hurt to try these steps before you polish your final call set, since you can always revert to a previous step if you are not satisfied.
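If you want to try the contamination steps, the chain could be sketched roughly as below. This assumes you have a VCF of common sites with allele frequency annotations suitable for your strain; `strain_common_sites.vcf.gz` is a hypothetical placeholder, and as discussed above such a resource may not exist for mouse. The intermediate and output file names are also placeholders. The commands are only printed unless `DRY_RUN=0` is set.

```shell
# Sketch only: every file name below is a placeholder.
set -eu

REF="reference.fa"
TUMOR_BAM="sample.bam"
UNFILTERED_VCF="single_sample.vcf.gz"       # Mutect2 output from the original command
COMMON_SITES="strain_common_sites.vcf.gz"   # HYPOTHETICAL sites VCF with AF annotations

# Print the command in dry-run mode (default), execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

estimate_contamination_and_filter() {
  # 1) Summarize read support at the common sites in the tumor BAM.
  run gatk GetPileupSummaries \
    -I "$TUMOR_BAM" -V "$COMMON_SITES" -L "$COMMON_SITES" -O pileups.table
  # 2) Estimate cross-sample contamination from those pileups.
  run gatk CalculateContamination \
    -I pileups.table -O contamination.table
  # 3) Filter the raw Mutect2 calls, passing in the contamination estimate.
  run gatk FilterMutectCalls \
    -R "$REF" -V "$UNFILTERED_VCF" \
    --contamination-table contamination.table -O filtered.vcf.gz
}

estimate_contamination_and_filter
```

If no suitable sites file exists, FilterMutectCalls can still be run without `--contamination-table`.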
-
Hi,
Thanks for the hints on mouse variant calling. We are trying to identify somatic mutations that arise during animal aging. We have one 3-month-old sample; the other samples are older. Can I use that single sample to create a PoN and run my other samples against it in tumor-only mode? And since not many mouse databases are available, are there any recommendations on how to filter the resulting variants? Even after FilterMutectCalls there are many variants with very low DP.
Thanks.
-
Mutect2 uses the PoN to eliminate sequencing and other technical artifacts from the data. Mutect2 will also function properly without a PoN, so it is not a required input. If you wish to eliminate normal germline calls from your data, you will definitely need a matched normal or a germline resource. Since mouse strains are usually highly inbred and almost genetically identical, you may use any single animal of the strain as a matched normal.
I hope this helps.
-
I need your help with a similar problem. We have four young cancerous mice and five old cancerous mice whose tumors were caused by transfection with a specific mouse cancer cell line. We have exome sequencing of these mice, and the exomes of two young and two old normal mice as well. These normal samples are not paired with the tumor samples. We want to find somatic variants in the tumor samples and highlight the differences between young and old tumors. How can we run Mutect2 for somatic variant calling on the tumor samples? We also have the exome of the cancer cell line, but we do not have any whole genome sequencing data. Thank you very much in advance for your help.
-
I would also like to add that the normal mice are from the same strain as the cancerous ones. We would like to find the somatic mutations that developed in young and old cancerous mice after transfection with the mouse cancer cell line. Thanks again.
-
Depending on how and where you collect your sequencing samples from your cancer-induced mice, there may be different options for you.
Firstly, as mouse strains are usually inbred, using a whole genome sequencing sample from a public resource for your strain can help you remove potential germline variants from your callset. Secondly, unless you have background sequencing data from your cancer cell line, Mutect2 will produce a large number of variants, which you may need to filter by allele fraction if you are interested in very rare variants occurring after cancer induction. I believe your cell line already carries a high number of background mutations, so your mileage may vary.
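As a rough illustration of allele fraction filtering, here is a small awk-based sketch that keeps records passing a minimum AF and a minimum DP. It assumes an uncompressed, single-sample VCF on stdin where AF and DP appear in the FORMAT column, and it only looks at the first ALT allele's AF. The function name and the thresholds in the usage line are hypothetical and would need tuning for your data.

```shell
# Sketch only: filter a single-sample VCF by FORMAT/AF and FORMAT/DP.
# Usage: filter_by_af_dp MIN_AF MIN_DP < input.vcf > output.vcf
filter_by_af_dp() {
  awk -F'\t' -v min_af="$1" -v min_dp="$2" '
    /^#/ { print; next }                  # keep all header lines
    {
      n = split($9, keys, ":")            # FORMAT keys, e.g. GT:AD:AF:DP
      split($10, vals, ":")               # values for the first (only) sample
      af = ""; dp = ""
      for (i = 1; i <= n; i++) {
        if (keys[i] == "AF") af = vals[i]
        if (keys[i] == "DP") dp = vals[i]
      }
      if (af == "" || dp == "") next      # drop records missing AF or DP
      split(af, afs, ",")                 # multi-allelic: use the first ALT only
      if (afs[1] + 0 >= min_af + 0 && dp + 0 >= min_dp + 0) print
    }'
}
```

For example, `zcat filtered.vcf.gz | filter_by_af_dp 0.05 10 > filtered.af_dp.vcf` would keep sites with AF of at least 0.05 and DP of at least 10. Lowering the AF threshold keeps more rare variants but also more noise.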
I hope these will help.