Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

How to use Mutect2 for somatic variant calling in mouse tumor-only tissue (mm39)

8 comments

  • SkyWarrior

    Hi yb87625

    I don't think there are allele frequency resources for mouse strains comparable to those for human populations, since the strains used in the lab are not wild and are mostly inbred.

    Your best bet would be to sequence a normal tissue and a tumor tissue from the same mouse to get somatic variants, unless the tumor was generated using a xenograft model.

    If you don't have normal tissue sequenced from the same animal, I bet you can use whole genome sequencing data from a normal sample of the same strain as the normal BAM file when calling somatic variants with Mutect2.
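
As a rough illustration of the suggestion above, the sketch below assembles a Mutect2 command line that pairs a tumor BAM with a same-strain normal BAM. The file names and the sample name are hypothetical placeholders, not values from this thread. Note that `-normal` expects the sample name recorded in the normal BAM's read group (the SM tag), not a file path. The command is only built and printed here, not executed.

```python
import shlex

def mutect2_with_strain_normal(reference, tumor_bam, normal_bam, normal_sample, out_vcf):
    """Build a Mutect2 command that treats a same-strain normal as the normal sample.

    All paths and the sample name are hypothetical placeholders.
    """
    return [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "-I", normal_bam,
        # Sample name (SM tag) of the normal BAM, not its path.
        "-normal", normal_sample,
        "-O", out_vcf,
    ]

cmd = mutect2_with_strain_normal(
    "mm39.fa", "tumor.bam", "strain_normal.bam",
    "strain_normal", "tumor_vs_strain_normal.vcf.gz",
)
print(shlex.join(cmd))
```

Building the argument list first and printing it makes it easy to check the pairing of tumor and normal before submitting the job.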

  • yb87625

    Thanks for your help! Unfortunately, we did not sequence the normal tissue.

    So far, I have used the dbSNP file downloaded from the Mouse Genome Project as the known-sites file for BaseRecalibrator. Is that fine for this step?

    Also, for the steps after Mutect2 calling, the GATK Best Practices workflow (https://gatk.broadinstitute.org/hc/en-us/articles/360035894731-Somatic-short-variant-discovery-SNVs-Indels-) says that GetPileupSummaries and CalculateContamination are needed. Are these steps necessary for non-FFPE tumor tissue? My samples are fresh tumor tissue. If they are needed, can I still use the dbSNP file as the germline resource for filtering?

  • SkyWarrior

    Hi yb87625

    The dbSNP file for mice may not necessarily reflect your strain specifically, and for that reason I would argue against it. However, it may be well worth trying before giving up on it. My best bet would be to use whole genome sequencing data from your strain as the normal to compare against your tumor BAMs when making somatic calls. Such data could come from any of the public databases.

    My knowledge of the differences between fresh and FFPE samples in Mutect2 is limited, but it would not hurt to try those steps before you polish your final call set, since you can always go back to a previous step if you are not satisfied.
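
For orientation, the contamination steps asked about in this thread chain together roughly as in the sketch below, which only builds and prints the three command lines. The file names and the `prefix` naming scheme are hypothetical. GetPileupSummaries expects a sites VCF that carries population allele frequencies, and whether a given mouse dbSNP file satisfies that is something to check rather than assume.

```python
import shlex

def contamination_steps(reference, tumor_bam, sites_vcf, unfiltered_vcf, prefix):
    """Return the three commands in the order they would be run.

    sites_vcf is a hypothetical variant-sites file; prefix names the outputs.
    """
    pileups = f"{prefix}.pileups.table"
    contamination = f"{prefix}.contamination.table"
    return [
        # Summarize read support in the tumor at the given variant sites.
        ["gatk", "GetPileupSummaries", "-I", tumor_bam,
         "-V", sites_vcf, "-L", sites_vcf, "-O", pileups],
        # Estimate cross-sample contamination from those pileups.
        ["gatk", "CalculateContamination", "-I", pileups, "-O", contamination],
        # Filter the raw Mutect2 calls, passing in the contamination estimate.
        ["gatk", "FilterMutectCalls", "-R", reference, "-V", unfiltered_vcf,
         "--contamination-table", contamination,
         "-O", f"{prefix}.filtered.vcf.gz"],
    ]

steps = contamination_steps(
    "mm39.fa", "tumor.bam", "sites.vcf.gz", "tumor.unfiltered.vcf.gz", "tumor"
)
for step in steps:
    print(shlex.join(step))
```

FilterMutectCalls can also be run without `--contamination-table`, so the first two steps can be dropped from the list if no suitable sites file exists for the strain.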

  • Arsen Arakelyan

    Hi, 

    Thanks for the hints on mouse variant calling. We are trying to identify somatic mutations that arise during animal aging. We have one 3-month-old sample; the other samples are older. Can I use that single sample to create a PoN and run my other samples against it in tumor-only mode? And since not many mouse databases are available, are there any recommendations on how to filter the resulting variants? Even after FilterMutectCalls there are many variants with very low DP.

    Thanks.

  • Gökalp Çelik

    Hi Arsen Arakelyan

    Mutect2 uses the PoN to eliminate sequencing and other technical artifacts from the data. It also functions properly without a PoN, so the PoN is not a required input. If you wish to eliminate germline calls from your data, you will definitely need a matched normal or a germline resource. Since mouse strains are usually highly inbred and almost genetically identical, you may use any single animal of the same strain as a matched normal.

    I hope this helps. 

  • Konstantinos Voutetakis

    I need your help with a similar problem. We have four young and five old cancerous mice whose tumors were induced by transfection with a specific mouse cancer cell line. We have exome sequencing of these mice, as well as the exomes of two young and two old normal mice. These normal samples are not paired with the tumor samples. We want to find somatic variants in the tumor samples and highlight the differences between young and old tumors. How can we run Mutect2 for somatic variant calling in the tumor samples? We also have the exome of the cancer cell line, but we have no whole genome sequencing data at all. Thank you very much in advance for your help.

  • Konstantinos Voutetakis

    I would also like to add that the normal mice are from the same strain as the cancerous ones. We would like to find the somatic mutations that developed in young and old cancerous mice after transfection with the mouse cancer cell line. Thanks again.

  • Gökalp Çelik

    Hi Konstantinos Voutetakis

    Depending on how and where you collect sequencing samples from your cancer-induced mice, there may be different options for you.

    Firstly, as mouse strains are usually inbred, using a whole genome sequencing sample from a public resource for your strain can help you remove potential germline variants from your callset. Secondly, unless you have background sequencing data from your cancer cell line, Mutect2 will produce many variants, which you may need to filter by allele fraction if you are interested in very rare variants occurring after cancer induction. I believe your cell line already carries a high number of background mutations, so your mileage may vary.

    I hope these will help. 
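
One way to picture the allele-fraction filtering mentioned above, together with the low-DP concern raised earlier in the thread, is the minimal sketch below. It reads the per-sample `AF` and `DP` FORMAT fields from a plain-text VCF data line and keeps only PASS records inside a chosen window. The thresholds and the example records are illustrative assumptions, not recommended values or real data.

```python
def sample_fields(vcf_line, sample_index=0):
    """Map FORMAT keys to values for one sample column of a VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")
    values = cols[9 + sample_index].split(":")
    return dict(zip(keys, values))

def keep_record(vcf_line, min_dp=20, min_af=0.05, max_af=0.5):
    """Keep PASS records whose depth and allele fraction fall inside the thresholds.

    The default thresholds are illustrative assumptions only.
    """
    cols = vcf_line.split("\t")
    if cols[6] != "PASS":
        return False
    fields = sample_fields(vcf_line)
    depth = int(fields["DP"])
    # Multi-allelic sites list one AF per alternate allele; use the largest.
    allele_fraction = max(float(x) for x in fields["AF"].split(","))
    return depth >= min_dp and min_af <= allele_fraction <= max_af

# Two made-up records: one well covered, one with very low depth.
well_covered = "\t".join([
    "chr1", "1000", ".", "A", "T", ".", "PASS", ".",
    "GT:AD:AF:DP", "0/1:45,5:0.1:50",
])
low_depth = "\t".join([
    "chr1", "2000", ".", "G", "C", ".", "PASS", ".",
    "GT:AD:AF:DP", "0/1:4,1:0.2:5",
])
print(keep_record(well_covered), keep_record(low_depth))
```

The upper bound on allele fraction is there on the assumption that variants inherited from the cell line sit at high fractions. Whether that holds depends on tumor purity and should be checked against the cell line exome.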
