Somatic calling pipeline using cell line data
Hi,
I'm trying to identify somatic mutations in treated MOLM-13 cell lines but I'm not certain about how I should produce one of the files required as input to the commands.
I found that you can generate a somatic panel of normals with CreateSomaticPanelOfNormals (https://gatk.broadinstitute.org/hc/en-us/articles/360037058172-CreateSomaticPanelOfNormals-BETA), which requires:
Mutect2 --germline-resource af-only-gnomad.vcf.gz
To make a somatic panel of normals you need this file, but I'm not sure how to generate it, or which public file would be appropriate, if that's possible. My reads are currently aligned to the hg38 genome.
Any advice is appreciated.
-
The file you are looking for is already available at the link below.
For the PON, we suggest using the one our team generated. For hg38 it can be found at the link below:
http://gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz
and for hg19 (b37):
gs://gatk-best-practices/somatic-b37/Mutect2-exome-panel.vcf
gs://gatk-best-practices/somatic-b37/Mutect2-WGS-panel-b37.vcf
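As a rough illustration of how the germline resource and the PON fit together, here is a minimal Python sketch that only assembles a tumor-only Mutect2 command line and prints it. The reference, BAM, and output file names are hypothetical placeholders, and the resource files are assumed to have been downloaded locally together with their index files; adjust everything to your own setup.

```python
import shlex


def mutect2_command(reference, tumor_bam, germline_resource, panel_of_normals, output_vcf):
    """Build a tumor-only Mutect2 command line as a list of arguments."""
    return [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "--germline-resource", germline_resource,
        "--panel-of-normals", panel_of_normals,
        "-O", output_vcf,
    ]


# All file names below are placeholders, not files named in this thread,
# except the two resources, which are assumed to be local copies.
cmd = mutect2_command(
    reference="reference.hg38.fasta",
    tumor_bam="molm13_treated.bam",
    germline_resource="af-only-gnomad.vcf.gz",
    panel_of_normals="1000g_pon.hg38.vcf.gz",
    output_vcf="molm13_treated.unfiltered.vcf.gz",
)

# Print a shell-safe version of the command for review before running it.
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run once the paths point at real files.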
I hope this helps.
-
Thank you. I wasn't certain if it was still appropriate to use the public files with this dataset so thanks for clarifying that.
-
Hi Gökalp Çelik,
Sorry to ask more questions, but I forgot about the base recalibration step and am struggling to find examples that state which files should be used for this step, as there seem to be several.
So, what file should I be using for BaseRecalibrator --known-sites?
And how do I access http://gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz ? It just says there's a typo in gs when I try to access it. I'm assuming there's an alternative way to access this, but I can't find information on it.
-
Sorry, my gs:// links got an http:// prefix added while typing. Those can be downloaded with the Google Cloud command-line tools (for example gsutil).
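For anyone hitting the same problem, here is a small Python sketch of two ways to get at a gs:// path: building a gsutil cp command, or rewriting the path into an https://storage.googleapis.com/ URL that a browser or wget can fetch. The second route assumes the bucket allows public reads, and the helper names are made up for this example.

```python
from urllib.parse import quote


def clean_gs_uri(uri):
    """Strip an accidental http:// or https:// prefix from a gs:// URI."""
    for bad in ("http://", "https://"):
        if uri.startswith(bad + "gs://"):
            return uri[len(bad):]
    return uri


def gs_to_https(gs_uri):
    """Rewrite gs://bucket/object as an HTTPS URL (works only for public objects)."""
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {gs_uri}")
    bucket, _, obj = gs_uri[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError(f"expected gs://bucket/object, got: {gs_uri}")
    return f"https://storage.googleapis.com/{bucket}/{quote(obj)}"


def gsutil_cp_command(gs_uri, dest="."):
    """Build a gsutil copy command as a list of arguments."""
    return ["gsutil", "cp", gs_uri, dest]


pon = clean_gs_uri("http://gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz")
print(pon)
print(gs_to_https(pon))
print(" ".join(gsutil_cp_command(pon)))
```

GATK expects an index next to each VCF, so the matching index file should be fetched the same way if one is present in the bucket.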
For the known sites you can use the resource bundle stored at the link below:
https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0
-
Hi Gökalp Çelik,
Sorry, I feel like I'm missing something obvious, but how do I know which of those files I should use? I've found a lot of links talking about them, but nothing clarifying why you should choose a particular known-sites file.
-
Sorry, I forgot to clarify. For the known sites you may use the dbSNP resource as the known variant sites for recalibration. You may add additional SNP and indel sites if you wish, each with another --known-sites parameter.
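To make the repeated parameter concrete, here is a minimal Python sketch that assembles a BaseRecalibrator command with one --known-sites flag per resource. The BAM, reference, and output names are hypothetical, and the resource file names are what I would expect to find in the hg38 bundle, so please check them against the actual bucket listing before relying on them.

```python
import shlex


def base_recalibrator_command(reference, input_bam, known_sites, output_table):
    """Build a BaseRecalibrator command with one --known-sites flag per VCF."""
    if not known_sites:
        raise ValueError("BaseRecalibrator needs at least one known-sites file")
    cmd = ["gatk", "BaseRecalibrator", "-R", reference, "-I", input_bam]
    for vcf in known_sites:
        cmd += ["--known-sites", vcf]
    cmd += ["-O", output_table]
    return cmd


# Assumed file names: dbSNP first, then optional extra indel resources.
known_sites = [
    "Homo_sapiens_assembly38.dbsnp138.vcf",
    "Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
    "Homo_sapiens_assembly38.known_indels.vcf.gz",
]

bqsr_cmd = base_recalibrator_command(
    reference="reference.hg38.fasta",      # placeholder
    input_bam="molm13_treated.bam",        # placeholder
    known_sites=known_sites,
    output_table="molm13_treated.recal.table",
)
print(shlex.join(bqsr_cmd))
```

Using only the dbSNP file is just a matter of passing a one-element list.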