Best Practices for Tumor-Only WES Data Analysis: Seeking Feedback on Variant Calling Pipeline
Hello everyone, I’m currently working on a pipeline for analyzing tumor-only WES data, which I understand is challenging and for which few resources are available. I’d greatly appreciate any feedback or suggestions on my current approach. Here's what I’m doing so far:
1. Somatic Variant Calling: I use Mutect2 to call somatic variants, leveraging the germline resource and a panel of normals (PoN). Since my samples are FFPE, I also collect the F1R2 files during this step.
2. FFPE Artifacts & Contamination Handling:
- I run `LearnReadOrientationModel` to model FFPE artifacts.
- I use `GetPileupSummaries` and `CalculateContamination` to estimate contamination.
- Then, I run `FilterMutectCalls` with both `--contamination-table` and `--tumor-segmentation` to apply the appropriate filters.
3. Variant Filtering:
- I use `SelectVariants` to retain only the variants that pass the filters.
- Next, I filter out variants that are common across several germline databases to reduce the likelihood of retaining germline polymorphisms.
4. Functional Annotation:
- Finally, I focus on functional filtering by retaining only variants that are confirmed in either COSMIC or OncoKB.
Given that I only have tumor data without a matched normal, do you think this approach is robust and reliable for calling somatic variants? I'm particularly interested in any suggestions on refining the contamination estimation, filtering strategy, or any best practices I might have missed.
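As a rough sketch, steps 1 to 3 above correspond to GATK commands along the following lines. The snippet only builds and prints the command lines; it does not run them. All file names and the `tumor_only_commands` helper are hypothetical placeholders. The `--ob-priors` argument is an assumption not stated in the post: it is how the output of `LearnReadOrientationModel` is normally handed to `FilterMutectCalls`.

```python
import shlex
from typing import List


def tumor_only_commands(ref: str, tumor_bam: str, germline_resource: str,
                        pon: str, common_sites: str, prefix: str) -> List[List[str]]:
    """Build GATK argument lists for steps 1 to 3 (hypothetical helper)."""
    # Output file names are placeholders derived from the sample prefix.
    unfiltered = f"{prefix}.unfiltered.vcf.gz"
    f1r2 = f"{prefix}.f1r2.tar.gz"
    ob_model = f"{prefix}.read-orientation-model.tar.gz"
    pileups = f"{prefix}.pileups.table"
    contamination = f"{prefix}.contamination.table"
    segments = f"{prefix}.segments.table"
    filtered = f"{prefix}.filtered.vcf.gz"
    passing = f"{prefix}.pass.vcf.gz"
    return [
        # Step 1: tumor-only calling with germline resource, PoN and F1R2 counts.
        ["gatk", "Mutect2", "-R", ref, "-I", tumor_bam,
         "--germline-resource", germline_resource,
         "--panel-of-normals", pon,
         "--f1r2-tar-gz", f1r2, "-O", unfiltered],
        # Step 2a: learn the read-orientation (FFPE/OxoG-style) artifact model.
        ["gatk", "LearnReadOrientationModel", "-I", f1r2, "-O", ob_model],
        # Step 2b: pileups at common sites, then contamination and segmentation.
        ["gatk", "GetPileupSummaries", "-I", tumor_bam,
         "-V", common_sites, "-L", common_sites, "-O", pileups],
        ["gatk", "CalculateContamination", "-I", pileups,
         "--tumor-segmentation", segments, "-O", contamination],
        # Step 2c: filtering. --ob-priors is an assumption (not in the post).
        ["gatk", "FilterMutectCalls", "-R", ref, "-V", unfiltered,
         "--contamination-table", contamination,
         "--tumor-segmentation", segments,
         "--ob-priors", ob_model, "-O", filtered],
        # Step 3: keep only records that passed all filters.
        ["gatk", "SelectVariants", "-R", ref, "-V", filtered,
         "--exclude-filtered", "-O", passing],
    ]


if __name__ == "__main__":
    for cmd in tumor_only_commands("reference.fasta", "tumor.bam",
                                   "germline_resource.vcf.gz", "pon.vcf.gz",
                                   "common_sites.vcf.gz", "sample1"):
        print(shlex.join(cmd))
```

Keeping each command as an argument list makes it easy to review the exact flags before passing them to `subprocess.run` or a workflow manager.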
Also, I want to call CNVs with cnvkit. Can I use the contamination estimate produced by Mutect2 for the `-m clonal --purity` step? Thanks in advance for any insights!
-
Hi georgepou
You seem to be doing everything that is possible with a tumor-only calling approach, although we do not usually recommend tumor-only calling. Without matched normals for your tumor samples, there is not much more you can do, except lower the `--initial-tumor-lod` parameter to retain ultra-low-frequency calls, which could be important depending on the sample type you are working with.
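A minimal sketch of how that parameter could be added to a Mutect2 command line is shown here. The `with_initial_tumor_lod` helper and the file names are hypothetical, and the value `1.0` is purely illustrative rather than a recommended setting. A lower threshold is likely to let more low-evidence sites through to the filtering stage.

```python
from typing import List


def with_initial_tumor_lod(mutect2_cmd: List[str], lod: float) -> List[str]:
    """Return a copy of a Mutect2 argument list with --initial-tumor-lod set.

    Hypothetical helper: it only edits the argument list and does not run GATK.
    """
    if "--initial-tumor-lod" in mutect2_cmd:
        raise ValueError("--initial-tumor-lod is already set")
    return mutect2_cmd + ["--initial-tumor-lod", str(lod)]


# Placeholder file names; 1.0 is an illustrative value, not a recommendation.
base = ["gatk", "Mutect2", "-R", "reference.fasta",
        "-I", "tumor.bam", "-O", "unfiltered.vcf.gz"]
sensitive = with_initial_tumor_lod(base, 1.0)
print(" ".join(sensitive))
```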
For the CNV approach, we have our own Somatic CNV calling workflow; however, it also requires a "Panel of Normals" for proper CNV segmentation. We do not provide support for other algorithms, so with cnvkit your mileage may vary.
I hope this helps.
Regards.
-
I'll give --initial-tumor-lod a try.
Thank you so much for your feedback; it’s always helpful to get confirmation, especially when working with uncommon workflows.
-
Hi georgepou
I have a similar setup of tumor-only WES samples that I am trying to run somatic calling on. However, I didn't see any mention of interval files in your post. Did you use target exome interval files for your samples? If so, did you get them from the sequencing company or from the GATK resource bundle?
I have found the following file, but I'm not sure whether it is the correct file to use for exome sequences. File: genomics-public-data/resources/broad/hg38/v0/wgs_calling_regions.hg38.interval_list
-
We recommend using the interval files from the original manufacturer for your whole exome based work.
Regards.
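One quick way to sanity-check an interval file is to total the bases it covers: an exome capture design typically spans tens of megabases, whereas whole-genome calling regions span most of the genome. The sketch below assumes the Picard-style `interval_list` layout (header lines starting with `@`, then tab-separated contig, 1-based inclusive start and end). The `interval_list_span` helper and the example records are hypothetical.

```python
from typing import Iterable


def interval_list_span(lines: Iterable[str]) -> int:
    """Sum the bases covered by the records of a Picard-style interval_list.

    Overlapping intervals are counted more than once; this is a rough check only.
    """
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and the SAM-style header
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        total += end - start + 1  # coordinates are 1-based and inclusive
    return total


# Tiny in-memory example with made-up targets.
example = [
    "@HD\tVN:1.6\n",
    "@SQ\tSN:chr1\tLN:248956422\n",
    "chr1\t100\t199\t+\ttarget_1\n",
    "chr1\t1000\t1049\t+\ttarget_2\n",
]
print(interval_list_span(example))  # 100 + 50 = 150 bases
```

For a real file, the same function can be applied to an open file handle, for example `interval_list_span(open("targets.interval_list"))`, where the file name is a placeholder.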
-
Thank you for the response, Gökalp Çelik