How to compensate for the absence of normals for a few samples?
Dear Team
I currently have 50 breast cancer samples. However, for 15 of them I don't have adjacent normals. I wish to call somatic mutations. I know that for those 15 samples I can call variants in tumor-only mode. But how can one justify the downstream analysis when some variant data come from tumor-normal pairs and some from tumor-only calls? Can one publish that data?
Should I create simulated normals?
-
All you can do is be honest about the fact that these 15 samples will have a lot of false positives. Just to give a rough idea of this, suppose you hard-filtered every variant in gnomAD, even singletons, from your tumor-only calls. You would still expect several tens of thousands of unique germline variants per genome to remain in your calls, with little recourse. If your tumor sample is very impure you can distinguish somatic calls from germline by their allele fractions, which FilterMutectCalls tries to do, but this is usually not very effective.
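To make the allele-fraction point concrete, here is a minimal sketch of the arithmetic. It assumes a clonal, heterozygous somatic variant at a diploid, copy-neutral locus, which is an idealization; the function name `expected_somatic_af` is made up for illustration and is not part of GATK. Under those assumptions the expected somatic allele fraction is half the tumor purity, while a germline heterozygous variant sits near 0.5 regardless of purity.

```python
GERMLINE_HET_AF = 0.5  # expected allele fraction of a germline heterozygous variant


def expected_somatic_af(purity: float) -> float:
    """Expected allele fraction of a clonal heterozygous somatic variant.

    Assumes a diploid, copy-neutral locus (an illustrative simplification):
    only tumor cells carry the variant, on one of their two copies.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    return purity / 2.0


# The lower the purity, the larger the gap between somatic and germline-het
# allele fractions; at high purity the two overlap.
for purity in (0.9, 0.5, 0.2):
    somatic = expected_somatic_af(purity)
    gap = GERMLINE_HET_AF - somatic
    print(f"purity={purity:.1f}  somatic AF ~{somatic:.2f}  gap to germline het={gap:.2f}")
```

In real data, copy-number changes, subclonality, and sampling noise at typical read depths blur this separation considerably.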
Whatever you do, do not simulate normals. There is no point. Just use the panel of normals and af-only gnomAD from the GATK best practices Google bucket.
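As a rough sketch of what a tumor-only run with those two resources might look like, the snippet below only assembles and prints the command lines; it does not run them. All file names are placeholders, not the actual names in the bucket, and the helper function names are hypothetical. The flags shown (`-R`, `-I`, `--germline-resource`, `--panel-of-normals`, `-V`, `-O`) are the standard Mutect2 and FilterMutectCalls arguments.

```python
import shlex
from typing import List


def mutect2_tumor_only_cmd(reference: str, tumor_bam: str, germline_resource: str,
                           panel_of_normals: str, unfiltered_vcf: str) -> List[str]:
    """Build a tumor-only Mutect2 command (no matched-normal input is given)."""
    return [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "--germline-resource", germline_resource,  # af-only gnomAD VCF
        "--panel-of-normals", panel_of_normals,    # PoN VCF
        "-O", unfiltered_vcf,
    ]


def filter_mutect_calls_cmd(reference: str, unfiltered_vcf: str,
                            filtered_vcf: str) -> List[str]:
    """Build the follow-up FilterMutectCalls command."""
    return [
        "gatk", "FilterMutectCalls",
        "-R", reference,
        "-V", unfiltered_vcf,
        "-O", filtered_vcf,
    ]


# Placeholder paths: substitute your own reference, BAM, and resource files.
call_cmd = mutect2_tumor_only_cmd(
    "reference.fasta", "tumor_sample.bam",
    "af-only-gnomad.vcf.gz", "pon.vcf.gz", "tumor_sample.unfiltered.vcf.gz",
)
filter_cmd = filter_mutect_calls_cmd(
    "reference.fasta", "tumor_sample.unfiltered.vcf.gz", "tumor_sample.filtered.vcf.gz",
)

print(shlex.join(call_cmd))
print(shlex.join(filter_cmd))
```

The same two steps apply to the paired samples, with the matched normal added as a second input; the tumor-only samples simply omit it.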
-
Thanks for your prompt response. How about calling variants using different tools like DeepVariant, Mutect2, and VarScan and then intersecting the VCF files to rule out false positives? Will that help? We are planning to do this for all the samples anyway.
I wish to understand whether that will help for those 15 samples.