GermlineCNVCaller - I don't have a cohort, use public data?
I would like to call CNVs with GermlineCNVCaller, but I don't have a large cohort. I do have access to multiple trios, but they were generated over many years, so the sequencing and library preps are quite different. Is it reasonable to build a model from public data instead, like 1000 Genomes? If so, I'd expect that someone had done it already and that I could download it, but I can't find anything like that.
-
Hi paalmbj
Short answer: Don't even bother.
Long and convoluted answer: There may be a duct-tape solution you could try. Look at the 1000G whole exome files, but capture kit compatibility will be your archnemesis. You would need to find samples with adequate coverage and a capture kit similar to yours. If you can't, you may need to limit the call regions to the intersection of your capture kit and the ones used in 1000G. Either way, because the probes are designed differently, you will end up with many false calls. Even read lengths will be a problem for you. A more peaceful way to solve this is to gather as many similar samples as you can from your own collection, and maybe contact your sequencing center and ask for some anonymized data sequenced with the same kit, so that you can collect at least 20-30 samples to run a cohort analysis.
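To illustrate what limiting the call regions to the intersection of two capture kits could look like, here is a minimal Python sketch. It assumes both kits are available as BED-style lines (chromosome, start, end) on the same reference and with the same contig naming. The function names are made up for this example and are not part of GATK.

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or touching intervals in a sorted list of (start, end)."""
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def parse_bed(lines):
    """Parse BED-style lines into {chrom: sorted, merged list of (start, end)}."""
    by_chrom = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and the usual BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    return {chrom: merge(sorted(ivs)) for chrom, ivs in by_chrom.items()}


def intersect(a, b):
    """Two-pointer intersection of two sorted, non-overlapping interval lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def intersect_kits(bed_a, bed_b):
    """Return the regions covered by both kits, per chromosome."""
    a, b = parse_bed(bed_a), parse_bed(bed_b)
    return {chrom: intersect(a[chrom], b[chrom]) for chrom in a if chrom in b}


# Toy intervals, not real capture kit targets.
my_kit = ["chr1\t100\t200", "chr1\t300\t400", "chr2\t0\t50"]
public_kit = ["chr1\t150\t350", "chr3\t0\t10"]
shared = intersect_kits(my_kit, public_kit)
```

Even with the regions restricted like this, the differences in probe design would still affect coverage inside the shared regions, so this only narrows the problem.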
Good luck.
-
Sorry, my bad. You may use 1000G whole genome samples if you wish. Just make sure that they were processed with the same reference genome as yours. I used to do that, and I even included some of the samples with known deletions and duplications in publications so that I could test my own workflow.
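One rough way to check the "same reference genome" condition is to compare the contig names and lengths in the alignment headers. The sketch below assumes you have the header text of one of your own files and one public file (for example from `samtools view -H`). The function names are invented for this example, and matching names and lengths is only a necessary check, not proof that the references are identical.

```python
def read_contigs(header_lines):
    """Collect {contig name: length} from the @SQ lines of a SAM/BAM/CRAM header."""
    contigs = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs; split on the first colon only.
        fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def compare_references(mine, public):
    """List contigs from `mine` that are missing or have a different length in `public`."""
    problems = []
    for name, length in mine.items():
        if name not in public:
            problems.append(f"{name}: missing from public sample")
        elif public[name] != length:
            problems.append(f"{name}: length {length} vs {public[name]}")
    return problems


# Toy headers: one with "chr"-prefixed names, one with an unprefixed chromosome 1.
my_header = ["@HD\tVN:1.6", "@SQ\tSN:chr1\tLN:248956422", "@SQ\tSN:chrM\tLN:16569"]
public_header = ["@SQ\tSN:1\tLN:249250621", "@SQ\tSN:chrM\tLN:16569"]
problems = compare_references(read_contigs(my_header), read_contigs(public_header))
```

An empty list means every contig in your own header was found with the same length in the public one. Anything else suggests the public sample would need to be realigned before it goes into the same cohort.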
-
Thanks for the short and long answers SkyWarrior! Most of the samples have WGS data, sorry I didn't specify that. Would the short answer change, supposing I can find a lot of public 30x WGS data?
-
Great, thanks! It's a fair bit of work and I'll try to come back here and share the scripts or result if it works out for me.