Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GermlineCNVCaller - I don't have a cohort, use public data?

4 comments

  • SkyWarrior

    Hi paalmbj

    Short answer: Don't even bother. 

    Long and convoluted answer: There may be a duct-tape solution you can try. Check the 1000G whole-exome files, but capture kit compatibility will be your archnemesis. You need to find samples with adequate coverage and a capture kit similar to yours. If you can't, you may need to limit the call regions to the intersection of your capture kit and the ones used for 1000G. Either way, because the probes are designed differently, you will end up with many false calls. Even read lengths will be a problem for you. A more peaceful way to solve this is to gather as many similar samples as you can from your own collection, and perhaps ask your sequencing center to provide some anonymous data sequenced with the same kit, so that you can collect at least 20-30 samples to run a cohort analysis.

    Good luck. 

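
To illustrate the suggestion above of limiting call regions to the intersection of two capture kits, here is a minimal sketch in plain Python. It assumes each kit's targets have already been loaded as BED-style, half-open `(contig, start, end)` tuples; the function names `merge` and `intersect` and the example coordinates are hypothetical and are not part of GATK. In practice a dedicated interval tool would likely be used instead.

```python
from typing import List, Tuple

# (contig, start, end), BED-style: 0-based start, end exclusive
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or touch on the same contig."""
    merged: List[Interval] = []
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (contig, prev[1], max(prev[2], end))
        else:
            merged.append((contig, start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the regions covered by both interval lists."""
    a, b = merge(a), merge(b)
    shared: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        contig_a, start_a, end_a = a[i]
        contig_b, start_b, end_b = b[j]
        if contig_a != contig_b:
            # Both lists are sorted the same way, so skip the smaller contig.
            if contig_a < contig_b:
                i += 1
            else:
                j += 1
            continue
        start, end = max(start_a, start_b), min(end_a, end_b)
        if start < end:
            shared.append((contig_a, start, end))
        # Advance whichever interval ends first.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical targets for illustration only.
my_kit = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
public_kit = [("chr1", 250, 400), ("chr2", 10, 60), ("chr3", 0, 10)]
shared_targets = intersect(my_kit, public_kit)
```

Only regions present in both kits survive, so the shared target set can be much smaller than either kit on its own, and it does not remove the probe-design and read-length differences mentioned above.
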
  • SkyWarrior

    Sorry, my bad. You may use 1000G whole-genome samples if you wish. Just make sure that they were processed with the same reference genome as yours. I used to do that, and I even included some samples with known deletions and duplications from publications so that I could test my own workflow.

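
One way to check the "same reference genome" condition above is to compare the `@SQ` lines of the alignment headers, which record each contig's name (`SN`) and length (`LN`). The sketch below assumes the header text has already been extracted (for example with `samtools view -H`); the helper names `contigs_from_header` and `reference_mismatches` are hypothetical. Matching names and lengths is a necessary check rather than proof that the sequences are identical.

```python
from typing import Dict, List


def contigs_from_header(header_text: str) -> Dict[str, int]:
    """Map contig name -> length from the @SQ lines of a SAM/BAM header."""
    contigs: Dict[str, int] = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def reference_mismatches(mine: str, public: str) -> List[str]:
    """List contigs that are missing from one header or differ in length."""
    a, b = contigs_from_header(mine), contigs_from_header(public)
    problems: List[str] = []
    for name in sorted(set(a) | set(b)):
        if name not in a or name not in b:
            problems.append(f"{name}: present in only one header")
        elif a[name] != b[name]:
            problems.append(f"{name}: length {a[name]} vs {b[name]}")
    return problems


# Illustrative headers: one with "chr"-prefixed names, one mixing in an unprefixed contig.
my_header = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chr2\tLN:242193529\n"
public_header = "@SQ\tSN:1\tLN:249250621\n@SQ\tSN:chr2\tLN:242193529\n"
problems = reference_mismatches(my_header, public_header)
```

An empty list means the contig names and lengths agree; any entry suggests the public samples would need to be realigned to your reference before being added to the cohort.
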
  • paalmbj

    Thanks for the short and long answers, SkyWarrior! Most of the samples have WGS data; sorry I didn't specify that. Would the short answer change, supposing I can find a lot of public 30x WGS data?

  • paalmbj

    Great, thanks! It's a fair bit of work, and I'll try to come back here and share the scripts or results if it works out for me.

