Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Panel of normals for CNV exome analysis using WGS datasets

4 comments

  • Gökalp Çelik

    Hi Krishna

    Creating a panel of normals is not the only option in the germline CNV calling workflow. You can also run all of your samples together and call CNVs in cohort mode. If you have around 30 samples, a cohort-mode run also generates a model. Postprocessing your samples after that produces the CNV calls.

    As for publicly available exome and genome sets: using genome sets is doable, but exome sets are a problem. Public exome data sets may not have been prepared with the same capture kit as yours, so you would be dealing with many false positive or false negative calls caused by the differences between capture kits. For genomes this is not an issue, since the whole genome is sequenced without any capture bias.

    I hope this helps. 

  • Krishna

    Thank you for the reply, Gökalp Çelik.

    I'm curious to know if it's feasible to employ publicly accessible whole-genome datasets to create a panel of normals (PoN) file and subsequently utilize it for calling copy number variations (CNVs) in exome samples. 

  • Gökalp Çelik

    Hi Krishna

    Since the two data types are fundamentally different from each other in areas such as coverage, read distribution, and depth, that approach may not be very feasible.

    If you need more samples for exome CNV calling, here are a few more suggestions from my personal experience:

    1- You can perform cohort-level calling with as few as 10 samples. This is not optimal for long-term goals, but you will be able to get your results with a little more filtering. Some common events may end up appearing as unique CNVs in the joint call file, but you can annotate the calls using databases such as DECIPHER to obtain the frequency of such CNVs in already published data.

    2- You can try using whole-exome data from public resources such as the 1000 Genomes Project. However, since the coverage and target regions are most likely different from those of your own samples, you may need to do some homework before using those samples alongside your cohort. The most obvious solution is to generate an intersection of the target regions covered by the public exome data sets and by your capture kit. This gives a common consensus set and removes regions that are not covered by both the public data and your data. The remaining regions can then be used to call CNVs.
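
    The target-region intersection described in suggestion 2 can be sketched in a few lines of Python. This is only an illustration, not a GATK tool: it assumes both capture kits are described by BED-style intervals (0-based, half-open) on the same reference build, and the function names and example coordinates are made up for the sketch. Dedicated interval tools do the same job on real files.

```python
from typing import Iterable, List, Tuple

# (contig, start, end), BED-style: 0-based start, end not included
Interval = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse BED lines into (contig, start, end) tuples, skipping header lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        contig, start, end = line.split()[:3]
        intervals.append((contig, int(start), int(end)))
    return intervals


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or touch on the same contig."""
    merged: List[Interval] = []
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (contig, prev[1], max(prev[2], end))
        else:
            merged.append((contig, start, end))
    return merged


def intersect(a: Iterable[Interval], b: Iterable[Interval]) -> List[Interval]:
    """Return the regions covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = 0
    shared: List[Interval] = []
    while i < len(a) and j < len(b):
        contig_a, start_a, end_a = a[i]
        contig_b, start_b, end_b = b[j]
        if contig_a != contig_b:
            # Both lists use the same sort order, so advance the one that is behind.
            if contig_a < contig_b:
                i += 1
            else:
                j += 1
            continue
        start, end = max(start_a, start_b), min(end_a, end_b)
        if start < end:
            shared.append((contig_a, start, end))
        # Advance whichever interval ends first.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return shared


def total_bases(intervals: Iterable[Interval]) -> int:
    """Number of bases covered by a merged interval set."""
    return sum(end - start for _, start, end in merge(intervals))


# Made-up example coordinates, not taken from any real capture kit.
my_kit = parse_bed(["chr1\t100\t200", "chr1\t300\t400", "chr2\t50\t150"])
public_kit = parse_bed(["chr1\t150\t350", "chr2\t0\t60", "chr3\t10\t20"])

common = intersect(my_kit, public_kit)
retained = total_bases(common) / total_bases(my_kit)
print(common)
print(f"fraction of my kit retained: {retained:.2f}")
```

    Checking the retained fraction before committing to the public samples gives a rough idea of how much of your own target space would be lost by restricting the analysis to the common regions.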

    I hope these help.

  • Krishna

    Thank you so much, Gökalp Çelik.

