Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Somatic CNV Workflow Tutorials


  • Laura Gauthier

    Hi Yuwei Bao,

    The sites file can be specific to your data (e.g. from HaplotypeCaller results) or based on common variation, like the 1000 Genomes SNP file from the VQSR resource bundle.

    As for the plots, they will be more informative once you get to the PlotModeledSegments part of the workflow. By then a lot of the noise will be smoothed out and the event boundaries will be clearer. The remaining calls will also be more confident, while copy ratios that lack enough evidence to be called as amplifications will be called as copy neutral.

    The germline workflow does output VCF files, which is not the case for the somatic workflow. Traditionally, that has been a difference in the file types preferred by the two fields.
