Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

(How to part II) Sensitively detect copy ratio alterations and allelic segments

2 comments

  • Bilyana Stoilova

    Hi,
    First of all, thank you for putting these two tutorials together. This is the best WES CNV pipeline I have used. Could I check a few settings with you? When using CallCopyRatioSegments, the default parameters are --neutral-segment-copy-ratio-lower-bound 0.9 and --neutral-segment-copy-ratio-upper-bound 1.1, which correspond to a heterozygous copy number gain or loss in 20% of cells. This seems quite a high threshold, and I am wondering whether my data will allow me to call, for example, a heterozygous copy number gain or loss in 5% of cells by changing the parameters to --neutral-segment-copy-ratio-lower-bound 0.975 --neutral-segment-copy-ratio-upper-bound 1.025. To do so, I need to understand what my background is and make sure my calls are not just noise. Do you have a suggestion for how I can define the background level and set a sample-specific threshold to confidently call CNV gains and losses at lower frequencies?
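
The relationship between the neutral bounds and the fraction of affected cells can be sketched in a few lines of Python. This is a minimal sketch, not part of GATK: it assumes a diploid baseline, a single-copy (heterozygous) gain or loss, and no other source of signal in the bulk sample. The function names are hypothetical.

```python
def expected_copy_ratio(cell_fraction, altered_copies, normal_copies=2):
    """Bulk copy ratio when `cell_fraction` of cells carry `altered_copies`."""
    # Average copy number across the mixture of altered and unaltered cells.
    mean_copies = (1 - cell_fraction) * normal_copies + cell_fraction * altered_copies
    return mean_copies / normal_copies


def implied_cell_fraction(copy_ratio, altered_copies, normal_copies=2):
    """Invert expected_copy_ratio: fraction of cells needed to reach `copy_ratio`."""
    return (copy_ratio - 1) * normal_copies / (altered_copies - normal_copies)


# Assumption: heterozygous loss = 1 copy, heterozygous gain = 3 copies.
for fraction in (0.20, 0.05):
    loss = expected_copy_ratio(fraction, altered_copies=1)
    gain = expected_copy_ratio(fraction, altered_copies=3)
    print(f"{fraction:.0%} of cells: loss -> {loss:.3f}, gain -> {gain:.3f}")
```

Under these assumptions the copy ratio is 1 ± f/2 for a cell fraction f, which reproduces the 0.9/1.1 bounds for 20% of cells and the 0.975/1.025 bounds for 5% of cells quoted in the question. Whether a 2.5% shift is distinguishable from noise in a given sample is a separate question that this arithmetic does not address.
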
    Furthermore, the output from ModelSegments has a column called NUM_POINTS_COPY_RATIO. What does this value mean? I noticed that for some segments with very high or very low copy ratio values, this number is very low. For example, in this sample with monosomy 7, NUM_POINTS_COPY_RATIO is 1 in lines 4 and 9 of the table below (counting the header row):

    CONTIG  START      END        NUM_POINTS_COPY_RATIO  MEAN_LOG2_COPY_RATIO
    chr7    31127      2610604    371                    -0.970608
    chr7    2631917    6764353    545                    -0.968955
    chr7    6765438    6766161    1                      -29.219897
    chr7    6805656    38260513   1695                   -0.975855
    chr7    38261822   38358772   15                     0.38893
    chr7    38358773   38385336   6                      -0.918812
    chr7    38389472   73111427   1497                   -0.979949
    chr7    73184300   73184960   1                      -29.526709
    chr7    73192104   77027191   584                    -1.027995
    chr7    77027848   77056891   4                      -15.357203
    chr7    77058881   77736804   89                     -1.367833
    chr7    77749237   139065288  4597                   -0.987914
    chr7    139073532  139080077  5                      -1.073054
    chr7    139080078  142346386  490                    -0.95668
    chr7    142618867  142797361  27                     0.165386
    chr7    142797362  144187184  232                    -0.957674
    chr7    144258427  144276715  6                      -2.782032
    chr7    144282040  144318694  7                      -0.972986
    chr7    144362468  144372526  6                      -11.344774
    chr7    144372527  144375110  3                      -3.65833
    chr7    144376855  150860905  407                    -0.97085
    chr7    150860906  159144997  729                    -0.968317

    Do I need to exclude these lines with low NUM_POINTS_COPY_RATIO values from the analysis? I see they appear in the tutorial too, but what they mean is not discussed there. Thank you, Bilyana
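
One way to inspect such segments before deciding whether to exclude them is to convert MEAN_LOG2_COPY_RATIO back to a linear copy ratio and flag segments supported by few data points. The sketch below is illustrative only. It assumes NUM_POINTS_COPY_RATIO counts the copy-ratio intervals (bins or targets) that fall inside a segment, and that any header lines starting with `@` can be skipped. The `min_points` cutoff of 10 and the function name are hypothetical, not GATK recommendations. The rows are a subset of the chr7 table quoted above.

```python
# Subset of the chr7 rows quoted in the comment, whitespace-delimited as posted.
SEGMENTS = """\
CONTIG START END NUM_POINTS_COPY_RATIO MEAN_LOG2_COPY_RATIO
chr7 31127 2610604 371 -0.970608
chr7 6765438 6766161 1 -29.219897
chr7 38261822 38358772 15 0.38893
chr7 77027848 77056891 4 -15.357203
chr7 77749237 139065288 4597 -0.987914
"""


def flag_sparse_segments(text, min_points=10):
    """Parse segment rows and mark those with fewer than `min_points` points.

    `min_points` is a hypothetical, user-chosen cutoff.
    """
    # Assumption: lines starting with '@' are file header lines, not data.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("@")]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        rec = dict(zip(header, line.split()))
        num_points = int(rec["NUM_POINTS_COPY_RATIO"])
        log2_ratio = float(rec["MEAN_LOG2_COPY_RATIO"])
        rows.append({
            "contig": rec["CONTIG"],
            "start": int(rec["START"]),
            "end": int(rec["END"]),
            "num_points": num_points,
            "copy_ratio": 2.0 ** log2_ratio,  # undo the log2 transform
            "sparse": num_points < min_points,
        })
    return rows


for row in flag_sparse_segments(SEGMENTS):
    label = "SPARSE" if row["sparse"] else "ok"
    print(f'{row["contig"]}:{row["start"]}-{row["end"]} '
          f'n={row["num_points"]} copy_ratio={row["copy_ratio"]:.3g} {label}')
```

In this subset, the segments with a log2 copy ratio near -29 have a linear copy ratio that is effectively zero and rest on a single point each, whereas the long segments near -0.97 correspond to a copy ratio of roughly 0.5, as expected for monosomy 7. Flagging rather than deleting keeps the decision about these short segments explicit.
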

  • Vincent YU

    Great instructions.

    May I ask what the estimated resolution of CNV detection is if I have whole-genome sequencing data at 100X and use a bin size of 1000 bp?

    Thank you!
