Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

CallCopyRatioSegments -- how does it work?

Answered

  • Samuel Lee

    Hi Austin Southard-Smith,

    Apologies for the confusing behavior and the scant (or, in the case of ReCapSeg, missing) documentation. I think the bit of code at https://github.com/broadinstitute/gatk/blob/9d5727df8db3a475b1ba5f9bff6bc92a322f5633/src/main/java/org/broadinstitute/hellbender/tools/copynumber/caller/SimpleCopyRatioCaller.java#L67 will answer your question. That first segment is considered neutral simply because it lies within the copy-neutral region defined by [0.9, 1.1]: its mean log2 copy ratio of 0.110692 corresponds to a copy ratio of 2^0.110692 ≈ 1.08.

    For a segment to be called duplicated or deleted, it must lie outside this region AND beyond the z-score thresholds; all other segments are considered neutral. It's been (quite) a while since I developed this tool, but I believe we were trying to replicate the spirit of the original ReCapSeg calculation (which was itself relatively naive), while also changing the method to avoid undesired behavior in some cases.

    Just for historical interest, this tool was intended as a placeholder of sorts until a more sophisticated method that also incorporates the allele-fraction data (which is not used here, as the tool documentation alludes to) could be developed to replace it. Unfortunately, our development roadmap changed, and although various members of our group have experimented with prototypes, we were never able to fully develop a replacement method to our satisfaction.

  • Austin Southard-Smith

    Hi Samuel Lee,

    Thank you for your response. It is very helpful, and I now realize I was previously using the reported segment means improperly. That code block is helpful too. Just one further point of clarification: for my example data above, the z-score thresholds [1.010396 - 2*0.025906, 1.010396 + 2*0.025906] = [0.958584, 1.062208] lie within the bounds of the defined copy-neutral region [0.9, 1.1]. In that case, it appears that the z-score thresholds do not matter, since they are only used to evaluate segments outside the defined copy-neutral region. Is that interpretation correct?

    Thanks!

  • Samuel Lee

    Yes, that's correct: in that case the z-score thresholds do not come into play. You might want to adjust the parameters to make calling more sensitive. If I recall correctly, these default parameters were tuned for a clinical pipeline and are correspondingly conservative.

    However, note that z-score thresholding of segments within these bounds does come into play when calculating the "Length-weighted mean for z-score calling (CR space)". The gist is that, without this thresholding, outliers in the distribution of segment means within the copy-neutral region would skew this length-weighted mean. It may be helpful to read each line of the log output to understand the corresponding step of the calling method.

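The calling procedure described in this thread (copy-neutral bounds of [0.9, 1.1] in copy-ratio space, an outlier-filtered length-weighted mean and standard deviation computed from the copy-neutral segments, then z-score calls for segments outside the bounds) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not GATK's Java implementation in SimpleCopyRatioCaller: the names Segment, copy_ratio, length_weighted_stats and call_segments are hypothetical, the plain length-weighted variance is an assumption (GATK's exact estimator may differ), and the default z-score thresholds of 2.0 are assumptions chosen to mirror the 2*0.025906 in Austin's example.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    length: int            # segment length on the reference, in bases
    mean_log2_cr: float    # segment mean in log2 copy-ratio space


def copy_ratio(seg: Segment) -> float:
    """Convert a segment's log2 mean back to copy-ratio (CR) space."""
    return 2.0 ** seg.mean_log2_cr


def length_weighted_stats(segs: List[Segment]) -> Tuple[float, float]:
    """Length-weighted mean and standard deviation of segment copy ratios.

    Assumption: a plain length-weighted variance; GATK's estimator may differ.
    """
    if not segs:
        raise ValueError("no segments to summarize")
    total = sum(s.length for s in segs)
    mean = sum(s.length * copy_ratio(s) for s in segs) / total
    var = sum(s.length * (copy_ratio(s) - mean) ** 2 for s in segs) / total
    return mean, math.sqrt(var)


def call_segments(segs: List[Segment], lower: float = 0.9, upper: float = 1.1,
                  outlier_z: float = 2.0, calling_z: float = 2.0):
    """Return '+', '-' or '0' per segment plus the (mean, sd) used for calling."""
    def in_neutral(s: Segment) -> bool:
        return lower <= copy_ratio(s) <= upper

    # 1. Summarize the segments that fall inside the copy-neutral region.
    neutral = [s for s in segs if in_neutral(s)]
    mean0, sd0 = length_weighted_stats(neutral)

    # 2. Drop copy-neutral outliers by z-score, then recompute the
    #    length-weighted mean and sd that are used for z-score calling.
    kept = [s for s in neutral if abs(copy_ratio(s) - mean0) <= outlier_z * sd0]
    mean, sd = length_weighted_stats(kept)

    # 3. Segments inside [lower, upper] are neutral outright; the others must
    #    also clear the z-score threshold to be called.
    calls = []
    for s in segs:
        dev = copy_ratio(s) - mean
        if in_neutral(s):
            calls.append("0")
        elif dev < -calling_z * sd:
            calls.append("-")   # deletion
        elif dev > calling_z * sd:
            calls.append("+")   # amplification / duplication
        else:
            calls.append("0")   # outside the bounds but inside the z-score band
    return calls, (mean, sd)


if __name__ == "__main__":
    segs = [Segment(1000, 0.110692), Segment(1000, 0.0), Segment(1000, -0.05),
            Segment(500, 1.0), Segment(500, -1.0)]
    calls, (mean, sd) = call_segments(segs)
    print(calls)  # ['0', '0', '0', '+', '-']
```

In this toy input, the first segment (log2 mean 0.110692, copy ratio ≈ 1.08) is neutral only because it lies inside [0.9, 1.1], while the segments at copy ratios 2.0 and 0.5 fall well outside the z-score band. The '+', '-' and '0' labels are simply the ones chosen here for amplification, deletion and neutral.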

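Austin's reasoning, which Samuel confirms, amounts to an interval-containment check. The Python sketch below assumes, as described in this thread, that a segment outside the copy-neutral region is called a deletion only if its copy ratio falls below mean - z*sd and an amplification only if it exceeds mean + z*sd. The function name z_thresholds_matter is hypothetical and not part of GATK.

```python
def z_thresholds_matter(mean: float, sd: float, z: float,
                        lower: float = 0.9, upper: float = 1.1) -> bool:
    """True if some copy ratio outside [lower, upper] would still be called neutral.

    Assumed rule: outside the copy-neutral region, a segment is a deletion only
    if CR < mean - z*sd and an amplification only if CR > mean + z*sd. If that
    z-score band sits inside [lower, upper], every such segment clears the
    threshold automatically, so the z-score step never changes a call.
    """
    return mean - z * sd < lower or mean + z * sd > upper


if __name__ == "__main__":
    # Values from Austin's example in this thread (CR space).
    mean, sd, z = 1.010396, 0.025906, 2.0
    print(round(mean - z * sd, 6), round(mean + z * sd, 6))  # 0.958584 1.062208
    print(z_thresholds_matter(mean, sd, z))                   # False
```

With a wider spread (for example sd = 0.1 around a mean of 1.0), the band [0.8, 1.2] extends past [0.9, 1.1], and segments with copy ratios between 0.8 and 0.9 or between 1.1 and 1.2 would then be called neutral by the z-score step.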