Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


GATK CNV interpret Segment_Mean and MEAN_LOG2_COPY_RATIO

Answered

3 comments

  • lzhan140

    I just checked a coworker's output from the latest GATK. It seems you changed the output of both files to be the same in 4.1.6: both now report the same log2 value. My results were generated with 4.1.0.

  • Bhanu Gandham

    Hi,

    The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.

    Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.

    We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.

    For context, check out our support policy.

     

  • Samuel Lee

    Hi @lzhan140,

    Yes, we changed the output of SEGMENT_MEAN, etc. at some point to be in log2 space.

    Note that any files with a SEGMENT_MEAN column are in the "legacy" CBS-style seg-file format. This is either because they are intended to be compatible with IGV (e.g., *.igv.seg) for convenience of plotting, or because we wanted to preserve some compatibility with downstream tools or legacy functionality in CallCopyRatioSegments (which was based on an older ReCapSeg caller).

    So the quantity that appears in that column may differ slightly, depending on the use. For example, in the *.igv.seg files output by ModelSegments, as documented: "The posterior medians of the log2 copy ratio and minor-allele fraction are given in the SEGMENT_MEAN columns in the .cr.igv.seg and .af.igv.seg files, respectively." So these values should line up with the *_POSTERIOR_50 quantities reported in *.modeled.seg (which should be considered the primary output of ModelSegments).
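
One way to check this correspondence on your own outputs is to load both files and compare the two columns segment by segment. The following is a minimal sketch, not GATK code. It assumes that the .modeled.seg file starts with SAM-style header lines beginning with `@` followed by a tab-separated table containing a LOG2_COPY_RATIO_POSTERIOR_50 column, that the .cr.igv.seg file is a plain tab-separated table with a SEGMENT_MEAN (or Segment_Mean) column, and that both files list the segments in the same order. The helper names are hypothetical; check the column names against your own files.

```python
import csv
import io


def read_seg_table(text):
    """Parse a tab-separated seg table, skipping SAM-style '@' header lines."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("@")]
    return list(csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t"))


def find_column(row, name):
    """Look up a column by name, ignoring case (Segment_Mean vs SEGMENT_MEAN)."""
    for key in row:
        if key.upper() == name.upper():
            return key
    raise KeyError(name)


def max_abs_difference(modeled_text, igv_text,
                       modeled_col="LOG2_COPY_RATIO_POSTERIOR_50",  # assumed column name
                       igv_col="SEGMENT_MEAN"):                     # assumed column name
    """Largest per-segment gap between the two columns (assumes same row order)."""
    modeled = read_seg_table(modeled_text)
    igv = read_seg_table(igv_text)
    if not modeled or len(modeled) != len(igv):
        raise ValueError("files are empty or contain different numbers of segments")
    m_key = find_column(modeled[0], modeled_col)
    i_key = find_column(igv[0], igv_col)
    return max(abs(float(m[m_key]) - float(i[i_key])) for m, i in zip(modeled, igv))
```

Passing the contents of the two files to `max_abs_difference` should return a value at or near zero if the columns agree; a larger value suggests the files come from different runs or versions, or that one of the assumptions above does not hold.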

    In contrast, the quantity that appears in the SEGMENT_MEAN column in the *.cr.seg file (which is passed to CallCopyRatioSegments) is simply the mean of the log2 copy-ratio data contained in that segment. That is not the same thing as the median of the log2 copy-ratio posterior that is fit by the ModelSegments model, although it will be close. The *.cr.seg file reports the mean simply because that is the quantity the ReCapSeg-style caller in CallCopyRatioSegments expects.
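
To make the distinction concrete, the quantity described here is an ordinary arithmetic mean over the log2 copy-ratio data points that fall in a segment. Below is a minimal sketch of that calculation, not the ModelSegments implementation. It assumes each data point is a (contig, start, end, log2_copy_ratio) tuple and that a point belongs to a segment when its interval lies entirely inside the segment; the function name, tuple layout, and numbers are made up for illustration. The posterior median cannot be reproduced this way, because it comes from the fitted model rather than directly from the data.

```python
from statistics import fmean


def segment_mean_log2(points, contig, seg_start, seg_end):
    """Arithmetic mean of the log2 copy ratios of the points inside one segment."""
    values = [
        log2_cr
        for (c, start, end, log2_cr) in points
        if c == contig and start >= seg_start and end <= seg_end
    ]
    if not values:
        raise ValueError("no copy-ratio points fall inside the segment")
    return fmean(values)


# Toy data (made up): three bins inside the segment, one outside it,
# and one on a different contig.
toy_points = [
    ("1", 100, 199, -0.25),
    ("1", 200, 299, -0.5),
    ("1", 300, 399, -0.75),
    ("1", 400, 499, 1.0),
    ("2", 100, 199, 2.0),
]
print(segment_mean_log2(toy_points, "1", 100, 399))
```

Only the three bins between positions 100 and 399 on contig 1 contribute to the result; the other two points are ignored.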

    I know that's a little confusing, and it's always been our intention to replace CallCopyRatioSegments with a better caller and get rid of a lot of these legacy outputs/formats. Unfortunately, we haven't gotten around to that just yet!

