GATK ModelSegments result
Dear GATK forum,
I ran a somatic CNV analysis with ModelSegments, using allelic counts and coverage data jointly. The output is a segment file with many columns. My questions are:
1. Some of the LOG2_COPY_RATIO estimates are extreme, e.g. -29.9961. How is it possible to get values of this magnitude?
2. There are several log2 copy ratio columns (posterior 10, posterior 50, posterior 90). Which one is the most appropriate to use?
3. How can I map these segments to genes? ModelSegments doesn't provide gene-level alterations, so how can I derive gene-level amplifications or deletions from its output?
Kindly let me know.
Thanks,
Indrani
-
Hi Indrani, I hope I can answer some of your questions.
1. A log2 copy ratio of about -30 corresponds to a copy ratio of 2^-30, roughly 9.3e-10, which is effectively 0. This almost certainly indicates a homozygous deletion.
2. You'll probably want posterior 50; this is equivalent to the median of all denoised copy ratios included in that segment.
3. For mapping segments to genes you'll want to run FuncotateSegments. This takes in a `--segments` argument which requires output from CallCopyRatioSegments, which is downstream of ModelSegments. So the workflow would be: ModelSegments -> CallCopyRatioSegments -> FuncotateSegments.
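As a quick check on point 1, here is a minimal Python sketch of the log2-to-linear conversion. The function name `log2_to_copy_ratio` is made up for illustration and is not part of GATK; the value -29.9961 is the one quoted in the question.

```python
def log2_to_copy_ratio(log2_cr):
    """Convert a log2 copy ratio to a linear copy ratio (hypothetical helper)."""
    return 2.0 ** log2_cr

# A log2 copy ratio of 0 means no change relative to the reference (ratio 1).
neutral = log2_to_copy_ratio(0.0)

# The value from the question: a vanishingly small ratio, i.e. essentially no coverage.
deleted = log2_to_copy_ratio(-29.9961)

print(f"log2 CR 0        -> copy ratio {neutral}")
print(f"log2 CR -29.9961 -> copy ratio {deleted:.3e}")
```

The point is only that very negative log2 values are a way of writing "copy ratio near zero", not a sign of a numerical bug.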
To see how we typically slot these tools into a production pipeline, see https://github.com/broadinstitute/gatk/blob/master/scripts/cnv_wdl/somatic/cnv_somatic_pair_workflow.wdl.
-
Indrani, one thing to add to point 2: if you run CallCopyRatioSegments, you will get a clearer output that reports the mean log2 copy ratio (as opposed to posterior 50) along with the call for each segment (amplification, deletion, or neutral).
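If it helps, here is a rough Python sketch for pulling the called segments out of such a file. It assumes the file has SAM-style header lines starting with `@`, followed by a tab-separated table with columns named CONTIG, START, END, MEAN_LOG2_COPY_RATIO and CALL, and that calls are written as `+`, `-` and `0`; please check these against your own output. The function name and the example rows are invented for illustration.

```python
import csv
import io

def read_called_segments(text):
    """Parse called-segment text: skip '@' header lines, then read the TSV table."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("@")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    segments = []
    for row in reader:
        segments.append({
            "contig": row["CONTIG"],
            "start": int(row["START"]),
            "end": int(row["END"]),
            "mean_log2_cr": float(row["MEAN_LOG2_COPY_RATIO"]),
            "call": row["CALL"],  # assumed: '+' amplified, '-' deleted, '0' neutral
        })
    return segments

# Invented example content; real files come from CallCopyRatioSegments.
example = (
    "@HD\tVN:1.6\n"
    "CONTIG\tSTART\tEND\tNUM_POINTS_COPY_RATIO\tMEAN_LOG2_COPY_RATIO\tCALL\n"
    "1\t1000\t50000\t120\t0.02\t0\n"
    "1\t50001\t90000\t80\t-1.35\t-\n"
    "2\t2000\t70000\t150\t0.91\t+\n"
)

segments = read_called_segments(example)
# Keep only segments that were called as amplified or deleted.
altered = [s for s in segments if s["call"] != "0"]
for s in altered:
    print(s["contig"], s["start"], s["end"], s["mean_log2_cr"], s["call"])
```

For a real file you would pass in the contents read from disk, e.g. `read_called_segments(open(path).read())`.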
-
Thank you so much for the detailed explanation. I will work on these and get back to you if anything else comes up.
Indrani