How to set neutral-segment-copy-ratio for somatic CNV calling
Dear GATK Team,
First of all, thank you for putting the two CNV calling tutorials together - this is the best WES CNV pipeline I have used. I have two questions:
1. When using CallCopyRatioSegments, the default parameters are --neutral-segment-copy-ratio-lower-bound 0.9 and --neutral-segment-copy-ratio-upper-bound 1.1, which correspond to a heterozygous copy number gain or loss in 20% of cells. This seems quite a high number, and I am wondering whether my data will allow me to call, for example, a heterozygous copy number gain or loss in 5% of cells by changing the parameters to --neutral-segment-copy-ratio-lower-bound 0.975 --neutral-segment-copy-ratio-upper-bound 1.025. To do so, I need to understand what my background is and make sure my calls are not just noise. Do you have a suggestion for how I can define the background level and set a generic or sample-specific threshold to confidently call CNV gains and losses at lower frequencies?
2. The output from ModelSegments has a column called NUM_POINTS_COPY_RATIO - what does this parameter mean? I noticed that for some segments with very high or very low copy ratio values, this number is very low. For example, in this sample with monosomy 7, the second and fourth segments below have a NUM_POINTS_COPY_RATIO of 1:
CONTIG START END NUM_POINTS_COPY_RATIO MEAN_LOG2_COPY_RATIO CALL
chr7 31127 6764353 916 -0.969624 -
chr7 6765438 6766161 1 -29.219897 -
chr7 6805656 73111427 3213 -0.971284 -
chr7 73184300 73184960 1 -29.526709 -
chr7 73192104 76627026 571 -0.983072 -
chr7 76627027 76979097 3 -11.067441 -
chr7 76980343 77056891 14 -4.802967 -
chr7 77058881 77736804 89 -1.367833 -
chr7 77749237 144318694 5364 -0.980015 -
chr7 144362468 144372526 6 -11.344774 -
Do I need to exclude lines with low NUM_POINTS_COPY_RATIO values from the analysis? I see you have them in the tutorial too, but the tutorial does not discuss what they mean. I was thinking of excluding all segments with NUM_POINTS_COPY_RATIO < 100, but since I don't completely understand the parameter, I am a bit reluctant to do it. I would really appreciate your advice.
Thank you, Bilyana
-
Official comment
Thank you for your kind words about our tools. I can try to answer your questions.
1. You can definitely use the `--neutral-segment-copy-ratio-upper-bound` and `--neutral-segment-copy-ratio-lower-bound` arguments to tune the sensitivity of the caller. However, note that this caller is not ploidy/purity aware and, as you mentioned, does not take the noisiness of the copy ratio data into consideration. That means you would have to find the parameters that fit your analysis and your dataset manually. One way to estimate the background noise of a sample is to use the posterior summaries written to the `.param` files. For example, the `modelFinal.cr.param` file contains a `VARIANCE` line that gives percentiles of the posterior for the global variance parameter of the log2 copy ratio points (all segments share this parameter). You can use this distribution to estimate the background noise of your tumor sample and adjust the calling arguments accordingly.
2. The `NUM_POINTS_COPY_RATIO` field is the number of intervals that lie in a given segment (from the interval list you passed to the workflow). The more intervals a segment contains, the more likely it is to represent a real underlying event. You can also observe a direct relationship between the number of intervals and the tightness of the posterior, which is also output in the `.seg` file. You can definitely filter by `NUM_POINTS_COPY_RATIO` to improve your specificity; however, the exact threshold will depend on your analysis.
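To make point 1 more concrete, here is a rough Python sketch of the arithmetic. It is not part of GATK. It assumes a diploid baseline and a one-copy gain or loss, so an event present in a fraction f of cells moves the copy ratio by f/2, which is how 0.9/1.1 maps to 20% and 0.975/1.025 to 5%. The second function is a back-of-the-envelope estimate that treats the log2 copy ratio points as independent and Gaussian, which is a simplification. The function names, the `z` cutoff, and the variance value are illustrative assumptions; the variance is a placeholder that you would replace with a value taken from the `VARIANCE` line of your own `modelFinal.cr.param` file.

```python
import math


def neutral_bounds(cell_fraction, baseline_copies=2):
    """Copy-ratio bounds for a one-copy loss/gain in `cell_fraction` of cells.

    Assumes a diploid baseline by default (an assumption, since the caller
    itself is not ploidy/purity aware).
    """
    shift = cell_fraction / baseline_copies
    return 1.0 - shift, 1.0 + shift


def min_points_to_resolve(cell_fraction, log2_variance, z=3.0, baseline_copies=2):
    """Rough number of copy-ratio points a segment needs so that a one-copy
    loss in `cell_fraction` of cells shifts its mean log2 copy ratio by at
    least `z` standard errors.

    Assumes independent Gaussian points with variance `log2_variance`,
    so the standard error of the segment mean is sqrt(log2_variance / n).
    """
    lower, _ = neutral_bounds(cell_fraction, baseline_copies)
    shift = abs(math.log2(lower))
    return math.ceil(z ** 2 * log2_variance / shift ** 2)


# Placeholder only: substitute a value from your own modelFinal.cr.param file.
placeholder_variance = 0.01

for fraction in (0.20, 0.05):
    lower, upper = neutral_bounds(fraction)
    n = min_points_to_resolve(fraction, placeholder_variance)
    print(f"fraction={fraction:.2f} bounds=({lower:.3f}, {upper:.3f}) min_points~{n}")
```

The useful part is the trend rather than the exact numbers: the smaller the cell fraction you want to call, the more copy-ratio points a segment needs before its shift stands out from the noise, so tighter neutral bounds are best paired with a stricter `NUM_POINTS_COPY_RATIO` filter.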
Let me know if you have any more questions!
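For point 2, filtering by `NUM_POINTS_COPY_RATIO` can be done with a few lines of Python. The sketch below is only an illustration, not a GATK tool. It assumes the segment file is whitespace-delimited with a column header row, and that any SAM-style header lines start with `@`. The function name `filter_segments` is hypothetical, and the cutoff of 100 is simply the value proposed in the question, not a recommendation. The example rows are copied from the question.

```python
def filter_segments(lines, min_points=100):
    """Yield segment rows (as dicts) with at least `min_points` copy-ratio points."""
    header = None
    for line in lines:
        line = line.strip()
        # Skip blank lines and SAM-style header lines (assumed to start with '@').
        if not line or line.startswith("@"):
            continue
        fields = line.split()
        if header is None:
            # The first remaining line is taken to be the column header row.
            header = fields
            continue
        row = dict(zip(header, fields))
        if int(row["NUM_POINTS_COPY_RATIO"]) >= min_points:
            yield row


# Rows taken from the question above.
example = """CONTIG START END NUM_POINTS_COPY_RATIO MEAN_LOG2_COPY_RATIO CALL
chr7 31127 6764353 916 -0.969624 -
chr7 6765438 6766161 1 -29.219897 -
chr7 6805656 73111427 3213 -0.971284 -
chr7 73184300 73184960 1 -29.526709 -
""".splitlines()

kept = list(filter_segments(example, min_points=100))
for row in kept:
    print(row["CONTIG"], row["START"], row["END"], row["NUM_POINTS_COPY_RATIO"])
```

To run it on a real file, pass an open file handle instead of `example`, for instance `filter_segments(open(path))` with `path` pointing at your own segment file.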
Hi,
The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.
Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.
We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.
For context, check out our support policy.
-
Hi Bilyana,
I am NOT a GATK developer and am not answering your question, but I am looking for information.
I am returning after many months to investigate the CNV tools developed by GATK and am eager to see whether changes in GATK v4.1.7 have fixed the problems I previously had running the CNV pipelines.
I see that you are somewhat satisfied with your results, so I am reaching out to ask:
Can you point me to the two tutorials you refer to when you write "...thank you for putting the two CNV calling tutorials together - this is the best WES CNV pipeline I have used"?
Thanks
-
Hi steveb,
The two tutorials I meant are:
https://gatk.broadinstitute.org/hc/en-us/articles/360035531092
https://gatk.broadinstitute.org/hc/en-us/articles/360035890011#7
I think they are the updated version of an older tutorial.
Good luck!
-
Thanks for the prompt reply, Bilyana.
These are the two tutorials I had been working with months ago, and they haven't changed since I last used them. Maybe some of the tools in GATK 4.1 perform better now. I will carry on and try the somatic CNV pipeline again.
-
Hi GATK Team,
Can you please recommend a workflow for somatic copy number variant detection?
-
See https://github.com/broadinstitute/gatk/tree/master/scripts/cnv_wdl/somatic. You will need to create a panel of normals (PoN) before you can call your tumor sample(s).