Hi GATK Team,
(running GATK 18.104.22.168 from docker container)
I was hoping you might be able to provide some advice. I have been asked to perform a CNV analysis of a targeted sequencing data set with a case/control design. My concern is that the targeted regions are highly variable in length, ranging from 168 bp to 22 kbp, and sum to a total length of only 152 kbp.
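To make the interval-size concern concrete, this is a minimal sketch of how I summarise the target lengths. It assumes BED-style input (chrom, start, end; 0-based, half-open); the function name `summarize_intervals` and the three example intervals are made up for illustration and are not my real targets.

```python
import statistics


def summarize_intervals(bed_lines):
    """Summarise target lengths from BED-style lines (chrom, start, end)."""
    lengths = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        _chrom, start, end = line.split()[:3]
        lengths.append(int(end) - int(start))
    if not lengths:
        raise ValueError("no intervals found")
    return {
        "n_intervals": len(lengths),
        "min_bp": min(lengths),
        "max_bp": max(lengths),
        "median_bp": statistics.median(lengths),
        "total_bp": sum(lengths),
    }


# Hypothetical intervals for illustration only
example = [
    "chr1\t1000\t1168",
    "chr1\t5000\t27000",
    "chr2\t200\t700",
]
summary = summarize_intervals(example)
print(summary)
```

Running the same function over the real target file gives the interval count and total span to compare against the guidance in the tool documentation.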
I have broadly followed the steps described in this article, calling variation in the case samples against models generated from the controls.
My first question is whether this tool is appropriate for this dataset, particularly given the small fraction of the genome covered and the large variability in region size. I note that the tool documentation states 'For WES and WGS, we recommend no less than 10000 consecutive intervals spanning at least 10 - 50 mb.'
Secondly, if it is reasonable to use the tool, how should the class and CNV coherence length parameters be configured? I have used the parameters below, setting each coherence length to 150 bp (i.e. smaller than the smallest interval). However, this is a dramatic departure from the default of 10,000, so I'd like to make sure I haven't completely misunderstood!
--class-coherence-length 150 \
--cnv-coherence-length 150 \
--interval-psi-scale 1.0E-6 \
--log-mean-bias-standard-deviation 0.01 \
--sample-psi-scale 1.0E-6 \
I know that parameter choice has a large impact upon results, and so would like to get an idea if I'm in the right ballpark, or if my settings are totally inappropriate for purpose.
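For context, this is how I have been picturing the coherence length. It is only my assumption, not something I have confirmed from the documentation: that the probability of the hidden state persisting between two intervals decays roughly as exp(-distance / coherence length). The function name `stay_probability` and the distances below are hypothetical, chosen purely for illustration.

```python
import math


def stay_probability(distance_bp, coherence_length_bp):
    """Assumed model (unconfirmed): state persistence decays exponentially with distance."""
    return math.exp(-distance_bp / coherence_length_bp)


# Compare my setting (150) with the default (10,000) at a few illustrative distances
for coherence in (150, 10_000):
    for distance in (150, 1_000, 10_000):
        p = stay_probability(distance, coherence)
        print(f"coherence={coherence} bp, distance={distance} bp, stay probability={p:.3g}")
```

If that picture is right, a coherence length of 150 bp would make neighbouring intervals more than a few hundred bp apart behave almost independently, which is part of why I am unsure about my choice.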
Many thanks in advance,