I'm interested in using CNNScoreVariants and the related CNN tools on some targeted sequencing data (human data, 63 genes, about 400 samples). The documentation says VQSR is not suitable for data like this and suggests either the CNN-based approach or hard filtering, adding that the CNN-based approach may be the better of the two. (I'm using GATK v. 184.108.40.206.) However, because I have cohort data, I'm currently working with a joint-called callset, and the documentation says the CNN-based approach is still experimental in that case (although it's established for single-sample data). I have a few questions relating to this.
One workaround might be to call variants on each sample individually, filter each callset with the CNN-based approach (using the default models), and then combine the separate VCFs or gVCFs (depending on whether it is still possible to do joint calling at that stage). Is there anything wrong with that approach, in either variation? (I suspect it loses some of the advantages of joint calling in both cases.)
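To make the per-sample idea concrete, here is a rough sketch of the commands I have in mind for one sample, written as a small Python helper that only builds and prints the command lines (it does not run GATK). The flags are the ones I understand from the example command lines on the tool manual pages; the file names, the output directory, and the tranche values are placeholders I made up for illustration, not recommendations.

```python
import shlex


def per_sample_cnn_commands(sample, bam, reference, resource_vcf, out_dir="cnn_out"):
    """Build (but do not run) the per-sample call -> score -> filter commands.

    All paths are hypothetical placeholders; tranche values are illustrative.
    """
    raw = f"{out_dir}/{sample}.raw.vcf.gz"
    scored = f"{out_dir}/{sample}.cnn.vcf.gz"
    filtered = f"{out_dir}/{sample}.cnn.filtered.vcf.gz"
    return [
        # 1. Single-sample calling (plain VCF output, not GVCF).
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", raw],
        # 2. Score with the default 1D model (adds the CNN_1D annotation).
        ["gatk", "CNNScoreVariants", "-R", reference, "-V", raw, "-O", scored],
        # 3. Turn the scores into filters using a known-sites resource.
        ["gatk", "FilterVariantTranches", "-V", scored,
         "--resource", resource_vcf, "--info-key", "CNN_1D",
         "--snp-tranche", "99.95", "--indel-tranche", "99.4",
         "-O", filtered],
    ]


# Print the commands for one made-up sample so they can be inspected first.
for cmd in per_sample_cnn_commands("S1", "S1.bam", "ref.fasta", "hapmap.vcf.gz"):
    print(shlex.join(cmd))
```

Each sample would then end up with its own filtered VCF, and the open question is how best to combine those afterwards.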
I think I read somewhere (unfortunately I can't find the relevant documentation now; it may have been on the old GATK forum) that to use the CNN-based approach on a joint callset I would have to train my own CNN model. I've looked into doing this. It requires running CNNVariantWriteTensors and then CNNVariantTrain; as far as I can tell, the first program puts the data into a format that CNNVariantTrain can work with. I can't find any documentation on how to do this apart from the individual manual pages for those two programs (which include example command lines). I guessed that the truth VCF given to CNNVariantWriteTensors might be similar to one or more of the resources used by VariantRecalibrator, but the example command line appears to use a Platinum Genomes file instead, so maybe that wouldn't be possible. My first attempts have therefore used the Platinum Genomes "hybrid truthsets" VCF and BED files (for hg19) downloaded from the Illumina website. Is this going to be suitable?
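For reference, this is roughly the pair of commands I have been trying, again as a Python helper that only builds and prints the command lines. The flags follow the example command lines on the two manual pages as I understand them; the file names, tensor directory, and model name are placeholders, and I'm assuming the 1D "reference" tensor type here.

```python
import shlex


def cnn_training_commands(reference, cohort_vcf, truth_vcf, truth_bed,
                          tensor_dir="tensors", model_name="my_1d_model"):
    """Build (but do not run) the write-tensors and train commands.

    All paths and the model name are hypothetical placeholders.
    """
    write_tensors = [
        "gatk", "CNNVariantWriteTensors",
        "-R", reference,
        "-V", cohort_vcf,             # variants to turn into training tensors
        "-truth-vcf", truth_vcf,      # e.g. the Platinum Genomes truth VCF
        "-truth-bed", truth_bed,      # matching confident-regions BED
        "-tensor-type", "reference",  # 1D tensors (assumption for this sketch)
        "-output-tensor-dir", tensor_dir,
    ]
    train = [
        "gatk", "CNNVariantTrain",
        "-tensor-type", "reference",
        "-input-tensor-dir", tensor_dir,  # must match the directory written above
        "-model-name", model_name,
    ]
    return [write_tensors, train]


# Print both commands with made-up file names for inspection.
for cmd in cnn_training_commands("ref.fasta", "cohort.vcf.gz",
                                 "platinum_truth.vcf.gz", "platinum_confident.bed"):
    print(shlex.join(cmd))
```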