I would like to use the CNN-based variant caller. For this I need a trained network. From reading the documentation I saw that I could obtain such a network with the experimental CNNVariantTrain subcommand in combination with the CNNVariantWriteTensors subcommand.
My question is: Is the following sequence of operations correct?
- I take sequencing data from a well-characterized sample, such as one from "Genome in a Bottle", for which I know in advance which variants are present.
- I run GATK's HaplotypeCaller to get the haplotypes and candidate variants.
- I run CNNVariantWriteTensors, using the variants known in advance as the truth set, to produce the input for the neural network.
- I run CNNVariantTrain to get a network that I can use later.
- I save the network at some central location and I won't update it later.
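To make the steps above concrete, here is a rough sketch of the commands I have in mind, written as a small Python helper that only builds and prints the command lines (it does not run GATK). The flags are the ones I found in the tool documentation examples and should be checked against `gatk <ToolName> --help`; all file names, the tensor directory, and the model name are placeholders I made up.

```python
import shlex


def training_commands(ref, bam, truth_vcf, truth_bed, tensor_dir, model_name):
    """Build (but do not run) the GATK commands for the training steps."""
    calls_vcf = "calls.vcf.gz"  # hypothetical name for the HaplotypeCaller output
    return [
        # Step 1: call candidate variants on the well-characterized sample.
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", calls_vcf],
        # Step 2: write training tensors, labelled using the known truth set.
        ["gatk", "CNNVariantWriteTensors", "-R", ref, "-V", calls_vcf,
         "-truth-vcf", truth_vcf, "-truth-bed", truth_bed,
         "-tensor-type", "reference", "-output-tensor-dir", tensor_dir],
        # Step 3: train the network on those tensors.
        ["gatk", "CNNVariantTrain", "-tensor-type", "reference",
         "-input-tensor-dir", tensor_dir, "-model-name", model_name],
    ]


# All paths below are placeholders, not real files.
commands = training_commands(
    ref="reference.fasta",
    bam="giab_sample.bam",
    truth_vcf="giab_truth.vcf.gz",
    truth_bed="giab_confident_regions.bed",
    tensor_dir="my_tensor_folder",
    model_name="my_1d_model",
)
for cmd in commands:
    print(shlex.join(cmd))
```

I used the `reference` tensor type here because it is the one shown in the documentation example; I am not sure whether it is the right choice for my data.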
Now let's assume I have several samples from sequencing with unknown variants. Would it be correct to do the following for each of them?
- Run CNNScoreVariants
Or do I have to do some more processing?
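For the per-sample part, this is roughly what I would run, again as a Python sketch that only builds the command lines. It assumes each sample already has its own VCF (for example from HaplotypeCaller), since CNNScoreVariants takes a VCF via `-V`. The `-architecture` flag is taken from the documentation example; the model path and sample names are hypothetical.

```python
import shlex


def score_command(ref, sample_vcf, architecture_json, out_vcf):
    """Build (but do not run) a CNNScoreVariants command for one sample."""
    return ["gatk", "CNNScoreVariants", "-R", ref, "-V", sample_vcf,
            "-architecture", architecture_json, "-O", out_vcf]


# Hypothetical central location of the trained network and sample names.
MODEL_JSON = "/shared/models/my_1d_model.json"
samples = ["sampleA", "sampleB"]

score_commands = [
    score_command("reference.fasta", f"{s}.vcf.gz", MODEL_JSON, f"{s}.cnn_scored.vcf.gz")
    for s in samples
]
for cmd in score_commands:
    print(shlex.join(cmd))
```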