How to work with CNN variant caller
I would like to use the CNN-based variant caller. For this I need a trained network. From reading the documentation I saw that I could obtain such a network with the experimental CNNVariantTrain subcommand in combination with the CNNVariantWriteTensors subcommand.
My question is: Is the following sequence of operations correct?
- I take sequencing data from a well-characterized genome such as "Genome in a Bottle", where I know in advance which variants are present.
- I run GATK's HaplotypeCaller to get the haplotypes and candidate variants.
- I run CNNVariantWriteTensors, with the variants known in advance as truth, to produce the input for the neural network.
- I run CNNVariantTrain to get a network that I can use later on.
- I save the network at some central location and won't update it later.
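To make the steps above concrete, here is a rough sketch of the command sequence, written as a small Python helper that only builds and prints the command lines (it does not run GATK). All file names and the function name are hypothetical placeholders. The option names are the ones I recall from the tool documentation, so please check them against the help output of your GATK version.

```python
import shlex


def training_commands(reference, bam, truth_vcf, truth_bed, tensor_dir, model_name):
    """Build the three GATK command lines for the training steps (hypothetical helper)."""
    calls_vcf = "giab_calls.vcf.gz"  # placeholder name for the HaplotypeCaller output
    return [
        # Step 2: call candidate variants on the truth sample.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", calls_vcf],
        # Step 3: write training tensors, labelled using the known truth set.
        ["gatk", "CNNVariantWriteTensors",
         "-R", reference, "-V", calls_vcf,
         "--truth-vcf", truth_vcf, "--truth-bed", truth_bed,
         "--tensor-type", "reference",
         "--output-tensor-dir", tensor_dir],
        # Step 4: train the network on those tensors.
        ["gatk", "CNNVariantTrain",
         "--input-tensor-dir", tensor_dir,
         "--tensor-type", "reference",
         "--model-name", model_name],
    ]


# Print the commands for inspection; nothing is executed here.
for cmd in training_commands("ref.fasta", "giab.bam", "giab_truth.vcf.gz",
                             "giab_confident.bed", "tensors", "my_1d_model"):
    print(shlex.join(cmd))
```

The sketch assumes the 1D "reference" tensor type. As far as I understand, a read-based model would also need the BAM passed to the tensor-writing step.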
Now let's assume I have several samples from sequencing with unknown variants. Would it be correct to do the following for each of them?
- Run CNNScoreVariants
Or do I have to do some more processing?
-
Hi Danio Rerio,
For your first question, the sequence of operations looks correct!
I am not sure I understand your second question, though. Could you clarify?
Thank you,
Genevieve
-
Thanks for the reply! That's reassuring :)
My second question is also about making sure I'm understanding things correctly. Could you please confirm that the following is correct?
Given a new sample, I can detect (and score) variants with the pre-trained CNN from step 5 using the subcommand CNNScoreVariants. No additional training of the network is needed for this.
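As a sketch of what I have in mind per sample: as far as I understand, CNNScoreVariants annotates an existing VCF rather than calling variants itself, so each new sample would first go through HaplotypeCaller. The helper below only builds the command lines. The function name, file names, and the path to the saved model are hypothetical, and the option names should be checked against your GATK version.

```python
import shlex


def scoring_commands(reference, sample_bam, sample_name, architecture_json):
    """Build the per-sample command lines for calling and CNN scoring (hypothetical helper)."""
    raw_vcf = f"{sample_name}.raw.vcf.gz"            # placeholder output names
    scored_vcf = f"{sample_name}.cnn_scored.vcf.gz"
    return [
        # Call variants on the new sample.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", sample_bam, "-O", raw_vcf],
        # Annotate the calls with scores from the saved, pre-trained network.
        ["gatk", "CNNScoreVariants",
         "-R", reference, "-V", raw_vcf, "-O", scored_vcf,
         "--architecture", architecture_json,
         "--tensor-type", "reference"],
    ]


# Print the commands for two example samples; nothing is executed here.
for sample in ["sampleA", "sampleB"]:
    for cmd in scoring_commands("ref.fasta", f"{sample}.bam", sample,
                                "/shared/models/my_1d_model.json"):
        print(shlex.join(cmd))
```

The scored VCF carries the CNN score as an annotation only. If I read the documentation correctly, turning those scores into filters is a separate step (FilterVariantTranches).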
-
Yes, I believe that is correct.
Please feel free to try out your method and if you find any abnormal results, we can troubleshoot then.
Best,
Genevieve