Single sample germline - VQSR+CNN or just CNN
Hello,
I'm a relatively new user of GATK doing single-sample germline analysis. My GATK version is 4.3.5.0. I've run VariantRecalibrator and ApplyVQSR for both SNPs and indels, but I now see in the documentation that the currently recommended workflow for single-sample germline analysis is to skip VQSR and use CNNScoreVariants and FilterVariantTranches instead.
I'm wondering, is it acceptable to apply CNNScoreVariants + FilterVariantTranches to the VCF produced after VQSR? Or should I go back, skip VQSR, and use the CNN filter on the output from HaplotypeCaller? Basically, is there any harm in running VQSR followed by CNN filtration?
Thanks,
Jordan
-
Hi Jordan,
I am going to move your post into our Community Discussions -> Documentation Questions topic, as the Germline topic is for reporting bugs and issues with GATK.
You can read more about our forum guidelines and the topics here: Forum Guidelines.
Best,
Genevieve
-
Hi Jordan Russell,
Thanks for writing in. Could you share the documentation you are referring to that suggests skipping VQSR? If you are able to use VQSR, it should provide effective and sufficient filtering of your VCFs. If you would like to try CNNScoreVariants and FilterVariantTranches, running them along with VQSR shouldn't really cause any issues, but I don't think it is necessary.
Kind regards,
Pamela
-
Hi Pamela,
Thanks so much for your reply. I'm following recommendations in the best practices workflow "Germline short variant discovery (SNPs + Indels)" linked here (https://gatk.broadinstitute.org/hc/en-us/articles/360035535932).
In the workflow, it shows taking the raw variants generated by HaplotypeCaller and using CNNScoreVariants + FilterVariantTranches. There is no step for doing VQSR (I'm assuming it was replaced by these other filtering steps).
I tried both VQSR + CNN filtering and CNN filtering alone, but ended up going with CNN filtering alone based on the workflow. Are there any advantages to VQSR over CNN? If I had to choose, which is the better option for single-sample processing?
Below is the workflow image for germline single sample processing from the best practices page.
Thanks,
Jordan
-
Hi Jordan Russell,
Okay, thank you for explaining. My mistake, I was referring to the workflow for germline cohort data rather than single-sample data. For single-sample analysis, CNNScoreVariants followed by FilterVariantTranches is the recommended filtering method. VQSR is very good, but it requires a lot of high-quality training data to build its models, which makes it less suitable for single-sample analysis. I'm glad to hear you stuck to the steps outlined in the best practices workflow, as this is likely to produce the most accurate results.
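For reference, the two single-sample filtering steps could be sketched roughly as follows. This is a minimal sketch, not an official pipeline: the helper function names, the file names, and the tranche thresholds (99.95 for SNPs, 99.4 for indels) are illustrative assumptions, and the script only assembles and prints the command lines rather than running GATK. Please check each flag against the tool documentation for your GATK version before using it.

```python
import shlex
from typing import List, Sequence


def cnn_score_cmd(raw_vcf: str, reference: str, out_vcf: str) -> List[str]:
    """Build a CNNScoreVariants command (default 1D model) for a raw HaplotypeCaller VCF."""
    return ["gatk", "CNNScoreVariants", "-V", raw_vcf, "-R", reference, "-O", out_vcf]


def filter_tranches_cmd(
    scored_vcf: str,
    resources: Sequence[str],
    out_vcf: str,
    snp_tranche: float = 99.95,   # assumed threshold, tune for your data
    indel_tranche: float = 99.4,  # assumed threshold, tune for your data
) -> List[str]:
    """Build a FilterVariantTranches command that filters on the CNN_1D score."""
    cmd = ["gatk", "FilterVariantTranches", "-V", scored_vcf]
    for resource in resources:
        # Each known-sites resource VCF is passed with its own --resource flag.
        cmd += ["--resource", resource]
    cmd += [
        "--info-key", "CNN_1D",
        "--snp-tranche", str(snp_tranche),
        "--indel-tranche", str(indel_tranche),
        "-O", out_vcf,
    ]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    steps = [
        cnn_score_cmd("sample.raw.vcf.gz", "reference.fasta", "sample.cnn.vcf.gz"),
        filter_tranches_cmd(
            "sample.cnn.vcf.gz",
            ["hapmap.vcf.gz", "mills.vcf.gz"],
            "sample.filtered.vcf.gz",
        ),
    ]
    for step in steps:
        print(shlex.join(step))
```

The printed commands start from the raw HaplotypeCaller output, so no VQSR step is involved.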
Kind regards,
Pamela
-
Hi Pamela,
Great! Thanks so much for your reply!
Thanks,
Jordan