Ambiguity in Germline short variant best practices
Hi everyone. I was reading through the best practices for germline short variant discovery, and a colleague and I came away with different interpretations, which we were hoping you could clarify.
https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-
The question is about the line:
"We are currently experimenting with neural network-based approaches with the goal of eventually replacing VQSR with a more powerful and flexible filtering process."
Does this refer to CNNScoreVariants in all cases, or only to cohort germline variant calling?
----
Essentially what we want to find out is:
We are calling variants on a per-sample basis, and our question concerns the variant calling and filtering steps. For a single sample, is the best practice to use HaplotypeCaller to generate a GVCF file and then use GenotypeGVCFs, VariantRecalibrator, and ApplyVQSR to filter out potential false positives,
OR
to use HaplotypeCaller without the -ERC GVCF argument to generate a VCF file, and then use CNNScoreVariants and FilterVariantTranches to control for false positives?
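To make the two options concrete, here is a rough sketch of the tool chains being compared, written as Python lists of command-line arguments. It only builds and prints the commands; it does not run GATK. All file names, the annotation list, the resource line, and the tranche thresholds are placeholder assumptions for illustration, not recommended settings, and only the SNP pass of VQSR is shown.

```python
import shlex
from typing import List

# All file names and numeric thresholds below are hypothetical placeholders.
REF = "reference.fasta"
BAM = "sample.bam"
RESOURCE_VCF = "hapmap.vcf.gz"


def vqsr_route() -> List[List[str]]:
    """Option 1: GVCF -> GenotypeGVCFs -> VariantRecalibrator -> ApplyVQSR (SNP mode only)."""
    return [
        ["gatk", "HaplotypeCaller", "-R", REF, "-I", BAM,
         "-O", "sample.g.vcf.gz", "-ERC", "GVCF"],
        ["gatk", "GenotypeGVCFs", "-R", REF,
         "-V", "sample.g.vcf.gz", "-O", "sample.vcf.gz"],
        ["gatk", "VariantRecalibrator", "-R", REF, "-V", "sample.vcf.gz",
         # Resource labels and prior are illustrative assumptions.
         "--resource:hapmap,known=false,training=true,truth=true,prior=15.0",
         RESOURCE_VCF,
         "-an", "QD", "-an", "FS", "-mode", "SNP",
         "-O", "sample.recal", "--tranches-file", "sample.tranches"],
        ["gatk", "ApplyVQSR", "-R", REF, "-V", "sample.vcf.gz",
         "--recal-file", "sample.recal", "--tranches-file", "sample.tranches",
         "--truth-sensitivity-filter-level", "99.0", "-mode", "SNP",
         "-O", "sample.vqsr.vcf.gz"],
    ]


def cnn_route() -> List[List[str]]:
    """Option 2: plain VCF -> CNNScoreVariants -> FilterVariantTranches."""
    return [
        # No -ERC GVCF here, so HaplotypeCaller writes a regular VCF.
        ["gatk", "HaplotypeCaller", "-R", REF, "-I", BAM, "-O", "sample.vcf.gz"],
        ["gatk", "CNNScoreVariants", "-R", REF,
         "-V", "sample.vcf.gz", "-O", "sample.cnn.vcf.gz"],
        ["gatk", "FilterVariantTranches", "-V", "sample.cnn.vcf.gz",
         "--resource", RESOURCE_VCF, "--info-key", "CNN_1D",
         # Tranche values are illustrative assumptions.
         "--snp-tranche", "99.95", "--indel-tranche", "99.4",
         "-O", "sample.cnn.filtered.vcf.gz"],
    ]


if __name__ == "__main__":
    for name, route in (("VQSR route", vqsr_route()), ("CNN route", cnn_route())):
        print(f"# {name}")
        for cmd in route:
            print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; each printed line could be pasted into a shell once the placeholder paths are replaced with real ones.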
Thank you all in advance. I hope to hear from someone soon.
-
Thank you for your post, nicksmith! I want to let you know we have received your question and will be moving it to the Community Discussions -> Documentation Questions topic, as the Germline topic is for reporting bugs and issues with GATK.
We'll get back to you if we have any updates or follow-up questions. Please see our Support Policy for more details about how we prioritize responding to questions.