GATK VariantEval no concordance between dbSNP and our data
Hi,
I use VariantEval to calculate the percentage of variants present in dbSNP. When I run VariantEval with CompOverlap, I get this result for nEvalVariants:
Novelty nEvalVariants
all 28611
known 27943
novel 668
But my sample VCF contains 106137 variants. So my question is: why do I have 28611 variants in VariantEval and 106137 in my VCF file? The "all" line should correspond to the total number of variants in my VCF, right?
Any thoughts appreciated!
Thank you!
-
Hi Pierre_Bioinfo,
There are most likely many variants in your VCF that do not pass variant filtering, so they are in the file but are not included in the "all" category. You can check this by running SelectVariants with --exclude-filtered true and seeing how many variants are in the output file. Filtered variants will have something other than PASS in the FILTER column.
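As a rough sketch of that check (not an official recipe), the functions below tally the FILTER column directly with awk and wrap the SelectVariants call mentioned above. The function names and file paths are hypothetical, the VCF is assumed to be uncompressed, and records with `.` in FILTER are assumed to count as unfiltered alongside PASS.

```shell
# Tally records per FILTER value (column 7) in an uncompressed VCF.
count_by_filter() {
    awk -F'\t' '!/^#/ { c[$7]++ } END { for (f in c) print f "\t" c[f] }' "$1" | sort
}

# Count records that are not filtered.
# Assumption: "." (no filtering applied) is treated like PASS here.
count_unfiltered() {
    awk -F'\t' '!/^#/ && ($7 == "PASS" || $7 == ".") { n++ } END { print n + 0 }' "$1"
}

# Write only unfiltered records with GATK, then count them.
# $1 = input VCF, $2 = output VCF (both paths are placeholders).
select_unfiltered() {
    gatk SelectVariants -V "$1" --exclude-filtered true -O "$2" &&
        grep -vc '^#' "$2"
}
```

For example, `count_by_filter sample.vcf` prints one line per FILTER value. If nearly every record is PASS, filtering is unlikely to explain the gap between the file and the "all" row.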
Hope this helps!
Best,
Genevieve
-
Thanks for your answer, Genevieve!
I just tested your solution, and SelectVariants gives the same number as the total number of variants in my VCF... I don't really understand why I get this result, because I don't see any argument that could cause it, and I ran VariantEval on my final VCF (which therefore contains the best-quality variants). The tool doesn't do any downsampling, right?
Best,
Pierre
-
Hi Pierre_Bioinfo,
Thanks for following up! I'll see if I can get to the bottom of this. First, could you share your VariantEval command? Also, what mode are you referring to with CompOverlap?
Thank you!
Genevieve
-
Hello,
That's my command:
/path/to/GATK(4.2.0.0) VariantEval --eval /my/VCF/file -D dbsnp_146.hg38.vcf.gz -L bedFile.interval_list -no-ev true -EV CompOverlap -EV TiTvVariantEvaluator -O ${sample}_variantEval.txt
I only use dbSNP stratification, and I use VariantEval to calculate the percentage of variants referenced in dbSNP and the Ti/Tv ratio.
I don't quite see what you mean by "mode", so I hope this answers your question.
Thanks,
Pierre
-
Hi Pierre,
Yes that answers my question, thank you! Could you give me more information about your interval list? VariantEval will only run for the variants that are within the interval list, so that could be causing fewer variants to show up in your VariantEval results.
You can check this with SelectVariants using the same interval list. Let me know what you find.
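One way to sketch this interval check: the first function below counts VCF records whose POS falls inside any interval of a Picard-style interval_list, and the second wraps SelectVariants with the same -L argument. Function names and paths are hypothetical. The awk version assumes an uncompressed VCF, compares only the start position (so a deletion that merely overlaps an interval is missed), and scans every interval per record, so it only suits modest file sizes.

```shell
# Count VCF records whose POS lies inside any interval.
# $1 = interval_list (chrom, start, end, strand, name; 1-based inclusive),
# $2 = uncompressed VCF.
count_in_intervals() {
    awk -F'\t' '
        NR == FNR {
            # First file: store intervals, skipping the "@" header lines.
            if ($0 !~ /^@/ && NF >= 3) { n++; chr[n] = $1; s[n] = $2; e[n] = $3 }
            next
        }
        /^#/ { next }   # skip VCF header lines
        {
            for (i = 1; i <= n; i++)
                if ($1 == chr[i] && $2 + 0 >= s[i] + 0 && $2 + 0 <= e[i] + 0) {
                    hits++
                    break
                }
        }
        END { print hits + 0 }
    ' "$1" "$2"
}

# Same restriction with GATK, then count the surviving records.
# $1 = input VCF, $2 = interval_list, $3 = output VCF (placeholder paths).
select_in_intervals() {
    gatk SelectVariants -V "$1" -L "$2" -O "$3" &&
        grep -vc '^#' "$3"
}
```

If the count from either function is close to the "all" row reported by VariantEval, the interval list is the likely source of the difference.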
Best,
Genevieve