Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GATK VariantEval no concordance between dbSNP and our data

Answered

  • Genevieve Brandt (she/her)

    Hi Pierre_Bioinfo,

    There are most likely many variants in your VCF that do not pass variant filtering, so they are present in the file but not counted in the "all" category. You can check this by running SelectVariants with --exclude-filtered true and seeing how many variants end up in the output file. Filtered variants have something other than PASS in the FILTER column.

    Hope this helps!

    Best,

    Genevieve
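
As a quick cross-check of the FILTER column that does not depend on GATK, the records can be tallied directly. This is a minimal sketch, not part of the original thread: the function names are hypothetical, and it assumes that a FILTER value of "." (no filtering applied) should be counted together with PASS as unfiltered.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def count_filter_values(vcf_path):
    """Tally the FILTER column (7th field) over all records in a VCF."""
    counts = Counter()
    with open_vcf(vcf_path) as handle:
        for line in handle:
            if line.startswith("#"):  # skip meta-information and header lines
                continue
            fields = line.rstrip("\n").split("\t")
            counts[fields[6]] += 1
    return counts


def summarize(counts):
    """Split the tally into unfiltered and filtered records.

    Assumption: "." (filters not applied) is treated like PASS.
    """
    passing = counts.get("PASS", 0) + counts.get(".", 0)
    total = sum(counts.values())
    return {"total": total, "passing": passing, "filtered": total - passing}
```

Calling summarize(count_filter_values("/my/VCF/file")) gives the total, passing, and filtered record counts, which can then be compared with the number of variants reported by VariantEval.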

  • Pierre_Bioinfo

    Thanks for your answer, Genevieve!

    I just tested your solution, and SelectVariants gives me the same number as the total number of variants in my VCFs. I don't really understand why I get these results, because I don't see any argument that could cause this, and I ran VariantEval on my final VCF (which therefore contains the best-quality variants). You don't do any downsampling, right?

    Best,

    Pierre

  • Genevieve Brandt (she/her)

    Hi Pierre_Bioinfo,

    Thanks for following up! I'll see if I can get to the bottom of this. First, could you share your VariantEval command? Also, what mode are you referring to with CompOverlap?

    Thank you!

    Genevieve

  • Pierre_Bioinfo

    Hello, 

    That's my command: 
    /path/to/GATK(4.2.0.0) VariantEval --eval /my/VCF/file -D dbsnp_146.hg38.vcf.gz -L bedFile.interval_list -no-ev true -EV CompOverlap -EV TiTvVariantEvaluator -O ${sample}_variantEval.txt
     
    I only use dbSNP stratification, and I use VariantEval to calculate the percentage of variants referenced in dbSNP and the Ti/Tv ratio. I don't quite see what you mean by "mode", so I hope this answers your question.
     
    Thanks,
     
    Pierre
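
For reference, the two quantities mentioned above can be illustrated with a small worked example. This is a simplified sketch and not VariantEval's implementation: the function names are hypothetical, overlap is matched on (contig, position) only without comparing alleles, and only single-base substitutions are counted for Ti/Tv.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)
# substitutions; every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(snps):
    """Return transitions / transversions for an iterable of (ref, alt) pairs."""
    transitions = transversions = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue  # skip indels and non-variant entries
        if (ref, alt) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        return float("nan")  # ratio is undefined without transversions
    return transitions / transversions


def percent_in_comp(eval_sites, comp_sites):
    """Percentage of eval sites (contig, pos) that also occur in the comp set."""
    eval_sites = set(eval_sites)
    if not eval_sites:
        return 0.0
    return 100.0 * len(eval_sites & set(comp_sites)) / len(eval_sites)
```

For instance, two transitions and one transversion give a Ti/Tv ratio of 2.0, and one eval site out of four found in the comparison set gives 25.0 percent.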
  • Genevieve Brandt (she/her)

    Hi Pierre,

    Yes, that answers my question, thank you! Could you give me more information about your interval list? VariantEval only evaluates variants that fall within the interval list, so that could be why fewer variants show up in your VariantEval results.

    You can check this with SelectVariants using the same interval list. Let me know what you find.

    Best,

    Genevieve
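
The interval check can also be approximated outside GATK. The sketch below is not from the original thread and its function names are hypothetical. It assumes a Picard-style interval_list (header lines starting with "@", then tab-separated contig, start, end, strand, name with 1-based inclusive coordinates) and tests only each record's POS, so a variant that starts before an interval but extends into it is counted as outside, which may differ from how GATK applies -L.

```python
import bisect
import gzip


def read_interval_list(path):
    """Read a Picard-style interval_list into {contig: (starts, ends)}.

    Overlapping or abutting intervals are merged so lookups can use bisect.
    """
    raw = {}
    with open(path) as handle:
        for line in handle:
            if line.startswith("@") or not line.strip():
                continue  # skip the SAM-style header and blank lines
            contig, start, end = line.rstrip("\n").split("\t")[:3]
            raw.setdefault(contig, []).append((int(start), int(end)))
    merged = {}
    for contig, spans in raw.items():
        starts, ends = [], []
        for start, end in sorted(spans):
            if ends and start <= ends[-1] + 1:
                ends[-1] = max(ends[-1], end)  # extend the previous interval
            else:
                starts.append(start)
                ends.append(end)
        merged[contig] = (starts, ends)
    return merged


def position_in_intervals(intervals, contig, pos):
    """True if the 1-based position falls inside any interval on that contig."""
    if contig not in intervals:
        return False
    starts, ends = intervals[contig]
    i = bisect.bisect_right(starts, pos) - 1
    return i >= 0 and pos <= ends[i]


def count_variants(vcf_path, intervals):
    """Return (records inside the intervals, total records) for a VCF."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    inside = total = 0
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            contig, pos = line.split("\t", 2)[:2]
            total += 1
            inside += position_in_intervals(intervals, contig, int(pos))
    return inside, total
```

Running count_variants("/my/VCF/file", read_interval_list("bedFile.interval_list")) returns how many records fall inside the intervals alongside the total, which shows at a glance how much the -L argument restricts the evaluation.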

