Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

VQSR vs Hard Filtering

4 comments

  • Pamela Bretscher

    Hi Anna,

    The filters you choose really depend on your data and on the sensitivity and precision you want to achieve. You're correct that VQSR is generally recommended, and the filtering you have already done should be sufficient. If you wish, you could use VariantFiltration to hard-filter your variants with thresholds such as QD < 2.0, QUAL < 30.0, MQRankSum < -12.5, etc., and compare the results. However, I think VQSR should give you sufficiently high accuracy. I hope this helps answer your question.

    Kind regards,

    Pamela
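As a rough illustration of the hard-filtering alternative mentioned above, the sketch below applies GATK's generic SNP hard-filter starting points (QD < 2.0, QUAL < 30.0, SOR > 3.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5, ReadPosRankSum < -8.0) to one VCF record. In GATK itself you would express these as VariantFiltration filter expressions; this Python is only a minimal sketch of the logic. The function names, the filter labels, and the choice to skip missing annotations are assumptions made for illustration.

```python
from typing import Dict, List

# GATK's commonly published hard-filter starting points for SNPs.
# Labels are hypothetical; tune thresholds for your own data.
SNP_HARD_FILTERS = {
    "QD2": ("QD", "<", 2.0),
    "QUAL30": ("QUAL", "<", 30.0),
    "SOR3": ("SOR", ">", 3.0),
    "FS60": ("FS", ">", 60.0),
    "MQ40": ("MQ", "<", 40.0),
    "MQRankSum-12.5": ("MQRankSum", "<", -12.5),
    "ReadPosRankSum-8": ("ReadPosRankSum", "<", -8.0),
}


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string into key -> value; flag keys map to ''."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def failed_filters(qual: float, info: str) -> List[str]:
    """Return the labels of filters the site fails (empty list = PASS)."""
    values = parse_info(info)
    values["QUAL"] = str(qual)  # QUAL is its own VCF column, not an INFO key
    failed: List[str] = []
    for label, (key, op, threshold) in SNP_HARD_FILTERS.items():
        raw = values.get(key)
        if not raw:  # annotation absent: skipped in this sketch (assumption)
            continue
        x = float(raw)
        if (op == "<" and x < threshold) or (op == ">" and x > threshold):
            failed.append(label)
    return failed
```

For example, the first site quoted in this thread (QUAL=974, QD=25.51, FS=0, SOR=1.061, MQ=48.52, no MQRankSum or ReadPosRankSum) fails none of these thresholds, so `failed_filters` returns an empty list.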

  • @Anna

    Hi Pamela,

    Thank you so much for the explanation. Below are two variants that illustrate my doubt; I hope that's OK. I copied only 3 representative samples, since I have more than 200, all with DP<10.

    QUAL=974

    PASS

    AC=14;AF=0.35;AN=40;DP=43;ExcessHet=0;FS=0;InbreedingCoeff=0.3492;MLEAC=120;MLEAF=1;MQ=48.52;POSITIVE_TRAIN_SITE;QD=25.51;RAW_MQ=61200;SOR=1.061;VQSLOD=3.14;culprit=FS

    GT:AD:DP:GQ:PL

    1/1:0,2:2:6:49,6,0

    ./.:0,0:0:.:0,0,0

    ./.:0,0:0:.:0,0,0

    QUAL=1129

    PASS

    AC=17;AF=0.149;AN=114;BaseQRankSum=0;DP=157;ExcessHet=0;FS=0;InbreedingCoeff=0.3537;MLEAC=58;MLEAF=0.509;MQ=47.61;MQRankSum=0;POSITIVE_TRAIN_SITE;QD=34.23;RAW_MQ=122400;ReadPosRankSum=1.44;SOR=0.859;VQSLOD=5.87;culprit=DP

    GT:AD:DP:GQ:PGT:PID:PL

    1/1:0,1:1:3:.:.:45,3,0

    0/0:4,0:4:12:.:.:0,12,99

    ./.:0,0:0:.:.:.:0,0,0

    ./.:0,0:0:.:.:.:0,0,0

    So, should I consider them as true variants, even though both have really low DP across all samples?

    Or should I filter them, for example keeping only variants where at least two samples have DP>10?

    Best regards,
    Anna
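The rule proposed above (keep a site only when at least two samples have DP > 10) is a per-sample check on each genotype's FORMAT/DP value, not on the site-level INFO/DP annotation. A minimal sketch, assuming each sample column is available as a colon-separated string in FORMAT order; the function names and default thresholds are hypothetical and not part of any GATK API:

```python
from typing import List


def samples_above_depth(format_field: str, samples: List[str], min_dp: int = 10) -> int:
    """Count samples whose FORMAT/DP value is strictly greater than min_dp."""
    keys = format_field.split(":")
    if "DP" not in keys:
        return 0
    dp_index = keys.index("DP")
    count = 0
    for sample in samples:
        values = sample.split(":")
        # VCF allows trailing FORMAT fields to be dropped, and '.' means missing.
        if dp_index >= len(values) or values[dp_index] == ".":
            continue
        if int(values[dp_index]) > min_dp:
            count += 1
    return count


def keep_site(format_field: str, samples: List[str],
              min_dp: int = 10, min_samples: int = 2) -> bool:
    """Hypothetical rule: keep a site only if >= min_samples samples have DP > min_dp."""
    return samples_above_depth(format_field, samples, min_dp) >= min_samples
```

Applied to the first site quoted above (per-sample DP of 2, 0 and 0), `keep_site` returns False, so a rule like this would discard both example variants regardless of their VQSLOD.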

  • Pamela Bretscher

    Hi Anna,

    I understand your concern with these variants. However, we actually don't recommend using the DP annotation with exome samples, because depth varies widely across the captured regions. With whole-genome data, that kind of depth variation is treated as a sign of error by the filtering tools, but with exome data, which you mentioned you are working with, it isn't necessarily indicative of error. Given your large number of samples, VQSR should still be suitable for filtering, but the DP annotation shouldn't be used with exomes. I hope this is helpful.

    Kind regards,

    Pamela
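To make the exome advice concrete: the site-level DP annotation is one of the inputs VQSR can be trained on, passed to VariantRecalibrator with repeated `-an` arguments (the culprit=DP on the second site above suggests DP was among the annotations used there). The sketch below assembles an SNP annotation list that leaves DP out for exome data. The base list mirrors GATK's commonly published SNP recommendations, and the 10-sample minimum for InbreedingCoeff reflects GATK's annotation requirement; treat both, and the helper names, as assumptions to verify against the current GATK documentation.

```python
from typing import List

# Commonly recommended VQSR annotations for SNPs (assumption: check GATK docs).
BASE_SNP_ANNOTATIONS = ["QD", "MQ", "MQRankSum", "ReadPosRankSum", "FS", "SOR"]


def vqsr_snp_annotations(is_exome: bool, n_samples: int) -> List[str]:
    """Hypothetical helper: pick VQSR annotations for a given data type."""
    annotations = list(BASE_SNP_ANNOTATIONS)
    if not is_exome:
        # DP is informative for WGS, but exome capture makes depth vary
        # widely between targets, so it is left out for exomes.
        annotations.append("DP")
    if n_samples >= 10:
        # InbreedingCoeff needs a cohort of at least 10 samples.
        annotations.append("InbreedingCoeff")
    return annotations


def as_an_arguments(annotations: List[str]) -> List[str]:
    """Expand annotation names into repeated '-an NAME' arguments."""
    args: List[str] = []
    for name in annotations:
        args.extend(["-an", name])
    return args
```

For an exome cohort of more than 200 samples, `vqsr_snp_annotations(True, 200)` includes InbreedingCoeff but not DP.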

  • @Anna

    Hi Pamela,

    OK, I understand now. Again, thanks a lot for explaining this. It is truly helpful for me.

    Best regards
