Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Single sample germline - VQSR+CNN or just CNN

5 comments

  • Genevieve Brandt (she/her)

    Hi Jordan,

    I am going to move your post into our Community Discussions -> Documentation Questions topic, as the Germline topic is for reporting bugs and issues with GATK.

    You can read more about our forum guidelines and the topics here: Forum Guidelines.

    Best,

    Genevieve

  • Pamela Bretscher

    Hi Jordan Russell,

    Thanks for writing in. Could you share the documentation you are referring to that suggests skipping VQSR? If you are able to use VQSR, it should provide effective and sufficient filtering of your VCFs. If you would like to try CNNScoreVariants and FilterVariantTranches, running them along with VQSR shouldn't cause any issues, but I don't think it is necessary.

    Kind regards,

    Pamela

  • Jordan Russell

    Hi Pamela,

    Thanks so much for your reply. I'm following recommendations in the best practices workflow "Germline short variant discovery (SNPs + Indels)" linked here (https://gatk.broadinstitute.org/hc/en-us/articles/360035535932).

    In the workflow, it shows taking the raw variants generated by HaplotypeCaller and using CNNScoreVariants + FilterVariantTranches. There is no step for doing VQSR (I'm assuming it was replaced by these other filtering steps).

    I did both VQSR+CNN filtering and just CNN filtering, but ended up going with just the CNN filtering based on the workflow. Are there any advantages to doing VQSR vs. CNN? If having to choose, which is the better option for single sample processing?

    Below is the workflow image for germline single sample processing from the best practices page.

    Thanks,

    Jordan

  • Pamela Bretscher

    Hi Jordan Russell,

    Okay, thank you for explaining. My mistake, I was referring to the workflow for germline cohort data rather than single-sample data. For single-sample analysis, CNNScoreVariants followed by FilterVariantTranches is the recommended filtering method. VQSR is very good but requires a lot of high-quality training data to build its models, which makes it less applicable to single-sample analysis. I'm glad to hear you stuck to the steps outlined in the best practices workflow, as this is likely to produce the most accurate results.

    Kind regards,

    Pamela

  • Jordan Russell

    Hi Pamela,

    Great! Thanks so much for your reply!

    Thanks,

    Jordan

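For readers who want to see what the single-sample filtering discussed in this thread might look like in practice, here is a rough sketch that assembles the two GATK command lines (CNNScoreVariants, then FilterVariantTranches) in Python. It only builds and prints the commands; it does not run GATK. The helper name `cnn_filter_commands`, all file names, and the choice of the 1D model (`CNN_1D`) are assumptions for illustration, and the tranche values are taken from the example in the tool documentation rather than from this thread. Tool availability and flags can differ between GATK releases, so check the documentation for your version before running anything.

```python
import shlex


def cnn_filter_commands(raw_vcf, reference, resources,
                        scored_vcf="scored.vcf.gz",
                        filtered_vcf="filtered.vcf.gz",
                        snp_tranche=99.95, indel_tranche=99.4):
    """Build (but do not run) the two single-sample filtering commands.

    All paths are placeholders; the tranche defaults mirror the example in
    the FilterVariantTranches documentation and may need tuning.
    """
    # Step 1: annotate the raw HaplotypeCaller calls with CNN scores
    # (no BAM is passed here, which assumes the 1D model).
    score = ["gatk", "CNNScoreVariants",
             "-V", raw_vcf, "-R", reference, "-O", scored_vcf]

    # Step 2: filter on the CNN score using tranches derived from
    # known-sites resources (for example HapMap and Mills VCFs).
    filt = ["gatk", "FilterVariantTranches", "-V", scored_vcf]
    for resource in resources:
        filt += ["--resource", resource]
    filt += ["--info-key", "CNN_1D",
             "--snp-tranche", str(snp_tranche),
             "--indel-tranche", str(indel_tranche),
             "-O", filtered_vcf]
    return [score, filt]


if __name__ == "__main__":
    # Hypothetical inputs; substitute your own files.
    commands = cnn_filter_commands(
        "sample.raw.vcf.gz", "reference.fasta",
        ["hapmap.vcf.gz", "mills.vcf.gz"])
    for cmd in commands:
        print(shlex.join(cmd))
```

The printed lines can be pasted into a shell or handed to `subprocess.run(cmd, check=True)` once the paths point at real files. The output of the first command is the input to the second, which is why the sketch passes `scored_vcf` to both.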
