Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Using "RNAseq short variant discovery" without known-sites VCF file

  • Gökalp Çelik

    Hi Eggraandio

    Unfortunately, unless you have a reference variant set for your species, you may have to resort to our bootstrapping approach for BQSR. It involves multiple rounds of variant calling, filtering, and base recalibration, checking the covariates at each step to see whether the base call qualities are close to their empirical values.

    On the other hand, if you have only a handful of samples, you may wish to skip this step (especially if all your samples were sequenced using the same technology, device, etc.), since your samples may all contain the very same level of base-calling errors, and those may be filtered out at the time of variant calling.

    I hope this helps. 

    Regards. 

  • Eggraandio

    Hi,

    Thanks for your quick reply. Indeed, I only have a few samples, so I might skip that step.

    When you mention that the errors can be filtered at the time of variant calling, should I use a specific threshold or setting? This is the first time I am running this kind of analysis.

    Best,

  • Gökalp Çelik

    Hi again.

    We do have hard-filtering thresholds, although they were derived for human samples; you may wish to try them first to see whether your calls hold up under those parameters. The article linked below explains some of the parameters that we recommend using for filtration. Unfortunately, the images won't show up inline, but you can right-click and open them in a new tab to see the charts and graphs. In short, we recommend paying attention to quality by depth, strand bias, read position bias, and mapping quality as the basic parameters. If you have related cohorts, then the inbreeding coefficient also enters the equation.

    https://gatk.broadinstitute.org/hc/en-us/articles/360035890471-Hard-filtering-germline-short-variants 

    I hope this helps. 

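For readers who want to picture the bootstrapping approach mentioned in the first reply, here is a minimal sketch of one round, written as a Python function that only builds the GATK command lines and does not run them. The file names, the function name `bootstrap_round_commands`, the two filter thresholds, and the way rounds are chained are all assumptions made for illustration; they are not taken from the thread. For RNAseq data, the usual pre-processing (for example SplitNCigarReads) would come before this and is not shown.

```python
from typing import List


def bootstrap_round_commands(ref: str, bam: str, round_idx: int,
                             prev_table: str = "") -> List[List[str]]:
    """Build (but do not execute) the commands for one bootstrapping round.

    All output file names are hypothetical and derived from round_idx.
    """
    raw = f"round{round_idx}.raw.vcf.gz"
    flagged = f"round{round_idx}.flagged.vcf.gz"
    known = f"round{round_idx}.known.vcf.gz"
    table = f"round{round_idx}.recal.table"
    out_bam = f"round{round_idx}.recal.bam"

    cmds = [
        # 1. Call variants on the current BAM.
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", raw],
        # 2. Flag low-confidence calls (thresholds are placeholders to tune).
        ["gatk", "VariantFiltration", "-R", ref, "-V", raw, "-O", flagged,
         "--filter-name", "QD2", "--filter-expression", "QD < 2.0",
         "--filter-name", "FS30", "--filter-expression", "FS > 30.0"],
        # 3. Keep only calls that passed, to serve as provisional known sites.
        ["gatk", "SelectVariants", "-V", flagged, "--exclude-filtered",
         "-O", known],
        # 4. Model base quality covariates against the provisional known sites.
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", bam,
         "--known-sites", known, "-O", table],
        # 5. Write a recalibrated BAM for the next round.
        ["gatk", "ApplyBQSR", "-R", ref, "-I", bam,
         "--bqsr-recal-file", table, "-O", out_bam],
    ]
    if prev_table:
        # 6. Compare this round's table with the previous one to judge
        #    whether the recalibration has stabilised.
        cmds.append(["gatk", "AnalyzeCovariates", "-before", prev_table,
                     "-after", table,
                     "-plots", f"round{round_idx}.covariates.pdf"])
    return cmds
```

Each inner list could be passed to `subprocess.run(cmd, check=True)`. A second round would call the function again with `round1.recal.bam` and `prev_table="round1.recal.table"`, repeating until the covariate plots stop changing noticeably.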

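The hard-filtering advice in the last reply can be sketched in a few lines of Python: given the INFO annotations of one SNP call, report which filters it would fail. The annotation keys (QD, FS, SOR, MQ, MQRankSum, ReadPosRankSum) correspond to the parameters named above (quality by depth, strand bias, mapping quality, read position bias). The cutoffs are the commonly quoted human-data starting points and should be treated as assumptions to check against the linked article and against your own annotation distributions; the function and filter names are hypothetical.

```python
from typing import Dict, List, Tuple

# (filter name, INFO key, comparison, cutoff).
# Assumed starting points for SNPs derived from human data; tune per species.
SNP_FILTERS: List[Tuple[str, str, str, float]] = [
    ("QD2", "QD", "<", 2.0),                            # quality by depth
    ("FS60", "FS", ">", 60.0),                          # strand bias (Fisher)
    ("SOR3", "SOR", ">", 3.0),                          # strand bias (odds ratio)
    ("MQ40", "MQ", "<", 40.0),                          # mapping quality
    ("MQRankSum-12.5", "MQRankSum", "<", -12.5),        # mapping quality rank sum
    ("ReadPosRankSum-8", "ReadPosRankSum", "<", -8.0),  # read position bias
]


def failed_filters(info: Dict[str, float]) -> List[str]:
    """Return the names of the filters that this call fails."""
    failed = []
    for name, key, op, cutoff in SNP_FILTERS:
        if key not in info:
            # A missing annotation is skipped rather than counted as a failure.
            continue
        value = info[key]
        if (op == "<" and value < cutoff) or (op == ">" and value > cutoff):
            failed.append(name)
    return failed
```

For example, `failed_filters({"QD": 1.5, "FS": 10.0, "MQ": 60.0})` returns `["QD2"]`. In practice the same expressions would be handed to GATK's VariantFiltration tool; this sketch is only meant to make the logic of threshold-based filtering explicit.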