Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GATK4 RNAseq germline SNPs and indels JSON file

Answered

3 comments

  • Genevieve Brandt (she/her)

    Thank you for your post, Alan Foley! I want to let you know we have received your question. We'll get back to you if we have any updates or follow-up questions.

    Please see our Support Policy for more details about how we prioritize responding to questions. 

  • Megan Shand

    Hi Alan Foley,

    The knownVcfs input here is only used in the BaseQualityScoreRecalibrator to mask out the sites with known variation. This is because that tool assumes that any base that mismatches the reference is an error. Removing common known variation helps that assumption hold, although it obviously doesn't catch all possible variation. Usually this is fine because there is an overwhelming amount of data that matches the reference and a relatively small number of sites that have novel variation.

    dbSnp is included both in BaseQualityScoreRecalibrator for the same reason as knownVcfs and also in HaplotypeCaller so that it will label dbSnp sites in your output VCF. It won't change the variants you discover in your sample at all.

    So in both of these cases you will still be able to discover novel variants, and the best practices recommendation is to include them so that BaseQualityScoreRecalibrator works optimally. If there is another reason, besides wanting to find novel variants, that you don't want to include them (such as not having a known variants dataset for your organism), then I'd recommend looking at the BaseQualityScoreRecalibrator documentation to get some more ideas. These inputs are all currently required in the WDL, so if you do decide not to include them you'll need to edit the WDL to make those inputs optional (or remove them entirely). I hope this helps!

  • Alan Foley

    Thanks for this reply!

    In fact I went ahead and manually performed the steps instead of using the JSON.

    There were a few changes I needed to make.

    Alan

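For readers running the steps by hand, the way the known-variant inputs are used in the thread above can be sketched in Python. This is a minimal illustration under stated assumptions, not the workflow's actual WDL: the helper names and file paths are hypothetical, the recalibration tool is assumed to be invoked as `gatk BaseRecalibrator`, and `--known-sites` and `--dbsnp` are assumed to be the relevant arguments. Refusing to build a recalibration command without known sites is a choice made for this sketch.

```python
from typing import List, Optional, Sequence


def base_recalibrator_args(
    reference: str, bam: str, known_sites: Sequence[str], out_table: str
) -> List[str]:
    """Build a BaseRecalibrator command line (hypothetical helper).

    Every VCF in known_sites (knownVcfs plus dbSNP in the thread) is passed
    as --known-sites so those positions are masked out when mismatches
    against the reference are counted as errors.
    """
    if not known_sites:
        # Sketch's choice: without known sites, skip or rethink recalibration.
        raise ValueError("no known-sites resources supplied")
    args = ["gatk", "BaseRecalibrator", "-R", reference, "-I", bam, "-O", out_table]
    for vcf in known_sites:
        args += ["--known-sites", vcf]
    return args


def haplotype_caller_args(
    reference: str, bam: str, out_vcf: str, dbsnp: Optional[str] = None
) -> List[str]:
    """Build a HaplotypeCaller command line (hypothetical helper).

    dbsnp only annotates known sites in the output VCF, so it is optional
    here; per the thread it does not change which variants are discovered.
    """
    args = ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", out_vcf]
    if dbsnp is not None:
        args += ["--dbsnp", dbsnp]
    return args


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    print(" ".join(base_recalibrator_args(
        "ref.fasta", "sample.bam", ["dbsnp.vcf.gz", "known_indels.vcf.gz"], "recal.table")))
    print(" ".join(haplotype_caller_args("ref.fasta", "sample.bam", "sample.vcf.gz")))
```

Each list can be passed to `subprocess.run(args, check=True)` once real paths are filled in. Keeping the dbSNP argument optional in the second helper mirrors the suggestion of making that input optional when no known-variants dataset exists for the organism.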
