Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

VQSR: how is insufficient variance inferred for annotations?

Answered

7 comments

  • Genevieve Brandt (she/her)

    Hi Tim,

    The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.

    Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.

    We cannot guarantee a reply, however, we ask other community members to help out if you know the answer.

    For context, check out our support policy.

  • Genevieve Brandt (she/her)

    Hi Tim,

    I followed up with my team and got some information regarding your request:

    There is not a perfect way to do what you are trying to do because it depends on the data, so you have to look at the annotations to determine what is going on with them when VQSR crashes like this. 

    If you want to automate this process, you can try re-running it with different random seeds until one run succeeds. VariantRecalibrator also has a --max-attempts option, which tries to build the model multiple times instead of failing after a single attempt (the default).
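
As a rough illustration of the --max-attempts route, here is a minimal Python sketch that assembles a VariantRecalibrator command line and reports whether the run exited cleanly. The helper names (build_recal_cmd, run_recal), the output file naming, and the choice of 4 attempts are assumptions made for this example, not anything prescribed in this thread. The --resource arguments are passed through untouched because they depend on your reference and resource bundle.

```python
import subprocess


def build_recal_cmd(vcf, out_prefix, annotations, resource_args,
                    max_attempts=4, mode="SNP"):
    """Assemble a VariantRecalibrator command as an argv list.

    resource_args is a list of already-formatted --resource arguments;
    it is passed through unchanged (hypothetical helper, illustration only).
    """
    cmd = [
        "gatk", "VariantRecalibrator",
        "-V", vcf,
        "-mode", mode,
        "-O", f"{out_prefix}.recal",
        "--tranches-file", f"{out_prefix}.tranches",
        # Retry model building instead of failing after the first attempt.
        "--max-attempts", str(max_attempts),
    ]
    for name in annotations:
        cmd += ["-an", name]
    cmd += list(resource_args)
    return cmd


def run_recal(cmd):
    """Run the command and return True if it exited with status 0."""
    return subprocess.run(cmd).returncode == 0
```

Nothing runs until run_recal is called, so the command list can be printed and inspected first.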

    If you haven't seen our document on variant filtering, you can check it out here. Hopefully these tips help your process. You can also check out different filtering methods like hard filtering or CNN.

    Best,

    Genevieve

  • timh

    Thanks Genevieve.

    All right, sounds like I will automate something based on annotation stats. The --max-attempts option is useful, but the output file then contains multiple models, so selecting one of the successful models as input to ApplyVQSR requires automation of its own.
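
One possible shape for such an annotation-stats check is sketched below in Python: it reads the INFO column of a VCF, collects the numeric values of each annotation, and flags annotations whose variance falls below a cutoff. The function names and the min_variance threshold are hypothetical and chosen only for illustration. This is not the criterion VQSR applies internally, so the cutoff would need tuning against your own data.

```python
import statistics
from collections import defaultdict


def annotation_values(vcf_lines, annotations):
    """Collect numeric INFO values per annotation from VCF text lines."""
    values = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        for entry in fields[7].split(";"):
            key, sep, raw = entry.partition("=")
            if sep and key in annotations:
                try:
                    values[key].append(float(raw))
                except ValueError:
                    pass  # multi-valued or non-numeric entry, ignored here
    return values


def low_variance_annotations(vcf_lines, annotations, min_variance=1e-3):
    """Return annotations that are missing, near-constant, or too sparse.

    min_variance is an arbitrary illustrative cutoff (assumption).
    """
    values = annotation_values(vcf_lines, annotations)
    flagged = []
    for name in annotations:
        vals = values.get(name, [])
        if len(vals) < 2 or statistics.pvariance(vals) < min_variance:
            flagged.append(name)
    return flagged
```

Annotations returned by low_variance_annotations could then be dropped from the -an list before running VariantRecalibrator. For a compressed VCF, the lines can be supplied with gzip.open(path, "rt").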

    Thanks.

    Tim

  • Adrián Segura

    Hello, I am trying to run the same operation with the hg19 reference, but I am having trouble finding the resource files mentioned in your code. Can you tell me where to find them?

  • Genevieve Brandt (she/her)

    Adrián Segura, the Broad maintains a resource bundle, which might be what timh is referring to. You can find more information here: Resource Bundle

  • Adrián Segura

    Thanks Genevieve. I have already managed to download the VCF files shown in the examples. However, since I want to detect somatic mutations in tumor samples, should I also provide files from specific databases such as COSMIC?

  • Genevieve Brandt (she/her)

    Adrián Segura, for somatic variant calling (with Mutect2) you should be using FilterMutectCalls, not VQSR.

    Here is the Best Practices overview of Somatic short variant discovery, and the tutorial for calling somatic mutations with Mutect2 + FilterMutectCalls.

    Hope this helps!

