VQSR: how is insufficient variance inferred for annotations?
Some of the annotations I use in VQSR do not always vary significantly and hence I get the following error:
```A USER ERROR has occurred: Positive training model failed to converge. One or more annotations (usually MQ) may have insufficient variance. Please consider lowering the maximum number of Gaussians allowed for use in the model (via --max-gaussians 4, for example).```
I have found that removing the annotation (found by trial and error) tends to give better results than lowering the number of Gaussians, probably because that one annotation happens not to show meaningful variation.
To stop GATK from crashing on me every time, I would like to automate this process rather than rely on trial and error. To do so, I need to identify and exclude the annotation that varies insufficiently (which one it is differs from one data set to the next). What does GATK consider insufficient variance, and how is it calculated?
The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.
Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.
We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.
For context, check out our support policy.
I followed up with my team and got some information regarding your request:
There is not a perfect way to do what you are trying to do because it depends on the data, so you have to look at the annotations to determine what is going on with them when VQSR crashes like this.
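One way to "look at the annotations" programmatically is to summarise each INFO annotation across the call set. The sketch below is only a heuristic: the function names (`parse_info`, `annotation_stats`, `low_variance`) and the cut-offs `min_variance` and `min_distinct` are hypothetical, and they are not the criterion GATK uses internally, which this thread does not state. It also reads every record, whereas VQSR builds its positive model only from sites that overlap the training resources, so restricting the input to those sites first would be closer to what the model sees.

```python
import math
from collections import defaultdict


def parse_info(info_field):
    """Turn a VCF INFO string such as 'QD=12.3;MQ=60.0;DB' into {name: float}."""
    values = {}
    for item in info_field.split(";"):
        key, sep, raw = item.partition("=")
        if not sep:
            continue  # flag-style entry without a value, e.g. 'DB'
        try:
            # For multi-valued entries, keep only the first value.
            values[key] = float(raw.split(",")[0])
        except ValueError:
            continue  # non-numeric annotation
    return values


def annotation_stats(vcf_lines, annotations):
    """Collect count, mean, sample variance and distinct values per annotation."""
    samples = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        info = parse_info(fields[7])
        for name in annotations:
            if name in info and math.isfinite(info[name]):
                samples[name].append(info[name])

    stats = {}
    for name in annotations:
        xs = samples[name]
        n = len(xs)
        mean = sum(xs) / n if n else float("nan")
        # Unbiased sample variance; defined as 0.0 when fewer than two values.
        variance = sum((x - mean) ** 2 for x in xs) / (n - 1) if n > 1 else 0.0
        stats[name] = {"n": n, "mean": mean, "variance": variance,
                       "distinct": len(set(xs))}
    return stats


def low_variance(stats, min_variance=1e-3, min_distinct=3):
    """Flag annotations whose spread falls below the (hypothetical) cut-offs."""
    return [name for name, s in stats.items()
            if s["variance"] < min_variance or s["distinct"] < min_distinct]
```

For an uncompressed VCF, `annotation_stats(open("calls.vcf"), ["QD", "MQ", "FS"])` would give the per-annotation summary (the file name is a placeholder). Raw variances are on different scales for different annotations, so the cut-offs would need tuning per annotation and per data set.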
If you want to automate this process, you can try rerunning it with different random seeds until one run succeeds. VariantRecalibrator also has a --max-attempts option, which tries to build the model multiple times instead of failing after a single attempt (the default).
If you haven't seen our document on variant filtering, you can check it out here. Hopefully these tips help your process. You can also check out different filtering methods like hard filtering or CNN.
All right, sounds like I will automate something based on annotation stats. The --max-attempts option is useful, but the output file then contains multiple models, so selecting one of the successful models as input for ApplyVQSR also needs to be automated.
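The manual workflow described in this thread (run VariantRecalibrator, and on a convergence failure remove one annotation and run again) could be automated roughly as below. This is a sketch under assumptions, not a recommended pipeline: `build_command`, `run_gatk` and `recalibrate_with_fallback` are hypothetical helper names, the output file names are placeholders, `resource_args` stands for whatever `--resource` arguments the run already uses, and `drop_order` is a caller-supplied ranking of which annotations to give up first. The error text is matched against the message quoted in the original question.

```python
import subprocess

# Substring of the user error quoted at the top of this thread.
CONVERGENCE_MESSAGE = "Positive training model failed to converge"


def build_command(input_vcf, output_prefix, annotations, resource_args,
                  mode="SNP", max_gaussians=6, max_attempts=3):
    """Assemble a VariantRecalibrator call with one -an flag per annotation."""
    cmd = ["gatk", "VariantRecalibrator",
           "-V", input_vcf,
           "-O", output_prefix + ".recal",
           "--tranches-file", output_prefix + ".tranches",
           "-mode", mode,
           "--max-gaussians", str(max_gaussians),
           "--max-attempts", str(max_attempts)]
    cmd += list(resource_args)
    for name in annotations:
        cmd += ["-an", name]
    return cmd


def run_gatk(cmd):
    """Run the command and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def recalibrate_with_fallback(annotations, drop_order, make_command,
                              run=run_gatk, min_annotations=2):
    """Retry with one fewer annotation after each convergence failure.

    Returns the annotation list that succeeded. Raises RuntimeError on any
    other failure, or when no further annotation may be dropped.
    """
    remaining = list(annotations)
    pending = [name for name in drop_order if name in remaining]
    while True:
        returncode, log = run(make_command(remaining))
        if returncode == 0:
            return remaining
        if CONVERGENCE_MESSAGE not in log:
            raise RuntimeError("VariantRecalibrator failed for another reason")
        if not pending or len(remaining) <= min_annotations:
            raise RuntimeError("no annotation set converged")
        remaining.remove(pending.pop(0))
```

Passing the runner as an argument keeps the loop testable without GATK installed. The returned list records which annotations the successful model actually used, which is worth logging alongside the results.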
Hello, I am trying to execute the same operation for the hg19 reference, however it is difficult for me to find the files that you mention in your code (resource files). Can you tell me where to find them?
Adrián Segura the Broad maintains a resource bundle, which might be what timh is referring to. You can find more information here: Resource Bundle
Thanks Genevieve, I have already managed to download the VCF files shown in the examples. However, since I want to detect somatic mutations in tumor samples, should I also provide files from specific databases such as COSMIC?
Adrián Segura, for somatic variant calling (with Mutect2) you should be using FilterMutectCalls, not VQSR.
Here is the Best Practices overview of Somatic short variant discovery, and the tutorial for calling somatic mutations with Mutect2 + FilterMutectCalls.
Hope this helps!