VQSR positive training model failed to converge
If you are seeing an error, please provide (REQUIRED):
a) GATK version used: 4.1.8.1
b) Exact command used:
c) Entire error log:
17:03:35.623 INFO VariantDataManager - Annotation order is: [MQ, QD, SOR, FS, MQRankSum, ReadPosRankSum]
17:03:35.635 INFO VariantDataManager - Training with 352 variants after standard deviation thresholding.
17:03:35.635 WARN VariantDataManager - WARNING: Training with very few variant sites! Please check the model reporting PDF to ensure the quality of the model is reliable.
17:03:35.638 INFO GaussianMixtureModel - Initializing model with 100 k-means iterations...
17:03:35.725 INFO VariantRecalibratorEngine - Finished iteration 0.
17:03:35.756 INFO VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.07763
17:03:35.781 INFO VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.02787
17:03:35.798 INFO VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.04135
17:03:35.801 INFO VariantRecalibratorEngine - Convergence after 16 iterations!
17:03:35.809 WARN VariantRecalibratorEngine - Model could not pre-compute denominators. Denominator for gaussian evaluation cannot be computed. Covariance determinant is 8.029680863850935E-209. One or more annotations (usually MQ) may have insufficient variance.
17:03:35.817 INFO VariantRecalibrator - Shutting down engine
[September 5, 2020 5:03:35 PM HKT] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 0.13 minutes.
Runtime.totalMemory()=3867672576
***********************************************************************
A USER ERROR has occurred: Positive training model failed to converge. One or more annotations (usually MQ) may have insufficient variance. Please consider lowering the maximum number of Gaussians allowed for use in the model (via --max-gaussians 4, for example).
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Using GATK jar /home/yangyxt/software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/yangyxt/software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar VariantRecalibrator -AS -R /paedwy/d
real 0m10.370s
user 0m24.896s
sys 0m3.720s
The last step's return code is 2
The last step does not finish in normal way. Exiting the whole script now.
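The warning above ties the failure to a covariance determinant of about 8e-209. As a minimal illustration, not GATK's implementation, the stdlib-only Python sketch below builds a covariance matrix from made-up annotation values (in the log's annotation order) and computes its determinant. When one annotation (here MQ) has no spread among the sites a Gaussian is fit to, its row and column of the covariance matrix are zero and the determinant collapses to 0. In real data MQ is rarely perfectly constant, which is why the logged value is tiny rather than exactly zero. The helper names and data are hypothetical.

```python
import random

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length annotation vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (r[i] - means[i]) * (r[j] - means[j])
    return [[c / (n - 1) for c in row] for row in cov]

def determinant(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0  # an all-zero column means the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

random.seed(0)
# Hypothetical vectors: [MQ, QD, SOR, FS, MQRankSum, ReadPosRankSum] for 352 sites
varied = [[random.gauss(58, 2), random.gauss(15, 5), random.gauss(1, 0.5),
           random.gauss(3, 2), random.gauss(0, 1), random.gauss(0, 1)]
          for _ in range(352)]
# Same sites, but MQ pinned at 60.0 for every site (no variance in MQ)
flat_mq = [[60.0] + row[1:] for row in varied]

print(determinant(covariance_matrix(varied)))   # clearly non-zero
print(determinant(covariance_matrix(flat_mq)))  # 0.0: MQ contributes no variance
```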
-
The error message suggests reducing --max-gaussians. Have you tried that? If not, I would recommend doing so.
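One way to see why fewer Gaussians helps with a small training set is to count free parameters. The sketch below assumes a mixture of full-covariance Gaussians over the 6 annotations listed in the log and compares that count with the 352 training sites reported there. GATK fits its model with a Bayesian procedure, so this is only a rough heuristic; the value 8 is an illustrative choice shown alongside the 4 suggested by the error and the 1 mentioned later in the thread.

```python
def gmm_free_parameters(n_gaussians: int, n_annotations: int) -> int:
    """Free parameters in a full-covariance Gaussian mixture model."""
    d = n_annotations
    per_component = d + d * (d + 1) // 2   # mean vector + symmetric covariance matrix
    mixture_weights = n_gaussians - 1       # weights are constrained to sum to 1
    return n_gaussians * per_component + mixture_weights

N_ANNOTATIONS = 6      # MQ, QD, SOR, FS, MQRankSum, ReadPosRankSum (from the log)
TRAINING_SITES = 352   # "Training with 352 variants" (from the log)

for k in (8, 4, 1):
    p = gmm_free_parameters(k, N_ANNOTATIONS)
    print(f"max-gaussians={k}: {p} parameters, "
          f"{TRAINING_SITES / p:.1f} training sites per parameter")
```

With 6 annotations each Gaussian needs 27 numbers (6 means plus 21 covariance entries), so this reports 223, 111 and 27 parameters for 8, 4 and 1 Gaussians. Cutting the number of Gaussians is the most direct way to make a few hundred training sites go further.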
-
Hi,
I was getting the same error during indel recalibration and ended up having to set --max-gaussians to 1!
Everything ran fine, but I am nervous about using such a low number. What, if any, is the disadvantage of going so low on the number of Gaussians?
Also, is it being forced that low because I am working with just 2 chromosomes across my files, and therefore have fewer variants? Would it be better to hard-filter the variants instead of using VQSR, given that I have 168 samples but only a few chromosomes?
I used just two chromosomes throughout this process because I am interested in a couple of genes on those 2 chromosomes. I thought it was better to feed in the entire chromosomes rather than just the exact gene locations.
I'm a little lost here, and any advice would help me better understand this step.
Thank you!
-
Dear Kshama Aswath,
In my personal experience, I'm afraid this error occurs when the amount of input data is simply not large enough. Try bringing in more samples or a larger genomic region for VQSR.
-
Hi Kshama Aswath and Yangyxt,
I spoke with the developers to get more clarity on this issue. With only 2 chromosomes there may not be enough variants, and therefore not enough variance in the annotations, to train the model well. If VQSR does not work, we recommend hard filtering or CNN-based filtering (CNNScoreVariants).
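For reference, hard filtering simply drops sites whose annotations cross fixed thresholds; in GATK this is normally done with the VariantFiltration tool. The Python sketch below applies thresholds commonly cited as GATK's generic starting points (an assumption to verify against the current GATK hard-filtering documentation and to tune per dataset) to one hypothetical site's annotations. `failed_filters` and the example values are illustrative only.

```python
# Commonly cited GATK starting thresholds for hard filtering.
# Assumption: verify against current GATK documentation before relying on them.
SNP_FILTERS = {
    "QD": lambda v: v < 2.0,
    "QUAL": lambda v: v < 30.0,
    "SOR": lambda v: v > 3.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}
INDEL_FILTERS = {
    "QD": lambda v: v < 2.0,
    "QUAL": lambda v: v < 30.0,
    "FS": lambda v: v > 200.0,
    "ReadPosRankSum": lambda v: v < -20.0,
}

def failed_filters(annotations, is_indel):
    """Return the names of the thresholds this site fails.

    A missing annotation (e.g. MQRankSum at a site with no reference reads)
    is treated as passing in this sketch, since there is nothing to test.
    """
    rules = INDEL_FILTERS if is_indel else SNP_FILTERS
    return [name for name, fails in rules.items()
            if name in annotations and fails(annotations[name])]

# Hypothetical indel site
site = {"QUAL": 512.3, "QD": 1.4, "FS": 12.0, "ReadPosRankSum": -0.3}
print(failed_filters(site, is_indel=True))
```

Here the example indel fails only the QD threshold, so the call prints ['QD'].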
That said, --max-gaussians 1 will not necessarily cause problems. Look at the plots from VariantRecalibrator (--rscript-file); if everything makes sense, your results could be fine. After ApplyVQSR, also check the Ti/Tv ratio of the filtered call set: if it is poor, you should consider hard filtering. A respectable Ti/Tv ratio is roughly 1.9-2.1 for whole genomes and in the high 2s for exomes.
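Ti/Tv is the number of transitions (A<->G, C<->T) divided by the number of transversions (every other single-base substitution). In practice the ratio would come from a tool such as `bcftools stats` or from the VQSR plots; the stdlib sketch below only illustrates the definition on a hypothetical list of (REF, ALT) pairs, skipping anything that is not a single-base A/C/G/T substitution.

```python
BASES = set("ACGT")
# Transitions stay within purines (A<->G) or within pyrimidines (C<->T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def is_transition(ref, alt):
    """True if the single-base substitution ref->alt is a transition."""
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS

def titv_ratio(snvs):
    """Ti/Tv ratio over (ref, alt) pairs; non-SNV records are ignored."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        # Skip indels, MNPs, ambiguous bases (e.g. N) and ref == alt.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")

# Hypothetical biallelic SNV calls as (REF, ALT) pairs.
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
print(titv_ratio(calls))  # 4 transitions / 2 transversions -> 2.0
```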