Bug in VariantRecalibrator: "No data found"
GATK v4.1.4.0 VariantRecalibrator step
I got the error below, which several people have reported before. However, none of the causes identified in those earlier reports apply to my situation.
This was run on 3 whole genomes. There is plenty of training data. Variants from all chromosomes are pooled. In fact, it warns there is too much training data.
The model converges.
However, it exits with a "No data found" error.
So this must be a bug.
Command:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true \
  -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3g -Xms3g \
  -jar /gatk/gatk-package-4.1.4.0-local.jar VariantRecalibrator \
  -V /cromwell_root/fc-secure-2169470b-8920-4360-9316-0a0bb0026afe/submissions/6df6a66e-63bf-4514-a28e-78188311a0e7/JointGenotyping/23e416c9-b9dc-42cb-ba03-96157be18fc5/call-SitesOnlyGatherVcf/genome.sites_only.vcf.gz \
  -O genome.snps.recal \
  --tranches-file genome.snps.tranches \
  --trust-all-polymorphic \
  -tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.8 -tranche 99.6 -tranche 99.5 \
  -tranche 99.4 -tranche 99.3 -tranche 99.0 -tranche 98.0 -tranche 97.0 -tranche 90.0 \
  -an QD -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an SOR -an DP \
  -mode SNP \
  --max-gaussians 6 \
  --resource:hapmap,known=false,training=true,truth=true,prior=15 /cromwell_root/gcp-public-data--broad-references/hg38/v0/hapmap_3.3.hg38.vcf.gz \
  --resource:omni,known=false,training=true,truth=true,prior=12 /cromwell_root/gcp-public-data--broad-references/hg38/v0/1000G_omni2.5.hg38.vcf.gz \
  --resource:1000G,known=false,training=true,truth=false,prior=10 /cromwell_root/gcp-public-data--broad-references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz \
  --resource:dbsnp,known=true,training=false,truth=false,prior=7 /cromwell_root/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf
Error:
03:17:41.647 WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "recal". Defaulting to VCF.
03:17:41.770 INFO ProgressMeter - Starting traversal
03:17:41.771 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
03:17:51.812 INFO ProgressMeter - chr1:24607266 0.2 66000 394422.3
03:18:01.890 INFO ProgressMeter - chr1:60532523 0.3 130000 387693.2
03:18:11.922 INFO ProgressMeter - chr1:99026494 0.5 212000 421876.6
03:18:22.033 INFO ProgressMeter - chr1:156827190 0.7 326000 485830.0
03:18:32.180 INFO ProgressMeter - chr1:186156967 0.8 390000 464202.8
03:18:42.288 INFO ProgressMeter - chr1:222987884 1.0 470000 465984.8
03:18:52.321 INFO ProgressMeter - chr2:7847932 1.2 557000 473713.3
03:19:02.489 INFO ProgressMeter - chr2:39783594 1.3 630000 468297.0
03:19:12.629 INFO ProgressMeter - chr2:75352124 1.5 705000 465561.6
03:19:22.756 INFO ProgressMeter - chr2:119451017 1.7 803000 477100.6
03:19:32.861 INFO ProgressMeter - chr2:146074310 1.9 858000 463453.9
03:19:42.871 INFO ProgressMeter - chr2:182233510 2.0 931000 461275.5
03:19:52.891 INFO ProgressMeter - chr2:221327635 2.2 1012000 463087.2
03:20:02.962 INFO ProgressMeter - chr3:11955206 2.4 1096000 465755.4
03:20:13.088 INFO ProgressMeter - chr3:45178950 2.5 1167000 462737.2
03:20:23.171 INFO ProgressMeter - chr3:82203463 2.7 1249000 464312.3
03:20:33.257 INFO ProgressMeter - chr3:113320435 2.9 1328000 464647.1
03:20:43.313 INFO ProgressMeter - chr3:147314614 3.0 1401000 463033.3
03:20:53.332 INFO ProgressMeter - chr3:185461884 3.2 1479000 463246.7
03:21:03.460 INFO ProgressMeter - chr4:19357724 3.4 1568000 466460.7
03:21:13.671 INFO ProgressMeter - chr4:53290961 3.5 1650000 467201.5
03:21:23.701 INFO ProgressMeter - chr4:88678688 3.7 1736000 469337.2
03:21:34.140 INFO ProgressMeter - chr4:122698343 3.9 1805000 466069.1
03:21:44.253 INFO ProgressMeter - chr4:157536482 4.0 1883000 465931.5
03:21:54.321 INFO ProgressMeter - chr4:188773282 4.2 1970000 468028.0
03:22:04.384 INFO ProgressMeter - chr5:30046380 4.4 2055000 469512.2
03:22:14.469 INFO ProgressMeter - chr5:68245320 4.5 2141000 471070.6
03:22:24.553 INFO ProgressMeter - chr5:103345560 4.7 2204000 467639.4
03:22:34.628 INFO ProgressMeter - chr5:136893156 4.9 2276000 466302.7
03:22:44.663 INFO ProgressMeter - chr5:163538163 5.0 2329000 461352.6
03:22:54.743 INFO ProgressMeter - chr6:11297063 5.2 2401000 460298.2
03:23:04.846 INFO ProgressMeter - chr6:41423914 5.4 2492000 462802.8
03:23:14.881 INFO ProgressMeter - chr6:75894587 5.6 2570000 462911.5
03:23:24.976 INFO ProgressMeter - chr6:107920469 5.7 2638000 461182.1
03:23:35.013 INFO ProgressMeter - chr6:142088902 5.9 2703000 459118.7
03:23:45.109 INFO ProgressMeter - chr6:164504821 6.1 2755000 454948.3
03:23:55.137 INFO ProgressMeter - chr7:21168178 6.2 2832000 455103.0
03:24:05.244 INFO ProgressMeter - chr7:50656858 6.4 2899000 453591.3
03:24:15.346 INFO ProgressMeter - chr7:87584068 6.6 2993000 456279.0
03:24:25.451 INFO ProgressMeter - chr7:124157654 6.7 3064000 455410.2
03:24:35.569 INFO ProgressMeter - chr7:154451379 6.9 3136000 454764.1
03:24:45.575 INFO ProgressMeter - chr8:15123930 7.1 3215000 455165.4
03:24:55.574 INFO ProgressMeter - chr8:40398537 7.2 3273000 452695.0
03:25:05.671 INFO ProgressMeter - chr8:74532571 7.4 3337000 451047.5
03:25:15.691 INFO ProgressMeter - chr8:110820376 7.6 3410000 450741.2
03:25:25.715 INFO ProgressMeter - chr8:144556691 7.7 3490000 451347.6
03:25:35.881 INFO ProgressMeter - chr9:25536313 7.9 3562000 450781.5
03:25:45.912 INFO ProgressMeter - chr9:81509673 8.1 3672000 455074.0
03:25:56.006 INFO ProgressMeter - chr9:108515611 8.2 3727000 452457.7
03:26:06.130 INFO ProgressMeter - chr10:3431181 8.4 3807000 452891.7
03:26:16.222 INFO ProgressMeter - chr10:30462242 8.6 3883000 452872.0
03:26:26.309 INFO ProgressMeter - chr10:66854545 8.7 3986000 455944.1
03:26:36.347 INFO ProgressMeter - chr10:102890494 8.9 4059000 455576.0
03:26:46.435 INFO ProgressMeter - chr10:132502399 9.1 4131000 455069.5
03:26:56.511 INFO ProgressMeter - chr11:23175389 9.2 4199000 454158.7
03:27:06.557 INFO ProgressMeter - chr11:58840107 9.4 4286000 455322.9
03:27:16.598 INFO ProgressMeter - chr11:91498213 9.6 4354000 454472.7
03:27:26.736 INFO ProgressMeter - chr11:125565866 9.7 4435000 454899.0
03:27:36.750 INFO ProgressMeter - chr12:23024868 9.9 4519000 455713.6
03:27:46.814 INFO ProgressMeter - chr12:59031102 10.1 4609000 457058.4
03:27:57.105 INFO ProgressMeter - chr12:89631070 10.3 4673000 455655.0
03:28:07.154 INFO ProgressMeter - chr12:119164561 10.4 4731000 453898.6
03:28:17.269 INFO ProgressMeter - chr13:35600096 10.6 4839000 456870.0
03:28:27.303 INFO ProgressMeter - chr13:67980399 10.8 4913000 456647.3
03:28:37.303 INFO ProgressMeter - chr13:101737742 10.9 4994000 457094.4
03:28:47.306 INFO ProgressMeter - chr14:39730889 11.1 5079000 457887.3
03:28:57.466 INFO ProgressMeter - chr14:71531006 11.3 5148000 457130.0
03:29:07.620 INFO ProgressMeter - chr14:96613543 11.4 5207000 455523.0
03:29:17.715 INFO ProgressMeter - chr15:46553323 11.6 5303000 457192.6
03:29:27.807 INFO ProgressMeter - chr15:78677043 11.8 5373000 456605.6
03:29:37.812 INFO ProgressMeter - chr16:5955331 11.9 5447000 456426.4
03:29:47.814 INFO ProgressMeter - chr16:51445438 12.1 5534000 457328.3
03:29:57.908 INFO ProgressMeter - chr16:82183128 12.3 5605000 456844.3
03:30:08.030 INFO ProgressMeter - chr17:10712595 12.4 5663000 455311.1
03:30:18.037 INFO ProgressMeter - chr17:53088469 12.6 5758000 456823.4
03:30:28.086 INFO ProgressMeter - chr18:4027503 12.8 5838000 457096.6
03:30:38.154 INFO ProgressMeter - chr18:40903951 12.9 5916000 457197.0
03:30:48.168 INFO ProgressMeter - chr18:73791697 13.1 5995000 457402.6
03:30:58.260 INFO ProgressMeter - chr19:28311138 13.3 6087000 458537.4
03:31:08.431 INFO ProgressMeter - chr19:56337305 13.4 6162000 458334.4
03:31:18.578 INFO ProgressMeter - chr20:23408490 13.6 6222000 457048.0
03:31:28.717 INFO ProgressMeter - chr20:60188479 13.8 6330000 459280.3
03:31:38.754 INFO ProgressMeter - chr21:36142337 13.9 6429000 460870.1
03:31:48.859 INFO ProgressMeter - chr22:36159424 14.1 6537000 463022.1
03:31:58.895 INFO ProgressMeter - chrX:28223170 14.3 6621000 463480.2
03:32:08.932 INFO ProgressMeter - chrX:87187837 14.5 6682000 462336.8
03:32:19.047 INFO ProgressMeter - chrX:123259486 14.6 6727000 460083.3
03:32:25.965 INFO ProgressMeter - chrY:56874021 14.7 6789434 460720.2
03:32:25.966 INFO ProgressMeter - Traversal complete. Processed 6789434 total variants in 14.7 minutes.
03:32:26.445 INFO VariantDataManager - QD: mean = 20.21 standard deviation = 7.91
03:32:27.022 INFO VariantDataManager - MQRankSum: mean = -0.01 standard deviation = 0.28
03:32:27.750 INFO VariantDataManager - ReadPosRankSum: mean = 0.27 standard deviation = 0.94
03:32:28.370 INFO VariantDataManager - FS: mean = 2.94 standard deviation = 4.00
03:32:28.846 INFO VariantDataManager - MQ: mean = 59.92 standard deviation = 0.87
03:32:29.324 INFO VariantDataManager - SOR: mean = 0.80 standard deviation = 0.38
03:32:29.800 INFO VariantDataManager - DP: mean = 100.51 standard deviation = 15.32
03:32:35.175 INFO VariantDataManager - Annotation order is: [DP, MQ, QD, MQRankSum, FS, SOR, ReadPosRankSum]
03:32:35.396 INFO VariantDataManager - Training with 4144774 variants after standard deviation thresholding.
03:32:35.397 WARN VariantDataManager - WARNING: Very large training set detected. Downsampling to 2500000 training variants.
03:32:35.740 INFO GaussianMixtureModel - Initializing model with 100 k-means iterations...
03:39:37.319 INFO VariantRecalibratorEngine - Finished iteration 0.
03:41:37.558 INFO VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.72954
03:43:39.283 INFO VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.02496
03:45:43.422 INFO VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.01168
03:47:51.058 INFO VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.00918
03:49:57.983 INFO VariantRecalibratorEngine - Finished iteration 25. Current change in mixture coefficients = 0.00381
03:52:06.112 INFO VariantRecalibratorEngine - Finished iteration 30. Current change in mixture coefficients = 0.00773
03:54:10.460 INFO VariantRecalibratorEngine - Finished iteration 35. Current change in mixture coefficients = 0.00356
03:55:53.135 INFO VariantRecalibratorEngine - Convergence after 39 iterations!
03:56:10.919 WARN VariantRecalibratorEngine - Model could not pre-compute denominators.
03:56:11.028 INFO VariantDataManager - Selected worst 0 scoring variants --> variants with LOD <= -5.0000.
03:56:11.073 INFO VariantRecalibrator - Shutting down engine
[September 4, 2023 3:56:11 AM UTC] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 38.55 minutes.
Runtime.totalMemory()=3220176896
java.lang.IllegalArgumentException: No data found.
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:34)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:655)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1050)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)
Using GATK jar /gatk/gatk-package-4.1.4.0-local.jar
-
Official comment
The part of your log that caught my attention is this warning:
Model could not pre-compute denominators.
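If you want to check other runs for the same symptom, a small grep-based helper like the sketch below can flag this warning together with the "Selected worst 0 scoring variants" line that follows it in the log. The helper name and its exit-status convention are hypothetical, not part of GATK. One plausible reading, which is an assumption rather than something stated in this thread, is that those worst-scoring variants are the training set for VQSR's negative model, so selecting zero of them leaves generateModel with nothing to fit, hence "No data found".

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): scan a VariantRecalibrator log for the
# two lines that precede the "No data found" exception in the log above.
set -euo pipefail

check_vqsr_log() {
  local log=$1 status=0
  if grep -q 'Model could not pre-compute denominators' "$log"; then
    echo "found: Model could not pre-compute denominators"
    status=1
  fi
  # Matches only a count of exactly 0 (e.g. "worst 10 scoring" does not match).
  if grep -q 'Selected worst 0 scoring variants' "$log"; then
    echo "found: Selected worst 0 scoring variants (assumed: empty negative training set)"
    status=1
  fi
  return "$status"
}

# Usage: ./check_vqsr_log.sh vqsr.log   (exit status 1 if either line is present)
if [[ $# -ge 1 ]]; then
  check_vqsr_log "$1"
fi
```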
I believe this happens when the covariance matrix is not invertible, usually because the variance of one of the annotations is near zero. The MQ standard deviation is admittedly not zero, but it is quite small relative to its mean (0.87 against a mean of 59.92 in your log).
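The invertibility argument can be made concrete with the standard multivariate normal density that each Gaussian component of the mixture evaluates. This is a textbook sketch, assuming GATK's per-component evaluation follows this form (its implementation works in log10 space and may differ in detail); d = 7 is the number of -an annotations in the command above.

```latex
\[
\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_k,\Sigma_k)
  = \frac{1}{(2\pi)^{d/2}\,\lvert\Sigma_k\rvert^{1/2}}
    \exp\!\Big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^{\top}\Sigma_k^{-1}(\mathbf{x}-\boldsymbol{\mu}_k)\Big),
  \qquad d = 7 .
\]
% Sigma_k is positive semidefinite, so Hadamard's inequality bounds its determinant:
\[
0 \le \lvert\Sigma_k\rvert \le \prod_{j=1}^{d} (\Sigma_k)_{jj} .
\]
% If the variance of a single annotation j (e.g. MQ) in component k goes to zero:
\[
(\Sigma_k)_{jj} \to 0
\;\Longrightarrow\; \lvert\Sigma_k\rvert \to 0
\;\Longrightarrow\; (2\pi)^{d/2}\lvert\Sigma_k\rvert^{1/2} \to 0
\text{ and } \Sigma_k \text{ becomes singular, so } \Sigma_k^{-1} \text{ cannot be formed.}
\]
```

In other words, one near-constant annotation is enough to make both the normalizing denominator and the inverse covariance numerically unusable, regardless of how many variants are available for training.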
You have two paths forward:
1. Try removing the MQ annotation. If you're really concerned about bad MQ variants, you can do some supplemental hard filtering.
2. Try the new Variant Extract-Train-Score (VETS) pipeline for variant filtration: https://github.com/broadinstitute/gatk/blob/master/scripts/vcf_site_level_filtering_wdl/JointVcfFiltering.wdl. That pipeline defaults to an outlier detection model that's similar to a random forest and far more robust to numerical instability.
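Path 1 can be sketched as a shell script that rebuilds the original command with MQ removed from the -an list and, optionally, adds an MQ hard filter using GATK's VariantFiltration. Assumptions: SITES_VCF and CALLS_VCF are hypothetical placeholder paths, the JVM -D options from the original command are omitted for brevity, and MQ < 40.0 is only an illustrative threshold, not a recommendation for this data set. By default the script prints the commands instead of running them; set DRY_RUN=0 to execute.

```bash
#!/usr/bin/env bash
# Sketch of path 1: the original SNP-mode VariantRecalibrator call with MQ
# removed from the -an list, plus an optional supplemental MQ hard filter.
# SITES_VCF / CALLS_VCF are hypothetical placeholders; MQ < 40.0 is illustrative.
set -euo pipefail

GATK_JAR=/gatk/gatk-package-4.1.4.0-local.jar
REF=/cromwell_root/gcp-public-data--broad-references/hg38/v0
SITES_VCF=genome.sites_only.vcf.gz   # hypothetical: your sites-only VCF
CALLS_VCF=genome.vcf.gz              # hypothetical: your joint-called VCF

# Print a copy-pasteable command (default, DRY_RUN=1) or execute it (DRY_RUN=0).
run() {
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    printf '%q ' "$@"; printf '\n'
  else
    "$@"
  fi
}

annotations=(QD MQRankSum ReadPosRankSum FS SOR DP)   # same as before, minus MQ
tranches=(100.0 99.95 99.9 99.8 99.6 99.5 99.4 99.3 99.0 98.0 97.0 90.0)

recal=(java -Xmx3g -Xms3g -jar "$GATK_JAR" VariantRecalibrator
  -V "$SITES_VCF" -O genome.snps.recal --tranches-file genome.snps.tranches
  --trust-all-polymorphic -mode SNP --max-gaussians 6)
for t in "${tranches[@]}"; do recal+=(-tranche "$t"); done
for a in "${annotations[@]}"; do recal+=(-an "$a"); done
recal+=(
  --resource:hapmap,known=false,training=true,truth=true,prior=15 "$REF/hapmap_3.3.hg38.vcf.gz"
  --resource:omni,known=false,training=true,truth=true,prior=12 "$REF/1000G_omni2.5.hg38.vcf.gz"
  --resource:1000G,known=false,training=true,truth=false,prior=10 "$REF/1000G_phase1.snps.high_confidence.hg38.vcf.gz"
  --resource:dbsnp,known=true,training=false,truth=false,prior=7 "$REF/Homo_sapiens_assembly38.dbsnp138.vcf")

# Optional supplemental hard filter on MQ, applied to the full call set.
mq_filter=(java -Xmx3g -jar "$GATK_JAR" VariantFiltration
  -V "$CALLS_VCF" -O genome.mq_filtered.vcf.gz
  --filter-expression "MQ < 40.0" --filter-name MQ40)

run "${recal[@]}"
run "${mq_filter[@]}"
```

Building the arguments as bash arrays keeps each flag and path a single word, so paths with unusual characters survive both the dry-run printout and the real invocation.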
We're moving towards phasing out VQSR in the best practices in favor of the new VETS pipeline.
-
Hi G E
Have you tried running this workflow with the latest GATK release, 4.4? It is possible that this issue was fixed along with other changes made since 4.1.4.0.
You could also try reducing the number of Gaussians (--max-gaussians), or removing that argument entirely, to see whether your analysis then completes without issues.
I hope this helps.
-
I just tested GATK 4.4, and it gives the same error.
Reducing --max-gaussians is not a fix, because according to this tool's specifications it should work on 3 whole genomes.
So this is a bug. Can you advise on next steps?
Thanks.
-
Hi G E
I will ask our team and try to find a better solution to your problem.
In the meantime, you may want to check our VQSR documentation. According to our Best Practices documentation, reducing the number of Gaussians is a way to overcome this problem.
The --max-gaussians parameter sets the expected number of clusters in modeling. If a dataset gives fewer distinct clusters, e.g. as can happen for smaller data, then the tool will tell you there is insufficient data with a "No data found" error message. In this case, try decrementing the --max-gaussians value.
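The documented advice to decrement --max-gaussians can be automated with a small retry loop. This is a sketch, not a GATK feature: recalibrate() is a hypothetical placeholder that only prints the value it would use, and you would replace its body with the real VariantRecalibrator command from the original post, passing --max-gaussians "$1".

```bash
#!/usr/bin/env bash
# Sketch: rerun VariantRecalibrator with --max-gaussians = max, max-1, ..., min
# until one attempt succeeds. recalibrate() is a hypothetical placeholder.
set -euo pipefail

recalibrate() {
  local g=$1
  # Placeholder only: replace with the real command, e.g. the original
  # java ... VariantRecalibrator ... line with --max-gaussians "$g".
  echo "would run VariantRecalibrator with --max-gaussians $g"
}

# Prints the first --max-gaussians value that succeeds; returns 1 if none do.
fit_with_fallback() {
  local max_g=$1 min_g=$2 g=$1
  while (( g >= min_g )); do
    # Send the tool's own output to stderr so stdout carries only the result.
    if recalibrate "$g" >&2; then
      echo "$g"
      return 0
    fi
    echo "attempt with --max-gaussians $g failed" >&2
    g=$((g - 1))
  done
  echo "no value from $max_g down to $min_g succeeded" >&2
  return 1
}

chosen=$(fit_with_fallback 6 2)
echo "succeeded with --max-gaussians $chosen"
```

Each attempt is a full rerun (38.55 minutes in the log above), so a narrow range such as 6 down to 4 keeps the worst-case runtime manageable.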