AS_MQRankSum crashes R script in VQSR
Hi.
I am running GATK 4.1.4.1. When I run allele-specific VQSR and include "-an AS_MQRankSum", I get the following error:
13:02:54.929 INFO VariantRecalibrator - Executing: Rscript /home/tim/Work/Projects/MagmaRun15BigSim/04VQSR/test.plots.R
13:02:59.603 INFO VariantRecalibrator - Shutting down engine
[23 January 2020 1:02:59 PM] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 0.39 minutes.
Runtime.totalMemory()=602406912
org.broadinstitute.hellbender.utils.R.RScriptExecutorException:
Rscript exited with 1
Command Line: Rscript -e tempLibDir = '/tmp/Rlib.2260639275524561499';source('/home/tim/Work/Projects/MagmaRun15BigSim/04VQSR/test.plots.R');
Stdout:
Stderr: Warning: Ignoring unknown parameters: legend
Error in f(..., self = self) : Breaks and labels are different lengths
Calls: source ... guide_train -> guide_train.legend -> <Anonymous> -> f
In addition: Warning messages:
1: Non Lab interpolation is deprecated
2: Removed 1 rows containing missing values (geom_tile).
3: Removed 1 rows containing missing values (geom_point).
4: Removed 1 rows containing missing values (geom_point).
5: Removed 1 rows containing missing values (geom_point).
Execution halted
Consequently, the R plots are incomplete or missing. When I exclude AS_MQRankSum, VQSR runs fine.
Any ideas?
Thanks,
Tim
-
Hi timh
Please post the exact command used and the entire error log.
-
Here you are (the exact same command works without "-an AS_MQRankSum"):
java -jar ~/Programs/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar VariantRecalibrator -R ref.fa -V test.4.1.4.1.vcf.gz -AS --resource:ref500,known=false,training=true,truth=true,prior=20.0 random500.vcf --resource:complete,known=true,training=false,truth=false,prior=5.0 all-dbsnp.vcf -an AS_MQRankSum -an AS_QD -an AS_SOR -an AS_MQ -an DP -an AS_ReadPosRankSum -mode SNP --output test.recal --tranches-file test.tranches --truth-sensitivity-tranche 100.0 --truth-sensitivity-tranche 95.0 --truth-sensitivity-tranche 99.0 --output-model test.model -rscript-file test.plots.R --max-gaussians 2
11:08:56.062 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/user/Programs/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 27, 2020 11:08:56 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
11:08:56.320 INFO VariantRecalibrator - ------------------------------------------------------------
11:08:56.320 INFO VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.1.4.1
11:08:56.320 INFO VariantRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
11:08:56.320 INFO VariantRecalibrator - Executing as user@user-ThinkPad-X260 on Linux v5.3.0-26-generic amd64
11:08:56.320 INFO VariantRecalibrator - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_151-b12
11:08:56.320 INFO VariantRecalibrator - Start Date/Time: 27 January 2020 11:08:56 AM
11:08:56.321 INFO VariantRecalibrator - ------------------------------------------------------------
11:08:56.321 INFO VariantRecalibrator - ------------------------------------------------------------
11:08:56.321 INFO VariantRecalibrator - HTSJDK Version: 2.21.0
11:08:56.321 INFO VariantRecalibrator - Picard Version: 2.21.2
11:08:56.321 INFO VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:08:56.321 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:08:56.321 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:08:56.321 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:08:56.321 INFO VariantRecalibrator - Deflater: IntelDeflater
11:08:56.321 INFO VariantRecalibrator - Inflater: IntelInflater
11:08:56.321 INFO VariantRecalibrator - GCS max retries/reopens: 20
11:08:56.322 INFO VariantRecalibrator - Requester pays: disabled
11:08:56.322 INFO VariantRecalibrator - Initializing engine
11:08:56.635 INFO FeatureManager - Using codec VCFCodec to read file file:///WorkDir/random500.vcf
11:08:56.654 INFO FeatureManager - Using codec VCFCodec to read file file:///WorkDir/all-dbsnp.vcf
11:08:56.660 INFO FeatureManager - Using codec VCFCodec to read file file:///WorkDir/test.4.1.4.1.vcf.gz
11:08:56.690 WARN IndexUtils - Feature file "/WorkDir/random500.vcf" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file
11:08:56.691 WARN IndexUtils - Feature file "/WorkDir/all-dbsnp.vcf" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file
11:08:56.704 INFO VariantRecalibrator - Done initializing engine
11:08:56.706 INFO TrainingSet - Found ref500 track: Known = false Training = true Truth = true Prior = Q20.0
11:08:56.707 INFO TrainingSet - Found complete track: Known = true Training = false Truth = false Prior = Q5.0
11:08:56.713 WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "recal". Defaulting to VCF.
11:08:56.738 INFO ProgressMeter - Starting traversal
11:08:56.738 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
11:08:57.451 INFO ProgressMeter - ref:4262288 0.0 10335 870927.0
11:08:57.451 INFO ProgressMeter - Traversal complete. Processed 10335 total variants in 0.0 minutes.
11:08:57.454 INFO VariantDataManager - AS_MQRankSum: mean = -0.01 standard deviation = 0.04
11:08:57.461 INFO VariantDataManager - AS_QD: mean = 31.15 standard deviation = 3.15
11:08:57.466 INFO VariantDataManager - AS_SOR: mean = 1.07 standard deviation = 0.53
11:08:57.472 INFO VariantDataManager - AS_MQ: mean = 59.63 standard deviation = 2.60
11:08:57.476 INFO VariantDataManager - DP: mean = 162.93 standard deviation = 33.57
11:08:57.482 INFO VariantDataManager - AS_ReadPosRankSum: mean = 0.55 standard deviation = 1.01
11:08:57.518 INFO VariantDataManager - Annotation order is: [DP, AS_MQ, AS_MQRankSum, AS_QD, AS_ReadPosRankSum, AS_SOR]
11:08:57.520 INFO VariantDataManager - Training with 498 variants after standard deviation thresholding.
11:08:57.520 WARN VariantDataManager - WARNING: Training with very few variant sites! Please check the model reporting PDF to ensure the quality of the model is reliable.
11:08:57.524 INFO GaussianMixtureModel - Initializing model with 100 k-means iterations...
11:08:57.592 INFO VariantRecalibratorEngine - Finished iteration 0.
11:08:57.618 INFO VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.09612
11:08:57.627 INFO VariantRecalibratorEngine - Convergence after 9 iterations!
11:08:57.631 INFO VariantRecalibratorEngine - Evaluating full set of 10251 variants...
11:08:57.914 INFO VariantDataManager - Selected worst 383 scoring variants --> variants with LOD <= -5.0000.
11:08:57.914 INFO GaussianMixtureModel - Initializing model with 100 k-means iterations...
11:08:57.918 INFO VariantRecalibratorEngine - Finished iteration 0.
11:08:57.923 INFO VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.02624
11:08:57.931 INFO VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.02619
11:08:57.936 INFO VariantRecalibratorEngine - Convergence after 13 iterations!
11:08:57.941 INFO VariantRecalibratorEngine - Evaluating full set of 10251 variants...
11:08:58.262 INFO TrancheManager - Finding 3 tranches for 10251 variants
11:08:58.277 INFO TrancheManager - TruthSensitivityTranche threshold 100.00 => selection metric threshold 0.000
11:08:58.286 INFO TrancheManager - Found tranche for 100.000: 0.000 threshold starting with variant 0; running score is 0.000
11:08:58.286 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=100.00 minVQSLod=-39056.1309 known=(9983 @ 0.4954) novel=(268 @ 1.0775) truthSites(499 accessible, 499 called), name=anonymous]
11:08:58.287 INFO TrancheManager - TruthSensitivityTranche threshold 95.00 => selection metric threshold 0.050
11:08:58.291 INFO TrancheManager - Found tranche for 95.000: 0.050 threshold starting with variant 1453; running score is 0.050
11:08:58.291 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=95.00 minVQSLod=0.7390 known=(8798 @ 0.4911) novel=(0 @ 0.0000) truthSites(499 accessible, 474 called), name=anonymous]
11:08:58.291 INFO TrancheManager - TruthSensitivityTranche threshold 99.00 => selection metric threshold 0.010
11:08:58.294 INFO TrancheManager - Found tranche for 99.000: 0.010 threshold starting with variant 729; running score is 0.010
11:08:58.295 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=99.00 minVQSLod=-1.4924 known=(9522 @ 0.4958) novel=(0 @ 0.0000) truthSites(499 accessible, 494 called), name=anonymous]
11:08:58.296 INFO VariantRecalibrator - Writing out recalibration table...
11:08:58.476 INFO VariantRecalibrator - Writing out visualization Rscript file...
11:08:58.480 INFO VariantRecalibrator - Building DP x AS_MQ plot...
11:08:58.482 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
11:08:58.713 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
11:08:58.998 INFO VariantRecalibrator - Building DP x AS_MQRankSum plot...
11:08:58.999 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:08:59.211 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:08:59.460 INFO VariantRecalibrator - Building DP x AS_QD plot...
11:08:59.462 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
11:08:59.676 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
11:08:59.900 INFO VariantRecalibrator - Building DP x AS_ReadPosRankSum plot...
11:08:59.900 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:09:00.094 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:09:00.306 INFO VariantRecalibrator - Building DP x AS_SOR plot...
11:09:00.306 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
11:09:00.500 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
11:09:00.725 INFO VariantRecalibrator - Building AS_MQ x AS_MQRankSum plot...
11:09:00.726 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:09:00.979 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:09:01.273 INFO VariantRecalibrator - Building AS_MQ x AS_QD plot...
11:09:01.274 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
11:09:01.506 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
11:09:01.766 INFO VariantRecalibrator - Building AS_MQ x AS_ReadPosRankSum plot...
11:09:01.766 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:09:02.000 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:09:02.254 INFO VariantRecalibrator - Building AS_MQ x AS_SOR plot...
11:09:02.254 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
11:09:02.485 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
11:09:02.743 INFO VariantRecalibrator - Building AS_MQRankSum x AS_QD plot...
11:09:02.743 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:09:02.931 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:09:03.150 INFO VariantRecalibrator - Building AS_MQRankSum x AS_ReadPosRankSum plot...
11:09:03.150 INFO VariantRecalibratorEngine - Evaluating full set of 3600 variants...
11:09:03.341 INFO VariantRecalibratorEngine - Evaluating full set of 3600 variants...
11:09:03.550 INFO VariantRecalibrator - Building AS_MQRankSum x AS_SOR plot...
11:09:03.551 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:09:03.746 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:09:03.952 INFO VariantRecalibrator - Building AS_QD x AS_ReadPosRankSum plot...
11:09:03.952 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:09:04.134 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:09:04.344 INFO VariantRecalibrator - Building AS_QD x AS_SOR plot...
11:09:04.344 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
11:09:04.526 INFO VariantRecalibratorEngine - Evaluating full set of 3721 variants...
11:09:04.756 INFO VariantRecalibrator - Building AS_ReadPosRankSum x AS_SOR plot...
11:09:04.757 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:09:04.959 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
11:09:05.171 INFO VariantRecalibrator - Executing: Rscript /WorkDir/test.plots.R
11:09:06.686 INFO VariantRecalibrator - Shutting down engine
[27 January 2020 11:09:06 AM] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 0.18 minutes.
Runtime.totalMemory()=718274560
org.broadinstitute.hellbender.utils.R.RScriptExecutorException:
Rscript exited with 1
Command Line: Rscript -e tempLibDir = '/tmp/Rlib.5162024666942531667';source('/WorkDir/test.plots.R');
Stdout:
Stderr: Warning: Ignoring unknown parameters: legend
Error in f(..., self = self) : Breaks and labels are different lengths
Calls: source ... guide_train -> guide_train.legend -> <Anonymous> -> f
In addition: Warning messages:
1: Non Lab interpolation is deprecated
2: Removed 1 rows containing missing values (geom_tile).
3: Removed 1 rows containing missing values (geom_point).
4: Removed 1 rows containing missing values (geom_point).
5: Removed 1 rows containing missing values (geom_point).
Execution halted
at org.broadinstitute.hellbender.utils.R.RScriptExecutor.getScriptException(RScriptExecutor.java:80)
at org.broadinstitute.hellbender.utils.R.RScriptExecutor.getScriptException(RScriptExecutor.java:19)
at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:126)
at org.broadinstitute.hellbender.utils.R.RScriptExecutor.exec(RScriptExecutor.java:126)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.createVisualizationScript(VariantRecalibrator.java:1121)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:702)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1050)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
-
Update: this appears to affect only some of my datasets; it works fine for others. So I am guessing the problem is not VQSR but my data. I will have a closer look. Thanks.
-
Thank you for the update timh. Please post your solution here once you find it so the community can benefit from it too. Thank you!
-
Hi timh
I am asking out of curiosity: did you edit the R script code at all? I am wondering if that is the issue here.
-
Hi. No I didn't change the R script.
-
I have just run into the same problem. It seems to be caused by R's limit on script line length: the definition of the 'surface' variable is too long. It should be a relatively easy fix.
BTW, is it possible for VariantRecalibrator to create the R script and NOT execute it but exit gracefully (without any error code)? Failure of the R script breaks my pipeline...
Thanks, Marcin
-
Hi Marcin
That is not possible, but you could run VariantRecalibrator without the `-rscript-file` argument as a workaround to avoid breaking your pipeline.
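As a rough sketch of that workaround, here is the command posted earlier in this thread with only `-rscript-file test.plots.R` removed. The file names are the ones from that post. The `GATK_JAR` variable and the "print instead of run when the jar is missing" guard are assumptions added for illustration, not part of GATK.

```shell
# Assumed variable: path to the GATK jar (the default mirrors the path used earlier in the thread).
GATK_JAR="${GATK_JAR:-$HOME/Programs/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar}"

# Same arguments as the original command, minus "-rscript-file test.plots.R",
# so no plotting script is requested.
set -- VariantRecalibrator \
  -R ref.fa \
  -V test.4.1.4.1.vcf.gz \
  -AS \
  --resource:ref500,known=false,training=true,truth=true,prior=20.0 random500.vcf \
  --resource:complete,known=true,training=false,truth=false,prior=5.0 all-dbsnp.vcf \
  -an AS_MQRankSum -an AS_QD -an AS_SOR -an AS_MQ -an DP -an AS_ReadPosRankSum \
  -mode SNP \
  --output test.recal \
  --tranches-file test.tranches \
  --truth-sensitivity-tranche 100.0 \
  --truth-sensitivity-tranche 95.0 \
  --truth-sensitivity-tranche 99.0 \
  --output-model test.model \
  --max-gaussians 2

# Run GATK only if the jar is present; otherwise print the command as a dry run.
if [ -f "$GATK_JAR" ]; then
  java -jar "$GATK_JAR" "$@"
else
  printf 'java -jar %s %s\n' "$GATK_JAR" "$*"
fi
```

With this form the plots are not produced at all, so it only suits pipelines that do not need them.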