FIX to VariantRecalibrator RScript compatibility issues with latest library scales
ISSUE: Rscript produce by VariantCalibration has a compatibility issue with the scales package version 0.3.0 and later. In these versions, the space argument of pal_gradient_n() only supports "Lab", not "rgb". This means your script needs to be updated to accommodate this change in the scales library requirements.
FIX :
In Rscript all the parameter instances:
scale_fill_gradient(high = "green", low = "red", space = "rgb")
has to be change to
scale_fill_gradient(high = "green", low = "red", space = "Lab")
Then the script can be re run separately to obtain pdf file with graphs
REQUIRED for all errors and issues:
a) GATK version used: The Genome Analysis Toolkit (GATK) v4.6.0.0
b) Exact command used:
gatk --java-options "-Xmx4G" VariantRecalibrator \
-R human_g1k_v37.fasta \
-V allsamples_chr20_snps.vcf.gz \
--trust-all-polymorphic \
-tranche 100.0 \
-tranche 99.95 \
-tranche 99.9 \
-tranche 99.8 \
-tranche 99.6 \
-tranche 99.5 \
-tranche 99.4 \
-tranche 99.3 \
-tranche 99.0 \
-tranche 98.0 \
-tranche 97.0 \
-tranche 90.0 \
--resource:1000G,known=false,training=true,truth=true,prior=10.0 ALL.chr20_phase1_release_v3_snps.vcf.gz \
-an QD -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
-mode SNP \
--max-gaussians 6 \
-O recalibrate_SNP.recal \
--tranches-file recalibrate_SNP.tranches \
--rscript-file recalibrate_SNP_plots.R
c) Entire program log:
Using GATK jar /home/Bioinformatics/gatk/gatk-package-4.6.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4G -jar /home/Bioinformatics/gatk/gatk-package-4.6.0.0-local.jar VariantRecalibrator -R human_g1k_v37.fasta -V allsamples_chr20_snps.vcf.gz --trust-all-polymorphic -tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.8 -tranche 99.6 -tranche 99.5 -tranche 99.4 -tranche 99.3 -tranche 99.0 -tranche 98.0 -tranche 97.0 -tranche 90.0 --resource:1000G,known=false,training=true,truth=true,prior=10.0 ALL.chr20_phase1_release_v3_snps.vcf.gz -an QD -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP --max-gaussians 6 -O recalibrate_SNP.recal --tranches-file recalibrate_SNP.tranches --rscript-file recalibrate_SNP_plots.R
09:38:33.345 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/Bioinformatics/gatk/gatk-package-4.6.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
09:38:33.613 INFO VariantRecalibrator - ------------------------------------------------------------
09:38:33.616 INFO VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.6.0.0
09:38:33.616 INFO VariantRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
09:38:33.616 INFO VariantRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v21.0.4+7-Ubuntu-1ubuntu224.04
09:38:33.616 INFO VariantRecalibrator - Start Date/Time: September 27, 2024, 9:38:33 AM CST
09:38:33.616 INFO VariantRecalibrator - ------------------------------------------------------------
09:38:33.617 INFO VariantRecalibrator - ------------------------------------------------------------
09:38:33.617 INFO VariantRecalibrator - HTSJDK Version: 4.1.1
09:38:33.618 INFO VariantRecalibrator - Picard Version: 3.2.0
09:38:33.618 INFO VariantRecalibrator - Built for Spark Version: 3.5.0
09:38:33.618 INFO VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:38:33.618 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:38:33.618 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:38:33.618 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:38:33.619 INFO VariantRecalibrator - Deflater: IntelDeflater
09:38:33.619 INFO VariantRecalibrator - Inflater: IntelInflater
09:38:33.619 INFO VariantRecalibrator - GCS max retries/reopens: 20
09:38:33.619 INFO VariantRecalibrator - Requester pays: disabled
09:38:33.620 INFO VariantRecalibrator - Initializing engine
09:38:33.830 INFO FeatureManager - Using codec VCFCodec to read file file:///home/BQSR_bam/GVCF_files/ALL.chr20_phase1_release_v3_snps.vcf.gz
09:38:33.869 INFO FeatureManager - Using codec VCFCodec to read file file:///home/BQSR_bam/GVCF_files/allsamples_chr20_snps.vcf.gz
09:38:33.908 WARN IndexUtils - Feature file "file:///home/Documents/GVCF_files/ALL.chr20_phase1_release_v3_snps.vcf.gz" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file
09:38:33.933 INFO VariantRecalibrator - Done initializing engine
09:38:33.936 INFO TrainingSet - Found 1000G track: Known = false Training = true Truth = true Prior = Q10.0
09:38:33.945 WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "recal". Defaulting to VCF.
09:38:34.001 INFO ProgressMeter - Starting traversal
09:38:34.002 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
09:38:45.175 INFO ProgressMeter - 20:2838118 0.2 7000 37600.7
09:38:56.682 INFO ProgressMeter - 20:5923204 0.4 15000 39682.5
09:39:07.950 INFO ProgressMeter - 20:8583414 0.6 21000 37115.6
09:39:18.212 INFO ProgressMeter - 20:11237743 0.7 27000 36643.3
09:39:28.658 INFO ProgressMeter - 20:13764654 0.9 33000 36226.6
09:39:39.394 INFO ProgressMeter - 20:16560330 1.1 39000 35784.2
09:39:51.369 INFO ProgressMeter - 20:19358934 1.3 46000 35674.1
09:40:03.141 INFO ProgressMeter - 20:22308161 1.5 51000 34328.4
09:40:14.234 INFO ProgressMeter - 20:24639363 1.7 56000 33522.6
09:40:25.649 INFO ProgressMeter - 20:31556638 1.9 77000 41380.4
09:40:37.748 INFO ProgressMeter - 20:35770999 2.1 82000 39758.9
09:40:48.052 INFO ProgressMeter - 20:38555808 2.2 87000 38941.0
09:40:59.734 INFO ProgressMeter - 20:41335773 2.4 92000 37877.7
09:41:10.153 INFO ProgressMeter - 20:44016863 2.6 97000 37271.6
09:41:20.563 INFO ProgressMeter - 20:46486269 2.8 102000 36743.5
09:41:32.970 INFO ProgressMeter - 20:49984023 3.0 109000 36542.8
09:41:43.483 INFO ProgressMeter - 20:52972517 3.2 116000 36731.9
09:41:53.522 INFO ProgressMeter - 20:55488739 3.3 122000 36688.1
09:42:04.527 INFO ProgressMeter - 20:58398890 3.5 128000 36480.2
09:42:15.328 INFO ProgressMeter - 20:60996484 3.7 134000 36326.5
09:42:22.248 INFO ProgressMeter - 20:62822352 3.8 138228 36336.6
09:42:22.248 INFO ProgressMeter - Traversal complete. Processed 138228 total variants in 3.8 minutes.
09:42:22.272 INFO VariantDataManager - QD: mean = 23.05 standard deviation = 7.40
09:42:22.319 INFO VariantDataManager - MQRankSum: mean = 0.01 standard deviation = 0.36
09:42:22.328 INFO VariantDataManager - ReadPosRankSum: mean = 0.18 standard deviation = 0.85
09:42:22.338 INFO VariantDataManager - FS: mean = 1.57 standard deviation = 2.81
09:42:22.347 INFO VariantDataManager - SOR: mean = 1.04 standard deviation = 0.73
09:42:22.467 INFO VariantDataManager - Annotation order is: [QD, MQRankSum, FS, SOR, ReadPosRankSum]
09:42:22.478 INFO VariantDataManager - Training with 116863 variants after standard deviation thresholding.
09:42:22.482 INFO GaussianMixtureModel - Initializing model with 100 k-means iterations...
09:42:25.275 INFO VariantRecalibratorEngine - Finished iteration 0.
09:42:26.331 INFO VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.27780
09:42:27.230 INFO VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.10889
09:42:28.107 INFO VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.01078
09:42:28.658 INFO VariantRecalibratorEngine - Convergence after 18 iterations!
09:42:28.814 INFO VariantRecalibratorEngine - Evaluating full set of 138228 variants...
09:42:29.686 INFO VariantDataManager - Selected worst 10152 scoring variants --> variants with LOD <= -5.0000.
09:42:29.687 INFO GaussianMixtureModel - Initializing model with 100 k-means iterations...
09:42:29.780 INFO VariantRecalibratorEngine - Finished iteration 0.
09:42:29.822 INFO VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.04305
09:42:29.858 INFO VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.00846
09:42:29.895 INFO VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.00272
09:42:29.915 INFO VariantRecalibratorEngine - Convergence after 17 iterations!
09:42:29.931 INFO VariantRecalibratorEngine - Evaluating full set of 138228 variants...
09:42:30.562 INFO TrancheManager - Finding 12 tranches for 138228 variants
09:42:30.677 INFO TrancheManager - TruthSensitivityTranche threshold 100.00 => selection metric threshold 0.000
09:42:30.720 INFO TrancheManager - Found tranche for 100.000: 0.000 threshold starting with variant 0; running score is 0.000
09:42:30.721 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=100.00 minVQSLod=-37.4840 known=(0 @ 0.0000) novel=(138228 @ 2.1263) truthSites(116873 accessible, 116873 called), name=anonymous]
09:42:30.721 INFO TrancheManager - TruthSensitivityTranche threshold 99.95 => selection metric threshold 0.000
09:42:30.763 INFO TrancheManager - Found tranche for 99.950: 0.000 threshold starting with variant 7797; running score is 0.001
09:42:30.764 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=99.95 minVQSLod=-1.7719 known=(0 @ 0.0000) novel=(130431 @ 2.1730) truthSites(116873 accessible, 116814 called), name=anonymous]
09:42:30.764 INFO TrancheManager - TruthSensitivityTranche threshold 99.90 => selection metric threshold 0.001
09:42:30.786 INFO TrancheManager - Found tranche for 99.900: 0.001 threshold starting with variant 8906; running score is 0.001
09:42:30.786 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=99.90 minVQSLod=-1.4146 known=(0 @ 0.0000) novel=(129322 @ 2.1794) truthSites(116873 accessible, 116756 called), name=anonymous]
09:42:30.787 INFO TrancheManager - TruthSensitivityTranche threshold 99.80 => selection metric threshold 0.002
09:42:30.807 INFO TrancheManager - Found tranche for 99.800: 0.002 threshold starting with variant 9853; running score is 0.002
09:42:30.808 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=99.80 minVQSLod=-1.0261 known=(0 @ 0.0000) novel=(128375 @ 2.1869) truthSites(116873 accessible, 116639 called), name=anonymous]
09:42:30.808 INFO TrancheManager - TruthSensitivityTranche threshold 99.60 => selection metric threshold 0.004
09:42:30.824 INFO TrancheManager - Found tranche for 99.600: 0.004 threshold starting with variant 10916; running score is 0.004
09:42:30.825 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=99.60 minVQSLod=-0.6283 known=(0 @ 0.0000) novel=(127312 @ 2.1937) truthSites(116873 accessible, 116405 called), name=anonymous]
09:42:30.825 INFO TrancheManager - TruthSensitivityTranche threshold 99.50 => selection metric threshold 0.005
09:42:30.847 INFO TrancheManager - Found tranche for 99.500: 0.005 threshold starting with variant 11232; running score is 0.005
09:42:30.847 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=99.50 minVQSLod=-0.5159 known=(0 @ 0.0000) novel=(126996 @ 2.1962) truthSites(116873 accessible, 116288 called), name=anonymous]
09:42:30.847 INFO TrancheManager - TruthSensitivityTranche threshold 99.40 => selection metric threshold 0.006
09:42:30.870 INFO TrancheManager - Found tranche for 99.400: 0.006 threshold starting with variant 11560; running score is 0.006
09:42:30.870 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=99.40 minVQSLod=-0.4298 known=(0 @ 0.0000) novel=(126668 @ 2.1984) truthSites(116873 accessible, 116171 called), name=anonymous]
09:42:30.871 INFO TrancheManager - TruthSensitivityTranche threshold 99.30 => selection metric threshold 0.007
09:42:30.894 INFO TrancheManager - Found tranche for 99.300: 0.007 threshold starting with variant 11874; running score is 0.007
09:42:30.894 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=99.30 minVQSLod=-0.3531 known=(0 @ 0.0000) novel=(126354 @ 2.1995) truthSites(116873 accessible, 116054 called), name=anonymous]
09:42:30.894 INFO TrancheManager - TruthSensitivityTranche threshold 99.00 => selection metric threshold 0.010
09:42:30.919 INFO TrancheManager - Found tranche for 99.000: 0.010 threshold starting with variant 12582; running score is 0.010
09:42:30.920 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=99.00 minVQSLod=-0.1852 known=(0 @ 0.0000) novel=(125646 @ 2.2028) truthSites(116873 accessible, 115704 called), name=anonymous]
09:42:30.920 INFO TrancheManager - TruthSensitivityTranche threshold 98.00 => selection metric threshold 0.020
09:42:30.938 INFO TrancheManager - Found tranche for 98.000: 0.020 threshold starting with variant 14674; running score is 0.020
09:42:30.938 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=98.00 minVQSLod=0.2168 known=(0 @ 0.0000) novel=(123554 @ 2.2108) truthSites(116873 accessible, 114535 called), name=anonymous]
09:42:30.938 INFO TrancheManager - TruthSensitivityTranche threshold 97.00 => selection metric threshold 0.030
09:42:30.966 INFO TrancheManager - Found tranche for 97.000: 0.030 threshold starting with variant 16512; running score is 0.030
09:42:30.966 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=97.00 minVQSLod=0.5555 known=(0 @ 0.0000) novel=(121716 @ 2.2143) truthSites(116873 accessible, 113366 called), name=anonymous]
09:42:30.966 INFO TrancheManager - TruthSensitivityTranche threshold 90.00 => selection metric threshold 0.100
09:42:30.993 INFO TrancheManager - Found tranche for 90.000: 0.100 threshold starting with variant 29727; running score is 0.100
09:42:30.993 INFO TrancheManager - TruthSensitivityTranche is TruthSensitivityTranche targetTruthSensitivity=90.00 minVQSLod=3.1987 known=(0 @ 0.0000) novel=(108501 @ 2.2576) truthSites(116873 accessible, 105185 called), name=anonymous]
09:42:30.997 INFO VariantRecalibrator - Writing out recalibration table...
09:42:32.459 INFO VariantRecalibrator - Writing out visualization Rscript file...
09:42:32.498 INFO VariantRecalibrator - Building QD x MQRankSum plot...
09:42:32.504 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
09:42:32.672 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
09:42:32.841 INFO VariantRecalibrator - Building QD x FS plot...
09:42:32.845 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
09:42:33.013 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
09:42:33.147 INFO VariantRecalibrator - Building QD x SOR plot...
09:42:33.150 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
09:42:33.305 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
09:42:33.430 INFO VariantRecalibrator - Building QD x ReadPosRankSum plot...
09:42:33.431 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
09:42:33.596 INFO VariantRecalibratorEngine - Evaluating full set of 3660 variants...
09:42:33.721 INFO VariantRecalibrator - Building MQRankSum x FS plot...
09:42:33.723 INFO VariantRecalibratorEngine - Evaluating full set of 3600 variants...
09:42:33.883 INFO VariantRecalibratorEngine - Evaluating full set of 3600 variants...
09:42:34.004 INFO VariantRecalibrator - Building MQRankSum x SOR plot...
09:42:34.006 INFO VariantRecalibratorEngine - Evaluating full set of 3600 variants...
09:42:34.165 INFO VariantRecalibratorEngine - Evaluating full set of 3600 variants...
09:42:34.298 INFO VariantRecalibrator - Building MQRankSum x ReadPosRankSum plot...
09:42:34.299 INFO VariantRecalibratorEngine - Evaluating full set of 3600 variants...
09:42:34.459 INFO VariantRecalibratorEngine - Evaluating full set of 3600 variants...
09:42:34.574 INFO VariantRecalibrator - Building FS x SOR plot...
09:42:34.574 INFO VariantRecalibratorEngine - Evaluating full set of 3600 variants...
09:42:34.741 INFO VariantRecalibratorEngine - Evaluating full set of 3600 variants...
09:42:34.853 INFO VariantRecalibrator - Building FS x ReadPosRankSum plot...
09:42:34.853 INFO VariantRecalibratorEngine - Evaluating full set of 3600 variants...
09:42:35.007 INFO VariantRecalibratorEngine - Evaluating full set of 3600 variants...
09:42:35.117 INFO VariantRecalibrator - Building SOR x ReadPosRankSum plot...
09:42:35.118 INFO VariantRecalibratorEngine - Evaluating full set of 3600 variants...
09:42:35.273 INFO VariantRecalibratorEngine - Evaluating full set of 3600 variants...
09:42:35.386 INFO VariantRecalibrator - Executing: Rscript /home/BQSR_bam/GVCF_files/recalibrate_SNP_plots.R
09:42:36.651 INFO VariantRecalibrator - Shutting down engine
[September 27, 2024, 9:42:36 AM CST] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 4.06 minutes.
Runtime.totalMemory()=1105199104
org.broadinstitute.hellbender.utils.R.RScriptExecutorException:
Rscript exited with 1
Command Line: Rscript -e tempLibDir = '/tmp/Rlib.8182557313761951261';source('/home/BQSR_bam/GVCF_files/recalibrate_SNP_plots.R');
Stdout:
Stderr: Error:
! The `space` argument of `pal_gradient_n()` only supports be "Lab" as
of scales 0.3.0.
Backtrace:
▆
1. ├─base::source("/home/BQSR_bam/GVCF_files/recalibrate_SNP_plots.R")
2. │ ├─base::withVisible(eval(ei, envir))
3. │ └─base::eval(ei, envir)
4. │ └─base::eval(ei, envir)
5. └─ggplot2::scale_fill_gradient(high = "green", low = "red", space = "rgb")
6. ├─ggplot2::continuous_scale(...)
7. │ └─ggplot2::ggproto(...)
8. │ └─rlang::list2(...)
9. └─scales::pal_seq_gradient(low, high, space)
10. └─scales::pal_gradient_n(c(low, high), space = space)
11. └─lifecycle::deprecate_stop("0.3.0", "pal_gradient_n(space = 'only supports be \"Lab\"')")
12. └─lifecycle:::deprecate_stop0(msg)
13. └─rlang::cnd_signal(...)
Execution halted
at org.broadinstitute.hellbender.utils.R.RScriptExecutor.getScriptException(RScriptExecutor.java:79)
at org.broadinstitute.hellbender.utils.R.RScriptExecutor.getScriptException(RScriptExecutor.java:18)
at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:112)
at org.broadinstitute.hellbender.utils.R.RScriptExecutor.exec(RScriptExecutor.java:125)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.createVisualizationScript(VariantRecalibrator.java:1146)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:706)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1102)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
at org.broadinstitute.hellbender.Main.main(Main.java:306)
-
Thank you nancy anderson
We have a work in progress branch to update R and dependencies and this particular fix is already applied. Once all the testing is complete we will release a new GATK version with newer R and dependent libraries.
Regards.
-
Thank you for providing this fix, it works great. I do get the scatter plots for the different annotations. However, I was wondering where the tranches plot for the SNPs has gone. It looks like the code for generating this plot is not contained in the R file generated with --rscript-file in VariantRecalibrator. How can I generate this plot? Sorry in case this information is somewhere and I missed it.
Eva
-
Hi Eva König
plot_Tranches.R file is not part of what is produced with the RSCRIPT_FILE parameter. It is included within the resource folder of the JAR as well as our github repository.
If you wish to plot tranches you may want to use this script and tranches files. Normally we would recommend rerunning VariantRecalibrator for this particular plot. If there are incompatibilities please leave your feedback or you may want to submit a pull request for a possible fix.
Regards.
Please sign in to leave a comment.
3 comments