CNNScoreVariants error: AttributeError: 'str' object has no attribute 'decode'
I am writing a workflow for variant discovery. When I run CNNScoreVariants, I get the error: AttributeError: 'str' object has no attribute 'decode'. The full error log is attached below.
I test the workflow on the mitochondrial genome and on chromosome 8 (hg19). I tried both samtools and GATK HaplotypeCaller for variant calling; VCFs from both produce the same error.
a) gatk 4.1.9.0
I use Snakemake with conda to write the workflow:
conda version: 4.9.2
snakemake-minimal: 5.28.0
python: 3.8.3
java: openjdk: 11.0.9.1 2020-11-04
I built the GATK conda environment as described in the GATK guide here:
The gatkpythonpackages package in the conda environment is version 0.1.
I also add GATK 4.1.9.0 to the PATH with:
PATH=$PATH:/home/wk/daten1/tools/gatk-4.1.9.0/
b) gatk CNNScoreVariants -V data/variance_call/all_samtools.vcf -R chrom/chrM.fa -O data/variance_call_cnnscored/all_samtools_cnnscored.vcf
The full command with Java options is under the "Running" entry in the log.
c) the entire error log is below.
If needed, I will provide additional information as quickly as possible.
My thanks in advance for clues on how to solve this.
Error log:
17:58:22.221 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/wk/daten1/praktikum/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Nov 16, 2020 5:58:22 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
17:58:22.362 INFO CNNScoreVariants - ------------------------------------------------------------
17:58:22.362 INFO CNNScoreVariants - The Genome Analysis Toolkit (GATK) v4.1.9.0
17:58:22.362 INFO CNNScoreVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
17:58:22.362 INFO CNNScoreVariants - Executing as wk@wk on Linux v5.4.0-53-generic amd64
17:58:22.362 INFO CNNScoreVariants - Java runtime: OpenJDK 64-Bit Server VM v11.0.9.1+1-Ubuntu-0ubuntu1.20.04
17:58:22.363 INFO CNNScoreVariants - Start Date/Time: 16. November 2020 um 17:58:22 MEZ
17:58:22.363 INFO CNNScoreVariants - ------------------------------------------------------------
17:58:22.363 INFO CNNScoreVariants - ------------------------------------------------------------
17:58:22.363 INFO CNNScoreVariants - HTSJDK Version: 2.23.0
17:58:22.363 INFO CNNScoreVariants - Picard Version: 2.23.3
17:58:22.363 INFO CNNScoreVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:58:22.363 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:58:22.363 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:58:22.363 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:58:22.363 INFO CNNScoreVariants - Deflater: IntelDeflater
17:58:22.363 INFO CNNScoreVariants - Inflater: IntelInflater
17:58:22.364 INFO CNNScoreVariants - GCS max retries/reopens: 20
17:58:22.364 INFO CNNScoreVariants - Requester pays: disabled
17:58:22.364 INFO CNNScoreVariants - Initializing engine
17:58:22.474 INFO FeatureManager - Using codec VCFCodec to read file file:///home/wk/daten1/praktikum/Variant_calling_uebung/data/variance_call/all_samtools.vcf
17:58:22.493 INFO CNNScoreVariants - Done initializing engine
17:58:22.494 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/wk/daten1/praktikum/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
17:58:24.607 INFO CNNScoreVariants - Using key:CNN_1D for CNN architecture:/tmp/1d_cnn_mix_train_full_bn.5588241428780893532.json and weights:/tmp/1d_cnn_mix_train_full_bn.12745529145956847191.hd5
17:58:24.701 INFO CNNScoreVariants - Done scoring variants with CNN.
17:58:24.701 INFO CNNScoreVariants - Shutting down engine
[16. November 2020 um 17:58:24 MEZ] org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=169869312
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException: A nack was received from the Python process (most likely caused by a raised exception caused by): nkm received
: Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/wk/miniconda3/envs/gatk/lib/python3.6/site-packages/vqsr_cnn/vqsr_cnn/models.py", line 26, in start_session_get_args_and_model
return args_and_model_from_semantics(semantics_json, weights_hd5, tensor_type)
File "/home/wk/miniconda3/envs/gatk/lib/python3.6/site-packages/vqsr_cnn/vqsr_cnn/models.py", line 33, in args_and_model_from_semantics
model = set_args_and_get_model_from_semantics(args, semantics_json, weights_hd5)
File "/home/wk/miniconda3/envs/gatk/lib/python3.6/site-packages/vqsr_cnn/vqsr_cnn/models.py", line 90, in set_args_and_get_model_from_semantics
model = load_model(weights_hd5, custom_objects=get_metric_dict(args.labels))
File "/home/wk/miniconda3/envs/gatk/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "/home/wk/miniconda3/envs/gatk/lib/python3.6/site-packages/keras/engine/saving.py", line 224, in _deserialize_model
model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'
at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.waitForAck(StreamingPythonScriptExecutor.java:222)
at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.sendSynchronousCommand(StreamingPythonScriptExecutor.java:183)
at org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants.initializePythonArgsAndModel(CNNScoreVariants.java:561)
at org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants.onTraversalStart(CNNScoreVariants.java:321)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1047)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /home/wk/daten1/praktikum/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/wk/daten1/praktikum/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar CNNScoreVariants -V data/variance_call/all_samtools.vcf -R chrom/chrM.fa -O data/variance_call_cnnscored/all_samtools_cnnscored.vcf
-
Hi WKaiser, your issue should have been fixed by a recent change that is not included in 4.1.9.0 (more info on GitHub here).
Try running CNNScoreVariants with the latest version of the master branch on GitHub and let me know if that solves the issue. We don't have another GATK release planned soon, so this is the best way for you to run CNNScoreVariants without waiting for the release.
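For readers wondering what the error itself means: the last Python frame in the traceback calls `.decode('utf-8')` on a value that is already a `str`, and Python 3 strings have no `decode` method. Below is a minimal sketch of that failure and of a tolerant alternative, assuming the model config may arrive either as bytes or as text. The function names are hypothetical and are not part of Keras or GATK; this only illustrates the error and is not the change that was made in GATK.

```python
import json


def naive_load(raw):
    # Same call shape as the failing line in the traceback:
    # it only works when `raw` is a bytes object.
    return json.loads(raw.decode("utf-8"))


def load_model_config(raw):
    """Parse a JSON model config that may arrive as bytes or as str (hypothetical helper)."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)


if __name__ == "__main__":
    as_text = '{"class_name": "Model"}'
    as_bytes = as_text.encode("utf-8")

    # Both representations parse with the tolerant helper.
    print(load_model_config(as_bytes))
    print(load_model_config(as_text))

    # The naive version fails on str input with the error from this thread.
    try:
        naive_load(as_text)
    except AttributeError as exc:
        print("AttributeError:", exc)
```

In practice the fix is to run matching versions of the tool and its Python environment rather than to patch installed packages by hand.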
-
Hello Genevieve Brandt, your tip solved the issue, thank you very much!
-
Thanks for the update WKaiser!
-
Hi Genevieve,
I'm getting the exact same error as the one described by WKaiser, and it persists even with the latest version of the master branch on GitHub (v4.1.9.0-44-g62fca8b-SNAPSHOT). Can you please let me know what I might be doing wrong?
(gatk_env) [regmond@login13 filter_haplotypecaller]$ ./gatk_62fca8b/gatk/gatk CNNScoreVariants -I ./sample.recal.bam -V ./HaplotypeCaller_sample.vcf.gz -R ./Homo_sapiens_assembly38.fasta -O test.cnn.vcf -tensor-type read_tensor
Using GATK jar /lustre/scratch/scratch/regmond/filter_haplotypecaller/gatk_62fca8b/gatk/build/libs/gatk-package-4.1.9.0-44-g62fca8b-SNAPSHOT-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /lustre/scratch/scratch/regmond/filter_haplotypecaller/gatk_62fca8b/gatk/build/libs/gatk-package-4.1.9.0-44-g62fca8b-SNAPSHOT-local.jar CNNScoreVariants -I ./sample.recal.bam -V ./HaplotypeCaller_sample.vcf.gz -R ./Homo_sapiens_assembly38.fasta -O test.cnn.vcf -tensor-type read_tensor
18:27:22.583 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/lustre/scratch/scratch/regmond/filter_haplotypecaller/gatk_62fca8b/gatk/build/libs/gatk-package-4.1.9.0-44-g62fca8b-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 16, 2021 6:27:23 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
18:27:23.060 INFO CNNScoreVariants - ------------------------------------------------------------
18:27:23.060 INFO CNNScoreVariants - The Genome Analysis Toolkit (GATK) v4.1.9.0-44-g62fca8b-SNAPSHOT
18:27:23.060 INFO CNNScoreVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
18:27:23.060 INFO CNNScoreVariants - Executing as regmond@login13.myriad.ucl.ac.uk on Linux v3.10.0-1127.el7.x86_64 amd64
18:27:23.060 INFO CNNScoreVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_192-b01
18:27:23.061 INFO CNNScoreVariants - Start Date/Time: January 16, 2021 6:27:22 PM GMT
18:27:23.061 INFO CNNScoreVariants - ------------------------------------------------------------
18:27:23.061 INFO CNNScoreVariants - ------------------------------------------------------------
18:27:23.061 INFO CNNScoreVariants - HTSJDK Version: 2.23.0
18:27:23.061 INFO CNNScoreVariants - Picard Version: 2.23.3
18:27:23.061 INFO CNNScoreVariants - Built for Spark Version: 2.4.5
18:27:23.061 INFO CNNScoreVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
18:27:23.061 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
18:27:23.061 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
18:27:23.061 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
18:27:23.061 INFO CNNScoreVariants - Deflater: IntelDeflater
18:27:23.062 INFO CNNScoreVariants - Inflater: IntelInflater
18:27:23.062 INFO CNNScoreVariants - GCS max retries/reopens: 20
18:27:23.062 INFO CNNScoreVariants - Requester pays: disabled
18:27:23.062 INFO CNNScoreVariants - Initializing engine
18:27:23.840 INFO FeatureManager - Using codec VCFCodec to read file file:///lustre/scratch/scratch/regmond/filter_haplotypecaller/HaplotypeCaller_sample.vcf.gz
18:27:24.233 INFO CNNScoreVariants - Done initializing engine
18:27:24.234 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/lustre/scratch/scratch/regmond/filter_haplotypecaller/gatk_62fca8b/gatk/build/libs/gatk-package-4.1.9.0-44-g62fca8b-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_utils.so
18:27:27.674 INFO CNNScoreVariants - Using key:CNN_2D for CNN architecture:/tmp/small_2d.6415495984212894079.json and weights:/tmp/small_2d.4644514376997055250.hd5
18:27:28.999 INFO CNNScoreVariants - Done scoring variants with CNN.
18:27:28.999 INFO CNNScoreVariants - Shutting down engine
[January 16, 2021 6:27:29 PM GMT] org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants done. Elapsed time: 0.11 minutes.
Runtime.totalMemory()=1853882368
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException: A nack was received from the Python process (most likely caused by a raised exception caused by): nkm received
: Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/regmond/.conda/envs/gatk_env/lib/python3.6/site-packages/vqsr_cnn/vqsr_cnn/models.py", line 26, in start_session_get_args_and_model
return args_and_model_from_semantics(semantics_json, weights_hd5, tensor_type)
File "/home/regmond/.conda/envs/gatk_env/lib/python3.6/site-packages/vqsr_cnn/vqsr_cnn/models.py", line 33, in args_and_model_from_semantics
model = set_args_and_get_model_from_semantics(args, semantics_json, weights_hd5)
File "/home/regmond/.conda/envs/gatk_env/lib/python3.6/site-packages/vqsr_cnn/vqsr_cnn/models.py", line 90, in set_args_and_get_model_from_semantics
model = load_model(weights_hd5, custom_objects=get_metric_dict(args.labels))
File "/home/regmond/.conda/envs/gatk_env/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "/home/regmond/.conda/envs/gatk_env/lib/python3.6/site-packages/keras/engine/saving.py", line 224, in _deserialize_model
model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'
at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.waitForAck(StreamingPythonScriptExecutor.java:222)
at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.sendSynchronousCommand(StreamingPythonScriptExecutor.java:183)
at org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants.initializePythonArgsAndModel(CNNScoreVariants.java:557)
at org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants.onTraversalStart(CNNScoreVariants.java:317)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1056)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
The environment is activated and the CNNScoreVariants "test" runs fine:
(gatk_env) [regmond@login13 filter_haplotypecaller]$ python -c "import vqsr_cnn"
Using TensorFlow backend.
(gatk_env) [regmond@login13 filter_haplotypecaller]$
Any tips would be really appreciated.
Thanks
Lucia
-
Hi again,
Just to let you know, it works now. I realised that I was running GATK built from the master branch, but had activated the old gatk conda environment from the release version rather than the one built from master. Once I switched to the matching environment, the issue was fixed.
Best
Lucia
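For anyone hitting the same mismatch, here is a rough sketch of a check that shows which conda environment, Python prefix, and `gatk` launcher are active, plus the versions of a few packages. The package names passed in at the bottom are assumptions: `gatkpythonpackages` and `keras` appear earlier in this thread, while `h5py` is only a guess at a relevant dependency and is not mentioned anywhere above. `describe_environment` is a hypothetical helper, not part of GATK.

```python
import os
import shutil
import sys

try:  # Python 3.8 and newer
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # older interpreters, e.g. the python3.6 seen in the traceback
    from pkg_resources import DistributionNotFound as PackageNotFoundError
    from pkg_resources import get_distribution

    def version(name):
        return get_distribution(name).version


def describe_environment(packages):
    """Collect facts showing which Python environment and gatk launcher are active."""
    report = {
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # set by `conda activate`
        "python_prefix": sys.prefix,  # root directory of the running interpreter
        "gatk_on_path": shutil.which("gatk"),  # None if no launcher is on PATH
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # package is not installed in this environment
    return report


if __name__ == "__main__":
    # Package names are assumptions; adjust them to your own setup.
    packages = ["gatkpythonpackages", "keras", "h5py"]
    for key, value in describe_environment(packages).items():
        print(f"{key}: {value}")
```

Running this once in each conda environment and comparing the output makes it easier to see whether the activated environment is the one built alongside the GATK jar being run. Note that a successful `import vqsr_cnn` only shows that the package is importable; it does not exercise the model-loading step that fails in the traceback.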
-
Hi Lucia,
Thank you for following up with your solution so that other users can figure out a solution as well! Glad you were able to fix it.
Genevieve
-
Hi Genevieve,
This is a totally unrelated question, but if at all possible, could you please point me to where I can find information, or who can answer this question about Funcotator: https://github.com/broadinstitute/gatk/issues/7040 ?
Many thanks and sorry for hijacking this post to ask for something else!
Lucia
-
Hi Lucia,
You can go ahead and make a new post with that question and we will see if we can find an answer.
You can find all of our guidelines for support and the forum here: https://gatk.broadinstitute.org/hc/en-us/sections/360007720111-Forum-Bulletin
Genevieve
-
Thanks, I just created a new post