Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

CNNScoreVariants error: AttributeError: 'str' object has no attribute 'decode'

9 comments

  • Genevieve Brandt

    Hi WKaiser, your issue should be fixed by a recent change that is not yet included in 4.1.9.0 (more info on GitHub here).

    Try running CNNScoreVariants with the latest version of the master branch on GitHub and let me know if that solves the issue. We don't have another GATK release planned soon, so this is the best way to run CNNScoreVariants without waiting for the next release.

  • WKaiser

    Hello Genevieve Brandt, your tip solved the issue, thank you very much!

  • Genevieve Brandt

    Thanks for the update, WKaiser!

  • Lucia C

    Hi Genevieve,

    I'm getting exactly the same error that WKaiser described, and it persists even with the latest version of the master branch on GitHub (v4.1.9.0-44-g62fca8b-SNAPSHOT). Can you please let me know what I might be doing wrong?

    (gatk_env) [regmond@login13 filter_haplotypecaller]$ ./gatk_62fca8b/gatk/gatk CNNScoreVariants -I ./sample.recal.bam -V ./HaplotypeCaller_sample.vcf.gz -R ./Homo_sapiens_assembly38.fasta -O test.cnn.vcf -tensor-type read_tensor
    Using GATK jar /lustre/scratch/scratch/regmond/filter_haplotypecaller/gatk_62fca8b/gatk/build/libs/gatk-package-4.1.9.0-44-g62fca8b-SNAPSHOT-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /lustre/scratch/scratch/regmond/filter_haplotypecaller/gatk_62fca8b/gatk/build/libs/gatk-package-4.1.9.0-44-g62fca8b-SNAPSHOT-local.jar CNNScoreVariants -I ./sample.recal.bam -V ./HaplotypeCaller_sample.vcf.gz -R ./Homo_sapiens_assembly38.fasta -O test.cnn.vcf -tensor-type read_tensor
    18:27:22.583 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/lustre/scratch/scratch/regmond/filter_haplotypecaller/gatk_62fca8b/gatk/build/libs/gatk-package-4.1.9.0-44-g62fca8b-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jan 16, 2021 6:27:23 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    18:27:23.060 INFO  CNNScoreVariants - ------------------------------------------------------------
    18:27:23.060 INFO  CNNScoreVariants - The Genome Analysis Toolkit (GATK) v4.1.9.0-44-g62fca8b-SNAPSHOT
    18:27:23.060 INFO  CNNScoreVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
    18:27:23.060 INFO  CNNScoreVariants - Executing as regmond@login13.myriad.ucl.ac.uk on Linux v3.10.0-1127.el7.x86_64 amd64
    18:27:23.060 INFO  CNNScoreVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_192-b01
    18:27:23.061 INFO  CNNScoreVariants - Start Date/Time: January 16, 2021 6:27:22 PM GMT
    18:27:23.061 INFO  CNNScoreVariants - ------------------------------------------------------------
    18:27:23.061 INFO  CNNScoreVariants - ------------------------------------------------------------
    18:27:23.061 INFO  CNNScoreVariants - HTSJDK Version: 2.23.0
    18:27:23.061 INFO  CNNScoreVariants - Picard Version: 2.23.3
    18:27:23.061 INFO  CNNScoreVariants - Built for Spark Version: 2.4.5
    18:27:23.061 INFO  CNNScoreVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    18:27:23.061 INFO  CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    18:27:23.061 INFO  CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    18:27:23.061 INFO  CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    18:27:23.061 INFO  CNNScoreVariants - Deflater: IntelDeflater
    18:27:23.062 INFO  CNNScoreVariants - Inflater: IntelInflater
    18:27:23.062 INFO  CNNScoreVariants - GCS max retries/reopens: 20
    18:27:23.062 INFO  CNNScoreVariants - Requester pays: disabled
    18:27:23.062 INFO  CNNScoreVariants - Initializing engine
    18:27:23.840 INFO  FeatureManager - Using codec VCFCodec to read file file:///lustre/scratch/scratch/regmond/filter_haplotypecaller/HaplotypeCaller_sample.vcf.gz
    18:27:24.233 INFO  CNNScoreVariants - Done initializing engine
    18:27:24.234 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/lustre/scratch/scratch/regmond/filter_haplotypecaller/gatk_62fca8b/gatk/build/libs/gatk-package-4.1.9.0-44-g62fca8b-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_utils.so
    18:27:27.674 INFO  CNNScoreVariants - Using key:CNN_2D for CNN architecture:/tmp/small_2d.6415495984212894079.json and weights:/tmp/small_2d.4644514376997055250.hd5
    18:27:28.999 INFO  CNNScoreVariants - Done scoring variants with CNN.
    18:27:28.999 INFO  CNNScoreVariants - Shutting down engine
    [January 16, 2021 6:27:29 PM GMT] org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants done. Elapsed time: 0.11 minutes.
    Runtime.totalMemory()=1853882368
    org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException: A nack was received from the Python process (most likely caused by a raised exception caused by): nkm received

    : Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
       File "/home/regmond/.conda/envs/gatk_env/lib/python3.6/site-packages/vqsr_cnn/vqsr_cnn/models.py", line 26, in start_session_get_args_and_model
        return args_and_model_from_semantics(semantics_json, weights_hd5, tensor_type)
       File "/home/regmond/.conda/envs/gatk_env/lib/python3.6/site-packages/vqsr_cnn/vqsr_cnn/models.py", line 33, in args_and_model_from_semantics
        model = set_args_and_get_model_from_semantics(args, semantics_json, weights_hd5)
       File "/home/regmond/.conda/envs/gatk_env/lib/python3.6/site-packages/vqsr_cnn/vqsr_cnn/models.py", line 90, in set_args_and_get_model_from_semantics
        model = load_model(weights_hd5, custom_objects=get_metric_dict(args.labels))
       File "/home/regmond/.conda/envs/gatk_env/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
        model = _deserialize_model(f, custom_objects, compile)
       File "/home/regmond/.conda/envs/gatk_env/lib/python3.6/site-packages/keras/engine/saving.py", line 224, in _deserialize_model
        model_config = json.loads(model_config.decode('utf-8'))

    AttributeError: 'str' object has no attribute 'decode'

            at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.waitForAck(StreamingPythonScriptExecutor.java:222)
            at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.sendSynchronousCommand(StreamingPythonScriptExecutor.java:183)
            at org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants.initializePythonArgsAndModel(CNNScoreVariants.java:557)
            at org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants.onTraversalStart(CNNScoreVariants.java:317)
            at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1056)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
            at org.broadinstitute.hellbender.Main.main(Main.java:289)

    The environment is activated and the CNNScoreVariants "test" runs fine:

    (gatk_env) [regmond@login13 filter_haplotypecaller]$ python -c "import vqsr_cnn"
    Using TensorFlow backend.
    (gatk_env) [regmond@login13 filter_haplotypecaller]$

    Any tips would be really appreciated.

    Thanks

    Lucia

  • Lucia C

    Hi again,

    Just to let you know, it works now. I realised that I was running GATK built from the master branch, but I had activated the old gatk conda environment from the release version rather than the one that comes with master. Once I switched to the matching environment, the issue was fixed.

    Best

    Lucia

  • Genevieve Brandt

    Hi Lucia,

    Thank you for following up with your solution so that other users can figure out a solution as well! Glad you were able to fix it.

    Genevieve

  • Lucia C

    Hi Genevieve,

    This is a totally unrelated question, but if at all possible, could you please point me to where I can find information on, or who can answer, this Funcotator question: https://github.com/broadinstitute/gatk/issues/7040 ?

    Many thanks and sorry for hijacking this post to ask for something else!

    Lucia

  • Genevieve Brandt

    Hi Lucia,

    You can go ahead and make a new post with that question and we will see if we can find an answer.

    You can find all of our support and forum guidelines here: https://gatk.broadinstitute.org/hc/en-us/sections/360007720111-Forum-Bulletin

    Genevieve

  • Lucia C

    Thanks, I just created a new post
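
The traceback posted in this thread ends at `model_config.decode('utf-8')` with `AttributeError: 'str' object has no attribute 'decode'`. In Python 3 only `bytes` objects have a `.decode()` method, so that call fails whenever the value arrives as `str`. The following is a minimal sketch of that failure mode, not the actual Keras or GATK code. The helper names `to_text` and `reproduce_error` are hypothetical, and the thread does not state which package returned a `str` instead of `bytes`.

```python
import json


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes, pass str through unchanged."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return value


def reproduce_error(value):
    """Call .decode() unconditionally, like the failing line in the traceback.

    Returns the AttributeError message, or None if decoding succeeded.
    """
    try:
        value.decode("utf-8")
    except AttributeError as exc:
        return str(exc)
    return None


# The same illustrative JSON config, once as bytes and once as str.
config_as_bytes = b'{"class_name": "Model"}'
config_as_str = '{"class_name": "Model"}'

print(reproduce_error(config_as_bytes))  # decoding bytes works
print(reproduce_error(config_as_str))    # decoding str raises AttributeError

# The tolerant helper yields the same parsed config for both inputs.
print(json.loads(to_text(config_as_bytes)) == json.loads(to_text(config_as_str)))
```

This only illustrates why the error appears. The fix reported in the thread was to use the conda environment that matches the GATK build, not to patch the calling code.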

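
Lucia C's follow-up in this thread traces the persisting error to an activated conda environment that did not match the GATK build being run. One way to check which environment Python is really using, and which package versions it holds, is sketched below. The package list is an assumption: Keras appears in the traceback and TensorFlow in the "Using TensorFlow backend" message, while h5py is assumed here as the HDF5 reader behind Keras. The function names are hypothetical.

```python
import sys


def installed_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Older interpreters (e.g. Python 3.6): fall back to setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def describe_environment(packages=("keras", "h5py", "tensorflow")):
    """Collect the interpreter version, environment prefix, and package versions."""
    report = {"python": sys.version.split()[0], "prefix": sys.prefix}
    for name in packages:
        report[name] = installed_version(name)
    return report


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Run it with the environment activated and compare the `prefix` line against the environment you intended to use. A prefix that points at an older environment would be consistent with the mismatch described in the thread.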
