Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

CNNScoreVariants crashes with java.lang.NullPointerException

Answered

13 comments

  • Official comment
    riederd

    Hi Beri,

    This fixed the problem, but only partly.
    I created the conda env from the yml file bundled with the GATK zip file and activated it:

    $ conda env create -f gatkcondaenv.yml -p /path/to/myGATK_env
    $ conda activate /path/to/myGATK_env

    When I ran the command as before, I got the following error:

    12:34:59.635 INFO CNNScoreVariants - Done scoring variants with CNN. 
    12:34:59.635 INFO CNNScoreVariants - Shutting down engine
    [January 27, 2020 12:34:59 PM CET] org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants done. Elapsed time: 1.15 minutes.
    Runtime.totalMemory()=2240806912
    org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException: A nack was received from the Python process (most likely caused by a raised exception caused by): nkm received

    : Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/path/to/myGATK_env/lib/python3.6/site-packages/vqsr_cnn/__init__.py", line 1, in <module>
    from .vqsr_cnn.models import build_2d_annotation_model_from_args, build_1d_annotation_model_from_args
    File "/path/to/myGATK_env/lib/python3.6/site-packages/vqsr_cnn/vqsr_cnn/__init__.py", line 1, in <module>
    from .models import build_2d_annotation_model_from_args, build_1d_annotation_model_from_args
    File "/path/to/myGATK_env/lib/python3.6/site-packages/vqsr_cnn/vqsr_cnn/models.py", line 14, in <module>
    from . import plots
    File "/path/to/myGATK_env/lib/python3.6/site-packages/vqsr_cnn/vqsr_cnn/plots.py", line 19, in <module>
    from sklearn.metrics import roc_curve, auc, roc_auc_score, precision_recall_curve, average_precision_score
    File "/path/to/myGATK_env/lib/python3.6/site-packages/sklearn/metrics/__init__.py", line 7, in <module>
    from .ranking import auc
    File "/path/to/myGATK_env/lib/python3.6/site-packages/sklearn/metrics/ranking.py", line 25, in <module>
    from scipy.stats import rankdata
    File "/path/to/myGATK_env/lib/python3.6/site-packages/scipy/stats/__init__.py", line 345, in <module>
    from .morestats import *
    File "/path/to/myGATK_env/lib/python3.6/site-packages/scipy/stats/morestats.py", line 12, in <module>
    from numpy.testing.decorators import setastest
    ModuleNotFoundError: No module named 'numpy.testing.decorators'

    at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.waitForAck(StreamingPythonScriptExecutor.java:222)
    at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.sendSynchronousCommand(StreamingPythonScriptExecutor.java:183)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants.onTraversalStart(CNNScoreVariants.java:317)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1046)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)

    Updating numpy inside the gatk conda env fixed the problem:

    (/path/to/myGATK_env) [rieder@zeus /data/path]$ conda update numpy   

    This updated the following packages in the gatk conda env:

    blas     1.0-mkl                             --> 1.0-openblas
    certifi  anaconda::certifi-2016.2.28-py36_0  --> pkgs/main::certifi-2019.11.28-py36_0
    numpy    1.13.3-py36ha266831_3               --> 1.18.1-py36h94c655d_0
    openssl  anaconda::openssl-1.0.2l-0          --> pkgs/main::openssl-1.0.2u-h7b6447c_0
    scipy    1.0.0-py36hbf646e7_0                --> 1.3.2-py36he2b7bc3_0

    After this update I was able to run CNNScoreVariants.

    Best
      Dietmar
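
The traceback above fails at import time, several modules deep: vqsr_cnn imports sklearn, which imports scipy, which imports from numpy. A minimal sketch, assuming it is run with the Python interpreter of the activated GATK conda environment, that tries each of those imports in turn and reports which ones fail. The helper name `find_broken_imports` is hypothetical and not part of GATK.

```python
import importlib


def find_broken_imports(module_names):
    """Try to import each module; return {name: error message} for the failures."""
    broken = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or any other error raised during import
            broken[name] = f"{type(exc).__name__}: {exc}"
    return broken


# Modules named in the traceback, ordered from the innermost import outwards.
TRACEBACK_MODULES = ["numpy", "scipy.stats", "sklearn.metrics", "vqsr_cnn"]
```

Calling `find_broken_imports(TRACEBACK_MODULES)` inside the environment should return an empty dict once the imports succeed; the first failing entry points at the package to look at.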

     

     

  • riederd

    Hi Beri,

    I get the same error with v4.1.3.0.

  • Beri

    Hi riederd,

    Do you get the same error with an older version of GATK4?

  • rbcn

    Hi riederd and Beri!

    I am facing the same error. I have validated the BAM and VCF files and everything seems correct. Have you found a solution?

    Best

  • riederd

    Unfortunately no.

    Best

  • Beri

    Here is a possible solution from a previous forum post with a similar error. 

    "Are you running from within the gatk conda environment as described here? The environment must have been created using the version of GATK that you're running (I suspect this problem can occur if you have a conda environment from a previous gatk release). I would suggest recreating the conda environment using the gatk release you're running."
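
One way to answer the first question in that quote is to look at the environment variables conda sets on activation. A minimal sketch, assuming conda's `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` variables are present in the process that launches GATK; the function names are hypothetical and not part of GATK or conda.

```python
import os


def active_conda_env(environ=None):
    """Return (name, prefix) of the active conda env, or (None, None) if none is set."""
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV"), env.get("CONDA_PREFIX")


def is_expected_env(expected, environ=None):
    """True if the active env's name or prefix path equals `expected`."""
    name, prefix = active_conda_env(environ)
    return expected is not None and expected in (name, prefix)
```

For an environment created with `-p /path/to/myGATK_env`, as earlier in this thread, the check would be `is_expected_env("/path/to/myGATK_env")`. This only shows which environment is active, not which GATK release it was created from.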

  • Beri

    Thanks for sharing your answer Dietmar.

  • riederd

    I'd suggest that the GATK developers update gatkcondaenv.yml to include a newer version of numpy.

    Best
      Dietmar
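
Until the yml file is updated, installed versions can be compared against the ones reported as working in this thread. A minimal sketch, assuming the post-update versions listed above (numpy 1.18.1, scipy 1.3.2) are used as a reference point; they are not official GATK pins, and the names `WORKING_VERSIONS` and `older_than_working` are hypothetical.

```python
import re


def version_tuple(version):
    """Turn '1.18.1' into (1, 18, 1); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


# Versions reported in this thread after `conda update numpy` (not official pins).
WORKING_VERSIONS = {"numpy": "1.18.1", "scipy": "1.3.2"}


def older_than_working(installed):
    """Return the names whose installed version is lower than the reported working one."""
    return sorted(
        name
        for name, working in WORKING_VERSIONS.items()
        if name in installed and version_tuple(installed[name]) < version_tuple(working)
    )
```

The `installed` mapping can be filled from `numpy.__version__` and `scipy.__version__` inside the environment. A flagged package is only a candidate for updating; an older version is not necessarily the cause of the error.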

  • Beri

    The dev team have been made aware of the issue and will fix the versioning within conda. 

    Thanks

  • WVNicholson

    I'm running into this problem as well. I'm not sure where gatkcondaenv.yml fits in, since I installed GATK4 with "conda install gatk4" from within my conda environment; that is, I didn't download and install a GATK tar file and then create the conda environment, which is what the above seems to suggest. (To be precise, I'm currently using "conda install gatk4=4.1.4.1". Moving forward to the current "4.1.5.0" might fix this, but I'm not using it because backing up to the earlier version helped with another problem.) Why would you do it that way if you want conda to do package management on the GATK installations anyway?

    William

  • Bhanu Gandham

    Hi WVNicholson,

    To create your conda env, I recommend using the gatkcondaenv.yml that comes with the gatk tar file. We cannot help with issues arising from `conda install gatk4`, since that package is not maintained by us.

  • Swati Manekar

    Try the following (taken from https://github.com/broadinstitute/gatk#python):

    ./conda env create -f gatkcondaenv.yml

    ps -p $$      # shows the current shell name (bash in my case)

    ./conda init bash

    source activate gatk

  • Genevieve Brandt (she/her)

    Thank you Swati Manekar!

