Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Cohort_denoising.py parameter issue in GermlineCNVCaller

  • Anthony DiCi

    Hi Sayal Guirales,

    Thank you for writing to the GATK forum! I hope that we can help you sort this out.

    I forwarded the issue you are encountering to our developers, and they have some initial thoughts on its origin: there may be a version mismatch between your Python environment and the GATK jar.

    The first thing to try is making sure that the GATK Python packages in your Python environment come from the same GATK release as the GATK jar you are running. The two must be consistent for GermlineCNVCaller (4.2.5.0) to work.
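As an illustration only, the consistency check can be made concrete by comparing the version in the jar's file name with the GATK release your conda environment was built from. The sketch below assumes the jar follows the gatk-package-<version>-local.jar naming used in GATK release zips, and that you record the environment's release yourself; both helper functions are hypothetical, not part of GATK.

```python
import re
from typing import Optional

# Assumed naming pattern for the jar inside a GATK release zip,
# e.g. "gatk-package-4.2.6.1-local.jar".
JAR_PATTERN = re.compile(r"gatk-package-(\d+(?:\.\d+)+)-local\.jar$")


def jar_version(jar_path: str) -> Optional[str]:
    """Return the GATK version encoded in the jar file name, or None if it does not match."""
    match = JAR_PATTERN.search(jar_path)
    return match.group(1) if match else None


def versions_in_sync(jar_path: str, env_release: str) -> bool:
    """Hypothetical check: does the jar version equal the release the conda env came from?"""
    return jar_version(jar_path) == env_release


if __name__ == "__main__":
    jar = "gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar"
    print(jar_version(jar))                  # 4.2.6.1
    print(versions_in_sync(jar, "4.2.5.0"))  # False
```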

    I hope this helps! Please let me know what you find. If any other questions come up in the meantime, please do not hesitate to reach out.

    Best,
    Anthony

  • Anthony DiCi

    Hi Sayal Guirales,

    We haven’t heard from you in a while, so we will be closing out your ticket in our system. If you still require assistance, you need only respond to this thread, and we’ll create a follow-up ticket to pick up where we left off.

    Thank you again for contributing to our GATK forum!

    Best,

    Anthony

  • Sayal Guirales

    Hi Anthony,

    I have updated both GATK and the Python environment to the latest versions, and the same error persists when running GermlineCNVCaller.

  • Anthony DiCi

    Hi Sayal Guirales,

    I’m sorry to hear that you are still having trouble! I brought this issue back to our developers, and I have some next steps to try out.

    Firstly, could you please clarify what exactly you did to update your Python environment? Please provide the exact command(s) you used to switch/update Python environments.

    If you haven’t already, we recommend creating the environment from the Conda .yml file that ships with GATK. Exact instructions are in the README.md on our GitHub; the easiest way to find them is to search the page (Command/Control+F) for "Python Dependencies."

    Please give this a try! If you are still having trouble after updating with the Conda command, please respond with the method you used to update, and we will figure out our next steps.

    Best,
    Anthony

  • Sayal Guirales

    Anthony,

    1. Installed GATK4 using gatk-4.2.6.1.zip from the release archive (https://github.com/broadinstitute/gatk/releases)

    2. Created the conda environment using: conda env create -f gatkcondaenv.yml

    3. Installed gcnvkernel, which GermlineCNVCaller needs, using: conda install -c bioconda gcnvkernel

    The Python version in the gatk conda environment is 3.6.10. The error persists even with these updates.

  • David Roazen

    Hi Sayal Guirales,

    I see two issues here:

    - You do not appear to be activating the GATK conda environment with: conda activate gatk

    - You should not be installing gcnvkernel from an external source like bioconda; that will almost certainly not match the GATK version you're using! The official GATK environment already comes with gcnvkernel, so you don't need to install it from a third-party source.
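A minimal sketch for confirming, from inside Python, that the gatk environment is the one actually in use. It assumes the CONDA_PREFIX and CONDA_DEFAULT_ENV variables that `conda activate` sets, and the environment name "gatk" used in this thread; the function name is hypothetical.

```python
import os
import sys
from typing import List, Mapping


def active_env_problems(env: Mapping[str, str], prefix: str,
                        expected_name: str = "gatk") -> List[str]:
    """Describe why the running Python may not be the expected conda env.

    env    -- environment variables, normally os.environ
    prefix -- the running interpreter's prefix, normally sys.prefix
    Returns an empty list when everything looks consistent.
    """
    conda_prefix = env.get("CONDA_PREFIX")  # set by `conda activate`
    if conda_prefix is None:
        return ["no conda environment is active"]
    problems = []
    active_name = env.get("CONDA_DEFAULT_ENV")  # assumed to hold the env name
    if active_name != expected_name:
        problems.append(f"active env is {active_name!r}, not {expected_name!r}")
    if os.path.realpath(prefix) != os.path.realpath(conda_prefix):
        problems.append(f"python runs from {prefix}, not from {conda_prefix}")
    return problems


if __name__ == "__main__":
    issues = active_env_problems(os.environ, sys.prefix)
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Running inside the active gatk conda environment:", sys.prefix)
```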

    Hope this helps,

    David

  • Sayal Guirales

    Hi David,

    Sorry for the confusion. I did activate the gatk conda environment prior to running the GermlineCNVCaller.

    Also, the gatk conda environment defined in the gatkcondaenv.yml file does not include gcnvkernel. I ran the program before installing gcnvkernel from bioconda and received this message:

    "java.lang.RuntimeException: A required Python package ("gcnvkernel") could not be imported into the Python environment. This tool requires that the GATK Python environment is properly established and activated."

    This is what led me to install gcnvkernel from an external source.

    -Sayal

  • David Roazen

    Hi Sayal Guirales,

    After activating the GATK conda environment (the one in gatkcondaenv.yml), please run the command:

    pip install gatkPythonPackageArchive.zip

    This zip file is distributed with GATK, and contains the gcnvkernel and other Python packages that are part of GATK.

    Please let me know if that resolves your issue!

    David

  • Sayal Guirales

    Hi David,

    This did not solve the issue. I created a new conda environment from the gatkcondaenv.yml file, and the stdout during environment creation states that gatkPythonPackageArchive.zip is used. To be certain, I also ran the pip install of that zip file. I still get the error that gcnvkernel cannot be found or is not available.

    -Sayal

  • Chris Norman

    Sayal Guirales, sorry you're having these issues with the conda env. As mentioned above, the original error message you referenced at the start of this thread definitely indicates that the Python code you were running was not in sync with the GATK Java code you were running.

    From your most recent message, though, it sounds like you're now getting a message saying that `gcnvkernel` is not available. If so, I would suggest starting Python from within the same activated gatk conda environment that you run GATK in (just type "python" at the command prompt), then typing "import gcnvkernel" at the Python prompt and pressing Enter/Return, to see whether you get the same message.
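The same test can be scripted, for example to run non-interactively on a cluster node. This is a hypothetical helper, not part of GATK: it imports a module by name and returns either the file it was loaded from or the full traceback that a bare `import gcnvkernel` would print.

```python
import importlib
import traceback
from typing import Tuple


def try_import(module_name: str) -> Tuple[bool, str]:
    """Import a module by name, the same way `import <name>` at the prompt would.

    Returns (True, path the module was loaded from) on success, or
    (False, full traceback text) if the import raises, including errors that
    come from the module's own dependencies (e.g. a missing numpy submodule).
    """
    try:
        module = importlib.import_module(module_name)
    except Exception:
        return False, traceback.format_exc()
    return True, getattr(module, "__file__", None) or "<unknown location>"


if __name__ == "__main__":
    ok, detail = try_import("gcnvkernel")
    if ok:
        print("gcnvkernel imported from", detail)
    else:
        print("gcnvkernel failed to import:\n" + detail)
```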

    Also, can you let us know what platform (OS and hardware) you're running on?

  • Sayal Guirales

    Chris, this is the error I received after following your steps. I am doing this on a university high-performance computing cluster that runs Red Hat 8.

    Python 3.6.10 | packaged by conda-forge | (default, Apr 24 2020, 16:44:11)
    [GCC 7.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import gcnvkernel
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/nas/longleaf/home/sguirale/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/__init__.py", line 1, in <module>
        from pymc3 import __version__ as pymc3_version
      File "/nas/longleaf/home/sguirale/miniconda3/envs/gatk/lib/python3.6/site-packages/pymc3/__init__.py", line 5, in <module>
        from .distributions import *
      File "/nas/longleaf/home/sguirale/miniconda3/envs/gatk/lib/python3.6/site-packages/pymc3/distributions/__init__.py", line 1, in <module>
        from . import timeseries
      File "/nas/longleaf/home/sguirale/miniconda3/envs/gatk/lib/python3.6/site-packages/pymc3/distributions/timeseries.py", line 5, in <module>
        from .continuous import get_tau_sd, Normal, Flat
      File "/nas/longleaf/home/sguirale/miniconda3/envs/gatk/lib/python3.6/site-packages/pymc3/distributions/continuous.py", line 12, in <module>
        from scipy import stats
      File "/nas/longleaf/home/sguirale/miniconda3/envs/gatk/lib/python3.6/site-packages/scipy/stats/__init__.py", line 345, in <module>
        from .morestats import *
      File "/nas/longleaf/home/sguirale/miniconda3/envs/gatk/lib/python3.6/site-packages/scipy/stats/morestats.py", line 12, in <module>
        from numpy.testing.decorators import setastest
    ModuleNotFoundError: No module named 'numpy.testing.decorators'

  • Chris Norman

    OK, that looks like the wrong version of numpy is present, or maybe some underlying dependency has changed out from under us. Can you try the following (from within Python, within the GATK conda env):

    # Print the versions this Python actually imports
    import numpy
    print(numpy.__version__)
    import scipy
    print(scipy.__version__)

    and let us know what versions are displayed.

  • Sayal Guirales

    Chris,

    (gatk) python
    Python 3.6.10 | packaged by conda-forge | (default, Apr 24 2020, 16:44:11)
    [GCC 7.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numpy
    >>> print(numpy.__version__)
    1.19.5
    >>> import scipy
    >>> print(scipy.__version__)
    1.0.0
    >>>

  • Chris Norman

    OK, thanks. I think that illustrates the problem: you're running numpy 1.19.5, which is newer than the version GATK requires. You can see this in the gatkcondaenv.yml file, where 1.17.5 is specified with a comment warning against updating it:

    - conda-forge::numpy=1.17.5         # do not update, this will break scipy=1.0.0

    It's hard for me to speculate about why you have the wrong version; the gatk conda environment should have 1.17.5. Are you absolutely certain that you're running in a pure, unmodified gatk conda environment, and that nothing has been installed over it?
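The comparison above can be automated in a rough way. The sketch below parses a dependency line in the `channel::package=version` form shown in the gatkcondaenv.yml excerpt and compares the pin with an installed version string. The helper names are hypothetical, and the sketch only understands simple `package=version` lines compared as plain strings, not conda's full match-spec syntax.

```python
import re
from typing import Optional, Tuple

# Matches lines such as "- conda-forge::numpy=1.17.5   # do not update, ..."
# The "channel::" part is optional; anything after the version is ignored.
PIN_PATTERN = re.compile(
    r"^\s*-\s*(?:(?P<channel>[\w.-]+)::)?(?P<name>[\w.-]+)=(?P<version>[\w.]+)"
)


def parse_pin(line: str) -> Optional[Tuple[str, str]]:
    """Return (package name, pinned version) for a simple pin line, else None."""
    match = PIN_PATTERN.match(line)
    if match is None:
        return None
    return match.group("name"), match.group("version")


def matches_pin(line: str, installed_version: str) -> bool:
    """True if installed_version equals the version pinned on this yml line."""
    pin = parse_pin(line)
    return pin is not None and pin[1] == installed_version


if __name__ == "__main__":
    line = "- conda-forge::numpy=1.17.5         # do not update, this will break scipy=1.0.0"
    print(parse_pin(line))              # ('numpy', '1.17.5')
    print(matches_pin(line, "1.19.5"))  # False
```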

  • Sayal Guirales

    Chris,

    You are correct. Within the conda environment I do have:

    - numpy         1.17.5           py36h2aa4a07_1    conda-forge

    Outside of the conda environment, I have numpy 1.19.5 in my regular Python environment. The program seems to be using my regular Python environment instead of the conda Python environment.

    I was able to work around the whole issue by using the GATK Docker image, with which the program ran successfully.
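For anyone who hits the same symptom, a minimal sketch for checking which installation a package is resolved from: it uses `importlib.util.find_spec`, so the module is located without being imported, and reports whether that file lies under the running interpreter's `sys.prefix`. A numpy resolved from outside the gatk environment's prefix would be consistent with the leak described above. The thread does not establish how the outside numpy was reached (a user site-packages directory or a PYTHONPATH entry are only common possibilities), and the function name is hypothetical.

```python
import importlib.util
import os
import sys
from typing import Optional, Tuple


def module_origin(module_name: str,
                  prefix: str = sys.prefix) -> Tuple[Optional[str], bool]:
    """Locate a top-level module without importing it.

    Returns (resolved file path, whether that path lies under `prefix`),
    or (None, False) if the module cannot be found or has no file location.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location:
        return None, False
    origin = os.path.realpath(spec.origin)
    root = os.path.realpath(prefix)
    inside = os.path.commonpath([origin, root]) == root
    return origin, inside


if __name__ == "__main__":
    for name in ("numpy", "scipy", "gcnvkernel"):
        path, inside = module_origin(name)
        if path is None:
            print(f"{name}: not found")
        else:
            where = "inside" if inside else "OUTSIDE"
            print(f"{name}: {path} ({where} {sys.prefix})")
```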
