Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Broken: conda env create -n gatk -f gatkcondaenv.yml

15 comments

  • Carlos Luna

    I have the same error! Can anyone help with this?

  • James White

    Carlos Luna

    I've fixed gatkcondaenv.yml by making the most conservative changes possible, given what was available in my conda channels. Checking what is available is easily done with

    $ conda search <package>

    for each problem package. I moved certifi to a pip install, since v2016.2.28 was not available via conda.

    $ cp gatkcondaenv.yml gatkcondaenv.bac.yml

    I then made the following changes to gatkcondaenv.yml:

    In original (gatkcondaenv.bac.yml):

    - certifi=2016.2.28=py36_0
    - openssl=1.0.2l=0
    - pip=9.0.1=py36_1
    - python=3.6.2=0
    - readline=6.2=2
    - setuptools=36.4.0=py36_1
    - sqlite=3.13.0=0
    - tk=8.5.18=0
    - wheel=0.29.0=py36_0
    - xz=5.2.3=0
    - zlib=1.2.11=0
    

    In new (gatkcondaenv.yml):

    - openssl=1.0.2l
    - readline
    - pip=9.0.1=py36_5
    - python=3.6.2
    - setuptools=38.4.0=py36_0
    - sqlite
    - tk
    - wheel=0.31.0=py36_0
    - xz=5.2.3
    - zlib=1.2.11
    - pip:
        - certifi==2016.2.28
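Most of the edits above follow a mechanical pattern: a pin of the form name=version=build is relaxed to name=version, which is roughly what `conda env export --no-builds` would have produced at the source. A minimal Python sketch of that step, assuming the yml is read as plain lines (the helper `strip_builds` is hypothetical, not part of GATK or conda), could look like this. It leaves version-only pins and pip-style `==` pins untouched, so versions missing from your channels (such as certifi 2016.2.28 here) still have to be relaxed by hand after checking `conda search`.

```python
import re

# A pinned dependency line in an environment.yml, e.g. "- certifi=2016.2.28=py36_0"
# or "- conda-forge::python=3.6.10=h_abc  # comment":
#   group 1: leading "- " (with indentation), group 2: [channel::]name,
#   group 3: version, group 4: build string, group 5: anything after (comments).
PINNED_WITH_BUILD = re.compile(
    r"^(\s*-\s*)([^=\s:]+(?:::[^=\s]+)?)=([^=\s]+)=([^=\s#]+)(.*)$"
)


def strip_builds(lines):
    """Drop the '=build' suffix from name=version=build pins; leave other lines unchanged."""
    relaxed = []
    for line in lines:
        match = PINNED_WITH_BUILD.match(line)
        if match:
            prefix, name, version, _build, rest = match.groups()
            relaxed.append(f"{prefix}{name}={version}{rest}")
        else:
            # version-only pins, pip "==" pins, "- pip:" and comments pass through
            relaxed.append(line)
    return relaxed
```

For example, `strip_builds(open("gatkcondaenv.bac.yml").read().splitlines())` returns the relaxed lines, which can be joined with "\n" and written to a new yml for `conda env create`.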

    Creating the env now works. Unfortunately, the Python module import is the only test given at https://gatk.broadinstitute.org/hc/en-us/articles/360035889851--How-to-Install-and-use-Conda-for-GATK4. All it really tests is that the conda env's Python is working and that the modules in gatkpythonpackages.zip were installed correctly. It does not test whether the alternate versions of packages such as tk and readline are going to work.
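One way to go slightly beyond that single import test is to import the standard-library modules that are built on the substituted packages. The sketch below assumes the mapping tkinter → tk, readline → readline, sqlite3 → sqlite, ssl → openssl, lzma → xz, and zlib → zlib, and is meant to be run with the env's python; `failed_imports` is a hypothetical helper. A clean import only shows that the shared libraries load, not that GATK tools behave identically.

```python
import importlib

# Standard-library modules backed by the conda packages that were substituted
# (assumed mapping): tkinter -> tk, readline -> readline, sqlite3 -> sqlite,
# ssl -> openssl, lzma -> xz, zlib -> zlib.
MODULES_TO_CHECK = ["tkinter", "readline", "sqlite3", "ssl", "lzma", "zlib"]


def failed_imports(module_names):
    """Import each module and return {name: "ErrorType: message"} for those that fail."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # usually ImportError, e.g. when a shared library fails to load
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    problems = failed_imports(MODULES_TO_CHECK)
    print("all imports succeeded" if not problems else problems)
```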

     

    Can someone from the gatk team please chime in on this? Will my alternative gatkcondaenv.yml work? Thanks.

  • Bhanu Gandham

    Hi,

     

    It's possible there's something non-portable about our conda setup. We have an issue ticket to discuss this; you can follow the progress on the fix here: https://github.com/broadinstitute/gatk/pull/5026#issuecomment-621371506

  • Bhanu Gandham

    Hi,

     

    Our conda env has not been tested for portability. We only guarantee that it will work on whatever version of Ubuntu is used in the Docker image and on Travis.

    However, since your request, we have made some changes in the PR mentioned above to make it portable. Once the PR is merged, we expect it to resolve this issue.

    Here is another workaround if you are interested: https://github.com/conda/conda/issues/7311#issuecomment-442320274

  • James White

    Bhanu Gandham,

     

    Are there some tests I can run to make sure the components I have switched out will work? To my knowledge, the only available test checks that the conda env Python is up and running and that one of the packages installed from gatkpythonpackages.zip can be imported. Thanks.

     

    Also, the workaround you mention has to be applied at the source: the yml file must be created with the --no-builds flag. That's something the GATK team would have had to do when exporting the environment to the yml file.

  • Samuel Lee

    Hi James White and Carlos Luna,

    Can you test the new YML file from the PR that Bhanu Gandham references above (https://github.com/broadinstitute/gatk/pull/5026#issuecomment-621371506)?  I reproduce the resulting gatkcondaenv.yml file here, for your convenience:

     

    # Conda environment for GATK Python Tools
    #
    # Only update this environment if there is a *VERY* good reason to do so!
    # If the build is broken but could be fixed by doing something else, then do that thing instead.
    # Ensuring the correct environment for canonical (or otherwise reasonable) usage of our standard Docker takes precedence over edge cases.
    # If you break the environment, you are responsible for fixing it and also owe the last developer who left this in a reasonable state a beverage of their choice.
    # (This may be yourself, and you'll appreciate that beverage while you tinker with dependencies!)
    #
    # When changing dependencies or versions in this file, check to see if the "supportedPythonPackages" DataProvider
    # used by the testGATKPythonEnvironmentPackagePresent test in PythonEnvironmentIntegrationTest needs to be updated
    # to reflect the changes.
    #
    name: gatk
    channels:
    # if channels other than conda-forge are added and the channel order is changed (note that conda channel_priority is currently set to flexible),
    # verify that key dependencies are installed from the correct channel and compiled against MKL
    - conda-forge
    - defaults
    dependencies:

    # core python dependencies
    - conda-forge::python=3.6.10 # do not update
    - pip=20.0.2 # specifying channel may cause a warning to be emitted by conda
    - conda-forge::mkl=2019.5 # MKL typically provides dramatic performance increases for theano, tensorflow, and other key dependencies
    - conda-forge::mkl-service=2.3.0
    - conda-forge::numpy=1.17.5 # do not update, this will break scipy=0.19.1
    # verify that numpy is compiled against MKL (e.g., by checking *_mkl_info using numpy.show_config())
    # and that it is used in tensorflow, theano, and other key dependencies
    - conda-forge::theano=1.0.4 # it is unlikely that new versions of theano will be released
    # verify that this is using numpy compiled against MKL (e.g., by the presence of -lmkl_rt in theano.config.blas.ldflags)
    - defaults::tensorflow=1.15.0 # update only if absolutely necessary, as this may cause conflicts with other core dependencies
    # verify that this is using numpy compiled against MKL (e.g., by checking tensorflow.pywrap_tensorflow.IsMklEnabled())
    - conda-forge::scipy=1.0.0 # do not update, this will break a scipy.misc.logsumexp import (deprecated in scipy=1.0.0) in pymc3=3.1
    - conda-forge::pymc3=3.1 # do not update, this will break gcnvkernel
    - conda-forge::keras=2.2.4 # updated from pip-installed 2.2.0, which caused various conflicts/clobbers of conda-installed packages
    # conda-installed 2.2.4 appears to be the most recent version with a consistent API and without conflicts/clobbers
    # if you wish to update, note that versions of conda-forge::keras after 2.2.5
    # undesirably set the environment variable KERAS_BACKEND = theano by default
    - defaults::intel-openmp=2019.4
    - conda-forge::scikit-learn=0.22.2
    - conda-forge::matplotlib=3.2.1
    - conda-forge::pandas=1.0.3

    # core R dependencies; these should only be used for plotting and do not take precedence over core python dependencies!
    - r-base=3.6.2
    - r-data.table=1.12.8
    - r-dplyr=0.8.5
    - r-getopt=1.20.3
    - r-ggplot2=3.3.0
    - r-gplots=3.0.3
    - r-gsalib=2.1
    - r-optparse=1.6.4

    # other python dependencies; these should be removed after functionality is moved into Java code
    - biopython=1.76
    - pyvcf=0.6.8
    - bioconda::pysam=0.15.3 # using older conda-installed versions may result in libcrypto / openssl bugs

    # pip installs should be avoided, as pip may not respect the dependencies found by the conda solver
    - pip:
      - gatkPythonPackageArchive.zip
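Several pins above carry "do not update" comments. A quick way to confirm that an activated environment actually resolved to those versions is to compare each module's `__version__` with the pin. The sketch below assumes each listed package exposes `__version__`, and treats a single-`=` conda pin such as pymc3=3.1 as a prefix match, mirroring conda's fuzzy version matching. `EXPECTED` is a hand-copied subset of the pins (not parsed from the yml) and `pin_mismatches` is a hypothetical helper. It avoids `importlib.metadata` because that module only exists from Python 3.8, while this environment pins Python 3.6.

```python
import importlib

# Hand-copied subset of the pins in the yml above, keyed by import name
# (scikit-learn imports as sklearn). Not parsed from the file.
EXPECTED = {
    "numpy": "1.17.5",
    "scipy": "1.0.0",
    "theano": "1.0.4",
    "tensorflow": "1.15.0",
    "pymc3": "3.1",
    "keras": "2.2.4",
    "sklearn": "0.22.2",
    "matplotlib": "3.2.1",
    "pandas": "1.0.3",
}


def matches_pin(found, pinned):
    """Conda-style fuzzy match: pin "3.1" accepts "3.1", "3.1.0", "3.1.2", but not "3.10"."""
    return found == pinned or found.startswith(pinned + ".")


def pin_mismatches(expected):
    """Return {module: (pinned, found)} for modules that are missing or off-pin."""
    problems = {}
    for module, pinned in expected.items():
        try:
            found = getattr(importlib.import_module(module), "__version__", None)
        except Exception:  # not installed, or failed to initialise
            found = None
        if found is None or not matches_pin(str(found), pinned):
            problems[module] = (pinned, found)
    return problems


if __name__ == "__main__":
    print(pin_mismatches(EXPECTED) or "all checked pins match")
```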

     

    As for testing whether package substitutions will work, perhaps you could try running the unit tests? These include tests of basic functionality for CNNScoreVariants and GermlineCNVCaller. There's no guarantee that they will catch behavior changes (which I'd expect to be minimal, if any) resulting from your package substitutions, but they should at least catch catastrophic failures.

  • James White

    Samuel Lee,

     

    Sorry for not replying earlier. I don't get email notifications for these, so I have to revisit the page on my own. I looked for some kind of account setting, but found nothing.

     

    I'm a little confused by the yml you provided. It looks nothing like the one I'm working with (from gatk-4.1.6.0), which, among other differences, has no R installation.

     

    Cheers,

    James

  • Samuel Lee

    Hi James White,

    The yml looks different because it was significantly revised in the PR mentioned above to address several issues (such as moving R dependencies into the conda environment). These revisions also include changes that might improve the portability of the yml to different OSs.

  • James White

    Samuel Lee,

     

    Thanks for the quick response. OK, I'll try the new yml with my current gatk-4.1.6.0 setup.

     

    Also, I just noticed that I am receiving email notifications. I must have missed the one from your posting on 5/1.

  • James White

    Samuel Lee,

     

    Creating the env with the new yml file went smoothly on RHEL 6.6! Thanks for this. I assume it will work with GATK releases 4.1.6.0 and 4.1.7.0. Is this correct? Will the new yml file be bundled with the 4.1.8.0 release?

     

    I'm having a difficult time finding the unit tests to which you refer. Could you please point me in the right direction?

     

    Thanks for your hard work.

     

    Cheers,
    James

  • Samuel Lee

    Thanks for checking out the yml, James White, glad it worked out! I would guess that the environment is backwards compatible, but I can't make any guarantees as I haven't tested this myself. In any case, that PR has been merged and will be included in the next release.

    See https://github.com/broadinstitute/gatk#testing-gatk for information about testing GATK. You'll want to make sure you are running those tests within the conda environment. 
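Since the tests need to run inside the conda environment, a small pre-flight check can catch the common mistake of launching them from another shell or from the base env. This sketch relies on the CONDA_DEFAULT_ENV and CONDA_PREFIX variables that `conda activate` sets, and assumes the env was activated by name as gatk, as in the commands in this thread; `running_in_env` is a hypothetical helper, not part of GATK's test suite. Run it with `python` from the same shell you will launch the tests from.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return (name, prefix) of the activated conda env from the variables conda activate sets."""
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV"), env.get("CONDA_PREFIX")


def running_in_env(expected_name="gatk", environ=None, executable=None):
    """True if `expected_name` is the active env and the interpreter lives under its prefix."""
    name, prefix = active_conda_env(environ)
    if name != expected_name or not prefix:
        return False
    exe = os.path.realpath(sys.executable if executable is None else executable)
    return exe.startswith(os.path.realpath(prefix) + os.sep)


if __name__ == "__main__":
    name, prefix = active_conda_env()
    print(f"active env: {name} ({prefix}); python: {sys.executable}")
    print("OK to run the tests" if running_in_env() else "activate the gatk env first")
```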

  • Himawari

    Samuel Lee

    Hello, sorry to barge in, but I am having the same problem with the .yml file too:

    Ubuntu 18.04 LTS
    conda 4.8.3
    python 3.7
    conda env create -f gatkcondaenv.yml
    Collecting package metadata (repodata.json): done
    Solving environment: failed

    ResolvePackageNotFound:
    - setuptools==36.4.0=py36_1
    - readline==6.2=2
    - tk==8.5.18=0
    - certifi==2016.2.28=py36_0

    Then I followed your solution, created the new file, and tried to run it, but I got the same error. I also tried the --no-builds flag:

    conda env export --no-builds > gatkcondaenv.bac.yml
    conda env create -n gatk -f gatkcondaenv.bac.yml

    Everything seemed to install, or so I thought; I even got as far as conda activate gatk. It turns out nothing was installed. I looked for the edited .yml file you provided for James to test, but I can't seem to find it, unless it was meant only for him, or I don't understand what I am seeing.

    The funny thing is, I could run the CNNScoreVariants step last January, but now I can no longer run it. I thought it was a release issue, so I updated my GATK from 4.1.4.0 to 4.1.7.0.

    Sorry, and thanks.

  • Samuel Lee

    Himawari, if you create a file gatkcondaenv.yml that contains the block of text in https://gatk.broadinstitute.org/hc/en-us/community/posts/360061666671/comments/360010377231, you should be able to create the conda environment using

    conda env create -n gatk -f gatkcondaenv.yml

    Let me know if that doesn't work for you.

  • James White

    Himawari,

     

    Nothing is installed with

    conda env export --no-builds > gatkcondaenv.bac.yml
    conda env create -n gatk -f gatkcondaenv.bac.yml

    because you had no environment to export. The first command, I assume, produced an empty yml. The export has to be done at the source and the result distributed to the destinations; i.e., the export has to be done by the developers.

  • Himawari

    Samuel Lee

    Hello, it works! Thank you! I hope the edited .yml file is used in the next update instead of the current one. Thanks again.

     

    James White

    It was indeed empty. After going through the entire process of deactivating (just to be sure), re-creating, and activating the environment with the new .yml file, it now has the gatkpythonpackages installed and is working perfectly.

    Thanks a lot!
