Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

CNNScoreVariants not working

Answered

7 comments

  • Genevieve Brandt (she/her)

    Yahan Li, how much memory does the job have?

    I would recommend specifying the memory available to Java with the argument --java-options "-Xmx4G" (or however much you want to allocate). You can see how to use the Java options in this article. You won't want to give all of the job's memory to Java, because a small amount needs to remain available for the C++ code.
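A minimal Python sketch of assembling that command with an explicit Java heap size. The helper name `build_cnn_score_command` and the file names are hypothetical placeholders; `-V`, `-R`, and `-O` are the standard GATK arguments for the input VCF, the reference, and the output, and the 4 GB default simply mirrors the value suggested above.

```python
import shlex


def build_cnn_score_command(vcf, reference, output, java_heap_gb=4):
    """Return an argv list for `gatk CNNScoreVariants` with a capped Java heap.

    Hypothetical helper: java_heap_gb is only an illustrative default and
    should be sized to the memory actually granted to the job.
    """
    return [
        "gatk",
        "--java-options", f"-Xmx{java_heap_gb}G",  # cap the JVM heap
        "CNNScoreVariants",
        "-V", vcf,         # input VCF to annotate
        "-R", reference,   # reference FASTA
        "-O", output,      # annotated output VCF
    ]


# Placeholder file names, for illustration only.
cmd = build_cnn_score_command("input.vcf.gz", "reference.fasta", "annotated.vcf.gz")
print(shlex.join(cmd))  # shell-ready string, e.g. for a Slurm batch script
```

Building an argument list rather than a single string keeps the quoting correct if it is later passed to `subprocess.run`.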

    Try that and let me know if it helps.

    Best,

    Genevieve

  • Yahan Li

    Hi Genevieve,

    For the previous run, memory use was 1.01 GB (1.27% of 80.00 GB). This time I added --java-options "-Xmx15G" (the Slurm job has 20 GB), but it still doesn't work. The error log has the same content as before, and memory use is 877.93 MB (4.29% of 20.00 GB).

    Thanks!

    Yahan

  • Genevieve Brandt (she/her)

    Hi Yahan Li,

    I am continuing to look into this issue, but unfortunately I do not have a solution for you before the weekend. I will follow up next week!

    Best,

    Genevieve

  • Genevieve Brandt (she/her)

    Hi Yahan Li,

    I was previously incorrect about how to allocate memory. Since this tool mainly uses Python, the job needs much more physical memory available than what you specify for Java.

    If the job has 20 GB, you can try giving 4 GB to Java with the -Xmx Java option.
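The arithmetic behind this advice can be sketched as a simple budget: whatever is not given to the Java heap is what remains for the Python side of the tool (and for any JVM memory outside the heap, which `-Xmx` does not cap). The helper name `split_job_memory` is hypothetical; the figures are the 20 GB job and the 15 GB and 4 GB heap sizes discussed in this thread.

```python
def split_job_memory(job_gb, java_heap_gb):
    """Return (java_heap_gb, remaining_gb) for a job allocation.

    Hypothetical helper: remaining_gb is the memory left outside the Java
    heap for the Python process and any other overhead.
    """
    if java_heap_gb <= 0 or java_heap_gb >= job_gb:
        raise ValueError("Java heap must be positive and smaller than the job allocation")
    return java_heap_gb, job_gb - java_heap_gb


# Compare the two heap sizes tried in this thread for a 20 GB Slurm job.
for heap in (15, 4):
    java, rest = split_job_memory(20, heap)
    print(f"-Xmx{java}G leaves {rest} GB outside the Java heap")
```

With -Xmx15G only 5 GB are left outside the heap, while -Xmx4G leaves 16 GB.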

    Let me know how this goes!

    Genevieve

  • Genevieve Brandt (she/her)

    Yahan Li, there is one more possibility: something may not be set up correctly in the Python environment. You can try the official GATK Docker image and see if the problem persists.

  • Yahan Li

    Hi Genevieve,

    I have some final updates on this issue. You were right: it's a Python environment problem.

    After I regenerated the Python environment, I got the same error as in this post (https://gatk.broadinstitute.org/hc/en-us/community/posts/360056339432-CNNScoreVariants-carshes-with-java-lang-NullPointerException):

    ModuleNotFoundError: No module named 'numpy.testing.decorators'
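This error means some package in the environment tried to import `numpy.testing.decorators` and the installed NumPy does not provide it. A minimal sketch for checking this without triggering the traceback, using the standard library's `importlib.util.find_spec`; the helper name `module_available` is hypothetical. Note that `find_spec` does import the parent packages (here `numpy` and `numpy.testing`) to locate a submodule.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located in the current environment.

    Hypothetical helper: find_spec raises ModuleNotFoundError (a subclass of
    ImportError) when a parent package is missing, so that case is reported
    as unavailable too.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        return False


for name in ("numpy", "scipy", "numpy.testing.decorators"):
    print(f"{name}: {'found' if module_available(name) else 'missing'}")
```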

    I checked numpy.show_config(), and both *_mkl_info entries were NOT AVAILABLE:

    blas_mkl_info:
    NOT AVAILABLE

    lapack_mkl_info:
    NOT AVAILABLE

    So I tried the solution from that post, updating numpy to 1.18.1 and scipy to 1.3.2. Now, although both *_mkl_info entries are still not available, CNNScoreVariants works.
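A sketch for comparing the installed NumPy and SciPy against the versions reported working in this thread, using `importlib.metadata` (Python 3.8+). It assumes the NumPy version meant is 1.18.1, since there is no NumPy 1.81.1 release; these are the versions that worked for this poster, not an official GATK requirement. The helper names are hypothetical, and `parse_version` only handles simple numeric versions (a suffix such as `rc1` is ignored).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Versions reported working in this thread (assumption: NumPy 1.18.1,
# because no NumPy 1.81.1 release exists). Not an official requirement.
REPORTED_WORKING = {"numpy": "1.18.1", "scipy": "1.3.2"}


def parse_version(text):
    """Turn '1.18.1' into (1, 18, 1), keeping only the leading numeric part."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def installed_version(package):
    """Return the installed distribution version, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for package, reported in REPORTED_WORKING.items():
    found = installed_version(package)
    if found is None:
        print(f"{package}: not installed")
    else:
        newer = parse_version(found) >= parse_version(reported)
        print(f"{package} {found}: {'same or newer' if newer else 'older'} than {reported}")
```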

    Thank you for your help!

    Yahan

  • Genevieve Brandt (she/her)

    Thanks for posting your update and solution, Yahan Li! Glad it is working for you now.
