Genome Analysis Toolkit

The optimization step for ELBO update returned a NaN

  • SkyWarrior

    3000 genomes seems to be too much. How much memory do you have available for this analysis?

    Also, have you tried the Docker version of GATK, especially 4.1.7.0 (the latest one compatible with Skylake and newer processors without issues)? Your conda libraries may differ from those installed in the Docker image, which could further change your results.

  • simon lee

    Thanks for the reply.

    I am analyzing only a small portion of each genome, specifically 46 blood group genes for a total of 1.4 Mbp. I am using an HPC with 8 cores and 80 GB of memory.

    I am thinking of further splitting the intervals into shards, but this does not seem necessary given the suggestions in this CNV tutorial: https://gatk.broadinstitute.org/hc/en-us/articles/360035531152--How-to-Call-common-and-rare-germline-copy-number-variants.

    I will try running GATK in Docker and see if it helps.

  • Genevieve Brandt (she/her)

    Hi simon lee, let us know if using docker helps. If not, we can look into other solutions as well.

  • simon lee

    I think I solved the problem, and it seems to have been related to memory.

    I noticed that running fewer than 500 genomes did not throw an error. I then tried running all 3000 genomes with my intervals split into multiple shards. After that, the analysis completed with no errors.

    Thanks for the suggestions.

  • Genevieve Brandt (she/her)

    Thanks for the update simon lee!

  • Shai Casif

    Hi, I know it's been a while, but I got the same problem in step 3.
    Can someone please tell me how to solve it?
    I saw that simon lee used splitting. If that is the fix, I would appreciate an explanation of how to do it.
    Thanks!

  • Gökalp Çelik

    Hi Shai Casif

    What version are you using?

  • Shai Casif

    Hi Gökalp Çelik

    I am using GATK version 4.1.0.0. I split the .interval_list using SplitIntervals with a scatter count of 100, but I am still encountering the same issue.

  • Gökalp Çelik

    Hi again.

    This issue was fixed in a later version of GATK, specifically 4.1.4.1:

    https://github.com/broadinstitute/gatk/releases/tag/4.1.4.1 

    More improvements and fixes were added in later versions, so we recommend using the latest version (4.6.0.0) as a definitive solution.

    Regards. 

  • Shai Casif

    Thanks, Gökalp Çelik.
    After upgrading, would you recommend restarting the pipeline from the beginning, or can I continue my work from where I left off?

  • Gökalp Çelik

    It would be better to rerun all steps from the beginning. gCNV versions are also not compatible between 4.1 and 4.6.

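For readers who land here with the same NaN error, the two remedies discussed in this thread (checking that the installed GATK is at least 4.1.4.1, and splitting the interval list into shards) can be sketched in Python roughly as follows. This is a minimal sketch, not an official recipe: the file names `ref.fasta`, `targets.interval_list`, and `shards` are hypothetical placeholders, and the `SplitIntervals` arguments shown (`-R`, `-L`, `--scatter-count`, `-O`) should be checked against `gatk SplitIntervals --help` for the version you have installed. The helper only builds the command; it does not run GATK.

```python
import shlex

# Release that, according to this thread, contains the fix for the NaN error.
MIN_FIXED_VERSION = (4, 1, 4, 1)


def parse_gatk_version(text):
    """Turn a dotted version string such as '4.1.0.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_nan_fix(version_text):
    """Return True if the given GATK version is 4.1.4.1 or newer."""
    return parse_gatk_version(version_text) >= MIN_FIXED_VERSION


def split_intervals_command(reference, intervals, scatter_count, out_dir):
    """Build (but do not execute) an argument list for `gatk SplitIntervals`.

    All paths are supplied by the caller; the names used in the demo below
    are placeholders, not files mentioned in the thread.
    """
    return [
        "gatk", "SplitIntervals",
        "-R", reference,
        "-L", intervals,
        "--scatter-count", str(scatter_count),
        "-O", out_dir,
    ]


if __name__ == "__main__":
    # Version reported by the second poster in the thread.
    if not has_nan_fix("4.1.0.0"):
        print("GATK is older than 4.1.4.1; consider upgrading first.")

    # Scatter count of 100, as tried in the thread.
    cmd = split_intervals_command(
        "ref.fasta", "targets.interval_list", 100, "shards"
    )
    print(shlex.join(cmd))
```

Each shard written to the output directory would then be passed to the failing step on its own, so that a single run handles only a fraction of the intervals. As noted above, outputs produced with a 4.1 release should not be mixed with a 4.6 run, so after upgrading the whole pipeline should be rerun from the first step.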
