Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Error running "gatk seq-format-validation workflow"

21 comments

  • Jason Cerrato

    Hi zdr j,

    Happy to see if I'm able to help. Do you have any .log files for the job you can share? stdout and stderr files may help as well.

    Can you also share your validate-bam.wdl and validate-bam.inputs.json files? If you got the WDL and json files from somewhere, please point me to their origin(s).

    Kind regards,

    Jason

  • zdr j

    Thanks a lot Jason Cerrato,

    Below is the stderr; the stdout just says 230. I attached images of the validate-bam.wdl and validate-bam.inputs.json files. I appreciate it.

    Best

    Zara

  • Jason Cerrato

    Hi zdr j,

    Thank you for those details. Would you be able to share the .log, WDL, and inputs.json as downloadable files? I recommend Dropbox, Google Drive, or public Google bucket files. Downloadable files will be much easier to use for troubleshooting purposes.

    Many thanks,

    Jason

  • zdr j

    Hello,

    Thanks for your help.

    I tried many things with the JSON format and the WDL, but the problem still exists. The link below has the files and folders for the JSON, the WDL, and the input file I am using.

    Thank you so much

    https://drive.google.com/drive/folders/10CJNhgegAnu1Sp-DQBw11YK5qVZrV5Ys?usp=sharing

  • Jason Cerrato

    Hi zdr j,

    Happy to continue assisting. I see the cromwell-workflow-logs folder is empty. Were logs not generated from your recent run?

    I also see that your inputs.json file does not include disk_size and mem_size. Are you following the steps in this article? https://gatk.broadinstitute.org/hc/en-us/articles/360035530952--How-to-Execute-Workflows-from-the-gatk-workflows-Git-Organization

    I'm curious to know if it will provide you with some helpful insight for your case.

    Kind regards,

    Jason

  • zdr j

    Hi Jason Cerrato 

    Thank you for your reply. Yes, the log folder is empty after the execution. As for the JSON file, I tried different formats and none of them worked. I added another JSON file to the shared Google Drive link below. And yes, I am following the guidelines in the article you mentioned exactly.

    I am thinking the problem may be my input file. Could you please share a tested sample BAM file here so that I can check with it? I have also shared my input BAM file, in case you prefer to test it on your machine.

    I am new to GATK, and I would appreciate it if you could help me figure out what the problem is. I am frustrated that it is not working.

     

    Thank you so much

    Best

     

    https://drive.google.com/drive/folders/1MtWN6o6ZIdLmy67xSXvP7fU3To1A6bGS

  • Jason Cerrato

    Hi zdr j,

    Happy to see if we can get you unstuck here.

    Can you revise your inputs.json file to have the full path for the bam rather than the relative path, similar to how it's shown in the link? You can also test with a different bam by going to the bucket as described in the link: https://console.cloud.google.com/storage/browser/gatk-test-data/wgs_bam?organizationId=548622027621&project=broad-dsde-outreach&prefix=

    I recommend selecting a small bam, like the NA12878_24RG_small.hg38.bam.

    Please provide the full output of the run if it fails to work again. We may want to run through the steps from scratch if so to make sure we're not missing anything.

    Kind regards,

    Jason

  • zdr j

    Hi Jason Cerrato 

    Thanks for your guidance. I changed the path in the JSON file as you mentioned, but unfortunately the workflow still fails. Please see the files in the folder named "error gatk-workflows new" at the following link for details. I also tried with NA12878_24RG_small.hg38.bam, and that did not resolve the issue, which means my input was most probably fine. It seems the problem is the JSON format or Cromwell. What do you think? Also, is debugging the JSON relevant here at all? Should I run in debug mode?

    Thank you so much

    Best

    https://drive.google.com/drive/folders/1VrJWUNtOsWkyNibQ6hIahAfVy7GQafAN

  • Jason Cerrato

    Hi zdr j,

    I'm working through the steps myself and I've identified a few issues with the current documentation. I'll let you know once I get a working configuration on my end.

    Kind regards,

    Jason

  • Jason Cerrato

    Hi zdr j,

    I took a look at your files and I see that you actually did get a stderr, but it was written as stderr.background. See: https://drive.google.com/file/d/1f4N_EVNgAkWdaJmcHxYiWU5rWu_ln0ny/view?usp=sharing

    The contents are as follows:

    /gatk/my_data/cromwell-executions/ValidateBamsWf/d5757999-1bef-4ce4-a316-77b3ddcf84e5/call-ValidateBAM/shard-0/execution/script.submit: line 5: docker: command not found
    cat: /gatk/my_data/cromwell-executions/ValidateBamsWf/d5757999-1bef-4ce4-a316-77b3ddcf84e5/call-ValidateBAM/shard-0/execution/docker_cid: No such file or directory
    /gatk/my_data/cromwell-executions/ValidateBamsWf/d5757999-1bef-4ce4-a316-77b3ddcf84e5/call-ValidateBAM/shard-0/execution/script.submit: line 14: docker: command not found
    cat: /gatk/my_data/cromwell-executions/ValidateBamsWf/d5757999-1bef-4ce4-a316-77b3ddcf84e5/call-ValidateBAM/shard-0/execution/docker_cid: No such file or directory
    /gatk/my_data/cromwell-executions/ValidateBamsWf/d5757999-1bef-4ce4-a316-77b3ddcf84e5/call-ValidateBAM/shard-0/execution/script.submit: line 17: docker: command not found

    Can you confirm you have docker installed and try again?

    Kind regards,

    Jason
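
The docker check requested above can be scripted. The following is a minimal sketch, not part of the GATK documentation, that reports whether a `docker` executable is reachable from the current environment. The function name `docker_available` is hypothetical; the only assumption about Docker itself is that `docker --version` exits with status 0 when the client is installed.

```python
import shutil
import subprocess


def docker_available(executable="docker"):
    """Return True if `executable` is on PATH and answers `--version`."""
    path = shutil.which(executable)
    if path is None:
        # Same situation as the "docker: command not found" lines in stderr.background.
        return False
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if docker_available():
        print("docker found on PATH")
    else:
        print("docker not found on PATH in this environment")
```

Running this from the same shell that launches Cromwell shows whether that shell can see docker, which may differ from what another terminal reports.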

  • zdr j

    Hi  Jason Cerrato 

    Thank you so much for your time. Yes, I would appreciate it if you could try the workflow on your machine. Honestly, I think something is missing from the instructions for a local machine in the paper. Yes, I do have the latest version of Docker installed, and that is how I pulled the GATK image. However, one problem might be that docker cannot be run from inside a Docker container: in the terminal, inside my home directory, "docker --version" returns "Docker version 19.03.12, build 48a66213fe". But when I change directory to the GATK image or to the gatk-workflows directory, it says "docker command not found". I have heard it is not a good idea to install Docker inside a Docker image. After the image is pulled by Docker, do we still need Docker when running the workflow? I look forward to your reply on this matter. Thank you so much.

    Best 

    zdrj

     

  • Jason Cerrato

    Hi zdr j,

    I was able to successfully run this locally. You do not need to run this inside a Docker container to run it locally; the job itself will use docker.

    Can you try this again, following the directions in the Running Workflows Locally section? For your inputs.json, please edit it to look like the following:

    {
      "ValidateBamsWf.bam_array": [
        "inputs/4.bam"
      ]
    }

    You can use your original bam, or the sample bam. Try with the full path if this doesn't work.

    If you don't already have the Cromwell jar on your local machine and your computer doesn't have wget, you can simply download the jar manually and move it to the desired directory.

    I believe the Google Cloud version should also work without being inside the GATK docker, if that's what you prefer to do.

    Let me know if this works for you! 

    Kind regards,

    Jason
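
For the "try with the full path" suggestion above, here is a minimal sketch of one way to turn the relative BAM paths in an inputs mapping into absolute ones. The key `ValidateBamsWf.bam_array` comes from the JSON in the thread; the function name `absolutize_bam_paths`, the `base_dir` parameter, and the rule that entries containing `://` (for example bucket URLs) are left untouched are assumptions made for illustration.

```python
import json
import os


def absolutize_bam_paths(inputs, key="ValidateBamsWf.bam_array", base_dir="."):
    """Return a copy of `inputs` with every local path under `key` made absolute."""
    updated = dict(inputs)
    resolved = []
    for bam in inputs[key]:
        if "://" in bam:
            # Assumption: URLs such as bucket paths are passed through unchanged.
            resolved.append(bam)
        else:
            resolved.append(os.path.abspath(os.path.join(base_dir, bam)))
    updated[key] = resolved
    return updated


if __name__ == "__main__":
    original = {"ValidateBamsWf.bam_array": ["inputs/4.bam"]}
    print(json.dumps(absolutize_bam_paths(original), indent=2))
```

The printed JSON can be saved as the inputs file; paths are resolved relative to `base_dir`, which defaults to the current working directory.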

  • Jason Cerrato

    Hi zdr j,

    This should also work with the addition of

    "ValidateBamsWf.ValidateBAM.validation_mode": "SUMMARY"

    in the json if you are interested in using the SUMMARY mode.

    Jason
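
To show where the extra entry sits relative to the BAM array, the sketch below builds the whole inputs mapping in one place. Both keys are taken from the comments above; the helper name `build_inputs` is hypothetical, and it is an assumption that the validation mode is a top-level key next to `ValidateBamsWf.bam_array`, as the wording "in the json" suggests.

```python
import json


def build_inputs(bam_paths, validation_mode=None):
    """Build the inputs mapping; add the validation mode only when one is given."""
    inputs = {"ValidateBamsWf.bam_array": list(bam_paths)}
    if validation_mode is not None:
        inputs["ValidateBamsWf.ValidateBAM.validation_mode"] = validation_mode
    return inputs


if __name__ == "__main__":
    # Prints a JSON document that could be saved as the inputs file.
    print(json.dumps(build_inputs(["inputs/4.bam"], "SUMMARY"), indent=2))
```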

  • zdr j

    Thank you so much Jason Cerrato

    When I edited the JSON file, the run again ended in a failed state. But this time the output file exists in the output directory, and it says:
    ## HISTOGRAM java.lang.String
    Error Type Count
    ERROR:MATE_NOT_FOUND 73

    Does this mean that the workflow worked and this is an error the workflow found in my bam file? :)))))) The output directory contains these files now:

    1. 4.validation_.txt
    2. docker_cid
    3. rc
    4. script
    5. script.background
    6. script.submit
    7. stderr
    8. stderr.background
    9. stdout
    10. stdout.background

    I shared stdout and stderr in the link below.

    Thank you so much 

    Best

     

    https://drive.google.com/drive/folders/1KnAKSJ2qyRG-w1xeH8KVZFPc_RGUjvAg

  • Jason Cerrato

    Hi zdr j,

    Yes, the contents of the validation_.txt file would reveal any issues with the provided bam. You can confirm by testing with one of the known valid sample bams to see if you get a different result.

    Kind regards,

    Jason
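
When comparing the validation output of several BAMs, it can help to read the histogram into a dictionary. The sketch below does this for text shaped like the output quoted in the thread. It assumes that each count line starts with `ERROR:` or `WARNING:` and ends with an integer separated by whitespace; the function name `parse_validation_summary` is hypothetical.

```python
def parse_validation_summary(text):
    """Map each 'ERROR:...' or 'WARNING:...' line to its integer count."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(("ERROR:", "WARNING:")):
            # Skips the '## HISTOGRAM' header and the column-title line.
            continue
        parts = line.rsplit(None, 1)  # split on the last run of whitespace
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


if __name__ == "__main__":
    sample = (
        "## HISTOGRAM java.lang.String\n"
        "Error Type Count\n"
        "ERROR:MATE_NOT_FOUND 73\n"
    )
    print(parse_validation_summary(sample))
```

An empty dictionary means no line in the file matched those prefixes, which is the case worth comparing against the result for a known valid sample BAM.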

  • zdr j

    Yes, it seems to be working, although it says failed state at the end. Thank you so much. I really appreciate your time and great help.

    Best Regards,

    zdrj

  • Jason Cerrato

    I'm very glad to help! I'll reach out to our User Education team to make the appropriate changes in our documentation to make it clearer. Thank you for writing in and for helping us improve our documentation!

    Jason

  • José Enrique

    Hello guys, I have followed your discussion because something similar is happening to me, but I have not understood what the final solution to the problem was. Could you please tell us how you solved it?

    Greetings Jose

  • José Enrique

    Problem solved with the help of Beri Shifaw, thanks for everything.

    Kind regards,

    Jose

  • Tomasz .P

    Hello guys,

    Could you please tell us how you solved your problem? I have the same problem, with this error:

    [2022-03-22 13:38:43,22] [error] WorkflowManagerActor Workflow 1754f8db-da10-440c-bd04-591832dd8e44 failed (during ExecutingWorkflowState): Job ValidateBamsWf.ValidateBAM:0:1 exited with return code -1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: /gatk/cromwell-executions/ValidateBamsWf/1754f8db-da10-440c-bd04-591832dd8e44/call-ValidateBAM/shard-0/execution/stderr.
     Could not retrieve content: /gatk/cromwell-executions/ValidateBamsWf/1754f8db-da10-440c-bd04-591832dd8e44/call-ValidateBAM/shard-0/execution/stderr

    Thanks in advance

    Tom

  • Pamela Bretscher

    Hi Tomasz .P,

    Can you please check the contents of the "/gatk/cromwell-executions/ValidateBamsWf/1754f8db-da10-440c-bd04-591832dd8e44/call-ValidateBAM/shard-0/execution/stderr" file? This file should provide more information about why this task failed.

    Kind regards,

    Pamela
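
Earlier in this thread the useful message turned out to be in `stderr.background` rather than `stderr`, so it may be worth checking both files in the execution directory. The sketch below collects whichever of the two exist and are non-empty. The file names come from the directory listing earlier in the thread; the function name `read_stderr_files` is hypothetical, and the directory argument is whatever execution path Cromwell printed for the failed call.

```python
import os


def read_stderr_files(execution_dir, names=("stderr", "stderr.background")):
    """Return {file name: contents} for each listed file that exists and is non-empty."""
    found = {}
    for name in names:
        path = os.path.join(execution_dir, name)
        if os.path.isfile(path):
            with open(path, "r", errors="replace") as handle:
                text = handle.read()
            if text.strip():
                found[name] = text
    return found


if __name__ == "__main__":
    # "." is a placeholder; pass the call's execution directory instead.
    for name, text in read_stderr_files(".").items():
        print(f"--- {name} ---")
        print(text)
```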
