Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Does GATK4 BaseRecalibrator not understand relative paths?

10 comments

  • Robert Bremel

    I had a similar (?) issue crop up with BuildBamIndex in the Docker 4.3.0.0 version on Windows 10 with WSL2 and Docker Desktop.

    BuildBamIndex is supposed to need only an --INPUT argument. It runs, and the log says it wrote the .bai file, but there was no file in the proper directory or anywhere else that I could find.

    However, if I specified --OUTPUT, the .bai file was indeed written to the directory.

    But the weird thing is that this only seems to happen when the command appears in a string of commands. If you open a bash window and run it on its own, everything is okay.

     

  • Eva (Evander)

    Hi Robert, 

    When I run the command in the terminal with the relative path "../some_dir/path_to_file" (with the "../"), it does the same thing. It seems to append the relative path, as a string, to the "current" path. The way I fixed it was to not use "../" at all and to move the directory instead.

  • Anthony Dias-Ciarla

    Hi Eva (Evander) and Robert Bremel,

    Thank you both for writing to the GATK forum. I hope that we can help you both sort these issues out.

    Firstly, Robert Bremel, could you please create a separate post for the issue you are encountering? You are using a different tool, so we need a separate post. Please include your GATK version, complete command line, and entire program log in your post.

    Eva (Evander), I have sent your inquiry to our developers and am awaiting a response. I should be back with some answers and the next steps shortly.

    Thank you both for being valued members of the GATK community! I am looking forward to hearing back from you both.

    Best,
    Anthony

  • Anthony Dias-Ciarla

    Hi Eva (Evander),

    Thank you for your patience! It appears the program is generating the error because it cannot find the sequence dictionary.

    Could you please check if all necessary files are present, i.e. the FASTA file .fna, the FASTA index file .fna.fai and the sequence dictionary .dict?
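
The check for the three files listed above can be scripted. The following is a minimal Python sketch, not part of GATK: the function names `companion_files` and `missing_companions` are hypothetical, and it assumes the naming shown in this thread (the index is the reference name plus `.fai`, and the dictionary replaces the reference's last extension with `.dict`). Relative paths are resolved against the directory the script is run from.

```python
from pathlib import Path


def companion_files(reference: str) -> dict:
    """Return the reference path plus the index and dictionary paths expected beside it."""
    ref = Path(reference)
    return {
        "reference": ref,
        "index": ref.with_name(ref.name + ".fai"),  # e.g. genome.fna -> genome.fna.fai
        "dictionary": ref.with_suffix(".dict"),     # e.g. genome.fna -> genome.dict
    }


def missing_companions(reference: str) -> list:
    """List the roles ("reference", "index", "dictionary") whose file is absent."""
    return [
        role
        for role, path in companion_files(reference).items()
        if not path.is_file()
    ]
```

Calling `missing_companions(...)` with the same relative path that is passed to GATK, from the same working directory, should return an empty list if all three files are visible to the operating system.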

    Best,

    Anthony

  • Eva (Evander)

    Dear Anthony, sorry for the late reply. Thank you for answering.
    Yes, everything is there, and it works if I do not use relative paths to point to the files, in other words, when I run the command in the directory itself (as I explained in my previous comment).

    So the relative path is appended to the full path of the working directory (and hence the tool cannot find the file). In the log, note the /../ in the middle of the path: file:///mnt/scratch_dir/deutekoe/projects/HP/WESHPipe/Pipe/workflow/../resources/reference_genomes/hg38/no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.dict

    Maybe this is a feature and not a bug :-p? 

  • Anthony Dias-Ciarla

    Hi Eva (Evander),

    Thank you for your response! For the sake of clarity, I re-confirmed this with our developers.

    This is indeed how relative paths work: the relative path is appended to the current working directory, so that, e.g., /a/b/c/../c is equivalent to /a/b/c. Appending it to the current working directory is what makes the path relative to that directory.
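
The rule described above can be illustrated with a small Python sketch. This is only an illustration of lexical path normalization and says nothing about GATK's own implementation; the helper name `resolve_lexically` is hypothetical.

```python
import posixpath


def resolve_lexically(working_dir: str, relative: str) -> str:
    """Append a relative path to a working directory, then collapse "." and ".." segments."""
    joined = posixpath.join(working_dir, relative)  # e.g. /a/b/c/../d
    return posixpath.normpath(joined)               # e.g. /a/b/d


print(resolve_lexically("/a/b/c", "../c"))  # /a/b/c
print(resolve_lexically("/a/b/c", "../d"))  # /a/b/d
```

Note that this normalization is purely textual. If a directory in the path is a symbolic link, the location the operating system actually reaches through ".." can differ from the normalized string.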

    I hope this helps! Please let me know whether this has cleared up your doubts. If not, we can dive deeper. Thank you again for writing to the GATK forum.

    Best,

    Anthony

  • Eva (Evander)

    Hi Anthony, thanks for asking the developers. Even if that were true (and in practice I do not think it is…I tried), the tool still does not behave that way. In the case I showed, the path is more like a/b/c/../d, while it should be a/b/d if the tool is run in c (workflow in my case; see the log). So I run my pipeline in directory c (workflow) and want to use files in ../d (resources, one directory up and at the same level as c), but the tool sticks the relative path onto the directory where it is running, a/b/c, so the path becomes a/b/c/../d. But that does not exist. Hope that makes sense… I fixed it by moving directory d into c, so that you do indeed get a/b/c/d. So practically, I would say that when joining the paths given by the user, the "../" should be removed, or users should be told to only use the tool with files in the directory the tool is run in or in one of its subdirectories. Hope I am making sense, thank you for your time!

  • Chris Norman

    Hi Eva (Evander). I'm a GATK developer, and was just looking at this. I would expect those relative paths to work, and indeed I tried a test locally with relative paths and it seems to work. Without a stack trace I can't be sure, but it looks to me like GATK is able to find not only the reference file, but the associated index as well (the code locates the index first, followed by the sequence dictionary). So it seems to fail on only the dictionary.
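
A comparable check at the operating-system level, independent of GATK, can be sketched in Python. The directory names a/b/c and a/b/d follow the example used earlier in this thread, and `ref.dict` is a placeholder file name. The sketch only shows that the un-normalized path with "/../" in the middle reaches the same file as the direct path; it does not exercise GATK's own path handling.

```python
import os
import tempfile


def same_file_via_dotdot() -> bool:
    """Build a/b/c and a/b/d/ref.dict, then reach the file from c through "../d"."""
    with tempfile.TemporaryDirectory() as root:
        workdir = os.path.join(root, "a", "b", "c")  # stands in for "workflow"
        datadir = os.path.join(root, "a", "b", "d")  # stands in for "resources"
        os.makedirs(workdir)
        os.makedirs(datadir)

        direct = os.path.join(datadir, "ref.dict")
        with open(direct, "w") as handle:
            handle.write("placeholder\n")

        # The un-normalized form: working directory + "/../d/ref.dict".
        via_dotdot = os.path.join(workdir, "..", "d", "ref.dict")
        return os.path.isfile(via_dotdot) and os.path.samefile(via_dotdot, direct)


print(same_file_via_dotdot())
```

If this prints `True` on the file system in question but GATK still fails, that would point toward something specific to the tool or the files rather than to the "/../" in the path.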

    It may be something specific to the dictionary file, or the .fna suffix (my test was with a .fasta), or possibly the specific file system that you're running on. I would suggest trying the following:

    Run GATK with the stack trace option `--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'` on, then reproduce the error and post the output here. Also, it would be helpful to include a directory listing of the reference and associated files (index and dictionary), i.e., from the working directory where you're running GATK, run `ls -l ../resources/reference_genomes/hg38/no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.*` and include the output here.

  • Chris Norman

    Robert Bremel, the BuildBamIndex issue you mentioned sounds very much like a separate (known) issue that is specific to `BuildBamIndex` (see https://github.com/broadinstitute/picard/issues/1827).

  • Anthony Dias-Ciarla

    Hi Eva (Evander),

    We haven't heard from you in a while so we're going to close out this ticket. If you still require assistance, simply respond to this email and we'll be happy to pick up where we left off!

    Kind regards,

    Anthony
