Does GATK4 BaseRecalibrator not understand relative paths?
I am using GATK 4.2.6.1, and for some reason BaseRecalibrator is not handling my (relative) paths correctly. It pastes them onto the full path of the working directory, all the way up to the home directory (as you can see at the end of the error log). MarkDuplicates does not have the same issue. I ran the command both through Snakemake and on its own, and it keeps giving me the same error. I got it all to work by removing all my relative paths and moving my files to where GATK is run. Am I doing something wrong, or is this a (possible) bug?
Command:
gatk --java-options '-Xms8G -Xmx8G -XX:ParallelGCThreads=8' BaseRecalibrator -I ../results/marked_dups/hDNA_H19_EF_S9_marked_dups.paired.bam -R ../resources/reference_genomes/hg38/no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna -O ../results/BQSR/hDNA_H19_EF_S9_BQSR.grp --known-sites ../resources/variants/Homo_sapiens_assembly38.dbsnp138.vcf --known-sites ../resources/variants/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites ../resources/variants/Homo_sapiens_assembly38.known_indels.vcf.gz
Error log:
Using GATK jar /mnt/scratch_dir/deutekoe/projects/HP/WESHPipe/Pipe/workflow/.snakemake/conda/2bb82e671a60db588a594ec7c239c278_/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms8G -Xmx8G -XX:ParallelGCThreads=8 -jar /mnt/scratch_dir/deutekoe/projects/HP/WESHPipe/Pipe/workflow/.snakemake/conda/2bb82e671a60db588a594ec7c239c278_/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar BaseRecalibrator -I ../results/marked_dups/hDNA_H19_EF_S9_marked_dups.paired.bam -R ../resources/reference_genomes/hg38/no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna -O ../results/BQSR/hDNA_H19_EF_S9_BQSR.grp --known-sites ../resources/variants/Homo_sapiens_assembly38.dbsnp138.vcf --known-sites ../resources/variants/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites ../resources/variants/Homo_sapiens_assembly38.known_indels.vcf.gz
13:00:50.898 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/scratch_dir/deutekoe/projects/HP/WESHPipe/Pipe/workflow/.snakemake/conda/2bb82e671a60db588a594ec7c239c278_/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
13:00:51.133 INFO BaseRecalibrator - ------------------------------------------------------------
13:00:51.134 INFO BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.2.6.1
13:00:51.134 INFO BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
13:00:51.134 INFO BaseRecalibrator - Executing as deutekoe@rivm-lsfsv-l11p.rivm.ssc-campus.nl on Linux v3.10.0-1160.66.1.el7.x86_64 amd64
13:00:51.134 INFO BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_332-b09
13:00:51.134 INFO BaseRecalibrator - Start Date/Time: October 17, 2022 1:00:50 PM CEST
13:00:51.134 INFO BaseRecalibrator - ------------------------------------------------------------
13:00:51.134 INFO BaseRecalibrator - ------------------------------------------------------------
13:00:51.135 INFO BaseRecalibrator - HTSJDK Version: 2.24.1
13:00:51.135 INFO BaseRecalibrator - Picard Version: 2.27.1
13:00:51.135 INFO BaseRecalibrator - Built for Spark Version: 2.4.5
13:00:51.135 INFO BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:00:51.135 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:00:51.135 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:00:51.135 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:00:51.135 INFO BaseRecalibrator - Deflater: IntelDeflater
13:00:51.136 INFO BaseRecalibrator - Inflater: IntelInflater
13:00:51.136 INFO BaseRecalibrator - GCS max retries/reopens: 20
13:00:51.136 INFO BaseRecalibrator - Requester pays: disabled
13:00:51.136 INFO BaseRecalibrator - Initializing engine
13:00:51.142 INFO BaseRecalibrator - Shutting down engine
[October 17, 2022 1:00:51 PM CEST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=8232370176
***********************************************************************
A USER ERROR has occurred: Fasta dict file file:///mnt/scratch_dir/deutekoe/projects/HP/WESHPipe/Pipe/workflow/../resources/reference_genomes/hg38/no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.dict for reference file:///mnt/scratch_dir/deutekoe/projects/HP/WESHPipe/Pipe/workflow/../resources/reference_genomes/hg38/no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna does not exist. Please see http://gatkforums.broadinstitute.org/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference for help creating it.
***********************************************************************
-
I had a similar(?) issue crop up with BuildBamIndex in the Docker 4.3.0.0 version on Windows 10 with WSL2 and Docker Desktop.
BuildBamIndex is supposed to only need an --INPUT argument. It runs and the log says it wrote the .bai file, but there was no file in the proper directory or anywhere that I could locate.
However, if I specified --OUTPUT, the .bai file was indeed written to the directory.
But the weird thing was that this only seems to be the case when the command appears in a string of commands. If you open a bash window and run it, everything is okay.
-
Hi Robert,
When I run the command in the terminal with the relative path "../some_dir/path_to_file" (with the "../"), it does the same thing. It seems to append the relative path, as a string, to the "current" path. The way I fixed it was to just not use "../" and to move the directory.
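One alternative to moving the directory is to turn the relative paths into absolute ones before they reach GATK. The sketch below is a hypothetical helper (`absolutize` and `PATH_FLAGS` are illustrative names, not part of GATK or Snakemake) that rewrites the values following the path flags used in the command above. Whether this avoids the error reported in this thread has not been verified.

```python
import os

# Flags from the BaseRecalibrator command in this thread whose values are file paths.
# This set is an assumption for illustration, not an exhaustive list of GATK path arguments.
PATH_FLAGS = {"-I", "-R", "-O", "--known-sites"}


def absolutize(args, cwd=None):
    """Return a copy of args in which every value after a path flag is absolute."""
    cwd = cwd or os.getcwd()
    out = []
    expect_path = False
    for arg in args:
        if expect_path:
            # join + normpath collapses "../" lexically; symlinks are not followed.
            out.append(os.path.normpath(os.path.join(cwd, arg)))
            expect_path = False
        else:
            out.append(arg)
            expect_path = arg in PATH_FLAGS
    return out
```

For example, with the working directory `/a/b/workflow`, the argument pair `-R ../resources/ref.fna` would become `-R /a/b/resources/ref.fna`, while paths that are already absolute are left unchanged.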
-
Hi Eva (Evander) and Robert Bremel,
Thank you both for writing to the GATK forum. I hope that we can help you both sort these issues out.
Firstly, Robert Bremel, could you please create a separate post for the issue you are encountering? You are using a different tool, so we need a separate post. Please include your GATK version, complete command line, and entire program log in your post.
Eva (Evander), I have sent your inquiry to our developers and am awaiting a response. I should be back with some answers and the next steps shortly.
Thank you both for being valued members of the GATK community! I am looking forward to hearing back from you both.
Best,
Anthony
-
Hi Eva (Evander),
Thank you for your patience! It appears the program is generating the error because it cannot find the sequence dictionary.
Could you please check if all necessary files are present, i.e. the FASTA file (.fna), the FASTA index file (.fna.fai), and the sequence dictionary (.dict)?
Best,
Anthony
-
Dear Anthony, sorry for the late reply. Thank you for answering.
Yes, everything is there, and it works if I do not use relative paths pointing to the files, in other words, when I run the command in the directory itself (as I explained in the previous comment). So the relative path is appended to the full path all the way up to my home directory (and hence the file cannot be found). From the log, see the /../ in: file:///mnt/scratch_dir/deutekoe/projects/HP/WESHPipe/Pipe/workflow/../resources/reference_genomes/hg38/no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.dict
Maybe this is a feature and not a bug :-p?
-
Hi Eva (Evander),
Thank you for your response! For the sake of clarity, I re-confirmed this with our developers.
This is indeed how relative paths work: a relative path is resolved by appending it to the current working directory, i.e., /a/b/c/../c is equal to /a/b/c. Note that appending it to the current working directory is what makes it relative to that directory.
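To illustrate the rule described above, here is a minimal Python sketch using only the standard library. It is not GATK code, and the directory names are the placeholders used in this thread. Note that `posixpath.normpath` collapses `..` purely as a string operation, so the result matches what the operating system resolves only when no symbolic links are involved.

```python
import posixpath

# Placeholder working directory from the thread, not a real path.
cwd = "/a/b/c"

# Appending a relative path to the working directory keeps the "../" in the string.
joined = posixpath.join(cwd, "../d")
print(joined)  # /a/b/c/../d

# Lexical normalization collapses "c/.." (symlinks are NOT followed).
print(posixpath.normpath(joined))         # /a/b/d
print(posixpath.normpath("/a/b/c/../c"))  # /a/b/c
```

So a path printed as /a/b/c/../d would normally still refer to /a/b/d; an un-collapsed "../" in a message does not by itself show that the wrong location was used.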
I hope this helps! Please let me know if this has cleared up your doubts; if not, we can dive deeper. Thank you again for writing to the GATK forum.
Best,
Anthony
-
Hi Anthony, thanks for asking the developers. If that were true (although I do not think that it is practically…I tried), it still does not behave like this when running the tool. In the case I showed, it is more like a/b/c/../d, while it should be a/b/d if the tool is run in c (workflow in my case; see the log). So I run my pipeline in directory c (workflow) and want to use files in ../d (resources, one directory up and at the same level as c), but the tool sticks it onto where the tool is running, in a/b/c, and it becomes a/b/c/../d. But that does not exist. Hope that makes any sense… I fixed it by moving directory d into c, so then you indeed get a/b/c/d. So practically, I would say that when joining the paths given by the user, the "../" should be removed, or users should be told to only use the tool with files in the directory that the tool is run in or in one of its subdirectories. Hope I am making sense, thank you for your time!
-
Hi Eva (Evander). I'm a GATK developer, and was just looking at this. I would expect those relative paths to work, and indeed I tried a test locally with relative paths and it seems to work. Without a stack trace I can't be sure, but it looks to me like GATK is able to find not only the reference file, but the associated index as well (the code locates the index first, followed by the sequence dictionary). So it seems to fail on only the dictionary.
It may be something specific to the dictionary file, or the .fna suffix (my test was with a .fasta), or possibly the specific file system that you're running on. I would suggest trying the following:
Run GATK with the stack trace option `--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'` on, then reproduce the error and post the output here. Also, it would be helpful to include a directory listing of the reference and associated files (index and dictionary), i.e., from the working directory where you're running GATK, run `ls -l ../resources/reference_genomes/hg38/no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.*` and include the output here.
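Alongside the `ls -l` listing, the same check can be scripted. The sketch below is a hypothetical helper, not part of GATK: `expected_companions` and `missing_companions` are illustrative names, and the naming convention (index appends `.fai`, dictionary replaces the final extension with `.dict`) is an assumption taken from the file names mentioned in this thread rather than from the GATK source.

```python
from pathlib import Path


def expected_companions(reference):
    """Return the (index, dictionary) paths expected next to a FASTA reference.

    Assumption based on this thread: the index appends ".fai" to the full
    file name, and the dictionary replaces the final extension with ".dict".
    """
    ref = Path(reference)
    index = ref.with_name(ref.name + ".fai")
    dictionary = ref.with_suffix(".dict")
    return index, dictionary


def missing_companions(reference):
    """List the reference and companion files that do not exist on disk."""
    ref = Path(reference)
    candidates = [ref, *expected_companions(ref)]
    return [p for p in candidates if not p.exists()]


if __name__ == "__main__":
    # Relative path from the command in this thread; run from the GATK working directory.
    ref = "../resources/reference_genomes/hg38/no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna"
    for path in missing_companions(ref):
        print("missing:", path)
```

Because the relative path is resolved by the operating system from the current working directory, running this from the same directory as GATK shows whether the files are visible at the location the relative path points to.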
-
Robert Bremel, the BuildBamIndex issue you mentioned sounds very much like a separate (known) issue that is specific to `BuildBamIndex` (see https://github.com/broadinstitute/picard/issues/1827).
-
Hi Eva (Evander),
We haven't heard from you in a while so we're going to close out this ticket. If you still require assistance, simply respond to this email and we'll be happy to pick up where we left off!
Kind regards,
Anthony