GATK ApplyBQSR failing
Hi, I am new to bioinformatics and am trying to get a VCF from a FASTQ file as a training exercise. I am trying to run base quality score recalibration on my BAM file using "Homo_sapiens_assembly38.dbsnp138.vcf", but ApplyBQSR fails when it tries to create the output file. I'm not sure what I'm doing wrong.
a) GATK version used: 4.5.0.0 (as reported in the log below)
b) Exact command used:
gatk ApplyBQSR \
-I /data/bam_files/Test01_markdup.bam \
-O data/bam_files/Test01_recalibrated.bam \
-R /data/reference_genome/hg38.fa \
--bqsr-recal-file /data/bam_files/recal_data.table
c) Entire program log:
16:37:59.255 INFO NativeLibraryLoader - Loading from jar:file:/gatk/gatk-package-!/com/intel/gkl/native/
16:37:59.334 INFO ApplyBQSR - ------------------------------------------------------------
16:37:59.336 INFO ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.5.0.0
16:37:59.336 INFO ApplyBQSR - For support and documentation go to
16:37:59.336 INFO ApplyBQSR - Executing as ?@459ed45575d8 on Linux v6.8.0-48-generic amd64
16:37:59.336 INFO ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v17.0.9+9-Ubuntu-122.04
16:37:59.336 INFO ApplyBQSR - Start Date/Time: November 27, 2024 at 4:37:59 PM GMT
16:37:59.336 INFO ApplyBQSR - ------------------------------------------------------------
16:37:59.336 INFO ApplyBQSR - ------------------------------------------------------------
16:37:59.337 INFO ApplyBQSR - HTSJDK Version: 4.1.0
16:37:59.337 INFO ApplyBQSR - Picard Version: 3.1.1
16:37:59.337 INFO ApplyBQSR - Built for Spark Version: 3.5.0
16:37:59.337 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:37:59.337 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:37:59.337 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:37:59.338 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:37:59.338 INFO ApplyBQSR - Deflater: IntelDeflater
16:37:59.338 INFO ApplyBQSR - Inflater: IntelInflater
16:37:59.338 INFO ApplyBQSR - GCS max retries/reopens: 20
16:37:59.338 INFO ApplyBQSR - Requester pays: disabled
16:37:59.339 INFO ApplyBQSR - Initializing engine
16:37:59.420 INFO ApplyBQSR - Done initializing engine
16:37:59.445 INFO ApplyBQSR - Shutting down engine
[November 27, 2024 at 4:37:59 PM GMT] done. Elapsed time: 0.00 minutes.
htsjdk.samtools.util.RuntimeIOException: Error opening file: file:///gatk/data/bam_files/Test01_recalibrated.bam
at htsjdk.samtools.SAMFileWriterFactory.makeBAMWriter(
at htsjdk.samtools.SAMFileWriterFactory.makeBAMWriter(
at htsjdk.samtools.SAMFileWriterFactory.makeSAMOrBAMWriter(
at htsjdk.samtools.SAMFileWriterFactory.makeWriter(
at org.broadinstitute.hellbender.engine.GATKTool.createSAMWriter(
at org.broadinstitute.hellbender.engine.GATKTool.doWork(
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(
at org.broadinstitute.hellbender.Main.runCommandLineProgram(
at org.broadinstitute.hellbender.Main.mainEntry(
at org.broadinstitute.hellbender.Main.main(
Caused by: java.nio.file.NoSuchFileException: data/bam_files/Test01_recalibrated.bam
at java.base/sun.nio.fs.UnixException.translateToIOException(
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(
at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(
at java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(
at java.base/java.nio.file.Files.newOutputStream(
at htsjdk.samtools.SAMFileWriterFactory.makeBAMWriter(
... 14 more
Using GATK jar /gatk/gatk-package-
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package- ApplyBQSR -I /data/bam_files/Test01_markdup.bam --bqsr-recal-file /data/bam_files/recal_data.table -O data/bam_files/Test01_recalibrated.bam
Traceback (most recent call last):
File "/home/user/modules/", line 29, in <module>
run_docker_subprocess(path_to_data, image, GATK_command)
File "/home/user/modules/", line 27, in run_docker_subprocess
process =, shell=True, check=True)
File "/usr/lib/python3.11/", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'docker run --rm -v /home/user/files/:/data --user 1012:1013 broadinstitute/gatk: bash -c 'gatk ApplyBQSR -I /data/bam_files/Test01_markdup.bam --bqsr-recal-file /data/bam_files/recal_data.table -O data/bam_files/Test01_recalibrated.bam'' returned non-zero exit status 3.
Just for reference, this is what I have run so far:
bwa mem (fastq > sam)
samtools sort -n (sam > bam)
samtools fixmate -m (bam > bam)
samtools sort -o (bam > bam)
samtools markdup (bam > bam)
Hi Kavi Jeshram
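Looking at your command line, the output path (data/bam_files/Test01_recalibrated.bam) is relative, whereas the input file is under the absolute path /data. The error in your log shows the output being resolved to /gatk/data/bam_files/Test01_recalibrated.bam. Can you check that your destination folder is set properly?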
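To illustrate the point about relative and absolute paths, here is a minimal Python sketch of how the two paths from the command would resolve inside the container. It rests on two assumptions that are not stated outright in the post: that the container's working directory is /gatk (inferred from the "file:///gatk/data/..." path in the error) and that /data is the only bind mount (taken from the "-v /home/user/files/:/data" flag). The helper names are hypothetical.

```python
from pathlib import PurePosixPath

# Assumptions: working directory inferred from the error message,
# mount point taken from the docker "-v" flag shown in the traceback.
CONTAINER_WORKDIR = PurePosixPath("/gatk")
MOUNT_POINT = PurePosixPath("/data")


def resolve_in_container(path: str) -> PurePosixPath:
    """Return the absolute path a process in the container would open."""
    p = PurePosixPath(path)
    return p if p.is_absolute() else CONTAINER_WORKDIR / p


def is_on_mount(path: str) -> bool:
    """True if the resolved path lies under the bind-mounted directory."""
    resolved = resolve_in_container(path)
    return resolved == MOUNT_POINT or MOUNT_POINT in resolved.parents


for arg in ("/data/bam_files/Test01_markdup.bam",
            "data/bam_files/Test01_recalibrated.bam"):
    print(arg, "->", resolve_in_container(arg), "| on mount:", is_on_mount(arg))
```

Under those assumptions, only the input path lands on the mounted host directory; the relative output path points at a folder under /gatk that may not exist in the image.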
Ah, it was being run in a Docker container, so luckily that did not matter, but good spot.
Found the issue though: I had previously moved my hg38.dict file, but it contained relative paths.
I changed them and now ApplyBQSR can find my hg38.fa :) Thanks for the help though.
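For anyone who wants to check their own sequence dictionary, here is a rough Python sketch that lists the locations recorded in a .dict file. It assumes the paths in question are the UR tags on the @SQ header lines (the post does not say which field was edited), and the example text is made up for illustration rather than copied from a real hg38.dict. The function names are hypothetical.

```python
def list_dict_uris(dict_text: str) -> list[tuple[str, str]]:
    """Return (sequence name, UR value) for each @SQ line that has a UR tag."""
    found = []
    for line in dict_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs; split on the first colon only.
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "UR" in tags:
            found.append((tags.get("SN", "?"), tags["UR"]))
    return found


def relative_uris(dict_text: str) -> list[tuple[str, str]]:
    """Keep only entries whose UR value, minus a 'file:' prefix, is not absolute."""
    return [(sn, ur) for sn, ur in list_dict_uris(dict_text)
            if not ur.removeprefix("file:").startswith("/")]


# Illustrative header text only, not taken from the poster's file.
example = (
    "@HD\tVN:1.6\n"
    "@SQ\tSN:chr1\tLN:248956422\tUR:file:/data/reference_genome/hg38.fa\n"
    "@SQ\tSN:chr2\tLN:242193529\tUR:file:reference_genome/hg38.fa\n"
)
print(relative_uris(example))
```

To run it on a real file, read the dictionary as text and pass the contents to `relative_uris`; any entries it returns are candidates to fix after moving the reference.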
