Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

(How to) Run GATK in a Docker container

8 comments

  • vivekruhela

    Hi,

    Thanks for this tutorial. I successfully pulled the GATK Docker image as described above, but when I ran this command:

    docker run -v ~/my_project:/gatk/my_data -it broadinstitute/gatk:4.1.3.0

    I got an invalid mount specification error: '~/bam_files: gatk/my_data' invalid mount config for type "bind" : invalid mount path. mount path must be absolute. Could you suggest a fix?

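The quoted error says the mount path must be absolute, which suggests the `~` reached Docker unexpanded (the container path in the message also lacks a leading slash). A minimal sketch of one way to avoid this, assuming a POSIX shell and that the data folder already exists. `to_absolute` and `HOST_DIR` are hypothetical names, not part of Docker or GATK:

```shell
# Hypothetical host directory; replace with your own data folder.
host_dir="${HOST_DIR:-$HOME/my_project}"

# Print the absolute form of an existing directory (fails if it does not exist).
to_absolute() {
    (cd "$1" 2>/dev/null && pwd)
}

if abs_dir="$(to_absolute "$host_dir")"; then
    # Print the command instead of running it, so both sides of -v can be checked.
    echo "docker run -v \"$abs_dir\":/gatk/my_data -it broadinstitute/gatk:4.1.3.0"
else
    echo "Directory not found: $host_dir (create it with mkdir -p first)" >&2
fi
```

Printing the command first makes it easy to confirm that both the host path and the container path start with `/` before running it.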
  • Humphrey Gardner

    Did you already create the "my_project" folder in your home directory?

  • vivekruhela

    Humphrey Gardner

    I tried this a few months ago. As far as I remember, I had already created the directory `bam_files` in my home directory and was trying to mount it into the container. I am no longer using Docker; I now use the Java version of GATK in my work. Thanks.

  • François Kroll

    Incredibly helpful – thank you

  • Guy Horev

    The tutorial's approach lets you work with GATK in interactive mode. To run gatk from a script, I had to run Docker in detached mode (by adding the -d flag).

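For scripted use, an alternative to detached mode is to drop `-it` and pass the GATK command directly to `docker run`, so the script waits for the tool to finish. This is a sketch of that alternative, not the approach described in the comment above. It assumes the image has the `gatk` launcher on its `PATH`; `run_gatk`, `DATA_DIR`, `GATK_IMAGE`, and `DOCKER` are hypothetical names:

```shell
# Hypothetical settings; adjust to your own setup.
DOCKER="${DOCKER:-docker}"                  # docker client to invoke
DATA_DIR="${DATA_DIR:-$HOME/my_project}"    # absolute host path to mount
GATK_IMAGE="${GATK_IMAGE:-broadinstitute/gatk:4.1.3.0}"

# Run one gatk command in a throwaway container (no -it, so no TTY is needed).
# --rm removes the container once the command exits.
run_gatk() {
    "$DOCKER" run --rm -v "$DATA_DIR":/gatk/my_data "$GATK_IMAGE" gatk "$@"
}

# Example call from a script (not executed here):
#   run_gatk --list
```

Because the container runs in the foreground, the exit status of `run_gatk` is available to the calling script, which is harder to get with `-d`.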
  • Ricardo Chinchilla

    How can I create the dict file inside Docker? I tried, but I got this error: unable to access jarfile picard.jar

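GATK4 bundles the Picard tools, so inside the container the sequence dictionary can usually be created through the `gatk` launcher rather than a separate `picard.jar`. A minimal sketch, assuming it is run inside the container; `make_dict`, the `GATK` override, and the reference path are hypothetical:

```shell
# Launcher to call; overridable for dry runs. Assumes gatk is on PATH in the container.
GATK="${GATK:-gatk}"

# Write the dictionary next to the reference, replacing its last extension with .dict.
make_dict() {
    ref="$1"
    "$GATK" CreateSequenceDictionary -R "$ref" -O "${ref%.*}.dict"
}

# Example (hypothetical path inside the container, not executed here):
#   make_dict /gatk/my_data/reference.fasta
```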
  • Daniel Crookston

    This is great, but once I've created the container instance with a mounted volume, now what? Do I have to run that command again every time I want to get into GATK, or will that create a new instance of the container each time?

    I realize this is probably a question about Docker, not about GATK, but a hint about where to look for more information would be great (probably not a direct link, since Docker will change their end without warning).

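Each `docker run` creates a new container, while `docker start` reuses an existing one. A sketch of a helper that reattaches to a named container if it exists and creates it otherwise; `enter_gatk`, the container name `gatk_work`, and the variables are hypothetical:

```shell
# Hypothetical settings; adjust to your own setup.
DOCKER="${DOCKER:-docker}"
DATA_DIR="${DATA_DIR:-$HOME/my_project}"
GATK_IMAGE="${GATK_IMAGE:-broadinstitute/gatk:4.1.3.0}"

enter_gatk() {
    name="$1"
    # "ps -a" also lists stopped containers; --format prints only their names.
    if "$DOCKER" ps -a --format '{{.Names}}' | grep -qxF "$name"; then
        # Container already exists: start it again and attach to its shell.
        "$DOCKER" start -ai "$name"
    else
        # First use: create a named container with the data folder mounted.
        "$DOCKER" run --name "$name" -v "$DATA_DIR":/gatk/my_data -it "$GATK_IMAGE"
    fi
}

# Example (not executed here):
#   enter_gatk gatk_work
```

For a container that is still running, `docker exec -it gatk_work bash` opens an additional shell in it. The `docker run`, `docker start`, and `docker exec` entries in the Docker CLI reference are the places to look for details.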
  • mina ming

    Hi

    I am certain the BAM file is in the path, but I get this error:

    (gatk) root@34684eaa046e:/gatk/data/Continuum/WES/vcf# java -d64 -XX:+UseSerialGC -Xmx3G -jar /gatk/gatk.jar CollectSequencingArtifactMetrics -I NG-27280_CLTSS_LTS_001A_lib506241_7636_2_MarkedDup.bam -O NG-27280_CLTSS_LTS_001A_lib506241_7636_2_MarkedDup --FILE_EXTENSION .txt -R GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
    12:49:41.698 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    [Thu Aug 10 12:49:41 UTC 2023] CollectSequencingArtifactMetrics  --FILE_EXTENSION .txt --INPUT NG-27280_CLTSS_LTS_001A_lib506241_7636_2_MarkedDup.bam --OUTPUT NG-27280_CLTSS_LTS_001A_lib506241_7636_2_MarkedDup --REFERENCE_SEQUENCE GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz  --MINIMUM_QUALITY_SCORE 20 --MINIMUM_MAPPING_QUALITY 30 --MINIMUM_INSERT_SIZE 60 --MAXIMUM_INSERT_SIZE 600 --INCLUDE_UNPAIRED false --INCLUDE_DUPLICATES false --INCLUDE_NON_PF_READS false --TANDEM_READS false --USE_OQ true --CONTEXT_SIZE 1 --ASSUME_SORTED true --STOP_AFTER 0 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    Aug 10, 2023 12:49:43 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    [Thu Aug 10 12:49:43 UTC 2023] Executing as root@34684eaa046e on Linux 4.15.0-208-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.3.0
    [Thu Aug 10 12:49:43 UTC 2023] picard.analysis.artifacts.CollectSequencingArtifactMetrics done. Elapsed time: 0.03 minutes.
    Runtime.totalMemory()=2076049408
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    htsjdk.samtools.SAMException: Cannot read non-existent file: file:///gatk/data/Continuum/WES/vcf/NG-27280_CLTSS_LTS_001A_lib506241_7636_2_MarkedDup.bam
        at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:483)
        at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:470)
        at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:95)
        at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:84)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
        at org.broadinstitute.hellbender.Main.main(Main.java:291)
    (gatk) root@34684eaa046e:/gatk/data/Continuum/WES/vcf# ls
    GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
    (gatk) root@34684eaa046e:/gatk/data/Continuum/WES/vcf# 

    Please help me.

    Thanks

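In the session above, `ls` lists only the reference file, so the BAM does not appear to be in the working directory inside the container. A small pre-flight check can surface this before the tool starts. `check_inputs` is a hypothetical helper, and the file names in the usage comment are placeholders:

```shell
# Return non-zero if any of the given files is missing or unreadable
# from the current working directory, reporting each one on stderr.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "Not readable from $(pwd): $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example inside the container (placeholder file names, not executed here):
#   check_inputs sample.bam reference.fasta && gatk CollectSequencingArtifactMetrics ...
```

If the check fails, the file is probably in a host folder that was not mounted, or in a different directory under the mount point than the one the command is run from.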
