Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GATK on local HPC infrastructure


  • Pieter Spealman

    To test the installation we need an inputs .json file, but where is the 'hello_gatk.inputs.json' file?

    I cannot find it in the sample directory or in any of the associated links.

  • Robert Bremel

    I have been using the GATK Docker image with Docker Desktop on a Windows workstation and it works fine. As a bit of a displacement activity while I am dodging Delta, I am thinking about building a small HPC cluster (a Baby Beowulf) with a controller and 3 worker nodes using gen n-1 'experienced' servers. There seem to be many available from a number of second-hand dealers.

    1) The Lustre documentation says it is suited to clusters of more than 100 nodes, which suggests that it is overkill for a Baby Beowulf. As its nearly 600 pages of documentation show, Lustre has all sorts of features needed for running large HPC systems, but on a small system might administering it be little more than a hassle?

    2) Each node will have fast CPUs and plenty of memory, but what is the minimum amount of disk storage each node needs? OS + SLURM + Docker images + genomic references + N × (resident input and output files). This suggests that a flash drive or an NVMe drive might be all that each node needs? The files being analyzed will pass through rather than be stored, so the little cluster does not need a huge amount of space.


    Comment actions Permalink
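The per-node storage budget in point 2 is a simple sum: fixed items (OS, SLURM, Docker images, genomic references) plus N times the input and output files resident at once. The sketch below is a minimal illustration under stated assumptions: N is read as the number of jobs whose files sit on a node simultaneously, the `headroom` margin for temporary files is a hypothetical addition not in the original post, and the class name `NodeDiskEstimate` is made up for illustration. Every size must come from your own measurements; no real GATK, SLURM, or reference sizes are implied.

```python
from dataclasses import dataclass


@dataclass
class NodeDiskEstimate:
    """Hypothetical per-node disk budget in GB; all figures are user-supplied."""

    os_gb: float
    slurm_gb: float
    docker_images_gb: float
    references_gb: float
    input_gb_per_job: float
    output_gb_per_job: float
    concurrent_jobs: int  # N: jobs whose input/output files are resident at once
    headroom: float = 0.25  # assumed safety margin (fraction) for temp/scratch files

    def fixed_gb(self) -> float:
        # Items that stay on the node regardless of how many jobs run.
        return self.os_gb + self.slurm_gb + self.docker_images_gb + self.references_gb

    def per_job_gb(self) -> float:
        # Files that pass through the node for a single job.
        return self.input_gb_per_job + self.output_gb_per_job

    def total_gb(self) -> float:
        # fixed + N * (input + output), scaled up by the headroom fraction.
        base = self.fixed_gb() + self.concurrent_jobs * self.per_job_gb()
        return base * (1 + self.headroom)


if __name__ == "__main__":
    # Placeholder figures only; substitute sizes measured on your own system.
    est = NodeDiskEstimate(
        os_gb=20, slurm_gb=1, docker_images_gb=10, references_gb=30,
        input_gb_per_job=50, output_gb_per_job=10, concurrent_jobs=2,
    )
    print(f"Estimated disk per node: {est.total_gb():.1f} GB")
```

Keeping the fixed and per-job terms separate makes it easy to see which term dominates: if N × (input + output) is small next to the references and images, a modest NVMe drive per node would likely cover the budget.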
