Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Is it possible to use a GCS bucket directory as a tmp directory?

3 comments

  • Genevieve Brandt (she/her)

    Hi joy bordini, GATK does not currently have anything built in to do this. Have you checked out our platform, Terra? It uses Google Cloud buckets for storing data and provides an easy way to run GATK. You can find out more here, and I can also connect you with the Terra Support team if you have questions.

  • joy bordini

    Dear Genevieve,

    Thank you for your reply. I tried Terra, but it is quite inconvenient to type in all the samples when you have a lot of them. I integrated my GATK pipeline into Snakemake, which lets me handle all the variations in options and samples. I'll try running it directly on Kubernetes on Google Cloud to speed up the pipeline and use less RAM. But of course, if there is a way to keep Snakemake's modular directory organization (Snakefile, rules/common.smk, reference.smk, etc.) on Terra, I'll be happy to try.

  • Jason Cerrato

    Hi joy bordini,

    My name is Jason and I'm one of the Terra Support Specialists. I'm not very familiar with the requirements for Snakemake, but I may have some ideas for how you can leverage Terra to make running analyses easier.

    You mentioned that typing in all of the samples is inconvenient. If your workflow is written to handle a directory full of samples, you can consider creating a .tar.gz file that contains all of your samples and feeding it into the workflow, with the workflow written to extract the contents of the tarball.
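
    For illustration, here is a minimal Python sketch of that tarball approach, assuming a flat directory of BAM/CRAM/FASTQ files. The function names and the suffix list are hypothetical and not part of Terra or GATK; in a real workflow the extraction step would run inside the task before the per-sample commands.

```python
from __future__ import annotations

import tarfile
import tempfile
from pathlib import Path

# Hypothetical sample-file suffixes; adjust to whatever your workflow consumes.
SAMPLE_SUFFIXES = (".bam", ".cram", ".fastq.gz")


def make_sample_tarball(sample_dir: Path, tarball: Path) -> Path:
    """Pack every regular file in sample_dir into one .tar.gz (flat layout)."""
    with tarfile.open(tarball, "w:gz") as tar:
        for path in sorted(sample_dir.iterdir()):
            if path.is_file():
                tar.add(path, arcname=path.name)
    return tarball


def extract_samples(tarball: Path, dest: Path) -> list[Path]:
    """Unpack the tarball into dest and return the sample files it contained."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" rejects unsafe members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(p for p in dest.iterdir() if p.name.endswith(SAMPLE_SUFFIXES))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        samples = root / "samples"
        samples.mkdir()
        for name in ("NA12878.bam", "NA12891.bam", "notes.txt"):
            (samples / name).write_bytes(b"placeholder")
        tarball = make_sample_tarball(samples, root / "samples.tar.gz")
        found = extract_samples(tarball, root / "extracted")
        print([p.name for p in found])  # → ['NA12878.bam', 'NA12891.bam']
```

    Only one input (the tarball) then has to be specified per run, no matter how many samples it holds.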

    You can also consider setting up data tables to make it easier to run analyses on multiple samples, or on a set of samples. You can find documentation for how to do this here: https://support.terra.bio/hc/en-us/articles/360047046131-Data-Tables-QuickStart-Part-1-Intro-to-data-tables-
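
    As a rough sketch, a sample table can be generated from a list of file paths instead of being typed by hand. This assumes Terra's load-file convention of a tab-separated file whose first column header is `entity:sample_id`; the `bam` column name, the helper names, and the `gs://my-bucket/...` paths are hypothetical placeholders, so check the data tables documentation above for the exact format before uploading.

```python
from __future__ import annotations

import csv
import io
import posixpath

# Hypothetical suffixes stripped from file names to form sample IDs.
SUFFIXES = (".fastq.gz", ".bam", ".cram")


def sample_id_from_path(path: str) -> str:
    """Derive a sample ID from a file name, e.g. gs://b/x/NA12878.bam -> NA12878."""
    name = posixpath.basename(path)
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def build_sample_table(paths: list[str], column: str = "bam") -> str:
    """Return a TSV load file with one row per sample, sorted by path."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["entity:sample_id", column])
    seen: set[str] = set()
    for path in sorted(paths):
        sample_id = sample_id_from_path(path)
        if sample_id in seen:
            raise ValueError(f"duplicate sample ID: {sample_id}")
        seen.add(sample_id)
        writer.writerow([sample_id, path])
    return buf.getvalue()


if __name__ == "__main__":
    paths = [
        "gs://my-bucket/bams/NA12891.bam",
        "gs://my-bucket/bams/NA12878.bam",
    ]
    print(build_sample_table(paths), end="")
```

    The resulting file can then be uploaded as a table, and a workflow can be launched on the whole set of rows rather than on samples entered one by one.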

    And here is a helpful video that explains how to make a table: https://www.youtube.com/watch?v=2MxSlKhIrFY

    If you have any questions, please let me know.

    Kind regards,

    Jason
