Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Append GenomicsDBImport

Answered

3 comments

  • Genevieve Brandt (she/her)

    Hi wbsimey,

    I'm so sorry to hear about your issue when the server went down, that's such a bummer! There are two different ways you can use GenomicsDB without uploading the entire workspace at once.

    1. You can add samples incrementally to an existing workspace by running GenomicsDBImport with the --genomicsdb-update-workspace-path argument. All samples must have the same intervals. Samples cannot be added to the workspace at the same time, so it is not possible to parallelize this method. We recommend always backing up the workspace before import in case there are issues like a server going down.
    2. You can make multiple GenomicsDB workspaces, one per interval or split up in any other way. Each workspace should contain all your samples. You can then run multiple GenomicsDBImport commands at once to parallelize the analysis. Following GenotypeGVCFs, the VCFs can be combined with MergeVcfs. (This is the method the Broad uses for our production pipelines.)
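
    As a rough illustration of the two options above, here is a minimal Python sketch that only builds and prints the command lines; it does not run GATK. The helper function names, workspace names, interval names, and file names are all hypothetical placeholders. The update command leaves out -L on the assumption that an incremental import reuses the intervals already stored in the workspace, so please check the GenomicsDBImport documentation for your GATK version before relying on it.

```python
import shlex
from typing import List


def import_command(workspace: str, interval: str, gvcfs: List[str]) -> List[str]:
    """Option 2: create a new workspace holding all samples for one interval."""
    cmd = ["gatk", "GenomicsDBImport",
           "--genomicsdb-workspace-path", workspace,
           "-L", interval]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd


def update_command(workspace: str, new_gvcfs: List[str]) -> List[str]:
    """Option 1: append new samples to an existing workspace."""
    # Assumption: no -L is passed because the existing workspace
    # already defines the intervals.
    cmd = ["gatk", "GenomicsDBImport",
           "--genomicsdb-update-workspace-path", workspace]
    for gvcf in new_gvcfs:
        cmd += ["-V", gvcf]
    return cmd


def genotype_command(reference: str, workspace: str, out_vcf: str) -> List[str]:
    """Joint-genotype one workspace into one VCF."""
    return ["gatk", "GenotypeGVCFs",
            "-R", reference,
            "-V", "gendb://" + workspace,
            "-O", out_vcf]


def merge_command(vcfs: List[str], merged: str) -> List[str]:
    """Combine the per-interval VCFs into a single file."""
    cmd = ["gatk", "MergeVcfs"]
    for vcf in vcfs:
        cmd += ["-I", vcf]
    return cmd + ["-O", merged]


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    intervals = ["chr1", "chr2"]
    gvcfs = ["sampleA.g.vcf.gz", "sampleB.g.vcf.gz"]
    per_interval_vcfs = []
    for interval in intervals:
        workspace = "workspace_" + interval
        out_vcf = interval + ".vcf.gz"
        # Each interval's import is independent, so these can run in parallel.
        print(shlex.join(import_command(workspace, interval, gvcfs)))
        print(shlex.join(genotype_command("reference.fasta", workspace, out_vcf)))
        per_interval_vcfs.append(out_vcf)
    print(shlex.join(merge_command(per_interval_vcfs, "merged.vcf.gz")))
    # Option 1: back up the workspace first, then append one batch at a time.
    print(shlex.join(update_command("workspace_chr1", ["sampleC.g.vcf.gz"])))
```

    With one workspace per interval, a failed import only costs you that interval's workspace rather than the whole database.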

    Hope this helps!

    Best,

    Genevieve

  • wbsimey

    Thank you for the quick response, Genevieve Brandt. I will create separate databases for the remaining chromosomes.

  • Genevieve Brandt (she/her)

    You're welcome!
