Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GenomicsDBImport takes too much time with 6075 g.vcf files

2 comments

  • JooYoung Park

    Sorry, I am using GATK 4.2.0.0.

  • Gökalp Çelik

    Hi JooYoung Park

    Running GenomicsDBImport on about 6000 files at once is far from optimal, so you will need some additional optimization steps to get your samples through. We do not recommend pushing all 6000 samples into a single GenomicsDBImport run: the tool has to keep a lot of information in memory, so if too little memory is available your task will either be killed or hang.

    We recommend importing your samples in smaller chunks and using multiple reader threads to speed up the import. In your case, importing about 25 to 50 samples at a time (depending on how much memory your compute system has) should make the process run faster and more smoothly.

    I hope this helps. 

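The advice in the reply above (smaller chunks plus multiple reader threads) could be sketched roughly as follows. This is only one possible reading: it assumes "chunks" maps onto GenomicsDBImport's `--batch-size` option used together with a `--sample-name-map` file, which the reply does not state explicitly. The file paths, the workspace name `my_database`, the interval `chr20`, the thread count, and the helper names `sample_map_text` and `build_import_command` are all hypothetical. The script only builds and prints the command; it does not run GATK.

```python
import math
import shlex
from pathlib import Path


def sample_map_text(gvcf_paths):
    """Build a tab-delimited sample map: one 'sample_name<TAB>path' line per GVCF."""
    lines = []
    for gvcf in gvcf_paths:
        # Assumption: the sample name is the file name up to the first dot.
        sample = Path(gvcf).name.split(".")[0]
        lines.append(f"{sample}\t{gvcf}")
    return "\n".join(lines) + "\n"


def build_import_command(map_path, workspace, interval, n_samples,
                         batch_size=50, reader_threads=4):
    """Return a GenomicsDBImport command as an argument list (not executed)."""
    cmd = [
        "gatk", "GenomicsDBImport",
        "--sample-name-map", str(map_path),
        "--genomicsdb-workspace-path", str(workspace),
        "-L", interval,
        # Limits how many samples have readers open at once, to cap memory use.
        "--batch-size", str(batch_size),
        "--reader-threads", str(reader_threads),
    ]
    # The GATK docs suggest consolidating when more than a hundred batches are used.
    if math.ceil(n_samples / batch_size) > 100:
        cmd.append("--consolidate")
    return cmd


# Hypothetical cohort of 6075 GVCFs, matching the count in the question.
gvcfs = [f"gvcfs/sample{i:04d}.g.vcf.gz" for i in range(1, 6076)]
map_text = sample_map_text(gvcfs)  # write this to e.g. cohort.sample_map before running
command = build_import_command("cohort.sample_map", "my_database", "chr20", len(gvcfs))
print(shlex.join(command))
```

With 6075 samples and a batch size of 50 this works out to 122 batches, so the sketch adds `--consolidate`. A smaller batch size lowers peak memory at the cost of a longer run, so the right value depends on the memory available on your system.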
