Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

BwaMemIndexImageCreator: Expected running time for 1 TB db

Answered

6 comments

  • Genevieve Brandt (she/her)

    Thanks for the update, Renald James Legaspi. It does look like it's still actively running, so let it keep running for now, and I'll reach out to Mark Walker regarding the 64 GB db.

  • Genevieve Brandt (she/her)

    Hi Renald James Legaspi,

    The pre-built database is in the PathSeq resource bundle hosted on Google Cloud Platform: gs://gatk-best-practices/pathseq/resources/

    Building a 1 TB database can easily take longer than a week; the one we have took a few days. You are using a lot of swap memory, which could have slowed the process down. You might consider decreasing the database size, because 1 TB is very large and may not be practical to run. The readme in the PathSeq resource bundle bucket describes some strategies we used to get the database size down.

    Let me know if you have any other questions.

    Best,

    Genevieve

  • Genevieve Brandt (she/her)

    I'm glad the resources are helpful! Let us know if you have other questions.

  • Genevieve Brandt (she/her)

    Hi Renald James Legaspi,

    Do you think that BwaMemIndexImageCreator is still running or is it hung at a certain position?

    We don't have any benchmark data for this tool.

    Best,

    Genevieve

  • Renald James Legaspi

    Hi Genevieve,

    I believe it is still running the step
    '[bwa_index] Construct BWT for the packed sequence...'

    Memory and CPU are both in use, as shown in the following photo:

    [photo]

    Oh, thank you for letting me know. I will just do the benchmarking myself.

    Anyway, I've seen from this post here that the largest microbe db you have built for the PathSeq pipeline is around 64 GB. Is it publicly available? Or is there any chance I could look into it?

    Thank you!

  • Renald James Legaspi

    Hi Genevieve,

    We have decided to terminate the run and proceed with truncating the db, as you suggested.

    Thank you for the resource link. It is of great help to us.

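One way to check whether a long BwaMemIndexImageCreator run is still working or has hung, and how much of it has been pushed to swap, is to sample the process's CPU time and swap usage from the Linux /proc filesystem. The sketch below is only a rough illustration and not a GATK feature: it assumes a Linux host, and the function names (parse_cpu_ticks, parse_swap_kb, sample, report) are hypothetical. Rising CPU time shows that the process is doing work; it does not prove that the index build is making useful progress.

```python
import os
import time
from typing import Tuple


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from the text of /proc/<pid>/stat."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so only the text after the last ')' is split into fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def parse_swap_kb(status_text: str) -> int:
    """Return the VmSwap value in kB from /proc/<pid>/status (0 if absent)."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return 0


def sample(pid: int) -> Tuple[int, int]:
    """Read the current CPU ticks and swapped-out kB for one process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        ticks = parse_cpu_ticks(f.read())
    with open(f"/proc/{pid}/status") as f:
        swap_kb = parse_swap_kb(f.read())
    return ticks, swap_kb


def report(pid: int, interval_s: float = 60.0) -> str:
    """Sample the process twice and summarize CPU use over the interval."""
    before, _ = sample(pid)
    time.sleep(interval_s)
    after, swap_kb = sample(pid)
    cpu_s = (after - before) / os.sysconf("SC_CLK_TCK")
    state = "still consuming CPU" if after > before else "no CPU time used"
    return (
        f"pid {pid}: {state} ({cpu_s:.1f} CPU-s in {interval_s:.0f} s), "
        f"{swap_kb} kB in swap"
    )
```

Calling `print(report(12345))`, where 12345 stands in for the PID of the GATK Java process, prints a one-line summary after a minute. A large swap figure together with little CPU time over the interval would be consistent with the slowdown described above.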