Recommended parameters for GenomicsDBImport
Answered
How do I use GenomicsDBImport for many files under the following system constraints?
Hi,
I need to run GenomicsDBImport on ~250 *.g.vcf.gz files totaling ~5GB (each file is ~10-20MB).
I can give this process 50 CPUs and 400GB RAM on my Linux cluster.
What are the best parameters to run the GenomicsDBImport command for the above task?
Specifically, how many intervals should I use (the -L parameter)?
Is batching recommended (the --batch-size parameter)? If so, what batch size?
If batching is used, what would you recommend for threading (--reader-threads)?
Would you recommend importing intervals in parallel (--max-num-intervals-to-import-in-parallel)? If so, how many?
Thank you,
Arik
-
Hi Arye Harel,
Thank you for your question! This question falls outside the scope of GATK Support (see our support policy for more details). However, we encourage you to keep posting questions because they help us improve our documentation and build resources. In addition, if you know the answer to other questions outside our GATK support team's scope, please help out other users! Other users, feel free to chime in and discuss here.
Your question may be already answered in our extensive documentation and forum. Please see these resources for more information: