How do I use GenomicsDBImport for many files under the following system constraints?
I need to run GenomicsDBImport on ~250 *.g.vcf.gz files that total ~5GB (each file is ~10-20MB).
I can give this process 50 CPUs and 400GB of RAM on my Linux cluster.
What are the best parameters to run the GenomicsDBImport command for the above task?
Specifically, how many intervals should I use (the -L parameter)?
Is batching recommended (the --batch-size parameter)? If so, what batch size?
If batching is used, what would you recommend for threading (--reader-threads)?
Would you recommend importing intervals in parallel (--max-num-intervals-to-import-in-parallel)? If so, how many?