How do I work with large data sets with GenomicsDBImport?
Please correct me if I am wrong, but following the tutorials, it seems there are two major options for working with large data sets in GenomicsDBImport:
1. Divide the data into intervals and import the intervals in parallel, using these two parameters:
a. --max-num-intervals-to-import-in-parallel for the number of parallel intervals.
b. -L for the number of intervals (for example, -L 20 is given in https://gatk.broadinstitute.org/hc/en-us/articles/360036883491-GenomicsDBImport ).
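To make sure I understand option 1, here is the kind of invocation I have in mind (the sample file names, workspace path, and interval names are placeholders I made up; the command is only printed here, not executed):

```shell
# Sketch of an interval-parallel GenomicsDBImport run (option 1).
# sample1/sample2 paths, my_database, and the chr20/chr21 intervals
# are placeholder assumptions, not values from the tutorial.
cmd="gatk GenomicsDBImport \
  -V sample1.g.vcf.gz \
  -V sample2.g.vcf.gz \
  --genomicsdb-workspace-path my_database \
  -L chr20 -L chr21 \
  --max-num-intervals-to-import-in-parallel 2"
echo "$cmd"
```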
2. Divide the data into batches and import them in parallel, using these two parameters:
a. --batch-size (with some additional flag if it is >= 100): the number of samples for which readers are open at once.
b. --reader-threads: the number of simultaneous threads to use when opening VCFs in batches.
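And here is how I picture option 2, with a sample-name map instead of individual -V arguments (the map file name, workspace path, interval, and the specific batch-size and thread values are placeholders I chose for illustration; again, the command is only printed):

```shell
# Sketch of a batched GenomicsDBImport run (option 2).
# cohort.sample_map, my_database, chr20, and the numeric values
# are placeholder assumptions, not recommended settings.
cmd2="gatk GenomicsDBImport \
  --sample-name-map cohort.sample_map \
  --genomicsdb-workspace-path my_database \
  -L chr20 \
  --batch-size 50 \
  --reader-threads 5"
echo "$cmd2"
```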
* Is that correct?
* Is it allowed to combine the two approaches in the same run?
* What exactly does -L 20 mean? Does it mean the genome is divided into 20 fragments?