Until now, I have stuck with GATK 3.8, as I haven't found a good strategy to build a GenomicsDB for use with GATK 4.
I have a substantial number of germline GVCF files that work fine with GATK 3.8. I would like to switch to GATK 4, but I am unsure how best to build the GenomicsDB. The data are from cattle, so there are 29 autosomes plus X, Y, and MT.
GATK should be able to use multiple threads, but the output says that threading is disabled when using multiple intervals.
I'm currently running a GenomicsDBImport job with a -L option pointing to a list of all the chromosomes and a list of samples containing all the GVCFs. It has run for quite some time, and hasn't even finished importing the first sample.
Is the idea that one should create a separate GenomicsDB for each chromosome?
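To make the question concrete, here is a rough sketch of what I imagine the per-chromosome approach would look like. It only builds and prints one GenomicsDBImport command per contig and does not run anything. The contig names (`1` to `29`, `X`, `Y`, `MT`), the `samples.map` file name, and the `genomicsdb_chr` workspace prefix are placeholders of mine and would have to match the actual reference and file layout.

```python
import shlex


def contig_names(n_autosomes=29, extras=("X", "Y", "MT")):
    """Return assumed contig names; real names must match the reference FASTA."""
    return [str(i) for i in range(1, n_autosomes + 1)] + list(extras)


def import_command(contig, sample_map="samples.map", workspace_prefix="genomicsdb_chr"):
    """Build one GenomicsDBImport command restricted to a single contig.

    sample_map and workspace_prefix are hypothetical placeholder names.
    """
    return [
        "gatk", "GenomicsDBImport",
        "--genomicsdb-workspace-path", f"{workspace_prefix}{contig}",
        "--sample-name-map", sample_map,
        "-L", contig,
    ]


# One command (and therefore one workspace) per contig.
commands = [import_command(c) for c in contig_names()]

if __name__ == "__main__":
    for cmd in commands:
        print(shlex.join(cmd))
```

Each printed line could then be submitted as its own job, so that every run sees a single interval and the chromosomes are imported in parallel rather than in one long run.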
Running it the way I do now (i.e., all chromosomes and all samples) would seem to require a very(!) long time. I have gone through the documentation and have not come across a discussion of efficient strategies.
Can anyone provide advice?