I have a batch of 219 samples (human exomes), on which I ran GenomicsDBImport per chromosome without a problem. I am now running GenotypeGVCFs, also per chromosome. For chr 1-15 the jobs completed correctly, but for the rest (chr 16-22, X, and Y) the jobs run out of time and memory. I have tried running them on smaller intervals, with more memory and time, but those jobs don't complete either. This is the command I use:
gatk --java-options "-Xmx100g" GenotypeGVCFs -R .../ucsc.hg19.fasta -V gendb://gDB16 -L chr16:3100000-5500000 -O .../c16.vcf.gz
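For illustration, splitting a chromosome into smaller intervals for a command like the one above could be sketched as follows. This is only a dry run that prints one command per window and does not call gatk. The window size of 2,400,000 bp, the helper name `make_intervals`, and the numbered output names are assumptions made for the example; the chr16 length is the hg19 value and should be checked against the `.fai` index of the reference.

```shell
#!/bin/sh
# Hypothetical helper: print fixed-size, 1-based inclusive intervals
# covering [start, end] of a contig, one "contig:start-end" per line.
make_intervals() {
  mi_contig=$1; mi_start=$2; mi_end=$3; mi_size=$4
  mi_s=$mi_start
  while [ "$mi_s" -le "$mi_end" ]; do
    mi_e=$(( mi_s + mi_size - 1 ))
    # Clip the last window to the end of the contig.
    if [ "$mi_e" -gt "$mi_end" ]; then mi_e=$mi_end; fi
    echo "${mi_contig}:${mi_s}-${mi_e}"
    mi_s=$(( mi_e + 1 ))
  done
}

# Dry run: print one GenotypeGVCFs command per window of chr16.
# 90354753 is the hg19 chr16 length (verify against your .fai file).
# The paths are left elided as in the original command, and the numbered
# output name is an assumption for this sketch.
n=0
make_intervals chr16 1 90354753 2400000 | while read -r interval; do
  n=$(( n + 1 ))
  echo gatk --java-options "-Xmx100g" GenotypeGVCFs \
    -R .../ucsc.hg19.fasta -V gendb://gDB16 \
    -L "$interval" -O ".../c16_${n}.vcf.gz"
done
```

Each printed line can then be submitted as a separate job, which makes it easier to see which windows fail to complete.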
gatk version: gatk/184.108.40.206
Is it normal that the last chromosomes need more time and memory?
I did read the post https://gatk.broadinstitute.org/hc/en-us/community/posts/360063088471-Speeding-up-GenotypeGVCFS-GATK4 , and I understand that the GenomicsDB has to be loaded completely first and only then is the -L option applied. But is there any way to optimize this step?
Any help will be appreciated!