GenomicsDBImport: Attempting to genotype more than 50 alleles
I'm joint calling ~10K samples, where I expect there to be MANY possible alleles/genotypes. How do I increase the number of alleles supported by GenomicsDBImport?
Perhaps with this option: --genomicsdb-segment-size?
a) GATK version used
The Genome Analysis Toolkit (GATK) v4.1.8.1
HTSJDK Version: 2.23.0
Picard Version: 2.22.8
b) Exact GATK commands used
gatk \
--java-options "-XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx16g" \
--spark-runner LOCAL \
GenomicsDBImport \
-L <redacted> \
--genomicsdb-workspace-path /path/to/db \
--arguments_file args.txt;
It looks like HTSJDK has the limit:
Here's the place GATK checks:
Someone seems to have anticipated making it a command-line argument in the future, but didn't wire it to the check above:
https://github.com/broadinstitute/gatk/blob/bc0994c180312cdca7afbe45b410b2c6fc312043/src/main/java/org/broadinstitute/hellbender/tools/genomicsdb/GenomicsDBArgumentCollection.java#L18-L22
Hi Nils Homer, I have looked into both of your requests, and unfortunately it is not currently possible to increase the number of alleles supported by GenomicsDBImport. One option you might try is the joint-calling WDL at https://github.com/gatk-workflows. Using GnarlyGenotyper (not GenotypeGVCFs), you will be able to run your analysis with more alleles. For your current workflow there is no good workaround at this point, since this limit involves more than just GATK.
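Since the limit cannot be raised, it may help to know how many sites are affected. Below is a minimal Python sketch, not part of GATK, that scans the records of a multi-sample VCF and reports sites whose ALT allele count exceeds a threshold. The function names (sites_over_limit, open_vcf) are hypothetical, the threshold of 50 is taken from the warning text, and treating the limit as a count of ALT alleles excluding <NON_REF> is an assumption; GATK's own check may count alleles differently.

```python
import gzip
from typing import Iterable, Iterator, Tuple

# Assumption: 50 is the threshold quoted in the warning message.
MAX_ALLELES = 50


def sites_over_limit(
    lines: Iterable[str], max_alleles: int = MAX_ALLELES
) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, pos, n_alt) for VCF records with more than max_alleles ALTs."""
    for line in lines:
        # Skip blank lines and header lines (## meta lines and the #CHROM line).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a well-formed VCF record
        alt_field = fields[4]
        if alt_field == ".":
            continue  # no ALT alleles at this site
        # Assumption: the symbolic <NON_REF> allele does not count toward the limit.
        alts = [a for a in alt_field.split(",") if a != "<NON_REF>"]
        if len(alts) > max_alleles:
            yield fields[0], int(fields[1]), len(alts)


def open_vcf(path: str):
    """Open a plain-text or gzip/bgzip-compressed VCF for text reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")
```

Usage would look something like this, where the file name is a placeholder:

```python
with open_vcf("combined.vcf.gz") as handle:
    for chrom, pos, n_alt in sites_over_limit(handle):
        print(f"{chrom}:{pos}\t{n_alt} ALT alleles")
```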