GenomicsDBImport: has too many genotypes in the combined VCF record
I'm joint calling ~10K samples, where I expect there to be many possible alleles/genotypes. How do I increase the number of alleles supported by GenomicsDBImport?
Perhaps this one?
--genomicsdb-segment-size?
a) GATK version used
The Genome Analysis Toolkit (GATK) v4.1.8.1
HTSJDK Version: 2.23.0
Picard Version: 2.22.8
b) Exact GATK commands used
gatk \
--java-options "-XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx16g" \
--spark-runner LOCAL \
GenomicsDBImport \
-L <redacted> \
--genomicsdb-workspace-path /path/to/db \
--arguments_file args.txt;
Sample/Callset <redacted>( TileDB row idx 6055) at Chromosome chr1 position <redacted> (TileDB column 26661119) has too many genotypes in the combined VCF record : 1081 : current limit : 1024 (num_alleles, ploidy) = (46, 2). Fields, such as PL, with length equal to the number of genotypes will NOT be added for this sample for this location.
-
Any way to override this in the config?
https://github.com/GenomicsDB/GenomicsDB/blob/2225b6e19d18cbcf98be52fac4529d9156ba7948/src/main/cpp/src/config/genomicsdb_config_base.cc#L59
Hi Nils Homer, I have written a response on your other post that pertains here as well: https://gatk.broadinstitute.org/hc/en-us/community/posts/360072168712-GenomicsDBImport-Attempting-to-genotype-more-than-50-alleles?page=1#community_comment_360012343671
-
Hello Genevieve Brandt (she/her). I am having the same problem as Nils at many sites. I am working with merged samples (samtools merge), the product of two RADseq sequencing runs of the same libraries, for 352 tetraploid samples. We merged them because we needed to increase the read depth.
Sample/Callset 016( TileDB row idx 0) at Chromosome Chromosome_1 position 5490482 (TileDB column 1610426421) has too many genotypes in the combined VCF record : 1820 : current limit : 1024 (num_alleles, ploidy) = (13, 4). Fields, such as PL, with length equal to the number of genotypes will NOT be added for this sample for this location.
Sample/Callset 018( TileDB row idx 1) at Chromosome Chromosome_1 position 5490482 (TileDB column 1610426421) has too many genotypes in the combined VCF record : 1820 : current limit : 1024 (num_alleles, ploidy) = (13, 4). Fields, such as PL, with length equal to the number of genotypes will NOT be added for this sample for this location.
Sample/Callset 021( TileDB row idx 2) at Chromosome Chromosome_1 position 5490482 (TileDB column 1610426421) has too many genotypes in the combined VCF record : 1820 : current limit : 1024 (num_alleles, ploidy) = (13, 4). Fields, such as PL, with length equal to the number of genotypes will NOT be added for this sample for this location.
Sample/Callset 022( TileDB row idx 3) at Chromosome Chromosome_1 position 5490482 (TileDB column 1610426421) has too many genotypes in the combined VCF record : 1820 : current limit : 1024 (num_alleles, ploidy) = (13, 4). Fields, such as PL, with length equal to the number of genotypes will NOT be added for this sample for this location.
I saw your answer here https://gatk.broadinstitute.org/hc/en-us/community/posts/360072168712-GenomicsDBImport-Attempting-to-genotype-more-than-50-alleles?page=1#community_comment_360012343671, but I still don't understand the error. I even see 7315 genotypes reported at (19, 4), where I expected only 4 possible genotypes for tetraploids. And after following this answer, https://gatk.broadinstitute.org/hc/en-us/community/posts/10120780187291-multiple-errors-and-warnings-with-GenotypeGVCFs, I still don't understand where the 7315 comes from.
Thanks!
-
Hi Paula Andrea Espitia Buitrago,
The number comes from a Mendelian calculation of the number of possible genotypes for g alleles at ploidy m at a single unlinked locus, which is simply calculated as
(g+m-1)! / ((g-1)!*m!)
https://www.ias.ac.in/article/fulltext/jgen/049/02/0117-0119
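As a quick check, here is a minimal Python sketch of that formula, taking g as num_alleles and m as ploidy from the warning messages quoted in this thread. The function name `genotype_count` and the `LIMIT` constant are only illustrative and are not part of GATK or GenomicsDB; 1024 is simply the "current limit" printed in the warnings.

```python
from math import comb


def genotype_count(num_alleles: int, ploidy: int) -> int:
    """Number of unordered genotypes for g alleles at ploidy m.

    (g + m - 1)! / ((g - 1)! * m!) is the binomial coefficient C(g + m - 1, m).
    """
    return comb(num_alleles + ploidy - 1, ploidy)


# Illustrative constant: the "current limit" value printed in the warnings above.
LIMIT = 1024

# (num_alleles, ploidy) pairs reported in this thread.
for g, m in [(46, 2), (13, 4), (19, 4)]:
    n = genotype_count(g, m)
    print(f"(num_alleles, ploidy) = ({g}, {m}): {n} genotypes, over limit: {n > LIMIT}")
```

This gives 1081 for (46, 2), 1820 for (13, 4) and 7315 for (19, 4), matching the counts in the warnings, so the 7315 is the number of distinct tetraploid genotypes that can be formed from 19 alleles.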
I hope this helps.