Java large integer formatting error in GnarlyGenotyper
Hi GATK team,
I'm in the process of using the GATK "Biggest Practices" workflow to generate a joint-genotyped SNV/indel callset from ~55k human germline whole genomes.
I've successfully run the Gnarly genotyping pipeline on several chromosomes, but I keep hitting an error at one specific position on chr21 that seems to be related to a Java string-to-integer formatting error in GnarlyGenotyper.
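For reference, here is a minimal standalone illustration (my own sketch, not GATK or GenomicsDB code) of the kind of string-to-integer failure described above. The TileDB column values printed in the log below (e.g. 2785675936 for chr21:8202866) are larger than Java's `Integer.MAX_VALUE` (2,147,483,647), so `Integer.parseInt` rejects them while `Long.parseLong` accepts them. Whether GnarlyGenotyper actually parses these values as a 32-bit `int` anywhere is an assumption, not something the log confirms; the class name `ColumnParseDemo` is hypothetical.

```java
// Hypothetical demo class (not from GATK/GenomicsDB): shows why a 32-bit int
// parse fails on TileDB-scale column values such as those in the log.
public class ColumnParseDemo {

    // Returns true if the decimal string fits in a Java int (32-bit signed).
    static boolean fitsInInt(String value) {
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            // Thrown for values outside [-2147483648, 2147483647].
            return false;
        }
    }

    // Parses the same string as a 64-bit long, which has ample headroom.
    static long parseAsLong(String value) {
        return Long.parseLong(value);
    }

    public static void main(String[] args) {
        // TileDB column reported for chr21:8202866 in the GnarlyGenotyper log.
        String column = "2785675936";
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("fits in int? " + fitsInInt(column));
        System.out.println("as long: " + parseAsLong(column));
        // Narrowing the long to int does not throw; it silently wraps around.
        System.out.println("(int) cast: " + (int) parseAsLong(column));
    }
}
```

Running it prints `Integer.MAX_VALUE = 2147483647`, `fits in int? false`, `as long: 2785675936` and `(int) cast: -1509291360`. In this log the TileDB column is the chr21 position plus a constant offset (2785675936 - 8202866 = 2777473070, and the same difference holds for the other chr21 lines), so every chr21 site lands above the `int` range; a silent narrowing cast would be harder to spot than a thrown `NumberFormatException`.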
I've tried debugging this by checking the genomicsdb.tar input file, but didn't spot anything that was obviously incorrect.
I'd be really grateful if you might be able to provide some pointers on debugging this issue!
Here are the technical details:
- GATK version: 4.6.1.0, executed via Cromwell using Docker (`us.gcr.io/broad-dsde-methods/gatk-sv/sv-base-mini:2024-10-25-v0.29-beta-5ea22a52`)
- Exact command used:
gatk --java-options "-Xms8000m -Xmx25000m" \
GnarlyGenotyper \
-R /cromwell_root/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta \
-O dfci-g2c.v1.chr21.patch.5.0.vcf.gz \
-D gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf \
--only-output-calls-starting-in-intervals \
-V gendb://$WORKSPACE \
-L gs://fc-secure-29075a92-7950-4778-aa20-874a75cd37bf/cromwell/execution/GnarlyJointGenotypingPart1/6f5bf4f3-cf87-40ff-bc8c-46ed85202654/call-G2CSplitIntervals/glob-5db530f76090fc6b87003e11af3127d1/000006-scattered.interval_list \
-stand-call-conf 10 \
--max-alternate-alleles 5 \
--merge-input-intervals
- Entire program log:
Using GATK jar /gatk/gatk-package-4.6.1.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms8000m -Xmx25000m -jar /gatk/gatk-package-4.6.1.0-local.jar GnarlyGenotyper -R /cromwell_root/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta -O dfci-g2c.v1.chr21.patch.5.0.vcf.gz -D gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf --only-output-calls-starting-in-intervals -V gendb://genomicsdb -L gs://fc-secure-29075a92-7950-4778-aa20-874a75cd37bf/cromwell/execution/GnarlyJointGenotypingPart1/6f5bf4f3-cf87-40ff-bc8c-46ed85202654/call-G2CSplitIntervals/glob-5db530f76090fc6b87003e11af3127d1/000006-scattered.interval_list -stand-call-conf 10 --max-alternate-alleles 5 --merge-input-intervals
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.06JHXL
23:55:20.054 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.6.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
SLF4J(W): Class path contains multiple SLF4J providers.
SLF4J(W): Found provider [org.apache.logging.slf4j.SLF4JServiceProvider@27898e13]
SLF4J(W): Found provider [ch.qos.logback.classic.spi.LogbackServiceProvider@4f5f474c]
SLF4J(W): See https://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J(I): Actual provider is of type [org.apache.logging.slf4j.SLF4JServiceProvider@27898e13]
23:55:20.100 INFO GnarlyGenotyper - ------------------------------------------------------------
23:55:20.103 INFO GnarlyGenotyper - The Genome Analysis Toolkit (GATK) v4.6.1.0
23:55:20.103 INFO GnarlyGenotyper - For support and documentation go to https://software.broadinstitute.org/gatk/
23:55:20.103 INFO GnarlyGenotyper - Executing as root@dff50889227e on Linux v6.6.72+ amd64
23:55:20.103 INFO GnarlyGenotyper - Java runtime: OpenJDK 64-Bit Server VM v17.0.12+7-Ubuntu-1ubuntu222.04
23:55:20.104 INFO GnarlyGenotyper - Start Date/Time: March 19, 2025 at 11:55:19 PM GMT
23:55:20.104 INFO GnarlyGenotyper - ------------------------------------------------------------
23:55:20.104 INFO GnarlyGenotyper - ------------------------------------------------------------
23:55:20.105 INFO GnarlyGenotyper - HTSJDK Version: 4.1.3
23:55:20.105 INFO GnarlyGenotyper - Picard Version: 3.3.0
23:55:20.106 INFO GnarlyGenotyper - Built for Spark Version: 3.5.0
23:55:20.109 INFO GnarlyGenotyper - HTSJDK Defaults.COMPRESSION_LEVEL : 2
23:55:20.109 INFO GnarlyGenotyper - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
23:55:20.110 INFO GnarlyGenotyper - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
23:55:20.110 INFO GnarlyGenotyper - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
23:55:20.110 INFO GnarlyGenotyper - Deflater: IntelDeflater
23:55:20.110 INFO GnarlyGenotyper - Inflater: IntelInflater
23:55:20.111 INFO GnarlyGenotyper - GCS max retries/reopens: 20
23:55:20.111 INFO GnarlyGenotyper - Requester pays: disabled
23:55:20.111 WARN GnarlyGenotyper -
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Warning: GnarlyGenotyper is a BETA tool and is not yet ready for use in production
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
23:55:20.112 INFO GnarlyGenotyper - Initializing engine
23:55:22.078 INFO FeatureManager - Using codec VCFCodec to read file gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf
23:55:23.676 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.5.4-764a03c
23:55:24.260 INFO NativeGenomicsDB - pid=107 tid=108 No valid combination operation found for INFO field AC - the field will NOT be part of INFO fields in the generated VCF records
23:55:24.260 INFO NativeGenomicsDB - pid=107 tid=108 No valid combination operation found for INFO field AF - the field will NOT be part of INFO fields in the generated VCF records
23:55:24.260 INFO NativeGenomicsDB - pid=107 tid=108 No valid combination operation found for INFO field AN - the field will NOT be part of INFO fields in the generated VCF records
23:55:24.260 INFO NativeGenomicsDB - pid=107 tid=108 No valid combination operation found for INFO field AS_BaseQRankSum - the field will NOT be part of INFO fields in the generated VCF records
23:55:24.260 INFO NativeGenomicsDB - pid=107 tid=108 No valid combination operation found for INFO field AS_FS - the field will NOT be part of INFO fields in the generated VCF records
23:55:24.260 INFO NativeGenomicsDB - pid=107 tid=108 No valid combination operation found for INFO field AS_InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
23:55:24.260 INFO NativeGenomicsDB - pid=107 tid=108 No valid combination operation found for INFO field AS_MQ - the field will NOT be part of INFO fields in the generated VCF records
23:55:24.260 INFO NativeGenomicsDB - pid=107 tid=108 No valid combination operation found for INFO field AS_MQRankSum - the field will NOT be part of INFO fields in the generated VCF records
23:55:24.260 INFO NativeGenomicsDB - pid=107 tid=108 No valid combination operation found for INFO field AS_QD - the field will NOT be part of INFO fields in the generated VCF records
23:55:24.261 INFO NativeGenomicsDB - pid=107 tid=108 No valid combination operation found for INFO field AS_ReadPosRankSum - the field will NOT be part of INFO fields in the generated VCF records
23:55:24.261 INFO NativeGenomicsDB - pid=107 tid=108 No valid combination operation found for INFO field AS_SOR - the field will NOT be part of INFO fields in the generated VCF records
23:55:24.261 INFO NativeGenomicsDB - pid=107 tid=108 No valid combination operation found for INFO field FS - the field will NOT be part of INFO fields in the generated VCF records
23:55:24.261 INFO NativeGenomicsDB - pid=107 tid=108 No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
23:55:24.261 INFO NativeGenomicsDB - pid=107 tid=108 No valid combination operation found for INFO field QD - the field will NOT be part of INFO fields in the generated VCF records
23:55:24.261 INFO NativeGenomicsDB - pid=107 tid=108 No valid combination operation found for INFO field SOR - the field will NOT be part of INFO fields in the generated VCF records
23:55:26.663 INFO FeatureManager - Using codec IntervalListCodec to read file gs://fc-secure-29075a92-7950-4778-aa20-874a75cd37bf/cromwell/execution/GnarlyJointGenotypingPart1/6f5bf4f3-cf87-40ff-bc8c-46ed85202654/call-G2CSplitIntervals/glob-5db530f76090fc6b87003e11af3127d1/000006-scattered.interval_list
23:55:27.055 INFO IntervalArgumentCollection - Processing 20141 bp from intervals
23:55:27.112 INFO GnarlyGenotyper - Done initializing engine
23:55:27.113 WARN GnarlyGenotyper - The --only-output-calls-starting-in-intervals option is deprecated. Please use '--variant-output-filtering STARTS_IN' for an equivalent filtering.
23:55:27.511 INFO Reflections - Reflections took 147 ms to scan 1 urls, producing 11 keys and 39 values
23:55:27.615 INFO Reflections - Reflections took 101 ms to scan 1 urls, producing 11 keys and 39 values
23:55:27.616 INFO ProgressMeter - Starting traversal
23:55:27.617 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
Chromosome chr21 position 8202866 (TileDB column 2785675936) has too many alleles in the combined VCF record : 14 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
23:57:34.449 INFO ProgressMeter - chr21:8203459 2.1 1000 473.1
Chromosome chr21 position 8203682 (TileDB column 2785676752) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
23:58:09.494 INFO ProgressMeter - chr21:8204486 2.7 2000 741.3
Chromosome chr21 position 8208018 (TileDB column 2785681088) has too many alleles in the combined VCF record : 13 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8208019 (TileDB column 2785681089) has too many alleles in the combined VCF record : 10 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
23:59:33.733 INFO ProgressMeter - chr21:8208113 4.1 3000 731.4
Chromosome chr21 position 8208479 (TileDB column 2785681549) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8208493 (TileDB column 2785681563) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8209111 (TileDB column 2785682181) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8209116 (TileDB column 2785682186) has too many alleles in the combined VCF record : 10 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8209117 (TileDB column 2785682187) has too many alleles in the combined VCF record : 9 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
00:04:36.516 INFO ProgressMeter - chr21:8209113 9.1 4000 437.2
Chromosome chr21 position 8209118 (TileDB column 2785682188) has too many alleles in the combined VCF record : 9 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
00:05:22.608 INFO ProgressMeter - chr21:8211351 9.9 5000 504.2
Chromosome chr21 position 8211686 (TileDB column 2785684756) has too many alleles in the combined VCF record : 16 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8211705 (TileDB column 2785684775) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8211707 (TileDB column 2785684777) has too many alleles in the combined VCF record : 9 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8211709 (TileDB column 2785684779) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
00:06:02.619 INFO ProgressMeter - chr21:8212714 10.6 6000 566.9
Chromosome chr21 position 8212916 (TileDB column 2785685986) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8212952 (TileDB column 2785686022) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8213080 (TileDB column 2785686150) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8213131 (TileDB column 2785686201) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8213183 (TileDB column 2785686253) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8213185 (TileDB column 2785686255) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8213285 (TileDB column 2785686355) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8213335 (TileDB column 2785686405) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8213338 (TileDB column 2785686408) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8213339 (TileDB column 2785686409) has too many alleles in the combined VCF record : 12 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8213361 (TileDB column 2785686431) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8213393 (TileDB column 2785686463) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8213394 (TileDB column 2785686464) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8213402 (TileDB column 2785686472) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8213441 (TileDB column 2785686511) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8213602 (TileDB column 2785686672) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
00:12:19.455 INFO ProgressMeter - chr21:8213714 16.9 7000 415.1
Chromosome chr21 position 8214405 (TileDB column 2785687475) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214454 (TileDB column 2785687524) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214473 (TileDB column 2785687543) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214495 (TileDB column 2785687565) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214497 (TileDB column 2785687567) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214501 (TileDB column 2785687571) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214505 (TileDB column 2785687575) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214520 (TileDB column 2785687590) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214628 (TileDB column 2785687698) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214648 (TileDB column 2785687718) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214649 (TileDB column 2785687719) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214650 (TileDB column 2785687720) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214658 (TileDB column 2785687728) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214669 (TileDB column 2785687739) has too many alleles in the combined VCF record : 10 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214671 (TileDB column 2785687741) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214672 (TileDB column 2785687742) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214676 (TileDB column 2785687746) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214679 (TileDB column 2785687749) has too many alleles in the combined VCF record : 9 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
00:18:16.466 INFO ProgressMeter - chr21:8214714 22.8 8000 350.7
Chromosome chr21 position 8214732 (TileDB column 2785687802) has too many alleles in the combined VCF record : 10 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214734 (TileDB column 2785687804) has too many alleles in the combined VCF record : 17 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214749 (TileDB column 2785687819) has too many alleles in the combined VCF record : 15 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214755 (TileDB column 2785687825) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214760 (TileDB column 2785687830) has too many alleles in the combined VCF record : 10 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214761 (TileDB column 2785687831) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214764 (TileDB column 2785687834) has too many alleles in the combined VCF record : 9 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214777 (TileDB column 2785687847) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214780 (TileDB column 2785687850) has too many alleles in the combined VCF record : 11 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214781 (TileDB column 2785687851) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214783 (TileDB column 2785687853) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214788 (TileDB column 2785687858) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214863 (TileDB column 2785687933) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214881 (TileDB column 2785687951) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214894 (TileDB column 2785687964) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214904 (TileDB column 2785687974) has too many alleles in the combined VCF record : 10 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214913 (TileDB column 2785687983) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214920 (TileDB column 2785687990) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8214956 (TileDB column 2785688026) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8215018 (TileDB column 2785688088) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8215104 (TileDB column 2785688174) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
00:23:07.684 INFO ProgressMeter - chr21:8215714 27.7 9000 325.3
00:23:49.097 INFO ProgressMeter - chr21:8216714 28.4 10000 352.6
Chromosome chr21 position 8216861 (TileDB column 2785689931) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8216926 (TileDB column 2785689996) has too many alleles in the combined VCF record : 10 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8216997 (TileDB column 2785690067) has too many alleles in the combined VCF record : 9 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8217012 (TileDB column 2785690082) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8217206 (TileDB column 2785690276) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8217212 (TileDB column 2785690282) has too many alleles in the combined VCF record : 9 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8217213 (TileDB column 2785690283) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8217305 (TileDB column 2785690375) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8217317 (TileDB column 2785690387) has too many alleles in the combined VCF record : 9 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8217378 (TileDB column 2785690448) has too many alleles in the combined VCF record : 11 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8217381 (TileDB column 2785690451) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8217403 (TileDB column 2785690473) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8217405 (TileDB column 2785690475) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
00:28:41.920 INFO ProgressMeter - chr21:8217715 33.2 11000 330.9
Chromosome chr21 position 8218901 (TileDB column 2785691971) has too many alleles in the combined VCF record : 9 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8218941 (TileDB column 2785692011) has too many alleles in the combined VCF record : 9 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8218946 (TileDB column 2785692016) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219053 (TileDB column 2785692123) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219067 (TileDB column 2785692137) has too many alleles in the combined VCF record : 13 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219074 (TileDB column 2785692144) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219246 (TileDB column 2785692316) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219347 (TileDB column 2785692417) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219351 (TileDB column 2785692421) has too many alleles in the combined VCF record : 9 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219354 (TileDB column 2785692424) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219502 (TileDB column 2785692572) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
00:32:27.479 INFO ProgressMeter - chr21:8219499 37.0 12000 324.3
Chromosome chr21 position 8219891 (TileDB column 2785692961) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219892 (TileDB column 2785692962) has too many alleles in the combined VCF record : 19 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219897 (TileDB column 2785692967) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219899 (TileDB column 2785692969) has too many alleles in the combined VCF record : 9 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219900 (TileDB column 2785692970) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219901 (TileDB column 2785692971) has too many alleles in the combined VCF record : 12 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219902 (TileDB column 2785692972) has too many alleles in the combined VCF record : 9 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219903 (TileDB column 2785692973) has too many alleles in the combined VCF record : 29 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219905 (TileDB column 2785692975) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219908 (TileDB column 2785692978) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219909 (TileDB column 2785692979) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219911 (TileDB column 2785692981) has too many alleles in the combined VCF record : 12 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219915 (TileDB column 2785692985) has too many alleles in the combined VCF record : 13 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219917 (TileDB column 2785692987) has too many alleles in the combined VCF record : 9 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219919 (TileDB column 2785692989) has too many alleles in the combined VCF record : 13 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219921 (TileDB column 2785692991) has too many alleles in the combined VCF record : 7 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219923 (TileDB column 2785692993) has too many alleles in the combined VCF record : 12 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219925 (TileDB column 2785692995) has too many alleles in the combined VCF record : 7 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219927 (TileDB column 2785692997) has too many alleles in the combined VCF record : 13 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219929 (TileDB column 2785692999) has too many alleles in the combined VCF record : 43 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219931 (TileDB column 2785693001) has too many alleles in the combined VCF record : 7 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219933 (TileDB column 2785693003) has too many alleles in the combined VCF record : 12 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219935 (TileDB column 2785693005) has too many alleles in the combined VCF record : 7 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219937 (TileDB column 2785693007) has too many alleles in the combined VCF record : 8 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219941 (TileDB column 2785693011) has too many alleles in the combined VCF record : 9 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219947 (TileDB column 2785693017) has too many alleles in the combined VCF record : 7 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219951 (TileDB column 2785693021) has too many alleles in the combined VCF record : 9 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219953 (TileDB column 2785693023) has too many alleles in the combined VCF record : 9 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219956 (TileDB column 2785693026) has too many alleles in the combined VCF record : 7 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219962 (TileDB column 2785693032) has too many alleles in the combined VCF record : 8 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219966 (TileDB column 2785693036) has too many alleles in the combined VCF record : 8 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219969 (TileDB column 2785693039) has too many alleles in the combined VCF record : 7 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219970 (TileDB column 2785693040) has too many alleles in the combined VCF record : 7 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219985 (TileDB column 2785693055) has too many alleles in the combined VCF record : 14 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8219998 (TileDB column 2785693068) has too many alleles in the combined VCF record : 14 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8220002 (TileDB column 2785693072) has too many alleles in the combined VCF record : 11 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8220006 (TileDB column 2785693076) has too many alleles in the combined VCF record : 11 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8220012 (TileDB column 2785693082) has too many alleles in the combined VCF record : 17 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8220024 (TileDB column 2785693094) has too many alleles in the combined VCF record : 9 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8220027 (TileDB column 2785693097) has too many alleles in the combined VCF record : 8 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8220028 (TileDB column 2785693098) has too many alleles in the combined VCF record : 9 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8220029 (TileDB column 2785693099) has too many alleles in the combined VCF record : 12 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8220031 (TileDB column 2785693101) has too many alleles in the combined VCF record : 40 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr21 position 8220033 (TileDB column 2785693103) has too many alleles in the combined VCF record : 8 : currentlimit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
00:37:27.737 INFO GnarlyGenotyper - Shutting down engine
GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),1780.0524901770002,Cpu time(s),1773.6591073770003
[March 20, 2025 at 12:37:28 AM GMT] org.broadinstitute.hellbender.tools.walkers.gnarlyGenotyper.GnarlyGenotyper done. Elapsed time: 42.14 minutes.
Runtime.totalMemory()=8388608000
java.lang.NumberFormatException: For input string: "2707900110"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
at java.base/java.lang.Integer.parseInt(Integer.java:668)
at java.base/java.lang.Integer.parseInt(Integer.java:786)
at htsjdk.variant.variantcontext.CommonInfo.getAttributeAsInt(CommonInfo.java:316)
at htsjdk.variant.variantcontext.VariantContext.getAttributeAsInt(VariantContext.java:842)
at org.broadinstitute.hellbender.tools.walkers.gnarlyGenotyper.GnarlyGenotyperEngine.finalizeGenotype(GnarlyGenotyperEngine.java:86)
at org.broadinstitute.hellbender.tools.walkers.gnarlyGenotyper.GnarlyGenotyperEngine.finalizeGenotype(GnarlyGenotyperEngine.java:69)
at org.broadinstitute.hellbender.tools.walkers.gnarlyGenotyper.GnarlyGenotyper.apply(GnarlyGenotyper.java:288)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1119)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:150)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:203)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:222)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
at org.broadinstitute.hellbender.Main.main(Main.java:306)
Thanks in advance for any help you can provide!
-
Hi Ryan Collins
The value the tool throws an error on is outside the Java Integer range (-2147483648 to 2147483647), so it cannot be parsed as an integer. Unfortunately, Java has no unsigned integer type to fall back on. Somehow this value ended up in your GenomicsDB from a particular variant context. It may help to extract chr21 from the GenomicsDB as a GVCF using the SelectVariants tool and see where this value actually comes from. A subset of chr21 should be enough; judging from the logs, somewhere around base 8220033 could work. Exporting a full GVCF for that many samples may be very problematic, but this value seems to be in the INFO field, so you may be able to extract it as sites-only data. Once you pinpoint the location, you can skip that site, genotype the rest of chr21 normally, and deal with this particular site separately.
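A minimal illustration of the bound described above: per the stack trace, `getAttributeAsInt` ends up in `Integer.parseInt`, which rejects the value, while a 64-bit `Long.parseLong` accepts it. The class and helper names are hypothetical.

```java
public class IntBoundsDemo {
    // Parse a numeric INFO-style string as a 32-bit int, as Integer.parseInt
    // does, but return null instead of throwing when it is out of range.
    static Integer tryParseInt(String s) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String value = "2707900110"; // the value from the GnarlyGenotyper stack trace
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("parseInt  -> " + tryParseInt(value));
        long asLong = Long.parseLong(value); // fits easily in a 64-bit long
        System.out.println("parseLong -> " + asLong);
        System.out.println("exceeds int range by " + (asLong - Integer.MAX_VALUE));
    }
}
```

The value overshoots `Integer.MAX_VALUE` by about 560 million, so this is not an off-by-one rounding issue but a genuinely large aggregate.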
I hope this helps.
Regards.
-
Hi Gökalp Çelik,
Thanks so much for your reply and detailed suggestions!
I really like your proposed strategy and want to implement it, but I am having some trouble getting SelectVariants to run on a local copy of the GenomicsDB.
Any chance you can help point out what I'm doing wrong? This is being run using GATK-4.3.0.0 on the NIH AllOfUs Researcher Workbench. I don't have root permissions to install a different version of GATK in this environment and cannot move data outside of the environment, so am hoping the difference in version (4.3 here vs. 4.6 in my original issue) isn't a huge problem.
Here's my invocation of SelectVariants:
gatk SelectVariants \
--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx3000m' \
-R Homo_sapiens_assembly38.fasta \
-V gendb://genomicsdb \
--intervals "chr21:8220030-8220688" \
-O problem_region.vcf.gz
And here's the stack trace:
Using GATK jar /etc/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx3000m -jar /etc/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar SelectVariants -R Homo_sapiens_assembly38.fasta -V gendb://genomicsdb --intervals chr21:8220030-8220688 -O problem_region.vcf.gz
20:42:33.548 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/etc/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
20:42:33.707 INFO SelectVariants - ------------------------------------------------------------
20:42:33.708 INFO SelectVariants - The Genome Analysis Toolkit (GATK) v4.3.0.0
20:42:33.708 INFO SelectVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
20:42:33.708 INFO SelectVariants - Executing as jupyter@47c18480bb61 on Linux v6.1.112+ amd64
20:42:33.709 INFO SelectVariants - Java runtime: OpenJDK 64-Bit Server VM v11.0.20.1+1-post-Ubuntu-0ubuntu120.04
20:42:33.709 INFO SelectVariants - Start Date/Time: March 24, 2025 at 8:42:33 PM UTC
20:42:33.709 INFO SelectVariants - ------------------------------------------------------------
20:42:33.709 INFO SelectVariants - ------------------------------------------------------------
20:42:33.710 INFO SelectVariants - HTSJDK Version: 3.0.1
20:42:33.711 INFO SelectVariants - Picard Version: 2.27.5
20:42:33.711 INFO SelectVariants - Built for Spark Version: 2.4.5
20:42:33.711 INFO SelectVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
20:42:33.711 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
20:42:33.711 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
20:42:33.711 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
20:42:33.711 INFO SelectVariants - Deflater: IntelDeflater
20:42:33.712 INFO SelectVariants - Inflater: IntelInflater
20:42:33.712 INFO SelectVariants - GCS max retries/reopens: 20
20:42:33.712 INFO SelectVariants - Requester pays: disabled
20:42:33.712 INFO SelectVariants - Initializing engine
20:42:34.439 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.4.3-6069e4a
20:42:34.482 INFO SelectVariants - Shutting down engine
[March 24, 2025 at 8:42:34 PM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=274726912
***********************************************************************
A USER ERROR has occurred: Couldn't create GenomicsDBFeatureReader
***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException: Couldn't create GenomicsDBFeatureReader
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:463)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:365)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
at org.broadinstitute.hellbender.engine.VariantWalker.initializeDrivingVariants(VariantWalker.java:58)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:45)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: java.io.IOException: GenomicsDB JNI Error: FileBasedVidMapperException : contig_info_dict.HasMember("tiledb_column_offset") && contig_info_dict["tiledb_column_offset"].IsInt64()
at org.genomicsdb.reader.GenomicsDBQueryStream.jniGenomicsDBInit(Native Method)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:209)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:182)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:91)
at org.genomicsdb.reader.GenomicsDBFeatureReader.generateHeadersForQuery(GenomicsDBFeatureReader.java:200)
at org.genomicsdb.reader.GenomicsDBFeatureReader.<init>(GenomicsDBFeatureReader.java:85)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:460)
... 13 more
Is insufficient memory the issue? I can retry this on a beefier VM if you think that might help. Or does this tool need write permissions to the root directory (which I also don't have in this environment)?
Any input would be greatly appreciated!
Thanks again,
Ryan
-
Hi again.
The native library versions are different, but I may need to relay this to the team to see whether there are changes that could break this export. In the meantime, if you are able to get your hands on 4.6.1.0 somehow, that would be great. I am not familiar with the AoU workbench, so I will try to get the opinion of someone with more hands-on experience in that space.
Regards.
-
Hi Ryan Collins!
The secondary JNI error you're seeing might be caused by the GenomicsDB version mismatch. Can you try running SelectVariants as a workflow with version 4.6.1.0?
55K samples isn't *that* many... Are these 30X samples? Where did you get those GVCFs? Did you run reblocking as in the biggest practices? That's a lot of alleles. We definitely recommend running ReblockGVCF on each GVCF to get rid of low-confidence indel alleles and pare down the total number of alternate alleles, which should speed things up. Our version of the ReblockGVCF workflow is here: https://github.com/broadinstitute/warp/blob/develop/pipelines/broad/dna_seq/germline/joint_genotyping/reblocking/ReblockGVCF.wdl. That said, reblocking may not fix the MAX_INT error. Mapping quality has historically been the offender there. You could try specifying `-AX RMSMappingQuality` in the GnarlyGenotyper command, although that would mean you don't have that annotation available for filtering. I know there was a fix for MQ MAX_INT, but I thought it was in a version well before 4.6. I'll do a little digging.
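To see why mapping quality is a plausible overflow culprit, here is a small sketch. It assumes, for illustration only, that the raw MQ annotation accumulates the sum of squared per-read MQ values across all reads at a site and that every read has MQ 60; the class and method names are hypothetical. Under those assumptions a 32-bit int overflows after roughly 600K reads, which a joint-called site across tens of thousands of samples can easily exceed.

```java
public class MqOverflowDemo {
    // Hypothetical: a raw MQ annotation stored as the SUM of squared MQ values.
    static final int MQ = 60; // assumed MAPQ of a well-mapped read

    // Largest read count whose squared-MQ sum still fits in a 32-bit int.
    static long maxReadsBeforeOverflow(int mq) {
        return Integer.MAX_VALUE / ((long) mq * mq);
    }

    // Sum of squared MQ over n reads in 32-bit arithmetic (wraps silently).
    static int sumSquaresInt(int n, int mq) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += mq * mq;
        }
        return sum;
    }

    // The same sum in 64-bit arithmetic (no overflow at these magnitudes).
    static long sumSquaresLong(int n, int mq) {
        return (long) n * mq * mq;
    }

    public static void main(String[] args) {
        long limit = maxReadsBeforeOverflow(MQ);
        System.out.println("reads at MQ60 before int overflow: " + limit);
        int n = (int) limit + 1;
        System.out.println("int sum  for " + n + " reads: " + sumSquaresInt(n, MQ));
        System.out.println("long sum for " + n + " reads: " + sumSquaresLong(n, MQ));
    }
}
```

One read past the limit, the int sum wraps to a large negative number while the long sum is correct, which is why such aggregates are fragile when stored as `int`.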
Best,
Laura
-
OMG it's the QUAL! Your stack trace leads me to https://github.com/broadinstitute/gatk/blob/ebd8a275bea63b34e55f29e89a0a34aea78cce43/src/main/java/org/broadinstitute/hellbender/tools/walkers/gnarlyGenotyper/GnarlyGenotyperEngine.java#L86. This must be a super high depth site for some reason. For SNPs, the QUAL is roughly 30 x #alt reads, so if you had 30X samples that would be 3 million samples, which is obviously a lot more than 55K. The QUAL can be elevated for indels because that 30 (which comes from the base quality for a good SNP) can be higher, but more than 10X higher would surprise me. So I suspect it's depth. I can't think of a good way around this other than excluding the site as Gokalp suggested. It seems tough to figure out exactly where, but that region of the genome looks pretty low complexity in IGV. I'm having trouble finding our tried-and-true low-complexity BED, but you might have your own. Ours comes from Heng Li, according to the following:
#The file was created using the symmetric DUST algorithm:
#~hengli/minimap/sdust -t30 hs38DH.fa > sdust30.pre.bed
#awk '{print $1"\t"($2-1)"\t"$3}' sdust30.pre.bed > sdust30.bed
#The second command adds 1bp to the 5'-end as in VCF, an insertion is usually placed 1bp ahead of LCR.
I would just exclude the union of LCR regions and sites in the log, maybe plus 500bp or so. Hand-wavy, but it will get you unstuck.
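The site-padding half of that exclusion can be sketched in Java: pull the positions out of the "too many alleles" warnings, pad each by 500bp, and merge overlaps into GATK-style 1-based `contig:start-end` intervals. This is a hypothetical helper (names are illustrative); it assumes the log lines are position-sorted, as they are above, and it does not process the LCR BED itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExcludeIntervals {
    // Matches the GenomicsDB "too many alleles" warnings printed in the log.
    private static final Pattern SITE = Pattern.compile(
            "^Chromosome (\\S+) position (\\d+) \\(TileDB column \\d+\\)");

    // A 1-based, inclusive genomic interval printed as contig:start-end.
    static final class Interval {
        final String contig;
        final long start;
        final long end;

        Interval(String contig, long start, long end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return contig + ":" + start + "-" + end;
        }
    }

    // Pads each logged site by `pad` bp on both sides and merges intervals that
    // overlap or touch on the same contig. Assumes position-sorted input.
    static List<Interval> padAndMerge(List<String> logLines, long pad) {
        List<Interval> merged = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = SITE.matcher(line);
            if (!m.find()) {
                continue; // skip ProgressMeter and other non-matching lines
            }
            String contig = m.group(1);
            long pos = Long.parseLong(m.group(2));
            long start = Math.max(1, pos - pad);
            long end = pos + pad;
            int lastIdx = merged.size() - 1;
            if (lastIdx >= 0) {
                Interval last = merged.get(lastIdx);
                if (last.contig.equals(contig) && start <= last.end + 1) {
                    merged.set(lastIdx, new Interval(contig, last.start, Math.max(last.end, end)));
                    continue;
                }
            }
            merged.add(new Interval(contig, start, end));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "Chromosome chr21 position 8219067 (TileDB column 2785692137) has too many alleles",
                "00:32:27.479 INFO ProgressMeter - chr21:8219499 37.0 12000 324.3",
                "Chromosome chr21 position 8220033 (TileDB column 2785693103) has too many alleles");
        for (Interval iv : padAndMerge(log, 500)) {
            System.out.println(iv); // prints chr21:8218567-8220533
        }
    }
}
```

The printed intervals could be saved one per line in a `.list` file and passed to GATK with `-XL` (`--exclude-intervals`), with the LCR BED supplied as a second `-XL`. Note that BED is 0-based while these intervals are 1-based.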
Good luck,
Laura
-
Amazing, you are my hero Laura Gauthier!!! I nearly Slacked you about this but didn't want to bother you unnecessarily. That said, I am glad this route linked me up to the original source herself after all!
Re: your original question about the input data, these were genomes we processed as part of a cancer-focused case-control WGS project drawing from All of Us (~37k), Hartwig Medical Foundation (~3.3k), various NCI-related WGS projects (~2.8k), some Dana-Farber patient populations (~2.1k), and controls from TOPMed/other misc dbGaP (~9.4k). We realigned nearly everything using the same pipeline (based roughly on WARP/functional equivalence, with some tweaks), generated gVCFs using the same GATK version, and reblocked everything with the following arguments:
-drop-low-quals \
-rgq-threshold 10 \
-do-qual-approx \
--floor-blocks -GQB 20 -GQB 30 -GQB 40
Re: the culprit, interesting! I was wondering if it could be some metric like QUAL that might be getting aggregated across a bunch of samples, leading to an unreasonably large integer.
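Laura's rule of thumb from earlier in the thread (QUAL roughly 30 per alt read for SNPs) turns the offending value into an implied read and sample count. A minimal sketch, assuming each sample is 30X and contributes all of its reads as alt reads at this site; the class and method names are hypothetical.

```java
public class QualDepthEstimate {
    // Rule of thumb from the thread: for SNPs, QUAL is roughly 30 per alt read.
    static final long QUAL_PER_ALT_READ = 30;

    // Implied number of alt reads behind an aggregate QUAL.
    static long impliedAltReads(long qual) {
        return qual / QUAL_PER_ALT_READ;
    }

    // Implied number of samples, assuming each contributes `altReadsPerSample`
    // alt reads (e.g. a 30X sample that is entirely alt at this site).
    static long impliedSamples(long qual, long altReadsPerSample) {
        return impliedAltReads(qual) / altReadsPerSample;
    }

    public static void main(String[] args) {
        long qual = 2_707_900_110L; // from the NumberFormatException
        System.out.println("implied alt reads: " + impliedAltReads(qual));      // 90263337
        System.out.println("implied 30X samples: " + impliedSamples(qual, 30)); // 3008777
    }
}
```

Roughly 3 million implied samples against a ~55K cohort is what points to an abnormally deep (likely low-complexity) site rather than a genuine population signal.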
Good to know that LCR filtering is a recommended option here; for this specific issue I was able to just excise a 100bp interval around the problematic record as Gokalp suggested. I didn't encounter this issue on chr19, chr21, or chrY, but if I hit it again on other chromosomes I might just default to the LCR filter strategy genome-wide.
Thanks again for your help & input! I really appreciate it!
Hope all's well with you,
Ryan
-
So happy to hear you were successful! We've solved a surprising number of issues over the years by throwing away poorly behaved LCR regions. Always a good tool to have in one's back pocket.