Error in GnarlyGenotyper (QUALapproxkey)
Dear,
I am trying to genotype SNPs from a set of more than 1000 WGS samples. Following the Best Practices, the g.vcf files were reblocked, and after GenomicsDBImport I tried to produce the final VCF file with GnarlyGenotyper.
Running on a small interval for testing purposes took 146 minutes, but the output is empty.
In the log there is the following warning:
12:29:57.666 WARN GnarlyGenotyper - At least one variant cannot be genotyped because it is missing the QUALapproxkey assigned by the ReblockGVCFs tool. GnarlyGenotyper output may be empty.
Is there a way to get the genotype data out of the DB?
Thanks for any advice.
REQUIRED for all errors and issues:
a) GATK version used: gatk-4.6.0.0
b) Exact command used:
gatk --java-options "-Xmx80g -Xms80g" GnarlyGenotyper \
-R /u/xxx/reference/xxx.fasta \
-V gendb://xxx/gatk_db_import/chr01 \
-O /u/xxx/chr01.vcf.gz \
-L xxx/test.interval_list \
--tmp-dir /u/xxx/tmp/ \
--heterozygosity 0.01 \
--max-alternate-alleles 4 \
--merge-input-intervals
c) Entire program log:
Using GATK jar /xxx/gatk-4.6.0.0/gatk-package-4.6.0.0-local.jar
Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx80g -Xms80g -jar /xxx/gatk-4.6.0.0/gatk-package-4.6.0.0-local.jar GnarlyGenotyper -R /u/xxx/reference/xxx.fasta -V gendb://xxx/chr01 -O /u/xxx/chr01.vcf.gz -L xxx/test.interval_list --tmp-dir /u/xxx/tmp/ --heterozygosity 0.01 --max-alternate-alleles 4 --merge-input-intervals
10:03:12.424 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/xxx/gatk-4.6.0.0/gatk-package-4.6.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
10:03:12.547 INFO GnarlyGenotyper - ------------------------------------------------------------
10:03:12.550 INFO GnarlyGenotyper - The Genome Analysis Toolkit (GATK) v4.6.0.0
10:03:12.550 INFO GnarlyGenotyper - For support and documentation go to https://software.broadinstitute.org/gatk/
10:03:12.550 INFO GnarlyGenotyper - Executing as xxx@xxx.hpc on Linux v6.10.3-200.fc40.x86_64 amd64
10:03:12.551 INFO GnarlyGenotyper - Java runtime: OpenJDK 64-Bit Server VM v22.0.1-internal-adhoc.conda.src
10:03:12.551 INFO GnarlyGenotyper - Start Date/Time: December 11, 2024, 10:03:12 AM CET
10:03:12.551 INFO GnarlyGenotyper - ------------------------------------------------------------
10:03:12.551 INFO GnarlyGenotyper - ------------------------------------------------------------
10:03:12.552 INFO GnarlyGenotyper - HTSJDK Version: 4.1.1
10:03:12.552 INFO GnarlyGenotyper - Picard Version: 3.2.0
10:03:12.553 INFO GnarlyGenotyper - Built for Spark Version: 3.5.0
10:03:12.553 INFO GnarlyGenotyper - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:03:12.553 INFO GnarlyGenotyper - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:03:12.554 INFO GnarlyGenotyper - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:03:12.554 INFO GnarlyGenotyper - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:03:12.555 INFO GnarlyGenotyper - Deflater: IntelDeflater
10:03:12.555 INFO GnarlyGenotyper - Inflater: IntelInflater
10:03:12.555 INFO GnarlyGenotyper - GCS max retries/reopens: 20
10:03:12.555 INFO GnarlyGenotyper - Requester pays: disabled
10:03:12.555 WARN GnarlyGenotyper -
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Warning: GnarlyGenotyper is a BETA tool and is not yet ready for use in production
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
10:03:12.555 INFO GnarlyGenotyper - Initializing engine
10:03:12.830 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.5.3-b586a26
10:03:17.321 INFO NativeGenomicsDB - pid=1968181 tid=1968182 No valid combination operation found for INFO field AC - the field will NOT be part of INFO fields in the generated VCF records
10:03:17.321 INFO NativeGenomicsDB - pid=1968181 tid=1968182 No valid combination operation found for INFO field AF - the field will NOT be part of INFO fields in the generated VCF records
10:03:17.321 INFO NativeGenomicsDB - pid=1968181 tid=1968182 No valid combination operation found for INFO field AN - the field will NOT be part of INFO fields in the generated VCF records
10:03:17.321 INFO NativeGenomicsDB - pid=1968181 tid=1968182 No valid combination operation found for INFO field AS_BaseQRankSum - the field will NOT be part of INFO fields in the generated VCF records
10:03:17.321 INFO NativeGenomicsDB - pid=1968181 tid=1968182 No valid combination operation found for INFO field AS_FS - the field will NOT be part of INFO fields in the generated VCF records
10:03:17.321 INFO NativeGenomicsDB - pid=1968181 tid=1968182 No valid combination operation found for INFO field AS_InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
10:03:17.321 INFO NativeGenomicsDB - pid=1968181 tid=1968182 No valid combination operation found for INFO field AS_MQ - the field will NOT be part of INFO fields in the generated VCF records
10:03:17.321 INFO NativeGenomicsDB - pid=1968181 tid=1968182 No valid combination operation found for INFO field AS_MQRankSum - the field will NOT be part of INFO fields in the generated VCF records
10:03:17.321 INFO NativeGenomicsDB - pid=1968181 tid=1968182 No valid combination operation found for INFO field AS_QD - the field will NOT be part of INFO fields in the generated VCF records
10:03:17.321 INFO NativeGenomicsDB - pid=1968181 tid=1968182 No valid combination operation found for INFO field AS_ReadPosRankSum - the field will NOT be part of INFO fields in the generated VCF records
10:03:17.321 INFO NativeGenomicsDB - pid=1968181 tid=1968182 No valid combination operation found for INFO field AS_SOR - the field will NOT be part of INFO fields in the generated VCF records
10:03:17.321 INFO NativeGenomicsDB - pid=1968181 tid=1968182 No valid combination operation found for INFO field FS - the field will NOT be part of INFO fields in the generated VCF records
10:03:17.321 INFO NativeGenomicsDB - pid=1968181 tid=1968182 No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
10:03:17.321 INFO NativeGenomicsDB - pid=1968181 tid=1968182 No valid combination operation found for INFO field QD - the field will NOT be part of INFO fields in the generated VCF records
10:03:17.321 INFO NativeGenomicsDB - pid=1968181 tid=1968182 No valid combination operation found for INFO field SOR - the field will NOT be part of INFO fields in the generated VCF records
10:03:21.741 INFO FeatureManager - Using codec IntervalListCodec to read file file:///xxx/test.interval_list
10:03:21.755 INFO IntervalArgumentCollection - Processing 5038 bp from intervals
10:03:21.768 INFO GnarlyGenotyper - Done initializing engine
10:03:21.981 INFO Reflections - Reflections took 137 ms to scan 1 urls, producing 11 keys and 39 values
10:03:22.075 INFO Reflections - Reflections took 91 ms to scan 1 urls, producing 11 keys and 39 values
10:03:22.075 INFO ProgressMeter - Starting traversal
10:03:22.076 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
12:29:57.666 WARN GnarlyGenotyper - At least one variant cannot be genotyped because it is missing the QUALapproxkey assigned by the ReblockGVCFs tool. GnarlyGenotyper output may be empty.
12:29:59.456 INFO ProgressMeter - chr01:1762404 146.6 1000 6.8
Chromosome chr01 position 1762808 (TileDB column 1762807) has too many alleles in the combined VCF record : 9 : current limit : 5. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),8.005722431999997,Cpu time(s),7.895096517999998
12:30:05.863 INFO ProgressMeter - chr01:1767066 146.7 5038 34.3
12:30:05.863 INFO ProgressMeter - Traversal complete. Processed 5038 total variants in 146.7 minutes.
12:30:05.869 INFO GnarlyGenotyper - Shutting down engine
[December 11, 2024, 12:30:05 PM CET] org.broadinstitute.hellbender.tools.walkers.gnarlyGenotyper.GnarlyGenotyper done. Elapsed time: 146.89 minutes.
-
There is a parameter in the ReblockGVCF tool that adds this INFO field (QUALapprox) to the reblocked GVCFs:
--do-qual-score-approximation true
If you did not set it, you may need to rerun the tool on the original GVCF files to get it.
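As a rough sketch of how that could look in practice: the helper below checks whether a reblocked GVCF already carries QUALapprox in its INFO column, and the second function reruns ReblockGVCF with the flag above. The function names and all file paths are placeholders I made up for illustration, and only the basic -R/-V/-O arguments are shown; adapt them to your own setup. After reblocking again, the GenomicsDB workspace would presumably need to be re-imported from the new files as well.

```shell
#!/usr/bin/env bash
# Sketch only: function names and file paths are hypothetical placeholders.

# Succeeds (exit 0) if at least one record has QUALapprox in the INFO column.
# Accepts plain or gzip/bgzip-compressed GVCFs; AS_QUALapprox alone does not count.
has_qualapprox() {
    local gvcf="$1"
    gzip -cdf "$gvcf" |
        awk -F'\t' '!/^#/ && $8 ~ /(^|;)QUALapprox=/ { found = 1; exit }
                    END { exit !found }'
}

# Rerun ReblockGVCF on an original (non-reblocked) GVCF with the QUAL approximation enabled.
reblock_with_qualapprox() {
    local ref="$1" in_gvcf="$2" out_gvcf="$3"
    gatk ReblockGVCF \
        -R "$ref" \
        -V "$in_gvcf" \
        -O "$out_gvcf" \
        --do-qual-score-approximation
}

# Example usage (placeholder paths):
#   has_qualapprox sample1.rb.g.vcf.gz || \
#       reblock_with_qualapprox reference.fasta sample1.g.vcf.gz sample1.rb.g.vcf.gz
```

Checking one or two of the reblocked files this way before rebuilding the DB can save a long GnarlyGenotyper run that would otherwise end with an empty output.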
I hope this helps.
Regards.
-
Dear Gökalp Çelik,
Totally correct: after adding the option
--do-qual-score-approximation
(which is also reported here https://gatk.broadinstitute.org/hc/en-us/articles/27007997020955-ReblockGVCF), the problem was fixed!
Is there a way to obtain the results from the DB without GnarlyGenotyper in days rather than months (besides splitting the genome into very small chunks), given that GnarlyGenotyper is still listed as *beta*?
Best regards,
Gabriele
-
GnarlyGenotyper is used extensively in our Best Practices workflows, so you can consider it out of beta for most functionality.
Regards.