GenotypeGVCFs stalls while using --all-sites
Hello,
I am using GATK 4.1.0.0 and trying to combine GVCFs using GenotypeGVCFs. I need genotypes at all sites (not just SNPs) for popgen measures (pi, dxy). Without --all-sites the run completes fine, but with --all-sites it stalls after roughly 200 kb (the last progress line in the output is Chr01:200000). I have also tried --include-non-variant-sites, with the same result, and I have tried different chromosomes. Here is the command; $ref and $db are defined earlier in the script. The output is pasted below it. Any advice would be very much appreciated! Thanks!
gatk --java-options "-Xmx8G" GenotypeGVCFs \
-R $ref \
-V gendb://$db \
-L Chr01 \
--all-sites \
-O Chr01_allsites_raw.vcf
Sep 15, 2020 5:26:28 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
WARNING: Failed to detect whether we are running on Google Compute Engine.
java.net.ConnectException: Network is unreachable
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
at shaded.cloud_nio.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:104)
at shaded.cloud_nio.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
at shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials.runningOnComputeEngine(ComputeEngineCredentials.java:210)
at shaded.cloud_nio.com.google.auth.oauth2.DefaultCredentialsProvider.tryGetComputeCredentials(DefaultCredentialsProvider.java:290)
at shaded.cloud_nio.com.google.auth.oauth2.DefaultCredentialsProvider.getDefaultCredentialsUnsynchronized(DefaultCredentialsProvider.java:207)
at shaded.cloud_nio.com.google.auth.oauth2.DefaultCredentialsProvider.getDefaultCredentials(DefaultCredentialsProvider.java:124)
at shaded.cloud_nio.com.google.auth.oauth2.GoogleCredentials.getApplicationDefault(GoogleCredentials.java:127)
at shaded.cloud_nio.com.google.auth.oauth2.GoogleCredentials.getApplicationDefault(GoogleCredentials.java:100)
at com.google.cloud.ServiceOptions.defaultCredentials(ServiceOptions.java:304)
at com.google.cloud.ServiceOptions.<init>(ServiceOptions.java:278)
at com.google.cloud.storage.StorageOptions.<init>(StorageOptions.java:83)
at com.google.cloud.storage.StorageOptions.<init>(StorageOptions.java:31)
at com.google.cloud.storage.StorageOptions$Builder.build(StorageOptions.java:78)
at org.broadinstitute.hellbender.utils.gcs.BucketUtils.setGlobalNIODefaultOptions(BucketUtils.java:353)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:182)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
17:26:28.688 INFO GenotypeGVCFs - ------------------------------------------------------------
17:26:28.689 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.1.0.0
17:26:28.689 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
17:26:28.689 INFO GenotypeGVCFs - Executing as rachbay@r080.pvt.bridges.psc.edu on Linux v3.10.0-957.27.2.el7.x86_64 amd64
17:26:28.689 INFO GenotypeGVCFs - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_73-b02
17:26:28.689 INFO GenotypeGVCFs - Start Date/Time: September 15, 2020 5:26:28 PM EDT
17:26:28.689 INFO GenotypeGVCFs - ------------------------------------------------------------
17:26:28.689 INFO GenotypeGVCFs - ------------------------------------------------------------
17:26:28.690 INFO GenotypeGVCFs - HTSJDK Version: 2.18.2
17:26:28.690 INFO GenotypeGVCFs - Picard Version: 2.18.25
17:26:28.690 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:26:28.690 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:26:28.690 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:26:28.690 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:26:28.690 INFO GenotypeGVCFs - Deflater: IntelDeflater
17:26:28.690 INFO GenotypeGVCFs - Inflater: IntelInflater
17:26:28.690 INFO GenotypeGVCFs - GCS max retries/reopens: 20
17:26:28.690 INFO GenotypeGVCFs - Requester pays: disabled
17:26:28.690 INFO GenotypeGVCFs - Initializing engine
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
17:27:15.128 INFO IntervalArgumentCollection - Processing 42612672 bp from intervals
17:27:15.138 INFO GenotypeGVCFs - Done initializing engine
17:27:15.344 INFO ProgressMeter - Starting traversal
17:27:15.345 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
Chromosome Chr01 position 74432 (TileDB column 74431) has too many alleles in the combined VCF record : 61 : current limit : 50. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome Chr01 position 143975 (TileDB column 143974) has too many alleles in the combined VCF record : 55 : current limit : 50. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),685.9508288570295,Cpu time(s),252.06587346199646
17:42:10.112 INFO ProgressMeter - Chr01:1000 14.9 1000 67.1
17:42:29.410 INFO ProgressMeter - Chr01:2000 15.2 2000 131.3
17:43:18.380 INFO ProgressMeter - Chr01:6000 16.1 6000 373.8
17:43:29.260 INFO ProgressMeter - Chr01:8000 16.2 8000 492.9
17:44:01.384 INFO ProgressMeter - Chr01:9000 16.8 9000 536.8
17:44:18.603 INFO ProgressMeter - Chr01:10000 17.1 10000 586.4
17:44:34.819 INFO ProgressMeter - Chr01:11000 17.3 11000 634.9
17:44:51.079 INFO ProgressMeter - Chr01:12000 17.6 12000 682.0
17:45:04.837 INFO ProgressMeter - Chr01:14000 17.8 14000 785.4
17:45:15.947 INFO ProgressMeter - Chr01:15000 18.0 15000 832.9
17:45:34.110 INFO ProgressMeter - Chr01:17000 18.3 17000 928.3
17:45:46.279 INFO ProgressMeter - Chr01:20000 18.5 20000 1080.2
17:45:58.621 INFO ProgressMeter - Chr01:24000 18.7 24000 1282.0
17:46:11.520 INFO ProgressMeter - Chr01:28000 18.9 28000 1478.6
17:46:25.252 INFO ProgressMeter - Chr01:32000 19.2 32000 1669.7
17:46:35.329 INFO ProgressMeter - Chr01:34000 19.3 34000 1758.6
17:46:50.055 INFO ProgressMeter - Chr01:36000 19.6 36000 1838.8
17:47:01.808 INFO ProgressMeter - Chr01:37000 19.8 37000 1871.1
17:47:11.945 INFO ProgressMeter - Chr01:38000 19.9 38000 1905.4
17:47:29.569 INFO ProgressMeter - Chr01:40000 20.2 40000 1976.6
17:47:39.999 INFO ProgressMeter - Chr01:42000 20.4 42000 2057.7
17:47:53.201 INFO ProgressMeter - Chr01:44000 20.6 44000 2132.7
17:48:14.118 INFO ProgressMeter - Chr01:45000 21.0 45000 2145.0
17:48:40.184 INFO ProgressMeter - Chr01:46000 21.4 46000 2148.1
17:48:53.388 INFO ProgressMeter - Chr01:47000 21.6 47000 2172.5
17:49:11.885 INFO ProgressMeter - Chr01:49000 21.9 49000 2233.1
17:49:31.071 INFO ProgressMeter - Chr01:51000 22.3 51000 2290.9
17:49:43.809 INFO ProgressMeter - Chr01:53000 22.5 53000 2358.2
17:49:54.697 INFO ProgressMeter - Chr01:54000 22.7 54000 2383.5
17:50:21.743 INFO ProgressMeter - Chr01:55000 23.1 55000 2380.3
17:50:37.773 INFO ProgressMeter - Chr01:57000 23.4 57000 2438.6
17:51:03.435 INFO ProgressMeter - Chr01:59000 23.8 59000 2478.8
17:51:19.038 INFO ProgressMeter - Chr01:60000 24.1 60000 2493.6
17:51:50.084 INFO ProgressMeter - Chr01:61000 24.6 61000 2481.8
17:52:22.112 INFO ProgressMeter - Chr01:62000 25.1 62000 2468.9
17:52:32.430 INFO ProgressMeter - Chr01:64000 25.3 64000 2531.2
17:52:46.686 INFO ProgressMeter - Chr01:68000 25.5 68000 2664.3
17:52:58.951 INFO ProgressMeter - Chr01:72000 25.7 72000 2798.6
17:53:07.034 WARN MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Chr01:74432
17:53:09.390 INFO ProgressMeter - Chr01:75000 25.9 75000 2895.7
17:53:22.277 INFO ProgressMeter - Chr01:79000 26.1 79000 3025.0
17:53:32.367 INFO ProgressMeter - Chr01:82000 26.3 82000 3119.8
17:53:42.633 INFO ProgressMeter - Chr01:84000 26.5 84000 3175.2
17:53:53.576 INFO ProgressMeter - Chr01:87000 26.6 87000 3266.1
17:54:03.672 INFO ProgressMeter - Chr01:90000 26.8 90000 3357.5
17:54:16.631 INFO ProgressMeter - Chr01:94000 27.0 94000 3478.7
17:54:28.404 INFO ProgressMeter - Chr01:97000 27.2 97000 3563.9
17:54:42.970 INFO ProgressMeter - Chr01:101000 27.5 101000 3678.0
17:54:55.067 INFO ProgressMeter - Chr01:105000 27.7 105000 3795.8
17:55:07.141 INFO ProgressMeter - Chr01:108000 27.9 108000 3876.1
17:55:18.010 INFO ProgressMeter - Chr01:111000 28.0 111000 3958.0
17:55:30.441 INFO ProgressMeter - Chr01:115000 28.3 115000 4070.6
17:55:42.281 INFO ProgressMeter - Chr01:119000 28.4 119000 4182.9
17:55:54.376 INFO ProgressMeter - Chr01:123000 28.7 123000 4293.1
17:56:07.351 INFO ProgressMeter - Chr01:127000 28.9 127000 4399.5
17:56:17.666 INFO ProgressMeter - Chr01:130000 29.0 130000 4476.8
17:56:28.967 INFO ProgressMeter - Chr01:133000 29.2 133000 4550.6
17:56:46.308 INFO ProgressMeter - Chr01:135000 29.5 135000 4573.8
17:56:59.394 INFO ProgressMeter - Chr01:137000 29.7 137000 4607.5
17:57:17.916 INFO ProgressMeter - Chr01:139000 30.0 139000 4626.7
17:57:31.680 INFO ProgressMeter - Chr01:141000 30.3 141000 4657.7
17:57:46.253 WARN MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Chr01:143975
17:57:46.929 INFO ProgressMeter - Chr01:144000 30.5 144000 4717.2
17:58:00.245 INFO ProgressMeter - Chr01:148000 30.7 148000 4813.3
17:58:11.713 INFO ProgressMeter - Chr01:151000 30.9 151000 4880.5
17:58:22.299 INFO ProgressMeter - Chr01:154000 31.1 154000 4949.2
17:58:37.859 INFO ProgressMeter - Chr01:157000 31.4 157000 5003.9
17:58:48.262 INFO ProgressMeter - Chr01:160000 31.5 160000 5071.5
17:59:00.787 INFO ProgressMeter - Chr01:164000 31.8 164000 5164.2
17:59:14.582 INFO ProgressMeter - Chr01:167000 32.0 167000 5220.8
17:59:26.127 INFO ProgressMeter - Chr01:168000 32.2 168000 5220.7
17:59:44.531 INFO ProgressMeter - Chr01:170000 32.5 170000 5233.0
18:00:01.756 INFO ProgressMeter - Chr01:172000 32.8 172000 5248.1
18:00:14.106 INFO ProgressMeter - Chr01:174000 33.0 174000 5276.0
18:00:26.962 INFO ProgressMeter - Chr01:178000 33.2 178000 5362.5
18:00:38.501 INFO ProgressMeter - Chr01:181000 33.4 181000 5421.4
18:00:49.075 INFO ProgressMeter - Chr01:184000 33.6 184000 5482.4
18:01:00.092 INFO ProgressMeter - Chr01:187000 33.7 187000 5541.4
18:01:11.563 INFO ProgressMeter - Chr01:190000 33.9 190000 5598.6
18:01:22.578 INFO ProgressMeter - Chr01:193000 34.1 193000 5656.4
18:01:34.090 INFO ProgressMeter - Chr01:196000 34.3 196000 5712.2
18:01:47.073 INFO ProgressMeter - Chr01:200000 34.5 200000 5792.3
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
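To pin down where the traversal stops, the last ProgressMeter line can be pulled from a saved copy of this output. The following is only a sketch: the function name `last_progress` and the log file name are hypothetical, and it assumes the whitespace-separated ProgressMeter layout shown above (timestamp, level, logger, dash, locus, elapsed minutes, ...).

```shell
#!/bin/sh
# Sketch: report the last locus that ProgressMeter logged before a stall.
# Assumes the log layout shown in this post; header lines such as
# "Current Locus ..." and "Starting traversal" are skipped because their
# fifth field is not of the form contig:position.
last_progress() {
    awk '$2 == "INFO" && $3 == "ProgressMeter" && $5 ~ /:[0-9]+$/ {
             last = $1 " " $5 " " $6      # timestamp, locus, elapsed minutes
         }
         END { if (last != "") print last }' "$1"
}

# Only run when a log file is given, e.g.: sh last_progress.sh genotype.log
# (the file name is a placeholder, not something from this thread).
if [ -n "${1:-}" ]; then
    last_progress "$1"
fi
```

Comparing the reported timestamp with the current wall-clock time gives a rough idea of how long the job has been silent.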
-
Thank you for reporting this error as well, Eric C. Anderson! This is very helpful for figuring out the bug.
-
Andrius Jonas Dagilis, it doesn't look like there is going to be a quick solution for the GenotypeGVCFs bug. For the stalling issue, can you try running GenotypeGVCFs 4.2.4.0 against the GenomicsDB workspace you created with 4.2.4.1?
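For anyone following along, a minimal sketch of that setup might look like the following. It assumes the 4.2.4.0 release has been unpacked somewhere alongside the newer one; the install path, reference file, and workspace name are placeholders, not values from this thread, and the helper `run_or_print` is hypothetical. The GATK arguments mirror the command in the original post, using the long form `--include-non-variant-sites`.

```shell
#!/bin/sh
# Placeholders (assumptions): adjust to your own installation and data.
gatk_bin=/path/to/gatk-4.2.4.0/gatk   # launcher from the 4.2.4.0 release
ref=reference.fasta                   # same reference used for the GVCFs
db=genomicsdb_workspace               # workspace built by GenomicsDBImport 4.2.4.1

# Print the command by default; set RUN=1 to actually execute it.
# The printed form is for inspection only (quoting is not preserved).
run_or_print() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

run_or_print "$gatk_bin" --java-options "-Xmx8G" GenotypeGVCFs \
    -R "$ref" \
    -V "gendb://$db" \
    -L Chr01 \
    --include-non-variant-sites \
    -O Chr01_allsites_raw.vcf
```

Keeping the older launcher behind an explicit path avoids changing which `gatk` is on `PATH` for the import step.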
-
I will give it a shot! I am just about done regenerating the GVCFs in case something went wrong in that process, and I should have an update for you over the weekend.
-
I just wanted to chime in to say that I have had success running GenotypeGVCFs 4.2.4.0 on GenomicsDB workspaces created with GenomicsDBImport 4.2.4.1. A long run across 34 chromosomes and 14 collections of scaffolds with 375 individuals is just finishing up with no problems. The options that were in effect can be seen in the rules highlighted here: https://github.com/eriqande/dna-seq-gatk-variant-calling/blob/yukon-chinookomes/rules/calling.smk#L36-L139
Thanks again for posting about this Andrius! I would still be completely in the woods without you reporting the problem.
-
I've run into the "genotype does not contain likelihoods" problem as well, with GATK GenotypeGVCFs 4.2.4.1, while running a pipeline that had worked before.
-
Nicolas Rochette, we are going to publish a new point release of GATK as soon as possible to fix this GenotypeGVCFs issue. You can read more in the bug report here: https://github.com/broadinstitute/gatk/issues/7639
Andrius Jonas Dagilis, Eric C. Anderson, Nicolas Rochette: we would appreciate it if any of you could test your data on the branch ldg_maxAltAlleleBugFix to see whether the bug fix solves the problem for your cases. Please let us know if you have questions about how to get this branch!
-
A new GATK release is available that fixes the GenotypeGVCFs java.lang.IllegalStateException: https://github.com/broadinstitute/gatk/releases/tag/4.2.5.0