GenotypeGVCFs stalls while using --all-sites
AnsweredHello,
I am using GATK4.1.0.0 and trying to combine GVCFs using GenotypeGVCFs. I need to genotype at all sites (not just SNPs) for popgen measures (pi, dxy). When I run without --all-sites, it runs fine, but when I run with --all-sites it stalls after 200K. I have also tried with --include-non-variant-sites. Same problem. I've also tried with different chromosomes. Here is the command, where $ref and $db directories are defined earlier in the script. The output is also pasted below. Any advice would be very much appreciated! Thanks!
gatk --java-options "-Xmx8G" GenotypeGVCFs \
-R $ref \
-V gendb://$db \
-L Chr01 \
--all-sites \
-O Chr01_allsites_raw.vcf
Sep 15, 2020 5:26:28 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
WARNING: Failed to detect whether we are running on Google Compute Engine.
java.net.ConnectException: Network is unreachable
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
at shaded.cloud_nio.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:104)
at shaded.cloud_nio.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
at shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials.runningOnComputeEngine(ComputeEngineCredentials.java:210)
at shaded.cloud_nio.com.google.auth.oauth2.DefaultCredentialsProvider.tryGetComputeCredentials(DefaultCredentialsProvider.java:290)
at shaded.cloud_nio.com.google.auth.oauth2.DefaultCredentialsProvider.getDefaultCredentialsUnsynchronized(DefaultCredentialsProvider.java:207)
at shaded.cloud_nio.com.google.auth.oauth2.DefaultCredentialsProvider.getDefaultCredentials(DefaultCredentialsProvider.java:124)
at shaded.cloud_nio.com.google.auth.oauth2.GoogleCredentials.getApplicationDefault(GoogleCredentials.java:127)
at shaded.cloud_nio.com.google.auth.oauth2.GoogleCredentials.getApplicationDefault(GoogleCredentials.java:100)
at com.google.cloud.ServiceOptions.defaultCredentials(ServiceOptions.java:304)
at com.google.cloud.ServiceOptions.<init>(ServiceOptions.java:278)
at com.google.cloud.storage.StorageOptions.<init>(StorageOptions.java:83)
at com.google.cloud.storage.StorageOptions.<init>(StorageOptions.java:31)
at com.google.cloud.storage.StorageOptions$Builder.build(StorageOptions.java:78)
at org.broadinstitute.hellbender.utils.gcs.BucketUtils.setGlobalNIODefaultOptions(BucketUtils.java:353)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:182)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
17:26:28.688 INFO GenotypeGVCFs - ------------------------------------------------------------
17:26:28.689 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.1.0.0
17:26:28.689 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
17:26:28.689 INFO GenotypeGVCFs - Executing as rachbay@r080.pvt.bridges.psc.edu on Linux v3.10.0-957.27.2.el7.x86_64 amd64
17:26:28.689 INFO GenotypeGVCFs - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_73-b02
17:26:28.689 INFO GenotypeGVCFs - Start Date/Time: September 15, 2020 5:26:28 PM EDT
17:26:28.689 INFO GenotypeGVCFs - ------------------------------------------------------------
17:26:28.689 INFO GenotypeGVCFs - ------------------------------------------------------------
17:26:28.690 INFO GenotypeGVCFs - HTSJDK Version: 2.18.2
17:26:28.690 INFO GenotypeGVCFs - Picard Version: 2.18.25
17:26:28.690 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:26:28.690 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:26:28.690 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:26:28.690 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:26:28.690 INFO GenotypeGVCFs - Deflater: IntelDeflater
17:26:28.690 INFO GenotypeGVCFs - Inflater: IntelInflater
17:26:28.690 INFO GenotypeGVCFs - GCS max retries/reopens: 20
17:26:28.690 INFO GenotypeGVCFs - Requester pays: disabled
17:26:28.690 INFO GenotypeGVCFs - Initializing engine
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
17:27:15.128 INFO IntervalArgumentCollection - Processing 42612672 bp from intervals
17:27:15.138 INFO GenotypeGVCFs - Done initializing engine
17:27:15.344 INFO ProgressMeter - Starting traversal
17:27:15.345 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
Chromosome Chr01 position 74432 (TileDB column 74431) has too many alleles in the combined VCF record : 61 : current limit : 50. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome Chr01 position 143975 (TileDB column 143974) has too many alleles in the combined VCF record : 55 : current limit : 50. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),685.9508288570295,Cpu time(s),252.06587346199646
17:42:10.112 INFO ProgressMeter - Chr01:1000 14.9 1000 67.1
17:42:29.410 INFO ProgressMeter - Chr01:2000 15.2 2000 131.3
17:43:18.380 INFO ProgressMeter - Chr01:6000 16.1 6000 373.8
17:43:29.260 INFO ProgressMeter - Chr01:8000 16.2 8000 492.9
17:44:01.384 INFO ProgressMeter - Chr01:9000 16.8 9000 536.8
17:44:18.603 INFO ProgressMeter - Chr01:10000 17.1 10000 586.4
17:44:34.819 INFO ProgressMeter - Chr01:11000 17.3 11000 634.9
17:44:51.079 INFO ProgressMeter - Chr01:12000 17.6 12000 682.0
17:45:04.837 INFO ProgressMeter - Chr01:14000 17.8 14000 785.4
17:45:15.947 INFO ProgressMeter - Chr01:15000 18.0 15000 832.9
17:45:34.110 INFO ProgressMeter - Chr01:17000 18.3 17000 928.3
17:45:46.279 INFO ProgressMeter - Chr01:20000 18.5 20000 1080.2
17:45:58.621 INFO ProgressMeter - Chr01:24000 18.7 24000 1282.0
17:46:11.520 INFO ProgressMeter - Chr01:28000 18.9 28000 1478.6
17:46:25.252 INFO ProgressMeter - Chr01:32000 19.2 32000 1669.7
17:46:35.329 INFO ProgressMeter - Chr01:34000 19.3 34000 1758.6
17:46:50.055 INFO ProgressMeter - Chr01:36000 19.6 36000 1838.8
17:47:01.808 INFO ProgressMeter - Chr01:37000 19.8 37000 1871.1
17:47:11.945 INFO ProgressMeter - Chr01:38000 19.9 38000 1905.4
17:47:29.569 INFO ProgressMeter - Chr01:40000 20.2 40000 1976.6
17:47:39.999 INFO ProgressMeter - Chr01:42000 20.4 42000 2057.7
17:47:53.201 INFO ProgressMeter - Chr01:44000 20.6 44000 2132.7
17:48:14.118 INFO ProgressMeter - Chr01:45000 21.0 45000 2145.0
17:48:40.184 INFO ProgressMeter - Chr01:46000 21.4 46000 2148.1
17:48:53.388 INFO ProgressMeter - Chr01:47000 21.6 47000 2172.5
17:49:11.885 INFO ProgressMeter - Chr01:49000 21.9 49000 2233.1
17:49:31.071 INFO ProgressMeter - Chr01:51000 22.3 51000 2290.9
17:49:43.809 INFO ProgressMeter - Chr01:53000 22.5 53000 2358.2
17:49:54.697 INFO ProgressMeter - Chr01:54000 22.7 54000 2383.5
17:50:21.743 INFO ProgressMeter - Chr01:55000 23.1 55000 2380.3
17:50:37.773 INFO ProgressMeter - Chr01:57000 23.4 57000 2438.6
17:51:03.435 INFO ProgressMeter - Chr01:59000 23.8 59000 2478.8
17:51:19.038 INFO ProgressMeter - Chr01:60000 24.1 60000 2493.6
17:51:50.084 INFO ProgressMeter - Chr01:61000 24.6 61000 2481.8
17:52:22.112 INFO ProgressMeter - Chr01:62000 25.1 62000 2468.9
17:52:32.430 INFO ProgressMeter - Chr01:64000 25.3 64000 2531.2
17:52:46.686 INFO ProgressMeter - Chr01:68000 25.5 68000 2664.3
17:52:58.951 INFO ProgressMeter - Chr01:72000 25.7 72000 2798.6
17:53:07.034 WARN MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Chr01:74432
17:53:09.390 INFO ProgressMeter - Chr01:75000 25.9 75000 2895.7
17:53:22.277 INFO ProgressMeter - Chr01:79000 26.1 79000 3025.0
17:53:32.367 INFO ProgressMeter - Chr01:82000 26.3 82000 3119.8
17:53:42.633 INFO ProgressMeter - Chr01:84000 26.5 84000 3175.2
17:53:53.576 INFO ProgressMeter - Chr01:87000 26.6 87000 3266.1
17:54:03.672 INFO ProgressMeter - Chr01:90000 26.8 90000 3357.5
17:54:16.631 INFO ProgressMeter - Chr01:94000 27.0 94000 3478.7
17:54:28.404 INFO ProgressMeter - Chr01:97000 27.2 97000 3563.9
17:54:42.970 INFO ProgressMeter - Chr01:101000 27.5 101000 3678.0
17:54:55.067 INFO ProgressMeter - Chr01:105000 27.7 105000 3795.8
17:55:07.141 INFO ProgressMeter - Chr01:108000 27.9 108000 3876.1
17:55:18.010 INFO ProgressMeter - Chr01:111000 28.0 111000 3958.0
17:55:30.441 INFO ProgressMeter - Chr01:115000 28.3 115000 4070.6
17:55:42.281 INFO ProgressMeter - Chr01:119000 28.4 119000 4182.9
17:55:54.376 INFO ProgressMeter - Chr01:123000 28.7 123000 4293.1
17:56:07.351 INFO ProgressMeter - Chr01:127000 28.9 127000 4399.5
17:56:17.666 INFO ProgressMeter - Chr01:130000 29.0 130000 4476.8
17:56:28.967 INFO ProgressMeter - Chr01:133000 29.2 133000 4550.6
17:56:46.308 INFO ProgressMeter - Chr01:135000 29.5 135000 4573.8
17:56:59.394 INFO ProgressMeter - Chr01:137000 29.7 137000 4607.5
17:57:17.916 INFO ProgressMeter - Chr01:139000 30.0 139000 4626.7
17:57:31.680 INFO ProgressMeter - Chr01:141000 30.3 141000 4657.7
17:57:46.253 WARN MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Chr01:143975
17:57:46.929 INFO ProgressMeter - Chr01:144000 30.5 144000 4717.2
17:58:00.245 INFO ProgressMeter - Chr01:148000 30.7 148000 4813.3
17:58:11.713 INFO ProgressMeter - Chr01:151000 30.9 151000 4880.5
17:58:22.299 INFO ProgressMeter - Chr01:154000 31.1 154000 4949.2
17:58:37.859 INFO ProgressMeter - Chr01:157000 31.4 157000 5003.9
17:58:48.262 INFO ProgressMeter - Chr01:160000 31.5 160000 5071.5
17:59:00.787 INFO ProgressMeter - Chr01:164000 31.8 164000 5164.2
17:59:14.582 INFO ProgressMeter - Chr01:167000 32.0 167000 5220.8
17:59:26.127 INFO ProgressMeter - Chr01:168000 32.2 168000 5220.7
17:59:44.531 INFO ProgressMeter - Chr01:170000 32.5 170000 5233.0
18:00:01.756 INFO ProgressMeter - Chr01:172000 32.8 172000 5248.1
18:00:14.106 INFO ProgressMeter - Chr01:174000 33.0 174000 5276.0
18:00:26.962 INFO ProgressMeter - Chr01:178000 33.2 178000 5362.5
18:00:38.501 INFO ProgressMeter - Chr01:181000 33.4 181000 5421.4
18:00:49.075 INFO ProgressMeter - Chr01:184000 33.6 184000 5482.4
18:01:00.092 INFO ProgressMeter - Chr01:187000 33.7 187000 5541.4
18:01:11.563 INFO ProgressMeter - Chr01:190000 33.9 190000 5598.6
18:01:22.578 INFO ProgressMeter - Chr01:193000 34.1 193000 5656.4
18:01:34.090 INFO ProgressMeter - Chr01:196000 34.3 196000 5712.2
18:01:47.073 INFO ProgressMeter - Chr01:200000 34.5 200000 5792.3
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
-
Hi Andrius,
Sorry, this is really frustrating and I'm not sure I have a good answer for you. I did finally get it to work, but only because our lab switched to a new cluster. It then ran without any issue. I hope you can find a solution!
Best,
Rachael
-
Rachael Bay have you tried increasing your memory allocation to be larger (--java-options "-Xmx8G") ? If there is an issue with memory allocation another trick is to specify a temporary directory with the option --tmp-dir
-
Hi Genevieve Brandt (she/her) - thanks for the response! We've tried with up to 96G for java and both with and without the --tmp-dir option. Any other ideas?
-
Rachael Bay since it stalls at the same location each time, are there many alleles at that location? It might be useful to check out DepthofCoverage or another tool and look at that area in the genome.
Another thing you can try is to update your GATK version since we continually improve the tools.
-
Hi Genevieve Brandt (she/her), I don't think it's the location, since we stall at the exact same position (200000) when we try a different chromosome. I've also tried updating my GATK version and same issue. We're really stuck here, any help would be great!
-
Rachael Bay have you had any successful runs using --all-sites with any of your chromosomes? Or do they all fail at position 200000?
Does the job fail, or just stall? And for how long?
-
Genevieve Brandt (she/her) I have only tried the first 2 chromosomes, but they both stalled at position 200000. The jobs do not actually fail, they just stall. They reach 200000 quickly (an hour or so) and I let them run for 24 hours.
-
Rachael Bay thank you for this info. I am continuing to look into solutions and will let you know when I get more information.
-
Hi Rachael Bay,
Sometimes early positions in the chromosome do not have data and we are wondering if this is an issue with writing an enormous amount of information once it starts outputting data.
- Could you try writing to a .gz VCF file? It should have better performance.
- How big is your genomicsDB, how many samples are you working with?
- Could you give more information about your machine that you are running this on? How much memory is available to the machine and is it the exact amount that you are giving for the java memory? The machine will need slightly more memory than you give in the java options to account for the C++ memory.
- Could you start in the middle of the chromosome and see if it runs? Perhaps from position 500,000 to 1,000,000
-
Hi Genevieve Brandt (she/her),
*I tried writing to a .gz file - same issue
*The genomicsDB directory is large - looks like 793G. We have 471 individuals.
*I am running on the Bridges cluster through the NSF XSEDE program. I have tried both regular memory and large memory. On large memory I requested 128G from the machine while asking for 96 in the java options, so there should be extra memory there.
*Does the genomicsDB have to be remade to start in the middle of the chromosome? I tried specifying the interval you suggested above with Haplotype caller, but it JUST stalled (never got the progress meter showing position)
-
Hi Rachael Bay, thank you for those updates. I have some follow up questions to try to get to the bottom of this.
- We want to determine if this is an issue with the genomicsDB or GenotypeGVCFs with --all-sites. Could you run SelectVariants at position 200k-201k and see if it gets any results?
- What species are you working with? Is there anything weird happening at the reference at position 200k?
-
Hi Genevieve Brandt (she/her). When I look at that region, it actually looks like the last position is 1999988. This is exactly the same across multiple chromosomes (I have tried three of six). I am working with Zostera marina, but I don't really see why there would be something weird happening at that position in multiple chromosomes.
-
Did you run SelectVariants yet?
I am confused about what you mean by the last position is 1999988, since we are looking into position 200K? Am I misunderstanding, or did you possibly make a typo?
-
Genevieve Brandt (she/her) Sorry, it was a typo sort of. Just looking right now. The last position recorded for Chr01 is 199988 and for Chr03 is 199998, so not EXACTLY the same position. Let me know if it would be at all useful for me to try this on the other chromosomes. I'm really at a loss for what else to try!
-
Also, I guess I'm confused about how running SelectVariants would see anything different since it would just return a subset of the raw vcf file, right?
-
Rachael Bay thanks for clarifying!
Running SelectVariants, though not helpful for your final result, helps us in our troubleshooting of this issue. We are trying to determine if this runtime issue is from the GenomicsDB or if it is from GenotypeGVCFs.
-
Genevieve Brandt (she/her) thanks, that make sense. When I run SelectVariants for Chr01:200000-201000 it stalls with no variants output.
-
Rachael Bay thanks for the update, I'll bring this up with my team and let you know when I have more information
-
Hi Rachael Bay, here are a few more things that you could check that would give us more information:
- While you run GenotypeGVCFs and it is stalled, could you inspect the java process and what is going on? The command is jstack with the process ID: (https://docs.oracle.com/en/java/javase/13/docs/specs/man/jstack.html). Look at the jstack a few different times to try to figure out what is going on while it is stalled.
- Could you give more information about how you ran the import to your GenomicsDB? Was it in batches?
- Could you create a new GenomicsDB workspace with all the individuals on a small interval, for example 195,000-200,000 and see if the same issue persists?
And something to try for a possible improvement:
- What version of GATK did you use to create the GenomicsDB? We have been improving GenomicsDB quite a bit and our most up to date version is 4.1.9.0. If you re-run the import you may see better performance and it may help the GenotypeGVCFs process. [You will need to use the option --genomicsdb-shared-posixfs-optimizations for your cluster]
-
Hi,
Reviving this thread since I've run into a nearly identical issue. I am running GATK 4.1.9.0, and have a data-set of 807 flies for which I am trying to genotype individual chromosomes using the all-sites option. Each chromosome is in a separate database folder, and each chromosome stalls roughly 200,000 bases in. I'm running the following commands for GenotypeGVCF:
gatk --java-options "-Xmx190g" GenotypeGVCFs \
-all-sites \
-R /proj/matutelb/projects/drosophila/melanogaster/dmel6_ref.fasta \
-V gendb://all_mels_chr2L \
-L chr2L \
-O all_mels_chr2L.vcf.gz
This is on a large memory node on our cluster, for which I request 200g of memory to give some overhead.
I ran GenomicsDBImport in batches of 50, have not yet tried subsampling the chromosomes into smaller chunks for database/genotypegvcf. Currently trying to login to the nodes and take a look at jstack, will update once I have some results.
Mainly reviving the thread to see if a solution was ever found.
Cheers,
Andrius J. Dagilis -
Thanks for the update Rachael Bay!
Andrius Jonas Dagilis my first recommendation would be use a lower percentage of the available physical memory for your GenotypeGVCFs Java Xmx value. Since you have allocated 200G, you'll want your Xmx value between 160-180.
We now have an article with all of our performance recommendations using GenomicsDB here: https://gatk.broadinstitute.org/hc/en-us/articles/360056138571-GenomicsDBImport-usage-and-performance-guidelines
-
Thanks Genevieve Brandt (she/her),
I've started another job using a different amount of memory, but since it's still in queue, here's the jstack output from one of the jobs:Full thread dump OpenJDK 64-Bit Server VM (25.292-b10 mixed mode):
"Attach Listener" #11 daemon prio=9 os_prio=0 tid=0x00007fa394001000 nid=0x1dbb6 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"pool-2-thread-1" #10 prio=5 os_prio=0 tid=0x00007fd3653a4000 nid=0x48e6 runnable [0x00007fa398188000]
java.lang.Thread.State: RUNNABLE
at org.genomicsdb.importer.GenomicsDBImporterJni.jniImportBatch(Native Method)
at org.genomicsdb.importer.GenomicsDBImporter.doSingleImport(GenomicsDBImporter.java:588)
at org.genomicsdb.importer.GenomicsDBImporter.lambda$null$2(GenomicsDBImporter.java:703)
at org.genomicsdb.importer.GenomicsDBImporter$$Lambda$84/1713988669.get(Unknown Source)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00007fd36414a000 nid=0x4661 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007fd364147000 nid=0x4660 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007fd364145800 nid=0x465f waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007fd364136800 nid=0x465e runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007fd36410a000 nid=0x465d in Object.wait() [0x00007fa3ac65e000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00007fb3aa15bc70> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
- locked <0x00007fb3aa15bc70> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007fd364105800 nid=0x4659 in Object.wait() [0x00007fa3ac75f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00007fb3aa1596d8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
- locked <0x00007fb3aa1596d8> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"main" #1 prio=5 os_prio=0 tid=0x00007fd364054800 nid=0x4652 waiting on condition [0x00007fd36bc0e000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00007fb3aae4fad8> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
at org.genomicsdb.importer.GenomicsDBImporter$$Lambda$85/1662312252.apply(Unknown Source)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
at org.genomicsdb.importer.GenomicsDBImporter.iterateOverSamplesInBatches(GenomicsDBImporter.java:679)
at org.genomicsdb.importer.GenomicsDBImporter.executeImport(GenomicsDBImporter.java:637)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.traverse(GenomicsDBImport.java:748)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
"VM Thread" os_prio=0 tid=0x00007fd3640fb800 nid=0x4656 runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00007fd36414c800 nid=0x4662 waiting on condition
JNI global references: 428
Out of curiosity, I also ran a job without including all-sites, and that seems to be running just fine, currently well past 5 Mb, so definitely an issue with the -all-sites option specifically. -
Andrius Jonas Dagilis Let me know what you find with the run that is going! Hopefully the memory change will help.
It makes sense that -all-sites would greatly affect the resources needed. It usually adds a significant number of sites for analysis.
-
Hey Genevieve Brandt (she/her), I let a few different jobs try and run over the weekend and they have all stalled at 200kb in. I have tried:
Giving the node more memory overhead (190 for GATK, 260 for the job)
Giving more overhead with the same memory (160 for GATK, 200 for the job)
Creating a region database for a smaller portion of a chromosome (the first 10Mb)
And the one thing that worked, but doesn't seem like a good solution:
Use only half of the individuals to create a database.
I'm tempted to split the calling into separate calls for different geographic regions, but we're asking, in part, how much gene-flow between populations there is, so joint-calling would be preferred. -
Thanks for the update Andrius Jonas Dagilis. There have been some GenomicsDB changes in the most recent versions of GATK (we are currently on 4.2.4.1) that might help you out here. First, use the option --genomicsdb-shared-posixfs-optimizations, which can help performance in shared filesystems.
Another option to make sure you can keep all the individuals would be to decrease the --max-alternate-alleles option in GenotypeGVCFs. The default is 6. If you are using -all-sites and many of your sites have a high number of alternate alleles, this can cause major slow downs with GenotypeGVCFs. One caveat to this option: this option was broken before 4.2.4.1 so you will definitely have to update)
-
Updating to 4.2.4.1 has at least changed the errors, so that's something.
I created the databases using the -genomicsdb-shared-poixfs-optimizations parameter, and attempted to run GenotypeGVCFs using several different cutoffs for the --max-alternate-alleles option, going from 5, to 4, to 3. In all cases, I am still getting errors and crashing out, but now for entirely new reasons. For each chromosome, I get an error of the type:
Genotype [LA69_2nd_aliq AGTGAGACCAATATCTAGG|AGTGAGACCAATATCTAGG GQ 3 DP 1 AD 0,0,1,0,0,0,0 {PGT=0|1, PID=5669_G_A, PS=5669, SB=0,0,0,1}] does not contain likelihoods necessary to calculate posteriors.
At first I thought perhaps this info was missing for a particular individual and therefore causing errors. However, this information seems to be missing for different individuals for different chromosomes, and changes depending on the --max-alternate-alleles option I use.
One big caveat - I have not re-generated the gvcfs for each sample using GATK 4.2.4.1, so just in case that's at fault I am re-running HaplotypeCaller in case that makes a difference. -
Andrius Jonas Dagilis can you post the entire program log that contains that error? [You can delete some lines if there are too many duplicates, but it would be helpful to see the complete output]
-
Here you go - tried running without the --max-alternate-alleles option and had the same result, so I don't think that option is causing it.
Using GATK jar /nas/longleaf/apps/gatk/4.2.4.1/gatk-4.2.4.1/gatk-package-4.2.4.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx190g -jar /nas/longleaf/apps/gatk/4.2.4.1/gatk-4.2.4.1/gatk-package-4.2.4.1-local.jar GenotypeGVCFs -R /proj/matutelb/projects/drosophila/melanogaster/dmel6_ref.fasta -V gendb://all_mels_chr2L -L chr2L -O all_mels_chr2L.vcf.gz
19:40:48.803 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/nas/longleaf/apps/gatk/4.2.4.1/gatk-4.2.4.1/gatk-package-4.2.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 13, 2022 7:40:50 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
19:40:50.088 INFO GenotypeGVCFs - ------------------------------------------------------------
19:40:50.089 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.2.4.1
19:40:50.090 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
19:40:50.102 INFO GenotypeGVCFs - Executing as adagilis@t0601.ll.unc.edu on Linux v3.10.0-1160.2.2.el7.x86_64 amd64
19:40:50.103 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_292-b10
19:40:50.103 INFO GenotypeGVCFs - Start Date/Time: January 13, 2022 7:40:48 PM EST
19:40:50.103 INFO GenotypeGVCFs - ------------------------------------------------------------
19:40:50.103 INFO GenotypeGVCFs - ------------------------------------------------------------
19:40:50.104 INFO GenotypeGVCFs - HTSJDK Version: 2.24.1
19:40:50.104 INFO GenotypeGVCFs - Picard Version: 2.25.4
19:40:50.104 INFO GenotypeGVCFs - Built for Spark Version: 2.4.5
19:40:50.104 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
19:40:50.104 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
19:40:50.104 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
19:40:50.104 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
19:40:50.105 INFO GenotypeGVCFs - Deflater: IntelDeflater
19:40:50.105 INFO GenotypeGVCFs - Inflater: IntelInflater
19:40:50.105 INFO GenotypeGVCFs - GCS max retries/reopens: 20
19:40:50.105 INFO GenotypeGVCFs - Requester pays: disabled
19:40:50.105 INFO GenotypeGVCFs - Initializing engine
19:40:51.727 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.4.3-6069e4a
19:41:10.142 info NativeGenomicsDB - pid=29116 tid=29117 No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
19:41:10.143 info NativeGenomicsDB - pid=29116 tid=29117 No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
19:41:10.143 info NativeGenomicsDB - pid=29116 tid=29117 No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
19:41:28.066 INFO IntervalArgumentCollection - Processing 23513712 bp from intervals
19:41:28.137 INFO GenotypeGVCFs - Done initializing engine
19:41:28.487 INFO ProgressMeter - Starting traversal
19:41:28.488 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
19:42:19.471 INFO ProgressMeter - chr2L:5364 0.8 1000 1176.9
19:42:32.573 INFO ProgressMeter - chr2L:6364 1.1 2000 1872.5
19:42:48.677 INFO ProgressMeter - chr2L:8364 1.3 4000 2992.9
19:43:03.248 INFO ProgressMeter - chr2L:10364 1.6 6000 3799.1
Chromosome chr2L position 11353 (TileDB column 11352) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
19:43:18.779 INFO ProgressMeter - chr2L:12364 1.8 8000 4352.1
Chromosome chr2L position 13464 (TileDB column 13463) has too many alleles in the combined VCF record : 11 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
19:43:32.483 INFO ProgressMeter - chr2L:14364 2.1 10000 4838.9
19:43:43.998 INFO ProgressMeter - chr2L:16364 2.3 12000 5313.3
Chromosome chr2L position 16479 (TileDB column 16478) has too many alleles in the combined VCF record : 8 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr2L position 17004 (TileDB column 17003) has too many alleles in the combined VCF record : 13 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
19:43:57.857 INFO ProgressMeter - chr2L:17364 2.5 13000 5222.0
Chromosome chr2L position 18754 (TileDB column 18753) has too many alleles in the combined VCF record : 9 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
Chromosome chr2L position 19311 (TileDB column 19310) has too many alleles in the combined VCF record : 16 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
19:44:17.535 INFO ProgressMeter - chr2L:19364 2.8 15000 5324.0
Chromosome chr2L position 19835 (TileDB column 19834) has too many alleles in the combined VCF record : 11 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
19:44:34.904 INFO ProgressMeter - chr2L:21364 3.1 17000 5471.6
19:44:47.867 INFO ProgressMeter - chr2L:23364 3.3 19000 5717.8
Chromosome chr2L position 25349 (TileDB column 25348) has too many alleles in the combined VCF record : 7 : current limit : 6. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
19:45:00.952 INFO GenotypeGVCFs - Shutting down engine
GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),57.82864317899994,Cpu time(s),49.25545264700008
[January 13, 2022 7:45:01 PM EST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 4.22 minutes.
Runtime.totalMemory()=2076049408
java.lang.IllegalStateException: Genotype [Ark CTTT/CTTT GQ 24 DP 8 AD 0,8,0,0,0,0,0,0 {SB=0,0,4,4}] does not contain likelihoods necessary to calculate posteriors.
at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.AlleleFrequencyCalculator.log10NormalizedGenotypePosteriors(AlleleFrequencyCalculator.java:89)
at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.AlleleFrequencyCalculator.effectiveAlleleCounts(AlleleFrequencyCalculator.java:258)
at org.broadinstitute.hellbender.tools.walkers.genotyper.afcalc.AlleleFrequencyCalculator.calculate(AlleleFrequencyCalculator.java:141)
at org.broadinstitute.hellbender.tools.walkers.genotyper.GenotypingEngine.calculateGenotypes(GenotypingEngine.java:147)
at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFsEngine.calculateGenotypes(GenotypeGVCFsEngine.java:244)
at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFsEngine.regenotypeVC(GenotypeGVCFsEngine.java:152)
at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFsEngine.callRegion(GenotypeGVCFsEngine.java:135)
at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.apply(GenotypeGVCFs.java:283)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.lambda$traverse$0(VariantLocusWalker.java:135)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
(base) [adagilis@longleaf-login4 logs]$
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEachOrdered(ReferencePipeline.java:490)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.traverse(VariantLocusWalker.java:132)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289) -
Thanks for sharing this. It seems like there might be a bug in this newest GenotypeGVCFs version so I have created a ticket on the GATK github repo: https://github.com/broadinstitute/gatk/issues/7639. I'll follow up next week regarding what the team finds.
Are you able to test to see what version this problem originated in? This would help us to narrow down the problem.
Thank you,
Genevieve
-
Hi Genevieve, thank you for working on this. And thank you Andrius for posting the error.
I have the same problem. It is not just associated with --all-sites. The error seems to occur when the number of alleles is exactly one greater than the value of ` --max-alternate-alleles`. Note that is the case in Andrius's output: it throws a warning, but no error when there are more than 7 alleles, but when there are exactly 7 alleles (one more than the default max of 6) it errors out.
I have tried this on a number of different chromosomes (small test case with 4 Chinook salmon) and always get the error immediately after a warning about having 7 alleles. On one chromosome, an earlier warning about 8 alleles causes no error, UNLESS I set `--max-alternate-alleles` to 7, in which case, the position with 8 alleles causes the error immediately after the warning.
I will copy this the GitHub issue. It seems like it should be pretty easy to find and correct this bug, though I couldn't find it in the source code.
Please sign in to leave a comment.
37 comments