GenomicsDBImport not importing all the batches
Hi,
I am using GATK version 4.1.9.0.
I used GenomicsDBImport to create a data store for around 850 samples with a batch size of 50 (17 batches). I used the following script:
gatk --java-options "-Xmx80g" GenomicsDBImport \
--genomicsdb-workspace-path /prj/allsamples1 \
--batch-size 50 \
-L /prj/Felix.interval_list \
--max-num-intervals-to-import-in-parallel 8 \
--merge-input-intervals true \
--sample-name-map /prj/allplates.sample_map \
--tmp-dir /prj/allsamples_temp \
--genomicsdb-shared-posixfs-optimizations true \
2> /prj/allsamples1.err
The script ends without any error, but only 14 out of 17 batches have been imported. And there is a message at the end of the log file:
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx80g -jar /vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar GenomicsDBImport --genomicsdb-workspace-path /prj/allsamples1 --batch-size 50 -L /prj/Felix.interval_list --max-num-intervals-to-import-in-parallel 8 --merge-input-intervals true --sample-name-map /prj/allplates.sample_map --tmp-dir /prj/allsamples_temp --genomicsdb-shared-posixfs-optimizations true
11:40:18.254 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 18, 2021 11:40:18 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
11:40:18.422 INFO GenomicsDBImport - ------------------------------------------------------------
11:40:18.423 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.9.0
11:40:18.423 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
11:40:18.423 INFO GenomicsDBImport - Executing as vkumar@sumi008 on Linux v4.20.4-200.fc29.x86_64 amd64
11:40:18.423 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-b13
11:40:18.423 INFO GenomicsDBImport - Start Date/Time: January 18, 2021 11:40:18 AM CET
11:40:18.423 INFO GenomicsDBImport - ------------------------------------------------------------
11:40:18.423 INFO GenomicsDBImport - ------------------------------------------------------------
11:40:18.424 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
11:40:18.424 INFO GenomicsDBImport - Picard Version: 2.23.3
11:40:18.424 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:40:18.424 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:40:18.424 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:40:18.424 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:40:18.424 INFO GenomicsDBImport - Deflater: IntelDeflater
11:40:18.424 INFO GenomicsDBImport - Inflater: IntelInflater
11:40:18.424 INFO GenomicsDBImport - GCS max retries/reopens: 20
11:40:18.424 INFO GenomicsDBImport - Requester pays: disabled
11:40:18.424 INFO GenomicsDBImport - Initializing engine
11:40:19.319 INFO FeatureManager - Using codec IntervalListCodec to read file file:///prj/pflaphy-robot/Felix.interval_list
11:40:19.415 INFO IntervalArgumentCollection - Processing 195574599 bp from intervals
11:40:19.449 INFO GenomicsDBImport - Done initializing engine
11:40:19.782 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.2-e18fa63
11:40:19.787 INFO GenomicsDBImport - Vid Map JSON file will be written to /prj/pflaphy-robot/genomicsDB/allsamples1/vidmap.json
11:40:19.787 INFO GenomicsDBImport - Callset Map JSON file will be written to /prj/pflaphy-robot/genomicsDB/allsamples1/callset.json
11:40:19.787 INFO GenomicsDBImport - Complete VCF Header will be written to /prj/pflaphy-robot/genomicsDB/allsamples1/vcfheader.vcf
11:40:19.787 INFO GenomicsDBImport - Importing to workspace - /prj/pflaphy-robot/genomicsDB/allsamples1
11:40:19.787 INFO ProgressMeter - Starting traversal
11:40:19.788 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
11:40:29.785 INFO GenomicsDBImport - Importing batch 1 with 50 samples
11:40:29.785 INFO GenomicsDBImport - Importing batch 1 with 50 samples
11:40:29.785 INFO GenomicsDBImport - Importing batch 1 with 50 samples
11:40:29.785 INFO GenomicsDBImport - Importing batch 1 with 50 samples
11:40:29.785 INFO GenomicsDBImport - Importing batch 1 with 50 samples
11:40:29.785 INFO GenomicsDBImport - Importing batch 1 with 50 samples
11:40:29.785 INFO GenomicsDBImport - Importing batch 1 with 50 samples
11:40:29.785 INFO GenomicsDBImport - Importing batch 1 with 50 samples
12:52:09.956 INFO GenomicsDBImport - Importing batch 1 with 50 samples
12:52:15.267 INFO GenomicsDBImport - Importing batch 1 with 50 samples
12:52:19.722 INFO GenomicsDBImport - Importing batch 1 with 50 samples
.
.
.
.
17:55:44.306 INFO GenomicsDBImport - Importing batch 14 with 50 samples
17:55:48.103 INFO GenomicsDBImport - Importing batch 14 with 50 samples
17:55:55.777 INFO GenomicsDBImport - Importing batch 14 with 50 samples
17:56:01.707 INFO GenomicsDBImport - Importing batch 14 with 50 samples
17:56:05.540 INFO GenomicsDBImport - Importing batch 14 with 50 samples
17:56:08.106 INFO GenomicsDBImport - Importing batch 14 with 50 samples
17:56:11.068 INFO GenomicsDBImport - Importing batch 14 with 50 samples
17:56:14.010 INFO GenomicsDBImport - Importing batch 14 with 50 samples
17:56:16.030 INFO GenomicsDBImport - Importing batch 14 with 50 samples
17:56:21.266 INFO GenomicsDBImport - Importing batch 14 with 50 samples
17:56:23.300 INFO GenomicsDBImport - Importing batch 14 with 50 samples
17:56:24.399 INFO GenomicsDBImport - Importing batch 14 with 50 samples
17:56:26.681 INFO GenomicsDBImport - Importing batch 14 with 50 samples
17:56:30.449 INFO GenomicsDBImport - Importing batch 14 with 50 samples
17:56:31.552 INFO GenomicsDBImport - Importing batch 14 with 50 samples
19:16:12.025 INFO GenomicsDBImport - Shutting down engine
[January 19, 2021 7:16:12 PM CET] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 1,895.90 minutes.
Runtime.totalMemory()=8940158976
htsjdk.samtools.SAMFormatException: Did not inflate expected amount
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:147)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:550)
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:241)
at htsjdk.tribble.readers.TabixReader.readLine(TabixReader.java:215)
at htsjdk.tribble.readers.TabixReader.access$300(TabixReader.java:48)
at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:434)
at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:46)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:170)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:205)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:149)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$1$NoMnpIterator.next(GenomicsDBImport.java:851)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$1$NoMnpIterator.next(GenomicsDBImport.java:842)
at org.genomicsdb.importer.GenomicsDBImporterStreamWrapper.next(GenomicsDBImporterStreamWrapper.java:110)
at org.genomicsdb.importer.GenomicsDBImporter.doSingleImport(GenomicsDBImporter.java:580)
at org.genomicsdb.importer.GenomicsDBImporter.lambda$null$2(GenomicsDBImporter.java:703)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Why is it not importing all the batches? Space and memory are not issues here.
Thanks,
-
Hi Vinod Kumar, is there any chance you ran out of space in the temp directory while this was running?
-
Hi Genevieve Brandt (she/her),
Actually, I am working on a server, so overall space is not a problem. When I ran the script I had around 2 TB of space remaining, but I still don't know whether the temp directory consumes a lot of space. Do you think this much space is sufficient for 850 samples (genome size ~250 Mb)?
I also ran the same script twice, and every time it imported only 14 out of 17 batches, meaning it stopped at the same point in both runs.
Can I solve this issue by using fewer parallel intervals (--max-num-intervals-to-import-in-parallel 2 instead of 8) in my script?
Is it necessary to specify the temp directory option in the script?
Thanks,
-
Hi Vinod,
Thanks for that information. See if decreasing --max-num-intervals-to-import-in-parallel works. Your server may have a limit on the number of files open at once that could be interfering with GenomicsDBImport.
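A quick, generic way to check that limit on the server (not GATK-specific; the import opens a reader per sample GVCF plus its index in each parallel batch, so a low limit can bite on large cohorts):

```shell
# Print the per-process open-file limit for the current shell.
# If it is low (e.g. 1024), many parallel intervals x 50 samples
# per batch can exhaust it.
ulimit -n
```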
Let me know if that works and if not we can look into other options.
Yes, the temp directory should be specified as an option in the GATK command line for optimization.
Genevieve
-
Hi Genevieve Brandt (she/her),
I repeated my analysis with --max-num-intervals-to-import-in-parallel 2 instead of 8, and the analysis stopped at the same point: it imported only 14 out of 17 batches, just like my previous analyses with 8 parallel intervals. This is really frustrating, as it takes a lot of time and in the end I am not getting the final data store.
I was also checking the temp space using this:
df -h /tmp
Filesystem Size Used Avail Use% Mounted on
tmpfs 126G 3.6M 126G 1% /tmp
I don't know what to do. I tried many things and could not find a way to solve the issue.
How can I see the list of samples that were actually imported into the GenomicsDB workspace, so I can find where it stops every time? I looked into callset.json, which contains all the samples provided in the sample_map file.
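For reference, callset.json is plain JSON with a `callsets` array holding one entry per sample, so the registered samples can be counted with a one-liner. The file generated below is a hypothetical minimal example of that layout, for illustration only:

```shell
# Hypothetical minimal callset.json mimicking the layout GenomicsDB
# writes: a "callsets" array with one entry per sample.
cat > callset.json <<'EOF'
{
  "callsets": [
    { "sample_name": "sampleA", "row_idx": 0 },
    { "sample_name": "sampleB", "row_idx": 1 }
  ]
}
EOF

# Count the samples listed in callset.json. Note this reflects
# registration, which happens up front from the sample map -- not
# successful import -- which is why the real callset.json lists every
# sample even when later batches failed.
grep -c '"sample_name"' callset.json
```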
I also posted the error log in my first post; could you please look at the lines at the end of the file? It looks like there are some errors, but I don't fully understand them.
Do you need something from me to see what's going on here?
Thanks,
-
Hi Vinod Kumar,
I am sorry you are frustrated, thank you for providing me with all of this information. I am looking into this on my end as quickly as possible so that we can find a solution.
I have been using the error log from your first post to try to figure out the problem. What is going on here is that the process BlockGunzipper is not able to fully extract the contents of one of your compressed files, which is why I initially suspected an issue with the temp directory space.
htsjdk.samtools.SAMFormatException: Did not inflate expected amount
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:147)
I am going to reach out to my colleagues to determine whether there is a way to find which specific file is causing this issue. There could also be an issue where one of your files is malformed. I will get more information and get back to you. Thank you for your patience.
Genevieve
-
Hi Vinod Kumar - I heard back from my team. We think this looks like a GKL error. Could you try running your command with --use-jdk-inflater? If that doesn't work, also try --use-jdk-deflater.
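For instance, adding the flag to the original command would look like this (paths taken from the first post; a sketch, not a tested invocation):

```shell
gatk --java-options "-Xmx80g" GenomicsDBImport \
  --genomicsdb-workspace-path /prj/allsamples1 \
  --batch-size 50 \
  -L /prj/Felix.interval_list \
  --sample-name-map /prj/allplates.sample_map \
  --tmp-dir /prj/allsamples_temp \
  --genomicsdb-shared-posixfs-optimizations true \
  --use-jdk-inflater   # and, if that alone doesn't help, --use-jdk-deflater
```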
Let me know if this is successful.
Another note to keep in mind with GenomicsDBImport, which I haven't mentioned yet, is to make sure that you set the Java Xmx/Xms values to no more than 80% or 90% of the available physical memory to leave room for the C/C++ libraries. I don't think this is a problem in your case but I wanted to mention it in case you had not considered that.
-
Hi Genevieve Brandt (she/her),
Thank you very much for the responses. It looks like our server is overloaded now; once I have the results, I'll get back to you.
Thanks,
-
Hi Genevieve Brandt (she/her),
I can't import all the chromosomes in one go, so I divided the entire analysis by chromosome and ran separate scripts for each chromosome.
Now I have almost finished the analysis using -L with one chromosome at a time. But it was trial and error for me: some chromosomes worked without --use-jdk-inflater or --use-jdk-deflater, some with --use-jdk-inflater, and some with --use-jdk-deflater. Mostly, the analysis stopped after 14 or 15 of the 17 batches (50 samples per batch). I also compared the sizes of the chromosomes and could not correlate them with failed versus passed (all batches imported) analyses.
This is really a lot of work, and I was never sure whether a given run was going to work or not.
From these separate analyses, two kinds of errors are produced by the failed scripts:
First error: I couldn't solve this one even after trying all three options. It occurs for just one chromosome.
Using GATK jar /vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx50g -jar /vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar GenomicsDBImport --genomicsdb-workspace-path /prj/pflaphy-robot/genomicsDB/allsamples_chr1 --batch-size 50 -L /prj/pflaphy-robot/Felix_chr1.interval_list --use-jdk-inflater --sample-name-map /prj/pflaphy-robot/genoDB_allplates.sample_map --tmp-dir /prj/pflaphy-robot/genomicsDB/allsamples_temp --genomicsdb-shared-posixfs-optimizations true
08:56:13.576 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Feb 08, 2021 8:56:13 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
08:56:13.868 INFO GenomicsDBImport - ------------------------------------------------------------
08:56:13.869 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.9.0
08:56:13.869 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
08:56:13.869 INFO GenomicsDBImport - Executing as vkumar@suc01001 on Linux v5.4.0-47-generic amd64
08:56:13.869 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v11.0.8+10-post-Ubuntu-0ubuntu120.04
08:56:13.869 INFO GenomicsDBImport - Start Date/Time: February 8, 2021 at 8:56:13 AM CET
08:56:13.870 INFO GenomicsDBImport - ------------------------------------------------------------
08:56:13.870 INFO GenomicsDBImport - ------------------------------------------------------------
08:56:13.871 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
08:56:13.871 INFO GenomicsDBImport - Picard Version: 2.23.3
08:56:13.871 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:56:13.871 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:56:13.871 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:56:13.871 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:56:13.872 INFO GenomicsDBImport - Deflater: IntelDeflater
08:56:13.872 INFO GenomicsDBImport - Inflater: JdkInflater
08:56:13.873 INFO GenomicsDBImport - GCS max retries/reopens: 20
08:56:13.873 INFO GenomicsDBImport - Requester pays: disabled
08:56:13.873 INFO GenomicsDBImport - Initializing engine
08:56:14.356 INFO FeatureManager - Using codec IntervalListCodec to read file file:///prj/pflaphy-robot/Felix_chr1.interval_list
08:56:14.395 INFO IntervalArgumentCollection - Processing 27806075 bp from intervals
08:56:14.398 INFO GenomicsDBImport - Done initializing engine
08:56:15.319 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.2-e18fa63
08:56:15.334 INFO GenomicsDBImport - Vid Map JSON file will be written to /prj/pflaphy-robot/genomicsDB/allsamples_chr1/vidmap.json
08:56:15.334 INFO GenomicsDBImport - Callset Map JSON file will be written to /prj/pflaphy-robot/genomicsDB/allsamples_chr1/callset.json
08:56:15.335 INFO GenomicsDBImport - Complete VCF Header will be written to /prj/pflaphy-robot/genomicsDB/allsamples_chr1/vcfheader.vcf
08:56:15.335 INFO GenomicsDBImport - Importing to workspace - /prj/pflaphy-robot/genomicsDB/allsamples_chr1
08:56:15.335 INFO ProgressMeter - Starting traversal
08:56:15.336 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
08:56:18.769 INFO GenomicsDBImport - Importing batch 1 with 50 samples
12:27:21.492 INFO ProgressMeter - chr1:1 211.1 1 0.0
12:27:21.493 INFO GenomicsDBImport - Done importing batch 1/17
12:27:29.438 INFO GenomicsDBImport - Importing batch 2 with 50 samples
15:59:54.324 INFO ProgressMeter - chr1:1 423.6 2 0.0
15:59:54.325 INFO GenomicsDBImport - Done importing batch 2/17
16:00:16.191 INFO GenomicsDBImport - Importing batch 3 with 50 samples
03:07:15.928 INFO ProgressMeter - chr1:1 1091.0 3 0.0
03:07:15.930 INFO GenomicsDBImport - Done importing batch 3/17
03:07:26.651 INFO GenomicsDBImport - Importing batch 4 with 50 samples
06:38:30.390 INFO ProgressMeter - chr1:1 1302.3 4 0.0
06:38:30.391 INFO GenomicsDBImport - Done importing batch 4/17
06:38:41.336 INFO GenomicsDBImport - Importing batch 5 with 50 samples
10:16:39.429 INFO ProgressMeter - chr1:1 1520.4 5 0.0
10:16:39.430 INFO GenomicsDBImport - Done importing batch 5/17
10:16:47.547 INFO GenomicsDBImport - Importing batch 6 with 50 samples
13:49:39.219 INFO ProgressMeter - chr1:1 1733.4 6 0.0
13:49:39.220 INFO GenomicsDBImport - Done importing batch 6/17
13:49:49.366 INFO GenomicsDBImport - Importing batch 7 with 50 samples
17:29:35.040 INFO ProgressMeter - chr1:1 1953.3 7 0.0
17:29:35.041 INFO GenomicsDBImport - Done importing batch 7/17
17:29:45.156 INFO GenomicsDBImport - Importing batch 8 with 50 samples
20:59:14.608 INFO ProgressMeter - chr1:1 2163.0 8 0.0
20:59:14.609 INFO GenomicsDBImport - Done importing batch 8/17
20:59:24.900 INFO GenomicsDBImport - Importing batch 9 with 50 samples
00:31:45.349 INFO ProgressMeter - chr1:1 2375.5 9 0.0
00:31:45.351 INFO GenomicsDBImport - Done importing batch 9/17
00:31:47.132 INFO GenomicsDBImport - Importing batch 10 with 50 samples
04:04:47.652 INFO ProgressMeter - chr1:1 2588.5 10 0.0
04:04:47.653 INFO GenomicsDBImport - Done importing batch 10/17
04:04:57.703 INFO GenomicsDBImport - Importing batch 11 with 50 samples
07:32:53.234 INFO ProgressMeter - chr1:1 2796.6 11 0.0
07:32:53.235 INFO GenomicsDBImport - Done importing batch 11/17
07:33:03.058 INFO GenomicsDBImport - Importing batch 12 with 50 samples
11:12:01.061 INFO ProgressMeter - chr1:1 3015.8 12 0.0
11:12:01.062 INFO GenomicsDBImport - Done importing batch 12/17
11:12:12.095 INFO GenomicsDBImport - Importing batch 13 with 50 samples
14:40:58.513 INFO ProgressMeter - chr1:1 3224.7 13 0.0
14:40:58.514 INFO GenomicsDBImport - Done importing batch 13/17
14:41:09.444 INFO GenomicsDBImport - Importing batch 14 with 50 samples
18:26:09.530 INFO ProgressMeter - chr1:1 3449.9 14 0.0
18:26:09.531 INFO GenomicsDBImport - Done importing batch 14/17
18:26:20.343 INFO GenomicsDBImport - Importing batch 15 with 50 samples
22:23:19.723 INFO ProgressMeter - chr1:1 3687.1 15 0.0
22:23:19.724 INFO GenomicsDBImport - Done importing batch 15/17
22:23:29.623 INFO GenomicsDBImport - Importing batch 16 with 50 samples
01:22:19.590 INFO GenomicsDBImport - Shutting down engine
[February 11, 2021 at 1:22:19 AM CET] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 3,866.10 minutes.
Runtime.totalMemory()=4873781248
htsjdk.samtools.util.RuntimeIOException: java.util.zip.DataFormatException: invalid stored block lengths
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:161)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:550)
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:241)
at htsjdk.tribble.readers.TabixReader.readLine(TabixReader.java:215)
at htsjdk.tribble.readers.TabixReader.access$300(TabixReader.java:48)
at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:434)
at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:46)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:170)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:205)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:149)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$1$NoMnpIterator.next(GenomicsDBImport.java:851)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$1$NoMnpIterator.next(GenomicsDBImport.java:842)
at org.genomicsdb.importer.GenomicsDBImporterStreamWrapper.next(GenomicsDBImporterStreamWrapper.java:110)
at org.genomicsdb.importer.GenomicsDBImporter.doSingleImport(GenomicsDBImporter.java:580)
at org.genomicsdb.importer.GenomicsDBImporter.lambda$null$2(GenomicsDBImporter.java:703)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.util.zip.DataFormatException: invalid stored block lengths
at java.base/java.util.zip.Inflater.inflateBytesBytes(Native Method)
at java.base/java.util.zip.Inflater.inflate(Inflater.java:385)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:145)
... 23 more
Second error: I could solve this one by using the inflater/deflater options you suggested.
...just the end of the error log:
01:18:32.301 INFO GenomicsDBImport - Done importing batch 13/17
01:18:37.772 INFO GenomicsDBImport - Importing batch 14 with 50 samples
01:24:07.663 INFO GenomicsDBImport - Shutting down engine
[February 11, 2021 at 1:24:07 AM CET] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 752.13 minutes.
Runtime.totalMemory()=3271557120
htsjdk.samtools.SAMFormatException: Did not inflate expected amount
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:147)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:550)
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:241)
at htsjdk.tribble.readers.TabixReader.readLine(TabixReader.java:215)
at htsjdk.tribble.readers.TabixReader.access$300(TabixReader.java:48)
at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:434)
at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:46)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:170)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:205)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:149)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$1$NoMnpIterator.next(GenomicsDBImport.java:851)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$1$NoMnpIterator.next(GenomicsDBImport.java:842)
at org.genomicsdb.importer.GenomicsDBImporterStreamWrapper.next(GenomicsDBImporterStreamWrapper.java:110)
at org.genomicsdb.importer.GenomicsDBImporter.doSingleImport(GenomicsDBImporter.java:580)
at org.genomicsdb.importer.GenomicsDBImporter.lambda$null$2(GenomicsDBImporter.java:703)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Do you have any idea how we can avoid this problem in the future?
Thanks,
-
Thanks Vinod Kumar for your updates. I am so sorry this issue has been causing so much frustration. I will get you more information as soon as possible about 1) why this issue came up so that you can avoid it in the future and 2) how to solve the first error.
Let me know if you have any other questions I can address.
-
Hi Vinod Kumar,
For your first error (the one that is not solved), there is a possibility that the chromosome that failed has a problem with its VCFs. Could you run ValidateVariants, just to check adherence to the VCF format, with gatk ValidateVariants -V cohort.vcf.gz? Let us know what you find and we will have more information about the issue.
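Since the inputs here are per-sample GVCFs listed in a sample map, one way to validate each of them is a loop like the following (a sketch; the sample-map path and its tab-separated sample/path layout are taken from the earlier posts):

```shell
# Validate every GVCF listed in the sample map
# (format: sample<TAB>path-to-gvcf, one per line) and report failures.
while IFS=$'\t' read -r sample gvcf; do
  gatk ValidateVariants -V "$gvcf" \
    || echo "validation failed: $sample ($gvcf)"
done < /prj/allplates.sample_map
```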
For the second error (solved with --use-jdk-inflater or --use-jdk-deflater), this could be a bug in GKL. We don't maintain GKL but we can reach out to the people who do. Would you be able to submit a bug report with all the files necessary for one of your chromosomes that has this problem? Please let me know the file folder name once you have uploaded.
Thank you for your patience as we look into this.
Best,
Genevieve
-
I encountered the same error when using HaplotypeCaller. As Pamela Bretscher suggested, I tried using those two options, but it didn't seem to work in my case. Do you have any other solutions for this bug?
Thank you very much in advance!
Best regards!
WANG
-
Hi Jacob Wang,
I am working with Pamela on your other post, so I don't have any alternative suggestions at this time. Please let us know if you have other questions you want us to take a look at as well.
Best,
Genevieve
-
Thank you all the same. I hope it can be solved in a future version.
-
Hi, Genevieve Brandt (she/her)
I think Pamela Bretscher and I have solved the problem. I re-ran both the BQSR and HaplotypeCaller steps with the --use-jdk-inflater and --use-jdk-deflater options, and got GVCF files of the correct size with all the chromosomes called. (Re-running only HaplotypeCaller with the options did not work.)
It seems that with the default Intel inflater/deflater, BQSR can generate a seemingly complete BAM file in which some blocks actually have compression errors. Then, when HaplotypeCaller tries to read those blocks, the program terminates.
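A sketch of that re-run (the reference, BAM, and recalibration-table file names here are hypothetical; only the two flags are the point):

```shell
# Re-run BQSR with the JDK codecs so the output BAM's blocks avoid
# the suspected GKL compression bug...
gatk ApplyBQSR -R ref.fasta -I input.bam \
  --bqsr-recal-file recal.table \
  --use-jdk-inflater --use-jdk-deflater \
  -O recal.bam

# ...then call variants from that BAM with the same codecs.
gatk HaplotypeCaller -R ref.fasta -I recal.bam -ERC GVCF \
  --use-jdk-inflater --use-jdk-deflater \
  -O sample.g.vcf.gz
```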
I think some other people have also encountered the same problem without a solution. Shall we report this bug somewhere?
Best regards!
WANG
-
I'll also post the issue ticket here for reference, Jacob Wang: https://github.com/broadinstitute/gatk/issues/7582
-
Thank you for the follow up and reporting this bug Jacob Wang!
-
Hi, Genevieve Brandt (she/her)
It's my pleasure to work with the GATK team and be able to contribute. Over the past weekend I ran the subsequent steps (from merging GVCFs to annotation) using those GVCFs, and no problems occurred. For now, I think I can say the problem has been fixed. I will post my solution on the GitHub page for others to reference.
-
Great news! I'm glad you were able to find the solution!