GenomicsDBImport - A USER ERROR has occurred: Failed to create reader from file: 2.0
I am receiving the following error when trying to run GenomicsDBImport:
Using GATK jar /local/cluster/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx26g -Xms26g -jar /local/cluster/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar GenomicsDBImport --genomicsdb-workspace-path pre_calibration/cohort_database -L file_paths/interval_list.txt --batch-size 50 --sample-name-map file_paths/map_pre_cal.txt --reader-threads 5
13:13:57.383 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/local/cluster/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
May 07, 2021 1:13:57 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
13:13:57.713 INFO GenomicsDBImport - ------------------------------------------------------------
13:13:57.714 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.2.0.0
13:13:57.714 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
13:13:57.716 INFO GenomicsDBImport - Executing as johnjare@chrom18.cgrb.oregonstate.local on Linux v3.10.0-957.12.2.el7.x86_64 amd64
13:13:57.716 INFO GenomicsDBImport - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_71-b15
13:13:57.717 INFO GenomicsDBImport - Start Date/Time: May 7, 2021 1:13:57 PM PDT
13:13:57.717 INFO GenomicsDBImport - ------------------------------------------------------------
13:13:57.717 INFO GenomicsDBImport - ------------------------------------------------------------
13:13:57.718 INFO GenomicsDBImport - HTSJDK Version: 2.24.0
13:13:57.718 INFO GenomicsDBImport - Picard Version: 2.25.0
13:13:57.718 INFO GenomicsDBImport - Built for Spark Version: 2.4.5
13:13:57.718 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:13:57.718 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:13:57.718 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:13:57.719 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:13:57.719 INFO GenomicsDBImport - Deflater: IntelDeflater
13:13:57.719 INFO GenomicsDBImport - Inflater: IntelInflater
13:13:57.719 INFO GenomicsDBImport - GCS max retries/reopens: 20
13:13:57.719 INFO GenomicsDBImport - Requester pays: disabled
13:13:57.719 INFO GenomicsDBImport - Initializing engine
13:13:57.865 INFO GenomicsDBImport - Shutting down engine
[May 7, 2021 1:13:57 PM PDT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=26754416640
***********************************************************************
A USER ERROR has occurred: Failed to create reader from file:///variant_calls/pre_calibration/per_isolate/E19.vcf.gz
***********************************************************************
Per the recommendations in other posts about this issue, I have:
- Double-checked that there is a .vcf.gz.tbi index file corresponding to each .vcf.gz file and that both the index file and the VCF file are compressed.
- Ran ValidateVariants on the VCF file (see the output below).
- Re-compressed and re-indexed the VCF file using bgzip and bcftools index -t (see the commands below).
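For reference, the re-compression and re-indexing looked roughly like this (shown for E19 only; the temporary filename is just illustrative):
# Re-compress with bgzip so the file is block-gzipped, then rebuild the .tbi index
zcat pre_calibration/per_isolate/E19.vcf.gz | bgzip -c > pre_calibration/per_isolate/E19.tmp.vcf.gz
mv pre_calibration/per_isolate/E19.tmp.vcf.gz pre_calibration/per_isolate/E19.vcf.gz
bcftools index -t pre_calibration/per_isolate/E19.vcf.gz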
Here is the output from ValidateVariants:
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /local/cluster/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar ValidateVariants -R reference/streptococcus_macedonicus_pangenome.fa -V pre_calibration/per_isolate/E19.vcf.gz --validation-type-to-exclude ALL
13:07:57.420 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/local/cluster/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
May 07, 2021 1:07:58 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
13:07:58.393 INFO ValidateVariants - ------------------------------------------------------------
13:07:58.394 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.2.0.0
13:07:58.394 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
13:07:58.395 INFO ValidateVariants - Executing as johnjare@chrom16.cgrb.oregonstate.local on Linux v3.10.0-1160.15.2.el7.x86_64 amd64
13:07:58.396 INFO ValidateVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_71-b15
13:07:58.396 INFO ValidateVariants - Start Date/Time: May 7, 2021 1:07:57 PM PDT
13:07:58.396 INFO ValidateVariants - ------------------------------------------------------------
13:07:58.396 INFO ValidateVariants - ------------------------------------------------------------
13:07:58.397 INFO ValidateVariants - HTSJDK Version: 2.24.0
13:07:58.397 INFO ValidateVariants - Picard Version: 2.25.0
13:07:58.397 INFO ValidateVariants - Built for Spark Version: 2.4.5
13:07:58.397 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:07:58.397 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:07:58.397 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:07:58.397 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:07:58.398 INFO ValidateVariants - Deflater: IntelDeflater
13:07:58.398 INFO ValidateVariants - Inflater: IntelInflater
13:07:58.398 INFO ValidateVariants - GCS max retries/reopens: 20
13:07:58.398 INFO ValidateVariants - Requester pays: disabled
13:07:58.398 INFO ValidateVariants - Initializing engine
13:08:05.493 INFO FeatureManager - Using codec VCFCodec to read file file://pre_calibration/per_isolate/E19.vcf.gz
13:08:06.270 INFO ValidateVariants - Done initializing engine
13:08:06.270 INFO ProgressMeter - Starting traversal
13:08:06.271 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
13:08:07.145 INFO ProgressMeter - T296_01859:358 0.0 28595 1969804.8
13:08:07.146 INFO ProgressMeter - Traversal complete. Processed 28595 total variants in 0.0 minutes.
13:08:07.146 INFO ValidateVariants - Shutting down engine
[May 7, 2021 1:08:07 PM PDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.17 minutes.
My interpretation is that my VCF file is fine. Is this correct?
None of the steps above has solved the problem, and I appear to be having issues with more than one VCF file (though not all of them). This is not my first time running this pipeline; however, I used a different reference when mapping my sequences previously, and this error did not occur during those earlier runs.
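To narrow down which files are affected, this is the kind of quick check I can run over the sample-name map (assuming the standard two-column format: sample name, tab, VCF path):
# Hypothetical sanity check over every VCF listed in the sample-name map:
# confirm the .tbi index exists and that the header is readable.
while IFS=$'\t' read -r sample vcf; do
    [ -f "${vcf}.tbi" ] || echo "missing index: ${vcf}"
    bcftools view -h "${vcf}" > /dev/null || echo "unreadable header: ${vcf}"
done < file_paths/map_pre_cal.txt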
-
Hi Jared Johnson,
How did you generate the output VCFs from HaplotypeCaller for these files? Could you share the command you used for E19.vcf.gz?
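For example, it would help to know whether it was a typical GVCF-mode command along these lines (the BAM filename here is only a placeholder):
gatk HaplotypeCaller \
    -R reference/streptococcus_macedonicus_pangenome.fa \
    -I E19.bam \
    -O pre_calibration/per_isolate/E19.vcf.gz \
    -ERC GVCF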
Best,
Genevieve