GenomicsDBImport: A USER ERROR has occurred: Failed to create reader from file
Good afternoon.
I am trying to import the g.vcf files below into a GenomicsDB before joint genotyping. I keep getting the following error and would appreciate any help in solving it. I am new to GATK; the g.vcf files were created with HaplotypeCaller.
Many thanks for your time,
Jen
a) GATK version used :
GATK/4.0.10.1
b) Exact GATK commands used
gatk GenomicsDBImport \
--variant 3_335063_QUEEN.g.vcf \
--variant 4_335063_GRANDMOTHER.g.vcf \
--variant 5_335063_GREATGRANDMOTHER.g.vcf \
--variant BSH__739.g.vcf \
--variant 742_SISTER.g.vcf \
--variant 1_335063.g.vcf \
--genomicsdb-workspace-path Luna_database/ \
--intervals chr1,chr2
c) The entire error log if applicable.
Using GATK jar /gpfs/igmmfs01/software/pkg/el7/apps/GATK/4.0.10.1/gatk-package-4.0.10.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gpfs/igmmfs01/software/pkg/el7/apps/GATK/4.0.10.1/gatk-package-4.0.10.1-local.jar GenomicsDBImport --variant 3_335063_QUEEN.g.vcf --variant 4_335063_GRANDMOTHER.g.vcf --variant 5_335063_GREATGRANDMOTHER.g.vcf --variant BSH__739.g.vcf --variant 742_SISTER.g.vcf --variant 1_335063.g.vcf --genomicsdb-workspace-path Luna_database/ --intervals chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9.chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19
16:24:50.503 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/igmmfs01/software/pkg/el7/apps/GATK/4.0.10.1/gatk-package-4.0.10.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:24:52.197 INFO GenomicsDBImport - ------------------------------------------------------------
16:24:52.197 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.0.10.1
16:24:52.197 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
16:24:52.198 INFO GenomicsDBImport - Executing as s0782801@node3b02.ecdf.ed.ac.uk on Linux v3.10.0-327.36.3.el7.x86_64 amd64
16:24:52.198 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_171-b10
16:24:52.198 INFO GenomicsDBImport - Start Date/Time: 23 January 2020 16:24:50 GMT
16:24:52.199 INFO GenomicsDBImport - ------------------------------------------------------------
16:24:52.199 INFO GenomicsDBImport - ------------------------------------------------------------
16:24:52.200 INFO GenomicsDBImport - HTSJDK Version: 2.16.1
16:24:52.200 INFO GenomicsDBImport - Picard Version: 2.18.13
16:24:52.200 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:24:52.200 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:24:52.200 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:24:52.200 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:24:52.200 INFO GenomicsDBImport - Deflater: IntelDeflater
16:24:52.200 INFO GenomicsDBImport - Inflater: IntelInflater
16:24:52.200 INFO GenomicsDBImport - GCS max retries/reopens: 20
16:24:52.201 INFO GenomicsDBImport - Requester pays: disabled
16:24:52.201 INFO GenomicsDBImport - Initializing engine
16:24:52.339 INFO GenomicsDBImport - Shutting down engine
[23 January 2020 16:24:52 GMT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=1194328064
***********************************************************************
A USER ERROR has occurred: Failed to create reader from file:///exports/cmvm/eddie/eb/groups/schoenebeck_group/JENNI/gVCF_Luna/3_335063_QUEEN.g.vcf
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
-
It is possible that your g.vcfs have missing index files. Can you please check for that?
Tip: Whenever you get a "Failed to create reader" error, you should test your g.vcfs with SelectVariants.
-
Hi Bhanu Gandham ,
Thank you for your reply.
The g.vcf files have their index files in the same directory, for example "3_335063_QUEEN.g.vcf.gz.tbi". Is it OK for them to be in this format, or is this the problem?
When I ran SelectVariants on just one of the g.vcf files it did run, but I got the following log. I am confused as to why it says it processed 0 total variants.
gatk SelectVariants -R Felis_catus.Felis_catus_9.0.dna.toplevel.fa -V 1_335063.g.vcf --select-type-to-include SNP -O output.vcf
Using GATK jar /gpfs/igmmfs01/software/pkg/el7/apps/GATK/4.0.10.1/gatk-package-4.0.10.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gpfs/igmmfs01/software/pkg/el7/apps/GATK/4.0.10.1/gatk-package-4.0.10.1-local.jar SelectVariants -R Felis_catus.Felis_catus_9.0.dna.toplevel.fa -V 1_335063.g.vcf --select-type-to-include SNP -O output.vcf
11:28:49.894 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/igmmfs01/software/pkg/el7/apps/GATK/4.0.10.1/gatk-package-4.0.10.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
11:28:51.564 INFO SelectVariants - ------------------------------------------------------------
11:28:51.564 INFO SelectVariants - The Genome Analysis Toolkit (GATK) v4.0.10.1
11:28:51.564 INFO SelectVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
11:28:51.565 INFO SelectVariants - Executing as s0782801@node3c02.ecdf.ed.ac.uk on Linux v3.10.0-327.36.3.el7.x86_64 amd64
11:28:51.565 INFO SelectVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_74-b02
11:28:51.565 INFO SelectVariants - Start Date/Time: 29 January 2020 11:28:49 GMT
11:28:51.565 INFO SelectVariants - ------------------------------------------------------------
11:28:51.565 INFO SelectVariants - ------------------------------------------------------------
11:28:51.566 INFO SelectVariants - HTSJDK Version: 2.16.1
11:28:51.566 INFO SelectVariants - Picard Version: 2.18.13
11:28:51.566 INFO SelectVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:28:51.566 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:28:51.566 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:28:51.566 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:28:51.566 INFO SelectVariants - Deflater: IntelDeflater
11:28:51.567 INFO SelectVariants - Inflater: IntelInflater
11:28:51.567 INFO SelectVariants - GCS max retries/reopens: 20
11:28:51.567 INFO SelectVariants - Requester pays: disabled
11:28:51.567 INFO SelectVariants - Initializing engine
11:28:52.059 INFO FeatureManager - Using codec VCFCodec to read file file:///exports/cmvm/eddie/eb/groups/schoenebeck_group/JENNI/gVCF_Luna/1_335063.g.vcf
11:28:52.152 INFO SelectVariants - Done initializing engine
11:28:52.526 INFO ProgressMeter - Starting traversal
11:28:52.526 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
12:01:42.043 INFO SelectVariants - 777401876 variant(s) filtered by: (AllowAllVariantsVariantFilter AND VariantTypesVariantFilter)
777401876 variant(s) filtered by: VariantTypesVariantFilter
12:01:42.044 INFO ProgressMeter - unmapped 32.8 0 0.0
12:01:42.045 INFO ProgressMeter - Traversal complete. Processed 0 total variants in 32.8 minutes.
12:01:42.059 INFO SelectVariants - Shutting down engine
[29 January 2020 12:01:42 GMT] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 32.87 minutes.
Any advice as to why I am still unable to run GenomicsDBImport, and why SelectVariants processed 0 total variants, would be appreciated.
Many thanks,
Jen
-
The issue is that the input you are providing to GenomicsDBImport is an uncompressed g.vcf file, while the index file you shared with me is for a compressed g.vcf, i.e. "3_335063_QUEEN.g.vcf.gz.tbi".
HaplotypeCaller outputs an uncompressed VCF and produces the index file itself, giving you a .g.vcf and a .g.vcf.idx file. Try to use those input vcfs and vcf.idx files and see if that resolves the issue.
Note: When you compress a normal VCF file to .vcf.gz and use tabix to index it, you get a .vcf.gz.tbi file.
My advice is to not compress the GVCF, and to use the vcf and idx files produced by HaplotypeCaller to see if it resolves the issue.
-
Hi Bhanu, Thank you for your reply.
However, when I run HaplotypeCaller I am getting a .g.vcf.gz as output, as the example script details this:
example script:
gatk --java-options "-Xmx4g" HaplotypeCaller \
-R Homo_sapiens_assembly38.fasta \
-I input.bam \
-O output.g.vcf.gz \
-ERC GVCF
If I want to get an uncompressed g.vcf, do I just change the output to .g.vcf? Would this automatically also change the index to uncompressed? (I thought it safer to ask than risk running it again.)
Many thanks, Jenni
-
- If you are getting a .g.vcf.gz as output, then please use the corresponding index file also generated by HaplotypeCaller. The point is that they should either both be compressed or both uncompressed. From the information you provided above, it looks like the vcf is named `3_335063_QUEEN.g.vcf` and the index file `3_335063_QUEEN.g.vcf.gz.tbi`, hence the inconsistency.
- Please upgrade to the latest GATK v4.1.5.0 and try again.
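The "both compressed or both uncompressed" rule above can be checked by file name before running GenomicsDBImport. The following is a minimal sketch, not a GATK tool: the function name `check_gvcf_index` is made up for illustration, and it assumes the usual naming where an uncompressed `FILE.g.vcf` sits next to `FILE.g.vcf.idx` and a compressed `FILE.g.vcf.gz` sits next to `FILE.g.vcf.gz.tbi`. It only looks at names and does not verify that an index is readable.

```shell
#!/usr/bin/env bash
# check_gvcf_index is a hypothetical helper, not part of GATK.
# For each gVCF, report whether the index matching its compression state
# exists alongside it:
#   FILE.g.vcf    -> FILE.g.vcf.idx     (uncompressed, Tribble index)
#   FILE.g.vcf.gz -> FILE.g.vcf.gz.tbi  (block-compressed, tabix index)
check_gvcf_index() {
    local vcf idx status=0
    for vcf in "$@"; do
        case "$vcf" in
            *.vcf.gz) idx="$vcf.tbi" ;;
            *.vcf)    idx="$vcf.idx" ;;
            *) echo "SKIP    $vcf (not .vcf or .vcf.gz)"; continue ;;
        esac
        if [ ! -f "$vcf" ]; then
            echo "NOFILE  $vcf"
            status=1
        elif [ -f "$idx" ]; then
            echo "OK      $vcf"
        else
            echo "MISSING $vcf (expected $idx)"
            status=1
        fi
    done
    # Non-zero exit status if any input lacked its file or matching index.
    return "$status"
}
```

Called as `check_gvcf_index 3_335063_QUEEN.g.vcf`, this would print a MISSING line in the situation described above, where only `3_335063_QUEEN.g.vcf.gz.tbi` is present.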
-
Hi there,
I have the same error message "A USER ERROR has occurred: Failed to create reader from file:...". Both the g.vcf files and the g.vcf.idx files are in the same directory. I also ran ValidateVariants and here is the output:
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /usr/local/gatk/4.1.7.0/gatk-package-4.1.7.0-local.jar ValidateVariants -V Pre-transplant_recipient-2_S7_TEST.g.vcf
08:02:11.336 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/gatk/4.1.7.0/gatk-package-4.1.7.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Aug 06, 2021 8:02:11 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
08:02:11.503 INFO ValidateVariants - ------------------------------------------------------------
08:02:11.503 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.7.0
08:02:11.503 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
08:02:11.504 INFO ValidateVariants - Executing as dxy257@hpc4 on Linux v3.10.0-1160.36.2.el7.x86_64 amd64
08:02:11.504 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_292-b10
08:02:11.504 INFO ValidateVariants - Start Date/Time: August 6, 2021 8:02:11 AM EDT
08:02:11.504 INFO ValidateVariants - ------------------------------------------------------------
08:02:11.504 INFO ValidateVariants - ------------------------------------------------------------
08:02:11.505 INFO ValidateVariants - HTSJDK Version: 2.21.2
08:02:11.505 INFO ValidateVariants - Picard Version: 2.21.9
08:02:11.505 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:02:11.505 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:02:11.505 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:02:11.505 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:02:11.505 INFO ValidateVariants - Deflater: IntelDeflater
08:02:11.505 INFO ValidateVariants - Inflater: IntelInflater
08:02:11.505 INFO ValidateVariants - GCS max retries/reopens: 20
08:02:11.505 INFO ValidateVariants - Requester pays: disabled
08:02:11.506 INFO ValidateVariants - Initializing engine
08:02:11.852 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/rstor/SOM_EPBI_XXZ10/dxy257/rotation1/data/blood_trans/bwa_mem_alignment_result/Pre-transplant_recipient-2_S7_TEST.g.vcf
08:02:11.900 INFO ValidateVariants - Done initializing engine
08:02:11.900 WARN ValidateVariants - IDS validation cannot be done because no DBSNP file was provided
08:02:11.900 WARN ValidateVariants - Other possible validations will still be performed
08:02:11.900 WARN ValidateVariants - REF validation cannot be done because no reference file was provided
08:02:11.900 WARN ValidateVariants - Other possible validations will still be performed
08:02:11.900 INFO ProgressMeter - Starting traversal
08:02:11.900 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
08:02:12.170 INFO ProgressMeter - GL000219.1:150334 0.0 12270 2747014.9
08:02:12.170 INFO ProgressMeter - Traversal complete. Processed 12270 total variants in 0.0 minutes.
08:02:12.170 INFO ValidateVariants - Shutting down engine
[August 6, 2021 8:02:12 AM EDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1707081728
Could you please let me know what the problem is? Thank you!
Diya
-
Hi Diya,
Please post the exact command you are using, the version of the tool and the entire error log.
-
Hi,
Has Jennifer's issue been resolved? I'm also getting an "A USER ERROR has occurred: Failed to create reader from file" error for a compressed vcf.gz which has its vcf.gz.tbi index. I ran SelectVariants on this gvcf and, similarly to Jennifer, I get that 0 variants are processed in 26 minutes. Both the compressed gvcf and its index are direct outputs from HaplotypeCaller. Do you have any advice on how to resolve this (all other gvcfs work in GenomicsDB)? Thanks, Nisha
-
Nisha Dwivedi It's hard to say what's going on without further information. If possible, could you set the following environment variable and then rerun in order to generate a complete stacktrace?
GATK_STACKTRACE_ON_USER_EXCEPTION=true
It's also necessary to know your GATK version and what type of machine you're running on.
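One way to set that variable for a single rerun, without exporting it for the whole session, is sketched below. The wrapper name `with_stacktrace` is made up for illustration. The error message in the original post also mentions an equivalent route through `--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'`.

```shell
#!/usr/bin/env bash
# with_stacktrace is a hypothetical convenience wrapper, not part of GATK.
# It runs the given command with GATK_STACKTRACE_ON_USER_EXCEPTION=true set
# in that command's environment only.
with_stacktrace() {
    GATK_STACKTRACE_ON_USER_EXCEPTION=true "$@"
}
```

To use it, prefix the failing command: `with_stacktrace gatk GenomicsDBImport` followed by the same arguments as before.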
-
Hello, I'm having the same error. I'm including the g.vcf and .idx files generated by HaplotypeCaller, however, the error persists. What can I do to solve this?
-
Hi Keity Farfán
Can you run the IndexFeatureFile tool on the g.vcf file to recreate the index? It is possible that the index file is corrupt and therefore cannot be read.
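A rough sketch of that re-indexing step follows. The function name `reindex_gvcf` and the `GATK_BIN` and `INPUT_FLAG` variables are made up for illustration. The input flag is an assumption to verify against `gatk IndexFeatureFile --help` for your version: recent releases take `-I`, while older 4.0.x and early 4.1.x releases used `-F`. Any existing index is renamed to `.bak` rather than deleted, so nothing is lost if the rebuild fails.

```shell
#!/usr/bin/env bash
# reindex_gvcf is a hypothetical helper, not part of GATK.
# GATK_BIN and INPUT_FLAG are illustrative shell variables, not GATK settings.
reindex_gvcf() {
    local vcf="$1"
    local gatk_bin="${GATK_BIN:-gatk}"
    # Assumption: -I on recent GATK releases, -F on older ones.
    local input_flag="${INPUT_FLAG:--I}"
    local idx
    # Move a possibly corrupt index aside so a fresh one is written.
    for idx in "$vcf.idx" "$vcf.tbi"; do
        if [ -f "$idx" ]; then
            mv "$idx" "$idx.bak"
        fi
    done
    "$gatk_bin" IndexFeatureFile "$input_flag" "$vcf"
}
```

For example, `reindex_gvcf sample.g.vcf` on a recent release, or `INPUT_FLAG=-F reindex_gvcf sample.g.vcf` on an older one, where `sample.g.vcf` stands in for your own file.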