Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GenomicsDBImport: A USER ERROR has occurred: Failed to create reader from file

Answered
11 comments

  • Bhanu Gandham

    Hi Jennifer Irving-McGrath

    It is possible that your g.vcf files are missing their index files. Can you please check for that?

    Tip: Whenever you get a "Failed to create reader" error, you should test your g.vcf files with SelectVariants.

  • Jennifer Irving-McGrath

    Hi Bhanu Gandham ,

    Thank you for your reply.

    The g.vcf files have their index files in the same directory. For example "3_335063_QUEEN.g.vcf.gz.tbi". Is it ok for them to be in this format or is this the problem?

     

    When I ran SelectVariants on just one of the g.vcf files, it did run, but I got the following log. I am confused as to why it says it processed 0 total variants.

    gatk SelectVariants -R Felis_catus.Felis_catus_9.0.dna.toplevel.fa -V 1_335063.g.vcf --select-type-to-include SNP -O output.vcf

    Using GATK jar /gpfs/igmmfs01/software/pkg/el7/apps/GATK/4.0.10.1/gatk-package-4.0.10.1-local.jar

    Running:

        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gpfs/igmmfs01/software/pkg/el7/apps/GATK/4.0.10.1/gatk-package-4.0.10.1-local.jar SelectVariants -R Felis_catus.Felis_catus_9.0.dna.toplevel.fa -V 1_335063.g.vcf --select-type-to-include SNP -O output.vcf

    11:28:49.894 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/igmmfs01/software/pkg/el7/apps/GATK/4.0.10.1/gatk-package-4.0.10.1-local.jar!/com/intel/gkl/native/libgkl_compression.so

    11:28:51.564 INFO  SelectVariants - ------------------------------------------------------------

    11:28:51.564 INFO  SelectVariants - The Genome Analysis Toolkit (GATK) v4.0.10.1

    11:28:51.564 INFO  SelectVariants - For support and documentation go to https://software.broadinstitute.org/gatk/

    11:28:51.565 INFO  SelectVariants - Executing as s0782801@node3c02.ecdf.ed.ac.uk on Linux v3.10.0-327.36.3.el7.x86_64 amd64

    11:28:51.565 INFO  SelectVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_74-b02

    11:28:51.565 INFO  SelectVariants - Start Date/Time: 29 January 2020 11:28:49 GMT

    11:28:51.565 INFO  SelectVariants - ------------------------------------------------------------

    11:28:51.565 INFO  SelectVariants - ------------------------------------------------------------

    11:28:51.566 INFO  SelectVariants - HTSJDK Version: 2.16.1

    11:28:51.566 INFO  SelectVariants - Picard Version: 2.18.13

    11:28:51.566 INFO  SelectVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2

    11:28:51.566 INFO  SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false

    11:28:51.566 INFO  SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true

    11:28:51.566 INFO  SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false

    11:28:51.566 INFO  SelectVariants - Deflater: IntelDeflater

    11:28:51.567 INFO  SelectVariants - Inflater: IntelInflater

    11:28:51.567 INFO  SelectVariants - GCS max retries/reopens: 20

    11:28:51.567 INFO  SelectVariants - Requester pays: disabled

    11:28:51.567 INFO  SelectVariants - Initializing engine

    11:28:52.059 INFO  FeatureManager - Using codec VCFCodec to read file file:///exports/cmvm/eddie/eb/groups/schoenebeck_group/JENNI/gVCF_Luna/1_335063.g.vcf

    11:28:52.152 INFO  SelectVariants - Done initializing engine

    11:28:52.526 INFO  ProgressMeter - Starting traversal

    11:28:52.526 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute

    12:01:42.043 INFO  SelectVariants - 777401876 variant(s) filtered by: (AllowAllVariantsVariantFilter AND VariantTypesVariantFilter)

      777401876 variant(s) filtered by: VariantTypesVariantFilter

     

    12:01:42.044 INFO  ProgressMeter -             unmapped             32.8                     0              0.0

    12:01:42.045 INFO  ProgressMeter - Traversal complete. Processed 0 total variants in 32.8 minutes.

    12:01:42.059 INFO  SelectVariants - Shutting down engine

    [29 January 2020 12:01:42 GMT] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 32.87 minutes.

     

     

    Any advice as to why I am still unable to run GenomicsDBImport, and why SelectVariants processed 0 total variants, would be appreciated.

    Many thanks,

    Jen

  • Bhanu Gandham

    Jennifer Irving-McGrath

     

    The issue is that the input you are providing to GenomicsDBImport is an uncompressed g.vcf file, but the index file you shared with me is for a compressed g.vcf, i.e. "3_335063_QUEEN.g.vcf.gz.tbi".

    HaplotypeCaller outputs an uncompressed VCF and produces the index file itself, giving you a .g.vcf and a .g.vcf.idx file. Try using those .g.vcf and .g.vcf.idx files as input and see if that resolves the issue.

    Note: When you compress a normal VCF file to .vcf.gz and index it with tabix, you get a .vcf.gz.tbi file.

    My advice is to not compress the GVCF, and to use the vcf and idx files produced by HaplotypeCaller to see if that resolves the issue.
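
The pairing described in this reply can be checked before running GenomicsDBImport. The following is a minimal sketch and not part of GATK: the function names `expected_index` and `has_matching_index` are hypothetical, and it assumes the usual naming in which an uncompressed `.vcf` sits next to a `.vcf.idx` file and a block-compressed `.vcf.gz` sits next to a `.vcf.gz.tbi` file.

```shell
# Hypothetical helpers (not part of GATK).

# Print the index file name conventionally stored next to a VCF/GVCF.
expected_index() {
    case "$1" in
        *.vcf.gz) printf '%s.tbi\n' "$1" ;;   # block-compressed: tabix index
        *.vcf)    printf '%s.idx\n' "$1" ;;   # uncompressed: .idx index
        *)        return 1 ;;                 # unrecognized extension
    esac
}

# Succeed only if the data file and its expected index both exist.
has_matching_index() {
    idx=$(expected_index "$1") || return 1
    [ -f "$1" ] && [ -f "$idx" ]
}
```

For example, `has_matching_index 1_335063.g.vcf || echo "index missing or mismatched"` would report a problem when only `1_335063.g.vcf.gz.tbi` is present.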

  • Jennifer Irving-McGrath

    Hi Bhanu, Thank you for your reply.

    However, when I run HaplotypeCaller, I get a .g.vcf.gz file as output, as that is what the example script specifies.

    Example script:

    gatk --java-options "-Xmx4g" HaplotypeCaller  \
       -R Homo_sapiens_assembly38.fasta \
       -I input.bam \
       -O output.g.vcf.gz \
       -ERC GVCF

     

     

    If I want to get an uncompressed g.vcf, do I just change the output to .g.vcf? Would this automatically also change the index to the uncompressed format? (I thought it safer to ask than risk running it again.)

     

    Many thanks, Jenni

  • Bhanu Gandham

    Hi Jennifer Irving-McGrath

     

    1. If you are getting a .g.vcf.gz as output, then please use the corresponding index file, which is also generated by HaplotypeCaller. The point is that the VCF and its index should either both be compressed or both be uncompressed. From the information you provided above, it looks like the VCF is named `3_335063_QUEEN.g.vcf` and the index file `3_335063_QUEEN.g.vcf.gz.tbi`, hence the inconsistency.
    2. Please upgrade to the latest GATK, v4.1.5.0, and try again.
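
The inconsistency described in point 1 can be spotted across a whole directory by looking for index files whose data file is absent. This is only a sketch under assumptions: `orphan_indexes` is a hypothetical helper, not a GATK command, and it only inspects file names, not file contents.

```shell
# Hypothetical helper (not part of GATK): list index files in a directory
# whose data file is missing, e.g. X.g.vcf.gz.tbi sitting next to X.g.vcf only.
orphan_indexes() {
    for idx in "$1"/*.vcf.idx "$1"/*.vcf.gz.tbi; do
        [ -e "$idx" ] || continue        # skip glob patterns that matched nothing
        data=${idx%.*}                   # strip the trailing .idx or .tbi
        [ -f "$data" ] || printf '%s has no %s\n' "$idx" "$data"
    done
}
```

Running `orphan_indexes .` in a directory that holds `3_335063_QUEEN.g.vcf` and `3_335063_QUEEN.g.vcf.gz.tbi` would print one line naming the `.tbi` file and the `.g.vcf.gz` file it expects.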
  • Diya Yang

    Hi there,

    I have the same error message, "A USER ERROR has occurred: Failed to create reader from file:...". Both the g.vcf files and the g.vcf.idx files are in the same directory. I also ran ValidateVariants, and here is the output:

    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /usr/local/gatk/4.1.7.0/gatk-package-4.1.7.0-local.jar ValidateVariants -V Pre-transplant_recipient-2_S7_TEST.g.vcf
    08:02:11.336 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/gatk/4.1.7.0/gatk-package-4.1.7.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Aug 06, 2021 8:02:11 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    08:02:11.503 INFO ValidateVariants - ------------------------------------------------------------
    08:02:11.503 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.7.0
    08:02:11.503 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
    08:02:11.504 INFO ValidateVariants - Executing as dxy257@hpc4 on Linux v3.10.0-1160.36.2.el7.x86_64 amd64
    08:02:11.504 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_292-b10
    08:02:11.504 INFO ValidateVariants - Start Date/Time: August 6, 2021 8:02:11 AM EDT
    08:02:11.504 INFO ValidateVariants - ------------------------------------------------------------
    08:02:11.504 INFO ValidateVariants - ------------------------------------------------------------
    08:02:11.505 INFO ValidateVariants - HTSJDK Version: 2.21.2
    08:02:11.505 INFO ValidateVariants - Picard Version: 2.21.9
    08:02:11.505 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    08:02:11.505 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    08:02:11.505 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    08:02:11.505 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    08:02:11.505 INFO ValidateVariants - Deflater: IntelDeflater
    08:02:11.505 INFO ValidateVariants - Inflater: IntelInflater
    08:02:11.505 INFO ValidateVariants - GCS max retries/reopens: 20
    08:02:11.505 INFO ValidateVariants - Requester pays: disabled
    08:02:11.506 INFO ValidateVariants - Initializing engine
    08:02:11.852 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/rstor/SOM_EPBI_XXZ10/dxy257/rotation1/data/blood_trans/bwa_mem_alignment_result/Pre-transplant_recipient-2_S7_TEST.g.vcf
    08:02:11.900 INFO ValidateVariants - Done initializing engine
    08:02:11.900 WARN ValidateVariants - IDS validation cannot be done because no DBSNP file was provided
    08:02:11.900 WARN ValidateVariants - Other possible validations will still be performed
    08:02:11.900 WARN ValidateVariants - REF validation cannot be done because no reference file was provided
    08:02:11.900 WARN ValidateVariants - Other possible validations will still be performed
    08:02:11.900 INFO ProgressMeter - Starting traversal
    08:02:11.900 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    08:02:12.170 INFO ProgressMeter - GL000219.1:150334 0.0 12270 2747014.9
    08:02:12.170 INFO ProgressMeter - Traversal complete. Processed 12270 total variants in 0.0 minutes.
    08:02:12.170 INFO ValidateVariants - Shutting down engine
    [August 6, 2021 8:02:12 AM EDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=1707081728

    Could you please let me know what the problem is? Thank you!

    Diya

  • Bhanu Gandham

    Hi Diya,

    Please post the exact command you are using, the version of the tool and the entire error log. 

  • Nisha Dwivedi

    Hi, 

    Has Jennifer's issue been resolved? I'm also getting an "A USER ERROR has occurred: Failed to create reader from file" error for a compressed vcf.gz that has its vcf.gz.tbi index. I ran SelectVariants on this gvcf and, similarly to Jennifer, I see that 0 variants are processed in 26 minutes. Both the compressed gvcf and its index are direct outputs from HaplotypeCaller. Do you have any advice on how to resolve this (all other gvcfs work in GenomicsDB)? Thanks, Nisha

  • Louis Bergelson

    Nisha Dwivedi, it's hard to say what's going on without further information. If possible, could you set the following environment variable and then rerun to generate a complete stack trace?

    GATK_STACKTRACE_ON_USER_EXCEPTION=true

    It's also necessary to know your GATK version and what type of machine you're running on.  
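
One way to apply the environment variable named above to a single run, without changing the rest of the shell session, is sketched below. The wrapper name `with_stacktrace` is hypothetical, and the sample names, workspace path, and interval in the usage comment are placeholders, not values from this thread.

```shell
# Hypothetical wrapper: run one command with the stack-trace variable set,
# leaving the calling shell's environment untouched.
with_stacktrace() {
    env GATK_STACKTRACE_ON_USER_EXCEPTION=true "$@"
}

# Usage (placeholder file names, workspace path, and interval):
#   with_stacktrace gatk GenomicsDBImport \
#       -V sample1.g.vcf.gz -V sample2.g.vcf.gz \
#       --genomicsdb-workspace-path my_database -L chr20 2> import.log
```

Redirecting standard error to a file, as in the usage comment, keeps the full trace available for posting. `gatk --version` reports the GATK version requested in the same reply.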

  • Keity Farfán

    Hello, I'm having the same error. I'm including the g.vcf and .idx files generated by HaplotypeCaller, however, the error persists. What can I do to solve this?

  • Gökalp Çelik

    Hi Keity Farfán

    Can you run the IndexFeatureFile tool on the g.vcf file to recreate the index? It is possible that the index file is corrupt and therefore cannot be read.
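
Recreating the index as suggested above might look like the following sketch. The helper `reindex_gvcf` and its `DRY_RUN` switch are hypothetical. It assumes a recent GATK 4 release in which IndexFeatureFile takes its input through `-I`. Older releases used `-F`, so `gatk IndexFeatureFile --help` is worth checking first.

```shell
# Hypothetical helper: move a possibly corrupt index aside, then rebuild it.
# Assumes IndexFeatureFile accepts -I (older GATK 4 releases used -F).
# Set DRY_RUN=1 to print the command instead of running it.
reindex_gvcf() {
    case "$1" in
        *.vcf.gz) old="$1.tbi" ;;
        *.vcf)    old="$1.idx" ;;
        *)        echo "unrecognized extension: $1" >&2; return 1 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "gatk IndexFeatureFile -I $1"
        return 0
    fi
    if [ -f "$old" ]; then
        mv "$old" "$old.bak"             # keep the old index for comparison
    fi
    gatk IndexFeatureFile -I "$1"
}
```

Keeping the old index as a `.bak` file makes it possible to compare file sizes afterwards, which can hint at whether the original was truncated.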
