Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Getting an error with genomicsdbimport

  • Genevieve Brandt (she/her)

    Hi Vinod Kumar, here are two things that may help:

    1. Submit your GATK command using the gatk wrapper script rather than calling the jar file directly. You can find more information here, in point #2.
    2. There are issues with your map file; the GenomicsDB documentation describes the expected format in detail. First, the column order appears to be flipped: you have file \t sample instead of sample \t file. Second, there is a line containing only whitespace in your map file. Make sure that you delete that line.
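
The two map-file checks described above can be sketched in a few lines of Python. This is a minimal illustration, not part of GATK: the function name `check_sample_map` is hypothetical, and the test for flipped columns is only a heuristic that assumes sample names never contain a `/` or end in a VCF extension.

```python
from typing import List


def check_sample_map(text: str) -> List[str]:
    """Return a list of problems found in sample-map text (empty if none)."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        # A whitespace-only line is one of the two issues named in the reply.
        if not line.strip():
            problems.append(f"line {number}: blank or whitespace-only line")
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(
                f"line {number}: expected 2 tab-separated fields, found {len(fields)}"
            )
            continue
        sample, _path = fields
        # Heuristic (an assumption): a first column that looks like a path
        # suggests the columns are flipped (file<TAB>sample).
        if "/" in sample or sample.endswith((".vcf", ".vcf.gz")):
            problems.append(
                f"line {number}: first column looks like a file path; "
                "expected sample<TAB>file"
            )
    return problems
```

Calling `check_sample_map(open("cohort.sample_map").read())` on a clean map should return an empty list; anything else names the offending line.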
  • Vinod Kumar

    Hi Genevieve Brandt (she/her),

    Thanks for the reply.

    Yes, I saw that there was a whitespace-only line; I removed it and that error is gone.

    I have multiple GVCF samples in multiple directories. I thought that through the sample_map file I could give each sample name and its location, so that I would not need to put all 1000 samples in one directory and the program would automatically pick up the files from the locations listed in the sample_map file.

    I just tried 5 different GVCF files from 5 different directories and it is not working. I tried various sample_map formats, and the program gives me the error "A USER ERROR has occurred: Failed to create reader".

    It seems to me that I have to generate a list of samples from the different directories and feed it to the program through the -V option.

    Is it possible to give paths to GVCF files in different directories through the sample_map file, or am I misunderstanding how it works?
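
For a cohort spread over several directories, the map can be generated rather than typed by hand, which avoids path typos. The sketch below is one hedged way to do that; `build_sample_map` is a hypothetical helper, and it assumes every GVCF is named `<sample>.g.vcf.gz` so that the sample name can be taken from the filename. If your filenames carry extra suffixes, or the names in the GVCF headers differ, adjust the derivation accordingly.

```python
from pathlib import Path
from typing import Iterable, List

SUFFIX = ".g.vcf.gz"  # assumed naming convention for the input GVCFs


def build_sample_map(directories: Iterable[str]) -> List[str]:
    """Return 'sample<TAB>absolute path' lines for every GVCF found."""
    lines = []
    seen = set()
    for directory in directories:
        # glob matches whole filenames, so index files (*.tbi) are skipped.
        for gvcf in sorted(Path(directory).glob("*" + SUFFIX)):
            sample = gvcf.name[: -len(SUFFIX)]
            if sample in seen:
                raise ValueError(f"duplicate sample name: {sample}")
            seen.add(sample)
            lines.append(f"{sample}\t{gvcf.resolve()}")
    return lines
```

Writing `"\n".join(lines) + "\n"` to a file gives a map with one sample per line and no blank lines.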

    Thanks,

     

  • Genevieve Brandt (she/her)

    Vinod Kumar, for posts regarding GATK issues, we require three items to be included in the post.

    1. GATK version number
    2. Exact command used
    3. Complete Stack Trace/Error log [Use -DSTACK_TRACE_ON_USEREXCEPTION to print the stack trace.] How to submit java arguments.

    Your post is missing one of these items that must be included so that we can thoroughly look into the problem.

  • Vinod Kumar

    Hi Genevieve Brandt (she/her),

    I am using GATK GenomicsDBImport to generate a datastore of variants. I have 1000 samples in 10 different directories. ValidateVariants works well with the samples. Initially, I am trying only 5 samples from 5 different directories to test the program. I used this command:

    gatk --java-options "-Xmx50g" GenomicsDBImport --genomicsdb-workspace-path /prj/pflaphy-robot/genomicsDB/samplesmerged --batch-size 0 -L chr1 --sample-name-map /prj/pflaphy-robot/cohort.sample_map

    The sample_map file looks like this (sample name, tab, path to sample file):

    D370_EKDL200000636 /prj/pflaphy_robot/haploc1_plate1
    S382_UKKD19060006 /prj/pflaphy_robot/haploc1_plate2
    457_EKDL190031784 /prj/pflaphy_robot/haploc1_plate3
    83_EKDL190132027 /prj/pflaphy_robot/haploc1_plate4
    526_EKDL190132218 /prj/pflaphy_robot/haploc1_plate5

    But I am getting this error:

    Using GATK jar /vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx50g -jar /vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar GenomicsDBImport --genomicsdb-workspace-path /prj/pflaphy-robot/genomicsDB/samplesmerged --batch-size 0 -L chr1 --sample-name-map /prj/pflaphy-robot/cohort1.sample_map
    09:04:20.789 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Nov 11, 2020 9:04:21 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    09:04:21.141 INFO GenomicsDBImport - ------------------------------------------------------------
    09:04:21.142 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.9.0
    09:04:21.142 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
    09:04:21.143 INFO GenomicsDBImport - Executing as vkumar@suc01005 on Linux v4.18.16-300.fc29.x86_64 amd64
    09:04:21.143 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_181-b15
    09:04:21.143 INFO GenomicsDBImport - Start Date/Time: November 11, 2020 9:04:20 AM CET
    09:04:21.144 INFO GenomicsDBImport - ------------------------------------------------------------
    09:04:21.144 INFO GenomicsDBImport - ------------------------------------------------------------
    09:04:21.145 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
    09:04:21.145 INFO GenomicsDBImport - Picard Version: 2.23.3
    09:04:21.145 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    09:04:21.145 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    09:04:21.145 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    09:04:21.145 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    09:04:21.145 INFO GenomicsDBImport - Deflater: IntelDeflater
    09:04:21.145 INFO GenomicsDBImport - Inflater: IntelInflater
    09:04:21.146 INFO GenomicsDBImport - GCS max retries/reopens: 20
    09:04:21.146 INFO GenomicsDBImport - Requester pays: disabled
    09:04:21.146 INFO GenomicsDBImport - Initializing engine
    09:04:21.203 INFO GenomicsDBImport - Shutting down engine
    [November 11, 2020 9:04:21 AM CET] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=2026897408
    ***********************************************************************

    A USER ERROR has occurred: Failed to create reader from file:///prj/pflaphy_robot/haploc1_plate3

    I get the same error when the sample_map file is written a different way, with the full path to each file:

    D370_EKDL200000636 /prj/pflaphy_robot/haploc1_plate1/D370_EKDL200000636.g.vcf.gz
    S382_UKKD19060006 /prj/pflaphy_robot/haploc1_plate2/S382_UKKD19060006.g.vcf.gz
    83_EKDL190132027 /prj/pflaphy_robot/haploc1_plate4/83_EKDL190132027.g.vcf.gz
    526_EKDL190132218 /prj/pflaphy_robot/haploc1_plate5/526_EKDL190132218.g.vcf.gz

    Again the same error; here is the full log:

    Using GATK jar /vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx50g -jar /vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar GenomicsDBImport --genomicsdb-workspace-path /prj/pflaphy-robot/genomicsDB/samplesmerged --batch-size 0 -L chr1 --sample-name-map /prj/pflaphy-robot/cohort.sample_map
    16:34:05.764 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Nov 10, 2020 4:34:06 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    16:34:06.074 INFO GenomicsDBImport - ------------------------------------------------------------
    16:34:06.075 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.9.0
    16:34:06.075 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
    16:34:06.075 INFO GenomicsDBImport - Executing as vkumar@suc01014 on Linux v4.20.4-200.fc29.x86_64 amd64
    16:34:06.075 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-b13
    16:34:06.075 INFO GenomicsDBImport - Start Date/Time: November 10, 2020 4:34:05 PM CET
    16:34:06.075 INFO GenomicsDBImport - ------------------------------------------------------------
    16:34:06.076 INFO GenomicsDBImport - ------------------------------------------------------------
    16:34:06.076 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
    16:34:06.077 INFO GenomicsDBImport - Picard Version: 2.23.3
    16:34:06.077 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    16:34:06.077 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    16:34:06.077 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    16:34:06.077 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    16:34:06.077 INFO GenomicsDBImport - Deflater: IntelDeflater
    16:34:06.077 INFO GenomicsDBImport - Inflater: IntelInflater
    16:34:06.077 INFO GenomicsDBImport - GCS max retries/reopens: 20
    16:34:06.078 INFO GenomicsDBImport - Requester pays: disabled
    16:34:06.078 INFO GenomicsDBImport - Initializing engine
    16:34:06.130 INFO GenomicsDBImport - Shutting down engine
    [November 10, 2020 4:34:06 PM CET] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=2105016320
    ***********************************************************************

    A USER ERROR has occurred: Failed to create reader from file:///prj/pflaphy_robot/haploc1_plate5/526_EKDL190132218-2a-AK6521-AK6691_HJHHCDSXX_L3.g.vcf.gz

     

    ValidateVariants works well with the samples.

    Any suggestions? Thanks,

    Vinod

  • Genevieve Brandt (she/her)

    Vinod Kumar, this issue has been previously discussed on the forum here; please see that link first.

  • Shirely

    Hi Vinod Kumar,

    I have the same problem as you, with error output like this:

    A USER ERROR has occurred: Failed to create reader from file:///biomedja01/disk1/lxu/joint-genotyping/A090388_exome.g.vcf.gz

    ***********************************************************************
    org.broadinstitute.hellbender.exceptions.UserException: Failed to create reader from file:///biomedja01/disk1/lxu/joint-genotyping/A090388_exome.g.vcf.gz
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:880)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getHeaderFromPath(GenomicsDBImport.java:521)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.initializeHeaderAndSampleMappings(GenomicsDBImport.java:489)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.onStartup(GenomicsDBImport.java:420)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)
    Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: /biomedja01/disk1/lxu/joint-genotyping/A090388_exome.g.vcf.gz, for input source: file:///biomedja01/disk1/lxu/joint-genotyping/A090388_exome.g.vcf.gz
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:263)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:102)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:127)
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:121)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:833)
    ... 9 more
    Caused by: java.nio.file.NoSuchFileException: /biomedja01/disk1/lxu/joint-genotyping/A090388_exome.g.vcf.gz
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at htsjdk.samtools.seekablestream.SeekablePathStream.<init>(SeekablePathStream.java:41)
    at htsjdk.tribble.util.ParsingUtils.openInputStream(ParsingUtils.java:107)
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:253)
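
When reading a trace like the one above, the last "Caused by:" line is usually the most informative. Here it is java.nio.file.NoSuchFileException, meaning the Java process could not find a file at that path. The helpers below are a hypothetical sketch (not part of GATK) for pulling that chain out of a saved log.

```python
from typing import List, Optional


def cause_chain(trace: str) -> List[str]:
    """Return the text of every 'Caused by:' line, outermost first."""
    prefix = "Caused by:"
    causes = []
    for line in trace.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            causes.append(stripped[len(prefix):].strip())
    return causes


def root_cause(trace: str) -> Optional[str]:
    """Return the innermost cause, or None if the trace has no cause chain."""
    chain = cause_chain(trace)
    return chain[-1] if chain else None
```

Applied to the trace above, `root_cause` would return the NoSuchFileException line, which points at the path itself rather than at the contents of the GVCF.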

    Could you tell me how you solved it? Many thanks.

  • Genevieve Brandt (she/her)

    Hi Shirely,

    Please see the previous link I shared for more information. Vinod Kumar, do you have any more helpful tips to share?

    Genevieve

  • Shirely

    Hi Genevieve

    I have read and tried all the relevant posts and suggestions that your team shared, but I still get the same error. Below are the solutions I have tried:

    1) Put the sample.g.vcf.gz and sample.g.vcf.gz.tbi in the same folder.

    2) Uncompressed the g.vcf.gz files and indexed the resulting g.vcf files.

    3) Uncompressed the g.vcf.gz files, compressed them again with bgzip, and then indexed them with tabix and with IndexFeatureFile, respectively.

    I also used SelectVariants to check my file, as you suggested. I did not receive an error, but no variants were selected in the output.

    The gatk version I used is: us.gcr.io/broad-gatk/gatk:4.1.8.0.

    The workflow I used is: warp/pipelines/broad/dna_seq/germline/joint_genotyping/.

    Thanks for any suggestions.

     

  • Vinod Kumar

    Shirely,

    In my case, I tried various things as you did, but I finally found that the problem was in my sample_map file: there was an extra '/'. I removed it and it worked for me.

    Your "Failed to create reader" error looks similar to my problem, which was related to the filename and its location. Please check all your filenames and their locations carefully before running, or first run with just a few samples to check whether it works.
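
Checking filenames and locations by eye gets tedious with many samples, so the check can be scripted before launching the import. The following is a minimal sketch under stated assumptions: `missing_inputs` is a hypothetical name, the map is tab-separated with the sample name first, and each GVCF is expected to have a tabix index at `<path>.tbi`.

```python
import os
from typing import List


def missing_inputs(map_text: str) -> List[str]:
    """Report map entries whose file or index cannot be found on disk."""
    problems = []
    for line in map_text.splitlines():
        if not line.strip():
            continue  # blank lines are a separate problem; skip them here
        sample, _, path = line.partition("\t")
        path = path.strip()
        # isfile() is False for directories as well as for missing paths,
        # so a map entry pointing at a folder is also reported.
        if not os.path.isfile(path):
            problems.append(f"{sample}: not a regular file: {path}")
        elif not os.path.isfile(path + ".tbi"):
            problems.append(f"{sample}: no index found at {path}.tbi")
    return problems
```

Run it on the machine, and inside the container if you use one, where GATK itself will run, since a path that exists on one host may not be visible on another.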

    Thanks,

     
