Getting an error with GenomicsDBImport
Hi,
I am quite new to GATK. I am using GenomicsDBImport from gatk-package-4.1.8.1-local.jar to consolidate GVCFs. Instead of using all the samples from the different directories, I took just 5 samples from 5 different directories, but it is giving me an error. I am posting the script and the error here:
java -Xmx50g -jar /prj/pflaphy-robot/software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar GenomicsDBImport --genomicsdb-workspace-path /prj/pflaphy-robot/genomicsDB/samplesmerged --batch-size 0 -L chr1 --sample-name-map /prj/pflaphy-robot/cohort.sample_map 2> /prj/pflaphy-robot/genomicsDB/samplemerge.err
The map file looks like this (the two columns are separated by a tab):
D370_AK6738_H2KGYDSXY_L3.g.vcf.gz /prj/pflaphy_robot/haploc1_plate1
S382_AK6663_HGVCLDSXX_L2.g.vcf.gz /prj/pflaphy_robot/haploc1_plate2
381_AK6730_HJFGTDSXX_L1.g.vcf.gz /prj/pflaphy_robot/haploc1_plate3
83_AK6725_HJFGTDSXX_L2.g.vcf.gz /prj/pflaphy_robot/haploc1_plate4
526_AK6521_AK6691_HJHHCDSXX_L3.g.vcf.gz /prj/pflaphy_robot/haploc1_plate5
And the error log:
INFO: Failed to detect whether we are running on Google Compute Engine.
10:29:29.643 INFO GenomicsDBImport - ------------------------------------------------------------
10:29:29.644 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.8.1
10:29:29.644 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
10:29:29.645 INFO GenomicsDBImport - Executing as vkumar@robin on Linux v4.20.4-200.fc29.x86_64 amd64
10:29:29.645 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-b13
10:29:29.645 INFO GenomicsDBImport - Start Date/Time: November 9, 2020 10:29:28 AM CET
10:29:29.645 INFO GenomicsDBImport - ------------------------------------------------------------
10:29:29.645 INFO GenomicsDBImport - ------------------------------------------------------------
10:29:29.646 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
10:29:29.647 INFO GenomicsDBImport - Picard Version: 2.22.8
10:29:29.647 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:29:29.647 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:29:29.647 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:29:29.647 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:29:29.647 INFO GenomicsDBImport - Deflater: IntelDeflater
10:29:29.647 INFO GenomicsDBImport - Inflater: IntelInflater
10:29:29.647 INFO GenomicsDBImport - GCS max retries/reopens: 20
10:29:29.647 INFO GenomicsDBImport - Requester pays: disabled
10:29:29.647 INFO GenomicsDBImport - Initializing engine
10:29:29.654 INFO GenomicsDBImport - Shutting down engine
[November 9, 2020 10:29:29 AM CET] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2307915776
***********************************************************************
A USER ERROR has occurred: Bad input: Expected a file with 2 fields per line in the format
Sample File
but found line: "" with 1 fields
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Any help with this would be appreciated.
Thanks,
-
Hi Vinod Kumar, here are two things that may help:
- Submit your GATK command using the gatk wrapper script rather than calling the jar file directly. You can find more information here in point #2.
- There are issues with your map file. A detailed description of the format can be found in the GenomicsDB documentation. First, it seems that you have flipped the column order: you have file \t sample instead of sample \t file. Second, there is a line containing only whitespace in your map file; make sure that you delete that line.
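Both points can be checked before rerunning the import. The helper below is only a rough sketch: check_sample_map is a made-up name, not a GATK tool, and the "looks like a file" test is a heuristic that assumes sample names never end in .vcf or .vcf.gz.

```shell
# Hypothetical helper (not part of GATK): report lines of a sample map that
# are not exactly "sample<TAB>path". Exit status is non-zero if any line fails.
check_sample_map() {
    awk -F'\t' '
        # Blank lines, whitespace-only lines, and missing columns land here.
        NF != 2 || $1 == "" || $2 == "" {
            printf "line %d: expected sample<TAB>path, found %d field(s)\n", NR, NF
            bad = 1
            next
        }
        # Heuristic (assumption): a first column ending in .vcf or .vcf.gz
        # suggests the columns are in file<TAB>sample order.
        $1 ~ /\.vcf(\.gz)?$/ {
            printf "line %d: first column looks like a file; sample name should come first\n", NR
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

It could be run as `check_sample_map /prj/pflaphy-robot/cohort.sample_map`. No output and a zero exit status mean every line has two tab-separated columns.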
-
Hi Genevieve Brandt (she/her),
Thanks for the reply.
Yes, I saw there was a line with white space and I removed it and the error is gone.
I have multiple GVCF samples in multiple directories. I thought that the sample_map file would let me give each sample name together with its location, so that I would not need to put all 1000 samples in one directory and the program would pick up each file from the location given in the sample_map file.
I just tried 5 different GVCF files from 5 different directories and it is not working. I tried various sample_map formats, and the program gives me the error "A USER ERROR has occurred: Failed to create reader".
It seems to me that I have to generate a list of samples from the different directories and feed it to the program through the -V option.
Is it possible to give paths to GVCF files in different directories through the sample_map file, or am I misunderstanding how it works?
Thanks,
-
Vinod Kumar, for posts regarding GATK issues, we require three items to be included in the post.
- GATK version number
- Exact command used
- Complete Stack Trace/Error log [Use --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' to print the stack trace.] How to submit java arguments.
Your post is missing one of these items that must be included so that we can thoroughly look into the problem.
-
Hi Genevieve Brandt (she/her),
I am using GATK GenomicsDBImport to build a datastore of variants. I have 1000 samples in 10 different directories. ValidateVariants runs fine on the samples. To test the program, I am initially trying only 5 samples from 5 different directories. I used this command:
gatk --java-options "-Xmx50g" GenomicsDBImport --genomicsdb-workspace-path /prj/pflaphy-robot/genomicsDB/samplesmerged --batch-size 0 -L chr1 --sample-name-map /prj/pflaphy-robot/cohort.sample_map
The sample_map file looks like this (sample name, tab, path to sample file):
D370_EKDL200000636 /prj/pflaphy_robot/haploc1_plate1
S382_UKKD19060006 /prj/pflaphy_robot/haploc1_plate2
457_EKDL190031784 /prj/pflaphy_robot/haploc1_plate3
83_EKDL190132027 /prj/pflaphy_robot/haploc1_plate4
526_EKDL190132218 /prj/pflaphy_robot/haploc1_plate5
But I am getting this error:
Error log
Using GATK jar /vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx50g -jar /vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar GenomicsDBImport --genomicsdb-workspace-path /prj/pflaphy-robot/genomicsDB/samplesmerged --batch-size 0 -L chr1 --sample-name-map /prj/pflaphy-robot/cohort1.sample_map
09:04:20.789 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Nov 11, 2020 9:04:21 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
09:04:21.141 INFO GenomicsDBImport - ------------------------------------------------------------
09:04:21.142 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.9.0
09:04:21.142 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
09:04:21.143 INFO GenomicsDBImport - Executing as vkumar@suc01005 on Linux v4.18.16-300.fc29.x86_64 amd64
09:04:21.143 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_181-b15
09:04:21.143 INFO GenomicsDBImport - Start Date/Time: November 11, 2020 9:04:20 AM CET
09:04:21.144 INFO GenomicsDBImport - ------------------------------------------------------------
09:04:21.144 INFO GenomicsDBImport - ------------------------------------------------------------
09:04:21.145 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
09:04:21.145 INFO GenomicsDBImport - Picard Version: 2.23.3
09:04:21.145 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:04:21.145 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:04:21.145 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:04:21.145 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:04:21.145 INFO GenomicsDBImport - Deflater: IntelDeflater
09:04:21.145 INFO GenomicsDBImport - Inflater: IntelInflater
09:04:21.146 INFO GenomicsDBImport - GCS max retries/reopens: 20
09:04:21.146 INFO GenomicsDBImport - Requester pays: disabled
09:04:21.146 INFO GenomicsDBImport - Initializing engine
09:04:21.203 INFO GenomicsDBImport - Shutting down engine
[November 11, 2020 9:04:21 AM CET] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2026897408
***********************************************************************
A USER ERROR has occurred: Failed to create reader from file:///prj/pflaphy_robot/haploc1_plate3
I get the same error when the sample_map file gives the full path to each file instead:
D370_EKDL200000636 /prj/pflaphy_robot/haploc1_plate1/D370_EKDL200000636.g.vcf.gz
S382_UKKD19060006 /prj/pflaphy_robot/haploc1_plate2/S382_UKKD19060006.g.vcf.gz
83_EKDL190132027 /prj/pflaphy_robot/haploc1_plate4/83_EKDL190132027.g.vcf.gz
526_EKDL190132218 /prj/pflaphy_robot/haploc1_plate5/526_EKDL190132218.g.vcf.gz
Again the same error, with more detail:
Error log
Using GATK jar /vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx50g -jar /vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar GenomicsDBImport --genomicsdb-workspace-path /prj/pflaphy-robot/genomicsDB/samplesmerged --batch-size 0 -L chr1 --sample-name-map /prj/pflaphy-robot/cohort.sample_map
16:34:05.764 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/vol/biotools/share/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Nov 10, 2020 4:34:06 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:34:06.074 INFO GenomicsDBImport - ------------------------------------------------------------
16:34:06.075 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.9.0
16:34:06.075 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
16:34:06.075 INFO GenomicsDBImport - Executing as vkumar@suc01014 on Linux v4.20.4-200.fc29.x86_64 amd64
16:34:06.075 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-b13
16:34:06.075 INFO GenomicsDBImport - Start Date/Time: November 10, 2020 4:34:05 PM CET
16:34:06.075 INFO GenomicsDBImport - ------------------------------------------------------------
16:34:06.076 INFO GenomicsDBImport - ------------------------------------------------------------
16:34:06.076 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
16:34:06.077 INFO GenomicsDBImport - Picard Version: 2.23.3
16:34:06.077 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:34:06.077 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:34:06.077 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:34:06.077 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:34:06.077 INFO GenomicsDBImport - Deflater: IntelDeflater
16:34:06.077 INFO GenomicsDBImport - Inflater: IntelInflater
16:34:06.077 INFO GenomicsDBImport - GCS max retries/reopens: 20
16:34:06.078 INFO GenomicsDBImport - Requester pays: disabled
16:34:06.078 INFO GenomicsDBImport - Initializing engine
16:34:06.130 INFO GenomicsDBImport - Shutting down engine
[November 10, 2020 4:34:06 PM CET] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2105016320
***********************************************************************
A USER ERROR has occurred: Failed to create reader from file:///prj/pflaphy_robot/haploc1_plate5/526_EKDL190132218-2a-AK6521-AK6691_HJHHCDSXX_L3.g.vcf.gz
ValidateVariants runs fine on the samples.
Any suggestions? Thanks,
Vinod,
-
Vinod Kumar, this issue has been discussed previously on the forum here; please see that link first.
-
Hi Vinod Kumar,
I have the same problem as you, with the following error:
A USER ERROR has occurred: Failed to create reader from file:///biomedja01/disk1/lxu/joint-genotyping/A090388_exome.g.vcf.gz
***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException: Failed to create reader from file:///biomedja01/disk1/lxu/joint-genotyping/A090388_exome.g.vcf.gz
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:880)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getHeaderFromPath(GenomicsDBImport.java:521)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.initializeHeaderAndSampleMappings(GenomicsDBImport.java:489)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.onStartup(GenomicsDBImport.java:420)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: /biomedja01/disk1/lxu/joint-genotyping/A090388_exome.g.vcf.gz, for input source: file:///biomedja01/disk1/lxu/joint-genotyping/A090388_exome.g.vcf.gz
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:263)
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:102)
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:127)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:121)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:833)
... 9 more
Caused by: java.nio.file.NoSuchFileException: /biomedja01/disk1/lxu/joint-genotyping/A090388_exome.g.vcf.gz
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.newByteChannel(Files.java:407)
at htsjdk.samtools.seekablestream.SeekablePathStream.<init>(SeekablePathStream.java:41)
at htsjdk.tribble.util.ParsingUtils.openInputStream(ParsingUtils.java:107)
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:253)
Could you tell me how you solved it? Many thanks.
-
Hi Shirely,
Please see the previous link I shared for more information. Vinod Kumar, do you have any more helpful tips to share?
Genevieve
-
Hi Genevieve
I have read and tried all the relevant posts and suggestions that your team shared, but I still get the same error. Below are the solutions I have tried:
1) Put the sample.g.vcf.gz and sample.g.vcf.gz.tbi files in the same folder.
2) Uncompressed the g.vcf.gz files and indexed the resulting g.vcf files.
3) Uncompressed the g.vcf.gz files, compressed them again with bgzip, and then indexed them with tabix and with IndexFeatureFile.
I also used SelectVariants to check my file, as you suggested. I did not get an error, but no variants were selected in the output.
The gatk version I used is: us.gcr.io/broad-gatk/gatk:4.1.8.0.
The workflow I used is: warp/pipelines/broad/dna_seq/germline/joint_genotyping/.
Thanks for any suggestions.
-
In my case, I tried various things, as you did, but I finally found that the problem was in my sample_map file: there was an extra '/'. I removed it and it worked for me.
Your "Failed to create reader" error looks similar to my problem, which was related to the file name and its location. Please check all your file names and their locations carefully before running it, or first run it on a few samples to check whether it works.
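The NoSuchFileException at the bottom of the stack trace posted above points the same way: the path in the map does not resolve to a file. A small shell check along these lines can catch that before running GenomicsDBImport. It is only a sketch: check_map_paths is a made-up name, not a GATK command, and it assumes bgzipped GVCFs with a tabix index (.tbi) sitting next to each file.

```shell
# Hypothetical helper (not part of GATK): check every path in a
# "sample<TAB>path" map. Exit status is non-zero if any path looks wrong.
check_map_paths() {
    map_status=0
    tab=$(printf '\t')
    while IFS=$tab read -r sample path || [ -n "$sample" ]; do
        # Skip blank lines; checking the map format is a separate concern.
        if [ -z "$sample" ] && [ -z "$path" ]; then
            continue
        fi
        if [ -d "$path" ]; then
            echo "$sample: $path is a directory, not a GVCF file"
            map_status=1
        elif [ ! -f "$path" ]; then
            echo "$sample: $path does not exist"
            map_status=1
        elif [ ! -f "$path.tbi" ]; then
            # Assumption: bgzipped GVCFs with a tabix index next to them.
            echo "$sample: no index found at $path.tbi"
            map_status=1
        fi
    done < "$1"
    return "$map_status"
}
```

Running `check_map_paths /prj/pflaphy-robot/cohort.sample_map` prints one line per problem. A map that lists only directories, like the first one posted in this thread, would be flagged on every line.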
Thanks,