GenomicsDBImport Failed to create reader from file: An index is required, but none found., for input source:
If you are seeing an error, please provide (REQUIRED):
a) GATK version used:
gatk-4.2.2.0
b) Exact command used:
while read -a line
do
/home/sparks35/gatk-4.2.2.0/gatk --java-options "-Xmx36g -Xms36g -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" GenomicsDBImport \
--genomicsdb-workspace-path $PROJHOME/data/seqs/aligned_reads_Ogor1.0/07_genomicsDB/${line[0]}_database \
--batch-size 50 \
-L ${line[0]} \
--sample-name-map $PROJHOME/scripts/07_GenomicsDBImport/genomicsDB2.txt \
--tmp-dir tmpdir \
--reader-threads 4 &
done < $PROJHOME/data/seqs/aligned_reads_Ogor1.0/chromosome_list.txt
wait
c) Entire error log:
The error log is quite long because I am running this over the 28 chromosomes, so I am printing one of the errors here. I can provide the whole log if needed.
Oct 12, 2021 1:31:03 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
01:31:03.343 INFO GenomicsDBImport - ------------------------------------------------------------
01:31:03.344 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.2.2.0
01:31:03.344 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
01:31:03.344 INFO GenomicsDBImport - Executing as sparks35@bell-b007.rcac.purdue.edu on Linux v3.10.0-1127.19.1.el7.x86_64 amd64
01:31:03.344 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_262-b10
01:31:03.344 INFO GenomicsDBImport - Start Date/Time: October 12, 2021 1:31:02 AM EDT
01:31:03.344 INFO GenomicsDBImport - ------------------------------------------------------------
01:31:03.344 INFO GenomicsDBImport - ------------------------------------------------------------
01:31:03.345 INFO GenomicsDBImport - HTSJDK Version: 2.24.1
01:31:03.345 INFO GenomicsDBImport - Picard Version: 2.25.4
01:31:03.345 INFO GenomicsDBImport - Built for Spark Version: 2.4.5
01:31:03.345 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
01:31:03.345 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
01:31:03.345 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
01:31:03.345 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
01:31:03.345 INFO GenomicsDBImport - Deflater: IntelDeflater
01:31:03.345 INFO GenomicsDBImport - Inflater: IntelInflater
01:31:03.345 INFO GenomicsDBImport - GCS max retries/reopens: 20
01:31:03.345 INFO GenomicsDBImport - Requester pays: disabled
01:31:03.345 INFO GenomicsDBImport - Initializing engine
01:31:03.566 INFO GenomicsDBImport - Shutting down engine
01:31:03.567 INFO GenomicsDBImport - Shutting down engine
01:31:03.569 INFO GenomicsDBImport - Shutting down engine
01:31:03.571 INFO GenomicsDBImport - Shutting down engine
01:31:03.572 INFO GenomicsDBImport - Shutting down engine
01:31:03.572 INFO GenomicsDBImport - Shutting down engine
[October 12, 2021 1:31:03 AM EDT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.05 minutes.
Runtime.totalMemory()=37044092928
***********************************************************************
A USER ERROR has occurred: Failed to create reader from file:///scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_006_Ogor1.0_hapcalls.vcf.gz because of the following error:
An index is required, but none found., for input source: file:///scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_006_Ogor1.0_hapcalls.vcf.gz
***********************************************************************
[October 12, 2021 1:31:03 AM EDT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.05 minutes.org.broadinstitute.hellbender.exceptions.UserException: Failed to create reader from file:///scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_006_Ogor1.0_hapcalls.vcf.gz because of the following error:
An index is required, but none found., for input source: file:///scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_006_Ogor1.0_hapcalls.vcf.gz
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:1014)Runtime.totalMemory()=37044092928
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getHeaderFromPath(GenomicsDBImport.java:576)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.initializeHeaderAndSampleMappings(GenomicsDBImport.java:544)***********************************************************************
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.onStartup(GenomicsDBImport.java:450)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)A USER ERROR has occurred: Failed to create reader from file:///scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_006_Ogor1.0_hapcalls.vcf.gz because of the following error:
An index is required, but none found., for input source: file:///scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_006_Ogor1.0_hapcalls.vcf.gz
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
*********************************************************************** at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
01:31:03.573 INFO GenomicsDBImport - Shutting down engine
Caused by: htsjdk.tribble.TribbleException: An index is required, but none found., for input source: file:///scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_006_Ogor1.0_hapcalls.vcf.gz
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:135)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:121)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:966)
... 9 more
org.broadinstitute.hellbender.exceptions.UserException: Failed to create reader from file:///scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_006_Ogor1.0_hapcalls.vcf.gz because of the following error:
An index is required, but none found., for input source: file:///scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_006_Ogor1.0_hapcalls.vcf.gz
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:1014)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getHeaderFromPath(GenomicsDBImport.java:576)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.initializeHeaderAndSampleMappings(GenomicsDBImport.java:544)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.onStartup(GenomicsDBImport.java:450)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)[October 12, 2021 1:31:03 AM EDT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.01 minutes.
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Runtime.totalMemory()=37044092928
Caused by: htsjdk.tribble.TribbleException: An index is required, but none found., for input source: file:///scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_006_Ogor1.0_hapcalls.vcf.gz
Here is the head of my sample name map, followed by the head of the listing for the directory that holds the VCFs, so you can see the paths and what is in the directory.
LAE_006 /scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_006_Ogor1.0_hapcalls.vcf.gz
LAE_012 /scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_012_Ogor1.0_hapcalls.vcf.gz
LAE_024 /scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_024_Ogor1.0_hapcalls.vcf.gz
LAE_030 /scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_030_Ogor1.0_hapcalls.vcf.gz
LAE_036 /scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_036_Ogor1.0_hapcalls.vcf.gz
LAE_042 /scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_042_Ogor1.0_hapcalls.vcf.gz
LAE_053 /scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_053_Ogor1.0_hapcalls.vcf.gz
LAE_056 /scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_056_Ogor1.0_hapcalls.vcf.gz
LAE_057 /scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_057_Ogor1.0_hapcalls.vcf.gz
LAE_058 /scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls/LAE_058_Ogor1.0_hapcalls.vcf.gz
-rw-r--r-- 1 sparks35 student 5.0G Oct 11 09:18 LAE_006_Ogor1.0_hapcalls.vcf.gz
-rw-r--r-- 1 sparks35 student 307K Oct 11 09:21 LAE_006_Ogor1.0_hapcalls.vcf.gz.csi
-rw-r--r-- 1 sparks35 student 4.8G Oct 11 09:13 LAE_012_Ogor1.0_hapcalls.vcf.gz
-rw-r--r-- 1 sparks35 student 304K Oct 11 09:17 LAE_012_Ogor1.0_hapcalls.vcf.gz.csi
-rw-r--r-- 1 sparks35 student 5.1G Oct 11 09:19 LAE_024_Ogor1.0_hapcalls.vcf.gz
-rw-r--r-- 1 sparks35 student 308K Oct 11 09:23 LAE_024_Ogor1.0_hapcalls.vcf.gz.csi
-rw-r--r-- 1 sparks35 student 4.2G Oct 11 09:08 LAE_030_Ogor1.0_hapcalls.vcf.gz
-rw-r--r-- 1 sparks35 student 298K Oct 11 09:11 LAE_030_Ogor1.0_hapcalls.vcf.gz.csi
-rw-r--r-- 1 sparks35 student 5.1G Oct 11 09:23 LAE_036_Ogor1.0_hapcalls.vcf.gz
Some additional notes: I am running HaplotypeCaller with the -L flag, similar to the script above, and then concatenating the per-chromosome files back into a single VCF. HaplotypeCaller outputs .tbi index files, but I couldn't figure out how to concatenate them properly, so once the VCF is concatenated I just index it again as a .csi. I have done this with both tabix and bcftools index. Perhaps the solution is to not do it this way and to run them independently. The script I use for one sample is below.
module purge
module load bioinfo
module load bcftools/1.11
module load samtools/1.7
PROJHOME=/scratch/bell/sparks35/GL_Pink_Salmon
ASSEMBLY=/scratch/bell/sparks35/GL_Pink_Salmon/data/assemblies/Ogor_1.0/GCA_017355495.1_Ogor_1.0_genomic.fna
MDUPES=/scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/mark_dupes/spark_out
HAPCALLS=/scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls
while read -a line
do
/home/sparks35/gatk-4.2.2.0/gatk --java-options "-Xmx9g -Djava.io.tmpdir=/scratch/bell/sparks35/tmpdir" HaplotypeCaller \
-I $MDUPES/LAE_087_Ogor1.0_dupmarked.bam \
-O $HAPCALLS/${line[0]}_LAE_087_Ogor1.0.vcf.gz \
-R $ASSEMBLY \
-ERC GVCF \
-L ${line[0]} &
done < $PROJHOME/data/seqs/aligned_reads_Ogor1.0/chromosome_list.txt
wait
cd /scratch/bell/sparks35/GL_Pink_Salmon/data/seqs/aligned_reads_Ogor1.0/06_hap_calls
bcftools concat -Oz \
CM029847.1_LAE_087_Ogor1.0.vcf.gz CM029861.1_LAE_087_Ogor1.0.vcf.gz CM029848.1_LAE_087_Ogor1.0.vcf.gz CM029862.1_LAE_087_Ogor1.0.vcf.gz \
CM029849.1_LAE_087_Ogor1.0.vcf.gz CM029863.1_LAE_087_Ogor1.0.vcf.gz CM029850.1_LAE_087_Ogor1.0.vcf.gz CM029864.1_LAE_087_Ogor1.0.vcf.gz \
CM029851.1_LAE_087_Ogor1.0.vcf.gz CM029865.1_LAE_087_Ogor1.0.vcf.gz CM029852.1_LAE_087_Ogor1.0.vcf.gz CM029866.1_LAE_087_Ogor1.0.vcf.gz \
CM029853.1_LAE_087_Ogor1.0.vcf.gz CM029867.1_LAE_087_Ogor1.0.vcf.gz CM029854.1_LAE_087_Ogor1.0.vcf.gz CM029868.1_LAE_087_Ogor1.0.vcf.gz \
CM029855.1_LAE_087_Ogor1.0.vcf.gz CM029869.1_LAE_087_Ogor1.0.vcf.gz CM029856.1_LAE_087_Ogor1.0.vcf.gz CM029870.1_LAE_087_Ogor1.0.vcf.gz \
CM029857.1_LAE_087_Ogor1.0.vcf.gz CM029871.1_LAE_087_Ogor1.0.vcf.gz CM029858.1_LAE_087_Ogor1.0.vcf.gz CM029872.1_LAE_087_Ogor1.0.vcf.gz \
CM029859.1_LAE_087_Ogor1.0.vcf.gz CM029873.1_LAE_087_Ogor1.0.vcf.gz CM029860.1_LAE_087_Ogor1.0.vcf.gz > LAE_087_Ogor1.0_hapcalls.vcf.gz
tabix --csi LAE_087_Ogor1.0_hapcalls.vcf.gz
rm -rf CM*LAE_087*
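One thing that may be worth checking here: the directory listing above shows only .csi indexes, and as far as I know the htsjdk reader that GATK uses for block-compressed VCFs looks for a tabix .tbi file next to each input rather than a .csi. Below is a minimal sketch for finding the inputs that lack a .tbi. It assumes the sample name map has two whitespace-separated columns (sample name, then VCF path) and that sample names contain no spaces. `list_missing_tbi` is a hypothetical helper name, not a GATK or tabix command.

```shell
# Print every VCF listed in a sample-name map that has no .tbi index beside it.
# Assumption: each line is "<sample name><TAB><path to .vcf.gz>".
list_missing_tbi() {
    local map_file=$1 sample vcf
    while read -r sample vcf; do
        [ -n "$vcf" ] || continue                   # skip blank or malformed lines
        [ -e "${vcf}.tbi" ] || printf '%s\n' "$vcf" # report inputs without a .tbi
    done < "$map_file"
}

# Possible use (needs tabix on PATH; -f overwrites an existing index):
# list_missing_tbi "$PROJHOME/scripts/07_GenomicsDBImport/genomicsDB2.txt" |
#     while read -r vcf; do tabix -f -p vcf "$vcf"; done
```

Without the --csi flag, `tabix -p vcf` writes a .tbi next to the file. `gatk IndexFeatureFile -I file.vcf.gz` should be an alternative way to create the same kind of index.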
-
Hi Morgan Sparks,
If GenomicsDBImport is having a hard time finding the indexes, you can specify them with the option --read-index.
Let me know if this solves the issue!
Best,
Genevieve
-
Thanks, that makes sense. I thought about that when I was reading the documentation, but I was confused about what exactly to put there. My file map has 134 samples, so do I just point --read-index at the directory where the indexes are, or do I need some kind of corresponding index map or similar?
-
I believe it should be a list in the same order as your samples.
-
As in
sample_name1 sample_name1.g.vcf.gz.csi
sample_name2 sample_name2.g.vcf.gz.csi
sample_name3 sample_name3.g.vcf.gz.csi
sample_name4 sample_name4.g.vcf.gz.csi
Or just
sample_name1.g.vcf.gz.csi
sample_name2.g.vcf.gz.csi
sample_name3.g.vcf.gz.csi
sample_name4.g.vcf.gz.csi
-
I think the second option is correct. It would be that way for all the other tools, but GenomicsDB is a little different, so you might want to test with a smaller list to verify it works.
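For the smaller test, one option is to cut the sample name map down to its first few lines and run a single chromosome against that. A minimal sketch, assuming the map has one sample per line. `make_test_map` and the output file name are hypothetical, not part of GATK.

```shell
# Write the first N lines of a sample-name map to a new file for a trial run.
make_test_map() {
    local full_map=$1 n=$2 out=$3
    head -n "$n" "$full_map" > "$out"
}

# Possible use: keep three samples, then pass the small map to
# --sample-name-map together with a single -L interval.
# make_test_map genomicsDB2.txt 3 genomicsDB_test.txt
```

A three-sample import over one interval should fail or succeed quickly, so it is a cheap way to confirm the index setup before launching all 134 samples.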
-
Sounds good, thanks for the help!