PathSeqBuildReferenceTaxonomy fails with the error `A USER ERROR has occurred: Bad input: Expected taxonomy ID to be an integer but found "4OWNbacteria|complete"`
REQUIRED for all errors and issues:
a) GATK version used: 4.4.0.0
b) Exact command used: nohup ./gatk PathSeqBuildReferenceTaxonomy \
--reference NCBI_viral_refseq.fna \
--output NCBI_viral_refseq.db \
--refseq-catalog RefSeq-release221.catalog.gz \
--tax-dump new_taxdump.tar.gz \
--java-options "-Xmx60g"
c) Entire program log:
15:21:53.702 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/data/fast/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
15:21:53.730 INFO PathSeqBuildReferenceTaxonomy - ------------------------------------------------------------
15:21:53.732 INFO PathSeqBuildReferenceTaxonomy - The Genome Analysis Toolkit (GATK) v4.4.0.0
15:21:53.732 INFO PathSeqBuildReferenceTaxonomy - For support and documentation go to https://software.broadinstitute.org/gatk/
15:21:53.732 INFO PathSeqBuildReferenceTaxonomy - Executing as atripathi@cstcalculon.cst.local on Linux v5.14.0-362.8.1.el9_3.x86_64 amd64
15:21:53.732 INFO PathSeqBuildReferenceTaxonomy - Java runtime: OpenJDK 64-Bit Server VM v17.0.9+9
15:21:53.733 INFO PathSeqBuildReferenceTaxonomy - Start Date/Time: November 30, 2023 at 3:21:53 PM CST
15:21:53.733 INFO PathSeqBuildReferenceTaxonomy - ------------------------------------------------------------
15:21:53.733 INFO PathSeqBuildReferenceTaxonomy - ------------------------------------------------------------
15:21:53.733 INFO PathSeqBuildReferenceTaxonomy - HTSJDK Version: 3.0.5
15:21:53.734 INFO PathSeqBuildReferenceTaxonomy - Picard Version: 3.0.0
15:21:53.734 INFO PathSeqBuildReferenceTaxonomy - Built for Spark Version: 3.3.1
15:21:53.734 INFO PathSeqBuildReferenceTaxonomy - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:21:53.734 INFO PathSeqBuildReferenceTaxonomy - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:21:53.734 INFO PathSeqBuildReferenceTaxonomy - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:21:53.734 INFO PathSeqBuildReferenceTaxonomy - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:21:53.734 INFO PathSeqBuildReferenceTaxonomy - Deflater: IntelDeflater
15:21:53.734 INFO PathSeqBuildReferenceTaxonomy - Inflater: IntelInflater
15:21:53.735 INFO PathSeqBuildReferenceTaxonomy - GCS max retries/reopens: 20
15:21:53.735 INFO PathSeqBuildReferenceTaxonomy - Requester pays: disabled
15:21:53.735 INFO PathSeqBuildReferenceTaxonomy - Initializing engine
15:21:53.735 INFO PathSeqBuildReferenceTaxonomy - Done initializing engine
15:21:53.735 INFO PathSeqBuildReferenceTaxonomy - Parsing reference and files... (this may take a few minutes)
15:21:59.405 INFO PathSeqBuildReferenceTaxonomy - Shutting down engine
[November 30, 2023 at 3:21:59 PM CST] org.broadinstitute.hellbender.tools.spark.pathseq.PathSeqBuildReferenceTaxonomy done. Elapsed time: 0.10 minutes.
Runtime.totalMemory()=1912602624
***********************************************************************
A USER ERROR has occurred: Bad input: Expected taxonomy ID to be an integer but found "4OWNbacteria|complete"
***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException$BadInput: Bad input: Expected taxonomy ID to be an integer but found "4OWNbacteria|complete"
at org.broadinstitute.hellbender.tools.spark.pathseq.PSBuildReferenceTaxonomyUtils.parseTaxonId(PSBuildReferenceTaxonomyUtils.java:63)
at org.broadinstitute.hellbender.tools.spark.pathseq.PSBuildReferenceTaxonomyUtils.parseCatalog(PSBuildReferenceTaxonomyUtils.java:129)
at org.broadinstitute.hellbender.tools.spark.pathseq.PathSeqBuildReferenceTaxonomy.doWork(PathSeqBuildReferenceTaxonomy.java:143)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: java.lang.NumberFormatException: For input string: "4OWNbacteria|complete"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
at java.base/java.lang.Integer.parseInt(Integer.java:668)
at java.base/java.lang.Integer.valueOf(Integer.java:999)
at org.broadinstitute.hellbender.tools.spark.pathseq.PSBuildReferenceTaxonomyUtils.parseTaxonId(PSBuildReferenceTaxonomyUtils.java:61)
... 8 more
Using GATK jar /data/fast/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /data/fast/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar PathSeqBuildReferenceTaxonomy --reference NCBI_viral_refseq.fna --output NCBI_viral_refseq.db --refseq-catalog RefSeq-release221.catalog.gz --tax-dump new_taxdump.tar.gz
I searched all of the input files for the string "4OWNbacteria|complete" but could not find it in any of them.
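The stack trace shows `parseTaxonId` being called from `parseCatalog`, so the offending value is read from the `--refseq-catalog` file. A plain text search can miss it if the string only appears once the compressed stream is decoded. The sketch below decompresses the catalog and reports lines whose taxonomy ID field is not an integer. It assumes the taxonomy ID is the first tab-separated column of the catalog; the function name `find_bad_taxon_lines` is hypothetical and not part of GATK.

```python
import gzip
import re

# Approximates what Java's Integer.parseInt accepts: optional sign, ASCII digits.
# (It does not check for 32-bit overflow.)
INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")


def find_bad_taxon_lines(catalog_path, limit=10):
    """Return (line_number, first_field) for lines whose first column is not an integer.

    Assumption: the taxonomy ID is the first tab-separated column of the catalog.
    Stops after `limit` offending lines so a badly damaged file does not flood the output.
    """
    bad = []
    # Read through gzip when the file name ends in .gz, otherwise as plain text.
    opener = gzip.open if str(catalog_path).endswith(".gz") else open
    with opener(catalog_path, "rt", encoding="utf-8", errors="replace") as handle:
        for line_number, line in enumerate(handle, start=1):
            first_field = line.rstrip("\n").split("\t", 1)[0]
            if not INTEGER_PATTERN.fullmatch(first_field):
                bad.append((line_number, first_field))
                if len(bad) >= limit:
                    break
    return bad


# Example call (file name taken from the command above):
# for number, field in find_bad_taxon_lines("RefSeq-release221.catalog.gz"):
#     print(number, repr(field))
```

If reading the file raises a decompression error instead of returning a list, that would also point to a damaged download.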
-
I wasn't able to reproduce your error with RefSeq-release221.catalog.gz. Is it possible your file is corrupted? The md5sum I get is e600678fde0c8656012541890b2d8b31 if you want to check.
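For anyone who wants to run the checksum comparison from a script rather than with a command-line tool, here is a minimal sketch. The expected value is the md5sum quoted in the reply above; the function names `md5_of_file` and `matches_expected` are hypothetical helpers for illustration.

```python
import hashlib

# md5sum quoted in the reply above for RefSeq-release221.catalog.gz
EXPECTED_MD5 = "e600678fde0c8656012541890b2d8b31"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected.strip().lower()


# Example call (file name taken from the command in the original post):
# print(matches_expected("RefSeq-release221.catalog.gz"))
```

A mismatch would suggest the local copy differs from the file used in the reply, for example because of an interrupted download, and re-downloading the catalog would be the natural next step.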