GenomeSTRiP SVPreprocess failed
Dear Forum member,
I downloaded the latest version of GenomeSTRiP (Release 2.00.1982) and am trying to run SVPreprocess on my genome samples. I keep getting the error "Cannot determine library identifier for read xxx". Thank you in advance for your assistance.
Command: java -Xmx4g -cp ${classpath} org.broadinstitute.gatk.queue.QCommandLine -S ${SV_DIR}/qscript/SVPreprocess.q -S ${SV_DIR}/qscript/SVQScript.q -cp ${classpath} -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar -configFile ${SV_DIR}/conf/genstrip_parameters.txt -R ${referencepath} -I genomestrip.bam.list -md output_metadata_directory -bamFilesAreDisjoint true -jobLogDir output_metadata_directory/logDir -L 6:161000000-162000000 -run
Error:
##### ERROR --
##### ERROR stack trace
org.broadinstitute.sv.commandline.ArgumentException: Cannot determine library identifier for read ST-E00219:476:HKV7TCCXY:1:1108:10460:64755
at org.broadinstitute.sv.metadata.isize.ComputeInsertSizeHistogramsWalker.processRead(ComputeInsertSizeHistogramsWalker.java:125)
at org.broadinstitute.sv.util.gatk.SVBaseReadWalker.simulateTraversal(SVBaseReadWalker.java:234)
at org.broadinstitute.sv.util.gatk.SVBaseReadWalker.onTraversalDone(SVBaseReadWalker.java:189)
at org.broadinstitute.sv.metadata.isize.ComputeInsertSizeHistogramsWalker.onTraversalDone(ComputeInsertSizeHistogramsWalker.java:77)
at org.broadinstitute.sv.metadata.isize.ComputeInsertSizeHistogramsWalker.onTraversalDone(ComputeInsertSizeHistogramsWalker.java:33)
at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:115)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.sv.main.SVCommandLine.execute(SVCommandLine.java:145)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:95)
at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:66)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7.GS-r1941-0-gb493839):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Cannot determine library identifier for read ST-E00219:476:HKV7TCCXY:1:1108:10460:64755
##### ERROR ------------------------------------------------------------------------------------------
-
Thank you for your post Jenny Xu. I'm going to tag Bob Handsaker to get back to you shortly.
-
This is most likely because your input BAM/CRAM files do not contain library information (i.e. there is no LB tag on the @RG header lines). It is strongly recommended that you add library information, since the technical characteristics of the data (insert size distribution, GC-bias, etc.) depend on the library.
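One quick way to confirm this is to look at the @RG lines in the header of each input file (for example, the text printed by `samtools view -H`). The sketch below is only an illustration, not part of GenomeSTRiP: the function name `read_groups_missing_lb` and the example header are made up, and it assumes the header has already been captured as plain text.

```python
def read_groups_missing_lb(header_text):
    """Return the IDs of @RG header lines that carry no LB field."""
    missing = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # After the record type, SAM header fields are tab-separated TAG:VALUE pairs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "LB" not in fields:
            missing.append(fields.get("ID", "<no ID>"))
    return missing


# Made-up header for illustration only; rg2 has no LB field.
example_header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:rg1\tSM:sampleA\tLB:libA",
    "@RG\tID:rg2\tSM:sampleA",
])
print(read_groups_missing_lb(example_header))
```

Any read group reported here would be expected to trigger the "Cannot determine library identifier" error for its reads.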
If your headers are missing the LB tag, there are three choices. First, you could reheader your input files to make them compliant.
Second, you could use `-libraryKey READGROUP`. This will cause each read group to be treated like a separate library.
Third, you could try an experimental, undocumented feature that allows you to remap read group information on the fly. To do this, you must first pre-create your metadata directory and, inside that directory, create a text file named read_groups.dat. This should be a tab-delimited text file with three columns, with the headers READGROUP, SAMPLE, and LIBRARY. This will override the information from the headers in the input files (specifically SM and LB), based on the read group IDs. To use this, all of your read group IDs must be unique across all input files.
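For the third option, the file contents could be generated along these lines. This is a minimal sketch based only on the description above: the helper name `build_read_groups_table` and the read group, sample, and library values are hypothetical, and since the feature is undocumented, the exact format GenomeSTRiP expects should be checked against a real run.

```python
import csv
import io


def build_read_groups_table(read_groups):
    """Build tab-delimited text from (read_group_id, sample, library) tuples."""
    ids = [rg for rg, _, _ in read_groups]
    # Read group IDs have to be unique across all input files.
    duplicates = sorted({rg for rg in ids if ids.count(rg) > 1})
    if duplicates:
        raise ValueError("read group IDs must be unique: " + ", ".join(duplicates))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["READGROUP", "SAMPLE", "LIBRARY"])
    writer.writerows(read_groups)
    return out.getvalue()


# Hypothetical values for illustration only.
table = build_read_groups_table([
    ("rg1", "sampleA", "libA"),
    ("rg2", "sampleA", "libA"),
])
print(table, end="")
```

The resulting text would then be saved as read_groups.dat inside the pre-created metadata directory (the directory passed with `-md`) before running SVPreprocess.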
-
Thanks Bob. Adding the LB tag in the header solves the issue.