ReciprocalOverlapAnnotator error for Illumina data (both ERDS and CNVnator)
Hi,
I am running two separate reciprocal overlap (RO) analyses, one on ERDS variants and one on CNVnator variants.
1. While running RO between two ERDS datasets, I got an error stating that 'REF' cannot be empty, so I substituted N in the REF column. After running the script again, I got the following error.
INFO 07:31:51,493 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 07:31:51,495 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7.GS-r1941-0-gb493839, Compiled 2020/01/21 11:34:26
INFO 07:31:51,496 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 07:31:51,496 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 07:31:51,496 HelpFormatter - [Tue Jun 02 07:31:51 EDT 2020] Executing on Linux 3.10.0-862.9.1.el7.x86_64 amd64
INFO 07:31:51,496 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_201-b09
INFO 07:31:51,499 HelpFormatter - Program Args: -A ReciprocalOverlap -R /scratch/RESOURCES/hg19/human_g1k_v37.fasta -vcf /scratch/wgs/XKGP7WU/KEL8946.20190904/edited_7_0480_006.erds.vcf -comparisonFile /scratch/wgs/XKGP7WU/KEL8946.20190904/edited_7_0480_007.erds.vcf -O erds.FAM1_MU006_FAM1_MU007.ReciprocalOverlap.vcf -reciprocalOverlapRankBy FRACTION -writeReport true -reportDirectory reportdir -T SVVariantAnnotatorWalker
INFO 07:31:51,503 HelpFormatter - Executing as root@localhost.localdomain on Linux 3.10.0-862.9.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_201-b09.
INFO 07:31:51,503 HelpFormatter - Date/Time: 2020/06/02 07:31:51
INFO 07:31:51,503 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 07:31:51,503 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 07:31:51,514 02-Jun-2020 GenomeAnalysisEngine - Strictness is SILENT
INFO 07:31:51,627 02-Jun-2020 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 07:31:51,706 02-Jun-2020 GenomeAnalysisEngine - Preparing for traversal
INFO 07:31:51,711 02-Jun-2020 GenomeAnalysisEngine - Done preparing for traversal
INFO 07:31:51,711 02-Jun-2020 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 07:31:51,712 02-Jun-2020 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 07:31:51,712 02-Jun-2020 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 07:31:51,712 02-Jun-2020 Walker - Initializing annotator framework ...
INFO 07:31:51,768 02-Jun-2020 Walker - Annotator framework initialization complete.
INFO 07:31:51,775 02-Jun-2020 Walker - Processing input file /scratch/wgs/XKGP7WU/KEL8946.20190904/edited_7_0480_006.erds.vcf ...
##### ERROR --
##### ERROR stack trace
htsjdk.tribble.TribbleException$MalformedFeatureFile: Error parsing line at byte position: LineIteratorImpl(SynchronousLineReader), for input source: /scratch/wgs/XKGP7WU/KEL8946.20190904/edited_7_0480_006.erds.vcf
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:387)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.<init>(TribbleIndexedFeatureReader.java:342)
at htsjdk.tribble.TribbleIndexedFeatureReader.iterator(TribbleIndexedFeatureReader.java:309)
at org.broadinstitute.sv.util.vcf.VCFReader.iterator(VCFReader.java:69)
at org.broadinstitute.sv.annotation.SVVariantAnnotatorWalker.processVCFFile(SVVariantAnnotatorWalker.java:184)
at org.broadinstitute.sv.annotation.SVVariantAnnotatorWalker.map(SVVariantAnnotatorWalker.java:134)
at org.broadinstitute.sv.annotation.SVVariantAnnotatorWalker.map(SVVariantAnnotatorWalker.java:73)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:106)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.sv.main.SVCommandLine.execute(SVCommandLine.java:145)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:95)
at org.broadinstitute.sv.main.SVAnnotator.main(SVAnnotator.java:92)
Caused by: java.lang.NumberFormatException: For input string: "N"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at htsjdk.variant.vcf.VCFUtils.parseVcfDouble(VCFUtils.java:262)
at htsjdk.variant.vcf.AbstractVCFCodec.parseQual(AbstractVCFCodec.java:620)
at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:422)
at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:384)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:328)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:48)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:70)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:373)
... 16 more
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7.GS-r1941-0-gb493839):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Error parsing line at byte position: LineIteratorImpl(SynchronousLineReader), for input source: /scratch/wgs/XKGP7WU/KEL8946.20190904/edited_7_0480_006.erds.vcf
##### ERROR ------------------------------------------------------------------------------------------
2. While running the RO analysis between two CNVnator datasets, I get the output dat file and VCF file. While analyzing the dat file, I found that some IDs in 'BESTHIT' and some start positions in 'BHSTART' aren't present in any of my input files. I don't know how that's possible.
For example, the highlighted ID 'CNVnator_dup_3' isn't present in any of my input files.
Can you please let me know how to proceed?
Thanks!
-
Hi,
You are using a very old version. We do not support GATK3 anymore. Please upgrade to the latest version of GATK4.
-
I will upgrade and re-run the scripts.
-
Hi,
As you know, the Reciprocal Overlap Annotator is part of Genome STRiP. When we downloaded svtoolkit, its dependencies, including GATK, came with it in the same folder. The docs say that no GATK version is guaranteed to work other than the one provided with svtoolkit, which is why the GATK jar comes from that folder. We have the latest version of svtoolkit.
That's what we read on your website. Please let me know how to proceed.
Thanks!
-
I wonder if you might be able to answer Ghausia Begum's question.
-
With respect to the first question, it is almost certainly a malformed input file. Try breaking up the file into smaller pieces until you find where the input is malformed.
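One possible way to narrow this down without splitting the file by hand is sketched below. In the stack trace above, the NumberFormatException for "N" is raised from AbstractVCFCodec.parseQual, which suggests the N may have ended up in the QUAL column (column 6) rather than in REF (column 4). The function name find_malformed_vcf_lines and the sample records are hypothetical and only for illustration; the check covers just the column count, an empty REF, and a non-numeric QUAL, so it is not a full VCF validator.

```python
import io


def find_malformed_vcf_lines(lines):
    """Yield (line_number, reason) for VCF data lines that look malformed.

    Only a few basic checks are made: at least 8 tab-separated columns,
    a non-empty REF (column 4), and a QUAL (column 6) that is "." or a number.
    """
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 8:
            yield number, f"expected at least 8 tab-separated columns, found {len(fields)}"
            continue
        ref, qual = fields[3], fields[5]
        if not ref:
            yield number, "REF is empty"
        if qual != ".":
            try:
                float(qual)
            except ValueError:
                yield number, f"QUAL is not numeric: {qual!r}"


# Hypothetical sample: the last record has "N" in QUAL instead of REF.
SAMPLE = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t1000\tERDS_1\tN\t<DEL>\t.\tPASS\tEND=2000\n"
    "1\t5000\tERDS_2\t.\t<DUP>\tN\tPASS\tEND=6000\n"
)

for number, reason in find_malformed_vcf_lines(io.StringIO(SAMPLE)):
    print(f"line {number}: {reason}")
# prints: line 4: QUAL is not numeric: 'N'
```

To run it on a real file, pass an open file handle instead of the in-memory sample, for example `with open(path) as handle: ...`.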
With respect to the second question, for a variant to be listed, it has to be in the input. There is nothing in the code that would manufacture a variant that wasn't in the input. So you should double-check your input files.
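A quick way to do that double check is sketched below: collect the ID column from the input VCFs and list any BESTHIT values that are not among them. This assumes the dat report is tab-delimited with a header row that contains a BESTHIT column, and that "NA" or "." marks a missing hit; both are assumptions about the report layout, not confirmed behaviour. The function names and sample data are hypothetical.

```python
import csv
import io


def vcf_ids(lines):
    """Return the set of values in the ID column (column 3) of VCF data lines."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2:
            ids.add(fields[2])
    return ids


def missing_best_hits(report_lines, known_ids, column="BESTHIT"):
    """Return BESTHIT values from the report that are not in known_ids.

    Assumes a tab-delimited report with a header row; empty, "NA" and "."
    values are treated as "no hit" and skipped (an assumption).
    """
    missing = []
    for row in csv.DictReader(report_lines, delimiter="\t"):
        hit = row.get(column)
        if hit and hit not in ("NA", ".") and hit not in known_ids:
            missing.append(hit)
    return missing


# Hypothetical in-memory stand-ins for a comparison VCF and the dat report.
VCF_SAMPLE = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t1000\tCNVnator_del_1\tN\t<DEL>\t.\tPASS\tEND=2000\n"
    "1\t5000\tCNVnator_dup_2\tN\t<DUP>\t.\tPASS\tEND=6000\n"
)
REPORT_SAMPLE = (
    "ID\tBESTHIT\tBHSTART\n"
    "CNVnator_del_1\tCNVnator_dup_2\t5000\n"
    "CNVnator_dup_2\tCNVnator_dup_3\t9000\n"
    "CNVnator_del_4\tNA\tNA\n"
)

known = vcf_ids(io.StringIO(VCF_SAMPLE))
print(missing_best_hits(io.StringIO(REPORT_SAMPLE), known))
# prints: ['CNVnator_dup_3']
```

If an ID is reported as missing, it is worth confirming that the files being checked are exactly the ones passed to -vcf and -comparisonFile in the command line.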