java.lang.IllegalArgumentException: Invalid interval in FuncotateSegments
Hi,
I tried to annotate a called segment file after following GATK's somatic CNV detection workflow:
gatk --java-options "-Xmx10g -Djava.io.tmpdir=/lscratch/$SLURM_JOBID" FuncotateSegments \
--data-sources-path funcotator_dataSources.v1.7.20200521s/ \
--ref-version hg19 \
--output-file-format SEG \
-R hs37d5.fa \
--segments sample.called.seg \
-O sample.seg.funcotated.tsv \
--transcript-list funcotator_dataSources.v1.7.20200521s/transcriptList.exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt
But I got the following error message:
12:37:55.534 INFO FuncotateSegments - The following datasources support funcotation on segments:
12:37:55.535 INFO FuncotateSegments - Gencode 34 CANONICAL
12:37:55.542 INFO FuncotatorEngine - VCF sequence dictionary detected as B37 in HG19 annotation mode. Performing conversion.
12:37:55.542 WARN FuncotatorEngine - WARNING: You are using B37 as a reference. Funcotator will convert your variants to GRCh37, and this will be fine in the vast majority of cases. There MAY be some errors (e.g. in the Y chromosome, but possibly in other places as well) due to changes between the two references.
12:37:55.679 INFO ProgressMeter - Starting traversal
12:37:55.679 INFO ProgressMeter - Current Locus Elapsed Minutes Features Processed Features/Minute
12:37:56.198 WARN FuncotatorUtils - Reference allele is different than the reference coding sequence (strand: -, alt = G, ref G != T reference coding seq) @[chr1:13839497]! Substituting given allele for sequence code (TTC->GTC)
12:37:56.213 INFO FuncotateSegments - Shutting down engine
[February 9, 2022 12:37:56 PM EST] org.broadinstitute.hellbender.tools.funcotator.FuncotateSegments done. Elapsed time: 0.24 minutes.
Runtime.totalMemory()=3139436544
java.lang.IllegalArgumentException: Invalid interval. Contig:chr1 start:29534 end:14501
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:804)
at org.broadinstitute.hellbender.utils.SimpleInterval.validatePositions(SimpleInterval.java:59)
at org.broadinstitute.hellbender.utils.SimpleInterval.<init>(SimpleInterval.java:35)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.segment.SegmentExonUtils.findInclusiveExonIndex(SegmentExonUtils.java:95)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.segment.SegmentExonUtils.determineSegmentExonPosition(SegmentExonUtils.java:63)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createSegmentFuncotations(GencodeFuncotationFactory.java:2938)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createSegmentFuncotations(GencodeFuncotationFactory.java:2914)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsOnSegment(GencodeFuncotationFactory.java:2866)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.determineFuncotations(DataSourceFuncotationFactory.java:239)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:211)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:182)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.lambda$createFuncotationMapForSegment$2(FuncotatorEngine.java:218)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.createFuncotationMapForSegment(FuncotatorEngine.java:221)
at org.broadinstitute.hellbender.tools.funcotator.FuncotateSegments.apply(FuncotateSegments.java:191)
at org.broadinstitute.hellbender.tools.funcotator.FuncotateSegments.apply(FuncotateSegments.java:59)
at org.broadinstitute.hellbender.engine.FeatureWalker.lambda$traverse$0(FeatureWalker.java:99)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at org.broadinstitute.hellbender.engine.FeatureWalker.traverse(FeatureWalker.java:97)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Here's what my called segment file looks like:
CONTIG START END NUM_POINTS_COPY_RATIO MEAN_LOG2_COPY_RATIO CALL
1 14645 13839497 2764 -0.121225 0
1 13839498 55529537 8713 -0.060943 0
1 55534430 142797736 6763 0.050711 0
1 142803161 143164144 9 -1.797248 -
1 143186822 156929235 3970 -0.077460 0
1 156929872 224009136 8811 0.024671 0
1 224116102 224116470 1 -4.545156 -
1 224124170 249230997 3307 0.004490 0
2 41203 137402680 14122 -0.000470 0
2 137402681 215911009 8594 0.077261 0
2 215914005 243081349 4299 -0.032370 0
I used GATK/4.2.4.1. Could you please let me know the cause of the invalid interval error? Thanks a lot!
-
It does not seem to be resolved yet; I am still waiting for the GATK team.
I switched to oncotator instead since I used GRCh37 reference genome for that dataset, and it worked well!
-
Hi tc,
It looks like you have a reference mismatch issue happening with your files. The error message seems to indicate that there is a contig called chr1 in your funcotator data sources interval:
java.lang.IllegalArgumentException: Invalid interval. Contig:chr1 start:29534 end:14501
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:804)
If your segment file has contigs named 1 and 2, they won't match up with the chr1 naming convention. Make sure the reference versions match for all of your files!
Best,
Genevieve
-
Hi Genevieve,
Thank you for looking into my issue. Yes, the reference genome I used is b37, where the contigs are named 1, 2, 3, and so on. When running FuncotateSegments, the following messages pop up:
12:37:55.542 INFO FuncotatorEngine - VCF sequence dictionary detected as B37 in HG19 annotation mode. Performing conversion.
12:37:55.542 WARN FuncotatorEngine - WARNING: You are using B37 as a reference. Funcotator will convert your variants to GRCh37, and this will be fine in the vast majority of cases. There MAY be some errors (e.g. in the Y chromosome, but possibly in other places as well) due to changes between the two references.
So I assume it has already taken care of the inconsistency between my genome build and the data source genome build, correct?
Thanks,
TC
-
Hi Genevieve,
Sorry for another message. I also tried to convert the contig names in b37 to those in hg19 (by simply adding chr, so 1 will be converted to chr1). After that, I re-ran FuncotateSegments with the modified fasta and segment files, and again the same error message showed up:
Invalid interval. Contig:chr1 start:29534 end:14501
I noticed that the start position is larger than the end position - will that be an issue? I really appreciate your kind help!
Best,
TC
-
Hi tc,
It's not possible to change contig names just by adding a different naming scheme for the contigs, since the start and end positions would also need to be changed. We have a tool for this, LiftOver.
I will follow up with our developer team to figure out whether this problem comes from the naming scheme or from the start and end positions of the interval. I will get back to you early next week regarding that!
Best,
Genevieve
-
Hi tc,
I followed up with the developers regarding your issue and I have an update. I found out that I was incorrect thinking that the problem was a reference mismatch issue. Your original command should work just fine and you don't need to update the reference versions.
Something is wrong in your segments file because the interval does look invalid. (Contig:chr1 start:29534 end:14501). Could you post your segment file here? If it's too long, I can share with you our bug reporting instructions.
Could you also follow up with the commands you used to create the segments file?
Thank you, and I'm so sorry for leading us astray at first!
Genevieve
-
Hi Genevieve,
I really appreciate all your generous support.
Basically, I have several tumor samples with unmatched normal samples, and I am following GATK's somatic CNV calling workflow. The data is whole exome sequencing data. The steps below start from BAM files that went through base quality recalibration with GATK v4.2.3.0; I am using GATK v4.2.3.0 (and trying more recent versions) for the somatic CNV calling analysis as well. Here are the command lines I used to generate the segment files:
## preprocess interval list
gatk --java-options "-Xmx48g -Djava.io.tmpdir=/lscratch/$SLURM_JOBID" PreprocessIntervals \
-R hs37d5.fa \
-L SeqCap_EZ_Exome_v3_capture_hs37d5.bed \
--bin-length 0 \
--padding 250 \
--interval-merging-rule OVERLAPPING_ONLY \
-O preprocessed_intervals.interval_list
## calculate read coverage for each tumor sample
gatk --java-options "-Xmx48g -Djava.io.tmpdir=/lscratch/$SLURM_JOBID" CollectReadCounts \
-I sample.recal.bam \
-L preprocessed_intervals.interval_list \
--interval-merging-rule OVERLAPPING_ONLY \
-O sample.counts.hdf5
## create PON with normal samples
gatk --java-options "-Xmx10g -Djava.io.tmpdir=/lscratch/$SLURM_JOBID" CreateReadCountPanelOfNormals \
-I normal1.counts.hdf5 \
-I normal2.counts.hdf5 \
-I normal3.counts.hdf5 \....
-O pon.hdf5
## denoise
gatk --java-options "-Xmx10g -Djava.io.tmpdir=/lscratch/$SLURM_JOBID" DenoiseReadCounts \
-I sample.counts.hdf5 \
--count-panel-of-normals pon.hdf5 \
--standardized-copy-ratios sample.standardizedCR.tsv \
--denoised-copy-ratios sample.denoisedCR.tsv
## model segments
gatk --java-options "-Xmx20g -Djava.io.tmpdir=/lscratch/$SLURM_JOBID" ModelSegments \
--denoised-copy-ratios sample.denoisedCR.tsv \
--output-prefix sample \
-O $outSegment \
--number-of-smoothing-iterations-per-fit 0 \
--number-of-changepoints-penalty-factor 1.0 \
--kernel-variance-copy-ratio 0 \
--smoothing-credible-interval-threshold-copy-ratio 2.0
## call
gatk --java-options "-Xmx10g -Djava.io.tmpdir=/lscratch/$SLURM_JOBID" CallCopyRatioSegments \
--input $outSegment/sample.cr.seg \
--output $outSegment/sample.called.seg
Also, here is the sample.called.seg file associated with the error message I reported:
@HD VN:1.6
@SQ SN:1 LN:249250621
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@SQ SN:12 LN:133851895
@SQ SN:13 LN:115169878
@SQ SN:14 LN:107349540
@SQ SN:15 LN:102531392
@SQ SN:16 LN:90354753
@SQ SN:17 LN:81195210
@SQ SN:18 LN:78077248
@SQ SN:19 LN:59128983
@SQ SN:20 LN:63025520
@SQ SN:21 LN:48129895
@SQ SN:22 LN:51304566
@SQ SN:X LN:155270560
@SQ SN:Y LN:59373566
@SQ SN:MT LN:16569
@SQ SN:GL000207.1 LN:4262
@SQ SN:GL000226.1 LN:15008
@SQ SN:GL000229.1 LN:19913
@SQ SN:GL000231.1 LN:27386
@SQ SN:GL000210.1 LN:27682
@SQ SN:GL000239.1 LN:33824
@SQ SN:GL000235.1 LN:34474
@SQ SN:GL000201.1 LN:36148
@SQ SN:GL000247.1 LN:36422
@SQ SN:GL000245.1 LN:36651
@SQ SN:GL000197.1 LN:37175
@SQ SN:GL000203.1 LN:37498
@SQ SN:GL000246.1 LN:38154
@SQ SN:GL000249.1 LN:38502
@SQ SN:GL000196.1 LN:38914
@SQ SN:GL000248.1 LN:39786
@SQ SN:GL000244.1 LN:39929
@SQ SN:GL000238.1 LN:39939
@SQ SN:GL000202.1 LN:40103
@SQ SN:GL000234.1 LN:40531
@SQ SN:GL000232.1 LN:40652
@SQ SN:GL000206.1 LN:41001
@SQ SN:GL000240.1 LN:41933
@SQ SN:GL000236.1 LN:41934
@SQ SN:GL000241.1 LN:42152
@SQ SN:GL000243.1 LN:43341
@SQ SN:GL000242.1 LN:43523
@SQ SN:GL000230.1 LN:43691
@SQ SN:GL000237.1 LN:45867
@SQ SN:GL000233.1 LN:45941
@SQ SN:GL000204.1 LN:81310
@SQ SN:GL000198.1 LN:90085
@SQ SN:GL000208.1 LN:92689
@SQ SN:GL000191.1 LN:106433
@SQ SN:GL000227.1 LN:128374
@SQ SN:GL000228.1 LN:129120
@SQ SN:GL000214.1 LN:137718
@SQ SN:GL000221.1 LN:155397
@SQ SN:GL000209.1 LN:159169
@SQ SN:GL000218.1 LN:161147
@SQ SN:GL000220.1 LN:161802
@SQ SN:GL000213.1 LN:164239
@SQ SN:GL000211.1 LN:166566
@SQ SN:GL000199.1 LN:169874
@SQ SN:GL000217.1 LN:172149
@SQ SN:GL000216.1 LN:172294
@SQ SN:GL000215.1 LN:172545
@SQ SN:GL000205.1 LN:174588
@SQ SN:GL000219.1 LN:179198
@SQ SN:GL000224.1 LN:179693
@SQ SN:GL000223.1 LN:180455
@SQ SN:GL000195.1 LN:182896
@SQ SN:GL000212.1 LN:186858
@SQ SN:GL000222.1 LN:186861
@SQ SN:GL000200.1 LN:187035
@SQ SN:GL000193.1 LN:189789
@SQ SN:GL000194.1 LN:191469
@SQ SN:GL000225.1 LN:211173
@SQ SN:GL000192.1 LN:547496
@SQ SN:NC_007605 LN:171823
@SQ SN:hs37d5 LN:35477943
@RG ID:GATKCopyNumber SM:BCC11
CONTIG START END NUM_POINTS_COPY_RATIO MEAN_LOG2_COPY_RATIO CALL
1 14645 13839497 2764 -0.121225 0
1 13839498 55529537 8713 -0.060943 0
1 55534430 142797736 6763 0.050711 0
1 142803161 143164144 9 -1.797248 -
1 143186822 156929235 3970 -0.077460 0
1 156929872 224009136 8811 0.024671 0
1 224116102 224116470 1 -4.545156 -
1 224124170 249230997 3307 0.004490 0
2 41203 137402680 14122 -0.000470 0
2 137402681 215911009 8594 0.077261 0
2 215914005 243081349 4299 -0.032370 0
3 239031 47038956 5118 0.009681 0
3 47038957 58572997 3763 -0.065127 0
3 58574589 195508491 11551 0.026357 0
3 195510680 197897076 515 -0.098993 0
4 53052 10080924 1749 -0.046775 0
4 10082637 190906382 12100 0.058093 0
5 90287 175512331 14348 0.027925 0
5 175517039 175520499 2 -4.991414 -
5 175523341 180688118 1299 -0.075836 0
6 203091 44268662 7671 -0.042585 0
6 44268663 170893132 9798 0.056956 0
7 192894 6791292 1171 -0.131613 0
7 6797353 55273389 3742 0.044908 0
7 55273390 76070264 1457 -0.129004 0
7 76070803 97488354 1815 0.086610 0
7 97488355 102279932 1639 -0.159168 -
7 102296366 128040290 2071 0.065099 0
7 128040291 158937264 4046 -0.012691 0
8 141912 144808202 10006 0.026475 0
8 144808818 146279801 643 -0.114060 0
9 14454 127563542 9556 0.019478 0
9 127563543 141110154 3851 -0.096781 0
10 92579 5032494 329 -0.000889 0
10 5037206 5038349 2 -14.977845 -
10 5040460 135478219 12754 -0.012301 0
11 179900 4360144 1330 -0.117832 0
11 4388281 45891936 4130 0.042529 0
11 45891937 48267440 788 -0.129053 0
11 48267441 60777081 1489 0.048555 0
11 60777082 72945646 3975 -0.102681 0
11 72945852 116703945 3885 0.041158 0
11 116706169 119599551 1103 -0.096473 0
11 119981668 134606223 1616 0.019748 0
12 67603 148828 21 0.725691 +
12 148829 8213022 1769 -0.054302 0
12 8234585 49130170 3871 0.057302 0
12 49152726 58015869 3675 -0.100486 0
12 58016026 108589849 4192 0.044206 0
12 108589850 133811196 3929 -0.098388 0
13 19041678 115092969 6252 0.039258 0
14 19109923 105415347 10117 -0.012817 0
14 105415623 105417253 8 -8.071341 -
14 105418058 107283528 475 -0.028850 0
15 20083596 21370343 74 0.136001 0
15 21902673 22567374 47 0.836055 +
15 22690734 23603644 148 -0.334553 -
15 23604246 102516646 10885 -0.012914 0
16 66814 90260790 12697 -0.091759 0
17 5671 81188494 18026 -0.082622 0
18 47413 78005554 5151 0.037724 0
19 104308 59094013 17992 -0.147760 0
20 68020 62934886 7982 -0.047934 0
21 9589998 48084509 3554 -0.013765 0
22 16158553 51237569 6915 -0.083246 0
X 200560 155255524 11792 -0.056097 0
Y 4982210 28600517 24 0.164919 +
Thank you again for taking care of this :)
Best,
TC
-
Thanks tc. It does look like this is a bug with FuncotateSegments so I have created an issue ticket: https://github.com/broadinstitute/gatk/issues/7676. Our developers will take a closer look there and will work on the solution to this issue.
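For anyone who wants to rule out their own input while the ticket is open, here is a minimal sketch of a check for rows whose START is greater than END. It assumes a GATK-style .seg file with @-prefixed header lines, a column header row beginning with CONTIG, and whitespace-separated CONTIG, START, and END in the first three columns. The function name check_seg is made up for this example and is not part of GATK.

```shell
# check_seg (hypothetical helper, not a GATK tool): print CONTIG:START-END for
# every data row of a .seg file whose START is greater than its END.
check_seg() {
  awk '
    /^@/ { next }                     # skip SAM-style header lines (@HD, @SQ, @RG)
    $1 == "CONTIG" { next }           # skip the column header row
    NF >= 3 && ($2 + 0) > ($3 + 0) {  # compare START and END numerically
      print $1 ":" $2 "-" $3
      bad = 1
    }
    END { exit bad + 0 }              # exit status 1 if any row was flagged
  ' "$1"
}
```

Run it as `check_seg sample.called.seg`. Every row posted in this thread has START below END, and neither 29534 nor 14501 appears in the file, which is consistent with the invalid interval being constructed inside FuncotateSegments rather than read from the segment file.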
-
Hi Genevieve,
Hope everything is going well with you. I am writing to follow up on this issue and was wondering whether there is any update. I understand your development team must be very busy.
Alternatively, I can use the deprecated tool "oncotator" to do function annotation of the called segments. I would appreciate it very much if you would give any advice on using oncotator vs GATK/FuncotateSegments.
Best,
TC
-
Hi tc,
Thanks for checking in about this. There isn't much of an update on the bug ticket yet. The GATK developers are working on a fix, but they have many other open issues as well. Unfortunately, the GATK team no longer supports Oncotator, so I'm unable to provide much guidance on using this tool. You are welcome to give it a try, and other users on the forum may have some advice. Please let me know if you need anything else from the GATK team right now.
Kind regards,
Pamela
-
Hi tc, I'm having a similar issue. Did you resolve it, or did you switch to Oncotator?
-
Thanks tc for your insight about how you worked around this issue!
Adel S, it looks like it was you who commented on the GitHub thread? Knowing that more users are seeing this will definitely help our developer team! They have not yet had a chance to fix this bug. Any further progress will be posted on the GitHub thread!
Thank you both!