Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

java.lang.IllegalArgumentException: Invalid interval in FuncotateSegments

12 comments

  • tc

    It seems this has not been resolved yet; I am still waiting for the GATK team.

    I switched to Oncotator instead, since I used the GRCh37 reference genome for that dataset, and it worked well!

  • Genevieve Brandt (she/her)

    Hi tc,

    It looks like you have a reference mismatch issue happening with your files. The error message seems to indicate that there is a contig called chr1 in your funcotator data sources interval:

    java.lang.IllegalArgumentException: Invalid interval. Contig:chr1 start:29534 end:14501
        at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:804)

    If your segment file has contigs named 1 and 2, they won't match up with the chr1 naming convention. Make sure the reference versions match for all of your files!

    Best,

    Genevieve
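
The naming convention a segment file uses can be read from the @SQ lines in its header. The following is a minimal sketch, not a GATK feature: the helper name `contig_style` and the temporary example file are hypothetical, and it assumes a SAM-style header with whitespace-separated `SN:` fields.

```shell
# Report whether the @SQ contig names in a SAM-style header use the "chr" prefix.
# contig_style is a hypothetical helper, not part of GATK.
contig_style() {
    awk '
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:chr/)   { with_chr++ }
                else if ($i ~ /^SN:/) { without_chr++ }
            }
        }
        END {
            if (with_chr > 0 && without_chr > 0) print "mixed"
            else if (with_chr > 0)               print "chr-prefixed"
            else if (without_chr > 0)            print "unprefixed"
            else                                 print "no @SQ lines"
        }
    ' "$1"
}

# Example: a tiny b37-style header (contigs named 1 and 2).
example=$(mktemp)
printf '@HD\tVN:1.6\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:2\tLN:243199373\n' > "$example"
contig_style "$example"   # prints: unprefixed
rm -f "$example"
```

Running the same helper on the sequence dictionary of the reference and on the segment file makes a naming mismatch visible before any annotation tool is started.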

  • tc

    Hi Genevieve,

    Thank you for looking into my issue. Yes, the reference genome I used is b37, where contigs are named 1, 2, 3, and so on. When running FuncotateSegments, the following messages pop up:

    12:37:55.542 INFO  FuncotatorEngine - VCF sequence dictionary detected as B37 in HG19 annotation mode.  Performing conversion.
    12:37:55.542 WARN  FuncotatorEngine - WARNING: You are using B37 as a reference.  Funcotator will convert your variants to GRCh37, and this will be fine in the vast majority of cases.  There MAY be some errors (e.g. in the Y chromosome, but possibly in other places as well) due to changes between the two references.

    So I assume it has already taken care of the inconsistency between my genome build and the data source genome build, correct? 

    Thanks,

    TC

  • tc

    Hi Genevieve,

    Sorry for another message. I also tried to convert the contig names in b37 to those in hg19 (by simply adding chr, so 1 will be converted to chr1). After that, I re-ran FuncotateSegments with the modified fasta and segment files, and again the same error message showed up:

    Invalid interval. Contig:chr1 start:29534 end:14501

    I noticed that the start position is larger than the end position - will that be an issue? I really appreciate your kind help!

    Best,

    TC

  • Genevieve Brandt (she/her)

    Hi tc,

    It's not possible to change contig names just by adding a different naming scheme for the contigs, since the start and end positions would also need to be changed. We have a tool for this, LiftOver.

    I will follow up with our developer team to figure out whether this problem comes from the naming scheme or from the start and end positions of the interval. I will get back to you early next week regarding that!

    Best,

    Genevieve

  • Genevieve Brandt (she/her)

    Hi tc,

    I followed up with the developers regarding your issue and I have an update. I found out that I was incorrect thinking that the problem was a reference mismatch issue. Your original command should work just fine and you don't need to update the reference versions.

    Something is wrong in your segments file, because the interval does look invalid (Contig:chr1 start:29534 end:14501). Could you post your segment file here? If it's too long, I can share our bug reporting instructions with you.

    Could you also follow up with the commands you used to create the segments file?

    Thank you, and I'm so sorry for leading us astray at first!

    Genevieve

  • tc

    Hi Genevieve,

    I really appreciate all your generous support.

    Basically, I have several tumor samples with unmatched normal samples, and I am following GATK's somatic CNV calling workflow. The data I have is whole exome sequencing data. The following starts with the BAM files after base quality recalibration with GATK v4.2.3.0. I am using GATK v4.2.3.0 (and trying more recent versions) for the somatic CNV calling analysis as well. Here are the command lines I used to generate the segment files:

    ## preprocess interval list

    gatk --java-options "-Xmx48g -Djava.io.tmpdir=/lscratch/$SLURM_JOBID"  PreprocessIntervals \
        -R hs37d5.fa \
        -L SeqCap_EZ_Exome_v3_capture_hs37d5.bed \
        --bin-length 0 \
        --padding 250 \
        --interval-merging-rule OVERLAPPING_ONLY \
        -O preprocessed_intervals.interval_list

    ## calculate read coverage for each tumor sample
    gatk --java-options "-Xmx48g -Djava.io.tmpdir=/lscratch/$SLURM_JOBID"  CollectReadCounts \
    -I sample.recal.bam \
    -L preprocessed_intervals.interval_list \
    --interval-merging-rule OVERLAPPING_ONLY \
    -O sample.counts.hdf5

    ## create PON with normal samples

    gatk --java-options "-Xmx10g -Djava.io.tmpdir=/lscratch/$SLURM_JOBID"  CreateReadCountPanelOfNormals \
    -I normal1.counts.hdf5 \
    -I normal2.counts.hdf5 \
    -I normal3.counts.hdf5 \

    ....

    -O pon.hdf5

    ## denoise

    gatk --java-options "-Xmx10g -Djava.io.tmpdir=/lscratch/$SLURM_JOBID"  DenoiseReadCounts \
    -I sample.counts.hdf5 \
    --count-panel-of-normals pon.hdf5 \
    --standardized-copy-ratios sample.standardizedCR.tsv \
    --denoised-copy-ratios sample.denoisedCR.tsv

    ## model segments

    gatk --java-options "-Xmx20g -Djava.io.tmpdir=/lscratch/$SLURM_JOBID"  ModelSegments \
    --denoised-copy-ratios sample.denoisedCR.tsv \
    --output-prefix sample \
    -O $outSegment  \
    --number-of-smoothing-iterations-per-fit 0 \
    --number-of-changepoints-penalty-factor 1.0 \
    --kernel-variance-copy-ratio 0 \
    --smoothing-credible-interval-threshold-copy-ratio 2.0

    ## call
    gatk --java-options "-Xmx10g -Djava.io.tmpdir=/lscratch/$SLURM_JOBID"  CallCopyRatioSegments \
    --input $outSegment/sample.cr.seg \
    --output $outSegment/sample.called.seg

     

    Also, here is the sample.called.seg file associated with the error message I reported:

     

    @HD    VN:1.6
    @SQ    SN:1    LN:249250621
    @SQ    SN:2    LN:243199373
    @SQ    SN:3    LN:198022430
    @SQ    SN:4    LN:191154276
    @SQ    SN:5    LN:180915260
    @SQ    SN:6    LN:171115067
    @SQ    SN:7    LN:159138663
    @SQ    SN:8    LN:146364022
    @SQ    SN:9    LN:141213431
    @SQ    SN:10    LN:135534747
    @SQ    SN:11    LN:135006516
    @SQ    SN:12    LN:133851895
    @SQ    SN:13    LN:115169878
    @SQ    SN:14    LN:107349540
    @SQ    SN:15    LN:102531392
    @SQ    SN:16    LN:90354753
    @SQ    SN:17    LN:81195210
    @SQ    SN:18    LN:78077248
    @SQ    SN:19    LN:59128983
    @SQ    SN:20    LN:63025520
    @SQ    SN:21    LN:48129895
    @SQ    SN:22    LN:51304566
    @SQ    SN:X    LN:155270560
    @SQ    SN:Y    LN:59373566
    @SQ    SN:MT    LN:16569
    @SQ    SN:GL000207.1    LN:4262
    @SQ    SN:GL000226.1    LN:15008
    @SQ    SN:GL000229.1    LN:19913
    @SQ    SN:GL000231.1    LN:27386
    @SQ    SN:GL000210.1    LN:27682
    @SQ    SN:GL000239.1    LN:33824
    @SQ    SN:GL000235.1    LN:34474
    @SQ    SN:GL000201.1    LN:36148
    @SQ    SN:GL000247.1    LN:36422
    @SQ    SN:GL000245.1    LN:36651
    @SQ    SN:GL000197.1    LN:37175
    @SQ    SN:GL000203.1    LN:37498
    @SQ    SN:GL000246.1    LN:38154
    @SQ    SN:GL000249.1    LN:38502
    @SQ    SN:GL000196.1    LN:38914
    @SQ    SN:GL000248.1    LN:39786
    @SQ    SN:GL000244.1    LN:39929
    @SQ    SN:GL000238.1    LN:39939
    @SQ    SN:GL000202.1    LN:40103
    @SQ    SN:GL000234.1    LN:40531
    @SQ    SN:GL000232.1    LN:40652
    @SQ    SN:GL000206.1    LN:41001
    @SQ    SN:GL000240.1    LN:41933
    @SQ    SN:GL000236.1    LN:41934
    @SQ    SN:GL000241.1    LN:42152
    @SQ    SN:GL000243.1    LN:43341
    @SQ    SN:GL000242.1    LN:43523
    @SQ    SN:GL000230.1    LN:43691
    @SQ    SN:GL000237.1    LN:45867
    @SQ    SN:GL000233.1    LN:45941
    @SQ    SN:GL000204.1    LN:81310
    @SQ    SN:GL000198.1    LN:90085
    @SQ    SN:GL000208.1    LN:92689
    @SQ    SN:GL000191.1    LN:106433
    @SQ    SN:GL000227.1    LN:128374
    @SQ    SN:GL000228.1    LN:129120
    @SQ    SN:GL000214.1    LN:137718
    @SQ    SN:GL000221.1    LN:155397
    @SQ    SN:GL000209.1    LN:159169
    @SQ    SN:GL000218.1    LN:161147
    @SQ    SN:GL000220.1    LN:161802
    @SQ    SN:GL000213.1    LN:164239
    @SQ    SN:GL000211.1    LN:166566
    @SQ    SN:GL000199.1    LN:169874
    @SQ    SN:GL000217.1    LN:172149
    @SQ    SN:GL000216.1    LN:172294
    @SQ    SN:GL000215.1    LN:172545
    @SQ    SN:GL000205.1    LN:174588
    @SQ    SN:GL000219.1    LN:179198
    @SQ    SN:GL000224.1    LN:179693
    @SQ    SN:GL000223.1    LN:180455
    @SQ    SN:GL000195.1    LN:182896
    @SQ    SN:GL000212.1    LN:186858
    @SQ    SN:GL000222.1    LN:186861
    @SQ    SN:GL000200.1    LN:187035
    @SQ    SN:GL000193.1    LN:189789
    @SQ    SN:GL000194.1    LN:191469
    @SQ    SN:GL000225.1    LN:211173
    @SQ    SN:GL000192.1    LN:547496
    @SQ    SN:NC_007605    LN:171823
    @SQ    SN:hs37d5    LN:35477943
    @RG    ID:GATKCopyNumber    SM:BCC11
    CONTIG    START    END    NUM_POINTS_COPY_RATIO    MEAN_LOG2_COPY_RATIO    CALL
    1    14645    13839497    2764    -0.121225    0
    1    13839498    55529537    8713    -0.060943    0
    1    55534430    142797736    6763    0.050711    0
    1    142803161    143164144    9    -1.797248    -
    1    143186822    156929235    3970    -0.077460    0
    1    156929872    224009136    8811    0.024671    0
    1    224116102    224116470    1    -4.545156    -
    1    224124170    249230997    3307    0.004490    0
    2    41203    137402680    14122    -0.000470    0
    2    137402681    215911009    8594    0.077261    0
    2    215914005    243081349    4299    -0.032370    0
    3    239031    47038956    5118    0.009681    0
    3    47038957    58572997    3763    -0.065127    0
    3    58574589    195508491    11551    0.026357    0
    3    195510680    197897076    515    -0.098993    0
    4    53052    10080924    1749    -0.046775    0
    4    10082637    190906382    12100    0.058093    0
    5    90287    175512331    14348    0.027925    0
    5    175517039    175520499    2    -4.991414    -
    5    175523341    180688118    1299    -0.075836    0
    6    203091    44268662    7671    -0.042585    0
    6    44268663    170893132    9798    0.056956    0
    7    192894    6791292    1171    -0.131613    0
    7    6797353    55273389    3742    0.044908    0
    7    55273390    76070264    1457    -0.129004    0
    7    76070803    97488354    1815    0.086610    0
    7    97488355    102279932    1639    -0.159168    -
    7    102296366    128040290    2071    0.065099    0
    7    128040291    158937264    4046    -0.012691    0
    8    141912    144808202    10006    0.026475    0
    8    144808818    146279801    643    -0.114060    0
    9    14454    127563542    9556    0.019478    0
    9    127563543    141110154    3851    -0.096781    0
    10    92579    5032494    329    -0.000889    0
    10    5037206    5038349    2    -14.977845    -
    10    5040460    135478219    12754    -0.012301    0
    11    179900    4360144    1330    -0.117832    0
    11    4388281    45891936    4130    0.042529    0
    11    45891937    48267440    788    -0.129053    0
    11    48267441    60777081    1489    0.048555    0
    11    60777082    72945646    3975    -0.102681    0
    11    72945852    116703945    3885    0.041158    0
    11    116706169    119599551    1103    -0.096473    0
    11    119981668    134606223    1616    0.019748    0
    12    67603    148828    21    0.725691    +
    12    148829    8213022    1769    -0.054302    0
    12    8234585    49130170    3871    0.057302    0
    12    49152726    58015869    3675    -0.100486    0
    12    58016026    108589849    4192    0.044206    0
    12    108589850    133811196    3929    -0.098388    0
    13    19041678    115092969    6252    0.039258    0
    14    19109923    105415347    10117    -0.012817    0
    14    105415623    105417253    8    -8.071341    -
    14    105418058    107283528    475    -0.028850    0
    15    20083596    21370343    74    0.136001    0
    15    21902673    22567374    47    0.836055    +
    15    22690734    23603644    148    -0.334553    -
    15    23604246    102516646    10885    -0.012914    0
    16    66814    90260790    12697    -0.091759    0
    17    5671    81188494    18026    -0.082622    0
    18    47413    78005554    5151    0.037724    0
    19    104308    59094013    17992    -0.147760    0
    20    68020    62934886    7982    -0.047934    0
    21    9589998    48084509    3554    -0.013765    0
    22    16158553    51237569    6915    -0.083246    0
    X    200560    155255524    11792    -0.056097    0
    Y    4982210    28600517    24    0.164919    +

     

    Thank you again for taking care of this:)

    Best,

    TC
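
The interval in the error message has a start greater than its end, so one way to rule out the segment file itself is to scan its rows for the same pattern. The following is a minimal sketch, not a GATK tool: the helper name `find_inverted` and the temporary example file are hypothetical, and the layout (lines starting with @ as header, a CONTIG/START/END column row, whitespace-separated fields) is assumed from the file posted above.

```shell
# Print every data row whose START is greater than its END.
# Returns 0 when no such row exists and 1 otherwise.
# find_inverted is a hypothetical helper, not part of GATK.
find_inverted() {
    awk '
        /^@/ { next }              # skip SAM-style header lines
        $1 == "CONTIG" { next }    # skip the column-name row
        NF >= 3 && ($2 + 0) > ($3 + 0) { print "line " FNR ": " $0; bad++ }
        END { exit (bad > 0 ? 1 : 0) }
    ' "$1"
}

# Example: the first two data rows of the posted file, with a minimal header.
seg=$(mktemp)
printf '@RG\tID:GATKCopyNumber\tSM:BCC11\n' > "$seg"
printf 'CONTIG\tSTART\tEND\tNUM_POINTS_COPY_RATIO\tMEAN_LOG2_COPY_RATIO\tCALL\n' >> "$seg"
printf '1\t14645\t13839497\t2764\t-0.121225\t0\n' >> "$seg"
printf '1\t13839498\t55529537\t8713\t-0.060943\t0\n' >> "$seg"
if find_inverted "$seg"; then
    echo "no inverted intervals"
else
    echo "inverted intervals found"
fi
rm -f "$seg"
```

Every row in the posted file has a START smaller than its END, so a check like this would be expected to report nothing for it. That suggests the interval 29534 to 14501 does not come from the segment rows themselves.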

  • Genevieve Brandt (she/her)

    Thanks tc. It does look like this is a bug with FuncotateSegments so I have created an issue ticket: https://github.com/broadinstitute/gatk/issues/7676. Our developers will take a closer look there and will work on the solution to this issue.

  • tc

    Hi Genevieve,

    I hope everything is going well with you. I am writing to follow up on this issue and was wondering whether there is any update. I understand your development team must be very busy.

    Alternatively, I can use the deprecated tool Oncotator for functional annotation of the called segments. I would appreciate any advice on using Oncotator versus GATK FuncotateSegments.

    Best,

    TC

  • Pamela Bretscher

    Hi tc,

    Thanks for checking in about this. There isn't much of an update on the bug ticket yet. The GATK developers are working on a fix, but they have many other open issues as well. Unfortunately, the GATK team no longer supports Oncotator, so I'm unable to provide much guidance on using this tool. You are welcome to give it a try, and there may be other users on the forum who have some advice. Please let me know if you need anything else from the GATK team right now.

    Kind regards,

    Pamela

  • Adel S

    Hi tc, I'm having a similar issue. Did you resolve it, or did you switch to Oncotator?

  • Genevieve Brandt (she/her)

    Thanks tc for your insight about how you worked around this issue!

    Adel S, it looks like it was you who commented on the GitHub thread? Knowing that more users are seeing this will definitely help our developer team! They have not yet had a chance to fix this bug. Any further progress will be posted on the GitHub thread!

    Thank you both!
