CombineGVCFs 4.1.2.0 throws "java.lang.IllegalArgumentException: Features added out of order:"
Hello GATK team,
I'm trying to combine GVCFs that were generated with 4.1.1.0 using CombineGVCFs 4.1.2.0, but I get an error. I saw a reference to this error in an old post saying it should have been fixed (?).
Thank you for your help.
This is my command:
/ccc/work/cont007/fg0019/lindenbp/packages/gatk/gatk-4.1.2.0/gatk \
--java-options "-Djava.io.tmpdir=." CombineGVCFs \
-R "/ccc/work/cont007/fg/fg/biobank/by-taxonid/9606/hs37d5/hs37d5_all_chr.fasta" \
--dbsnp "/ccc/work/cont007/fg/fg/biobank/by-taxonid/9606/hs37d5/variants/hs37d5_all_chr_dbsnp-142.vcf" \
-L "/ccc/scratch/cont007/fg0156/lindenbp/20200305/work/f5/2c638754977e7a00dc05d16cc1802a/TMP/cluster.000000083.bed" \
-V "/ccc/scratch/cont007/fg0156/lindenbp/20200305/work/6a/a94b20b19c148af67a2ff6338746db/chunck.aaaaaaaah.list" \
-O "combine0.g.vcf.gz"
and the stacktrace:
09:55:36.426 INFO FeatureManager - Using codec VCFCodec to read file file:///ccc/scratch/cont007/fg0156/lindenbp/20200305/work/b1/b631a369db30ef069b97f8a46f9bf8/sample.g.vcf.gz
09:55:37.952 INFO FeatureManager - Using codec BEDCodec to read file file:///ccc/scratch/cont007/fg0156/lindenbp/20200305/work/f5/2c638754977e7a00dc05d16cc1802a/TMP/cluster.000000083.bed
09:55:37.969 INFO IntervalArgumentCollection - Processing 1173466 bp from intervals
09:55:37.978 INFO CombineGVCFs - Done initializing engine
09:55:37.997 INFO ProgressMeter - Starting traversal
09:55:37.997 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
09:55:38.656 WARN ReferenceConfidenceVariantContextMerger - Detected invalid annotations: When trying to merge variant contexts at location chr1:1293764 the annotation MLEAC=[1, 0] was not a numerical value and was ignored
09:55:48.005 INFO ProgressMeter - chr1:86115155 0.2 119000 713429.3
09:55:58.321 INFO ProgressMeter - chr1:197086900 0.3 279000 823656.8
09:56:08.599 INFO ProgressMeter - chr2:75889783 0.5 407000 797987.1
09:56:18.755 INFO ProgressMeter - chr2:198650852 0.7 496000 730163.4
09:56:28.112 INFO CombineGVCFs - Shutting down engine
[March 17, 2020 9:56:28 AM CET] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 0.92 minutes.
Runtime.totalMemory()=1137180672
java.lang.IllegalArgumentException: Features added out of order: previous (TabixFeature{referenceIndex=1, start=241726030, end=241726030, featureStartFilePosition=510745015674, featureEndFilePosition=-1}) > next (TabixFeature{referenceIndex=1, start=440531, end=440531, featureStartFilePosition=510745016107, featureEndFilePosition=-1})
at htsjdk.tribble.index.tabix.TabixIndexCreator.addFeature(TabixIndexCreator.java:89)
at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.add(IndexingVariantContextWriter.java:203)
at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:240)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.endPreviousStates(CombineGVCFs.java:407)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.mergeWithNewVCs(CombineGVCFs.java:325)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.apply(CombineGVCFs.java:164)
at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.apply(MultiVariantWalkerGroupedOnStart.java:68)
at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$traverse$1(MultiVariantWalker.java:119)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.MultiVariantWalker.traverse(MultiVariantWalker.java:117)
at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.traverse(MultiVariantWalkerGroupedOnStart.java:122)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1039)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
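To make the exception easier to read, here is a minimal Python sketch that pulls the two `TabixFeature` records out of the message above and compares them. The regular expression and the helper name `parse_out_of_order` are only illustrative (they are not part of GATK or htsjdk). For this message it shows that both records report the same `referenceIndex` while the start position goes backwards.

```python
import re

# Matches the leading fields of each TabixFeature{...} block in the message.
FEATURE_RE = re.compile(r"TabixFeature\{referenceIndex=(\d+), start=(\d+), end=(\d+)")


def parse_out_of_order(message):
    """Return [(referenceIndex, start, end), ...] in the order they appear."""
    return [tuple(int(x) for x in m.groups()) for m in FEATURE_RE.finditer(message)]


# The exception text from the stack trace above.
message = (
    "Features added out of order: previous (TabixFeature{referenceIndex=1, "
    "start=241726030, end=241726030, featureStartFilePosition=510745015674, "
    "featureEndFilePosition=-1}) > next (TabixFeature{referenceIndex=1, "
    "start=440531, end=440531, featureStartFilePosition=510745016107, "
    "featureEndFilePosition=-1})"
)

previous, following = parse_out_of_order(message)
same_contig = previous[0] == following[0]
print("same referenceIndex:", same_contig)
print("previous start:", previous[1], "next start:", following[1])
```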
If that helps, here is the content of the BED file:
$ cat /ccc/scratch/cont007/fg0156/lindenbp/20200305/work/f5/2c638754977e7a00dc05d16cc1802a/TMP/cluster.000000083.bed | grep 241726030 -C 1
chr2 241696635 241697111
chr2 241725651 241726030
chr3 440599 440931
and the variants in the region:
#/ccc/scratch/cont007/fg0156/lindenbp/20200305/work/4e/991c286a73ab17db54a397eac7b67f/sample.g.vcf.gz
chr2 241725649 . A <NON_REF> . . END=241725650 GT:DP:GQ:MIN_DP:PL 0/0:33:85:33:0,85,1038
chr2 241725651 . G <NON_REF> . . END=241725651 GT:DP:GQ:MIN_DP:PL 0/0:31:44:31:0,44,939
chr2 241725652 . G <NON_REF> . . END=241725652 GT:DP:GQ:MIN_DP:PL 0/0:33:85:33:0,85,1068
chr2 241726029 . A <NON_REF> . . END=241726029 GT:DP:GQ:MIN_DP:PL 0/0:53:0:53:0,0,1205
chr2 241726030 . C <NON_REF> . . END=241726037 GT:DP:GQ:MIN_DP:PL 0/0:53:99:48:0,120,1800
#/ccc/scratch/cont007/fg0156/lindenbp/20200305/work/c6/9bb682d0263c03b92f66e03df52597/sample.g.vcf.gz
chr2 241725650 . T <NON_REF> . . END=241725657 GT:DP:GQ:MIN_DP:PL 0/0:29:81:27:0,81,878
chr2 241726029 . A <NON_REF> . . END=241726029 GT:DP:GQ:MIN_DP:PL 0/0:30:50:30:0,50,792
chr2 241726030 . C <NON_REF> . . END=241726030 GT:DP:GQ:MIN_DP:PL 0/0:29:62:29:0,62,858
chr2 241726031 . C <NON_REF> . . END=241726031 GT:DP:GQ:MIN_DP:PL 0/0:29:87:29:0,87,947
#/ccc/scratch/cont007/fg0156/lindenbp/20200305/work/98/13132f89f22d0a48d091faa1d1e18d/sample.g.vcf.gz
chr2 241725650 . T <NON_REF> . . END=241725650 GT:DP:GQ:MIN_DP:PL 0/0:29:73:29:0,73,868
chr2 241725651 . G <NON_REF> . . END=241725652 GT:DP:GQ:MIN_DP:PL 0/0:28:81:27:0,81,899
chr2 241726029 . A <NON_REF> . . END=241726029 GT:DP:GQ:MIN_DP:PL 0/0:38:0:38:0,0,795
chr2 241726030 . C <NON_REF> . . END=241726035 GT:DP:GQ:MIN_DP:PL 0/0:46:99:41:0,106,1315
#/ccc/scratch/cont007/fg0156/lindenbp/20200305/work/4a/1fd2504809538be5d0dcdd7fac4171/sample.g.vcf.gz
chr2 241725650 . T <NON_REF> . . END=241725650 GT:DP:GQ:MIN_DP:PL 0/0:35:91:35:0,91,1128
chr2 241725651 . G <NON_REF> . . END=241725651 GT:DP:GQ:MIN_DP:PL 0/0:34:77:34:0,77,1072
chr2 241725652 . G <NON_REF> . . END=241725653 GT:DP:GQ:MIN_DP:PL 0/0:36:99:35:0,105,1096
chr2 241726025 . AAGGACCCCTGAGTGAGGAGATGGGGGCCGCCATC A,<NON_REF> 243.6 . BaseQRankSum=0.964;DP=33;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.5,0;MQRankSum=-1.417;RAW_MQandDP=118009,33;ReadPosRankSum=-1.226 GT:AD:DP:GQ:PL:SB 0/1:18,8,0:26:99:251,0,4896,305,4922,5227:10,8,5,3
#/ccc/scratch/cont007/fg0156/lindenbp/20200305/work/67/7a306f39c15eb49a3bb8aad24a1785/sample.g.vcf.gz
chr2 241725650 . T <NON_REF> . . END=241725650 GT:DP:GQ:MIN_DP:PL 0/0:37:99:37:0,111,1216
chr2 241725651 . G <NON_REF> . . END=241725651 GT:DP:GQ:MIN_DP:PL 0/0:37:76:37:0,76,1126
chr2 241725652 . G <NON_REF> . . END=241725652 GT:DP:GQ:MIN_DP:PL 0/0:37:99:37:0,111,1215
chr2 241726026 . A <NON_REF> . . END=241726035 GT:DP:GQ:MIN_DP:PL 0/0:39:99:36:0,102,1290
#/ccc/scratch/cont007/fg0156/lindenbp/20200305/work/ee/cf3c0be51cad2e6dd28d616ff370dc/sample.g.vcf.gz
chr2 241725632 . C <NON_REF> . . END=241725659 GT:DP:GQ:MIN_DP:PL 0/0:41:99:34:0,102,1142
chr2 241726025 . A <NON_REF> . . END=241726037 GT:DP:GQ:MIN_DP:PL 0/0:42:99:41:0,108,1133
#/ccc/scratch/cont007/fg0156/lindenbp/20200305/work/8a/fb8c783c8df00d8549822c752a351e/sample.g.vcf.gz
chr2 241725648 . G <NON_REF> . . END=241725652 GT:DP:GQ:MIN_DP:PL 0/0:37:99:35:0,102,1530
chr2 241726029 . A <NON_REF> . . END=241726029 GT:DP:GQ:MIN_DP:PL 0/0:54:34:54:0,34,1380
chr2 241726030 . C <NON_REF> . . END=241726037 GT:DP:GQ:MIN_DP:PL 0/0:59:99:55:0,120,1800
#/ccc/scratch/cont007/fg0156/lindenbp/20200305/work/b1/b631a369db30ef069b97f8a46f9bf8/sample.g.vcf.gz
chr2 241725650 . T <NON_REF> . . END=241725652 GT:DP:GQ:MIN_DP:PL 0/0:24:72:24:0,72,753
chr2 241726029 . A <NON_REF> . . END=241726029 GT:DP:GQ:MIN_DP:PL 0/0:45:81:45:0,81,1225
chr2 241726030 . C <NON_REF> . . END=241726035 GT:DP:GQ:MIN_DP:PL 0/0:50:99:40:0,120,1310
thank you for your help !
-
Hi Yokofakun
A few things to try:
- Please ensure that you are using the same interval file that you used to generate the gvcfs.
- Try running CombineGVCFs without the -L file. If you used -L in the previous steps, you don't need to provide it again for the CombineGVCFs step.
- Try upgrading to the latest version of GATK, 4.1.5.0, and let me know if the issue persists.
-
Hi Bhanu,
thank you for your answer
> Try to upgrade to the latest version of GATK4.1.5.0 and let me know if the issue persists.
I upgraded and got an error (in my workflow I'm not testing exactly the same interval as above):
```
14:17:35.396 INFO IntervalArgumentCollection - Processing 3613248 bp from intervals
14:17:35.403 INFO CombineGVCFs - Done initializing engine
14:17:35.421 INFO ProgressMeter - Starting traversal
14:17:35.421 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
14:17:35.619 WARN ReferenceConfidenceVariantContextMerger - Detected invalid annotations: When trying to merge variant contexts at location chr1:1474167 the annotation MLEAC=[1, 0] was not a numerical value and was ignored
14:17:45.426 INFO ProgressMeter - chr1:150267061 0.2 459000 2752898.8
14:17:55.448 INFO ProgressMeter - chr2:33717365 0.3 944000 2828323.2
14:18:05.450 INFO ProgressMeter - chr2:171673672 0.5 1416000 2829359.3
14:18:15.452 INFO ProgressMeter - chr3:33441658 0.7 1756000 2632026.0
14:18:25.657 INFO ProgressMeter - chr3:124214286 0.8 1989000 2375681.8
14:18:35.950 INFO ProgressMeter - chr3:186569880 1.0 2162000 2143175.8
14:18:39.714 INFO CombineGVCFs - Shutting down engine
[March 19, 2020 2:18:39 PM CET] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 1.12 minutes.
Runtime.totalMemory()=1190658048
java.lang.IllegalArgumentException: Invalid interval. Contig:chr4 start:419635 end:419578
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:733)
at org.broadinstitute.hellbender.utils.SimpleInterval.validatePositions(SimpleInterval.java:59)
at org.broadinstitute.hellbender.utils.SimpleInterval.<init>(SimpleInterval.java:35)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.apply(CombineGVCFs.java:162)
at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.apply(MultiVariantWalkerGroupedOnStart.java:131)
at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.apply(MultiVariantWalkerGroupedOnStart.java:106)
at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$traverse$1(MultiVariantWalker.java:120)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
```
with:
```
#!/bin/bash -ue
/ccc/work/cont007/fg0019/lindenbp/packages/gatk/gatk/gatk --java-options " -Xmx5g -Djava.io.tmpdir=." CombineGVCFs \
-R "/ccc/work/cont007/fg/fg/biobank/by-taxonid/9606/hs37d5/hs37d5_all_chr.fasta" \
--dbsnp "/ccc/work/cont007/fg/fg/biobank/by-taxonid/9606/hs37d5/variants/hs37d5_all_chr_dbsnp-142.vcf" \
-L /ccc/scratch/cont007/fg0156/lindenbp/20200305/work/f5/3ab29685dff61bdd8f1bef1f3d3245/TMP/cluster.000000092.bed \
-V "/ccc/scratch/cont007/fg0156/lindenbp/20200305/work/6a/a94b20b19c148af67a2ff6338746db/chunck.aaaaaaaah.list" \
-O "combine0.g.vcf.gz"
```
> Please ensure that you are using the same interval file that you used to generate the gvcfs.
The original GVCFs were called genome-wide, without intervals.
> Try running combinegvcf without the -L file.
When I keep only the chr4 records from my BED file (`grep chr4 /ccc/scratch/cont007/fg0156/lindenbp/20200305/work/f5/3ab29685dff61bdd8f1bef1f3d3245/TMP/cluster.000000092.bed`), it works! (??)
-
Hi Yokofakun
Ah, ok so the problem is that your intervals are not sorted correctly. As you can see in the error,
Invalid interval. Contig:chr4 start:419635 end:419578
the interval end is smaller than the start.
Typically we use the same intervals from the HaplotypeCaller step on the CombineGVCFs step. Since you do not use any intervals with HaplotypeCaller, your CombineGVCFs should also either not use any intervals or only intervals at the contig level to avoid this error.
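For anyone who wants to check an interval file for the two conditions mentioned here (an end smaller than the start, or records that are out of order), a small Python sketch follows. The function name `check_bed` and the `order` list are hypothetical; in practice the contig order should be taken from the reference `.dict` or `.fai`. The last example record is made up from the numbers in the error message and is not known to be in the actual BED file.

```python
def check_bed(lines, contig_order):
    """Yield (line_number, problem) for BED records that are inverted or out of order."""
    rank = {name: i for i, name in enumerate(contig_order)}
    previous_key = None
    for number, line in enumerate(lines, start=1):
        # Skip blank lines and the usual BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        if contig not in rank:
            yield number, f"unknown contig {contig}"
            continue
        if end < start:
            yield number, f"end {end} is smaller than start {start}"
        # Order is judged by (contig rank, start), like a coordinate sort.
        key = (rank[contig], start)
        if previous_key is not None and key < previous_key:
            yield number, "record sorts before the previous one"
        previous_key = key


# Hypothetical contig order (assumption, not taken from the thread's reference).
order = ["chr1", "chr2", "chr3", "chr4"]

# Excerpt of the BED file quoted earlier in the thread.
excerpt = [
    "chr2\t241696635\t241697111",
    "chr2\t241725651\t241726030",
    "chr3\t440599\t440931",
]
print(list(check_bed(excerpt, order)))

# Made-up record that reuses the numbers from the error message.
print(list(check_bed(["chr4\t419635\t419578"], order)))
```

If the BED file passes both checks, it is well-formed in this sense, and the advice above about not using sub-contig intervals in CombineGVCFs still applies.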
-
ok, I see, thanks.
-
I am getting this error even after I sort the VCF file before `LeftAlignAndTrimVariants`. I get the message below, but there is NOT even a variant that starts at 1017460; it doesn't exist. This is also a VarDict VCF file. Sometimes this step fails for a sample, and other times I get this type of error:
java.lang.IllegalArgumentException: Features added out of order: previous (TabixFeature{referenceIndex=9, start=1017460, end=1017632, featureStartFilePosition=439472982240, featureEndFilePosition=-1}) > next (TabixFeature{referenceIndex=9, start=1017458, end=1017474, featureStartFilePosition=439472982865, featureEndFilePosition=-1})
-
FYI, I tried this with version 3.6-0-g89b7209 and it worked so it's something that was introduced after that.
-
Brian Wiley, we only support GATK4 at this point, so if you want to compare different versions, please stick with GATK4. Which GATK4 version produced the above error? Could you also please provide your command line?
-
Thanks Genevieve,
This is with version 4.1.8.1 (from broadinstitute/gatk:4.1.8.1), but it also happens with version 4.2.0.0. Here is my command line:
java \
-Dsamjdk.use_async_io_read_samtools=false \
-Dsamjdk.use_async_io_write_samtools=true \
-Dsamjdk.use_async_io_write_tribble=false \
-Dsamjdk.compression_level=2 \
-Xmx8g -jar /gatk/gatk-package-4.1.8.1-local.jar LeftAlignAndTrimVariants \
-O /cromwell-executions/CH_exome_Final.cwl/31b68957-8aca-4310-98b8-1e5673d25193/call-vardict/vardict.cwl/4ba30c6b-0d8a-4ce3-b0f8-491aba74e8c3/call-filter/fp_filter.cwl/e2fc7715-80cd-43a6-808e-2fdb859c79d8/call-normalize_variants/execution/normalized.vcf.gz \
-R GRCh38_full_analysis_set_plus_decoy_hla.fa \
-V /cromwell-executions/CH_exome_Final.cwl/31b68957-8aca-4310-98b8-1e5673d25193/call-vardict/vardict.cwl/4ba30c6b-0d8a-4ce3-b0f8-491aba74e8c3/call-filter/fp_filter.cwl/e2fc7715-80cd-43a6-808e-2fdb859c79d8/call-normalize_variants/inputs/1971875813/merged.sanitized.vcf.gz
This also happens on an un-sanitized VCF, i.e. I get this error with just a concat of the per-interval VCFs from VarDict. I meant to say earlier that this happens only in GATK4 and not in GATK3, so something was introduced in version 4 that manipulates the start positions so that they are no longer exactly as they are in the VCF file.
-
Brian Wiley could you try using SortVcf to reorder before LeftAlignAndTrimVariants in case there are issues with the order of the variants?
This is most likely unrelated to this issue, but we do recommend that you run gatk using the gatk wrapper script and not with the java -jar usage. Strange errors can come up when you do not use the gatk wrapper script.
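Before or after running SortVcf, it can help to confirm whether the input records are actually in order. Below is a minimal Python sketch, not a GATK tool: `first_unsorted_record` is a hypothetical helper that takes the contig order from the `##contig` header lines when they exist (otherwise from order of first appearance) and returns the first pair of records that goes backwards. The contig name `chr10` in the example is an assumption, since the error message only gives `referenceIndex=9`; the two positions are the ones from the message.

```python
import gzip


def open_text(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def first_unsorted_record(lines):
    """Return ((contig, pos), (contig, pos)) for the first out-of-order pair, else None."""
    rank = {}
    previous = None
    for line in lines:
        if line.startswith("##contig=<ID="):
            # Header form: ##contig=<ID=name,length=...> or ##contig=<ID=name>
            name = line[len("##contig=<ID="):].split(",")[0].rstrip(">\n")
            rank.setdefault(name, len(rank))
            continue
        if line.startswith("#") or not line.strip():
            continue
        contig, pos = line.split(None, 2)[:2]
        rank.setdefault(contig, len(rank))
        key = (rank[contig], int(pos))
        if previous is not None and key < previous[0]:
            return previous[1], (contig, int(pos))
        previous = (key, (contig, int(pos)))
    return None


# In-memory example; the contig name is assumed, the positions come from the error.
example = [
    "##contig=<ID=chr10,length=133797422>\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr10\t1017460\t.\tA\tT\n",
    "chr10\t1017458\t.\tA\tT\n",
]
print(first_unsorted_record(example))
```

To run it on a file, something like `first_unsorted_record(open_text("merged.sanitized.vcf.gz"))` should work. If the input passes this check and the error still appears, the ordering problem would have to arise while the tool processes the records, which this sketch cannot diagnose.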