FilterFuncotations Duplicate key error
Hello,
I'm using `FilterFuncotations` to process the output from `Funcotator`, as per this WARP pipeline (`AnnotationFiltration.wdl` at commit cec97750e3819fd88ba382534aaede8e05ec52df in broadinstitute/warp on GitHub).
```
/home/azzaea/software/gatk/gatk-4.2.2.0/gatk --java-options "-Xmx3072m" \
FilterFuncotations \
--variant /scratch/FPTVM/src/warp/pipelines/broad/annotation_filtration/cromwell-executions/AnnotationFiltration/4e3bd06b-3018-4c94-ac98-feb78b924d1f/call-FilterFuncotations/shard-0/inputs/1333115969/104566-001-001.filtered.vcf.funcotated.vcf.gz \
--output 104566-001-001.filtered.vcf.filtered.vcf.gz \
--ref-version hg38 \
--allele-frequency-data-source gnomad --lenient true
```
However, the command fails with the error message below:
```
[October 14, 2021 at 12:20:24 PM CEST] org.broadinstitute.hellbender.tools.funcotator.FilterFuncotations done. Elapsed time: 16.57 minutes.
Runtime.totalMemory()=1134559232
java.lang.IllegalStateException: Duplicate key Gencode_34_annotationTranscript (attempted merging values ENST00000450305.2 and ENST00000456328.2)
at java.base/java.util.stream.Collectors.duplicateKeyException(Collectors.java:133)
at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.base/java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1603)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
at org.broadinstitute.hellbender.tools.funcotator.filtrationRules.AlleleFrequencyUtils.lambda$buildMaxMafRule$1(AlleleFrequencyUtils.java:30)
at org.broadinstitute.hellbender.tools.funcotator.filtrationRules.FuncotationFilter.lambda$checkFilter$0(FuncotationFilter.java:48)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:558)
at org.broadinstitute.hellbender.tools.funcotator.filtrationRules.FuncotationFilter.checkFilter(FuncotationFilter.java:49)
at org.broadinstitute.hellbender.tools.funcotator.FilterFuncotations.lambda$null$0(FilterFuncotations.java:194)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
at org.broadinstitute.hellbender.tools.funcotator.FilterFuncotations.lambda$null$1(FilterFuncotations.java:196)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at org.broadinstitute.hellbender.tools.funcotator.FilterFuncotations.lambda$getMatchingFilters$2(FilterFuncotations.java:192)
at java.base/java.util.HashMap$Values.forEach(HashMap.java:976)
at org.broadinstitute.hellbender.tools.funcotator.FilterFuncotations.getMatchingFilters(FilterFuncotations.java:191)
at org.broadinstitute.hellbender.tools.funcotator.FilterFuncotations.secondPassApply(FilterFuncotations.java:174)
at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.nthPassApply(TwoPassVariantWalker.java:19)
at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.lambda$traverse$0(MultiplePassVariantWalker.java:40)
at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.lambda$traverseVariants$1(MultiplePassVariantWalker.java:77)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverseVariants(MultiplePassVariantWalker.java:75)
at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverse(MultiplePassVariantWalker.java:40)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
```
I'm not sure how to work around this. Is it due to `--transcript-selection-mode` in Funcotator, or is something else going on? Would you kindly advise?
Thank you,
Azza
-
Hi Azza,
We took a look at the stack trace and this looks to be a GATK bug in FilterFuncotations. There are two transcripts for the same gene (DDX11L1: ENST00000450305.2 and ENST00000456328.2) and the way that this code is written assumes that each gene only has one transcript.
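The top frames of the trace (`Collectors.uniqKeysMapAccumulator`, `Collectors.duplicateKeyException`) are what the two-argument form of `Collectors.toMap` produces when the same key appears twice. The following is a minimal sketch of that behaviour in isolation, not the GATK code itself: the class and method names are hypothetical, and the "keep the first value" merge function is only an illustration of one way to tolerate repeated keys, not necessarily what the actual fix does.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class; none of these names come from GATK.
class DuplicateKeyDemo {

    // Two entries that share one field name, mirroring the values in the error message.
    static List<Map.Entry<String, String>> sampleFields() {
        return List.of(
            Map.entry("Gencode_34_annotationTranscript", "ENST00000450305.2"),
            Map.entry("Gencode_34_annotationTranscript", "ENST00000456328.2"));
    }

    // Two-argument toMap: throws IllegalStateException as soon as a key repeats.
    static Map<String, String> collectStrict(List<Map.Entry<String, String>> fields) {
        return fields.stream()
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    // Four-argument toMap: the merge function decides what to do with a repeated key.
    // Keeping the first value is an arbitrary choice made for this illustration.
    static Map<String, String> collectKeepFirst(List<Map.Entry<String, String>> fields) {
        return fields.stream()
            .collect(Collectors.toMap(
                Map.Entry::getKey,
                Map.Entry::getValue,
                (first, second) -> first,
                LinkedHashMap::new));
    }

    public static void main(String[] args) {
        try {
            collectStrict(sampleFields());
        } catch (IllegalStateException e) {
            // On Java 9 and later the message should resemble the one in the trace above.
            System.out.println("strict: " + e.getMessage());
        }
        System.out.println("keep-first: " + collectKeepFirst(sampleFields()));
    }
}
```

Whether keeping one transcript, or evaluating each transcript separately, is the right behaviour for allele-frequency filtering is a design decision for the tool; the sketch only shows why the strict form fails.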
I created a ticket for our development team to fix this bug here. However, since this is an experimental tool, fixing it is not our highest priority. You can follow the ticket to see when it is resolved.
Thank you for writing into the forum!
Best,
Genevieve
-
Azza Ahmed, the team was able to get to this quite quickly. The PR with the fix is here and will be merged after some reviews.
If you want to test that it works ahead of time, you can download the GATK branch `tb_fix_build_max_maf_rule` and run `FilterFuncotations` from that version.
-
Great! Thank you very much.
I will experiment with it and get back to you.
-
Thank you Azza Ahmed! It will definitely help with our testing.
-
Thank you again for the quick fix. I’m happy to confirm that `FilterFuncotations` now handles such transcript issues gracefully, and the pipeline runs to completion, producing the expected outputs.
I note, however, that all the variants in my file (1 sample, WGS) are annotated as NOT_CLINSIG. I wonder why/how.
Looking at the logs from Funcotator itself, I note the warnings and errors below. Are they normal/benign?
Again, much due gratitude for your help.
Azza
-
These warnings are fine. They just indicate that sites whose alternate allele is a spanning deletion cannot be functionally annotated: https://gatk.broadinstitute.org/hc/en-us/articles/360035531912-Spanning-or-overlapping-deletions-allele-
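In VCF, a spanning deletion is written as `*` in the ALT column. To see which records would be affected, one could scan for that symbol. The following is a rough sketch that assumes uncompressed, tab-delimited VCF data lines; the class name, method name, and the two example records are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper; not part of GATK.
class SpanningDeletionCheck {

    // Returns true if any ALT allele on a VCF data line is the spanning-deletion symbol "*".
    static boolean hasSpanningDeletionAlt(String vcfDataLine) {
        String[] cols = vcfDataLine.split("\t");
        if (cols.length < 5) {
            throw new IllegalArgumentException("Expected at least 5 tab-separated VCF columns");
        }
        // ALT is the fifth column; multiple alleles are comma-separated.
        return Arrays.asList(cols[4].split(",")).contains("*");
    }

    public static void main(String[] args) {
        // Made-up records, used only to exercise the check.
        String withStar = "chr1\t100\t.\tA\tG,*\t50\tPASS\t.";
        String withoutStar = "chr1\t200\t.\tC\tT\t50\tPASS\t.";
        System.out.println(hasSpanningDeletionAlt(withStar));
        System.out.println(hasSpanningDeletionAlt(withoutStar));
    }
}
```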
-
Azza Ahmed, thank you for your help in testing the PR! The fix has been successfully merged and is in our newest release of GATK, 4.2.3.0: https://gatk.broadinstitute.org/hc/en-us/articles/4409678362139