Funcotator cannot complete funcotaion for variant due to alternate allele
AnsweredI'm attempting to annotate germline variants after VQSR with Funcotator using GATK 4.1.4.1.
GATK command is:
gatk Funcotator \
-R ${REFERENCE_GENOME} \
-V ${OUT}/germline.filtered.vcf.gz \
-O ${OUT}/annotated.germline.vcf \
--output-file-format VCF \
--data-sources-path /mnt/data/rbueno/analysis_files/MedGenome_FamilialMPMs/Annotation_data_sources/funcotator_dataSources.v1.6.20190124s \
--ref-version hg19
I get many warnings and it terminates with a String index out of range error. Any help is appreciated.
The tail end of the output follows:
07:33:14.569 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756762-69756762 due to alternate allele: *
07:33:14.575 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756763-69756763 due to alternate allele: *
07:33:14.575 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756763-69756763 due to alternate allele: *
07:33:14.580 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756764-69756764 due to alternate allele: *
07:33:14.580 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756764-69756764 due to alternate allele: *
07:33:16.681 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:70289137-70289137 due to alternate allele: *
07:33:16.681 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:70289137-70289137 due to alternate allele: *
07:33:17.957 INFO VcfFuncotationFactory - dbSNP 9606_b150 cache hits/total: 521/453691
07:33:18.138 INFO Funcotator - Shutting down engine
[May 28, 2020 7:33:18 AM EDT] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 34.35 minutes.
Runtime.totalMemory()=3822059520
java.lang.StringIndexOutOfBoundsException: String index out of range: 545
at java.lang.String.substring(String.java:1963)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.initializeForInsertion(ProteinChangeInfo.java:256)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.<init>(ProteinChangeInfo.java:93)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.create(ProteinChangeInfo.java:371)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createSequenceComparison(GencodeFuncotationFactory.java:2003)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createCodingRegionFuncotationForProteinCodingFeature(GencodeFuncotationFactory.java:1193)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createExonFuncotation(GencodeFuncotationFactory.java:1044)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationOnSingleTranscript(GencodeFuncotationFactory.java:978)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:805)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:789)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.lambda$createGencodeFuncotationsByAllTranscripts$0(GencodeFuncotationFactory.java:474)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationsByAllTranscripts(GencodeFuncotationFactory.java:475)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsOnVariant(GencodeFuncotationFactory.java:530)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.determineFuncotations(DataSourceFuncotationFactory.java:233)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:201)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:172)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.lambda$createFuncotationMapForVariant$0(FuncotatorEngine.java:147)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.createFuncotationMapForVariant(FuncotatorEngine.java:157)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.enqueueAndHandleVariant(Funcotator.java:903)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.apply(Funcotator.java:857)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
-
Hi Mark Godek
Actually, the asterisk in the ALT VCF field is OK and it does not throw an exception, all you see are just warnings. Refer to https://samtools.github.io/hts-specs/VCFv4.3.pdf for an explanation of an asterisk in the ALT field (page 8, section 5).
As for the actual StringIndexOutOfBoundsException, you can find the most probable reasons for that here (https://gatk.broadinstitute.org/hc/en-us/community/posts/360060301671-Picard-Liftover-hg38-to-hg19-Error-) in the Bhanu Gandham's answer. You have to make sure the FASTA reference genome, your VCF and the --data-sources-path use the same contig names. Check that all VCFs (yours and in data-sources-path) are sorted, re-sort them just in case.
Such exceptions are usually easily googled using the exception type with the magic word "gatk".
-
Thanks.
Yeah, I imagined it had something to do with the data sources. I'm using b37 for everything but the annotation sources in the broad bundle only have hg19 and hg38.
I was hoping it would be find because --allow-hg19-gencode-b37-contig-matching defaults to true.
Is there a place I can get the b37 data sources or will turning on the online gnomeAD feature help?
Maybe it did the contig matching wrong because I told it the reference was hg19 instead of b37. I just wanted it to run, I'll try again with --ref-version b37.
-
Look, hg19 (GRCh37) uses chrN-like chromosome naming consensus, b37 (same build de facto) uses N-like chromosome naming. I have never worked with Funcotator and I'm not sure how it handles such cases with --ref-version option. However, I can say that the best practice is to use one contig naming convention everywhere. So look into your germline.filtered.vcf.gz file and see which chromosome names are used (chr1, 1, NC_000001.10, etc). Look into the contig names in the reference genome fasta file and in --data-sources-path if possible.
Eventually, you can use bcftools annotate --rename-chrs to rename chromosome names in VCF files.
-
I tried renaming the chromosomes in the filtered variants before annotating and using an hg19 reference and it's still complaining about contigs.
I think it annotated fine with the b37 naming, I'm concerned about the StringIndexOutOfBoundsException because it makes the job script exit with a code 250.
I think the only solution might be to find b37 Annotation_data_sources.
-
Can you confirm that Funcotator does produce some sort of output (-O ${OUT}/annotated.germline.vcf)? I mean that the output VCF contains more than just a header. Since your Funcotator log contains the following
07:33:14.569 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756762-69756762 due to alternate allele: *
07:33:14.575 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756763-69756763 due to alternate allele: *
07:33:14.575 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756763-69756763 due to alternate allele: *
07:33:14.580 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756764-69756764 due to alternate allele: *I guess you get a part o the input data which is processed. Try to find the exact line in the input VCF file (-V ${OUT}/germline.filtered.vcf.gz) which causes an exception. For that look at the last successfully processed entry in ${OUT}/annotated.germline.vcf and run the analysis starting from that line specifying the -L argument with the region you suppose the error comes from. When you find the entry causing an error, I presume you will be able to find out the source of the exception. Otherwise, post the entry here and we'll look into it.
-
12:70747010 was the last successfully annotated position.
12:70747693 causes the error.
12:70753674 would be the next position to be annotated if the exception did not occur.
12 70747010 . G A 84.70 PASS AC=2;AF=0.200;AN=10;DP=6;ExcessHet=0.2482;FS=0.000;InbreedingCoeff=0.3406;MLEAC=3;MLEAF=0.300;MQ=60.00;POSITIVE_TRAIN_SITE;QD=29.67;SOR=0.693;VQSLOD=19.61;culprit=FS GT:AD:DP:GQ:PGT:PID:PL:PS ./.:0,0:0:.:.:.:0,0,0 ./.:0,0:0:.:.:.:0,0,0 ./.:0,0:0:.:.:.:0,0,0 0/0:1,0:1:3:.:.:0,3,36 ./.:0,0:0:.:.:.:0,0,0 ./.:0,0:0:.:.:.:0,0,0 0/0:1,0:1:3:.:.:0,3,29 0/0:1,0:1:3:.:.:0,3,33 0/0:1,0:1:3:.:.:0,3,33 ./.:0,0:0:.:.:.:0,0,0 ./.:0,0:0:.:.:.:0,0,0 ./.:0,0:0:.:.:.:0,0,0 ./.:0,0:0:.:.:.:0,0,0 ./.:0,0:0:.:.:.:0,0,0 1|1:0,2:2:6:1|1:70746998_G_A:90,6,0:70746998
12 70747693 . TAA TA,TAAA,TAAAA,TAAAAA,T 819.41 . AC=6,5,5,6,3;AF=0.200,0.167,0.167,0.200,0.100;AN=30;BaseQRankSum=-6.740e-01;DP=665;ExcessHet=4.7216;FS=23.296;InbreedingCoeff=-0.1901;MLEAC=6,5,4,4,3;MLEAF=0.200,0.167,0.133,0.133,0.100;MQ=59.95;MQRankSum=0.00;QD=11.38;ReadPosRankSum=-8.860e-01;SOR=3.180 GT:AD:DP:GQ:PL 1/4:0,3,0,0,2,0:5:18:104,84,166,86,56,99,57,18,74,81,40,0,57,73,77,102,71,108,86,71,125 1/2:2,3,1,1,0,0:7:8:94,56,78,48,0,59,58,8,66,114,64,18,69,128,159,82,36,77,122,141,145 2/4:1,0,2,0,1,0:5:9:89,65,109,49,57,61,48,87,40,78,9,61,0,44,56,65,109,57,87,61,109 1/3:0,1,0,1,0,0:2:10:31,16,20,37,22,61,20,0,39,35,30,10,52,40,51,37,22,61,39,52,61 1/3:3,6,1,1,0,0:11:7:68,7,84,66,26,142,33,0,124,172,55,19,150,198,273,76,44,165,185,227,227 0/3:3,0,0,2,0,0:5:8:20,21,72,8,54,44,0,60,46,65,21,72,54,60,72,21,72,54,60,72,72 3/5:1,0,0,0,0,3:5:8:70,38,48,73,46,108,52,38,97,113,57,32,103,113,133,8,14,16,0,9,45 0/2:1,0,2,1,0,0:4:2:36,36,71,0,18,9,15,48,2,39,37,76,18,51,84,36,71,18,48,76,71 1/4:1,2,1,0,2,0:6:10:57,35,75,39,20,64,28,10,61,81,14,0,49,78,95,57,43,76,79,69,100 0/4:3,0,0,0,3,2:8:24:64,64,152,43,114,97,24,116,90,105,0,105,74,97,120,31,85,40,35,26,172 2/5:0,1,1,0,0,2:4:16:61,35,49,53,18,72,70,37,79,103,74,45,83,109,131,16,26,0,20,30,67 1/2:0,3,1,0,0,0:4:18:53,27,78,38,0,54,55,18,65,103,65,35,69,115,144,63,32,65,98,112,102 0/4:2,0,1,0,0,0:7:10:70,62,138,54,92,83,10,76,48,60,0,85,38,60,77,62,138,92,76,85,138 0/5:1,0,1,0,0,2:4:3:35,51,96,39,65,64,51,96,65,96,51,96,65,96,96,0,46,3,46,46,65 3/4:1,0,0,2,1,0:4:2:27,25,31,21,28,26,2,11,6,7,14,21,21,0,19,25,31,28,11,21,31
12 70753674 . G T 50.91 PASS AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;InbreedingCoeff=0.3492;MLEAC=4;MLEAF=1.00;MQ=60.00;POSITIVE_TRAIN_SITE;QD=25.46;SOR=0.693;VQSLOD=20.52;culprit=FS GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 1/1:0,2:2:6:49,6,0 ./.:0,0:0:.:0,0,0Since the Funcotator worked for chromosomes 1-12 up til position 70747693, I tried running the Funcotator -L with an intervals file of chromosomes 13 through Y and it completed fine. What I may end up doing is using -XL 12:70747693-70753673 to exclude those positions in chromosome 12.
Do you know why that site couldn't be annotated?
Unrelated but when I look into the log file, I also got some errors (example below) about variant overlaps. Is this normal or is there a way to correct it? Thanks again.
01:20:34.783 ERROR GencodeFuncotationFactory - Problem creating a GencodeFuncotation on transcript ENST00000377067.3 for variant: chr13:92050910-92050934(CGGAGGCGGCGGCGGCGGCGGCAGT* -> C): Variant overlaps transcript but is not completely contained within it. Funcotator cannot currently handle this case. Transcript: ENST00000377067.3 Variant: [VC Unknown @ chr13:92050910-92050934 Q738.93 of type=INDEL alleles=[CGGAGGCGGCGGCGGCGGCGGCAGT*, *, C] attr={AC=[4, 1], AF=[0.133, 0.033], AN=30, BaseQRankSum=2.74, DP=269, ExcessHet=0.8219, FS=2.617, InbreedingCoeff=0.2396, MLEAC=[4, 1], MLEAF=[0.133, 0.033], MQ=60.00, MQRankSum=0.00, QD=15.08, ReadPosRankSum=0.499, SOR=0.368} GT=GT:AD:DP:GQ:PL 0/0:14,0,0:14:7:0,7,467,7,467,467 0/0:30,0,0:30:72:0,72,1080,72,1080,1080 0/0:19,0,0:19:45:0,45,675,45,675,675 0/0:13,0,0:13:39:0,39,510,39,510,510 0/0:41,0,0:41:99:0,111,1665,111,1665,1665 0/1:9,2,0:11:57:57,0,371,84,377,461 0/1:11,7,0:18:99:262,0,405,295,426,721 0/0:17,0,0:17:45:0,45,675,45,675,675 0/2:8,0,2:10:60:60,84,405,0,321,315 0/0:14,0,0:14:7:0,7,371,7,371,371 0/0:14,0,0:14:39:0,39,585,39,585,585 0/0:35,0,0:35:99:0,99,1485,99,1485,1485 0/0:9,0,0:9:27:0,27,331,27,331,331 1/1:0,10,0:10:31:451,31,0,451,31,451 0/0:10,0,0:10:30:0,30,368,30,368,368 filters=
Edit: yeah I was able to get the full exome annotated with the -XL 12:70747693-70753673 option.
Thanks again.
-
Hi Mark Godek and danilovkiri
The developer of Funcotator is very interested in this issue and is also looking into it. I have created a github issue ticket for it and you can follow its progress here: https://github.com/broadinstitute/gatk/issues/6651
-
Hi Mark Godek,
We have released a Funcotator bug fix for a very similar issue to the one you have here. https://github.com/broadinstitute/gatk/issues/6289.
Could you test the newest GATK version (4.2.3.0) and let us know if it solves your problem as well? We can then mark your issue ticket as solved: https://github.com/broadinstitute/gatk/issues/6651.
Thank you,
Genevieve
Please sign in to leave a comment.
8 comments