Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Funcotator errors

27 comments

  • Jason Cerrato

    Hi Mia,

    Thanks for writing in. The error message suggests that the BAM or SAM is malformed—can you confirm that your file is valid using ValidateSamFile?

    https://gatk.broadinstitute.org/hc/en-us/articles/360042478272-ValidateSamFile-Picard-

    Many thanks,

    Jason

  • MPetlj

    Hi Jason,

    Thanks. I am not sure I understand, because I am not using BAM or SAM files in this workflow at all. My input file is a VCF that was produced by Mutect2. You can see how the analysis was set up by following the instructions in my post above.

     

    Please let me know and many thanks,

    Mia 

  • Jason Cerrato

    Hi Mia,

    Sorry for the confusion there—in this case you may want to validate the VCF to ensure there isn't anything malformed about it: https://gatk.broadinstitute.org/hc/en-us/articles/360042914291-ValidateVariants

    I also see in the log that there's this warning:

    21:50:43.053 WARN  FuncotatorEngine - WARNING: You are using B37 as a reference.  Funcotator will convert your variants to GRCh37, and this will be fine in the vast majority of cases.  There MAY be some errors (e.g. in the Y chromosome, but possibly in other places as well) due to changes between the two references.

    I can't say for certain whether this is the underlying issue, but ValidateVariants should be fairly revealing in what's wrong with the VCF, if anything.

    Kind regards,

    Jason

  • MPetlj

    Thanks Jason: 

    1) The VCF I am trying to annotate was generated using Mutect_pon workflow. Bam files were in hg19 reference, but the ref_fai, ref_fasta and ref_dict arguments I specified used b37 reference - can you please check with the developers if this may be giving me the issue now when I am trying to Funcotate the resulting PON vcf?

    2) Would it please be possible to link me to public files for the ref_fai, ref_fasta and ref_dict arguments (required for all sorts of GATK workflows) for both the hg19 and GRCh37 references?

    3) Would it please be possible to let me know, or point me to documentation, regarding how compatible the GRCh37, hg19 and b37 references are across the GATK pipelines, in terms of both expected results and running one against another? For example:

    3a) Can I use a PON from b37 in a Mutect2 analysis using BAMs in GRCh37?

    3b) If BAM files are mapped to one reference (GRCh37, hg19 or b37), do the ref_fai, ref_fasta and ref_dict arguments used in Mutect have to match the reference used in mapping, or can any of the three reference types be used? I understand that the reference needs to match when choosing between GRCh37 and GRCh38, but I am not sure what the case is for different versions of GRCh37 (i.e. hg19 and b37).

    Thanks

    Mia
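
One quick check related to the questions above is whether the VCF and the reference agree on contig names (UCSC hg19 uses names like chr1, while b37 uses 1). The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the inputs are abbreviated illustrative headers rather than real files, and the check compares names only, so it cannot distinguish two references that share the same contig names.

```python
import re


def dict_contigs(dict_text):
    """Contig names from the SN: tags of @SQ lines in a sequence dictionary."""
    names = []
    for line in dict_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def vcf_contigs(vcf_text):
    """Contig names from the ##contig=<ID=...> lines of a VCF header."""
    return re.findall(r"^##contig=<ID=([^,>]+)", vcf_text, flags=re.MULTILINE)


def naming_style(names):
    """Classify names as 'chr-prefixed', 'plain', 'mixed' or 'unknown'."""
    if not names:
        return "unknown"
    prefixed = sum(1 for name in names if name.startswith("chr"))
    if prefixed == len(names):
        return "chr-prefixed"
    if prefixed == 0:
        return "plain"
    return "mixed"


def missing_from_reference(vcf_names, ref_names):
    """VCF contigs that the reference dictionary does not declare."""
    known = set(ref_names)
    return [name for name in vcf_names if name not in known]


# Abbreviated, illustrative headers (not taken from the files in this thread).
B37_STYLE_DICT = "@HD\tVN:1.5\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:MT\tLN:16569\n"
HG19_STYLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "##contig=<ID=chr1,length=249250621>\n"
    "##contig=<ID=chrM,length=16571>\n"
)

ref_names = dict_contigs(B37_STYLE_DICT)
vcf_names = vcf_contigs(HG19_STYLE_VCF)
print(naming_style(ref_names), naming_style(vcf_names))
print(missing_from_reference(vcf_names, ref_names))
```

If the two styles differ, or any VCF contig is missing from the dictionary, the files were most likely built against different reference flavours.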

  • Jason Cerrato

    Hi Mia,

    2) You can find the resources in the GATK Resource Bundle. The files of interest are found in the Broad-owned bucket gs://gcp-public-data--broad-references. You can access these files in the console by going here: https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references

    You can find the exact path for the file by examining the file within the workspace (such as in one of the featured workspaces).

     

    For your other questions, can you take a look at GATK's Human genome reference builds documentation, particularly the section titled Legacy assemblies, and let me know if/which questions remain?

    Many thanks,

    Jason

  • MPetlj

    Hi Jason,

    Thanks for looking into this:

    1) The link you gave me refers to three references as 'b37/GRCh37 and hg19', and suggests that b37 and GRCh37 are different from hg19, but does not talk about possible differences between b37 and GRCh37. However, the Funcotator error noted:

    You are using B37 as a reference. Funcotator will convert your variants to GRCh37, and this will be fine in the vast majority of cases. There MAY be some errors (e.g. in the Y chromosome, but possibly in other places as well) due to changes between the two references.

    Thus b37 and GRCh37 appear to be different (as was also my understanding) and this link does not explain whether they are different, nor how interchangeable these are in GATK resources and workflows.

    Would it please be possible to link me to a resource that answers those two exact questions, and if not, I would appreciate if you or somebody from your team can please comment.

     2) If b37 is indeed different from GRCh37, can you please link me to GRCh37 resources, as the ones you provided are only for hg19/b37 ? I am specifically after GRCh37 files for  ref_fai, ref_fasta and ref_dict arguments. 

    3) Thanks for providing the link to bundle. I see that 'hg19' folder contains files termed 'b37', which is further confusing because the document you linked says that hg19 and b37 are different. Can you please clarify with the team? If they are different, can I please be linked to hg19 resources (the same ones as above)? 

    4) Can somebody from the team advise whether Funcotator would fail at all because of the error you noted with regards to the references? It is possible that we are going down a completely wrong road here... 

    5) I would love to validate the VCF using the tool you suggested, but the documentation says nothing about how to install that tool. In other words, typing gatk into terminal, even after calling the gatk dotkit, returns

    gatk: command not found 

    I should also note again, that the VCF that I am feeding into Funcotator is a direct and unmodified version of the VCF produced by Mutect2 - so if this was a formatting issue, it would again be a question for the GATK team. Either way, I would appreciate if somebody can look at this more closely and look at the actual files, that are all available as part of the referenced workflow and jobID in my first post.  

     

    Thanks

    Mia

     

     

  • Jason Cerrato

    Hi Mia,

    Let's try to get the ValidateVariants tool running first to see if this is an angle worth tackling, and then revisit the rest of the questions where needed. I'll be happy to see if we can get you some assistance from the GATK team, or others who are familiar with the workflow's functionality, once we confirm whether the vcf is valid so that they don't need to be worried about that part playing a role.

    The ValidateVariants tool comes with GATK, which you can download using the button at the top-right of the website.

    As is the case with any of these tools, you can use them by downloading GATK4. You can find more information about how to install and use GATK in the Getting Started with GATK4 article found in the Getting Started section of the User Guide.

    The error you are seeing on the Broad server is likely an issue with the dotkit being malformed, not with GATK in general. You would have to write in to BITS to get it fixed. It's also quite old so I would recommend inquiring about creation of a dotkit with a more recent GATK version (server has 4.0.4 and latest is 4.1.8.1).

    If you're interested in that route, you can request a dotkit by going here: https://broad.service-now.com/sp?id=sc_cat_item&sys_id=c48a528bdb1da3400f1b6033ca96190d

    Let us know how it goes!

    Kind regards,

    Jason

  • MPetlj

    Hi Jason,

    I ran the Validate Variants tool as follows:

    ./gatk ValidateVariants -R /Users/mpetljak/Desktop/Mutect_input_Homo_sapiens_assembly19.fasta -V /Users/mpetljak/Desktop/37be10ca-d8da-4d2c-898f-5f9b108bae96_Mutect2_Panel_092bf821-bc47-48d9-9561-d3c3fbf874d6_call-MergeVCFs_GTEX_WES_GRCh37_Under40.vcf -dbsnp /Users/mpetljak/Desktop/hg19_v0_Homo_sapiens_assembly19.dbsnp.vcf

    The output is copied below. I am not sure what it means, because the documentation for the tool only says how to run it and contains no information on expected outputs/outcomes. Can you please let me know what the next steps are to get Funcotator working?

    /gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.dylib

    Aug 17, 2020 4:45:32 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine

    INFO: Failed to detect whether we are running on Google Compute Engine.

    16:45:32.559 INFO  ValidateVariants - ------------------------------------------------------------

    16:45:32.559 INFO  ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.8.1

    16:45:32.559 INFO  ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/

    16:45:32.559 INFO  ValidateVariants - Executing as mpetljak@wma06-4df on Mac OS X v10.14.6 x86_64

    16:45:32.559 INFO  ValidateVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v14.0.2+12-46

    16:45:32.559 INFO  ValidateVariants - Start Date/Time: August 17, 2020 at 4:45:32 PM EDT

    16:45:32.559 INFO  ValidateVariants - ------------------------------------------------------------

    16:45:32.559 INFO  ValidateVariants - ------------------------------------------------------------

    16:45:32.560 INFO  ValidateVariants - HTSJDK Version: 2.23.0

    16:45:32.560 INFO  ValidateVariants - Picard Version: 2.22.8

    16:45:32.560 INFO  ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2

    16:45:32.560 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false

    16:45:32.560 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true

    16:45:32.560 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false

    16:45:32.560 INFO  ValidateVariants - Deflater: IntelDeflater

    16:45:32.560 INFO  ValidateVariants - Inflater: IntelInflater

    16:45:32.560 INFO  ValidateVariants - GCS max retries/reopens: 20

    16:45:32.560 INFO  ValidateVariants - Requester pays: disabled

    16:45:32.560 INFO  ValidateVariants - Initializing engine

    16:45:32.729 INFO  FeatureManager - Using codec VCFCodec to read file file:///Users/mpetljak/Desktop/hg19_v0_Homo_sapiens_assembly19.dbsnp.vcf

    16:45:32.843 INFO  FeatureManager - Using codec VCFCodec to read file file:///Users/mpetljak/Desktop/37be10ca-d8da-4d2c-898f-5f9b108bae96_Mutect2_Panel_092bf821-bc47-48d9-9561-d3c3fbf874d6_call-MergeVCFs_GTEX_WES_GRCh37_Under40.vcf

    16:45:32.855 INFO  ValidateVariants - Done initializing engine

    16:45:32.855 INFO  ProgressMeter - Starting traversal

    16:45:32.856 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute

    16:45:43.183 INFO  ProgressMeter -          1:214394669              0.2                 46000         267312.3

    16:45:53.502 INFO  ProgressMeter -          2:153871947              0.3                 85000         247021.2

    16:46:03.707 INFO  ProgressMeter -          3:106520887              0.5                114000         221710.8

    16:46:13.914 INFO  ProgressMeter -          4:106751028              0.7                149000         217740.8

    16:46:24.193 INFO  ProgressMeter -           5:97133431              0.9                178000         208041.1

    16:46:34.196 INFO  ProgressMeter -           6:57560717              1.0                217000         212259.5

    16:46:44.555 INFO  ProgressMeter -           7:90417820              1.2                260000         217576.3

    16:46:54.862 INFO  ProgressMeter -          8:134230179              1.4                298000         218032.8

    16:47:04.902 INFO  ProgressMeter -          10:42364888              1.5                341000         222282.6

    16:47:14.938 INFO  ProgressMeter -          11:71932130              1.7                379000         222766.5

    16:47:25.235 INFO  ProgressMeter -          12:85121928              1.9                403000         215166.7

    16:47:35.296 INFO  ProgressMeter -          14:55129132              2.0                433000         212185.6

    16:47:45.443 INFO  ProgressMeter -          16:73825612              2.2                491000         222193.7

    16:47:55.540 INFO  ProgressMeter -          19:19682119              2.4                547000         230038.1

    16:48:05.743 INFO  ProgressMeter -           X:13879124              2.5                621000         243711.0

    16:48:10.581 INFO  ProgressMeter -           Y:59013982              2.6                653885         248743.7

    16:48:10.581 INFO  ProgressMeter - Traversal complete. Processed 653885 total variants in 2.6 minutes.

    16:48:10.582 INFO  ValidateVariants - Shutting down engine

    [August 17, 2020 at 4:48:10 PM EDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 2.64 minutes.

    Runtime.totalMemory()=1028653056

     

     

     

  • MPetlj

    Hello,

    An update: I changed the pre-packaged Funcotator data source version from  funcotator_dataSources.v1.6.20190124s.tar.gz to the latest version funcotator_dataSources.v1.7.20200521s.tar.gz

    This took the workflow further down the line, but I am now getting an error that suggests there may be something wrong with one of the GATK source files: file:///cromwell_root/datasources_dir/gencode/hg19/gencode.v34lift37.annotation.REORDERED.gtf

    The relevant part of the log is copied below and the full version can be accessed here:

    https://storage.cloud.google.com/fc-secure-a46c7502-d26e-4217-b1d4-7d80a20d7456/70c611bc-a5ab-44f2-8fc5-50da9e9b6e67/Funcotator/01171fe4-441f-40c8-86c0-4d622aa38960/call-Funcotate/Funcotate.log?authuser=0

    Can you please let me know what are next steps?

    Thanks,

    Mia

    htsjdk.tribble.TribbleException$MalformedFeatureFile: Error parsing line: LineIteratorImpl(SynchronousLineReader), for input source: file:///cromwell_root/datasources_dir/gencode/hg19/gencode.v34lift37.annotation.REORDERED.gtf
        at htsjdk.tribble.TribbleIndexedFeatureReader$QueryIterator.readNextRecord(TribbleIndexedFeatureReader.java:510)
        at htsjdk.tribble.TribbleIndexedFeatureReader$QueryIterator.<init>(TribbleIndexedFeatureReader.java:426)
        at htsjdk.tribble.TribbleIndexedFeatureReader.query(TribbleIndexedFeatureReader.java:297)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.refillQueryCache(FeatureDataSource.java:567)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.queryAndPrefetch(FeatureDataSource.java:536)
        at org.broadinstitute.hellbender.engine.FeatureManager.getFeatures(FeatureManager.java:353)
        at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:173)
        at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.queryFeaturesFromFeatureContext(DataSourceFuncotationFactory.java:304)
        at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.getFeaturesFromFeatureContext(DataSourceFuncotationFactory.java:219)
        at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:197)
        at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:172)
        at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.lambda$createFuncotationMapForVariant$0(FuncotatorEngine.java:147)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.createFuncotationMapForVariant(FuncotatorEngine.java:157)
        at org.broadinstitute.hellbender.tools.funcotator.Funcotator.enqueueAndHandleVariant(Funcotator.java:903)
        at org.broadinstitute.hellbender.tools.funcotator.Funcotator.apply(Funcotator.java:857)
        at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
        at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
        at org.broadinstitute.hellbender.Main.main(Main.java:292)
    Caused by: java.lang.NumberFormatException: For input string: "chr1:+:11869-12227"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.valueOf(Long.java:803)
        at org.broadinstitute.hellbender.utils.codecs.gtf.GencodeGtfFeature.<init>(GencodeGtfFeature.java:224)
        at org.broadinstitute.hellbender.utils.codecs.gtf.GencodeGtfExonFeature.<init>(GencodeGtfExonFeature.java:19)
        at org.broadinstitute.hellbender.utils.codecs.gtf.GencodeGtfExonFeature.create(GencodeGtfExonFeature.java:23)
        at org.broadinstitute.hellbender.utils.codecs.gtf.GencodeGtfFeature$FeatureType$4.create(GencodeGtfFeature.java:777)
        at org.broadinstitute.hellbender.utils.codecs.gtf.GencodeGtfFeature.create(GencodeGtfFeature.java:320)
        at org.broadinstitute.hellbender.utils.codecs.gtf.AbstractGtfCodec.decode(AbstractGtfCodec.java:138)
        at org.broadinstitute.hellbender.utils.codecs.gtf.AbstractGtfCodec.decode(AbstractGtfCodec.java:23)
        at htsjdk.tribble.TribbleIndexedFeatureReader$QueryIterator.readNextRecord(TribbleIndexedFeatureReader.java:486)
        ... 43 more

     

     

     

     

  • Jason Cerrato

    Hi Mia,

    I'm glad to hear there's been some progress with the ValidateVariants tool. Can you share the file with jcerrato@broadinstitute.org so I can examine it in-depth? I'm currently getting an error trying to access it.

    https://storage.cloud.google.com/fc-secure-a46c7502-d26e-4217-b1d4-7d80a20d7456/70c611bc-a5ab-44f2-8fc5-50da9e9b6e67/Funcotator/01171fe4-441f-40c8-86c0-4d622aa38960/call-Funcotate/Funcotate.log?authuser=0

    Can you confirm that file:///cromwell_root/datasources_dir/gencode/hg19/gencode.v34lift37.annotation.REORDERED.gtf came from the funcotator data sources tar.gz? If it's from somewhere else, please let me know!

    It looks like it's possible that the tool isn't playing well with our own files or the standard genomic reference files. I'm happy to get some more experienced GATK help here!

    Kind regards,

    Jason

  • MPetlj

    Hi Jason,

    The file flagged in the log that might be the issue is not the one that I specified, and is either part of the GATK's embedded Funcotator workflow or part of the Funcotator sources provided by the GATK. To see if the latter is the case, you can see whether this file exists in (gs://broad-public-datasets/funcotator/funcotator_dataSources.v1.7.20200521s.tar.gz ) that I used; but either way if you determine from the log that this particular file is an issue, it may be a good time to bring in the GATK team because whatever is the source of the file, it is not user-specified. 

    Can you along the way please let me know whether you were happy with the VCF validation from above -  does the output suggest that the VCF is OK and how do you determine that? 

    Thanks,

    Mia

  • Jason Cerrato

    Hi Mia,

    A couple of my colleagues have verified that your output for ValidateVariants suggests there aren't any issues—if there were, you would get warnings/errors.

    I'm reaching out to an in-house GATK expert to more closely examine the log you've provided.

    Kind regards,

    Jason
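
To make the "no warnings/errors" check above less of a judgment call, the log can be scanned for lines whose level is not INFO. This is a minimal Python sketch that assumes the timestamped line format seen in the logs pasted in this thread (`HH:MM:SS.mmm LEVEL  Source - message`); the function name is hypothetical, and lines in other formats, such as a Java stack trace, are ignored, so a clean result here should be read together with the tool's exit status.

```python
import re
from collections import Counter

# Matches lines such as:
#   16:45:32.560 INFO  ValidateVariants - Initializing engine
LOG_LINE = re.compile(
    r"^\s*\d{2}:\d{2}:\d{2}\.\d{3}\s+(INFO|WARN|ERROR)\s+(\S+)\s+-\s+(.*)$"
)


def summarize_log(lines):
    """Count log lines per level and collect every non-INFO entry."""
    counts = Counter()
    problems = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not a timestamped log line
        level, source, message = match.groups()
        counts[level] += 1
        if level != "INFO":
            problems.append((level, source, message))
    return counts, problems


# Example lines copied from the logs quoted earlier in this thread.
sample = [
    "INFO: Failed to detect whether we are running on Google Compute Engine.",
    "16:45:32.560 INFO  ValidateVariants - Initializing engine",
    "16:48:10.582 INFO  ValidateVariants - Shutting down engine",
    "21:50:43.053 WARN  FuncotatorEngine - WARNING: You are using B37 as a reference.",
]

counts, problems = summarize_log(sample)
for level, source, message in problems:
    print(level, source, message)
```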

  • Jason Cerrato

    Hi Mia,

    They've informed me that this same issue has been brought up recently. The conclusion is that GATK is not ready for the v1.7 data sources. There is currently an incompatibility that will be updated shortly. For now they recommend using v1.6.20190124.

    Read more about that other reported issue here: https://gatk.broadinstitute.org/hc/en-us/community/posts/360072132411-Funcotator-datasources-v1-7-gencode-raise-error

    However, I remember in your original inquiry that you reported your issue when using v1.6.2019012. Is it accurate to say that this would still be an issue at present, or would you need to rerun the job with any changes you've made (if any) to confirm? If there is still an issue, please let me know and I'll bring it up with a GATK expert right away.

    P.S. I noticed in your original inquiry that you used Mutect2 version 4.1.6.0 to produce the VCF you are now trying to use in Funcotator. If you do need to rerun a job, can you also try with the 4.1.6.0 version of Funcotator and GATK to see if keeping the versions consistent helps resolve some issues?

    Let me know.

    Many thanks,

    Jason

  • MPetlj

    Hi Jason,

    Thanks for following up. Correct, the original issue was raised re. failures I was getting when running with v1.6.2019012 Funcotator sources.

    I could try the 4.1.6.0 version of Funcotator, to match the version of Mutect used to produce the PON that I am trying to Funcotate, but I actually used the 4.1.7.0 docker with Mutect 4.1.6.0, because the 4.1.6.0 docker was bugged and gave me issues before (https://gatk.broadinstitute.org/hc/en-us/community/posts/360060174372-Haplotype-Caller-4-1-6-0-java-lang-IllegalStateException-Smith-Waterman-alignment-failure-). We discussed this on one of the separate threads. So I could not get everything down to 4.1.6.0 (because the docker is bugged), nor to the .7 version (because the sources are not ready, as you explained).

    It would be great if you can please ask somebody from the team to look at my original enquiry, relevant information summarized below:

    1) I created a PON using Mutect 4.1.6.0. with 4.1.7.0. docker; because 4.1.6.0. docker was bugged

    2) I am trying to annotate the PON VCF with Funcotator. I am feeding the VCF from step 1 into the analysis directly, without any modifications to the VCF. The VCF was checked with ValidateVariants. Funcotator info:

    Funcotator version: 4.1.7.0

    GATK docker: 4.1.7.0 

    Funcotator sources:  funcotator_dataSources.v1.6.20190124s.tar.gz

    3) I am sending you logs of two failed attempts via email, as well as the input json file. 

    4) For other information workspace is called 661-Clonal hematopoiesis and your team already has access to it. The job ID is 3aef30a2-8caf-4bef-8b5d-8574c16a096a and you should be able to access all of the other relevant details that you may need to troubleshoot.

    5) I suspect the issue is to do with source files, because when I tried a later version (.7) the workflow got further down the line, but then failed again (presumably because as you explained, this version of sources is not ready yet)

    Thanks,

    Mia

     

      

  • Jason Cerrato

    Hi Mia,

    Just as an update, we're looking to get advice from the author of the Funcotator workflow on this error.

    Kind regards,

    Jason

  • Jason Cerrato

    Hi Mia,

    Here's what I've heard from the author:

    Funcotator does not produce full funcotations for spanning deletions, so any `*` alt alleles will generate the message:
    GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr4:9249899-9249899 due to alternate allele: *
     
    For these alleles, some annotations will be produced, but a "complete" funcotation will not be.
     

    The real error here is this second message. Funcotator expects that all Alt alleles will be standard bases (ATGC) and assumes that the genome sequence around all variants will also be standard bases (ATGC). One of two things has happened.
    1. M2 has somehow produced a variant that includes N bases.  (unlikely)
    2. The region in the reference to which a variant in their file mapped includes N bases.  (more likely)
     
    Can you search your VCF for Alt alleles containing N bases?  If any of these exist, then this is the problem. Let us know if this is the case.
     
    If not, then this is a bug in how Funcotator handles N bases in the predicted protein sequence and the author would like to know which variant is causing this problem.  You should be able to look at the last annotated variant in your output. Please give us the coordinates of the next variant in your input file after the last annotated variant.
     
    Kind regards,
     
    Jason
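
The search for unusual Alt alleles described above can be sketched in Python as follows. This is an illustration under assumptions, not the author's procedure: the function name is hypothetical, the input is an uncompressed, tab-separated VCF read as text lines, and symbolic alleles such as <DEL> would also be reported as non-ACGT. It covers only the first possibility (N bases in the Alt allele); checking the reference sequence around a variant needs the FASTA and is not shown.

```python
def unusual_alt_records(vcf_lines):
    """Report ALT alleles that are '*' or contain a base other than A, C, G, T."""
    hits = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        for allele in alt.split(","):
            if allele == "*":
                hits.append((chrom, pos, allele, "spanning deletion"))
            elif set(allele.upper()) - set("ACGT"):
                hits.append((chrom, pos, allele, "non-ACGT base"))
    return hits


# The first record is the one quoted later in this thread; the second is a
# made-up record that exists only to trigger both checks.
sample = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "4\t9274640\t.\tA\tATCACTG,ATCCTG\t.\t.\tBETA=0.989,0.141;FRACTION=0.022",
    "4\t9300000\t.\tG\tGN,*\t.\t.\t.",
]

hits = unusual_alt_records(sample)
for chrom, pos, allele, reason in hits:
    print(chrom, pos, allele, reason)
```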

  • MPetlj

    Hi Jason,

    Thanks for addressing this with the team.

    I checked and there are no 'N' variants among the wild-type bases in the VCF.

    In terms of the next variant in the input, following the last annotated one:

    #CHROM POS ID REF ALT QUAL FILTER INFO

    4 9274640 . A ATCACTG,ATCCTG . . BETA=0.989,0.141;FRACTION=0.022

    Thanks,

    Mia 

    P.S. if this does not resolve it, the Funcotator outputs from that failed job are here:

    https://console.cloud.google.com/storage/browser/fc-secure-a46c7502-d26e-4217-b1d4-7d80a20d7456/3aef30a2-8caf-4bef-8b5d-8574c16a096a/Funcotator/92ed0246-8436-4c93-a50b-5f2dc6fc5bd6/call-Funcotate?authuser=0&prefix=

    I am hoping you might be able to open with the access we gave you before 
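
Finding "the next variant in the input after the last annotated one" can be done by hand, but a small script makes it repeatable. The Python sketch below rests on several assumptions: both files are uncompressed VCF text, the partial output keeps the input's record order, and records are matched on CHROM and POS only. The function names and the first two sample records are hypothetical.

```python
def data_lines(lines):
    """Keep only record lines (drop blank lines and '#' header lines)."""
    return [ln.rstrip("\n") for ln in lines if ln.strip() and not ln.startswith("#")]


def site(line):
    """Return (CHROM, POS) for one VCF record line."""
    chrom, pos = line.split("\t")[:2]
    return chrom, int(pos)


def next_after_last_annotated(input_lines, output_lines):
    """Return the input record that follows the last record present in the output."""
    inputs = data_lines(input_lines)
    outputs = data_lines(output_lines)
    if not inputs:
        return None
    if not outputs:
        return inputs[0]  # nothing was annotated, so the first record is the suspect
    last = site(outputs[-1])
    matches = [i for i, ln in enumerate(inputs) if site(ln) == last]
    if not matches or matches[-1] + 1 >= len(inputs):
        return None
    return inputs[matches[-1] + 1]


# Made-up records at positions 100 and 200, followed by the record from this thread.
input_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "4\t100\t.\tA\tG\t.\t.\t.",
    "4\t200\t.\tC\tT\t.\t.\t.",
    "4\t9274640\t.\tA\tATCACTG,ATCCTG\t.\t.\tBETA=0.989,0.141;FRACTION=0.022",
]
partial_output_vcf = input_vcf[:3]  # annotation stopped after position 200

suspect = next_after_last_annotated(input_vcf, partial_output_vcf)
print(suspect)
```

If several records share the last annotated position, the sketch skips past all of them, so those records are worth checking by eye as well.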

     

     

  • Jason Cerrato

    Hi Mia,

    Thanks for that. I've passed the details on to the author and I can confirm I am able to access that link.

    I'll let you know once I hear back.

    Kind regards,

    Jason

  • MPetlj

    Thanks Jason. I'd really appreciate every effort to help me resolve this by the end of this week as we have not been able to move these analyses anywhere forward for 3 weeks now (since the issue was first raised).

    Best wishes,

    Mia

  • Jason Cerrato

    Hi Mia,

    I'm still awaiting word from the author. I sent an update request yesterday—I'll send another today if I don't hear by 11AM.

    Kind regards,

    Jason

  • Jason Cerrato

    Hi Mia,

    I've heard back from the author. They were able to reproduce the error with the variant you provided.  They're digging deeper into the issue, but they think it's looking like a bug.

    In the meantime, if you can remove that variant from the file, Funcotator should run correctly on the rest of the file (assuming no other variants have the same issue). If you find that any other variants are affected, please let me know and I'll flag them with the author ASAP.

    Kind regards,
    Jason
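
Removing a single record as suggested above can be sketched in Python like this. It assumes an uncompressed, tab-separated VCF handled as text lines; the function name is hypothetical, and the only real record is the one quoted in this thread. Header lines are passed through unchanged. Any existing index no longer matches the edited file, so it has to be regenerated afterwards.

```python
def drop_records(vcf_lines, skip_sites):
    """Return the VCF lines without records whose (CHROM, POS) is in skip_sites."""
    kept = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            kept.append(line)  # keep header and blank lines as they are
            continue
        chrom, pos = line.split("\t")[:2]
        if (chrom, int(pos)) in skip_sites:
            continue  # drop the problematic record
        kept.append(line)
    return kept


# One made-up record plus the record reported in this thread.
original = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "4\t100\t.\tA\tG\t.\t.\t.\n",
    "4\t9274640\t.\tA\tATCACTG,ATCCTG\t.\t.\tBETA=0.989,0.141;FRACTION=0.022\n",
]

filtered = drop_records(original, {("4", 9274640)})
print("".join(filtered), end="")
```

For real files, the same function can be applied to `open(path)` and the result written with `writelines` to a new path, leaving the original VCF untouched.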

  • MPetlj

    Thanks Jason, I will try that.

    Given that I need to modify the VCF and create a new index file, can you please let me know:

    1. Is the workflow going to accept the .vcf.gz.tbi index format?

    2. If not, how would I generate the .idx index file that is output by Mutect2?

    I tried this, but it is giving me .gz.tbi (https://gatk.broadinstitute.org/hc/en-us/articles/360036899892-IndexFeatureFile)

    3. If yes, is .vcf.gz.tbi compatible with .vcf, or should the VCF be .vcf.gz?

    Thanks,

    Mia 

  • Jason Cerrato

    Hi Mia,

    Here's what I've heard back from the author.

    Short answer to all 3: It should accept a vcf.gz.tbi, and everything should "just work"
    With respect to #3 - Yes - I think it should be g-zipped. If this doesn't work I can try to put in the fix today / get it in Monday.
     
    Let us know how it goes for you and I'll let them know if you run into any trouble.
     
    Kind regards,
    Jason
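
As background to the index question above: a .tbi (tabix) index pairs with a block-compressed .vcf.gz, while a .idx index pairs with a plain .vcf. The Python sketch below checks whether a file's first bytes look like BGZF (the block-gzip variant that tabix indexing relies on) and names the index file that would normally sit next to a VCF. The function names are hypothetical, and the header check follows my reading of the BGZF layout in the SAM specification (gzip with an extra 'BC' subfield), so treat it as a quick sanity check rather than a validator.

```python
import gzip
import struct


def is_bgzf(header):
    """True if the bytes start with a gzip header carrying the BGZF 'BC' subfield."""
    if len(header) < 18 or header[0:3] != b"\x1f\x8b\x08":
        return False  # too short, or not gzip with deflate
    if not header[3] & 0x04:
        return False  # FEXTRA flag not set, so this is ordinary gzip
    xlen = struct.unpack("<H", header[10:12])[0]
    extra = header[12:12 + xlen]
    offset = 0
    while offset + 4 <= len(extra):
        si1, si2 = extra[offset], extra[offset + 1]
        slen = struct.unpack("<H", extra[offset + 2:offset + 4])[0]
        if si1 == ord("B") and si2 == ord("C") and slen == 2:
            return True
        offset += 4 + slen
    return False


def expected_index_name(vcf_path):
    """Name of the index usually found next to a VCF, based on its extension."""
    if vcf_path.endswith(".vcf.gz"):
        return vcf_path + ".tbi"
    if vcf_path.endswith(".vcf"):
        return vcf_path + ".idx"
    raise ValueError("not a .vcf or .vcf.gz path: " + vcf_path)


# The 28-byte empty BGZF block that terminates block-compressed files.
BGZF_EOF = bytes.fromhex("1f8b08040000000000ff0600424302001b0003000000000000000000")
PLAIN_GZIP = gzip.compress(b"##fileformat=VCFv4.2\n")

print(is_bgzf(BGZF_EOF), is_bgzf(PLAIN_GZIP))
print(expected_index_name("panel.vcf"), expected_index_name("panel.vcf.gz"))
```

To test a file on disk, pass `open(path, "rb").read(64)` to `is_bgzf`; a .vcf.gz made with ordinary gzip fails the check and cannot take a .tbi index.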

  • MPetlj

    Hi Jason,

    I was rerunning the same workflow with the new vcf and new vcf.idx and now I get the following error:

     

    The job ran further than the previous variant, but I still get the 'incompletely' funcotated VCF.

    I do not think that the error is to do with a specific variant, as it was last time, because there is no such error in .err file, nor in the worklog. 

    Submission ID: 753d047a-e91a-4134-a200-b753cfa2bd0a

    Workspace ID: a46c7502-d26e-4217-b1d4-7d80a20d7456

    Worklog:  gs://fc-secure-a46c7502-d26e-4217-b1d4-7d80a20d7456/753d047a-e91a-4134-a200-b753cfa2bd0a/Funcotator/f0a0794a-eaf5-4b10-884d-5604b1aadda0/call-Funcotate/Funcotate.log

    Err file: https://00e9e64bac3db512a1bb2ef50a60db52cde2c9cc153b4d8af2-apidata.googleusercontent.com/download/storage/v1/b/fc-secure-a46c7502-d26e-4217-b1d4-7d80a20d7456/o/753d047a-e91a-4134-a200-b753cfa2bd0a%2FFuncotator%2Ff0a0794a-eaf5-4b10-884d-5604b1aadda0%2Fcall-Funcotate%2Fstderr?qk=AD5uMEtsTzl3aMs0ZJ5jQ5ggD2qOFLgMDS6RqGjxrzJfaMl0rTOXd5wMkHldHQ5L1DrMzFoC8S4xcMowXoTYt4BeBCHD3yz1Lbk1AEpq9aGghJn4PYB_-L9VgvTK7zt4-Byot8a4uQwUSZxB4IO_p7bQsYRxBTFd5ZtFmiQI_Drm15zRFSzmpGkEtsMuLZAZEHU56a5FdR3ySTbBs5wrIf9PBtyQZVgIaL_tQPlGCdLLYY3MXCzspLWMTq-U8dBvQ50-stRJpqhfKqdvZLhpPLqXFZK3j3P6bkt1SAV3M8h-AhGRqqVKeZqqwH4E0f1Om_uKSIdUIrfV99XIbI3iT7AetjEVgCuuwalI4AaiN90vzwmJ7UB0l89k0EhkntNVZzHppQukfCNhJ1ykAzDvUBWMoOcAlut-ccdnJufthHZzroxOehHS7pTBrRjcaGGVKSzGyNfuri811rs2nVgCFQlHZAjoAKoVHq8CtYICzSSMpAsTy0FZ_ngyZxaLQ5mjw7b-GblaTMVN6PFZaFGhL002Kt6FCntPnDaKMw9GGlkrfPnu3ENPnp2Wb6lngp5RkQj7qrUK984oqdwLwItr9nsagUzS0pXHpEbC9-KWeBTqMur4T_4rDXuG4VlMhw3N6E99Mh20s3_otHZiccR0g0ct7Y0pGtVqx2hoU89KlSWHbuqf9LszH6JEsKbA1CVMCgCkFqebpEI4K4B6Ded9srTPdHsoOMFqL1JSKcbocyP5CxNHCw58KxT7fpMudWA8P2QiEDFtSJg-Y0g3db-CwPqKHl4hpv9iU15MkOG1xUqr-vIxBm27Pp0RJOpNS96Ljt2xbnqAT2bCWrWkXx2cDmGafg99XPmlLG2boz6hFsRoxCReWaMrLm1h2HLEteqvx6ig0IiTF8tLCOACAZe1FflHq7rpfrNeUh6e2KfBDulAey9sSx0PoF1st7CgjqftY9YlFsn11h8eARiQ5Gc2RsHZvSP1R0ukJg&isca=1

    I see the same error reported here:

    https://support.terra.bio/hc/en-us/community/posts/360062156772-Error-editing-a-wdl-from-a-featured-workspace

    The suggestion that worked there was to add a continueOnReturnCode item to the runtime block for the Funcotate task.

     

    However, I do not know how to do that. If you think the issue is the same, can you please let me know where and what to insert into my WDL, so that I can copy-paste it? I pasted the entire WDL as I have it now below. 

    Thanks!

    Mia

     

     

    Synopsis: Funcotator 4.1.7.0
    # Run Funcotator on a set of called variants.
    #
    # Description of inputs:
    #
    #   Required:
    #     String gatk_docker                       - GATK Docker image in which to run
    #     File ref_fasta                           - Reference FASTA file.
    #     File ref_fasta_index                     - Reference FASTA file index.
    #     File ref_dict                            - Reference FASTA file sequence dictionary.
    #     File variant_vcf_to_funcotate            - Variant Call Format (VCF) file containing the variants to annotate.
    #     File variant_vcf_to_funcotate_index      - Index file corresponding to the input Variant Call Format (VCF) file containing the variants to annotate.
    #     String reference_version                 - Version of the reference being used.  Either `hg19` or `hg38`.
    #     String output_file_base_name             - Base name of the desired output file (the extension is appended by the task).
    #     String output_format                     - Output file format (either VCF or MAF).
    #     Boolean compress                         - Whether to compress the resulting output file.
    #     Boolean use_gnomad                       - If true, will enable the gnomAD data sources in the data source tar.gz, if they exist.
    #
    #   Optional:
    #     File? interval_list                      - Intervals to be used for traversal.  If specified will only traverse the given intervals.
    #     File? data_sources_tar_gz                - Path to tar.gz containing the data sources for Funcotator to create annotations.
    #     String? transcript_selection_mode        - Method of detailed transcript selection.  This will select the transcript for detailed annotation (either `CANONICAL` or `BEST_EFFECT`).
    #     Array[String]? transcript_selection_list - Set of transcript IDs to use for annotation to override selected transcript.
    #     Array[String]? annotation_defaults       - Annotations to include in all annotated variants if the annotation is not specified in the data sources (in the format <ANNOTATION>:<VALUE>).  This will add the specified annotation to every annotated variant if it is not already present.
    #     Array[String]? annotation_overrides      - Override values for annotations (in the format <ANNOTATION>:<VALUE>).  Replaces existing annotations of the given name with given values.
    #     File? gatk4_jar_override                 - Override Jar file containing GATK 4.0.  Use this when overriding the docker JAR or when using a backend without docker.
    #     String? funcotator_extra_args            - Extra command-line arguments to pass through to Funcotator.  (e.g. " --exclude-field foo_field --exclude-field bar_field ")
    #
    # This WDL needs to decide whether to use the ``gatk_jar`` or ``gatk_jar_override`` for the jar location.  As of cromwell-0.24,
    # this logic *must* go into each task.  Therefore, there is a lot of duplicated code.  This allows users to specify a jar file
    # independent of what is in the docker file.  See the README.md for more info.
    #
    workflow Funcotator {
        String gatk_docker
        File ref_fasta
        File ref_fasta_index
        File ref_dict
        File variant_vcf_to_funcotate
        File variant_vcf_to_funcotate_index
        String reference_version
        String output_file_base_name
        String output_format
        Boolean compress
        Boolean use_gnomad
    
        File? interval_list
        File? data_sources_tar_gz
        String? transcript_selection_mode
        Array[String]? transcript_selection_list
        Array[String]? annotation_defaults
        Array[String]? annotation_overrides
        String? funcotator_extra_args
    
        File? gatk4_jar_override
    
        call Funcotate {
            input:
                gatk_docker               = gatk_docker,
                ref_fasta                 = ref_fasta,
                ref_fasta_index           = ref_fasta_index,
                ref_dict                  = ref_dict,
                input_vcf                 = variant_vcf_to_funcotate,
                input_vcf_idx             = variant_vcf_to_funcotate_index,
                reference_version         = reference_version,
                output_file_base_name     = output_file_base_name,
                output_format             = output_format,
                compress                  = compress,
                use_gnomad                = use_gnomad,
    
                interval_list             = interval_list,
                data_sources_tar_gz       = data_sources_tar_gz,
                transcript_selection_mode = transcript_selection_mode,
                transcript_selection_list = transcript_selection_list,
                annotation_defaults       = annotation_defaults,
                annotation_overrides      = annotation_overrides,
                extra_args                = funcotator_extra_args,
    
                gatk_override             = gatk4_jar_override
        }
    
        output {
            File funcotated_file_out = Funcotate.funcotated_output_file
            File funcotated_file_out_idx = Funcotate.funcotated_output_file_index
        }
    }
    
    ################################################################################
    
    task Funcotate {
    
        # ==============
        # Inputs
        File ref_fasta
        File ref_fasta_index
        File ref_dict
    
        File input_vcf
        File input_vcf_idx
    
        String reference_version
    
        String output_file_base_name
        String output_format
    
        Boolean compress
        Boolean use_gnomad
    
        # This should be updated when a new version of the data sources is released
        # TODO: Make this dynamically chosen in the command.
        File? data_sources_tar_gz = "gs://broad-public-datasets/funcotator/funcotator_dataSources.v1.6.20190124s.tar.gz"
    
        String? control_id
        String? case_id
        String? sequencing_center
        String? sequence_source
        String? transcript_selection_mode
        Array[String]? transcript_selection_list
        Array[String]? annotation_defaults
        Array[String]? annotation_overrides
        Array[String]? funcotator_excluded_fields
        Boolean? filter_funcotations
        File? interval_list
    
        String? extra_args
    
        # ==============
        # Process input args:
    
        String output_maf = output_file_base_name + ".maf"
        String output_maf_index = output_maf + ".idx"
    
        String output_vcf = output_file_base_name + if compress then ".vcf.gz" else ".vcf"
        String output_vcf_idx = output_vcf +  if compress then ".tbi" else ".idx"
    
        String output_file = if output_format == "MAF" then output_maf else output_vcf
        String output_file_index = if output_format == "MAF" then output_maf_index else output_vcf_idx
    
        String transcript_selection_arg = if defined(transcript_selection_list) then " --transcript-list " else ""
        String annotation_def_arg = if defined(annotation_defaults) then " --annotation-default " else ""
        String annotation_over_arg = if defined(annotation_overrides) then " --annotation-override " else ""
        String filter_funcotations_args = if defined(filter_funcotations) && (filter_funcotations) then " --remove-filtered-variants " else ""
        String excluded_fields_args = if defined(funcotator_excluded_fields) then " --exclude-field " else ""
    
        String interval_list_arg = if defined(interval_list) then " -L " else ""
    
        String extra_args_arg = select_first([extra_args, ""])
    
        # ==============
        # Runtime options:
        String gatk_docker
    
        File? gatk_override
        Int? mem
        Int? preemptible_attempts
        Int? max_retries
        Int? disk_space_gb
        Int? cpu
    
        Boolean use_ssd = false
    
        # Mem is in units of GB but our command and memory runtime values are in MB
        Int default_ram_mb = 1024 * 3
        Int machine_mem = if defined(mem) then mem * 1024 else default_ram_mb
        Int command_mem = machine_mem - 1024
    
        # Calculate disk size:
        Float ref_size_gb = size(ref_fasta, "GiB") + size(ref_fasta_index, "GiB") + size(ref_dict, "GiB")
        Float vcf_size_gb = size(input_vcf, "GiB") + size(input_vcf_idx, "GiB")
        Float ds_size_gb = size(data_sources_tar_gz, "GiB")
    
        Int default_disk_space_gb = ceil( ref_size_gb + (ds_size_gb * 2) + (vcf_size_gb * 10) ) + 20
    
        # Silly hack to allow us to use the dollar sign in the command section:
        String dollar = "$"
    
        command <<<
            set -e
            export GATK_LOCAL_JAR=${default="/root/gatk.jar" gatk_override}
    
            # =======================================
            # Hack to validate our WDL inputs:
            #
            # NOTE: This happens here so that we don't waste time copying down the data sources if there's an error.
    
            if [[ "${output_format}" != "MAF" ]] && [[ "${output_format}" != "VCF" ]] ; then
                echo "ERROR: Output format must be MAF or VCF." 1>&2
                exit 1
            fi
    
            # =======================================
            # Handle our data sources:
    
            # Extract the tar.gz:
            echo "Extracting data sources tar/gzip file..."
            mkdir datasources_dir
            tar zxvf ${data_sources_tar_gz} -C datasources_dir --strip-components 1
            DATA_SOURCES_FOLDER="$PWD/datasources_dir"
    
            # Handle gnomAD:
            if ${use_gnomad} ; then
                echo "Enabling gnomAD..."
                for potential_gnomad_gz in gnomAD_exome.tar.gz gnomAD_genome.tar.gz ; do
                    if [[ -f ${dollar}{DATA_SOURCES_FOLDER}/${dollar}{potential_gnomad_gz} ]] ; then
                        cd ${dollar}{DATA_SOURCES_FOLDER}
                        tar -zvxf ${dollar}{potential_gnomad_gz}
                        cd -
                    else
                        echo "ERROR: Cannot find gnomAD folder: ${dollar}{potential_gnomad_gz}" 1>&2
                        false
                    fi
                done
            fi
    
            # =======================================
            # Run Funcotator:
            gatk --java-options "-Xmx${command_mem}m" Funcotator \
                --data-sources-path $DATA_SOURCES_FOLDER \
                --ref-version ${reference_version} \
                --output-file-format ${output_format} \
                -R ${ref_fasta} \
                -V ${input_vcf} \
                -O ${output_file} \
                ${interval_list_arg} ${default="" interval_list} \
                --annotation-default normal_barcode:${default="Unknown" control_id} \
                --annotation-default tumor_barcode:${default="Unknown" case_id} \
                --annotation-default Center:${default="Unknown" sequencing_center} \
                --annotation-default source:${default="Unknown" sequence_source} \
                ${"--transcript-selection-mode " + transcript_selection_mode} \
                ${transcript_selection_arg}${default="" sep=" --transcript-list " transcript_selection_list} \
                ${annotation_def_arg}${default="" sep=" --annotation-default " annotation_defaults} \
                ${annotation_over_arg}${default="" sep=" --annotation-override " annotation_overrides} \
                ${excluded_fields_args}${default="" sep=" --exclude-field " funcotator_excluded_fields} \
                ${filter_funcotations_args} \
                ${extra_args_arg}
    
            # =======================================
            # Make sure we have a placeholder index for MAF files so this workflow doesn't fail:
            if [[ "${output_format}" == "MAF" ]] ; then
                touch ${output_maf_index}
            fi
        >>>
    
        runtime {
            docker: gatk_docker
            bootDiskSizeGb: 20
            memory: machine_mem + " MB"
            disks: "local-disk " + select_first([disk_space_gb, default_disk_space_gb]) + if use_ssd then " SSD" else " HDD"
            preemptible: select_first([preemptible_attempts, 3])
            maxRetries: select_first([max_retries, 0])
            cpu: select_first([cpu, 1])
        }
    
        output {
            File funcotated_output_file = "${output_file}"
            File funcotated_output_file_index = "${output_file_index}"
        }
    }
     
     

     

     

     

     

  • Avatar
    Jason Cerrato

    Hi Mia,

    I'm seeing this in your .log file:

    23:37:25.124 INFO Funcotator - Shutting down engine
    [August 28, 2020 11:37:25 PM UTC] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 197.71 minutes.
    Runtime.totalMemory()=11615600640
    java.lang.StringIndexOutOfBoundsException: String index out of range: 218
    at java.lang.String.substring(String.java:1963)
    at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.initializeForInsertion(ProteinChangeInfo.java:256)
    at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.<init>(ProteinChangeInfo.java:93)
    at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.create(ProteinChangeInfo.java:371)
    at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createSequenceComparison(GencodeFuncotationFactory.java:2010)
    at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createCodingRegionFuncotationForProteinCodingFeature(GencodeFuncotationFactory.java:1200)
    at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createExonFuncotation(GencodeFuncotationFactory.java:1051)
    at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationOnSingleTranscript(GencodeFuncotationFactory.java:985)
    at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:812)
    at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:796)
    at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.lambda$createGencodeFuncotationsByAllTranscripts$0(GencodeFuncotationFactory.java:473)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationsByAllTranscripts(GencodeFuncotationFactory.java:474)
    at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsOnVariant(GencodeFuncotationFactory.java:529)
    at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.determineFuncotations(DataSourceFuncotationFactory.java:233)
    at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:201)
    at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:172)
    at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.lambda$createFuncotationMapForVariant$0(FuncotatorEngine.java:147)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.createFuncotationMapForVariant(FuncotatorEngine.java:157)
    at org.broadinstitute.hellbender.tools.funcotator.Funcotator.enqueueAndHandleVariant(Funcotator.java:903)
    at org.broadinstitute.hellbender.tools.funcotator.Funcotator.apply(Funcotator.java:857)
    at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)

    You may already be aware of this, but I wanted to point this out in case this is a genuine error you want to check for rather than bypassing the error code. I did a search for gatk funcotator java.lang.StringIndexOutOfBoundsException and found a couple of GATK forum posts where others seem to have run into similar issues. 

    https://gatk.broadinstitute.org/hc/en-us/community/posts/360067471451-Funcotator-cannot-complete-funcotaion-for-variant-due-to-alternate-allele

    https://gatk.broadinstitute.org/hc/en-us/community/posts/360056385852-Funcotator-error-StringIndexOutOfBoundsException

    The latter is by the user who also asked for help with setting up the continueOnReturnCode runtime element in their WDL. If you are certain you want to have this added to your WDL, I believe you will only need to add it to the runtime block for your Funcotator task. The user had to do a bit more because they had a default runtime that was provided to the task, but based on the WDL you shared, you should be able to just add the runtime attribute. See: https://cromwell.readthedocs.io/en/stable/RuntimeAttributes/#continueonreturncode

    For example:

        runtime {
            docker: gatk_docker
            bootDiskSizeGb: 20
            memory: machine_mem + " MB"
            disks: "local-disk " + select_first([disk_space_gb, default_disk_space_gb]) + if use_ssd then " SSD" else " HDD"
            preemptible: select_first([preemptible_attempts, 3])
            maxRetries: select_first([max_retries, 0])
            cpu: select_first([cpu, 1])
            continueOnReturnCode: [0, 1]
        }
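
    If you would rather track down the record behind the exception than bypass the return code, one rough approach is to compare the input VCF with the partially written output. The sketch below is only an illustration under assumptions: both files are uncompressed VCFs, records appear in the same order in each, and no interval list was used. The function name `next_unannotated_variant` and its arguments are hypothetical and not part of GATK. Because output may still be buffered when the tool stops, treat the reported record as a starting point rather than a confirmed culprit.

```shell
# Sketch only: print the first input record located after the last position
# that made it into a partially written, uncompressed output VCF.
# The function name and file arguments are hypothetical.
next_unannotated_variant() {
    local input_vcf="$1" partial_vcf="$2"
    local last

    # CHROM:POS of the last non-header record in the partial output
    # (empty if no records were written).
    last=$(awk -F '\t' '!/^#/ { l = $1 ":" $2 } END { print l }' "$partial_vcf")

    awk -F '\t' -v last="$last" '
        /^#/                         { next }          # skip header lines
        last == ""                   { print; exit }   # nothing annotated yet
        found && ($1 ":" $2) != last { print; exit }   # first record past it
        ($1 ":" $2) == last          { found = 1 }
    ' "$input_vcf"
}
```

    Calling `next_unannotated_variant input.vcf partial_output.vcf` (placeholder file names) prints one candidate record, or nothing if every input position also appears in the output.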

    Let me know if you have any further questions or concerns.

    Kind regards,

    Jason

  • Avatar
    MPetlj

    Hi Jason,

    Just an update: I removed the new variant that was failing, and that seems to have worked.

    So overall, two variants failed for different reasons. It is not ideal, but it is OK.

    Of course, it would be great if this could be fixed in future.
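
    For anyone following the same route, dropping a single record can be sketched roughly as below. This is an illustration under assumptions, not the exact steps used here: the VCF is uncompressed, and `drop_variant_at` and the file names are hypothetical. The index for the new file has to be regenerated before rerunning Funcotator.

```shell
# Sketch only: copy an uncompressed VCF, leaving out every record at one
# CHROM/POS. The function name and arguments are hypothetical.
drop_variant_at() {
    local input_vcf="$1" chrom="$2" pos="$3" output_vcf="$4"

    awk -F '\t' -v chrom="$chrom" -v pos="$pos" '
        /^#/                     { print; next }   # keep all header lines
        $1 == chrom && $2 == pos { next }          # skip the offending record
                                 { print }         # keep everything else
    ' "$input_vcf" > "$output_vcf"
}
```

    For example, `drop_variant_at input.vcf 1 200 filtered.vcf` (made-up coordinates) writes a copy without any record at position 200 on contig 1.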

    Thank you for helping and best wishes,

    Mia

  • Avatar
    Jason Cerrato

    Hi Mia,

    I'm glad to hear you were able to get it working after removing the two variants. The GitHub issues section will be the best place to keep track of work done toward solving these bugs.

    If there's anything else you're seeing that looks like a bug for investigation, let us know and we'll be happy to follow up with the author(s).

    Kind regards,

    Jason
