Error on Google Cloud Platform: Joint GenotypingWf interval file is larger than 128000 Bytes
While continuing on to the Joint Genotyping workflow from the "Running GATK Best Practices" Google tutorial, I came across the error below. The same error was addressed in a previous issue, where the following solution worked for the user:
java -Dsystem.input-read-limits.lines=500000 -jar /cromwell-34.jar
If this is still the recommended solution, where would I add this modification given that this is running on the Google Cloud Platform?
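For context, the error message below says the limit lives under system.input-read-limits in the Cromwell configuration, so the same override could presumably be written in a configuration file rather than as a -D flag. The following is only a sketch under assumptions: the file name cromwell.conf is hypothetical, the value 500000 is simply copied from the command above, and how such a file would reach the Cromwell instance launched by wdl_pipeline.yaml on Google Cloud Platform is not something this thread establishes.

```hocon
# Hypothetical file name: cromwell.conf
# Start from Cromwell's built-in defaults, then override a single setting.
include required(classpath("application"))

system {
  input-read-limits {
    # Limit applied to files read with read_lines().
    # 500000 is the value taken from the earlier solution, not a recommendation.
    lines = 500000
  }
}
```

When Cromwell is started by hand, a file like this is normally passed with -Dconfig.file, for example: java -Dconfig.file=/path/to/cromwell.conf -jar /cromwell-34.jar followed by the usual Cromwell arguments. Note that the key is named lines because it governs read_lines(), while the error message reports the limit in bytes.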
Command run:
$ gcloud alpha genomics pipelines run \
    --pipeline-file wdl_pipeline.yaml \
    --regions us-central1 \
    --inputs-from-file WDL=${GATK_GOOGLE_DIR}/JointGenotypingWf.wdl,WORKFLOW_INPUTS=${GATK_GOOGLE_DIR}/JointGenotypingWf.hg38.inputs.json,WORKFLOW_OPTIONS=${GATK_GOOGLE_DIR}/JointGenotypingWf.options.json \
    --env-vars WORKSPACE=${GATK_OUTPUT_DIR}/work,OUTPUTS=${GATK_OUTPUT_DIR}/output \
    --logging ${GATK_OUTPUT_DIR}/logging/
End of error log:
2020-05-02 21:05:32,203 cromwell-system-akka.dispatchers.engine-dispatcher-28 INFO - Not triggering log of token queue status. Effective log interval = None
2020-05-02 21:05:32,270 cromwell_driver INFO: Job submitted to Cromwell. job id: 9eec64d6-5959-4c27-8a13-6b57ffbe034a
2020-05-02 21:05:48,283 cromwell-system-akka.dispatchers.engine-dispatcher-9 INFO - 1 new workflows fetched
2020-05-02 21:05:48,284 cromwell-system-akka.dispatchers.engine-dispatcher-9 INFO - WorkflowManagerActor Starting workflow UUID(9eec64d6-5959-4c27-8a13-6b57ffbe034a)
2020-05-02 21:05:48,288 cromwell-system-akka.dispatchers.engine-dispatcher-9 INFO - WorkflowManagerActor Successfully started WorkflowActor-9eec64d6-5959-4c27-8a13-6b57ffbe034a
2020-05-02 21:05:48,291 cromwell-system-akka.dispatchers.engine-dispatcher-9 INFO - Retrieved 1 workflows from the WorkflowStoreActor
2020-05-02 21:05:48,300 cromwell-system-akka.dispatchers.engine-dispatcher-30 INFO - WorkflowStoreHeartbeatWriteActor configured to flush with batch size 10000 and process rate 2 minutes.
2020-05-02 21:05:48,592 cromwell-system-akka.dispatchers.engine-dispatcher-29 INFO - MaterializeWorkflowDescriptorActor [UUID(9eec64d6)]: Parsing workflow as WDL draft-2
2020-05-02 21:05:50,495 cromwell-system-akka.dispatchers.engine-dispatcher-29 INFO - MaterializeWorkflowDescriptorActor [UUID(9eec64d6)]: Call-to-Backend assignments: JointGenotyping.SNPGatherTranches -> JES, JointGenotyping.FinalGatherVcf -> JES, JointGenotyping.SNPsVariantRecalibratorCreateModel -> JES, JointGenotyping.IndelsVariantRecalibrator -> JES, JointGenotyping.SNPsVariantRecalibratorScattered -> JES, JointGenotyping.CollectMetricsOnFullVcf -> JES, JointGenotyping.HardFilterAndMakeSitesOnlyVcf -> JES, JointGenotyping.SNPsVariantRecalibratorClassic -> JES, JointGenotyping.DynamicallyCombineIntervals -> JES, JointGenotyping.CollectMetricsSharded -> JES, JointGenotyping.ApplyRecalibration -> JES, JointGenotyping.GenotypeGVCFs -> JES, JointGenotyping.GatherMetrics -> JES, JointGenotyping.SitesOnlyGatherVcf -> JES, JointGenotyping.ImportGVCFs -> JES
2020-05-02 21:06:09,092 cromwell-system-akka.dispatchers.engine-dispatcher-29 ERROR - WorkflowManagerActor Workflow 9eec64d6-5959-4c27-8a13-6b57ffbe034a failed (during ExecutingWorkflowState): java.lang.RuntimeException: Failed to evaluate 'JointGenotyping.num_of_original_intervals' (reason 1 of 1): Evaluating length(read_lines(unpadded_intervals_file)) failed: [Attempted 1 time(s)] - IOException: Could not read from gs://gatk-test-data/intervals/hg38.even.handcurated.20k.intervals: File gs://gatk-test-data/intervals/hg38.even.handcurated.20k.intervals is larger than 128000 Bytes. Maximum read limits can be adjusted in the configuration under system.input-read-limits.
at cromwell.engine.workflow.lifecycle.execution.keys.ExpressionKey.processRunnable(ExpressionKey.scala:29)
at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.$anonfun$startRunnableNodes$7(WorkflowExecutionActor.scala:523)
at cats.instances.ListInstances$$anon$1.$anonfun$traverse$2(list.scala:74)
at cats.instances.ListInstances$$anon$1.loop$2(list.scala:64)
at cats.instances.ListInstances$$anon$1.$anonfun$foldRight$1(list.scala:64)
at cats.Eval$.loop$1(Eval.scala:336)
at cats.Eval$.cats$Eval$$evaluate(Eval.scala:368)
at cats.Eval$Defer.value(Eval.scala:257)
at cats.instances.ListInstances$$anon$1.traverse(list.scala:73)
at cats.instances.ListInstances$$anon$1.traverse(list.scala:12)
at cats.Traverse$Ops.traverse(Traverse.scala:19)
at cats.Traverse$Ops.traverse$(Traverse.scala:19)
at cats.Traverse$ToTraverseOps$$anon$3.traverse(Traverse.scala:19)
at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.cromwell$engine$workflow$lifecycle$execution$WorkflowExecutionActor$$startRunnableNodes(WorkflowExecutionActor.scala:517)
at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$5.applyOrElse(WorkflowExecutionActor.scala:188)
at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$5.applyOrElse(WorkflowExecutionActor.scala:186)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:168)
at akka.actor.FSM.processEvent(FSM.scala:687)
at akka.actor.FSM.processEvent$(FSM.scala:681)
at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.akka$actor$LoggingFSM$$super$processEvent(WorkflowExecutionActor.scala:51)
at akka.actor.LoggingFSM.processEvent(FSM.scala:820)
at akka.actor.LoggingFSM.processEvent$(FSM.scala:802)
at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.processEvent(WorkflowExecutionActor.scala:51)
at akka.actor.FSM.akka$actor$FSM$$processMsg(FSM.scala:678)
at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:672)
at akka.actor.Actor.aroundReceive(Actor.scala:517)
at akka.actor.Actor.aroundReceive$(Actor.scala:515)
at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.akka$actor$Timers$$super$aroundReceive(WorkflowExecutionActor.scala:51)
at akka.actor.Timers.aroundReceive(Timers.scala:51)
at akka.actor.Timers.aroundReceive$(Timers.scala:40)
at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.aroundReceive(WorkflowExecutionActor.scala:51)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
at akka.actor.ActorCell.invoke(ActorCell.scala:557)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2020-05-02 21:06:09,120 cromwell-system-akka.dispatchers.engine-dispatcher-29 INFO - WorkflowManagerActor WorkflowActor-9eec64d6-5959-4c27-8a13-6b57ffbe034a is in a terminal state: WorkflowFailedState
ERROR: Status of job is not Submitted, Running, or Succeeded: Failed
-
Hey there ikeoluwao_o,
Thanks for writing in. This looks like it may be a bit out of scope for GATK or Terra, so I recommend posting your question to https://bioinformatics.stackexchange.com/ and tagging your post with Cromwell. That way a member of the Cromwell engineering team will be able to see your question and help.
Would you be willing to post there and see if you are able to get a helpful answer?
Kind regards,
Jason