Workflow failure examples

Five failures that do not look like downtime.

The server is up, the queue might be empty, and the provider accepted the event. The problem is that the customer-facing work did not actually complete.

Failure atlas

Green checks can describe four different truths at once.

Luota is for the last mile: did the customer-visible outcome happen?

Provider

200 OK

The external system accepted the first request.

Queue

empty

The worker consumed the task, but completion was not proven.

Customer

wrong state

The visible promise stayed broken after the green signal.

Operator

missing context

The incident starts with a search through logs instead of a record.

Atlas key

Read failures by the gap between signal and outcome.

01

Promise

Name the customer-visible state, not the job name.

02

Green signal

Record what looks healthy before the outcome is proven.

03

Gap

Find the missing state transition, receipt, file, or visible output.

04

Incident

Open one file with payload, owner, route, and retry context.

05

Review

Retain the history for the next operator.

Incident file 01

Revenue and trust

False comfort

Queue empty / HTTP accepted / No app error

Stripe accepted the webhook, but access never changed

invoice.payment_failed → local subscription state → entitlement update → customer email

Quiet failure

The handler returned 200, then the entitlement job never ran. Stripe looked delivered. The app looked healthy. The customer still had the wrong access state.

Luota proof

Close the run only after subscription state, access state, and customer notification all match the expected outcome.
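
As a rough illustration of that completion rule, the sketch below compares an expected outcome with the observed one and refuses to close the run while any of the three states disagree. It is plain Python, not Luota's API; the `Outcome` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Customer-visible state after a failed payment (hypothetical fields)."""
    subscription_status: str   # local subscription state, e.g. "past_due"
    has_access: bool           # what the customer can actually use
    customer_notified: bool    # whether the notification was confirmed sent


def outcome_gaps(expected: Outcome, observed: Outcome) -> list[str]:
    """Return the names of the fields where observed state differs from expected."""
    gaps = []
    if observed.subscription_status != expected.subscription_status:
        gaps.append("subscription_status")
    if observed.has_access != expected.has_access:
        gaps.append("has_access")
    if observed.customer_notified != expected.customer_notified:
        gaps.append("customer_notified")
    return gaps


def can_close_run(expected: Outcome, observed: Outcome) -> bool:
    """A run closes only when no gaps remain, whatever the webhook handler returned."""
    return not outcome_gaps(expected, observed)
```

In the incident above, the handler's 200 would not matter: `outcome_gaps` would still report `has_access` and `customer_notified`, so the run stays open.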

Incident file 02

Customer promise

False comfort

Queue empty / HTTP accepted / No app error

The report rendered, but the email never sent

scheduled report → query → render → email provider → delivery receipt

Quiet failure

Cron fired and the query completed. A later delivery step failed after the process logged success, so the missing report became a support conversation.

Luota proof

Treat the provider receipt as evidence. If generation succeeds but delivery fails, the workflow remains open or fails with the provider response attached.
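
One way to picture this rule is a status function that only reports success when a delivery receipt exists. This is a minimal sketch in plain Python, not Luota's API; the `ReportRun` type and the shape of `provider_response` (a dict with a `status` key) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReportRun:
    rendered: bool
    # Hypothetical receipt shape, e.g. {"status": "delivered", "message_id": "..."}
    provider_response: Optional[dict] = None


def run_status(run: ReportRun) -> str:
    """Derive the run status from delivery evidence, not from render success."""
    if not run.rendered:
        return "failed"
    if run.provider_response is None:
        return "open"          # rendered, but no delivery evidence yet
    if run.provider_response.get("status") == "delivered":
        return "succeeded"
    return "failed"            # the provider response stays attached to the run
```

A run that rendered but has no receipt is reported as `open`, so a missing email surfaces as an unfinished workflow instead of a logged success.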

Incident file 03

Visible work

False comfort

Queue empty / HTTP accepted / No app error

The model batch finished, but output never appeared

input batch → model call → validation → persistence → user-visible result

Quiet failure

The worker completed the model call, but validation or persistence failed. The queue drained, dashboards stayed green, and the user saw stale work.

Luota proof

Treat visible output as the completion condition. Store the run timeline, payload tags, duration, and failure summary for the exact batch.
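
The record described here could look something like the following sketch: a per-batch run that keeps a timeline, payload tags, total duration, and the first failure summary, and that only counts as complete once the visible result step has passed. The class, step names, and fields are hypothetical and are not Luota's data model.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical step names, taken from the workflow path above.
STEPS = ("model_call", "validation", "persistence", "visible_result")


@dataclass
class BatchRun:
    batch_id: str
    payload_tags: dict
    timeline: list = field(default_factory=list)   # entries: (step, ok, seconds)
    failure_summary: Optional[str] = None

    def record(self, step: str, ok: bool, seconds: float, detail: str = "") -> None:
        """Append a step to the timeline and keep the first failure as the summary."""
        self.timeline.append((step, ok, seconds))
        if not ok and self.failure_summary is None:
            self.failure_summary = f"{step}: {detail}"

    @property
    def duration(self) -> float:
        """Total seconds across all recorded steps."""
        return sum(seconds for _, _, seconds in self.timeline)

    @property
    def complete(self) -> bool:
        """Complete only when every step, including the visible result, passed."""
        passed = {step for step, ok, _ in self.timeline if ok}
        return self.failure_summary is None and set(STEPS) <= passed
```

A batch whose model call succeeded but whose validation failed is not complete, and its `failure_summary` names the step that broke.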

Incident file 04

State drift

False comfort

Queue empty / HTTP accepted / No app error

The queue drained, but the external update failed

queued task → third-party API call → local reconciliation → customer-facing state

Quiet failure

Workers consumed every task, so the queue looked empty. A downstream API rejected updates, leaving the state customers rely on unchanged.

Luota proof

Close the run after reconciliation, not after dequeue. Attach API status, external ids, deploy SHA, and host so the incident says what can be retried.
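
A minimal sketch of reconciliation-based closing, in plain Python and not Luota's API, is shown below. The `UpdateResult` type and `reconcile` function are hypothetical, and treating 429 and 5xx responses as retryable is an assumption of this example, not a rule from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateResult:
    external_id: str   # id of the record in the third-party system
    api_status: int    # HTTP status returned by the downstream API


def reconcile(results: list[UpdateResult], deploy_sha: str, host: str) -> dict:
    """Close the run only if every external update was accepted (2xx)."""
    rejected = [r for r in results if not 200 <= r.api_status < 300]
    return {
        "closed": not rejected,
        "deploy_sha": deploy_sha,
        "host": host,
        "statuses": {r.external_id: r.api_status for r in results},
        # Assumption: rate limits and server errors are worth retrying.
        "retryable": [r.external_id for r in rejected
                      if r.api_status == 429 or r.api_status >= 500],
        # Everything else needs a human to look at the payload first.
        "needs_review": [r.external_id for r in rejected
                         if r.api_status != 429 and r.api_status < 500],
    }
```

An empty queue plays no part in the result: the run closes only when `rejected` is empty, and the incident already lists which external ids can be retried.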

Incident file 05

Recovery

False comfort

Queue empty / HTTP accepted / No app error

The backup existed, but restore proof was missing

backup → upload → decrypt check → throwaway restore → freshness proof

Quiet failure

The dump command exited and a file existed, but nobody had evidence that the archive could be decrypted and restored when it mattered.

Luota proof

Use one lifecycle monitor for the drill. Success means the restore check completed, not merely that the backup command exited.
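
The drill can be pictured as one ordered run in which a missing or failing check stops everything after it. The sketch below is plain Python with hypothetical step names and a hypothetical `run_drill` helper; the checks themselves (real decryption, a real throwaway restore) are left to the caller.

```python
from typing import Callable

# Hypothetical step names, following the recovery path above.
DRILL_STEPS = ("backup", "upload", "decrypt_check", "throwaway_restore", "freshness_proof")


def run_drill(checks: dict[str, Callable[[], bool]]) -> dict:
    """Run the checks in order; success requires every step, not just the backup."""
    completed = []
    for step in DRILL_STEPS:
        check = checks.get(step)
        # A step with no check counts as unproven, the same as a failed one.
        if check is None or not check():
            return {"success": False, "stopped_at": step, "completed": completed}
        completed.append(step)
    return {"success": True, "stopped_at": None, "completed": completed}
```

With only `backup` and `upload` wired in, the drill reports `stopped_at: "decrypt_check"`, which is the situation in the incident above: a file exists, but nothing proves it can be restored.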

Where to start

Do not monitor everything first. Monitor the one path people would notice.

Luota is most useful when the success condition is a real outcome: access changed, report delivered, result visible, reconciliation complete, restore proven.

The first useful monitor is usually one workflow with revenue, trust, or support impact.

Wire the first monitor