Workflow failure examples
The server is up, the queue is empty, and the provider accepted the event. Even so, the customer-facing work never completed.
Failure atlas
Green checks can describe four different truths at once.
Luota is for the last mile: did the customer-visible outcome happen?
Provider: 200 OK. The external system accepted the first request.
Queue: empty. The worker consumed the task, but completion was not proven.
Customer: wrong state. The visible promise stayed broken after the green signal.
Operator: missing context. The incident starts with a search through logs instead of a record.
Atlas key
01. Name the customer-visible state, not the job name.
02. Record what looks healthy before the outcome is proven.
03. Find the missing state transition, receipt, file, or visible output.
04. Open one file with payload, owner, route, and retry context.
05. Keep the retained history for the next operator.
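The five atlas steps can be read as the fields of a single record. The sketch below is one possible shape, not a Luota schema: the `IncidentFile` class, its field names, and the sample values are all hypothetical and chosen only to mirror the steps above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IncidentFile:
    """Hypothetical record mirroring the five atlas steps (not a Luota schema)."""

    promise: str                     # 01: the customer-visible state
    healthy_signals: dict[str, str]  # 02: what looked green before proof
    missing_evidence: list[str]      # 03: absent transition, receipt, file, or output
    payload: dict[str, str]          # 04: context kept together in one file
    owner: str
    route: str
    retry_context: str
    history: list[str] = field(default_factory=list)  # 05: kept for the next operator

    def is_proven(self) -> bool:
        # The outcome counts as proven only when no evidence is missing,
        # however many healthy signals were recorded.
        return not self.missing_evidence

    def summary(self) -> str:
        status = "proven" if self.is_proven() else "unproven"
        return f"{self.promise}: {status} ({len(self.healthy_signals)} green signals)"


# Illustrative values only.
incident = IncidentFile(
    promise="customer access matches subscription",
    healthy_signals={"provider": "200 OK", "queue": "empty"},
    missing_evidence=["entitlement update"],
    payload={"event": "invoice.payment_failed"},
    owner="billing",
    route="stripe-webhook",
    retry_context="entitlement job never ran",
)
```

Keeping the green signals and the missing evidence side by side makes the gap between "looked healthy" and "was proven" explicit in the record itself.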
Use-case routes
Each route focuses on one concrete failure mode, what Luota captures, and when it should complement existing logs, uptime checks, and error tracking.
Scheduled scripts, backups, and maintenance jobs where a missed start needs context.
Queue workers, imports, syncs, and reconciliation tasks where dequeue does not prove completion.
Stripe events and access updates where the customer-visible state is the real promise.
Scheduled reports, exports, and digests that must reach the right destination.
Batch jobs where generated output must be validated, persisted, and visible in-product.
Dashboards, tables, files, and derived views where stale output is the failure.
Scheduled GitHub Actions jobs where green CI does not prove the operational outcome.
Practical setup guides for choosing heartbeat, run lifecycle, or freshness checks.
A migration path when simple cron pings need richer incident context.
A comparison for teams moving from ping-first monitoring to workflow evidence.
Incident file 01
Revenue and trust
invoice.payment_failed → local subscription state → entitlement update → customer email
Quiet failure
The handler returned 200, then the entitlement job never ran. Stripe looked delivered. The app looked healthy. The customer still had the wrong access state.
Luota proof
Close the run only after subscription state, access state, and customer notification all match the expected outcome.
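A minimal sketch of that closing rule, assuming the expected and observed outcomes can each be read as three fields. The `Outcome` type, the `can_close_run` helper, and the state values (`past_due`, `restricted`, `active`) are hypothetical and are not part of Stripe's or Luota's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Hypothetical snapshot of the three things the customer depends on."""

    subscription_state: str
    access_state: str
    customer_notified: bool


def can_close_run(expected: Outcome, observed: Outcome) -> tuple[bool, list[str]]:
    """Return (closable, names of fields that do not match the expectation)."""
    mismatches = [
        name
        for name in ("subscription_state", "access_state", "customer_notified")
        if getattr(expected, name) != getattr(observed, name)
    ]
    return (not mismatches, mismatches)


expected = Outcome("past_due", "restricted", True)
# The handler returned 200, but the entitlement job never ran,
# so access and notification were never updated.
observed = Outcome("past_due", "active", False)
closable, mismatches = can_close_run(expected, observed)
```

Here `closable` is false and `mismatches` names the two fields that stayed wrong, which is the evidence an operator needs before retrying.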
Incident file 02
Customer promise
scheduled report → query → render → email provider → delivery receipt
Quiet failure
Cron fired and the query completed. A later delivery step failed after the process logged success, so the missing report became a support conversation.
Luota proof
Treat the provider receipt as evidence. If generation succeeds but delivery fails, the workflow remains open or fails with the provider response attached.
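One way to express that rule in code, as a sketch under assumptions: the receipt is modelled as a plain dictionary with a hypothetical `delivered` flag, and `RunResult` and `resolve_report_run` are illustrative names rather than a real provider or Luota interface.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    status: str  # "succeeded", "open", or "failed"
    provider_response: Optional[dict] = None


def resolve_report_run(generated: bool, receipt: Optional[dict]) -> RunResult:
    """Decide the run status from generation and the delivery receipt."""
    if not generated:
        return RunResult("failed")
    if receipt is None:
        # Generation succeeded but delivery is unproven: keep the run open.
        return RunResult("open")
    if receipt.get("delivered") is True:
        return RunResult("succeeded", receipt)
    # Delivery failed: fail the run with the provider response attached.
    return RunResult("failed", receipt)


pending = resolve_report_run(True, None)
bounced = resolve_report_run(True, {"delivered": False, "error": "mailbox unavailable"})
delivered = resolve_report_run(True, {"delivered": True, "message_id": "msg-1"})
```

The point of the design is that a successful render on its own never produces `succeeded`; only a positive receipt does.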
Incident file 03
Visible work
input batch → model call → validation → persistence → user-visible result
Quiet failure
The worker completed the model call, but validation or persistence failed. The queue drained, dashboards stayed green, and the user saw stale work.
Luota proof
Treat visible output as the completion condition. Store the run timeline, payload tags, duration, and failure summary for the exact batch.
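The sketch below shows one way to make visible output the completion condition while keeping the timeline, payload tags, duration, and failure summary for the batch. The stage names, the `BatchRun` record, and the `run_batch` helper are assumptions made for illustration, with each stage supplied as a plain callable.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical stage names, in the order the batch must pass through them.
STAGES = ("model_call", "validation", "persistence", "visible")


@dataclass
class BatchRun:
    batch_id: str
    payload_tags: dict[str, str]
    timeline: list[tuple[str, str]] = field(default_factory=list)
    duration_s: float = 0.0
    failure_summary: Optional[str] = None

    @property
    def completed(self) -> bool:
        # Complete only when the final stage, visible output, succeeded.
        return bool(self.timeline) and self.timeline[-1] == ("visible", "ok")


def run_batch(run: BatchRun, steps: dict[str, Callable[[], None]]) -> BatchRun:
    started = time.monotonic()
    for stage in STAGES:
        try:
            steps[stage]()
            run.timeline.append((stage, "ok"))
        except Exception as exc:
            run.timeline.append((stage, "failed"))
            run.failure_summary = f"{stage}: {exc}"
            break  # later stages never ran, so they are not recorded
    run.duration_s = time.monotonic() - started
    return run


def failing_validation() -> None:
    raise ValueError("schema mismatch")


noop = lambda: None
run = run_batch(
    BatchRun("batch-42", {"tenant": "acme"}),
    {"model_call": noop, "validation": failing_validation,
     "persistence": noop, "visible": noop},
)
```

In this example the model call succeeds, yet `run.completed` stays false and the failure summary points at validation for that exact batch.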
Incident file 04
State drift
queued task → third-party API call → local reconciliation → customer-facing state
Quiet failure
Workers consumed every task, so the queue looked empty. A downstream API rejected updates, leaving the state customers rely on unchanged.
Luota proof
Close the run after reconciliation, not after dequeue. Attach API status, external ids, deploy SHA, and host so the incident says what can be retried.
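As a rough sketch of closing after reconciliation, the function below builds the incident context from the API responses. `UpdateAttempt`, `reconcile`, and the sample ids, SHA, and host are hypothetical, and the retry rule (HTTP 429 and 5xx are retryable, other rejections are not) is an assumption rather than anything the source specifies.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateAttempt:
    external_id: str
    api_status: int  # HTTP status returned by the third-party API


def reconcile(attempts: list[UpdateAttempt], deploy_sha: str, host: str) -> dict:
    """Close the run only if every downstream update was accepted."""
    rejected = [a for a in attempts if not 200 <= a.api_status < 300]
    return {
        "closed": not rejected,
        "deploy_sha": deploy_sha,
        "host": host,
        "api_statuses": {a.external_id: a.api_status for a in attempts},
        # Assumed policy: rate limits and server errors are worth retrying.
        "retryable_ids": [
            a.external_id
            for a in rejected
            if a.api_status == 429 or a.api_status >= 500
        ],
    }


# The queue is empty because all three tasks were consumed,
# but two of the downstream updates were rejected.
report = reconcile(
    [UpdateAttempt("cus_1", 200), UpdateAttempt("cus_2", 503), UpdateAttempt("cus_3", 422)],
    deploy_sha="abc1234",
    host="worker-1",
)
```

The resulting record stays open and lists `cus_2` as retryable, while `cus_3` needs a fix before any retry would help.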
Incident file 05
Recovery
backup → upload → decrypt check → throwaway restore → freshness proof
Quiet failure
The dump command exited and a file existed, but nobody had evidence that the archive could be decrypted and restored when it mattered.
Luota proof
Use one lifecycle monitor for the drill. Success means the restore check completed, not merely that the backup command exited.
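A minimal sketch of a drill whose success depends on the restore check, assuming each step can be wrapped as a callable that returns `True` when it passes. The step names and `run_restore_drill` are hypothetical. A real drill would call the actual backup, decrypt, and restore tooling inside those callables.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical step names, in the order the drill runs them.
DRILL_STEPS = ("backup", "upload", "decrypt_check", "throwaway_restore", "freshness_proof")


def run_restore_drill(steps: dict[str, Callable[[], bool]]) -> dict:
    """Run every step in order and stop at the first one that fails."""
    completed: list[str] = []
    for name in DRILL_STEPS:
        if not steps[name]():
            return {"success": False, "failed_step": name, "completed": completed}
        completed.append(name)
    return {"success": True, "failed_step": None, "completed": completed}


# The dump exited and the file was uploaded, but the archive does not decrypt.
result = run_restore_drill({
    "backup": lambda: True,
    "upload": lambda: True,
    "decrypt_check": lambda: False,
    "throwaway_restore": lambda: True,
    "freshness_proof": lambda: True,
})
```

Because the drill reports success only after the last step, a backup command that exits normally cannot close the run by itself.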
Where to start
Luota is most useful when the success condition is a real outcome: access changed, report delivered, result visible, reconciliation complete, restore proven.
The first useful monitor is usually one workflow with revenue, trust, or support impact.