Repairable automation

A tool is not ready for daily work just because it can succeed in a demo. It is ready when its failures are small, visible, and repairable.

That distinction matters because most workflow tools fail in boring ways. They do not explode. They quietly skip a row, save a file in the wrong place, merge two different requests into one label, or report success before the durable write has happened. A user looking only at the happy path sees speed. A team that has to support the workflow later sees the missing recovery path.

The practical test is to write the failure sample before comparing tools.

A failure sample is a short scenario that defines what a bad but plausible outcome looks like. It is not a stress test and it is not a full incident plan. It is a small record that lets people ask: if this happens, will we know where to look, who owns the fix, and whether the result can be trusted again?

Example:

```text
task: classify customer feedback into product tags
input: three messages with similar wording but different requests
expected: duplicate complaints merge; distinct requests stay separate
failure: two distinct requests are merged into one label
repair check: can we see the rule, prompt, field, or example that caused the merge?
```

This changes the evaluation. Without the failure sample, the discussion becomes “which tool is smarter?” With the failure sample, the discussion becomes “which tool lets us repair a predictable mistake without rebuilding the whole workflow?”
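One way to keep failure samples comparable across tools is to store them as small structured records. The sketch below is only an illustration: the `FailureSample` class and the `empty_fields` helper are hypothetical names, and the field names simply mirror the example above.

```python
from dataclasses import dataclass, fields


@dataclass
class FailureSample:
    """One bad-but-plausible outcome, written down before tools are compared."""
    task: str
    input: str
    expected: str
    failure: str
    repair_check: str


def empty_fields(sample: FailureSample) -> list[str]:
    """Return the names of fields left blank, so an incomplete sample is visible."""
    return [f.name for f in fields(sample) if not getattr(sample, f.name).strip()]


sample = FailureSample(
    task="classify customer feedback into product tags",
    input="three messages with similar wording but different requests",
    expected="duplicate complaints merge; distinct requests stay separate",
    failure="two distinct requests are merged into one label",
    repair_check="",  # left blank on purpose to show what the helper reports
)

print(empty_fields(sample))  # ['repair_check']
```

A blank `repair_check` is the interesting case: it means the team has named the failure but not yet asked how it would be repaired.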

## What makes a failure repairable

A repairable failure has four properties.

| Property | Useful question | Bad sign |
| --- | --- | --- |
| Boundary | At what point does an accepted request become a committed write? | The UI says done but no durable record exists |
| Trace | What input produced this result? | Logs only show a timestamp and status code |
| Owner | Who can change the next attempt? | Everyone can observe, nobody can alter the rule |
| Rollback | Can the bad result be isolated? | Fixing one output requires rerunning everything |

The table is intentionally plain. Fancy evaluations often hide the thing a small team needs most: can someone recover on a Tuesday afternoon when the tool is only partly wrong?
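The four properties can be recorded as a plain checklist per candidate tool. This is a minimal sketch, assuming each property is answered with a simple yes or no after someone has tried the failure sample; `RepairabilityCheck` and `missing_properties` are illustrative names, not part of any real tool.

```python
from dataclasses import asdict, dataclass


@dataclass
class RepairabilityCheck:
    boundary: bool  # a durable record exists when the UI says done
    trace: bool     # the input that produced a result can be recovered
    owner: bool     # someone can change the rule for the next attempt
    rollback: bool  # a bad result can be isolated without rerunning everything


def missing_properties(check: RepairabilityCheck) -> list[str]:
    """List the properties the tool did not demonstrate, in table order."""
    return [name for name, ok in asdict(check).items() if not ok]


# Hypothetical answers for one candidate tool.
candidate = RepairabilityCheck(boundary=True, trace=False, owner=True, rollback=False)

print(missing_properties(candidate))  # ['trace', 'rollback']
```

The output is deliberately a list of names rather than a score. A named gap tells the team where recovery would stall; a number does not.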

## Why success-only pilots mislead

A success-only pilot selects for impressive first impressions. It asks the tool to handle a clean input, then measures output quality. That is useful, but incomplete. Real work arrives with ambiguous fields, missing context, repeated requests, permission edges, and stale files.

When the pilot has no failure sample, teams usually discover the recovery cost after adoption. By then the tool is no longer just a tool; it is part of a routine. People have adjusted around it. Now a small failure has a social cost: someone must explain why the “working” system cannot be trusted for this case.

A failure-first pilot is less glamorous. It starts with one expected mistake and checks whether the system exposes enough structure to repair it. If the tool passes, then the success path becomes more meaningful because the team knows how the tool behaves near the edge.
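For the feedback-tagging example, a failure-first pilot could start with a check as small as the one below. It is a sketch under stated assumptions: the tool's output is taken to be a mapping from message id to label, and the message ids, labels, and the `wrongly_merged` helper are all made up for illustration.

```python
def wrongly_merged(
    labels: dict[str, str],
    must_stay_separate: list[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return the pairs of messages that received the same label
    even though they were expected to stay separate."""
    return [(a, b) for a, b in must_stay_separate if labels[a] == labels[b]]


# Hypothetical tool output: msg-1 and msg-2 are duplicate complaints,
# msg-3 is a different request that happens to use similar wording.
labels = {"msg-1": "export", "msg-2": "export", "msg-3": "export"}
must_stay_separate = [("msg-1", "msg-3"), ("msg-2", "msg-3")]

print(wrongly_merged(labels, must_stay_separate))
# [('msg-1', 'msg-3'), ('msg-2', 'msg-3')]
```

Detecting the merge is only the first half of the pilot. The second half is the repair check from the failure sample: whether the rule, prompt, field, or example behind the merge can be found and changed.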

## A compact adoption rule

Before adding a workflow tool, write three records:

1. the smallest successful example
2. the most likely harmless failure
3. the first recovery action

If the third record is vague, the tool is not disqualified, but it should stay in a smaller scope. Use it on one folder, one label, one dashboard, one queue, or one export path. Expand only after the recovery action is specific enough for another person to repeat.

This rule keeps the decision grounded. The question is not whether automation is good or bad. The question is whether this particular workflow can fail in a way the team can understand and repair.
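The adoption rule can be written down as a small decision function. This is one possible reading, not a fixed procedure: "specific enough for another person to repeat" is approximated here by a flag that a second person has actually followed the recovery action, and every name in the sketch is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdoptionRecords:
    smallest_success: str
    likely_failure: str
    recovery_action: str
    # Assumption: set to True once a second person has followed the
    # recovery action without help from its author.
    repeated_by_second_person: bool = False


def recommended_scope(records: AdoptionRecords) -> str:
    """Map the three records to a scope recommendation."""
    written = [records.smallest_success, records.likely_failure, records.recovery_action]
    if not all(text.strip() for text in written):
        return "write the three records first"
    if not records.repeated_by_second_person:
        return "keep to one folder, label, dashboard, queue, or export path"
    return "expand"


records = AdoptionRecords(
    smallest_success="three messages tagged correctly",
    likely_failure="two distinct requests merged into one label",
    recovery_action="look at the tool settings",  # still vague
)

print(recommended_scope(records))
# keep to one folder, label, dashboard, queue, or export path
```

Note that a vague recovery action does not return a rejection. It returns a smaller scope, which matches the rule: the tool stays, but it does not grow until the repair is repeatable.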
