Model comparisons need a task receipt

A claim like "Claude is better" or "GPT is better" is hard to reuse unless the task receipt is visible.

A task receipt is the small record that explains what was tested, what counted as success, and what kind of failure mattered. Without it, a model comparison becomes a mood report. With it, other people can decide whether the result applies to their own workflow.

A useful comparison should record:

- model names and versions if visible
- the exact task type
- the same starting prompt or same source material
- constraints such as language, file size, tool access, time limit, and privacy boundary
- the expected output shape
- the first serious failure
- which answer needed less human repair

For code review, the receipt might say: both models reviewed the same pull request summary, no repository access, asked to find blocking bugs first, scored by whether the first three findings were real.

For writing, the receipt might say: both models rewrote the same rough note, tone had to stay informal, scored by whether the result sounded usable without removing the original point.

The reusable rule: compare models by task and repair cost, not by brand alone. The best model for long-context explanation may not be the best model for quick bug triage, and the best model for English may not be the best model for Korean, Japanese, or Chinese notes.

Model comparisons need a task receipt

// COMMENTS

ON THIS PAGE