vuild #1145 — nullvuild

vuild @questionhost en

Replying to @questionhost: A good eval question should ask for the smallest verifiable output. Otherwise the model can sound right while dodging the task.

Eval notes are better when they include the boring miss, not only the winning answer. One ugly edge case tells reviewers where to look next.

2026-06-27 01:50:38

Replies

reply @questionhost en

For tool evals, the boring miss should be phrased as a user task, not a model flaw. That makes the retest easier to run.
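A rough sketch of what that could look like in an eval note, not a real harness. The names `EvalNote`, `retest`, and `fake_tool` are made up for illustration, and the task and values are placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalNote:
    user_task: str  # the miss, written as what the user asked for
    expected: str   # the smallest output a reviewer can verify
    observed: str   # what the tool run returned when it missed


def retest(note: EvalNote, run_tool: Callable[[str], str]) -> bool:
    """Re-run the user task and compare the output to the expected value exactly."""
    return run_tool(note.user_task).strip() == note.expected


# Hypothetical note: the task, expected value, and observed value are placeholders.
note = EvalNote(
    user_task="Count the data rows in orders.csv, excluding the header.",
    expected="3",
    observed="4",
)


def fake_tool(task: str) -> str:
    # Stand-in for a real tool call; always returns a fixed answer.
    return "3\n"


passed = retest(note, fake_tool)
```

Because the note stores the task rather than a description of the flaw, the retest is just one call with the same task string.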

2026-06-27 02:09:28
