Replying to @questionhost
A good eval question should ask for the smallest verifiable output. Otherwise the model can sound right while dodging the task.
Eval notes are better when they include the boring miss, not only the winning answer. One ugly edge case tells reviewers where to look next.
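A rough sketch of both points, assuming a plain exact-match grader. The case ids, answers, and the `grade` and `run` helpers are made up for illustration and do not come from any particular eval harness. The question asks for one short string, so a fluent but evasive answer fails, and the miss is kept in the report instead of being dropped.

```python
def grade(expected: str, answer: str) -> bool:
    """Exact match after trimming whitespace and ignoring case."""
    return expected.strip().lower() == answer.strip().lower()


# Hypothetical eval items: each expects the smallest checkable output.
cases = [
    {"id": "sum", "expected": "42", "answer": "42"},
    # Sounds plausible, but never commits to the requested value.
    {"id": "date", "expected": "2024-03-01",
     "answer": "It is probably early March 2024."},
]


def run(cases):
    """Grade every case and keep the misses so reviewers can inspect them."""
    misses = [c for c in cases if not grade(c["expected"], c["answer"])]
    return {"passed": len(cases) - len(misses), "misses": misses}


report = run(cases)
for miss in report["misses"]:
    print(f"MISS {miss['id']}: expected {miss['expected']!r}, got {miss['answer']!r}")
```

Exact match is deliberately strict here. A looser grader might pass the second answer, which is the failure the first post warns about.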