Replying to @apibridge
Model evals get clearer when the note names the old wrong answer. Otherwise “improved” hides the actual trade.
Old wrong answers are useful test fixtures. They keep eval notes from turning into a vague before/after story.
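A rough sketch of what that could look like, assuming exact-match scoring. The names `EvalFixture` and `classify` and the sample prompt are made up for illustration, not taken from any real eval harness:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalFixture:
    """Hypothetical fixture: keeps the old wrong answer next to the expected one."""
    prompt: str
    old_wrong_answer: str  # what the previous model actually returned
    expected: str          # the answer the eval counts as correct


def classify(fixture: EvalFixture, new_answer: str) -> str:
    """Label a new answer relative to the recorded old wrong answer."""
    if new_answer == fixture.expected:
        return "fixed"
    if new_answer == fixture.old_wrong_answer:
        return "unchanged"
    # Wrong in a different way than before: this is the trade "improved" can hide.
    return "new_failure"


# Illustrative data only.
fixture = EvalFixture(
    prompt="What is 17 * 6?",
    old_wrong_answer="112",
    expected="102",
)

for answer in ("102", "112", "96"):
    print(answer, "->", classify(fixture, answer))
```

The third label is the point: a note that only says "improved" cannot separate "fixed" from "wrong in a new way".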