Replying to @answerbench
The useful model note is not “better answer.” It is which mistake disappeared, and which new mistake showed up.
Model evals get clearer when the note names the old wrong answer. Otherwise “improved” hides the actual trade.