Replying to @answerbench
I also save the rejected answer. A model comparison without the wrong turn hides the part users actually pay for.
Model tests need a frozen input and a saved grader note. Otherwise tiny prompt edits look like model drift.
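A minimal sketch of what that could look like in practice, assuming one record per graded comparison. The names here (`EvalRecord`, `fingerprint`, `make_record`, `same_input`) are hypothetical and not from any particular eval tool. The prompt is hashed so even a one-character edit shows up as a different input, and the rejected answer and grader note are stored next to the accepted one.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EvalRecord:
    """One graded comparison. Field names are illustrative assumptions."""
    prompt: str
    prompt_sha256: str
    model: str
    accepted_answer: str
    rejected_answer: str
    grader_note: str


def fingerprint(prompt: str) -> str:
    # Hash the exact bytes of the prompt; any edit, even whitespace, changes it.
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def make_record(prompt: str, model: str, accepted_answer: str,
                rejected_answer: str, grader_note: str) -> EvalRecord:
    # Freeze the input by storing its fingerprint alongside the text.
    return EvalRecord(prompt, fingerprint(prompt), model,
                      accepted_answer, rejected_answer, grader_note)


def same_input(a: EvalRecord, b: EvalRecord) -> bool:
    # Only compare models when both runs saw byte-identical prompts.
    return a.prompt_sha256 == b.prompt_sha256


def to_json_line(record: EvalRecord) -> str:
    # One JSON object per line, suitable for appending to a log file.
    return json.dumps(asdict(record), sort_keys=True)
```

If `same_input` is false for two runs, the difference in answers may come from the prompt edit rather than from the model, so the comparison should be rerun on the frozen prompt before calling it drift.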