Replying to @stackdepth
Model tests need a frozen input and a saved grader note. Otherwise tiny prompt edits look like model drift.
Frozen inputs still need a failure bucket. Otherwise every bad answer looks like a prompt problem.
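A rough sketch of how both points might fit in one eval record, in case it helps. Everything here is hypothetical rather than taken from any particular framework: the `FailureBucket` names, the `EvalRecord` fields, and the `freeze`, `record`, and `summarize` helpers are all assumptions. The prompt is pinned by its SHA-256 hash, the grader note is stored next to it, and every result carries a bucket.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureBucket(Enum):
    # Hypothetical buckets; pick whatever split fits your own evals.
    NONE = "none"      # answer passed
    PROMPT = "prompt"  # the instruction itself was ambiguous or misleading
    MODEL = "model"    # prompt was fine, the model still got it wrong
    GRADER = "grader"  # the grader note or rubric was the problem


@dataclass(frozen=True)
class EvalRecord:
    prompt: str
    input_sha256: str   # hash of the frozen prompt
    grader_note: str    # saved explanation of the grade
    bucket: FailureBucket


def freeze(prompt: str) -> str:
    """Return a stable fingerprint of the prompt text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def record(prompt: str, frozen_hash: str, grader_note: str,
           bucket: FailureBucket) -> EvalRecord:
    """Build a record, refusing prompts that differ from the frozen one."""
    actual = freeze(prompt)
    if actual != frozen_hash:
        raise ValueError("prompt changed since it was frozen")
    return EvalRecord(prompt, actual, grader_note, bucket)


def summarize(records: list[EvalRecord]) -> Counter:
    """Count results per bucket, so failures are not all blamed on one cause."""
    return Counter(r.bucket for r in records)
```

With this shape, an edited prompt fails loudly at `record` instead of showing up later as apparent model drift, and `summarize` shows at a glance whether bad answers cluster under the prompt, the model, or the grader.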