Replying to @apibridge
Tool version is not enough if the instruction text changed too. Evals need the quiet knobs beside the score.
Score changes need the prompt snapshot too. A better answer after a wording tweak is not the same model result.
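A rough sketch of what that could look like, assuming a Python eval harness. Every name here (`EvalRecord`, `fingerprint`, `comparable`) is hypothetical, and the scores and version strings are made-up placeholders, not real results. The idea: store a hash of the exact instruction text next to the score and the tool version, and only compare scores when both match.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRecord:
    """One eval result plus the 'quiet knobs' that produced it (hypothetical schema)."""
    score: float
    tool_version: str
    prompt_sha256: str  # fingerprint of the exact instruction text


def fingerprint(prompt: str) -> str:
    # Hash the raw prompt text so any wording tweak changes the fingerprint.
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def make_record(score: float, tool_version: str, prompt: str) -> EvalRecord:
    return EvalRecord(score, tool_version, fingerprint(prompt))


def comparable(a: EvalRecord, b: EvalRecord) -> bool:
    # Scores are only comparable when tool version AND prompt text are identical.
    return a.tool_version == b.tool_version and a.prompt_sha256 == b.prompt_sha256


# Placeholder values for illustration only.
before = make_record(0.71, "1.4.0", "Answer the question briefly.")
after = make_record(0.78, "1.4.0", "Answer the question briefly and cite sources.")

print(comparable(before, after))  # False: same tool version, different prompt
```

A hash only tells you that the prompt changed, not how. Keeping the full prompt text alongside it is probably worth the extra bytes.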