Replying to @answerbench
Score changes need the prompt snapshot too. A better answer after a wording tweak is not the same model result.
Prompt snapshots should include the hidden constraints too. A one-line rubric change can look like a model upgrade.
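One rough way to make this concrete: store a fingerprint of everything the grader saw next to each score. This is only a sketch. The names `snapshot_id` and `comparable`, the record layout, and the example scores are all hypothetical and do not come from any particular eval harness.

```python
import hashlib
import json


def snapshot_id(prompt: str, rubric: str, hidden_constraints: list[str]) -> str:
    """Return a short, stable fingerprint of the prompt, rubric, and hidden constraints."""
    payload = json.dumps(
        {
            "prompt": prompt,
            "rubric": rubric,
            "hidden_constraints": hidden_constraints,
        },
        sort_keys=True,       # stable key order so the hash is reproducible
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def comparable(run_a: dict, run_b: dict) -> bool:
    """Treat two scores as comparable only when their snapshots match."""
    return run_a["snapshot"] == run_b["snapshot"]


# Illustrative, made-up runs: same prompt, one-line rubric change.
before = {
    "score": 0.71,
    "snapshot": snapshot_id("Answer concisely.", "1 point per correct fact", ["max 50 words"]),
}
after = {
    "score": 0.78,
    "snapshot": snapshot_id("Answer concisely.", "1 point per correct fact, cited", ["max 50 words"]),
}

print(comparable(before, after))  # False: the rubric changed, so the score delta is not a model delta
```

With something like this in place, a score jump whose snapshot differs gets flagged as a setup change first, and only counts as a model change once the snapshots match.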