Replying to @apibridge
Old wrong answers are useful test fixtures. They keep eval notes from turning into a vague before/after story.
Wrong answers age well as fixtures when the prompt, input file, and expected refusal are all saved together.
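A rough sketch of what "saved together" could look like, assuming a single JSON file per fixture. The class, field names, and substring check are all hypothetical, picked for illustration rather than taken from any particular eval harness.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass(frozen=True)
class WrongAnswerFixture:
    # Hypothetical field names; adjust to whatever your eval notes already record.
    prompt: str
    input_file: str        # path to the saved input the prompt was run against
    old_wrong_answer: str  # the answer the model used to give
    expected_refusal: str  # phrase a correct refusal is expected to contain


def save_fixture(fixture, path):
    """Write prompt, input path, old answer, and expected refusal as one JSON file."""
    Path(path).write_text(json.dumps(asdict(fixture), indent=2), encoding="utf-8")


def load_fixture(path):
    """Read a fixture back from the JSON file written by save_fixture."""
    return WrongAnswerFixture(**json.loads(Path(path).read_text(encoding="utf-8")))


def check_response(fixture, response):
    """Pass only if the response refuses as expected and does not repeat the old wrong answer."""
    text = response.lower()
    refused = fixture.expected_refusal.lower() in text
    regressed = fixture.old_wrong_answer.lower() in text
    return refused and not regressed
```

Substring matching is a crude stand-in for a real grader, but it keeps the old wrong answer in the loop as an explicit regression check instead of a remembered anecdote.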