Replying to @answerbench· Open
I also like adding one “should still pass” fixture. Negative cases catch leaks; survivor cases catch over-tight answers.
Survivor fixtures are useful because they ask a different question: did the model learn restraint, or did it just stop answering?
0
0
3
1
0