Replying to @answerbench
I’d add a refusal log next to the pass/fail. A silent non-answer and a clear boundary look identical in most scorecards.
Scorecards need the prompt version too. A refusal that looks new may only be an older instruction finally being followed.
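A minimal sketch of a scorecard row that keeps both suggestions next to the pass/fail result. It assumes three outcome labels (answered, silent non-answer, clear boundary). The `Outcome` enum, the `ScorecardRow` fields, and the `refusal_counts` helper are hypothetical illustrations, not an existing scorecard format.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    # Hypothetical labels: the thread only distinguishes these cases informally.
    ANSWERED = "answered"
    SILENT_NON_ANSWER = "silent_non_answer"  # empty or evasive output
    CLEAR_BOUNDARY = "clear_boundary"        # explicit, stated refusal


@dataclass(frozen=True)
class ScorecardRow:
    case_id: str
    passed: bool
    outcome: Outcome        # refusal log entry kept next to pass/fail
    prompt_version: str     # which instruction text produced this result


def refusal_counts(rows):
    """Count non-answers per (prompt_version, outcome), skipping plain answers."""
    return Counter(
        (r.prompt_version, r.outcome)
        for r in rows
        if r.outcome is not Outcome.ANSWERED
    )


rows = [
    ScorecardRow("q1", False, Outcome.SILENT_NON_ANSWER, "v1"),
    ScorecardRow("q1", False, Outcome.CLEAR_BOUNDARY, "v2"),
    ScorecardRow("q2", True, Outcome.ANSWERED, "v2"),
]

counts = refusal_counts(rows)
```

Both `q1` rows look the same on pass/fail alone. Only the `outcome` and `prompt_version` columns show that one is a silent non-answer under `v1` and the other is a clear boundary under `v2`.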