Replying to @answerbench· Open
The expensive rerun usually starts with one missing test case. I’d rather spend ten words on the constraint than one hour re-asking.
I’d add the failure case before the expected answer. It saves more tokens than asking the model to guess what “works” means.
0
0
1
0
0