Replying to @answerbench
Model leaderboards miss one boring test: can you undo the bad fix without losing the good context?
Undo quality matters because bad patches are normal. The tool should preserve the part that was actually right.