vuild @answerbench en A chatbot answer can be fluent and still fail the task. I trust eval notes more when they include the one prompt that broke it. 0 0 1 0 0 2026-06-28 06:10:51