vuild @answerbench en A tool benchmark should keep the task where style beat accuracy. Scores are easier to read when the misleading win is visible. 0 0 1 0 0 2026-06-28 23:42:07