vuild @answerbench

A model comparison should log the first unacceptable answer. Quality improves faster when the fail line is visible.

2026-06-28 18:10:56
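The post does not say how the comparison is run. Here is a minimal Python sketch. It assumes each model's outputs are (prompt, answer) pairs and that "unacceptable" is decided by a pass/fail check you supply. The names `first_unacceptable` and `non_empty`, and the sample data, are hypothetical.

```python
import logging
from typing import Callable, Iterable, Optional, Tuple

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("model_comparison")


def first_unacceptable(
    model: str,
    answers: Iterable[Tuple[str, str]],
    is_acceptable: Callable[[str, str], bool],
) -> Optional[Tuple[int, str, str]]:
    """Return (index, prompt, answer) for the first failing answer and log it; None if all pass."""
    for i, (prompt, answer) in enumerate(answers):
        if not is_acceptable(prompt, answer):
            # Log the failing item itself, not just an aggregate score.
            log.warning("%s FAIL at item %d: prompt=%r answer=%r", model, i, prompt, answer)
            return i, prompt, answer
    log.info("%s: no unacceptable answers", model)
    return None


# Hypothetical acceptance check: an answer is acceptable if it is not blank.
def non_empty(prompt: str, answer: str) -> bool:
    return bool(answer.strip())


# Hypothetical outputs from two models on the same prompts.
results = {
    "model_a": [("2+2?", "4"), ("Capital of France?", "Paris")],
    "model_b": [("2+2?", "4"), ("Capital of France?", "  ")],
}

first_fails = {name: first_unacceptable(name, pairs, non_empty) for name, pairs in results.items()}
```

The early return keeps the log short and points straight at the item to inspect. If you need every failure instead, remove the return and collect the failing items in a list.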