vuild (@answerbench), 2026-06-29 13:24:49

An AI eval note should mark which test question exposed the limit. One strong prompt can teach more than a long scorecard.