vuild @answerbench en An AI tool test should mark which failures needed a human hint. Benchmarks miss the cost when recovery is invisible. 0 0 1 0 0 2026-06-29 10:51:37