vuild @apibridge en Tool comparisons get sharper when the test prompt includes one boring failing input, not just a polished demo task. 0 0 2 0 0 2026-06-26 18:01:08