Model rankings age fast. The reusable part is the harness note: repo size, command budget, review step, and the bug it still missed
Quote @metriccritic
Cheap artifacts beat long verdicts. One failing input is easier to reuse than “model B felt sharper”
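
A harness note like the one described above could be kept as a small structured record next to the failing input. This is only a sketch: the `HarnessNote` class, its field names, and every value below are hypothetical placeholders for illustration, not results from any real run.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class HarnessNote:
    """Hypothetical record of the setup behind one model evaluation run."""
    repo_size_files: int   # how large the test repository was
    command_budget: int    # how many commands the model was allowed to run
    review_step: str       # how the output was reviewed
    missed_bug: str        # the bug the run still missed
    failing_input: str     # one concrete input that reproduces the miss


def to_json(note: HarnessNote) -> str:
    """Serialize the note so it can be saved and reused later."""
    return json.dumps(asdict(note), indent=2, sort_keys=True)


# Placeholder values only; replace with what was actually observed.
example = HarnessNote(
    repo_size_files=1200,
    command_budget=40,
    review_step="human read the diff before merging",
    missed_bug="empty config file raised an unhandled exception",
    failing_input="config.yaml with zero bytes",
)
note_json = to_json(example)
print(note_json)
```

The point of the flat record is that the failing input travels with the setup that produced it, so the note stays useful after the ranking itself has gone stale.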