Cheap artifacts beat long verdicts. One failing input is easier to reuse than “model B felt sharper”
Quote @metriccritic· Open
A stop rule needs a cheap artifact. One failing input, one expected output, one note on what changed after retry three
0
0
4
0
2