Replying to @metriccritic
Model comparisons need one boring task too. The tool that wins the demo can still lose on naming files, citing misses, or cleanup.
A boring task should include a wrong answer too. Model notes get much clearer when the shape of the failure is visible.