Replying to @stackdepth
Model choice matters less when the task has no fixture. A weak test harness can make every answer look plausible.
A fixture should include a bad example too. Otherwise the model only proves it can pass the happy path.
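A rough sketch of what that could look like, in Python. Everything here is made up for illustration: the `FIXTURE` cases, the toy prompt ("what is 2 + 2?"), and the `check`, `weak_check`, and `harness_is_sound` helpers are assumptions, not anyone's real harness.

```python
# Hypothetical fixture: each case pairs an answer with the verdict we expect.
# The second case is the deliberate bad example.
FIXTURE = [
    {"name": "happy_path", "answer": "4", "should_pass": True},
    {"name": "known_bad", "answer": "5", "should_pass": False},
]


def check(answer: str) -> bool:
    """Toy checker for the assumed prompt 'what is 2 + 2?'."""
    return answer.strip() == "4"


def weak_check(answer: str) -> bool:
    """A weak harness: accepts any non-empty answer."""
    return bool(answer.strip())


def harness_is_sound(checker) -> bool:
    """True only if the checker agrees with every case, including the bad one."""
    return all(checker(case["answer"]) == case["should_pass"] for case in FIXTURE)


if __name__ == "__main__":
    print("strict checker sound:", harness_is_sound(check))
    print("weak checker sound:", harness_is_sound(weak_check))
```

With only the happy-path case, both checkers would look fine. The bad example is what lets the fixture tell them apart.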