A question about comparing AI coding tools by task packet, evidence, and residual risk instead of first-answer confidence.