Developer debugging route for CI cache, staging-only bugs, flaky tests, and rollback notes

A test that fails unpredictably may be flaky, or it may be reliably exposing a slow external dependency that the test does not control.

The distinction matters because the fixes are different. A flaky test often needs deterministic time, isolated state, seeded randomness, better waits, or reduced ordering assumptions. A slow external dependency needs a mock, contract test, timeout policy, retry boundary, or separate integration job. If the team calls both “flake,” the real dependency risk may stay hidden.
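As a rough illustration of the flaky-test side of that list, the sketch below shows deterministic time and seeded randomness. The functions `make_token` and `is_expired` are hypothetical stand-ins for code under test; the assumption is that the code accepts its clock value and random source as arguments instead of reading them globally.

```python
import random
from datetime import datetime, timedelta, timezone


def make_token(rng, now, ttl_seconds=300):
    """Hypothetical code under test: the clock value and RNG are passed in."""
    return {
        "id": rng.getrandbits(32),
        "expires_at": now + timedelta(seconds=ttl_seconds),
    }


def is_expired(token, now):
    """A token is expired once the current time reaches its expiry time."""
    return now >= token["expires_at"]


def test_token_expiry_is_deterministic():
    # Fixed clock: the test never depends on when it happens to run.
    fixed_now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    # Seeded RNG: the same seed always yields the same sequence.
    token = make_token(random.Random(1234), fixed_now)

    assert not is_expired(token, fixed_now + timedelta(seconds=299))
    assert is_expired(token, fixed_now + timedelta(seconds=300))

    # Rebuilding with the same seed and clock gives the same token id.
    again = make_token(random.Random(1234), fixed_now)
    assert again["id"] == token["id"]
```

Passing the clock and random source as parameters is one design choice among several; patching them at the module level can also work, but explicit parameters keep the test free of hidden global state.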

Start by grouping failures by symptom. Do they fail with different assertions, or always at the same network call? Do they fail more often at certain times of day? Does the error show a connection reset, timeout, DNS failure, rate limit, or missing sandbox data? Does a local run pass because the developer has a faster network connection or cached credentials?
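The grouping step can be sketched as a small script over failure records. Everything here is an assumption for illustration: the record shape `(test_name, hour_of_day, message)`, the symptom labels, and the regular expressions, which would need adjusting to the messages a given CI system actually emits.

```python
import re
from collections import Counter, defaultdict

# Hypothetical symptom patterns, checked in order; the first match wins.
SYMPTOM_PATTERNS = [
    ("connection_reset", re.compile(r"connection reset", re.I)),
    ("timeout", re.compile(r"timed out|timeout", re.I)),
    ("dns_failure", re.compile(r"name or service not known|getaddrinfo|dns", re.I)),
    ("rate_limit", re.compile(r"rate limit|\b429\b|too many requests", re.I)),
    ("assertion", re.compile(r"assertionerror", re.I)),
]


def classify(message):
    """Return the first matching symptom label, or 'unclassified'."""
    for label, pattern in SYMPTOM_PATTERNS:
        if pattern.search(message):
            return label
    return "unclassified"


def group_failures(failures):
    """Count failures per symptom, and per symptom per hour of day.

    failures: iterable of (test_name, hour_of_day, message) tuples.
    """
    by_symptom = Counter()
    by_hour = defaultdict(Counter)
    for _test_name, hour, message in failures:
        label = classify(message)
        by_symptom[label] += 1
        by_hour[label][hour] += 1
    return by_symptom, by_hour
```

If most failures land in one network-related bucket, or cluster at particular hours, that points toward the dependency. A spread of unrelated assertion failures points back at the test itself.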

Next, run the test with the external call stubbed or recorded. If the failure disappears, the dependency boundary is the suspect. If the failure remains, inspect internal timing and state. Also check whether retries are hiding the true failure rate. A test that passes after three retries counts as green in the final result, but it may still be showing a worsening dependency problem.
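Both checks in this step can be sketched briefly. The first part stubs the external call with `unittest.mock`; the second measures how often a run needed more than one attempt, so retries cannot mask the rate. `SandboxClient`, `display_name`, and the `attempts_per_run` input are hypothetical names used only for illustration.

```python
from unittest import mock


class SandboxClient:
    """Hypothetical client for an external sandbox API."""

    def get_user(self, user_id):
        raise NotImplementedError("the real network call would go here")


def display_name(client, user_id):
    """Code under test: formats a name from the external response."""
    user = client.get_user(user_id)
    return f"{user['first']} {user['last']}".strip()


def run_with_stub():
    """Run the code under test with the external call replaced by a stub."""
    client = SandboxClient()
    canned = {"first": "Ada", "last": "Lovelace"}
    with mock.patch.object(SandboxClient, "get_user", return_value=canned) as stub:
        result = display_name(client, 42)
    # The stub confirms the boundary was exercised without touching the network.
    stub.assert_called_once_with(42)
    return result


def first_attempt_failure_rate(attempts_per_run):
    """Fraction of runs that needed more than one attempt to pass.

    attempts_per_run: list of attempt counts, one entry per CI run.
    """
    if not attempts_per_run:
        return 0.0
    retried = sum(1 for attempts in attempts_per_run if attempts > 1)
    return retried / len(attempts_per_run)
```

If the stubbed run is stable while the real run is not, the dependency is the likelier cause. Tracking the first-attempt rate over time, instead of the final pass/fail result, shows whether that dependency is getting worse.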

The practical rule: quarantine only after naming the failure mode. “Random failure” is not a diagnosis. “Sandbox API timeout after auth token refresh” is a diagnosis that can be owned.
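One way to make that rule hard to skip is to have the quarantine list reject vague labels. This is only a sketch: the `QuarantineEntry` type, its fields, and the set of rejected labels are assumptions, not part of any particular CI tool.

```python
from dataclasses import dataclass

# Hypothetical labels that describe a feeling, not a failure mode.
VAGUE_LABELS = {"flake", "flaky", "random failure", "intermittent", "unknown"}


@dataclass(frozen=True)
class QuarantineEntry:
    test_name: str
    failure_mode: str
    owner: str

    def __post_init__(self):
        mode = self.failure_mode.strip().lower()
        if not mode or mode in VAGUE_LABELS:
            raise ValueError(
                f"{self.test_name}: name a specific failure mode before quarantining"
            )
        if not self.owner.strip():
            raise ValueError(f"{self.test_name}: a quarantined test needs an owner")
```

With this in place, an entry such as `QuarantineEntry("test_checkout", "Sandbox API timeout after auth token refresh", "payments-team")` is accepted, while one whose failure mode is "Random failure" raises `ValueError`.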

How to separate a real flaky test from a slow external dependency
