Problem
#2001 identifies a broader onboarding latency pattern: several readiness paths still rely on fixed poll counts, fixed intervals, or platform-specific timeout widening. PR #2492 addresses the gateway startup health loop, but the same waiting pattern still exists in other onboard/readiness flows.
These waits make fast systems pay unnecessary delay and make slow systems fail based on hardcoded loop assumptions rather than a clear deadline.
Scope
Extend the deadline-based wait pattern beyond gateway startup to the remaining readiness paths called out in #2001.
Candidate areas:
- Sandbox readiness polling
- Dashboard readiness polling
- Gateway recovery polling
- Agent gateway polling
- Sandbox create stream readiness polling, if it uses the same fixed-interval pattern
- Any error messages that hardcode stale timeout values like "within 60s"
Expected Behavior
Readiness checks should:
- Start with fast polling so successful systems continue quickly
- Back off up to a capped interval
- Respect one clear deadline budget
- Report the actual deadline used when timing out
- Preserve or deprecate existing health-poll env vars safely if they are externally used
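
A minimal sketch of the behavior listed above, written in Python purely for illustration. The names `wait_until_ready` and `ReadinessTimeout`, the parameter defaults, and the message wording are all hypothetical; this is not the helper from #2492, only one way the same properties could be expressed. The clock and sleep functions are injectable so tests can run without real delays.

```python
import time
from typing import Callable


class ReadinessTimeout(Exception):
    """Raised when the readiness condition is not met before the deadline."""


def wait_until_ready(
    check: Callable[[], bool],
    what: str,
    deadline_s: float,
    initial_interval_s: float = 0.1,   # assumed default, not from the issue
    max_interval_s: float = 2.0,       # assumed default, not from the issue
    backoff: float = 2.0,              # assumed default, not from the issue
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll check() until it returns True or deadline_s elapses.

    Returns the number of attempts made. At least one attempt is always
    made, so a zero deadline still performs a single check.
    """
    start = clock()
    interval = initial_interval_s
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts  # fast systems exit as soon as they are ready
        remaining = deadline_s - (clock() - start)
        if remaining <= 0:
            # Report the deadline that was actually in effect.
            raise ReadinessTimeout(
                f"{what} not ready within {deadline_s:g}s ({attempts} attempts)"
            )
        # Never sleep past the deadline, so slow systems get the full budget.
        sleep(min(interval, remaining))
        # Back off, but only up to the capped interval.
        interval = min(interval * backoff, max_interval_s)
```

A caller would pass its own probe, for example `wait_until_ready(sandbox_is_up, "sandbox", deadline_s=60)`, where `sandbox_is_up` stands in for whatever check the readiness path already performs.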
Related Work
This issue should build on #2492 if it lands. If #2492 is superseded, this issue should reuse the replacement wait helper instead.
Acceptance Criteria
- Remaining onboard/readiness polling loops use deadline-based waits instead of fixed N x interval loops.
- Fast systems exit as soon as the readiness condition is met.
- Slow systems receive the full configured deadline budget.
- Existing tests that use NEMOCLAW_HEALTH_POLL_COUNT or NEMOCLAW_HEALTH_POLL_INTERVAL are updated or remain compatible through deprecated aliases.
- Timeout messages include the actual deadline used.
- New or updated tests cover immediate success, retry success, timeout, and zero/short-deadline test behavior.
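
One possible shape for the deprecated-alias criterion, sketched in Python for illustration only. `NEMOCLAW_HEALTH_DEADLINE_S` is a hypothetical variable name, the default values are assumptions, and the sketch assumes NEMOCLAW_HEALTH_POLL_INTERVAL is expressed in seconds. The idea is that the old count and interval are folded into a single deadline budget equal to their product, so existing configurations keep roughly the same total wait.

```python
import os
import warnings
from typing import Mapping, Optional

# Assumed defaults; the real values are not stated in the issue.
DEFAULT_POLL_COUNT = 30
DEFAULT_POLL_INTERVAL_S = 2.0
DEFAULT_DEADLINE_S = DEFAULT_POLL_COUNT * DEFAULT_POLL_INTERVAL_S


def resolve_deadline_s(env: Optional[Mapping[str, str]] = None) -> float:
    """Return the readiness deadline in seconds, honoring deprecated env vars."""
    env = os.environ if env is None else env

    # Hypothetical new variable: an explicit deadline always wins.
    explicit = env.get("NEMOCLAW_HEALTH_DEADLINE_S")
    if explicit:
        return float(explicit)

    count = env.get("NEMOCLAW_HEALTH_POLL_COUNT")
    interval = env.get("NEMOCLAW_HEALTH_POLL_INTERVAL")
    if count or interval:
        warnings.warn(
            "NEMOCLAW_HEALTH_POLL_COUNT and NEMOCLAW_HEALTH_POLL_INTERVAL are "
            "deprecated; their product is used as the deadline budget",
            DeprecationWarning,
            stacklevel=2,
        )
        # Fold the old N x interval loop into one equivalent deadline.
        return float(count or DEFAULT_POLL_COUNT) * float(
            interval or DEFAULT_POLL_INTERVAL_S
        )

    return DEFAULT_DEADLINE_S
```

With this mapping, a test that sets only the old variables still controls the total wait, while a timeout message can report the resolved deadline rather than a hardcoded value.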
Non-goals
- Adaptive provider-validation timeout calibration.
- DNS/TCP/TLS probe optimization.
- Onboard orchestration parallelization.
- Profiling trace output.