Summary
A run whose embedded engine launches but never produces a first session event
(no assistant message, no tool call, nothing) is only caught by the overall
per-task turn timeout (timeout_seconds). Because that timeout must be sized for
the slowest legitimate full turn, a fast "session-start hang" can block for many
minutes — or hours — before it is reaped, and it surfaces as the generic
waiting for session.idle: context deadline exceeded, indistinguishable from a
genuinely long-running turn that timed out.
There is no separate time-to-first-event guard.
Where it comes from (code references, main)
The whole turn is one blocking call governed by a single deadline:
internal/orchestration/runner.go — executeRun wraps the entire Execute
call in context.WithTimeout(ctx, timeout) where timeout comes from
executionTimeout(tc), i.e. only Config.TimeoutSec / TestCase.TimeoutSec.
That single deadline must cover the slowest legitimate full turn.
internal/execution/copilot.go — Execute calls
session.SendAndWait(ctx, …), which blocks until the session reaches a
terminal state or ctx expires.
internal/execution/session_events_collector.go — the collector's done
channel is closed only on a terminal SessionEventTypeSessionIdle /
SessionEventTypeSessionError. A session that emits no events at all never
signals done, so SendAndWait blocks for the full ctx deadline.
internal/execution/engine.go even documents that the deadline is entirely the
caller's: "The caller controls cancellation and deadlines through ctx." There is
no internal first-event guard anywhere (a repo search for a "first event" /
"time to first" timeout finds only an unrelated PRD metric).
Why one coarse timeout can't catch it
timeout_seconds is necessarily large for long multi-turn / lifecycle evals
(e.g. a downstream consumer runs golden suites at timeout_seconds: 14400 = 4 h).
A no-first-turn wedge therefore burns up to that entire budget before being
reaped — turning a fast failure into a multi-minute/hour stall and losing the
data point — even though a healthy session emits its first event within seconds.
Observed symptom (downstream)
A copilot-sdk run launched the embedded engine, logged
level=INFO msg="using Copilot CLI" … copilot_1.0.49.exe, then emitted nothing
further — no transcript, no grades — until an external watchdog killed it ~38
min in. The captured output was only the process startup banner (~251 bytes);
ExecutionResponse.Events was empty.
Proposed fix
Add a time-to-first-event deadline distinct from timeout_seconds:
- Signal the first event of any type from
SessionEventsCollector (a FirstEvent()
channel closed on the first On(...) call).
- In
copilot.go Execute, arm a short timer before SendAndWait that cancels a
child context (with a distinct cause) if no event arrives in time, and disarm it
the instant the first event lands. Map that cancellation cause to a distinct,
actionable error (e.g. session start timeout: no first turn within Ns) so a
session-start hang is not conflated with a normal long turn that timed out.
- New opt-in config
first_event_timeout_seconds (eval Config + per-task
override), defaulting to 0 (disabled) so existing specs are unchanged.
This fires on the genuine signature (no events at all) and errs toward never
false-aborting a slow-but-live first turn (any event disarms it). Keeping the
window comfortably above realistic first-turn latency avoids false positives.
A deeper fix could also live in github/copilot-sdk (surface a session.error
when session bootstrap / the first model request stalls), but waza should defend
regardless — an embedded engine should never be able to wedge the runner for the
full (large) turn budget.
I'll open a PR implementing the above.
Environment
Windows, embedded Copilot SDK copilot_1.0.49, copilot-sdk executor, agent on a
long multi-turn (golden/lifecycle) suite with a large timeout_seconds.
Summary
A run whose embedded engine launches but never produces a first session event
(no assistant message, no tool call, nothing) is only caught by the overall
per-task turn timeout (
timeout_seconds). Because that timeout must be sized forthe slowest legitimate full turn, a fast "session-start hang" can block for many
minutes — or hours — before it is reaped, and it surfaces as the generic
waiting for session.idle: context deadline exceeded, indistinguishable from agenuinely long-running turn that timed out.
There is no separate time-to-first-event guard.
Where it comes from (code references,
main)The whole turn is one blocking call governed by a single deadline:
internal/orchestration/runner.go—executeRunwraps the entireExecutecall in
context.WithTimeout(ctx, timeout)wheretimeoutcomes fromexecutionTimeout(tc), i.e. onlyConfig.TimeoutSec/TestCase.TimeoutSec.That single deadline must cover the slowest legitimate full turn.
internal/execution/copilot.go—Executecallssession.SendAndWait(ctx, …), which blocks until the session reaches aterminal state or
ctxexpires.internal/execution/session_events_collector.go— the collector'sdonechannel is closed only on a terminal
SessionEventTypeSessionIdle/SessionEventTypeSessionError. A session that emits no events at all neversignals done, so
SendAndWaitblocks for the fullctxdeadline.internal/execution/engine.goeven documents that the deadline is entirely thecaller's: "The caller controls cancellation and deadlines through ctx." There is
no internal first-event guard anywhere (a repo search for a "first event" /
"time to first" timeout finds only an unrelated PRD metric).
Why one coarse timeout can't catch it
timeout_secondsis necessarily large for long multi-turn / lifecycle evals(e.g. a downstream consumer runs golden suites at
timeout_seconds: 14400= 4 h).A no-first-turn wedge therefore burns up to that entire budget before being
reaped — turning a fast failure into a multi-minute/hour stall and losing the
data point — even though a healthy session emits its first event within seconds.
Observed symptom (downstream)
A
copilot-sdkrun launched the embedded engine, loggedlevel=INFO msg="using Copilot CLI" … copilot_1.0.49.exe, then emitted nothingfurther — no transcript, no grades — until an external watchdog killed it ~38
min in. The captured output was only the process startup banner (~251 bytes);
ExecutionResponse.Eventswas empty.Proposed fix
Add a time-to-first-event deadline distinct from
timeout_seconds:SessionEventsCollector(aFirstEvent()channel closed on the first
On(...)call).copilot.goExecute, arm a short timer beforeSendAndWaitthat cancels achild context (with a distinct cause) if no event arrives in time, and disarm it
the instant the first event lands. Map that cancellation cause to a distinct,
actionable error (e.g.
session start timeout: no first turn within Ns) so asession-start hang is not conflated with a normal long turn that timed out.
first_event_timeout_seconds(evalConfig+ per-taskoverride), defaulting to
0(disabled) so existing specs are unchanged.This fires on the genuine signature (no events at all) and errs toward never
false-aborting a slow-but-live first turn (any event disarms it). Keeping the
window comfortably above realistic first-turn latency avoids false positives.
A deeper fix could also live in
github/copilot-sdk(surface asession.errorwhen session bootstrap / the first model request stalls), but waza should defend
regardless — an embedded engine should never be able to wedge the runner for the
full (large) turn budget.
I'll open a PR implementing the above.
Environment
Windows, embedded Copilot SDK
copilot_1.0.49,copilot-sdkexecutor, agent on along multi-turn (golden/lifecycle) suite with a large
timeout_seconds.