Session-start hang: a run that produces no first event blocks until the full turn timeout (no time-to-first-event guard)

## Summary

A run whose embedded engine launches but never produces a **first session event**
(no assistant message, no tool call, nothing) is only caught by the overall
per-task turn timeout (`timeout_seconds`). Because that timeout must be sized for
the *slowest legitimate full turn*, a fast "session-start hang" can block for many
minutes — or hours — before it is reaped, and it surfaces as the generic
`waiting for session.idle: context deadline exceeded`, indistinguishable from a
genuinely long-running turn that timed out.

There is no separate **time-to-first-event** guard.

## Where it comes from (code references, `main`)

The whole turn is one blocking call governed by a single deadline:

- `internal/orchestration/runner.go` — `executeRun` wraps the entire `Execute`
  call in `context.WithTimeout(ctx, timeout)` where `timeout` comes from
  `executionTimeout(tc)`, i.e. **only** `Config.TimeoutSec` / `TestCase.TimeoutSec`.
  That single deadline must cover the slowest legitimate full turn.
- `internal/execution/copilot.go` — `Execute` calls
  `session.SendAndWait(ctx, …)`, which blocks until the session reaches a
  terminal state or `ctx` expires.
- `internal/execution/session_events_collector.go` — the collector's `done`
  channel is closed **only** on a terminal `SessionEventTypeSessionIdle` /
  `SessionEventTypeSessionError`. A session that emits **no events at all** never
  signals done, so `SendAndWait` blocks for the full `ctx` deadline.

`internal/execution/engine.go` even documents that the deadline is entirely the
caller's: *"The caller controls cancellation and deadlines through ctx."* There is
no internal first-event guard anywhere (a repo search for a "first event" /
"time to first" timeout finds only an unrelated PRD metric).

## Why one coarse timeout can't catch it

`timeout_seconds` is necessarily large for long multi-turn / lifecycle evals
(e.g. a downstream consumer runs golden suites at `timeout_seconds: 14400` = 4 h).
A no-first-turn wedge therefore burns up to that entire budget before being
reaped — turning a fast failure into a multi-minute/hour stall and losing the
data point — even though a healthy session emits its first event within seconds.

## Observed symptom (downstream)

A `copilot-sdk` run launched the embedded engine, logged
`level=INFO msg="using Copilot CLI" … copilot_1.0.49.exe`, then emitted **nothing
further** — no transcript, no grades — until an external watchdog killed it ~38
min in. The captured output was only the process startup banner (~251 bytes);
`ExecutionResponse.Events` was empty.

## Proposed fix

Add a **time-to-first-event** deadline distinct from `timeout_seconds`:

- Signal the first event of any type from `SessionEventsCollector` (a `FirstEvent()`
  channel closed on the first `On(...)` call).
- In `copilot.go` `Execute`, arm a short timer before `SendAndWait` that cancels a
  child context (with a distinct cause) if no event arrives in time, and disarm it
  the instant the first event lands. Map that cancellation cause to a distinct,
  actionable error (e.g. `session start timeout: no first turn within Ns`) so a
  session-start hang is not conflated with a normal long turn that timed out.
- New opt-in config `first_event_timeout_seconds` (eval `Config` + per-task
  override), defaulting to `0` (disabled) so existing specs are unchanged.

This fires on the genuine signature (no events at all) and errs toward never
false-aborting a slow-but-live first turn (any event disarms it). Keeping the
window comfortably above realistic first-turn latency avoids false positives.

A deeper fix could also live in `github/copilot-sdk` (surface a `session.error`
when session bootstrap / the first model request stalls), but waza should defend
regardless — an embedded engine should never be able to wedge the runner for the
full (large) turn budget.

I'll open a PR implementing the above.

## Environment

Windows, embedded Copilot SDK `copilot_1.0.49`, `copilot-sdk` executor, agent on a
long multi-turn (golden/lifecycle) suite with a large `timeout_seconds`.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Session-start hang: a run that produces no first event blocks until the full turn timeout (no time-to-first-event guard) #320

Summary

Where it comes from (code references, `main`)

Why one coarse timeout can't catch it

Observed symptom (downstream)

Proposed fix

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Session-start hang: a run that produces no first event blocks until the full turn timeout (no time-to-first-event guard) #320

Description

Summary

Where it comes from (code references, main)

Why one coarse timeout can't catch it

Observed symptom (downstream)

Proposed fix

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Where it comes from (code references, `main`)