pi-embedded-runner: stale sessionLastAssistant leaks prior provider's error string into later candidates in model-fallback #86077

## Summary

In `pi-embedded-runner`, the model-fallback loop reuses a single shared session file across every candidate provider. Suppose the first candidate writes an assistant row with an `errorMessage` (e.g. OpenAI returns a real 429), and a later candidate (e.g. Anthropic, Google) then times out without producing a new assistant row for its own attempt. In that case the failover path falls back to `sessionLastAssistant.errorMessage` from the shared session file and reports the **previous provider's error string** as if it came from the current candidate.

The net effect: a single real upstream error from provider A is re-surfaced as the failure cause for providers B and C, producing false-positive "all providers failed with the same error" output.

## Where

- `src/agents/pi-embedded-runner/run/attempt.ts` — `lastAssistant` is computed by scanning `messagesSnapshot` for the most recent assistant **regardless of provider**.
- Bundled (production) lines from a 2026-05-24 build:
  - `pi-embedded-Bcz04p2i.js:2865` (failover error construction):
    ```js
    const assistantForFailover = currentAttemptAssistant ?? sessionLastAssistant;
    …
    new FailoverError(resolveAssistantFailoverErrorMessage(params), {
      …,
      rawError: params.lastAssistant?.errorMessage?.trim(),
    });
    ```
  - `model-fallback-DIXhOaxb.js:379` (`recordFailedCandidateAttempt`) stores `error: described.rawError ?? described.message`, so the stale `rawError` wins over the candidate-attributed `message`.

(File names with hashes are from the published build artifact; map back to the corresponding source modules.)

## Reproduction / observed cascade

1. OpenAI candidate hits a real 429 → session file now contains an assistant row with `errorMessage = "You exceeded your current quota…"` (and OpenAI as provider).
2. `runWithModelFallback` advances to Anthropic and spawns a fresh `runEmbeddedPiAgent` against the **same** `sessionFile`/`sessionId`.
3. The Anthropic request queues / hangs / aborts at the run-level timeout — no new assistant produced this attempt.
4. Failover-decision construction sees `currentAttemptAssistant === undefined` and falls back to `sessionLastAssistant` — which is still the OpenAI errored row.
5. The resulting `FailoverError` carries the OpenAI quota text as `rawError`, attributed (by the outer model-fallback) to Anthropic.
6. Same again for Google.
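
The cascade above can be sketched in a few lines. This is a simplified model, not the runner's actual code: `AssistantRow`, `pickRawError`, and the `sharedSession` array are hypothetical names introduced for illustration, and only the two expressions `currentAttemptAssistant ?? sessionLastAssistant` and `rawError ?? message` are taken from the bundled lines quoted in "Where".

```typescript
// Hypothetical, simplified row shape; the real session types may differ.
interface AssistantRow {
  provider: string;
  errorMessage?: string;
}

// Mirrors the `currentAttemptAssistant ?? sessionLastAssistant` selection.
function pickRawError(
  currentAttemptAssistant: AssistantRow | undefined,
  sessionLastAssistant: AssistantRow | undefined,
): string | undefined {
  const assistantForFailover = currentAttemptAssistant ?? sessionLastAssistant;
  return assistantForFailover?.errorMessage?.trim();
}

// Step 1: the OpenAI attempt leaves an errored row in the shared session.
const sharedSession: AssistantRow[] = [
  { provider: "openai", errorMessage: "You exceeded your current quota…" },
];

// Steps 2-4: the Anthropic attempt times out and produces no assistant row,
// so the selection falls through to the last row in the shared session.
const staleRawError = pickRawError(
  undefined,
  sharedSession[sharedSession.length - 1],
);

// Step 5: the outer layer prefers rawError over the candidate-attributed message.
const described = { rawError: staleRawError, message: "LLM request timed out." };
const recorded = {
  provider: "anthropic",
  error: described.rawError ?? described.message,
};

// The recorded Anthropic failure now carries the OpenAI quota text.
console.log(recorded);
```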

## Smoking gun in our logs

Every `[agent/embedded] embedded run failover decision` line for two distinct runs (`3c5d7ca0-83df-418b-be48-a9327459046a` and `b9ad1b27-…`) logs `from=openai/gpt-5.5`, **including** the decisions that the outer model-fallback layer interprets as Anthropic and Google candidate failures. There are zero `from=anthropic/…` or `from=google/…` decision logs: the inner `pi-embedded-runner` never saw an assistant error attributed to any provider other than OpenAI.

## Evidence table

For one failed run (`runId=3c5d7ca0-83df-418b-be48-a9327459046a`, 2026-05-24 06:38–06:43 PT):

| Candidate | OUTER `reason` | OUTER `detail` (logged error string) | Real upstream call status |
|---|---|---|---|
| openai/gpt-5.5 | `rate_limit` | OpenAI quota text | **Real 429** — `/v1/responses` returned a genuine quota error from OpenAI (proxy logs confirm) |
| openai/gpt-5.5 (retry, different profile) | `rate_limit` | OpenAI quota text | **Real 429** — same |
| anthropic/claude-opus-4-7 | `timeout` | OpenAI quota text | **No Anthropic 429 observed.** Run ran ~123s and ended on run-level timeout; upstream proxy shows queued/rate-limited entries but no terminal quota-exhausted response for opus-4-7 |
| google/gemini-3-pro-preview | `timeout` | OpenAI quota text | Same shape — `reason=timeout`, but `detail` is again the OpenAI quota text verbatim |

Note specifically: the Anthropic and Google rows both have `reason=timeout`, not `rate_limit`. Real quota errors produce immediate 429s, not run-level timeouts. And the logged `detail` is verbatim **OpenAI's** quota string; Anthropic and Google use entirely different wording for billing/quota errors.

## Impact

- Misleads operators into believing all three providers are concurrently exhausted, when only one actually is.
- Drives unnecessary top-up / billing action on providers that aren't out of quota.
- Makes accurate triage of multi-provider gateways effectively impossible because the surfaced `detail` is unreliable for any candidate after the first failing provider.
- Hard to spot from the operator side because the `detail` looks like a fully-formed quota error.

In our own incident this caused a false-positive triple-provider quota outage that was only caught by RCA inspection of inner failover-decision logs.

## Suggested fix

Two complementary guards:

1. **In pi-embedded-runner failover construction** (around the `assistantForFailover = currentAttemptAssistant ?? sessionLastAssistant` site):
   - If `currentAttemptAssistant` is undefined AND `sessionLastAssistant?.provider !== <this candidate's provider>`, do **not** propagate `sessionLastAssistant.errorMessage` as the `FailoverError.rawError`. Fall through to the candidate-attributed default (e.g. `"LLM request timed out."` / `"no response from provider"`).
2. **In `recordFailedCandidateAttempt`** (the `error: described.rawError ?? described.message` site in `model-fallback-…`):
   - Additionally guard: if `described.provider` (from `rawError` attribution) differs from `params.candidate.provider`, prefer `described.message` over `described.rawError`.

Either guard alone would have prevented the false-positive in our case; both together are belt-and-suspenders.
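
A minimal sketch of both guards, under stated assumptions: `AssistantRow`, `DescribedError`, `resolveRawError`, and `errorForCandidate` are hypothetical names, the real types and call sites will differ, and treating a missing `described.provider` as "keep the existing behavior" is our choice rather than something the current code specifies.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface AssistantRow {
  provider: string;
  errorMessage?: string;
}

interface DescribedError {
  provider?: string; // provider the rawError is attributed to, if known
  rawError?: string;
  message: string;
}

// Guard 1: only reuse the session's last assistant row when it belongs
// to the candidate that is currently failing.
function resolveRawError(
  candidateProvider: string,
  currentAttemptAssistant: AssistantRow | undefined,
  sessionLastAssistant: AssistantRow | undefined,
): string | undefined {
  if (currentAttemptAssistant) {
    return currentAttemptAssistant.errorMessage?.trim();
  }
  if (sessionLastAssistant?.provider === candidateProvider) {
    return sessionLastAssistant.errorMessage?.trim();
  }
  // Stale row from another provider: let the caller use its default message.
  return undefined;
}

// Guard 2: prefer the candidate-attributed message when rawError is
// attributed to a different provider.
function errorForCandidate(
  candidateProvider: string,
  described: DescribedError,
): string {
  // Assumption: an unknown provider keeps today's `rawError ?? message` order.
  const attributedToCandidate =
    described.provider === undefined || described.provider === candidateProvider;
  return attributedToCandidate
    ? (described.rawError ?? described.message)
    : described.message;
}

const openAiRow: AssistantRow = {
  provider: "openai",
  errorMessage: "You exceeded your current quota…",
};

// Anthropic timed out with no assistant row of its own: nothing is inherited.
console.log(resolveRawError("anthropic", undefined, openAiRow)); // undefined
```

With guard 1 returning `undefined`, the existing `rawError ?? message` expression already falls through to the candidate-attributed message, which is why either guard alone is sufficient for the case we hit.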

## Workaround

Operators can read `from=` in `[agent/embedded] embedded run failover decision` log lines instead of trusting the surfaced `detail` / outer error string. The workaround works but is fragile — it requires log access and inner-line correlation, and any wrapper that surfaces the outer error to humans (Slack alert, dashboard) will still show the stale text.
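
For operators who want to script that check, here is a rough sketch. It assumes only what the log excerpts above show: decision lines contain the marker `[agent/embedded] embedded run failover decision` and a `from=<provider>/<model>` token. The function names and the sample line layout (including the `runId=` field position) are hypothetical.

```typescript
const DECISION_MARKER = "[agent/embedded] embedded run failover decision";

// Returns the `from=` value of an inner failover-decision line,
// or undefined if the line is not a decision line or has no `from=` token.
function failoverDecisionSource(line: string): string | undefined {
  if (line.indexOf(DECISION_MARKER) === -1) return undefined;
  const match = /\bfrom=(\S+)/.exec(line);
  return match ? match[1] : undefined;
}

// True when the inner decision was made for a different provider than the
// candidate the outer layer is blaming, i.e. the surfaced detail is suspect.
function isMisattributed(line: string, outerCandidate: string): boolean {
  const source = failoverDecisionSource(line);
  if (source === undefined) return false;
  return source.split("/")[0] !== outerCandidate.split("/")[0];
}

// Hypothetical line layout; only the marker and `from=` token come from our logs.
const sampleLine =
  DECISION_MARKER +
  " runId=3c5d7ca0-83df-418b-be48-a9327459046a from=openai/gpt-5.5";

console.log(isMisattributed(sampleLine, "anthropic/claude-opus-4-7")); // true
console.log(isMisattributed(sampleLine, "openai/gpt-5.5")); // false
```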

## Severity

Medium. Functional fallback still works (requests do route to the next provider), and the workaround exists. But the misleading attribution actively damages triage of multi-provider failures, which is exactly when accurate signals matter most.

## Filed by

Kentro.io engineering. Internal RCA tracked at KEN-4598 / KEN-4603. Happy to attach more log excerpts or test against a patch if useful.

