fix(runtime-fallback): harden delegated fallback retry flow by andomeder · Pull Request #4223 · code-yeongyu/oh-my-openagent

andomeder · 2026-05-20T23:55:53Z

Summary

Fixes a reproduced runtime-fallback failure chain in delegated child sessions:

immediate quota/retry statuses could advance fallback incorrectly
object-shaped/top-level message and event payloads were not normalized consistently
sync delegated retries could be dropped behind prompt-gate reservations
empty assistant completion rows could suppress fallback without producing a usable result
delegated child retries could lose their bootstrap prompt before a real user turn was persisted
fallback replays could create a second visible opening prompt

Changes

trigger runtime fallback directly on immediate provider retry/quota signals instead of treating them as successful progress
normalize object-shaped and top-level runtime-fallback/session-message payloads before fallback state, visibility checks, and sync completion checks
queue delegated sync fallback retries behind the active prompt reservation instead of dropping them as reserved
ignore empty assistant completion markers unless they carry meaningful payload
ignore synthetic/internal retry turns when rebuilding fallback retry payloads
preserve delegated child bootstrap retry parts until a real persisted user turn exists
redispatch fallback retry prompts as internal-only prompts
mark internal retry text parts as synthetic: true so they stay hidden in history instead of appearing as a second user prompt
cancel stale queued runtime-fallback prompts after successful retry progression
add regression coverage for the reproduced hang and duplicate-prompt paths

Testing

bun test src/hooks/runtime-fallback/auto-retry.test.ts src/hooks/runtime-fallback/index.test.ts src/hooks/runtime-fallback/message-update-handler.test.ts src/tools/delegate-task/sync-session-poller.test.ts
bun test src/shared/internal-initiator-marker.test.ts src/hooks/runtime-fallback/auto-retry.test.ts src/hooks/runtime-fallback/index.test.ts src/hooks/runtime-fallback/message-update-handler.test.ts src/tools/delegate-task/sync-session-poller.test.ts src/tools/delegate-task/sync-prompt-sender.test.ts src/shared/model-suggestion-retry.test.ts src/plugin/chat-message.test.ts
bun run build

Related Issues

Summary by cubic

Strengthens delegated runtime fallback and retries to prevent hangs, duplicate prompts, and dropped retries. Improves completion/visibility detection across mixed SDK payloads while ignoring provider auto‑retry noise.

Bug Fixes
- Treat immediate provider retry/quota statuses as fallback triggers (no false progress); normalize object-shaped models into provider/model strings.
- Normalize message responses (raw arrays, data, or messages) before checks; ignore provider auto‑retry text in visibility checks; count reasoning-only and token/cost-based completions as visible; ignore empty finish markers.
- Queue sync delegated retries behind an active prompt reservation; enqueue runtime fallback prompts and cancel stale queued runtime-fallback: prompts when not accepted.
- Dispatch fallback retry prompts as internal-only text parts (synthetic: true) to keep them hidden and avoid a second visible user prompt.
- Build retry payloads from the last real user turn (ignore synthetic/internal retries); keep delegated child bootstrap prompt until a real user turn is persisted.
- Sync poller: detect completion via top-level time.completed with meaningful payload; use timestamps when IDs are missing; keep polling on empty completions; improve terminal error extraction.
- Error classification: recognize “Free usage exceeded, subscribe to Go” as quota-exceeded.

^{Written for commit b2ef800. Summary will update on new commits. Review in cubic}

cubic-dev-ai

1 issue found across 19 files

Confidence score: 3/5

There is a concrete regression risk in src/hooks/runtime-fallback/visible-assistant-response.ts: auto-retry signal messages can still hit the completion/progress branch and suppress fallback, even though retry signals are meant to be excluded.
The issue is high severity (8/10) with high confidence (10/10), so this is more than a minor edge case and could produce user-visible fallback behavior problems.
Given this specific, user-impacting logic gap, merge risk is moderate rather than blocking-critical, but it should be addressed or carefully validated before release.
Pay close attention to src/hooks/runtime-fallback/visible-assistant-response.ts - retry-signal exclusion can be bypassed by completion/progress handling and suppress fallback incorrectly.

_{Reply with feedback, questions, or to request a fix.

Re-trigger cubic}

cubic-dev-ai

0 issues found across 2 files (changes from recent commits).

_{Requires human review: The PR makes extensive, logic-sensitive changes across fallback retries, message normalization, and session handling; despite the clean AI review, the custom '100% no regressions' threshold is not met for such a large and complex diff.

Re-trigger cubic}

code-yeongyu · 2026-05-21T06:37:48Z

[sisyphus-bot] Hi andomeder. 🙏 Thanks for the very thorough work here. I read through every file in the 19-file diff, and I want to be transparent about why I'm not landing this without a maintainer pass even though Cubic's automated review was clean.

What I checked and liked

src/shared/prompt-async-gate.ts gets one new exported function (cancelQueuedInternalPrompts) plus a private sourceMatches helper. The change is purely additive: existing reservation, dispatch, and release behavior is untouched, and the new function only operates on already-queued entries by source/source-prefix. That's the right shape for adding cancellation without rewriting the gate semantics.
src/shared/internal-initiator-marker.ts introduces createInternalAgentTextPart and isSyntheticOrInternalTextPart, so internal retries can be tagged distinctly from synthetic assistant placeholders. That distinction lets visible-assistant-response.ts and session-status-handler.ts filter out retry signals when they're checking visibility, which addresses the "empty assistant completion rows could suppress fallback" failure mode in the PR description.
The new error-classifier.ts and auto-retry-signal.test.ts are clean factor-outs. Splitting "is this an immediate-quota error?" / "is this a retryable runtime error?" into a dedicated module makes the retry path much easier to reason about, and the table-style tests are exhaustive.
Test coverage is genuinely strong: auto-retry.test.ts, message-update-handler.test.ts (+116), session-status-handler.test.ts (+74), sync-session-poller.test.ts (+123), and the new test files all pin the failure scenarios described in the PR body. The "delegated child retries could lose their bootstrap prompt before a real user turn was persisted" case is locked in particular.
fallback-bootstrap-model.ts and last-user-retry-parts.ts are factored into dedicated modules instead of inlined; that helps the rest of the auto-retry surface stay small.

Why I want a maintainer eye before merge

src/shared/prompt-async-gate.ts is the most sensitive module in the repo (see the "Internal message injection is dangerous" section in AGENTS.md). Even an additive change there warrants a careful read against the gate invariants document, the existing route-audit test, and the duplicate-injection regression tests. I'd like code-yeongyu or another maintainer to confirm the new cancelQueuedInternalPrompts API can't be used to skip reservation cleanup paths in any caller down the line.
The auto-retry.ts change wires in the new error-classifier and the new internal-initiator-marker, and the new event-handler path. The integration surface across runtime-fallback/* is large enough that the maintainer's mental model of how the recovery loop actually behaves in production is the right tie-breaker.
Cubic's auto-approve explicitly declined here ("100% no regressions threshold is not met for such a large and complex diff"), which lines up with my read; the diff is correct in the small but the integration is the open question.

Concrete observations (mostly non-blocking, one worth confirming)

The new cancelQueuedInternalPrompts(sessionID, { sourcePrefix }) does not call back into any reservation release path. Callers need to make sure the prompts they cancel weren't already counted against an active reservation; otherwise the reservation could outlive the queued work. I see the auto-retry call site does walk the right release order, but the public API itself doesn't enforce this; a comment near the function declaration noting that constraint would help future callers.
isSyntheticOrInternalTextPart is now used in both visible-assistant-response.ts and session-status-handler.ts. Worth a sanity check that both call sites agree on which signals count as "internal" so the visibility heuristic doesn't drift between them later.
The PR is on top of a base that's now CONFLICTING against dev. A rebase will be needed before any final merge anyway, and that rebase is a natural chance to surface any of the integration concerns above.

Going to leave this for a maintainer to take the rest of the way. Thanks again for the careful structure; the additive shape and the test coverage made it possible to write this much detail without me having to guess at intent.

…n internal markers Design: the same OMO_INTERNAL_INITIATOR_MARKER token serves two semantically distinct contexts that MUST stay separate. VISIBLE markers (createInternalAgentTextPart, no synthetic flag): PR code-yeongyu#4003 agent-handoff identity markers — the agent MUST read its new identity from the chat stream. If these were synthetic, the LLM would be blind to its own role change and fail to switch identities. HIDDEN markers (createInternalAgentContinuationTextPart, synthetic: true): PR code-yeongyu#4223 internal retry/compaction prompts — invisible to the LLM as user input so they do not pollute the agent's perceived conversation. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

fix(runtime-fallback): harden delegated fallback retry flow

ea66572

cubic-dev-ai Bot reviewed May 21, 2026

View reviewed changes

Comment thread src/hooks/runtime-fallback/visible-assistant-response.ts Outdated

fix(runtime-fallback): ignore retry signals in visibility checks

b2ef800

cubic-dev-ai Bot reviewed May 21, 2026

View reviewed changes

code-yeongyu force-pushed the dev branch 2 times, most recently from eb25d29 to 2bfad49 Compare May 23, 2026 16:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(runtime-fallback): harden delegated fallback retry flow#4223

fix(runtime-fallback): harden delegated fallback retry flow#4223
andomeder wants to merge 2 commits into
code-yeongyu:devfrom
andomeder:wip/pr2-harden-delegated-runtime-fallback

andomeder commented May 20, 2026 •

edited by cubic-dev-ai Bot

Loading

Uh oh!

cubic-dev-ai Bot left a comment •

edited

Loading

Uh oh!

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

code-yeongyu commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

andomeder commented May 20, 2026 • edited by cubic-dev-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Testing

Related Issues

Summary by cubic

Uh oh!

cubic-dev-ai Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

code-yeongyu commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

andomeder commented May 20, 2026 •

edited by cubic-dev-ai Bot

Loading

cubic-dev-ai Bot left a comment •

edited

Loading