Harden main webchat reliability #75776

Closed
ZRAIVenture wants to merge 5 commits into openclaw:main from ZRAIVenture:codex/openclaw-ui-reliability-upstream

Conversation

ZRAIVenture commented May 1, 2026

Summary

  • Problem: main WebChat could appear stuck or lose visible continuity because accepted browser sends were not surfaced early enough, brief Control UI detach/reattach could tear down websocket state, and high-context sessions could wait too long before native compaction.
  • Why it matters: users could send a prompt, see no acknowledgement/progress, refresh to recover, and risk losing confidence that the backend accepted the turn.
  • What changed: added early WebChat preflight/receipt handling, pending user-turn history continuity, delayed Control UI disconnect cleanup, safer session reset transcript allocation, earlier preflight compaction gates, and focused tests.
  • What did NOT change (scope boundary): default WebChat tool capability policy is not restricted; explicit/custom reset transcript paths are preserved; user-authored System: text is no longer globally stripped.
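
For readers who want a picture of the "delayed Control UI disconnect cleanup" item above, it can be thought of as a grace-period timer that a reattach cancels. This is a minimal sketch under stated assumptions, not the code in this PR: `createDisconnectGrace`, `graceMs`, and the injectable `Schedule` type are hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch: defer teardown after a detach so a quick reattach keeps state.
type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;

// Default scheduler backed by setTimeout; tests can inject a fake one.
const timerSchedule: Schedule = (fn, ms) => {
  const handle = setTimeout(fn, ms);
  return () => clearTimeout(handle);
};

function createDisconnectGrace(
  graceMs: number,
  cleanup: (clientId: string) => void,
  schedule: Schedule = timerSchedule,
) {
  // One pending cleanup per detached client.
  const pending = new Map<string, Cancel>();

  return {
    // Socket detached: schedule cleanup instead of running it immediately.
    onDetach(clientId: string): void {
      const previous = pending.get(clientId);
      if (previous) previous();
      const cancel = schedule(() => {
        pending.delete(clientId);
        cleanup(clientId);
      }, graceMs);
      pending.set(clientId, cancel);
    },
    // Same client reattached: cancel the pending cleanup.
    // Returns true if a cleanup was pending.
    onReattach(clientId: string): boolean {
      const cancel = pending.get(clientId);
      if (!cancel) return false;
      cancel();
      pending.delete(clientId);
      return true;
    },
  };
}
```

Injecting the scheduler keeps the grace window testable without real timers; a brief detach/reattach inside the window leaves the per-client state untouched.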

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: WebChat accepted browser sends and streamed assistant progress were coupled too tightly to later transcript/history and socket lifecycle updates; reset could reuse stale default transcript file metadata for a new session id; and preflight compaction could miss fresh persisted token totals.
  • Missing detection / guardrail: coverage did not lock in pending user-turn visibility, stale-default reset transcript replacement, user-authored System: preservation, or the exact preflight compaction threshold behavior.
  • Contributing context (if known): local reproduction showed replies/pending turns sometimes became visible only after refresh, and short visible chats could still carry high cached context from app-server trajectory/tool history.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/gateway/server.sessions.reset-models.test.ts, src/auto-reply/reply/strip-inbound-meta.test.ts, src/gateway/server-methods/chat.inject.parentid.test.ts, src/gateway/server.chat.gateway-server-chat.test.ts, ui/src/ui/controllers/chat.test.ts, ui/src/ui/views/chat.test.ts, ui/src/ui/chat/message-normalizer.test.ts, plus compaction/memory-flush tests already in the branch.
  • Scenario the test should lock in: accepted WebChat turns remain visible through history refresh, reset avoids stale default transcript reuse while preserving explicit custom transcript paths, user-authored System: lines remain intact, preflight acknowledgement text stays ephemeral, and preflight compaction gates trigger from fresh token totals.
  • Why this is the smallest reliable guardrail: these tests cover the server history/reset/sanitizer contracts and the UI message normalization/rendering paths without requiring a browser E2E harness.
  • Existing test that already covers this (if any): existing chat history/UI tests cover parts of message normalization and rendering; this PR adds/updates focused reset, sanitizer, preflight, pending-history, and message-normalizer coverage for the review findings.
  • If no new test is added, why not: N/A.

User-visible / Behavior Changes

Control UI WebChat should show a received/working state sooner, keep the pending user prompt visible during refresh/reconnect windows, and avoid unnecessary context saturation before compaction. Default WebChat tools/capabilities are unchanged by this PR.

Diagram (if applicable)

Before:
[user sends prompt] -> [chat.send accepted] -> [history/socket lag] -> [user may refresh to see progress]

After:
[user sends prompt] -> [preflight receipt + pending user turn] -> [history catches up] -> [final assistant reply]
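
The "After" flow can be sketched as a small UI-side reducer. The review notes in this thread state that the branch adds a preflight state next to the existing delta/final/aborted/error states; everything else here (`ChatViewState`, `applyChatEvent`, the field names, and the assumption that a delta carries the full text so far) is hypothetical and only illustrates the idea that the receipt is ephemeral while the final reply is durable.

```typescript
// Hypothetical event and view shapes; only the state names come from the review notes.
type ChatEvent =
  | { state: "preflight"; runId: string; text: string }
  | { state: "delta"; runId: string; text: string }
  | { state: "final"; runId: string; text: string }
  | { state: "aborted" | "error"; runId: string };

interface ChatMessage {
  role: "user" | "assistant";
  text: string;
}

interface ChatViewState {
  messages: ChatMessage[]; // durable, transcript-backed messages
  receipt: string | null;  // ephemeral preflight receipt, never stored as a message
  stream: string | null;   // in-progress assistant text
}

function applyChatEvent(state: ChatViewState, event: ChatEvent): ChatViewState {
  switch (event.state) {
    case "preflight":
      // Show the receipt without touching the durable message list.
      return { ...state, receipt: event.text };
    case "delta":
      // Real progress replaces the receipt (assumes delta text is cumulative).
      return { ...state, receipt: null, stream: event.text };
    case "final":
      // Only the final reply becomes a durable assistant message.
      return {
        messages: [...state.messages, { role: "assistant", text: event.text }],
        receipt: null,
        stream: null,
      };
    case "aborted":
    case "error":
      // Drop ephemeral state; the pending user turn stays in messages.
      return { ...state, receipt: null, stream: null };
  }
}
```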

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A.

Repro + Verification

Environment

  • OS: macOS local development checkout
  • Runtime/container: local OpenClaw source checkout
  • Model/provider: N/A for automated tests
  • Integration/channel (if any): Control UI WebChat / gateway session APIs
  • Relevant config (redacted): main WebChat session agent:main:main

Steps

  1. Send a main WebChat prompt while the session is slow/high-context or while the page refreshes/reconnects.
  2. Observe whether the user turn and assistant working/receipt state are visible before final transcript history catches up.
  3. Reset a session that has a stale default transcript path and a session that has an explicit custom transcript path.
  4. Display/history-sanitize a user message beginning with System:.
  5. Exercise ordinary prompts that contain the word "exactly" to make sure they still receive a preflight acknowledgement, while explicit output-only/silent prompts do not.
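
Step 5 can be sketched as a predicate that suppresses the acknowledgement only for explicit silence or output-only wording. The patterns below are hypothetical placeholders; the actual phrases checked in src/gateway/server-methods/chat.ts are not shown in this PR description, and `shouldSendPreflightAck` is an illustrative name.

```typescript
// Hypothetical patterns: stand-ins for "explicit silence/output-only wording".
// None of them match an ordinary prompt just because it contains "exactly".
const SILENCE_PATTERNS: RegExp[] = [
  /\breply with only\b/i,
  /\boutput only\b/i,
  /\bno acknowledge?ment\b/i,
  /\bstay silent\b/i,
];

// Returns true when the preflight acknowledgement should still be sent.
function shouldSendPreflightAck(prompt: string): boolean {
  return !SILENCE_PATTERNS.some((pattern) => pattern.test(prompt));
}
```

With these placeholder patterns, "What exactly does this flag do?" still gets an acknowledgement, while "Output only the JSON, nothing else." does not.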

Expected

  • User turn remains visible after accepted send/refresh.
  • Assistant preflight/progress state can render before final reply without becoming a durable assistant message.
  • Reset allocates a new default transcript for a new session id, but preserves explicit custom transcript files.
  • User-authored leading System: text is preserved.
  • The Control UI transport label is hidden only for local user messages, not assistant/tool messages.
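
The last expectation can be illustrated with a role-guarded label helper. The PR notes only say that `openclaw-control-ui` is hidden when `role === "user"`; the `RawMessage` shape, the `sender` field, and `visibleSenderLabel` are assumptions made for this sketch.

```typescript
type Role = "user" | "assistant" | "tool";

// Hypothetical message shape; the real normalizer input is not shown in this PR body.
interface RawMessage {
  role: Role;
  sender?: string;
  text: string;
}

const CONTROL_UI_SENDER = "openclaw-control-ui";

// Hide the transport label only on local user messages; keep it everywhere else.
function visibleSenderLabel(message: RawMessage): string | undefined {
  if (message.role === "user" && message.sender === CONTROL_UI_SENDER) {
    return undefined;
  }
  return message.sender;
}
```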

Actual

  • Before this PR, visibility could lag until refresh, reset could reuse a stale default transcript path, and the original branch globally stripped leading System: text. After the latest patch, focused tests cover the intended behavior and the latest Copilot review comments are addressed in the current head.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Recorded verification for the current head 4cb8cce017b0ce976c691066b524f417dd2f633a:

  • corepack pnpm test src/agents/prompt-overlay-runtime-contract.test.ts src/auto-reply/reply/get-reply-run.media-only.test.ts src/auto-reply/reply.raw-body.test.ts src/gateway/server.sessions.reset-models.test.ts passed: 3 Vitest shards, 74 tests.
  • corepack pnpm test src/gateway/server.chat.gateway-server-chat.test.ts src/gateway/server.chat.gateway-server-chat-b.test.ts ui/src/ui/controllers/chat.test.ts ui/src/ui/views/chat.test.ts ui/src/ui/chat/message-normalizer.test.ts passed: 2 Vitest shards, 169 tests.
  • corepack pnpm exec oxfmt --check --threads=1 src/agents/gpt5-prompt-overlay.ts src/agents/prompt-overlay-runtime-contract.test.ts src/auto-reply/reply/get-reply-run.media-only.test.ts src/gateway/server-methods/chat.ts ui/src/ui/controllers/chat.ts ui/src/ui/chat/run-lifecycle.ts ui/src/ui/chat/message-normalizer.ts ui/src/ui/views/chat.test.ts ui/src/ui/chat/message-normalizer.test.ts passed.
  • corepack pnpm exec tsc --noEmit --pretty false -p tsconfig.core.json passed.
  • git diff --check passed.
  • Live GitHub PR checks for the current head are successful; the only skipped check is the expected backfill-pr-labels.

Note: a prior non-elevated gateway test run hit local sandbox listen EPERM 127.0.0.1, then passed when rerun with local loopback binding allowed.

Human Verification (required)

  • Verified scenarios: source diff reviewed against BunsDev, ClawSweeper, Barnacle, and Copilot findings; focused prompt-overlay, server, UI, and session tests passed; PR branch head is 4cb8cce017b0ce976c691066b524f417dd2f633a.
  • Edge cases checked: custom reset transcript path preservation, stale default reset transcript replacement, preservation of user-authored System: lines, no hard-coded WebChat toolsAllow override remains, Control UI transport sender label is hidden only for user messages, ordinary prompts containing the word "exactly" still receive acknowledgement, upstream heartbeat-ack filtering remains combined with this PR's ephemeral preflight-stream guard, and the global GPT-5 stable prefix no longer asks the model to emit a second durable acknowledgement.
  • What you did not verify: full browser E2E on GitHub CI, maintainer-specific release packaging, or a production OpenClaw update containing this PR.

Real behavior proof

  • Behavior or issue addressed: Main Control UI WebChat reliability for accepted browser sends: pending user turns should remain visible through history refresh/reconnect, preflight/working state should not become a persisted assistant final, generic runtime system events should still reach the reply run, and the PR branch should remain mergeable after upstream main churn.
  • Real environment tested: macOS local OpenClaw source checkout at PR head 4cb8cce017b0ce976c691066b524f417dd2f633a, branch codex/openclaw-ui-reliability-upstream. GitHub CLI/API was run against the live PR openclaw/openclaw#75776; focused Control UI/Gateway behavior was run from the local source checkout.
  • Exact steps or command run after this patch: rebased the branch onto current main, resolved the system-event test conflict created by upstream's drainFormattedSystemEventBlock shape, removed the global GPT-5 stable-prefix progress acknowledgement so the WebChat receipt remains gateway/UI-owned and ephemeral, ran the focused verification listed above, pushed the branch, then checked live GitHub mergeability/check state for the current head.
  • Evidence after fix: Terminal capture from the live GitHub PR and current head:
$ gh pr view 75776 --repo openclaw/openclaw --json headRefOid,baseRefOid,mergeable,mergeStateStatus,reviewDecision
{"baseRefOid":"9c389487002ca7d8558bd7ae98e310a44dfee0fa","headRefOid":"4cb8cce017b0ce976c691066b524f417dd2f633a","mergeStateStatus":"UNSTABLE","mergeable":"MERGEABLE","reviewDecision":"CHANGES_REQUESTED"}

$ gh api repos/openclaw/openclaw/pulls/75776 --jq '{mergeable,mergeable_state,rebaseable,head_sha:.head.sha,base_sha:.base.sha}'
{"base_sha":"9c389487002ca7d8558bd7ae98e310a44dfee0fa","head_sha":"4cb8cce017b0ce976c691066b524f417dd2f633a","mergeable":true,"mergeable_state":"unstable","rebaseable":true}

Current checks:
Real behavior proof: pass
Socket Security: Project Report: pass
Socket Security: Pull Request Alerts: pass
auto-response: pass
dependency-change-awareness: pass
dispatch: pass
label: pass
label-issues: pass
backfill-pr-labels: skipped
  • Current-code review proof for the latest Copilot findings:
    • ui/src/ui/chat/message-normalizer.ts now hides openclaw-control-ui only when role === "user".
    • src/gateway/server-methods/chat.ts no longer treats the word "exactly" alone as a blanket acknowledgement-suppression phrase; suppression is limited to explicit silence/output-only wording.
  • Current-code review proof for the latest ClawSweeper finding:
    • src/agents/gpt5-prompt-overlay.ts no longer contains the global <progress_acknowledgement> stable-prefix block.
    • src/agents/prompt-overlay-runtime-contract.test.ts now asserts that the GPT-5 stable prefix does not contain <progress_acknowledgement>.
  • Observed result after fix: The latest PR head is mergeable against current main, checks are green/skipped as expected, the active code addresses the two Copilot findings plus the ClawSweeper P2 durable-acknowledgement finding, and focused source-run verification covers the server/UI/prompt behavior this PR changes.
  • What was not tested: Full browser E2E against a packaged release build, maintainer release packaging, and production OpenClaw update delivery were not tested in this PR checkout.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

ClawSweeper issue-comment findings were addressed in commit a4396316 and later follow-up commits; BunsDev's system-event review was addressed in the current branch; the two Copilot comments from May 9 are addressed at current head 4cb8cce017b0ce976c691066b524f417dd2f633a; the ClawSweeper P2 about a second durable GPT-5 acknowledgement is addressed by removing the global <progress_acknowledgement> stable-prefix block.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A.

Risks and Mitigations

  • Risk: WebChat acknowledgement/progress copy could be perceived as an assistant response if final streaming fails.
    • Mitigation: preflight receipt is ephemeral UI state and final transcript/history remains authoritative.
  • Risk: reset transcript handling could accidentally preserve stale default files or drop custom files.
    • Mitigation: reset now preserves only non-default/custom sessionFile paths and adds coverage for stale default replacement plus existing custom-file preservation.
  • Risk: earlier context compaction may run more often for high-context sessions.
    • Mitigation: threshold is bounded and only triggers from persisted/fresh token pressure; normal low-context sessions are unaffected.
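
The compaction mitigation above can be sketched as a bounded gate that only fires on fresh token totals. All numbers and names here are hypothetical (`TokenSnapshot`, `shouldCompactBeforeRun`, the 0.8 default ratio, and the 0.5 to 0.95 clamp); the PR does not state the actual threshold.

```typescript
// Hypothetical snapshot of persisted token usage for a session.
interface TokenSnapshot {
  totalTokens: number;
  fresh: boolean; // assumption: false when the total predates the latest turn
}

// Decide whether to compact before starting a run.
function shouldCompactBeforeRun(
  snapshot: TokenSnapshot | undefined,
  contextWindow: number,
  ratio: number = 0.8, // illustrative default, not the PR's value
): boolean {
  // Stale or missing totals never trigger compaction.
  if (!snapshot || !snapshot.fresh) return false;
  if (contextWindow <= 0) return false;
  // Keep the threshold bounded so a misconfigured ratio cannot compact constantly or never.
  const bounded = Math.min(Math.max(ratio, 0.5), 0.95);
  return snapshot.totalTokens >= contextWindow * bounded;
}
```

Under these assumed values, a low-context session stays far below the gate, which matches the claim that normal sessions are unaffected.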

@openclaw-barnacle openclaw-barnacle Bot added the labels app: web-ui (App: web-ui), gateway (Gateway runtime), agents (Agent runtime and tooling), size: L, triage: blank-template (Candidate: PR template appears mostly untouched.), and triage: refactor-only (Candidate: refactor/cleanup-only PR without maintainer context.) May 1, 2026
clawsweeper Bot commented May 1, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed May 29, 2026, 1:01 AM ET / 05:01 UTC.

Summary
The PR adds WebChat preflight receipt and pending user-turn history handling, delays transient Control UI disconnect cleanup, changes session reset/compaction/system-event handling, and adds focused Gateway/UI tests.

PR surface: Source +407, Tests +465, Docs +1. Total +873 across 40 files.

Reproducibility: no. This review did not reproduce a real browser send-refresh-reconnect path. The branch supplies focused tests and terminal output, but not the visible WebChat proof requested by the maintainer.

Review metrics: 2 noteworthy metrics.

  • Current-main conflict surface: 9 files conflict. The branch must be rebased and conflict-resolved before its session and WebChat behavior can be reviewed as the actual merge result.
  • Release-owned changelog edits: 1 added. CHANGELOG.md is release-owned in this repo and the active member review asks this normal PR to remove the entry.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🦐 gold shrimp
Result: blocked until stronger real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Rebase onto current main and resolve the 9 conflicted files.
  • [P1] Add visible/redacted WebChat proof for pending user turn plus ephemeral receipt through refresh/reconnect.
  • Remove the normal PR CHANGELOG.md edit.

Proof guidance:

  • [P1] Needs stronger real behavior proof before merge: The PR body includes terminal/check output and focused tests, but it does not show the changed WebChat UI behavior in a real send-refresh-reconnect run; add redacted screenshot, recording, live output, or logs and update the PR body to trigger re-review.

Mantis proof suggestion
A browser-visible WebChat send-refresh-reconnect proof would materially reduce the main merge blocker. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

visual task: verify Control UI WebChat send-refresh-reconnect keeps the pending user turn visible and shows only ephemeral receipt text.

Risk before merge

  • [P1] The PR head conflicts with current main in session, auto-reply, Gateway, and UI chat paths, so any review before a rebase is against stale code.
  • [P1] The visible WebChat behavior is still not proven in a real browser send-refresh-reconnect flow; the supplied proof is focused tests and terminal/API output.
  • [P1] The branch changes session-state and message-delivery behavior around pending turns, preflight events, reset transcripts, and compaction, so green unit tests alone do not settle upgrade/runtime behavior.

Maintainer options:

  1. Refresh And Prove WebChat Flow (recommended)
    Have the contributor rebase onto current main, remove the changelog entry, and add redacted browser proof showing the pending user turn and ephemeral receipt through refresh/reconnect before maintainer re-review.
  2. Accept Test-Only Confidence
    Maintainers could intentionally accept focused test and terminal proof, but that would leave the visible WebChat reliability path unverified.
  3. Pause Or Replace If Conflicts Continue
    If the broad branch keeps drifting, pause this PR and land a narrower replacement for the WebChat pending-turn/receipt behavior.

Next step before merge

  • [P1] Contributor action and maintainer re-review are required; conflicts and missing real WebChat proof cannot be safely handled as a ClawSweeper repair lane.

Security
Cleared: No concrete security or supply-chain regression was found; the diff does not add dependencies, workflow permissions, secret handling, or new external code execution paths.

Review findings

  • [P3] Remove the release-owned changelog edit — CHANGELOG.md:1893
Review details

Best possible solution:

Rebase onto current main, drop the release-owned changelog edit, add visible redacted WebChat send-refresh-reconnect proof, then re-review the refreshed branch for session and message-delivery behavior before merge.

Do we have a high-confidence way to reproduce the issue?

No; this review did not reproduce a real browser send-refresh-reconnect path. The branch supplies focused tests and terminal output, but not the visible WebChat proof requested by the maintainer.

Is this the best way to solve the issue?

Unclear; the branch touches the right Gateway/UI/session surfaces, but the conflicted head and missing visible proof prevent judging it as the best final fix. A rebased, narrower, proof-backed version is the safer path.

Full review comments:

  • [P3] Remove the release-owned changelog edit — CHANGELOG.md:1893
    CHANGELOG.md is release-owned for this repo, and the active member review asks this normal PR to drop the changelog entry. Keep the release-note context in the PR body or squash/commit message instead.
    Confidence: 0.93

Overall correctness: patch is correct
Overall confidence: 0.78

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 3c5f5efc8c41.

Label changes

Label justifications:

  • P1: The PR targets a broken Control UI/WebChat workflow that commenters describe as daily usability impact, but it is blocked on proof and refresh work.
  • merge-risk: 🚨 session-state: The diff changes pending user-turn state, session reset transcript allocation, lifecycle fallback status, and compaction gating for active sessions.
  • merge-risk: 🚨 message-delivery: The diff changes how WebChat sends are acknowledged, projected into history, and cleared after Gateway reply runs.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🦐 gold shrimp.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR body includes terminal/check output and focused tests, but it does not show the changed WebChat UI behavior in a real send-refresh-reconnect run; add redacted screenshot, recording, live output, or logs and update the PR body to trigger re-review.
Evidence reviewed

PR surface:

Source +407, Tests +465, Docs +1. Total +873 across 40 files.

Area       Files  Added  Removed   Net
Source        24    465       58  +407
Tests         15    485       20  +465
Docs           1      1        0    +1
Config         0      0        0     0
Generated      0      0        0     0
Other          0      0        0     0
Total         40    951       78  +873

What I checked:

  • Repository review policy applied: The full root AGENTS.md and relevant scoped Gateway/UI guides were read; root policy requires real behavior proof for user-visible changes and treats session/message-delivery surfaces as merge-risk areas. (AGENTS.md:24, 3c5f5efc8c41)
  • Current main does not contain the PR's preflight event shape: Current main's ChatEventPayload accepts only delta/final/aborted/error, while the PR branch adds a preflight state and ephemeral stream handling. (ui/src/ui/controllers/chat.ts:288, 3c5f5efc8c41)
  • Current main lacks server-side pending WebChat user-message state: Current main's ChatRunState has run buffers and aborted run maps but no pendingUserMessages map, so the PR's pending-history behavior is not already implemented on main. (src/gateway/server-chat-state.ts:73, 3c5f5efc8c41)
  • PR branch adds WebChat preflight and pending-turn behavior: The PR head adds a WebChat preflight acknowledgement payload and stores pending user messages keyed by client run id before dispatch. (src/gateway/server-methods/chat.ts:1849, dfd776ca598c)
  • Current main merge conflicts remain: A merge-tree check of the PR head against current main reports content conflicts in 9 files, including session-store, auto-reply memory/get-reply-run, gateway chat/server, and UI chat controller paths. (dfd776ca598c)
  • Live PR discussion still has maintainer blockers: The latest member comment asks for a rebase, removal of the normal PR changelog edit, visible/redacted WebChat proof, and fresh ClawSweeper/member re-review; the review state remains CHANGES_REQUESTED in the provided and live API context.
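
The pending-turn behavior checked above (pending user messages keyed by client run id) can be sketched as a projection over persisted history. The `pendingUserMessages` name comes from the review notes; `HistoryMessage`, `projectHistory`, and the idea that persisted turns carry a `runId` for de-duplication are assumptions made for illustration.

```typescript
interface HistoryMessage {
  role: "user" | "assistant";
  text: string;
  runId?: string; // assumption: persisted turns carry the client run id
}

// Append accepted-but-not-yet-persisted user turns so a refresh still shows them.
function projectHistory(
  persisted: HistoryMessage[],
  pendingUserMessages: Map<string, HistoryMessage>,
): HistoryMessage[] {
  const persistedRunIds = new Set<string>();
  for (const message of persisted) {
    if (message.runId !== undefined) persistedRunIds.add(message.runId);
  }
  const projected = persisted.slice();
  pendingUserMessages.forEach((message, runId) => {
    // Skip turns the transcript has already caught up on.
    if (!persistedRunIds.has(runId)) projected.push(message);
  });
  return projected;
}
```

Because the projection never mutates the persisted array, the transcript stays authoritative and the pending entry simply drops out once history catches up.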

Likely related people:

  • BunsDev: Member review is the active blocker, and the related WebChat/compaction fixes in the provided context were authored by this person. (role: reviewer and recent adjacent WebChat fixer; confidence: high; commits: 2810f1219a62, bd2f8560fee6; files: ui/src/ui/controllers/chat.ts, src/gateway/server-methods/chat.ts, src/gateway/session-reset-service.ts)
  • Val Alexander: Local git history shows the chat infrastructure module work and an operator.read Control UI fix touching the same UI/Gateway surface. (role: original Control UI chat infrastructure contributor; confidence: medium; commits: c5ea6134d041, 3e2b3bd2c572; files: ui/src/ui/controllers/chat.ts, src/gateway/server-methods/chat.ts)
  • scoootscooob: Local git history shows recent work guarding stale Control UI session history reloads in the chat controller, adjacent to this PR's pending-history behavior. (role: recent WebChat history contributor; confidence: medium; commits: 34c1f43df1b6; files: ui/src/ui/controllers/chat.ts)
  • Fermin Quant: Current checked-out blame for the central UI chat and Gateway files points to the latest grafted main snapshot authored by this person, so this is a weak but relevant routing signal. (role: recent area contributor by current-main blame; confidence: low; commits: d560588e1ed8; files: ui/src/ui/controllers/chat.ts, src/gateway/server-methods/chat.ts, src/auto-reply/reply/memory-flush.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@openclaw-barnacle openclaw-barnacle Bot added the triage: dirty-candidate Candidate: broad unrelated surfaces; may need splitting or cleanup. label May 1, 2026

Author

Updated the branch in a4396316 to address the ClawSweeper findings:

  • Removed the hard-coded main WebChat toolsAllow override, so default/configured/plugin tool capabilities are no longer restricted by this PR.
  • Reworked session reset transcript selection so stale default transcript paths are replaced for new reset session ids while explicit/custom transcript paths are preserved.
  • Removed the global leading System: / System (untrusted): stripping from stripInboundMetadata; user-authored system-looking text is preserved and covered by test.
  • Added the required CHANGELOG.md entry.
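
The reset transcript rule in the second bullet can be sketched roughly as follows. This is an illustration, not the branch's code: `defaultTranscriptPath`, the `.jsonl` naming, and `transcriptPathAfterReset` are hypothetical stand-ins for however the gateway derives default transcript files.

```typescript
// Hypothetical default naming scheme for a session transcript file.
function defaultTranscriptPath(sessionId: string): string {
  return `${sessionId}.jsonl`;
}

// Choose the transcript path for the session id created by a reset.
function transcriptPathAfterReset(
  previousSessionId: string,
  previousPath: string | undefined,
  nextSessionId: string,
): string {
  // A missing path or the old default is stale: allocate a fresh default.
  if (
    previousPath === undefined ||
    previousPath === defaultTranscriptPath(previousSessionId)
  ) {
    return defaultTranscriptPath(nextSessionId);
  }
  // Anything else was set explicitly, so preserve it.
  return previousPath;
}
```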

Verification:

  • ./node_modules/.bin/tsc --noEmit --pretty false -p tsconfig.core.json
  • ./node_modules/.bin/tsc --noEmit --pretty false -p tsconfig.test.ui.json
  • git diff --check
  • Elevated focused vitest run: 7 files passed, 141 tests passed

The first non-elevated gateway test run hit local sandbox listen EPERM 127.0.0.1, then passed when rerun with loopback binding allowed.

@openclaw-barnacle openclaw-barnacle Bot added the extensions: memory-core Extension: memory-core label May 2, 2026
@ZRAIVenture ZRAIVenture force-pushed the codex/openclaw-ui-reliability-upstream branch 2 times, most recently from ff11103 to 0b425f7 Compare May 2, 2026 15:26
@BunsDev BunsDev self-assigned this May 2, 2026
@ZRAIVenture ZRAIVenture force-pushed the codex/openclaw-ui-reliability-upstream branch 2 times, most recently from 3a4685c to 3811b24 Compare May 2, 2026 16:27
@caiyc16888


Dear @steipete, @vincentkoc,

We're experiencing a persistent WebChat issue where only the last message appears in the conversation view — history is not being rendered (backend data is intact, confirmed by checking session files).

This problem is directly addressed by the two PRs below, both of which remain unmerged:

Both capture the exact symptoms we're seeing: stale in-flight chat.history responses replacing local state, WebChat losing visible continuity, and brief socket detach/reattach tearing down state.

Could either of these please be reviewed and merged into the next release? This is affecting daily usability of the Control UI / WebChat. Happy to help test if a preview build is available.

Thanks 🙏

@BunsDev BunsDev left a comment

Member

Thanks for the broad WebChat reliability work here. I am going to keep this blocked for now.

The remaining correctness issue is the main-session system-event suppression path. runPreparedReply still drains queued system events, then passes suppressSystemEventsInUserPrompt whenever the session key is agent:main:main; buildReplyPromptBodies responds by replacing the entire combined event block with an empty string. That means background, cron, node, runtime, or other queued system events can be consumed and then dropped instead of reaching the model. This regresses the existing system-event contract covered by get-reply-run.media-only.test.ts.

Please narrow this so only the specific generated WebChat noise is filtered, or avoid draining events that will not be delivered. The fix should include focused coverage for a main WebChat/session run with a queued generic system event proving that the event is still present in the prompt/follow-up body.
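
To make the requested narrowing concrete, here is a rough sketch of filtering only specific noise instead of blanking the whole drained block. The names are hypothetical (`SystemEvent`, `isWebchatNoise`, the `"webchat-transport"` source tag, and the `System:` line prefix); the real shapes in runPreparedReply and buildReplyPromptBodies may differ.

```typescript
// Hypothetical shape for a queued system event.
interface SystemEvent {
  source: string;
  text: string;
}

// Assumption: generated WebChat noise can be recognized by its source tag.
function isWebchatNoise(event: SystemEvent): boolean {
  return event.source === "webchat-transport";
}

// Build the prompt block from drained events, dropping only the known noise.
function buildSystemEventBlock(drained: SystemEvent[]): string {
  return drained
    .filter((event) => !isWebchatNoise(event))
    .map((event) => `System: ${event.text}`)
    .join("\n");
}
```

Generic cron, node, or runtime events pass through unchanged, so a drained event is either delivered or was identified as noise, never silently consumed.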

Duplicate/related triage:

  • #74733 is related and overlaps the WebChat history/reconciliation surface, but it is not a clean duplicate of this PR: it is a draft XL UI/Gateway ordering branch with different implementation scope and its own open findings.
  • #76446 has already merged the narrower active WebChat duplicate-send fix for #75737.
  • #76437 has already merged the narrower compaction-boundary/history UX fix for #76415.
  • #72892 should stay open for now because the exact duplicate-sender origin still is not proven closed by the active-send fix.

So I am treating this as request-changes, not close-as-duplicate.

Author

Thanks @BunsDev. I updated the branch to address the remaining system-event blocker.

Changes:

  • Merged latest main and resolved the CHANGELOG.md conflict.
  • Removed the broad main-WebChat suppressSystemEventsInUserPrompt path, so drained generic system events are no longer consumed and dropped for agent:main:main.
  • Added focused regression coverage for a main WebChat run proving a queued generic system event still reaches runReplyAgent.commandBody while the transcript prompt remains the user text.

Verification:

  • corepack pnpm test src/auto-reply/reply/get-reply-run.media-only.test.ts src/auto-reply/reply.raw-body.test.ts src/gateway/server.sessions.reset-models.test.ts ui/src/ui/controllers/chat.test.ts ui/src/ui/views/chat.test.ts
    • passed: 4 Vitest shards, 140 tests
  • corepack pnpm exec tsc --noEmit --pretty false -p tsconfig.core.json
  • git diff --check

Note: the old direct UI typecheck command from the previous PR body no longer maps cleanly onto current main; test/tsconfig/tsconfig.test.ui.json currently fails before reaching this patch with repo-level rootDir errors for UI test files. Focused UI Vitest coverage passed.

@ZRAIVenture ZRAIVenture force-pushed the codex/openclaw-ui-reliability-upstream branch 3 times, most recently from e0b4b94 to 00fabcf Compare May 5, 2026 00:27
@ZRAIVenture ZRAIVenture force-pushed the codex/openclaw-ui-reliability-upstream branch from 00fabcf to 345a997 Compare May 5, 2026 13:21
@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 5, 2026
@ZRAIVenture ZRAIVenture force-pushed the codex/openclaw-ui-reliability-upstream branch from 345a997 to cf60f98 Compare May 5, 2026 14:44
@ZRAIVenture ZRAIVenture force-pushed the codex/openclaw-ui-reliability-upstream branch 3 times, most recently from 9b60e92 to 60355dc Compare May 13, 2026 17:20
@ZRAIVenture ZRAIVenture force-pushed the codex/openclaw-ui-reliability-upstream branch 3 times, most recently from 4cb8cce to 939ba32 Compare May 16, 2026 14:04
@ZRAIVenture

Author

Updated the branch again to clear the latest upstream conflicts.

Current live PR state:

  • Head: 939ba3277bfe7c44287bc6671b7a4f20af26fc0c
  • Base: 58083866d0d76d6a780682f50e2b4d54b57cb676
  • GitHub reports the PR as mergeable: MERGEABLE
  • Check status is green, with only the expected skipped backfill-pr-labels

Conflict repair details:

  • Rebased onto current main.
  • Resolved the session reset conflict by preserving both behaviors: generated topic transcript paths rotate to the new session id, and stale default transcript paths are not reused.
  • Focused verification passed locally:
    • node scripts/run-vitest.mjs run --config test/vitest/vitest.gateway.config.ts src/gateway/server.sessions.reset-models.test.ts
    • 1 test file passed, 8 tests passed
    • git diff --check origin/main...HEAD passed
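
The reset rule described above (rotate generated topic paths, never reuse a stale default path, leave explicit paths alone) can be sketched as a small pure function. Everything here is an assumption made for illustration: the `<sessionId>.jsonl` and `<sessionId>-topic-<name>.jsonl` naming schemes, the `ResetInput` shape, and the function name `resolveTranscriptPathOnReset` are hypothetical and may not match the repository.

```typescript
// Hypothetical input shape; the real session entry has more fields.
interface ResetInput {
  oldSessionId: string;
  newSessionId: string;
  transcriptPath?: string; // path recorded on the entry being reset
}

// Assumed naming schemes, for illustration only.
const defaultPath = (sessionId: string): string => `${sessionId}.jsonl`;
const topicPrefix = (sessionId: string): string => `${sessionId}-topic-`;

function resolveTranscriptPathOnReset(input: ResetInput): string {
  const { oldSessionId, newSessionId, transcriptPath } = input;

  // Missing or stale default path: allocate a fresh default for the new id.
  if (!transcriptPath || transcriptPath === defaultPath(oldSessionId)) {
    return defaultPath(newSessionId);
  }

  // Generated topic path: keep the topic suffix, rotate the session id.
  const oldPrefix = topicPrefix(oldSessionId);
  if (transcriptPath.startsWith(oldPrefix)) {
    return topicPrefix(newSessionId) + transcriptPath.slice(oldPrefix.length);
  }

  // Anything else is treated as an explicit/custom path and preserved.
  return transcriptPath;
}
```

Under these assumptions, `a.jsonl` becomes `b.jsonl`, `a-topic-42.jsonl` becomes `b-topic-42.jsonl`, and a custom path such as `/custom/log.jsonl` is returned unchanged.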

Current review state:

  • Review threads are resolved or outdated.
  • The remaining blocker is still the prior CHANGES_REQUESTED review state and requested review from BunsDev.

@BunsDev could you please re-review when you have a chance? The branch should now be conflict-free and current with main.

@BunsDev

BunsDev commented May 21, 2026

Member

@copilot resolve the merge conflicts in this pull request

@ZRAIVenture ZRAIVenture force-pushed the codex/openclaw-ui-reliability-upstream branch from 939ba32 to dfd776c Compare May 22, 2026 05:23
@ZRAIVenture

Author

Updated the branch to clear the latest merge conflicts and CI failures.

Current live PR state after push:

  • Head: dfd776ca598c1d624968922afc660eb5a5414fd4
  • Base: e32e0f3f7f3e16dc2daefdbf9ddd1c444833ca29
  • Raw GitHub API reports mergeable: true, mergeable_state: unstable, rebaseable: true
  • The current check rollup is green for the checks GitHub has attached to this head; the only skipped jobs are the expected Mantis and label-backfill jobs.
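
As a reading aid for the raw API fields quoted above, here is a small sketch of how `mergeable`, `mergeable_state`, and `rebaseable` from the GitHub REST pull request payload might be summarized. The field names are the real REST fields. The interpretation of each `mergeable_state` value is my understanding based on GitHub's GraphQL `MergeStateStatus` descriptions and should be treated as an assumption, and `summarizeMergeState` is a hypothetical helper, not part of any GitHub client.

```typescript
// Subset of the REST pull request payload relevant to mergeability.
interface PullMergeFields {
  mergeable: boolean | null; // null while GitHub is still computing it
  mergeable_state: string; // e.g. "clean", "unstable", "blocked", "dirty"
  rebaseable?: boolean | null;
}

// Collapses the three fields into a single coarse status string.
function summarizeMergeState(pr: PullMergeFields): string {
  if (pr.mergeable === null) return "pending";
  if (!pr.mergeable) return "conflicting";
  switch (pr.mergeable_state) {
    case "clean":
      return "ready";
    case "unstable":
      // Assumed meaning: mergeable, but some commit status is not passing.
      return "mergeable-checks-not-passing";
    case "blocked":
      // Assumed meaning: mergeable, but blocked by branch protection rules.
      return "mergeable-but-blocked";
    default:
      return "mergeable";
  }
}
```

Under this reading, `mergeable: true` with `mergeable_state: unstable` means there were no conflicts at that head even though GitHub did not consider every status clean.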

What changed in the repair:

  • Rebased the PR onto current main and resolved the auto-reply/system-event merge drift.
  • Gated WebChat preflight acknowledgements to actual WebChat clients so non-WebChat chat.send final broadcasts are not preceded by ephemeral ack events.
  • Restored queued system-event authority metadata so events that request sender-owner downgrade still do so while generic trusted events remain in the prompt.
  • Kept preflight compaction conservative for large-context API sessions while preserving proactive compaction for smaller active contexts.
  • Serialized the non-isolated auto-reply-reply Vitest shard and reset ACP abort test registry state so shared reply-run globals do not race across dispatch tests.
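
The acknowledgement gating in the second bullet can be illustrated with a short sketch. It assumes a simplified model in which each `chat.send` produces an ordered list of broadcast events; `ClientKind`, `ChatEvent`, and `eventsForSend` are hypothetical names for the example and are not taken from the gateway code.

```typescript
// Hypothetical client classification; the real gateway may detect this differently.
type ClientKind = "webchat" | "api" | "cli";

interface ChatEvent {
  type: "ack" | "final";
  runId: string;
}

// Returns the events broadcast for one accepted send, in order.
// Only WebChat clients get the ephemeral preflight ack before the final event.
function eventsForSend(client: ClientKind, runId: string): ChatEvent[] {
  const events: ChatEvent[] = [];
  if (client === "webchat") {
    events.push({ type: "ack", runId });
  }
  events.push({ type: "final", runId });
  return events;
}
```

Keeping the gate at the point where events are produced means non-WebChat consumers that expect a single final broadcast see exactly the sequence they saw before.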

Verification passed locally on the pushed tree:

  • node scripts/run-vitest.mjs run --config test/vitest/vitest.gateway-methods.config.ts — 54 files, 845 tests passed
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply-reply.config.ts — 117 files, 1838 tests passed
  • node scripts/check-no-conflict-markers.mjs
  • git diff --check

I also rechecked PR comments/review threads. The Copilot threads are resolved/outdated; the only remaining review blocker is the prior CHANGES_REQUESTED state and requested review from @BunsDev.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@BunsDev

BunsDev commented May 28, 2026

Member

Thanks @ZRAIVenture. I'm deferring this for now: the current head conflicts with main, my prior requested changes review is still active, and ClawSweeper still needs real WebChat proof.

Please rebase onto current main, remove the normal PR CHANGELOG.md edit unless a maintainer explicitly asks for release changelog work, add visible/redacted WebChat send-refresh-reconnect proof showing the pending user turn and ephemeral receipt behavior, then request a fresh ClawSweeper re-review and my re-review.

Current blocker proof I rechecked:

  • Head: dfd776ca598c1d624968922afc660eb5a5414fd4
  • GitHub merge state: CONFLICTING / DIRTY, rebaseable=false
  • Review decision: CHANGES_REQUESTED
  • Local merge check against fresh origin/main reports conflicts in src/agents/command/session-store.ts, src/auto-reply/reply/agent-runner-memory.ts, src/auto-reply/reply/agent-runner-utils.test.ts, src/auto-reply/reply/dispatch-from-config.acp-abort.test.ts, src/auto-reply/reply/get-reply-run.ts, src/auto-reply/reply/memory-flush.ts, src/gateway/server-methods/chat.ts, and src/gateway/server.impl.ts.

Related WebChat threads/PRs such as #74733, #83992, #80670, #85472, #45952, #77136, #76654, and #70391 overlap adjacent symptoms, but I am not marking this as a clean duplicate from the current evidence.

@openclaw-barnacle

This assigned pull request has been automatically marked as stale after being open for 27 days.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label May 29, 2026
@ZRAIVenture

Author

Closing this as stale/superseded by current main behavior. OpenClaw/WebChat is working better after the recent upstream updates, and this branch is now conflict-heavy with active CHANGES_REQUESTED plus missing browser-visible proof. If the specific pending-turn/ephemeral-receipt symptom comes back, we should open a narrower fresh PR against current main instead of continuing to rebase this broad branch.

Labels

  • agents Agent runtime and tooling
  • app: web-ui App: web-ui
  • gateway Gateway runtime
  • merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages.
  • merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state.
  • P1 High-priority user-facing bug, regression, or broken workflow.
  • proof: supplied External PR includes structured after-fix real behavior proof.
  • rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work.
  • size: XL
  • stale Marked as stale due to inactivity
  • status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.
  • triage: blank-template Candidate: PR template appears mostly untouched.
  • triage: dirty-candidate Candidate: broad unrelated surfaces; may need splitting or cleanup.
  • triage: refactor-only Candidate: refactor/cleanup-only PR without maintainer context.

4 participants