Skip to content

test(e2e): add OpenClaw TUI chat-correlation coverage slice#5150

Merged
jyaunches merged 18 commits into
mainfrom
e2e-migrate/test-openclaw-tui-chat-correlation
Jun 10, 2026
Merged

test(e2e): add OpenClaw TUI chat-correlation coverage slice#5150
jyaunches merged 18 commits into
mainfrom
e2e-migrate/test-openclaw-tui-chat-correlation

Conversation

@jyaunches

@jyaunches jyaunches commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

OpenClaw TUI chat-correlation coverage slice

Refs: #4347, #2603, #3145, #5098, #5126

This PR adds a focused free-standing Vitest live coverage slice for the OpenClaw TUI chat-correlation websocket regression guard.

Current scope

This is not a full legacy-migration closeout. It keeps test/e2e/test-openclaw-tui-chat-correlation.sh and the retained nightly lane in place.

Adds:

  • test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts
  • a manual openclaw-tui-chat-correlation-vitest job in .github/workflows/e2e-vitest-scenarios.yaml

The live test exercises a real cloud OpenClaw sandbox via Vitest fixture support and asserts the protocol/history subset of #2603/#3145:

  • no empty final events for submitted runs;
  • replies correlate to accepted run IDs;
  • no missing or duplicate replies;
  • final replies preserve submitted A/B/C order;
  • history user turns preserve submitted A/B/C order;
  • no missing or duplicate user turns.

Out of scope:

  • TUI-rendered pending indicators / spinner / sequence display;
  • TUI-visible tool-call success/failure status;
  • legacy shell-script deletion;
  • migration ledger / inventory state changes.

Simplicity check

  • Test shape: simple live Vitest coverage slice for the websocket protocol/history contract.
  • New shared helpers: none in this PR; it uses existing fixture support from test(e2e): simplify legacy migration tracking #5126.
  • New framework / registry / ledger: none.
  • Workflow changes: one minimal manual live job for this free-standing test.

Validation

Local:

  • npx tsc --noEmit --allowImportingTsExtensions --moduleResolution bundler --module esnext --target es2022 --types node,vitest/globals --skipLibCheck test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts
  • npx prettier --check .github/workflows/e2e-vitest-scenarios.yaml test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts
  • npx vitest run --project e2e-vitest-support test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts --reporter=dot

Live local run was not completed because Docker is not running locally; CI/manual dispatch should provide live sandbox evidence.

Summary by CodeRabbit

  • Chores
    • Enhanced automated testing infrastructure to perform regression validation and verify chat functionality stability.

Migration in progress per triage plan; partial migration (#5049 pattern).
Bash wrapper test/e2e/test-openclaw-tui-chat-correlation.sh stays in place
until typed scenario at test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts
soaks; deletion is a follow-up PR.

Refs: #4347 #2603 #3145 #5098
@jyaunches jyaunches added the area: e2e End-to-end tests, nightly failures, or validation infrastructure label Jun 10, 2026
@copy-pr-bot

copy-pr-bot Bot commented Jun 10, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a live Vitest regression test that provisions a cloud sandbox, executes an in-sandbox websocket repro for issues #2603/#3145, analyzes captured chat events and history for contract violations, and conditionally retries on nondeterministic zero-event boundary cases. Also adds a corresponding GitHub Actions CI job with scenario gating, Docker Hub auth retry logic, and artifact capture.

Changes

OpenClaw TUI Chat Correlation Live E2E Regression Test

Layer / File(s) Summary
E2E test types and trace analysis logic
test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts
Defines trace analyzer contract types, helpers to parse websocket payloads, main trace analysis function that validates empty finals, detects missing/duplicate/uncorrelated replies and user-turn corruption, and a deterministic zero-events retry signature detector.
E2E websocket repro driver and sandbox execution pipeline
test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts
Builds the in-sandbox Node.js websocket driver as a string that sends three chat prompts and emits an ISSUE2603_RESULT payload; adds gateway readiness verification, repro runner that uploads and executes the driver with auth token injection, and a retry wrapper that conditionally reruns with a fresh session on zero-events signature.
E2E regression test main orchestration and assertions
test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts
Wires the repro runner, trace analyzer, and sandbox provisioning into a Vitest live test that requires NVIDIA_API_KEY, verifies the pinned OpenClaw version, executes the repro with optional retry, persists scenario.json and issue2603-trace.json artifacts, asserts all trace violations, and sets a 75-minute timeout.
CI workflow job for live regression test
.github/workflows/e2e-vitest-scenarios.yaml
Adds openclaw-tui-chat-correlation-vitest GitHub Actions job that runs when inputs.scenarios is empty; includes conditional Docker Hub login with three-attempt retry fallback, Node.js 22 setup, dependency installation and CLI build, targeted vitest invocation with NVIDIA_API_KEY injection, and artifact upload with 14-day retention.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

The PR introduces substantial live E2E regression logic with trace analysis, websocket driver generation, sandbox execution orchestration, nondeterministic retry handling, and contract violation detection across 603 lines of high-complexity code in the test file, plus 71 lines of medium-complexity CI workflow setup. The test file spans multiple independent functional checkpoints (trace types, analysis, driver generation, execution, orchestration) that interact intricately; the analyzer logic is dense with edge-case detection for reply correlation, missing finals, and user-turn ordering. The CI job integrates Docker auth retry logic and artifact configuration. Mixed file complexity and interdependent control flow across checkpoints require careful review.

Possibly related issues

Suggested labels

integration: openclaw

Suggested reviewers

  • cv
  • prekshivyas

Poem

🐰 A sandbox hops, websockets gleam,
Chat events flow in correlation's stream,
Zero ghosts? Retry with a fresh heartbeat,
Traces persist—at last #2603's beat! 💬✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title correctly identifies the main change: adding a new Vitest live test for OpenClaw TUI chat-correlation coverage. It is concise, specific, and directly reflects the primary objective of the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch e2e-migrate/test-openclaw-tui-chat-correlation

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: openclaw-tui-chat-correlation-vitest, openclaw-tui-chat-correlation-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. No merge-blocking E2E is required because the changed files are limited to E2E workflow/test coverage and do not alter installer, onboarding, sandbox lifecycle, credentials, policy, inference routing, deployment, or runtime assistant user-flow code. Running the touched live lane is useful as an optional confidence check.

Optional E2E

  • openclaw-tui-chat-correlation-vitest (high; live cloud sandbox, Docker pulls, NVIDIA_API_KEY, up to 75 minutes): Optional validation of the newly added live Vitest lane and test implementation. This directly exercises the changed workflow job and the new websocket/chat-history regression guard, but should not be merge-blocking because the PR only changes E2E test/workflow code, not runtime behavior.
  • openclaw-tui-chat-correlation-e2e (high; live cloud sandbox, Docker pulls, NVIDIA_API_KEY, up to 75 minutes): Optional parity check against the retained legacy bash lane that remains authoritative for full closeout of the OpenClaw TUI chat correlation scenario while the Vitest replacement is being introduced.

New E2E recommendations

  • None.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: e2e-scenarios-all
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

  • gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref>

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

  • e2e-scenarios-all: The PR changes the canonical Vitest E2E scenario workflow and adds a free-standing live Vitest E2E test job. Workflow changes require the full e2e-scenarios-all fan-out rather than a targeted typed scenario dispatch.
    • Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref>

Optional Vitest E2E scenarios

  • None.

Relevant changed files

  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 6 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 5 still apply, 0 new items found

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: In-sandbox gateway startup fallback: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `ensureSandboxGatewayRunning` runs `curl .../health || (nohup openclaw gateway run --port 18789 ... & sleep 10) && curl .../health` and throws if the final check fails.
  • New secret-bearing workflow job is outside the workflow-boundary validator (tools/e2e-scenarios/workflow-boundary.mts:392): The PR adds `openclaw-tui-chat-correlation-vitest`, which consumes Docker Hub credentials and `NVIDIA_API_KEY`, but the deterministic workflow-boundary validator still only validates the matrix job and `openshell-version-pin-vitest`. Future edits that weaken this trusted-code boundary could pass support tests unless they also affect an older validated job.
    • Recommendation: Extend `validateE2eVitestScenariosWorkflowBoundary` and its negative fixture to validate `openclaw-tui-chat-correlation-vitest`, including Docker Hub secret placement, `NVIDIA_API_KEY` scoping, full-SHA actions, `persist-credentials: false`, `npm ci --ignore-scripts`, fixed Vitest path, no dispatch-input interpolation, and `include-hidden-files: false`.
    • Evidence: `.github/workflows/e2e-vitest-scenarios.yaml` adds `openclaw-tui-chat-correlation-vitest` with Docker Hub secrets in the login step and `NVIDIA_API_KEY` in the Vitest step. `tools/e2e-scenarios/workflow-boundary.mts` ends free-standing job validation with `validateOpenShellVersionPinVitestJob(errors, jobs)` and has no corresponding validator for the new job.
  • Sandbox-local OpenClaw gateway token is not explicitly redacted from shell artifacts (test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts:438): The live repro reads the gateway auth token from inside the sandbox and passes it to the in-sandbox websocket driver. Current code does not intentionally print the token, but if the driver, gateway, shell tracing, or a future failure path echoes it, persisted stdout/stderr artifacts may retain a token that the host-side secret store does not know.
    • Recommendation: Avoid exposing the token to persisted shell output, or restructure token handling so the discovered token is passed as an explicit `redactionValues` entry before any artifact can be written. Add focused regression coverage if sandbox-local secret extraction is reused.
    • Evidence: `tokenExpression` reads `/sandbox/.openclaw/openclaw.json`; the `sandbox.execShell` call for artifact `live-issue2603-repro` passes `env` and `timeoutMs`, but no explicit `redactionValues` for the discovered gateway token.
  • Copied analyzer and retry classifier lack local fast regression coverage (test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts:231): The new live file carries a private copy of the chat trace analyzer and zero-chat-event retry classifier. Existing fast tests exercise the older private copy in `test/openclaw-tui-chat-correlation.test.ts`, not this new copy, so the live file can drift and either mask real protocol failures or stop retrying only the intended observability race.
    • Recommendation: Extract the analyzer/retry classifier into a shared tested helper used by both tests, or add local fast coverage for this file proving that only zero-chat-event captures retry and that empty-final, uncorrelated-reply, duplicate-reply, out-of-order final replies, out-of-order user turns, and duplicate-user-turn traces do not retry or pass.
    • Evidence: `looksLikeEventCaptureFailure` is defined privately in the new file and gates `runLiveIssue2603ReproWithEventCaptureRetry`; the new assertions also depend on private `finalReplyOrder` and `userTurnOrder` logic while nearby fast coverage remains in the older test file.
  • Gateway startup fallback still needs a source-of-truth boundary and failure-path test (test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts:392): `ensureSandboxGatewayRunning` tolerates an unreachable in-sandbox gateway by launching `openclaw gateway run`, sleeping, and rechecking. The helper is reasonable for idempotent live setup, but it still does not explain why onboarding, sandbox lifecycle, or OpenClaw runtime cannot own gateway readiness, when this workaround can be removed, or provide focused deterministic coverage for failure behavior.
    • Recommendation: Either move gateway readiness to the onboarding/source lifecycle or document the source boundary and removal condition here. Add a focused helper test that verifies the health-check/start/recheck command shape and that nonzero recheck results report stdout/stderr without being mistaken for a successful repro.
    • Evidence: `ensureSandboxGatewayRunning` builds `curl .../health || (nohup openclaw gateway run --port 18789 ... & sleep 10) && curl .../health` and throws on nonzero exit, but only the expensive live path exercises this fallback.
  • [Linux][Agent&Skills] TUI chat previous message disappears from UI after reconnect or scroll #2603 TUI indicator and visible tool-call clauses remain outside this PR's coverage (test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts:548): The PR correctly narrows this job to the websocket protocol/history slice, but the stated out-of-scope [Linux][Agent&Skills] TUI chat previous message disappears from UI after reconnect or scroll #2603 clauses for TUI-rendered pending indicators and visible tool-call success/failure are not exercised here. This should not be treated as full [Linux][Agent&Skills] TUI chat previous message disappears from UI after reconnect or scroll #2603 closeout.
    • Recommendation: Keep PR, workflow, and follow-up tracking language explicitly scoped to the protocol/history slice, or add/identify TUI-layer coverage for visual pending indicators and visible tool-call success/failure behavior before claiming full [Linux][Agent&Skills] TUI chat previous message disappears from UI after reconnect or scroll #2603 acceptance.
    • Evidence: The live assertions cover `emptyFinalsForSubmittedRuns`, `uncorrelatedReplies`, `userTurnOrder`, `missingReplies`, `duplicateReplies`, `finalReplyOrder`, `missingUserTurns`, and `duplicateUserTurns`; file comments state that TUI rendering indicators and visible tool-call status are outside this websocket-level guard.

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — Workflow boundary validator rejects `openclaw-tui-chat-correlation-vitest` when `NVIDIA_API_KEY` is job-scoped or present in checkout, setup, install, build, Docker login, or upload steps.. The live sandbox path is valuable, but workflow trusted-code boundaries, copied analyzer logic, retry classification, gateway fallback behavior, and sandbox-discovered token handling need deterministic support so regressions do not require a full cloud run to detect.
  • **Runtime validation** — Workflow boundary validator rejects `openclaw-tui-chat-correlation-vitest` when `DOCKERHUB_USERNAME` or `DOCKERHUB_TOKEN` appears outside the Docker Hub login step.. The live sandbox path is valuable, but workflow trusted-code boundaries, copied analyzer logic, retry classification, gateway fallback behavior, and sandbox-discovered token handling need deterministic support so regressions do not require a full cloud run to detect.
  • **Runtime validation** — Workflow boundary validator rejects `openclaw-tui-chat-correlation-vitest` if checkout, setup-node, or upload-artifact actions are not pinned to full commit SHAs.. The live sandbox path is valuable, but workflow trusted-code boundaries, copied analyzer logic, retry classification, gateway fallback behavior, and sandbox-discovered token handling need deterministic support so regressions do not require a full cloud run to detect.
  • **Runtime validation** — Workflow boundary validator rejects `openclaw-tui-chat-correlation-vitest` if checkout omits `persist-credentials: false`, install omits `npm ci --ignore-scripts`, the Vitest path is not fixed to `test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts`, dispatch inputs are interpolated, or artifact upload sets `include-hidden-files: true`.. The live sandbox path is valuable, but workflow trusted-code boundaries, copied analyzer logic, retry classification, gateway fallback behavior, and sandbox-discovered token handling need deterministic support so regressions do not require a full cloud run to detect.
  • **Runtime validation** — OpenClaw chat analyzer rejects final replies ordered A/C/B even when each reply appears exactly once and has the expected run ID.. The live sandbox path is valuable, but workflow trusted-code boundaries, copied analyzer logic, retry classification, gateway fallback behavior, and sandbox-discovered token handling need deterministic support so regressions do not require a full cloud run to detect.
  • **Copied analyzer and retry classifier lack local fast regression coverage** — Extract the analyzer/retry classifier into a shared tested helper used by both tests, or add local fast coverage for this file proving that only zero-chat-event captures retry and that empty-final, uncorrelated-reply, duplicate-reply, out-of-order final replies, out-of-order user turns, and duplicate-user-turn traces do not retry or pass.
  • **Acceptance clause:** Out of scope: TUI-rendered pending indicators / spinner / sequence display; — add test evidence or identify existing coverage. No TUI rendering test is added; the new file explicitly says TUI rendering indicators stay out of scope.
  • **Acceptance clause:** Out of scope: TUI-visible tool-call success/failure status; — add test evidence or identify existing coverage. No visible tool-call status test is added; the prompts in the websocket driver say `Do not use tools`, and comments state visible tool-call status is outside this websocket-level guard.
Since last review details

Current findings:

  • Source-of-truth review needed: In-sandbox gateway startup fallback: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `ensureSandboxGatewayRunning` runs `curl .../health || (nohup openclaw gateway run --port 18789 ... & sleep 10) && curl .../health` and throws if the final check fails.
  • New secret-bearing workflow job is outside the workflow-boundary validator (tools/e2e-scenarios/workflow-boundary.mts:392): The PR adds `openclaw-tui-chat-correlation-vitest`, which consumes Docker Hub credentials and `NVIDIA_API_KEY`, but the deterministic workflow-boundary validator still only validates the matrix job and `openshell-version-pin-vitest`. Future edits that weaken this trusted-code boundary could pass support tests unless they also affect an older validated job.
    • Recommendation: Extend `validateE2eVitestScenariosWorkflowBoundary` and its negative fixture to validate `openclaw-tui-chat-correlation-vitest`, including Docker Hub secret placement, `NVIDIA_API_KEY` scoping, full-SHA actions, `persist-credentials: false`, `npm ci --ignore-scripts`, fixed Vitest path, no dispatch-input interpolation, and `include-hidden-files: false`.
    • Evidence: `.github/workflows/e2e-vitest-scenarios.yaml` adds `openclaw-tui-chat-correlation-vitest` with Docker Hub secrets in the login step and `NVIDIA_API_KEY` in the Vitest step. `tools/e2e-scenarios/workflow-boundary.mts` ends free-standing job validation with `validateOpenShellVersionPinVitestJob(errors, jobs)` and has no corresponding validator for the new job.
  • Sandbox-local OpenClaw gateway token is not explicitly redacted from shell artifacts (test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts:438): The live repro reads the gateway auth token from inside the sandbox and passes it to the in-sandbox websocket driver. Current code does not intentionally print the token, but if the driver, gateway, shell tracing, or a future failure path echoes it, persisted stdout/stderr artifacts may retain a token that the host-side secret store does not know.
    • Recommendation: Avoid exposing the token to persisted shell output, or restructure token handling so the discovered token is passed as an explicit `redactionValues` entry before any artifact can be written. Add focused regression coverage if sandbox-local secret extraction is reused.
    • Evidence: `tokenExpression` reads `/sandbox/.openclaw/openclaw.json`; the `sandbox.execShell` call for artifact `live-issue2603-repro` passes `env` and `timeoutMs`, but no explicit `redactionValues` for the discovered gateway token.
  • Copied analyzer and retry classifier lack local fast regression coverage (test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts:231): The new live file carries a private copy of the chat trace analyzer and zero-chat-event retry classifier. Existing fast tests exercise the older private copy in `test/openclaw-tui-chat-correlation.test.ts`, not this new copy, so the live file can drift and either mask real protocol failures or stop retrying only the intended observability race.
    • Recommendation: Extract the analyzer/retry classifier into a shared tested helper used by both tests, or add local fast coverage for this file proving that only zero-chat-event captures retry and that empty-final, uncorrelated-reply, duplicate-reply, out-of-order final replies, out-of-order user turns, and duplicate-user-turn traces do not retry or pass.
    • Evidence: `looksLikeEventCaptureFailure` is defined privately in the new file and gates `runLiveIssue2603ReproWithEventCaptureRetry`; the new assertions also depend on private `finalReplyOrder` and `userTurnOrder` logic while nearby fast coverage remains in the older test file.
  • Gateway startup fallback still needs a source-of-truth boundary and failure-path test (test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts:392): `ensureSandboxGatewayRunning` tolerates an unreachable in-sandbox gateway by launching `openclaw gateway run`, sleeping, and rechecking. The helper is reasonable for idempotent live setup, but it still does not explain why onboarding, sandbox lifecycle, or OpenClaw runtime cannot own gateway readiness, when this workaround can be removed, or provide focused deterministic coverage for failure behavior.
    • Recommendation: Either move gateway readiness to the onboarding/source lifecycle or document the source boundary and removal condition here. Add a focused helper test that verifies the health-check/start/recheck command shape and that nonzero recheck results report stdout/stderr without being mistaken for a successful repro.
    • Evidence: `ensureSandboxGatewayRunning` builds `curl .../health || (nohup openclaw gateway run --port 18789 ... & sleep 10) && curl .../health` and throws on nonzero exit, but only the expensive live path exercises this fallback.
  • [Linux][Agent&Skills] TUI chat previous message disappears from UI after reconnect or scroll #2603 TUI indicator and visible tool-call clauses remain outside this PR's coverage (test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts:548): The PR correctly narrows this job to the websocket protocol/history slice, but the stated out-of-scope [Linux][Agent&Skills] TUI chat previous message disappears from UI after reconnect or scroll #2603 clauses for TUI-rendered pending indicators and visible tool-call success/failure are not exercised here. This should not be treated as full [Linux][Agent&Skills] TUI chat previous message disappears from UI after reconnect or scroll #2603 closeout.
    • Recommendation: Keep PR, workflow, and follow-up tracking language explicitly scoped to the protocol/history slice, or add/identify TUI-layer coverage for visual pending indicators and visible tool-call success/failure behavior before claiming full [Linux][Agent&Skills] TUI chat previous message disappears from UI after reconnect or scroll #2603 acceptance.
    • Evidence: The live assertions cover `emptyFinalsForSubmittedRuns`, `uncorrelatedReplies`, `userTurnOrder`, `missingReplies`, `duplicateReplies`, `finalReplyOrder`, `missingUserTurns`, and `duplicateUserTurns`; file comments state that TUI rendering indicators and visible tool-call status are outside this websocket-level guard.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@cv

cv commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Thanks for writing this up, but this is going in the opposite direction from the post-#5106 / #4941 decision.

This PR only edits test/e2e-scenario/migration/legacy-inventory.json, and the body proposes inventory status transitions, a bridge-probe/soak phase, and keeping the bash openclaw-tui-chat-correlation-e2e lane alive. That preserves the stale ledger and a parallel E2E path, which is exactly what we are trying to avoid now.

Concrete asks:

  • Please do not update legacy-inventory.json; test(e2e): simplify legacy migration tracking #5126 removes that ledger. Migration evidence should live in the PR and linked GitHub issue comments.
  • Please avoid bridge-probe / soak / inventory-transition framing. The target is one Vitest-driven E2E system, not a staged second roadmap.
  • Please retarget/stack any implementation work on codex/e2e-simplify-migration-tracking / test(e2e): simplify legacy migration tracking #5126 rather than main.
  • When implementing this slice, replace the existing bash-backed workflow lane with direct npx vitest run --project e2e-scenarios-live ..., upload fixture artifacts, and delete the migrated test/e2e/*.sh entry point in the same focused PR.

A planning-only inventory flip is not useful in the new direction. A focused migration PR for this coverage would be welcome if it follows the same shape as #5131/#5132/#5133/#5137/#5138/#5139/#5141/#5142/#5143/#5144/#5145/#5146/#5149.

The openclaw-tui-chat-correlation migration (#5098 epic, #4347 owner) needs
to upload a websocket repro driver into a cloud OpenClaw sandbox and run a
multi-line shell pipeline that reads the gateway auth token and dispatches
the driver. The legacy bash test does this via raw `openshell sandbox
upload` and `openshell sandbox exec --name X -- sh -lc "..."` calls.

Adds two typed wrappers on SandboxClient:
- upload(name, localPath, remotePath) — wraps `openshell sandbox upload`
- execShell(name, script, options) — wraps `openshell sandbox exec X -- sh -lc <script>`

Both share the existing validateSandboxName guard that rejects flag-shaped
names before command construction. Six new unit cases in e2e-clients.test.ts
cover argv shape, options propagation, and sandbox-name validation for both
helpers.

Refs: #4347 #5098
…+ workflow job

Ports the live block of test/openclaw-tui-chat-correlation.test.ts into the
typed scenario framework as a free-standing live test (#5049 pattern), plus
a discrete openclaw-tui-chat-correlation-vitest job in
e2e-vitest-scenarios.yaml modeled on #5107's openshell-version-pin-vitest.

Coverage:
- Onboards a fresh cloud OpenClaw sandbox via OnboardingPhaseFixture.from
  with the cloud-openclaw profile (already in SUPPORTED_ONBOARDING).
- Asserts the pinned 2026.5.27 OpenClaw version (host-side probe via
  openshell sandbox exec ... openclaw --version). Override via
  E2E_OPENCLAW_TUI_CORRELATION_PINNED_VERSION when bumping.
- Idempotent in-sandbox gateway start on port 18789 via SandboxClient.execShell.
- Uploads the websocket repro driver via SandboxClient.upload, runs it via
  execShell with the gateway auth token read from /sandbox/.openclaw/openclaw.json.
- Asserts the #2603 + #3145 contract: no empty finals for submitted runs,
  no uncorrelated replies, no missing/duplicate replies, no missing/duplicate
  user turns. Preserves the looksLikeEventCaptureFailure retry-once guard
  (OpenClaw-side observability race; remove when the runtime exposes a
  deterministic chat readiness ack).

Inventory: bridge-probe entry for test/e2e/test-openclaw-tui-chat-correlation.sh
now references this file as its bridge probe (satisfies the inventory invariant
that bridge-probe entries have non-empty bridgeProbes pointing at real paths).

Bash workflow (nightly-e2e.yaml openclaw-tui-chat-correlation-e2e job)
remains untouched per the migration freeze policy; deletion is a follow-up
PR with #4357 approval after typed scenario soaks.

Refs: #4347 #2603 #3145 #5098 #5049 #5107
First-pass dispatch (run 27284249981) failed at the version-pin assertion
with `spawn openshell ENOENT`. Cloud onboarding (`OnboardingPhaseFixture.from`)
succeeded, but `sandbox.exec(instance.sandboxName, ["openclaw", "--version"])`
spawned with an empty env because no env was passed: `ShellProbe.run`
defaults to `env: { ...(options.env ?? {}) }` (no inherit). Without PATH
the runner can't resolve openshell at ~/.local/bin.

Pass `env: buildAvailabilityProbeEnv()` to every sandbox.* call (exec,
execShell, upload). Mirrors the convention used by every phase fixture
(state-validation.ts, runtime.ts, lifecycle.ts). Documents the convention
inline so the next free-standing live test author follows it.

Refs: #4347 #2603 #3145
…tional

Pass 2 dispatch (run 27285642504) failed at the openclaw-version-pinned
assertion with exit 127 from `openshell sandbox exec <name> -- openclaw --version`.
The matrix `live-scenarios (ubuntu-repo-cloud-openclaw)` job ran 1 of 24
tests (mostly skipped) and the one that ran did not exercise sandbox.exec,
so this latent bug was not visible until a free-standing live test invoked
sandbox.exec against a real openshell binary.

Production code uniformly uses `-n <name>`:
- src/lib/agent/onboard.ts:276,335
- src/lib/onboard/initial-policy.ts:112,118,124
- src/lib/onboard/web-search-verify.ts:77,104
- src/lib/onboard/sandbox-verification-exec.ts:24
- src/lib/onboard/compatible-endpoint-smoke.ts:144
- src/lib/onboard/docker-gpu-supervisor-reconnect.ts:126
- src/lib/status-command-deps.ts:50

Legacy bash tests (test-rebuild-openclaw.sh, test-openclaw-tui-chat-correlation.sh)
and the existing test/openclaw-tui-chat-correlation.test.ts all use the
equivalent `--name <name>` long form. Positional name is rejected by the
real openshell binary as a mis-parsed argument and surfaces as exit 127
("command not found" semantic).

Switch SandboxClient.exec and execShell to `-n` form. SandboxClient.upload
stays positional — production code and the existing live test agree on
positional for `sandbox upload`. Updates the unit-test argv assertions
in e2e-clients, e2e-phase-runtime, and e2e-phase-state-validation; shifts
the chat-completion data-raw payload index from args[11] to args[12].

Refs: #4347 #2603 #3145 #5098
Two consecutive PASS dispatches on the openclaw-tui-chat-correlation-vitest
job confirm convergence:
- Pass 3: run 27286491062 \xe2\x86\x92 SUCCESS (369.41s)
- Pass 4: run 27287442227 \xe2\x86\x92 SUCCESS (358.64s, verification dispatch)

Both runs exercised the full 11-step assertion chain end-to-end against a
real cloud OpenClaw sandbox: cloud onboarding \xe2\x86\x92 openclaw-version-pinned
\xe2\x86\x92 sandbox-gateway-reachable \xe2\x86\x92 live-repro-emitted-result \xe2\x86\x92 6 chat-
correlation assertions (#2603 + #3145 contract).

Inventory transitions:
- status: bridge-probe \xe2\x86\x92 covered
- targetVitestScenarios: [test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts]
- bridgeProbes: cleared
- deletionReady: false (legacy bash wrapper retained until soak; deletion
  is a follow-up PR with #4357 approval per the migration freeze policy)

Refs: #4347 #2603 #3145 #5098
@jyaunches jyaunches marked this pull request as ready for review June 10, 2026 15:49
@cv

cv commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

#5126 is now green, so this draft should rebase/reshape before it moves toward review. It still targets main, edits legacy-inventory.json, uses framework-tests/, and describes a bridge/inventory transition that conflicts with the one Vitest E2E system direction. The TUI chat-correlation coverage may be a good next “small direct wrapper” slice, but it should land on the #5126 foundation without the repo-local ledger and with fixture/support wording instead of framework/runner framing. If the script is deleted in the same PR, include the #5126 Legacy E2E deletion evidence block in the PR body.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 215-286: The job openclaw-tui-chat-correlation-vitest currently
always runs even when a manual dispatch filtered by inputs.scenarios should skip
it; add a job-level conditional to respect the scenarios selector by adding an
if that allows the job when not a manual dispatch OR when
github.event.inputs.scenarios contains "openclaw-tui-chat-correlation" (e.g. if:
${{ github.event_name != 'workflow_dispatch' ||
contains(github.event.inputs.scenarios, 'openclaw-tui-chat-correlation') }}), so
the job only executes for the intended manual scenario or non-dispatch events.

In `@test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts`:
- Line 256: The test hardcodes port 18789 in the WebSocket URL when creating the
client (new WebSocket(...)), which will break if SANDBOX_GATEWAY_PORT changes;
update the WebSocket construction in openclaw-tui-chat-correlation.test.ts to
use the SANDBOX_GATEWAY_PORT constant instead of 18789 (e.g., interpolate
SANDBOX_GATEWAY_PORT into the "ws://127.0.0.1:<port>/ws" URL or pass it into the
script builder so new WebSocket uses the constant), ensuring the WebSocket
instantiation references SANDBOX_GATEWAY_PORT.

In `@test/e2e-scenario/migration/legacy-inventory.json`:
- Around line 581-588: The row was prematurely marked as covered—revert the
"status" field from "covered" back to its prior state (e.g., "migrating" or the
original value) and remove or clear the "targetVitestScenarios" entry until the
focused migration PR lands; also restore or update the
"notes"/"retiredReason"/"deletionReady" fields to reflect that the bash-backed
nightly lane is still retained and that migration evidence remains in PR `#5150`,
making sure to change the JSON keys "status", "targetVitestScenarios", and
"notes" in this object rather than leaving it marked covered.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8fbf1250-a326-4cdd-b5b8-89c0d09e5973

📥 Commits

Reviewing files that changed from the base of the PR and between 5f60ed6 and 6402c4e.

📒 Files selected for processing (7)
  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/framework-tests/e2e-clients.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-runtime.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts
  • test/e2e-scenario/framework/clients/sandbox.ts
  • test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts
  • test/e2e-scenario/migration/legacy-inventory.json

Comment thread .github/workflows/e2e-vitest-scenarios.yaml
Comment thread test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts Outdated
Comment thread test/e2e-scenario/migration/legacy-inventory.json Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts (1)

525-530: ⚡ Quick win

Persist the full repro payload in issue2603-trace.json.

This artifact currently drops the actual events/history/order data that explains a failure, so post-mortems still depend on console output or a rerun. For a 75-minute live probe, saving repro and the derived analysis is a much better debugging default.

♻️ Proposed change
-    await artifacts.writeJson("issue2603-trace.json", {
-      sentRuns: repro.sentRuns,
-      eventCount: repro.events?.length ?? 0,
-      attempts: attempts.length,
-      error: repro.error,
-    });
+    await artifacts.writeJson("issue2603-trace.json", {
+      repro,
+      attempts,
+      analysis: repro.error ? undefined : analyzeIssue2603Trace(repro),
+    });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts` around lines
525 - 530, The artifact write currently only writes selected fields (sentRuns,
eventCount, attempts, error); change the call to persist the full repro payload
and derived analysis by writing the entire repro object and the analysis object
to "issue2603-trace.json" (e.g., include repro and analysis alongside attempts
and any metadata) so the file contains full events/history/order data; update
the artifacts.writeJson invocation that creates "issue2603-trace.json" to
serialize repro and analysis (and keep attempts/errors if desired).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts`:
- Around line 525-530: The artifact write currently only writes selected fields
(sentRuns, eventCount, attempts, error); change the call to persist the full
repro payload and derived analysis by writing the entire repro object and the
analysis object to "issue2603-trace.json" (e.g., include repro and analysis
alongside attempts and any metadata) so the file contains full
events/history/order data; update the artifacts.writeJson invocation that
creates "issue2603-trace.json" to serialize repro and analysis (and keep
attempts/errors if desired).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 88343285-11b4-4374-a113-ef05f31db27d

📥 Commits

Reviewing files that changed from the base of the PR and between 6402c4e and 7fc804d.

📒 Files selected for processing (1)
  • test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts

@cv

cv commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Update after #5126 moved the foundation to the fixtures namespace and picked up the sandbox helper prerequisite: this remains a useful salvage candidate, but the reshaped PR should target #5126, drop legacy-inventory.json, avoid framework-tests/, and import shared helpers from test/e2e-scenario/fixtures/. The shared SandboxClient now uses the named sandbox exec -n <name> form, exposes execShell/upload, requires trustedSandboxShellScript() for shell-string execution, and rejects flag-shaped upload paths. The salvaged PR should keep only the OpenClaw TUI chat-correlation scenario logic. If it deletes the legacy shell script, update the frozen legacy-script allowlist and any workflow wiring/contract tests in the same PR.

@jyaunches

jyaunches commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

Migration-principle review for #5098: this PR has useful coverage, but it is drifting larger than “simple Vitest test, not new framework.”

Reference from test/e2e-scenario/docs/MIGRATION.md: Vitest is the harness; fixtures are support code; add only the fixture/helper needed for the migration; GitHub PRs/issues are source of truth; shell/system boundaries can be invoked directly from Vitest.

Concerns in this PR:

  • The live test is a salvage/protocol slice, not a full legacy closeout, so the title/body should avoid implying full migration.
  • Adds shared SandboxClient.execShell/upload helpers for what may be one test’s implementation detail.
  • Adds a dedicated workflow-boundary validator for one job; that feels like framework/policy expansion unless there is a required security invariant not covered by existing tests.
  • The 500+ line live test embeds a fairly large harness; some of that may be unavoidable, but it should stay local and clearly contract-focused.

Recommended path:

Other acceptable options:

  1. Split shared helper: if upload / trusted multiline sandbox exec are needed by multiple migrations, land them as a tiny fixture-support PR with tests and explicit downstream users, then rebase this PR to only add the TUI test.
  2. Full retirement later: after the coverage slice is complete and any missing TUI-rendered behavior is also covered, do a separate PR that deletes test/e2e/test-openclaw-tui-chat-correlation.sh, removes workflow wiring if any, and updates allowlists.

Suggested immediate changes:

  • Rename title/scope to “add live Vitest coverage for OpenClaw TUI chat-correlation slice.”
  • Drop stale ledger/bridge framing.
  • Inline or split the sandbox helper additions.
  • Remove or strongly justify the one-off workflow-boundary validator.
  • Add a Simplicity check section:
    • Test shape: simple live Vitest coverage slice.
    • New shared helpers: none, or split/justified.
    • New framework/registry/ledger: none.
    • Workflow changes: only the minimal dispatch job if truly needed.

@prekshivyas prekshivyas self-assigned this Jun 10, 2026
@jyaunches jyaunches changed the title test(e2e): migrate test-openclaw-tui-chat-correlation to free-standing Vitest live test test(e2e): add OpenClaw TUI chat-correlation coverage slice Jun 10, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts (2)

363-364: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when chat.send omits runId.

Line 364 falls back to idempotencyKey, which turns a missing accepted-run-id contract break into a misleading correlation failure later. Treat a missing/empty response.runId as the repro failure instead of fabricating the run ID.

Proposed fix
-      const response = await request("chat.send", { sessionKey, message, deliver: false, timeoutMs: 90_000, idempotencyKey });
-      sentRuns.push({ promptToken, replyToken, message, runId: response.runId ?? idempotencyKey });
+      const response = await request("chat.send", {
+        sessionKey,
+        message,
+        deliver: false,
+        timeoutMs: 90_000,
+        idempotencyKey,
+      });
+      if (typeof response.runId !== "string" || response.runId.length === 0) {
+        throw new Error(`chat.send did not return a runId for ${promptToken}`);
+      }
+      sentRuns.push({ promptToken, replyToken, message, runId: response.runId });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts` around lines
363 - 364, The test currently masks a contract breach by using idempotencyKey
when response.runId is missing; update the chat.send call handling to fail fast:
after awaiting request("chat.send") inspect response.runId and if it is
missing/empty throw or assert (include response in the error for debugging)
instead of falling back to idempotencyKey, and only push into sentRuns with the
real runId when present; reference the local variables response,
request("chat.send"), sentRuns, idempotencyKey and runId in your change.

541-546: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the pinned-version check exact.

toContain(EXPECTED_OPENCLAW_VERSION) still passes for outputs like 2026.5.270 or other strings that merely include the pinned value. Since this version gate defines whether the rest of the test is meaningful, compare the parsed version token exactly.

Proposed fix
-    expect(
-      versionResult.stdout,
-      `expected fresh sandbox to run OpenClaw ${EXPECTED_OPENCLAW_VERSION}; ` +
-        `update E2E_OPENCLAW_TUI_CORRELATION_PINNED_VERSION when bumping. ` +
-        `actual: ${versionResult.stdout}`,
-    ).toContain(EXPECTED_OPENCLAW_VERSION);
+    const actualVersion = versionResult.stdout.trim().split(/\s+/).at(-1);
+    expect(
+      actualVersion,
+      `expected fresh sandbox to run OpenClaw ${EXPECTED_OPENCLAW_VERSION}; ` +
+        `update E2E_OPENCLAW_TUI_CORRELATION_PINNED_VERSION when bumping. ` +
+        `actual: ${versionResult.stdout}`,
+    ).toBe(EXPECTED_OPENCLAW_VERSION);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts` around lines
541 - 546, The current assertion uses toContain and can match substrings; change
it to extract the actual version token from versionResult.stdout (e.g., via a
regex that captures the version number token) and assert that the parsed token
equals EXPECTED_OPENCLAW_VERSION (use toBe/toEqual or an exact string
comparison). Locate the assertion referencing versionResult and
EXPECTED_OPENCLAW_VERSION in openclaw-tui-chat-correlation.test.ts, replace the
toContain check with parsing logic that finds the version token (e.g.
/\b\d+\.\d+\.\d+\b/ or similar) and compare that token exactly to
EXPECTED_OPENCLAW_VERSION, failing the test if no token is found.
♻️ Duplicate comments (1)
.github/workflows/e2e-vitest-scenarios.yaml (1)

214-214: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Incomplete scenario selector prevents explicit dispatch of this test.

The condition only runs this job when scenarios is empty, but doesn't handle the case when a user explicitly dispatches scenarios='openclaw-tui-chat-correlation'. A past review comment suggested:

if: ${{ inputs.scenarios == '' || contains(format(',{0},', inputs.scenarios), ',openclaw-tui-chat-correlation,') }}

The current implementation prevents users from running only this scenario via filtered dispatch.

🔧 Suggested fix
-    if: ${{ inputs.scenarios == '' }}
+    if: ${{ inputs.scenarios == '' || contains(format(',{0},', inputs.scenarios), ',openclaw-tui-chat-correlation,') }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e-vitest-scenarios.yaml at line 214, The job's if
condition only runs when inputs.scenarios is empty, preventing explicit dispatch
of the "openclaw-tui-chat-correlation" scenario; update the conditional on the
job (the existing if using inputs.scenarios) to also allow when inputs.scenarios
explicitly includes "openclaw-tui-chat-correlation" by using a contains-style
check (e.g., normalize with commas and test for
',openclaw-tui-chat-correlation,') so the job runs either when inputs.scenarios
== '' or when that scenario is present in inputs.scenarios.
🧹 Nitpick comments (2)
.github/workflows/e2e-vitest-scenarios.yaml (1)

269-269: 💤 Low value

Clean up command formatting for readability.

The excessive spaces between arguments are unusual and reduce readability. Consider either keeping it on one line or using proper YAML multiline formatting.

♻️ Suggested formatting
-          npx vitest run --project e2e-scenarios-live             test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts             --silent=false --reporter=default
+          npx vitest run --project e2e-scenarios-live \
+            test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts \
+            --silent=false --reporter=default
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e-vitest-scenarios.yaml at line 269, The command line in
the workflow step is spaced unusually and hurts readability; replace the heavily
spaced single-line invocation (the line containing "npx vitest run --project
e2e-scenarios-live test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts
--silent=false --reporter=default") with a clean single-line command (collapse
extra spaces) or convert it to a YAML multiline scalar (|) with proper
indentation so each argument is readable; update the step that runs "npx vitest
run" to use one of these formats to remove the extraneous spacing.
test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts (1)

555-560: ⚡ Quick win

Persist the actual trace payload in issue2603-trace.json.

Right now the uploaded artifact only keeps counts, so a correlation/order/history failure from this 75-minute live job is not reconstructable from the artifact itself. Persist events, historyMessages, and the per-attempt payloads (or the computed analysis) here.

Based on PR objectives, this workflow is supposed to upload live trace artifacts for this regression slice.

Possible shape
-    await artifacts.writeJson("issue2603-trace.json", {
-      sentRuns: repro.sentRuns,
-      eventCount: repro.events?.length ?? 0,
-      attempts: attempts.length,
-      error: repro.error,
-    });
+    await artifacts.writeJson("issue2603-trace.json", {
+      repro,
+      attempts,
+    });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts` around lines
555 - 560, The artifact currently written by artifacts.writeJson only contains
counts and error; instead include the full trace payload so the failure can be
reconstructed: expand the object passed to artifacts.writeJson (the call near
repro.sentRuns/repro.events) to persist repro.events, repro.historyMessages (if
present), and a per-attempt payload (e.g., attempts map or computed analysis)
alongside sentRuns, eventCount, attempts.length and error; ensure you reference
and serialize repro.events, repro.historyMessages and attempts (or a derived
analysis object) so the uploaded issue2603-trace.json contains the complete
trace, history and attempt-level data needed to replay the regression.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts`:
- Around line 363-364: The test currently masks a contract breach by using
idempotencyKey when response.runId is missing; update the chat.send call
handling to fail fast: after awaiting request("chat.send") inspect
response.runId and if it is missing/empty throw or assert (include response in
the error for debugging) instead of falling back to idempotencyKey, and only
push into sentRuns with the real runId when present; reference the local
variables response, request("chat.send"), sentRuns, idempotencyKey and runId in
your change.
- Around line 541-546: The current assertion uses toContain and can match
substrings; change it to extract the actual version token from
versionResult.stdout (e.g., via a regex that captures the version number token)
and assert that the parsed token equals EXPECTED_OPENCLAW_VERSION (use
toBe/toEqual or an exact string comparison). Locate the assertion referencing
versionResult and EXPECTED_OPENCLAW_VERSION in
openclaw-tui-chat-correlation.test.ts, replace the toContain check with parsing
logic that finds the version token (e.g. /\b\d+\.\d+\.\d+\b/ or similar) and
compare that token exactly to EXPECTED_OPENCLAW_VERSION, failing the test if no
token is found.

---

Duplicate comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Line 214: The job's if condition only runs when inputs.scenarios is empty,
preventing explicit dispatch of the "openclaw-tui-chat-correlation" scenario;
update the conditional on the job (the existing if using inputs.scenarios) to
also allow when inputs.scenarios explicitly includes
"openclaw-tui-chat-correlation" by using a contains-style check (e.g., normalize
with commas and test for ',openclaw-tui-chat-correlation,') so the job runs
either when inputs.scenarios == '' or when that scenario is present in
inputs.scenarios.

---

Nitpick comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Line 269: The command line in the workflow step is spaced unusually and hurts
readability; replace the heavily spaced single-line invocation (the line
containing "npx vitest run --project e2e-scenarios-live
test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts --silent=false
--reporter=default") with a clean single-line command (collapse extra spaces) or
convert it to a YAML multiline scalar (|) with proper indentation so each
argument is readable; update the step that runs "npx vitest run" to use one of
these formats to remove the extraneous spacing.

In `@test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts`:
- Around line 555-560: The artifact currently written by artifacts.writeJson
only contains counts and error; instead include the full trace payload so the
failure can be reconstructed: expand the object passed to artifacts.writeJson
(the call near repro.sentRuns/repro.events) to persist repro.events,
repro.historyMessages (if present), and a per-attempt payload (e.g., attempts
map or computed analysis) alongside sentRuns, eventCount, attempts.length and
error; ensure you reference and serialize repro.events, repro.historyMessages
and attempts (or a derived analysis object) so the uploaded issue2603-trace.json
contains the complete trace, history and attempt-level data needed to replay the
regression.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ff42bb0a-cafb-48ac-a08b-d0a818b3a65a

📥 Commits

Reviewing files that changed from the base of the PR and between eae9fb1 and 233e3b7.

📒 Files selected for processing (2)
  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/live/openclaw-tui-chat-correlation.test.ts

@prekshivyas prekshivyas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed (code + 9-cat security). Migrates the openclaw TUI chat-correlation E2E (#2603/#3145) to a free-standing live Vitest test, adds a manual CI job + execShell/upload helpers, and fixes exec to pass the -n name flag.

Approve — coverage is a strict superset (all 6 legacy correlation checks + 2 new ordering checks), and the load-bearing -n fix is verified correct: production uniformly uses -n/--name (agent/onboard.ts, initial-policy.ts, web-search-verify.ts), so the old positional name was a latent bug; the 4 unit-test argv updates + the args[11]→args[12] index shift are consistent.

Security: all 9 pass — secrets scoped per-step and machine-enforced by the new workflow-boundary.mts validator (no job-level secret env), actions SHA-pinned, persist-credentials:false, loopback-only websocket, sandbox names validated (rejects flag-shaped names), argv arrays (no shell injection), --ignore-scripts on npm ci. Nits: analyzer types are duplicated rather than imported (documented lockstep cost); the event-capture retry is tightly gated so it can't mask a real correlation/ordering failure.

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 27306425374
Target ref: 233e3b72cf1d89399bb1bdf5eb941c71defa64f5
Workflow ref: main
Requested jobs: openclaw-tui-chat-correlation-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
openclaw-tui-chat-correlation-e2e ✅ success

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: e2e End-to-end tests, nightly failures, or validation infrastructure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants