You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
cloud-experimental-e2e is a 931-line monolith (plus 1,285 lines of helper scripts) that bundles 10 phases testing fundamentally different things: public installer, Landlock enforcement, API key leak detection, live cloud inference, skill injection, agent verification, TUI smoke, and documentation validation. Phases 5d and 5e fail intermittently, and Phase 5f has been permanently skipped in CI.
This means:
A flaky skill-agent turn (Phase 5d) blocks the signal from a Landlock regression
A TUI timing issue (Phase 5e) blocks the signal from an inference routing break
The entire 90-minute timeout applies to everything, including phases that take 2 minutes
LLM non-determinism — Cloud model must return exact token SKILL_SMOKE_VERIFY_K9X2. Models sometimes paraphrase, add quotes, or wrap in explanation. grep -Fq match is strict.
Session lock contention — Stale .jsonl.lock files from crashed runs.
Gateway transient errors — 503s from cloud API cause hard failure.
Timeout pressure — 180s timeout tight when cloud is slow.
Phase 5e — OpenClaw TUI smoke
Timing sensitivity — Hardcoded sleeps (3s/28s/6s/3s) break when model latency varies.
TUI rendering non-determinism — "press ctrl+c again to exit" banner appears unpredictably.
PTY buffering — Prompt regex misses when buffer ends with box-drawing/ANSI.
Fundamental unsuitability — 150 lines of expect with 7+ tunable timeout env vars.
Problem
cloud-experimental-e2eis a 931-line monolith (plus 1,285 lines of helper scripts) that bundles 10 phases testing fundamentally different things: public installer, Landlock enforcement, API key leak detection, live cloud inference, skill injection, agent verification, TUI smoke, and documentation validation. Phases 5d and 5e fail intermittently, and Phase 5f has been permanently skipped in CI.This means:
Current Phase Map
Why Failing Phases Fail
Phase 5d — Skill agent verification
SKILL_SMOKE_VERIFY_K9X2. Models sometimes paraphrase, add quotes, or wrap in explanation.grep -Fqmatch is strict..jsonl.lockfiles from crashed runs.Phase 5e — OpenClaw TUI smoke
Phase 5f — Documentation checks
Proposed Split: 4 Focused Tests
1.
test-cloud-onboard-e2e.sh— Install + Sandbox Health ✅2.
test-cloud-inference-e2e.sh— Live Cloud Inference ✅3.⚠️ →✅
test-skill-agent-e2e.sh— Skill Injection + Agent Verification4.⚠️ →✅
test-docs-validation.sh— Documentation ChecksPhase 5e (TUI smoke): Drop from nightly
continue-on-error: trueProposed Pipeline
Each test does its own install for full independence — no cascade failures.
Related