Conversation
Codex review: needs real behavior proof before merge.

Summary
- Reproducibility: yes, for the blocking classes by source inspection. Legacy transcript import replaces all session rows, and several doctor importers overwrite live SQLite state without merge or recency checks. I did not run a live populated-install upgrade in this read-only review.
- Real behavior proof
- Next step before merge
- Security review findings

Review details
- Best possible solution: keep the database-first storage direction under maintainer/security review, make legacy imports merge-safe and source-preserving, settle the Node support decision, and require latest-head real upgrade proof before merge.
- Do we have a high-confidence way to reproduce the issue? Yes, for the blocking classes by source inspection: legacy transcript import replaces all session rows, and several doctor importers overwrite live SQLite state without merge or recency checks. I did not run a live populated-install upgrade in this read-only review.
- Is this the best way to solve the issue? No, not as-is. The architectural direction is plausible, but the current patch needs merge-safe migration semantics, a deliberate Node support decision, and latest-head real behavior proof before it is the maintainable solution.

Full review comments:
- Overall correctness: patch is incorrect.
- Security concerns:
- What I checked:
- Likely related people:
- Remaining risk / open question:

Codex review notes: model gpt-5.5, reasoning high; reviewed against 115049753d59.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c536838794
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```ts
if (!isSqliteSessionStoreBackendEnabled(env)) {
  return null;
}
const agentId = resolveAgentIdFromSessionStorePath(storePath) ?? DEFAULT_AGENT_ID;
```
Derive SQLite agent scope from configured store path
When session.store uses a non-canonical path (for example a custom template not shaped like .../agents/<id>/sessions/sessions.json), this code silently falls back to DEFAULT_AGENT_ID. After this commit moved runtime session reads/writes to SQLite, that fallback causes all such stores to be loaded/saved under the main SQLite partition, so non-main agent data can be mixed into the wrong store scope and agent-specific queries can return incorrect sessions.
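One way to address this is to refuse to guess when the configured path does not match the canonical shape. The sketch below is illustrative only: `parseAgentIdFromStorePath` and `resolveAgentScopeStrict` are hypothetical names, not the PR's helpers, and the canonical-path regex is an assumption about the layout described above.

```typescript
// Hypothetical sketch: derive an agent id strictly from a session-store path,
// failing loudly instead of silently falling back to the default partition.
function parseAgentIdFromStorePath(storePath: string): string | null {
  // Only trust the canonical shape .../agents/<id>/sessions/sessions.json.
  const normalized = storePath.replace(/\\/g, "/");
  const match = normalized.match(/\/agents\/([^/]+)\/sessions\/sessions\.json$/);
  return match?.[1] ?? null;
}

function resolveAgentScopeStrict(storePath: string): string {
  const agentId = parseAgentIdFromStorePath(storePath);
  if (agentId === null) {
    // Surfacing the error keeps non-main agent data out of the wrong store scope.
    throw new Error(
      `session.store path is non-canonical; refusing to guess agent scope: ${storePath}`,
    );
  }
  return agentId;
}
```

An explicit error here trades availability for correctness: a custom template either maps cleanly to an agent scope or blocks startup, rather than corrupting the main partition.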
```ts
const sessionsTargetDir = path.join(stateDir, "agents", targetAgentId, "sessions");
const sessionsTargetStorePath = path.join(sessionsTargetDir, "sessions.json");
const legacySessionEntries = safeReadDir(sessionsLegacyDir);
const hasTargetJsonSessionStore = includeSessions && fileExists(sessionsTargetStorePath);
```
Detect legacy session JSON using session.store override
Doctor legacy migration detection is hardcoded to $STATE_DIR/agents/<defaultAgent>/sessions/sessions.json, so installs that previously persisted session metadata at a custom session.store path are skipped during openclaw doctor --fix. Because this commit removed runtime JSON session-store fallback, those users can lose visible session history after upgrading until they manually import from the custom path.
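A minimal sketch of detection that honors the override. `resolveLegacySessionStorePath` and the `configuredStore` parameter are hypothetical; the real settings shape for `session.store` may differ from this.

```typescript
import path from "node:path";

// Hypothetical sketch: prefer the user's configured session.store path when
// looking for legacy JSON, and only fall back to the canonical location.
function resolveLegacySessionStorePath(
  stateDir: string,
  agentId: string,
  configuredStore?: string,
): string {
  if (configuredStore && configuredStore.trim().length > 0) {
    return configuredStore; // honor the override so doctor finds the real file
  }
  // Canonical fallback: $STATE_DIR/agents/<agentId>/sessions/sessions.json
  return path.join(stateDir, "agents", agentId, "sessions", "sessions.json");
}
```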
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c567352a42
```ts
async function withTranscriptAppendQueue<T>(queueKey: string, fn: () => Promise<T>): Promise<T> {
  const previous = transcriptAppendQueues.get(queueKey) ?? Promise.resolve();
  let releaseCurrent!: () => void;
  const current = new Promise<void>((resolve) => {
    releaseCurrent = resolve;
  });
  const tail = previous.catch(() => undefined).then(() => current);
  transcriptAppendQueues.set(queueKey, tail);
  await previous.catch(() => undefined);
```
Restore cross-process locking for transcript appends
appendSessionTranscriptMessage now serializes writes only through the in-memory transcriptAppendQueues map, so concurrent writers from different Node processes are no longer coordinated. When a gateway process and another process (e.g., CLI/worker) append to the same agentId/sessionId at the same time, both can read the same pre-append event list and then insert independently, which can duplicate idempotent messages and produce conflicting parentId chains in SQLite transcripts.
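If cross-process coordination is reintroduced, one portable approach is an advisory lock built on atomic `mkdir`. This is a sketch only, not the locking the PR removed; the name `withCrossProcessLock`, the lock-directory convention, and the 25 ms backoff are all assumptions.

```typescript
import { mkdirSync, rmdirSync } from "node:fs";

// Hypothetical sketch: serialize appends across Node processes with an
// advisory lock directory. mkdir is atomic, so exactly one process wins;
// the others back off briefly and retry.
async function withCrossProcessLock<T>(lockDir: string, fn: () => Promise<T>): Promise<T> {
  for (;;) {
    try {
      mkdirSync(lockDir); // atomic acquire
      break;
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST") throw err;
      await new Promise((resolve) => setTimeout(resolve, 25)); // backoff, retry
    }
  }
  try {
    return await fn();
  } finally {
    rmdirSync(lockDir); // release so other processes can append
  }
}
```

In practice this would wrap the per-key in-memory queue, so in-process callers stay cheap while the gateway and a CLI/worker process cannot both read the same pre-append event list.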
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eb5d4f6b0b
```ts
writePairingStateRecord({
  baseDir,
  subdir: "devices",
  key: "bootstrap",
  value: parsed as DeviceBootstrapState,
```
Skip stale device-bootstrap import when SQLite state exists
This migration always writes the legacy devices/bootstrap.json snapshot into pairing.devices/bootstrap and then removes the source file. Bootstrap tokens are live runtime state (issued/revoked/cleared via persistState in src/infra/device-bootstrap.ts), so a delayed openclaw doctor --fix can overwrite newer token/profile data with stale file-era content and make that rollback durable after deletion.
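A merge-safe importer needs a recency guard before writing. The sketch below assumes a recency field (`updatedAtMs`) on both the legacy snapshot and the live SQLite row; the actual records in `src/infra/device-bootstrap.ts` may track recency differently.

```typescript
// Hypothetical sketch: only import a legacy snapshot when there is no live
// SQLite row, or the snapshot is strictly newer than the live state.
interface VersionedRecord {
  updatedAtMs: number; // assumed recency field, unix ms
}

function shouldImportLegacySnapshot(
  legacy: VersionedRecord,
  live: VersionedRecord | null,
): boolean {
  // Never let a delayed doctor run roll live tokens back to file-era content.
  return live === null || legacy.updatedAtMs > live.updatedAtMs;
}
```

When the guard returns false, the importer can skip the write (and arguably keep the legacy file) instead of making the rollback durable.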
```ts
writeSubagentRegistryRunsSnapshot(runs, env);
try {
  fs.unlinkSync(pathname);
```
Avoid overwriting live subagent runs from legacy snapshot
The importer unconditionally writes legacy subagents/runs.json records into SQLite and then unlinks the file. Subagent rows are updated during normal lifecycle persistence (persistSubagentRunsToState/saveSubagentRegistryToState), so if doctor runs later, matching run_id rows can be rolled back to stale metadata (cleanup/outcome/announce fields) and the rollback becomes permanent once the legacy file is deleted.
```ts
if (record && (await importLegacyManagedImageRecord(record, stateDir))) {
  records += 1;
}
await fs.rm(recordPath, { force: true }).catch(() => {});
```
Keep managed-image record files when a row fails import
Each legacy managed-image record file is removed even when parsing fails or importLegacyManagedImageRecord returns false (for example, malformed JSON or missing source image). That permanently discards skipped record metadata, so operators cannot repair the bad rows and rerun migration to recover those attachments.
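The fix shape is to make deletion conditional on import success. This sketch uses a hypothetical `importThenCleanup` wrapper with an injected import callback standing in for the `importLegacyManagedImageRecord` wiring; it is illustrative, not the PR's code.

```typescript
import { promises as fs } from "node:fs";

// Hypothetical sketch: remove a legacy record file only after a successful
// import, so malformed or skipped rows stay on disk for repair and re-run.
async function importThenCleanup(
  recordPath: string,
  importRecord: (raw: string) => Promise<boolean>,
): Promise<boolean> {
  let imported = false;
  try {
    const raw = await fs.readFile(recordPath, "utf8");
    imported = await importRecord(raw);
  } catch {
    imported = false; // read/parse failure: leave the file for a retry
  }
  if (imported) {
    await fs.rm(recordPath, { force: true }); // delete only on success
  }
  return imported;
}
```

The same pattern applies to the cron run-log importer below: a read failure should leave the source file in place rather than coercing content to an empty string and deleting it.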
```ts
.map((entry) => entry.name);
let imported = 0;
for (const fileName of files) {
  const raw = await runsRoot.readText(fileName).catch(() => "");
```
Do not delete cron log files after a read failure
When reading a legacy cron run-log file fails, the code coerces content to "", imports zero entries, and still deletes the file. In transient I/O/permission error scenarios this silently loses the only recoverable log data, preventing a retry after fixing the underlying read issue.
@steipete @jalehman One maintainer-readable, agent-runnable map for the companion-fit work.

Architecture decision

The right long-term shape is:
That keeps the boundary clean:
Why this is better for OpenClaw, not just for LCM
Current stack map

```mermaid
flowchart LR
  A["#78595 database-first runtime"] --> B["#79971 runtime-truth correctness"]
  A --> C["#79970 selection seam"]
  A --> D["#79972 replay seam"]
  B --> E["#79905 typed projection seam"]
  C --> E
  D --> E
  D --> F["LCM #646 SQLite frontier"]
  C --> G["LCM #642 identity mapping"]
  D --> H["LCM #643 message_parts reconstruction"]
  H --> I["LCM #644 rotate/checkpoint/GC remap"]
```
What is already implemented and why it exists

1.
Final implementation map for the companion-fit / Lossless Claw migration stack. Architecture decision remains:
Implemented OpenClaw slices:
Implemented LCM slices:
Why some of the later implementation PRs live in the forks:
Recommended execution/review order:
For Peter / agent handoff:
Summary
This PR is the database-first runtime/state refactor for #78096. It moves OpenClaw away from scattered JSON, JSONL, sidecar SQLite, lock-file, pruning, and truncation-repair paths toward a typed SQLite storage model with one shared control-plane database and one data-plane database per agent.
Current rule of the road:
openclaw.json, plugin manifests, Git checkouts, explicit credential source files, and real user workspaces remain file-backed configuration or user content.

Refs #78096
Current State
Updated May 10, 2026.
The PR branch has the main architecture in place:
- node:sqlite runtime access

The active cleanup pass is now focused on deleting/refactoring stale file-era assumptions from tests and runtime seams. Recent cleanup removed or rewired tests that still pretended active sessions/transcripts/auth lived in JSON/JSONL files, and tightened docs so transcript JSONL is migration-only input, never a runtime locator or bridge.
This is not merge-ready until the latest cleanup is pushed and the current Crabbox/Testbox validation is green.
Reviewer Mental Model
Review this as a storage-boundary refactor, not as a single session bug fix.
- ~/.openclaw/state/openclaw.sqlite owns gateway-wide registries, plugin state/blob rows, cron/task/flow state, queues, sandbox/browser/device/pairing data, migration bookkeeping, auth-profile KV rows, and shared caches.
- ~/.openclaw/agents/<agentId>/agent/openclaw-agent.sqlite owns sessions, transcripts, ACP/subagent rows, VFS rows, tool/run artifacts, and agent-local cache data for that one workspace.
- openclaw.json, plugin manifests, Git checkouts, and real workspace files stay file-backed by design. Credential source files remain only where the user explicitly configured a file-backed secret source; auth profile runtime state itself is SQLite-backed.
- Runtime access goes through Kysely over the native node:sqlite dialect rather than a second SQLite runtime driver.
Global State Database
~/.openclaw/state/openclaw.sqlite

Owns data that must be visible across the gateway and agents:
The global DB is intentionally not where large agent-local transcripts, VFS rows, or artifacts accumulate.
Per-Agent Database
~/.openclaw/agents/<agentId>/agent/openclaw-agent.sqlite

Owns state that belongs to one agent/workspace:
The global database registers where each agent database lives. The agent database owns the write-heavy per-agent lane, so transcripts and artifacts do not become a global gateway bottleneck.
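As a shape illustration only, the write-heavy per-agent lane might be declared like this. The table and column names below are hypothetical, not the PR's actual schema; the point is the SQL-first style and the composite key that typed upserts rely on.

```sql
-- Hypothetical sketch of a per-agent table in the SQL-first style.
-- The composite primary key is what INSERT ... ON CONFLICT upserts key on,
-- instead of rewriting a whole JSON file per change.
CREATE TABLE IF NOT EXISTS session_entries (
  agent_id    TEXT    NOT NULL,
  session_key TEXT    NOT NULL,
  updated_at  INTEGER NOT NULL, -- unix ms, usable for recency checks during import
  payload     TEXT    NOT NULL, -- JSON document for the row
  PRIMARY KEY (agent_id, session_key)
);
```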
Schema And Access
The database schemas are SQL-first and generated into TypeScript from real temporary SQLite databases:
- src/state/openclaw-state-schema.sql -> src/state/openclaw-state-db.generated.d.ts and src/state/openclaw-state-schema.generated.ts
- src/state/openclaw-agent-schema.sql -> src/state/openclaw-agent-db.generated.d.ts and src/state/openclaw-agent-schema.generated.ts

Runtime code uses Kysely over OpenClaw's native
node:sqlite dialect. better-sqlite3 is used only by kysely-codegen for dev-time introspection; it is not a runtime driver. Runtime stores use typed Kysely queries, transactions, unique keys, conflict handling, and explicit row patches instead of whole-file mutation.

Refactor Scope
This branch moves or removes the file-era runtime paths across the codebase:
- sessions.json to per-agent session_entries rows.
- { agentId, sessionKey }, not a legacy storePath or transcript locator.
- auth-profiles.json / auth-state.json into SQLite-backed state, with doctor-owned import for legacy files.
- sessionTranscripts, and QMD/builtin indexers no longer expose session-file helper names.

Removed File-Era Machinery
SQLite replaces these old runtime concepts:
- sessions.json rewrite queues
- sessions.json row-store shims that let new runtime tests keep pretending session state was a file
openclaw doctor --fix and openclaw migrate are the migration boundary.

Migration builds the global and per-agent SQLite databases from legacy inputs, verifies imported rows, records migration runs, and removes old source files only after successful import. Runtime code should not silently import, repair, prune, or rewrite legacy session/cache/auth files while handling normal gateway traffic.
Migration inputs include:
- sessions.json -> per-agent session rows
- *.jsonl -> per-agent transcript events/snapshots
- auth-profiles.json, auth-state.json, auth.json, and OAuth credential files -> SQLite auth-profile state where applicable
- jobs.json, jobs-state.json, and run JSONL files -> shared cron tables

Future state changes should add schema migrations and typed stores instead of adding new sidecar files.
Backup, Export, And Vacuum
Backups should be database-first archives:
- state/openclaw.sqlite
- agents/<agentId>/agent/openclaw-agent.sqlite

Session export remains a support/debug/workspace-export feature, not a second canonical runtime state system.
VFS, Workers, And PI Boundary
Agent workspace state is designed for disk, hybrid, or SQLite-backed VFS operation. The per-agent database owns VFS rows, tool artifacts, run artifacts, and scoped caches so worker-prepared runs can receive a serializable storage boundary.
Worker execution stays experimental behind settings while the storage boundary settles. The intended shape is one worker per active run first; pooling can come later after lifecycle and database contention are proven.
This also continues internalizing PI behind OpenClaw-owned facades. Session state, transcript storage, filesystem behavior, worker preparation, runtime accounting, and provider/runtime contracts are represented through OpenClaw-owned stores and contracts instead of letting PI define the durable layout.
Review Guide
For each former file-backed state owner, ask: where does this data live now?
Acceptable homes are:
Anything else is probably leftover file-era state and should be deleted, migrated, or renamed until the boundary is clear.
Latest Validation
Current Crabbox/Blacksmith Testbox run after the latest rebase/cleanup:
Recent local/remote validation during the cleanup pass:
- pnpm check:database-first-legacy-stores
- pnpm db:kysely:check
- pnpm lint:kysely
- pnpm tsgo:core
- pnpm tsgo:extensions
- git diff --check
- pnpm test src/channels/plugins/bundled.shape-guard.test.ts
- pnpm test extensions/matrix/src/migration-config.test.ts extensions/matrix/src/matrix/sdk/idb-persistence.test.ts

Recent Testbox findings were stale assumptions rather than new product design issues:
imessage setup metadata missing a legacy migration discovery hint, Matrix tests leaking SQLite plugin state across cases, Matrix IndexedDB snapshot tests using async env-scoped state around SQLite, and channel/gateway fixtures that still asserted pre-SQLite file/session paths. Those are being cleaned up as part of the current pass before this PR is considered ready.