feat(sessions): directory-per-session store to eliminate lock contention#39991
feat(sessions): directory-per-session store to eliminate lock contention#39991Diaspar4u wants to merge 1 commit intoopenclaw:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0dc7419bd5
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Greptile SummaryThis PR replaces the monolithic Key findings:
Confidence Score: 3/5
Last reviewed commit: 0dc7419 |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8dcc9ce86c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
8dcc9ce to
12532a1
Compare
🔒 Aisle Security AnalysisWe found 4 potential security issue(s) in this PR:
1. 🟠 Directory session store can write dotfile session entries that are skipped on load (hidden/unloadable sessions → persistent disk DoS/data loss)
DescriptionThe new directory-per-session store introduces a key→filename sanitization edge case that can make sessions permanently invisible/unloadable.
As a result, any session key beginning with Security/impact:
Vulnerable code: // load: dotfiles ignored
if (!fileName.endsWith(".json") || fileName.startsWith(".")) {
continue;
}
// write: dot-prefixed keys remain dot-prefixed filenames
const fileName = `${sanitizeSessionKey(sessionKey)}.json`;RecommendationMitigate by ensuring no valid session key can map to a dotfile, or by ensuring dotfiles are not skipped when they are part of the session-store namespace. Recommended approach (defensive):
Concrete fixes:
export function sanitizeSessionKey(key: string): string {
const encoded = key
.replace(/%/g, "%25")
.replace(/\//g, "%2F")
.replace(/\\/g, "%5C")
.replace(/:/g, "%3A");
// Prevent dotfiles (also avoid '.' and '..' confusion)
if (encoded.startsWith(".")) {
return `%2E${encoded.slice(1)}`; // or `_${encoded}`
}
return encoded;
}
Also add a length guard to prevent filesystem errors (optional but recommended):
2. 🟡 Symlink/path escape in gateway startup session-store auto-migration
DescriptionThe new gateway startup auto-migration scans If
Vulnerable code (startup migration): const sessionDirs = await listAgentSessionDirs(STATE_DIR);
for (const sessionsDir of sessionDirs) {
const storePath = path.join(sessionsDir, "sessions.json");
await migrateSessionStoreToDirectory(storePath);
}Related contributor: Additionally, the migration of a custom RecommendationAdd realpath-based containment checks and reject symlinked session directories before migrating. Example mitigation in import fs from "node:fs/promises";
import path from "node:path";
const stateReal = await fs.realpath(STATE_DIR);
for (const sessionsDir of await listAgentSessionDirs(STATE_DIR)) {
// Ensure sessionsDir exists and is a real directory, not a symlink
const st = await fs.lstat(sessionsDir);
if (!st.isDirectory() || st.isSymbolicLink()) continue;
const sessionsReal = await fs.realpath(sessionsDir);
const rel = path.relative(stateReal, sessionsReal);
if (rel.startsWith("..") || path.isAbsolute(rel)) continue; // escape attempt
const storePath = path.join(sessionsReal, "sessions.json");
await migrateSessionStoreToDirectory(storePath);
}For
Also consider reducing log sensitivity by logging paths relative to 3. 🔵 Symlink-following read of session entry files in directory store (CWE-59)
Description
If an attacker can create or replace files inside
Vulnerable code: const filePath = path.join(storeDir, fileName);
const raw = fs.readFileSync(filePath, "utf-8");RecommendationHarden directory-store file access against symlink/device-file abuse:
Example: import { constants as FS } from "node:fs";
async function safeReadSessionFile(filePath: string, maxBytes = 1024 * 1024) {
const st = await fs.promises.lstat(filePath);
if (st.isSymbolicLink() || !st.isFile()) {
throw new Error("refusing to read non-regular session entry file");
}
if (st.size > maxBytes) {
throw new Error("session entry too large");
}
const fh = await fs.promises.open(filePath, FS.O_RDONLY | (FS.O_NOFOLLOW ?? 0));
try {
return await fh.readFile({ encoding: "utf-8" });
} finally {
await fh.close();
}
}Also ensure the session state directory is created with restrictive permissions (e.g., 0700) to prevent untrusted local users from modifying 4. 🔵 Symlink-based arbitrary file overwrite via non-atomic ENOENT fallback in directory session writer (CWE-59)
Description
However, on
Vulnerable code: if (getErrorCode(err) === "ENOENT") {
...
await fs.promises.mkdir(storeDir, { recursive: true });
await fs.promises.writeFile(filePath, json, { mode: 0o600, encoding: "utf-8" });
}RecommendationAvoid direct writes to the final path; keep the temp+rename pattern even on ENOENT, and add link/file-type checks. Recommended changes:
Example (retry preserving atomic semantics): async function writeSessionEntryToDir(storeDir: string, sessionKey: string, entry: SessionEntry) {
const fileName = `${sanitizeSessionKey(sessionKey)}.json`;
const filePath = path.join(storeDir, fileName);
const json = JSON.stringify(entry, null, 2);
for (let attempt = 0; attempt < 2; attempt++) {
await fs.promises.mkdir(storeDir, { recursive: true });
const tmp = `${filePath}.${process.pid}.${crypto.randomUUID()}.tmp`;
try {
await fs.promises.writeFile(tmp, json, { mode: 0o600, encoding: "utf-8", flag: "wx" });
await fs.promises.rename(tmp, filePath);
await fs.promises.chmod(filePath, 0o600).catch(() => undefined);
return;
} catch (e) {
if (getErrorCode(e) === "ENOENT" && attempt === 0) continue;
throw e;
} finally {
await fs.promises.rm(tmp, { force: true }).catch(() => undefined);
}
}
}Also ensure the state directory and Analyzed PR: #39991 at commit Last updated on: 2026-03-10T14:39:02Z |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 12532a13cf
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
Security-model assessment per SECURITY.md: OpenClaw runs as a trusted-operator gateway, not an adversarial multi-tenant service. The operator controls the filesystem, config, and state directory.
|
d218ab6 to
61f5ebc
Compare
|
Closing to resubmit with a clean commit history and updated PR description. New PR incoming from the same branch. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 61f5ebc536
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } catch { | ||
| // Ignore | ||
| } |
There was a problem hiding this comment.
Re-throw fallback write failures for session entry writes
The ENOENT recovery path swallows every error from the retry write, so callers can get a successful save even when the fallback fails with non-ENOENT errors (for example EACCES/ENOSPC), leaving the session entry unwritten and causing silent state loss in directory mode. This should only suppress "file disappeared" cases and propagate other retry failures.
Useful? React with 👍 / 👎.
| // Fallback: migrate a custom session.store config path (may resolve outside the default agent dirs). | ||
| if (configSnapshot.config.session?.store) { | ||
| try { | ||
| const configStorePath = resolveStorePath(configSnapshot.config.session.store); |
There was a problem hiding this comment.
Migrate templated custom store path for each agent
This fallback migrates session.store only once without an agentId, so a templated path like /custom/{agentId}/sessions.json outside STATE_DIR migrates only the main-agent store and leaves other agent stores in legacy JSON mode. Fresh evidence: resolveStorePath expands {agentId} using the default agent when options are omitted (src/config/sessions/paths.ts), and startup calls it here without passing an agent id.
Useful? React with 👍 / 👎.
Problem
The monolithic
sessions.jsonfile causes cross-session lock contention. When one session writes (e.g., an agent doing many tool calls), all other sessions block waiting for the global write lock. This causessessions_sendtimeouts and cross-topic blocking.Solution
Add a directory-per-session store layout (
sessions.d/) where each session key gets its own JSON file. In directory mode:updateSessionStoresnapshots before the mutator runs, then writes only changed entriesreadSessionUpdatedAtreads one file instead of parsing the entire storeMigration
On gateway startup, all agent session directories are discovered via disk scan (
~/.openclaw/agents/*/sessions/) and each is migrated from legacysessions.jsonto the directory layout. This covers the default agent, sub-agents, and any ephemeral agents that exist on disk. A config-resolved fallback also handles customsession.storepaths that may live outside the default agent directories.The old file is backed up as
sessions.json.bak.<timestamp>. Migration is idempotent - already-migrated agents are skipped silently. Each agent's migration is independent (wrapped in try/catch) so one failure doesn't block others. Skipped in minimal test gateways.Fresh installs continue using JSON mode until an existing
sessions.jsontriggers migration.Key sanitization
Session keys (e.g.,
agent:main:telegram:direct:james) are sanitized for filesystem safety using URL-encoding style: colons become%3A, slashes become%2F/%5C, percent signs become%25. Keys are normalized (lowercased) during migration to match runtime lookup behavior. Round-trips correctly.Backward compatibility
sessions.jsonstores work unchanged until gateway restart triggers migrationloadSessionStore,saveSessionStore,updateSessionStore,updateSessionStoreEntry,updateLastRoute) transparently support both modessaveSessionStoreretains a global lock for bulk writes (rare operations like maintenance commands)Supersedes #34686 (force-push after rebase broke reopen). Related to #19528 (stale community PR with same goal, but heavily conflicted with current main).
Validation
pnpm build-- cleanpnpm check-- cleanpnpm test src/config/sessions/store.directory.test.ts-- passedAI Disclosure
AI-assisted (Claude). Fully tested and reviewed.
Closes #42160