feat(sessions): directory-per-session store #42524
Diaspar4u wants to merge 1 commit into openclaw:main from
Conversation
Replace monolithic sessions.json with sessions.d/ directory layout where each session is a separate file. Atomic writes via tmp+rename, diff-based updates to minimize I/O, and one-time startup migration with staging-dir crash safety. Use key-order-independent comparison (stableStringify) in computeStoreDiff to prevent unnecessary writes from property ordering.
🔒 Aisle Security Analysis

We found 5 potential security issue(s) in this PR:
1. 🟠 Directory-mode session store cache is TTL-only, enabling cross-process stale reads of routing/security policy
Description

The directory-per-session store's cache validation in directory mode intentionally skips mtime/size checks and relies only on a TTL. This creates a security-relevant stale-read window in multi-process deployments sharing the same session store:
Vulnerable code (directory mode cache validation is TTL-only):

```ts
if (useDirectory) {
  const cached = readSessionStoreCache({
    storePath,
    ttlMs: getSessionStoreTtl(),
    mtimeMs: undefined,
    sizeBytes: undefined,
  });
  if (cached) {
    return cached;
  }
}
```

Because the prior JSON store used file mtime/size to validate its cache, this is a coherence regression.

Recommendation

Make directory-mode cache coherence observable across processes. Recommended options (pick one):
Example:

```ts
// On write (after all entry writes succeed)
const versionPath = path.join(resolveSessionStoreDir(storePath), ".version");
await fs.promises.writeFile(versionPath + ".tmp", String(Date.now()), { mode: 0o600 });
await fs.promises.rename(versionPath + ".tmp", versionPath);

// On read
const versionStat = getFileStatSnapshot(versionPath);
const cached = readSessionStoreCache({
  storePath,
  ttlMs: getSessionStoreTtl(),
  mtimeMs: versionStat?.mtimeMs,
  sizeBytes: versionStat?.sizeBytes,
});
```
Also consider documenting that multi-process deployments require the version-file approach to avoid misdelivery and stale policy enforcement.

2. 🟠 Potential CPU/memory DoS in directory session-store diffing via stableStringify + structuredClone
Description

The new directory-per-session store writes only changed entries by taking a full pre-mutation snapshot of the store and diffing it against the post-mutation state. This is security-relevant because it enables uncontrolled resource consumption / crashy behavior on common update paths (e.g., inbound messages updating last route):
Vulnerable code:

```ts
for (const key of Object.keys(current)) {
  const prev = previous[key];
  const curr = current[key];
  if (!prev || stableStringify(prev) !== stableStringify(curr)) {
    changed.push(key);
  }
}
```

This diff is triggered by directory-mode writers that snapshot the entire store:

```ts
const previousSnapshot = useDirectory ? structuredClone(store) : undefined;
```

Recommendation

Avoid deep full-store snapshot+diff on hot paths. Recommended mitigations (choose one or combine):
Example: write only the mutated key:

```ts
// after resolving the normalized key
await saveSessionStoreUnlocked(storePath, store, opts, {
  changedKeys: [resolved.normalizedKey],
  removedKeys: resolved.legacyKeys,
});
```

And guard diffing as a fallback:

```ts
let previousSnapshot: Record<string, SessionEntry> | undefined;
try {
  previousSnapshot = useDirectory ? structuredClone(store) : undefined;
} catch {
  previousSnapshot = undefined; // fall back to full write
}
```

3. 🟡 Symlink traversal/arbitrary file write via directory-based session store
Description

The new directory-per-session store will follow symlinks for the store directory path. If an attacker can manipulate the store directory (for example, replace it with a symlink), writes can be redirected outside the intended location.
Vulnerable code (directory store write path):

```ts
const filePath = path.join(storeDir, fileName);
const tmp = `${filePath}.${process.pid}.${crypto.randomUUID()}.tmp`;
await fs.promises.mkdir(storeDir, { recursive: true });
await fs.promises.writeFile(tmp, json, { mode: 0o600, encoding: "utf-8" });
await fs.promises.rename(tmp, filePath);
await fs.promises.chmod(filePath, 0o600);
```

This is exploitable because the code never verifies that the store directory is a real (non-symlinked) directory.

Recommendation

Harden directory-store filesystem operations against symlink manipulation:
Example hardened approach:

```ts
import { constants as C } from "node:fs";

async function assertRealDir(p: string) {
  const st = await fs.promises.lstat(p).catch(() => null);
  if (st && st.isSymbolicLink()) {
    throw new Error(`Refusing to use symlinked session store dir: ${p}`);
  }
  if (st && !st.isDirectory()) {
    throw new Error(`Session store dir is not a directory: ${p}`);
  }
}

await fs.promises.mkdir(storeDir, { recursive: true, mode: 0o700 });
await assertRealDir(storeDir);
const fh = await fs.promises.open(
  tmp,
  C.O_WRONLY | C.O_CREAT | C.O_EXCL | (C.O_NOFOLLOW ?? 0),
  0o600,
);
try {
  await fh.writeFile(json, "utf-8");
} finally {
  await fh.close();
}
await fs.promises.rename(tmp, filePath);
// omit chmod() (or verify with lstat before chmod)
```

Additionally, consider enforcing safe permissions/ownership for the store directory itself.

4. 🔵 Startup session-store auto-migration can follow symlinks and operate on unintended paths
Description

The new gateway startup auto-migration logic constructs store paths from on-disk agent session directories and migrates each one. This is risky because the paths are used without symlink or containment checks. Impact depends on the deployment's filesystem permissions, but in a misconfigured/shared environment the migration can operate on unintended paths.

Vulnerable code (startup migration calls):

```ts
const sessionDirs = await listAgentSessionDirs(STATE_DIR);
for (const sessionsDir of sessionDirs) {
  const storePath = path.join(sessionsDir, "sessions.json");
  await migrateSessionStoreToDirectory(storePath);
}
if (configSnapshot.config.session?.store) {
  const configStorePath = resolveStorePath(configSnapshot.config.session.store);
  await migrateSessionStoreToDirectory(configStorePath);
}
```

Why this is exploitable:
Recommendation

Add path sandboxing and symlink defenses before migrating:
```ts
import fs from "node:fs/promises";
import path from "node:path";
import { isPathWithin } from "../commands/cleanup-utils.js";

const agentsRoot = path.join(STATE_DIR, "agents");
for (const sessionsDir of await listAgentSessionDirs(STATE_DIR)) {
  // Ensure sessionsDir exists and is not a symlink
  const st = await fs.lstat(sessionsDir).catch(() => null);
  if (!st?.isDirectory() || st.isSymbolicLink()) continue;
  const realSessionsDir = await fs.realpath(sessionsDir).catch(() => null);
  if (!realSessionsDir || !isPathWithin(realSessionsDir, agentsRoot)) continue;
  const storePath = path.join(realSessionsDir, "sessions.json");
  await migrateSessionStoreToDirectory(storePath);
}
```
5. 🔵 Prototype Pollution via session filename keys in directory-based session store
Description

The new directory-per-session store loader builds a plain object map and assigns entries using a key derived from on-disk filenames:
Vulnerable code:

```ts
const store: Record<string, SessionEntry> = {};
// ...
const sessionKey = desanitizeSessionKey(fileName.slice(0, -5));
// ...
store[sessionKey] = entry;
```

This risk is increased by the directory store layout because filenames are now directly interpreted as object keys and assigned into a plain object. A malicious/poisoned on-disk filename can therefore smuggle keys like `__proto__` into the store.

Recommendation

Use a prototype-less dictionary (or a `Map`) and guard decoded keys.

Option A: prototype-less object + key guard

```ts
import { isBlockedObjectKey } from "../../infra/prototype-keys.js";

function loadSessionStoreFromDir(storeDir: string): Record<string, SessionEntry> {
  const store: Record<string, SessionEntry> = Object.create(null);
  // ...
  const sessionKey = desanitizeSessionKey(fileName.slice(0, -5));
  if (isBlockedObjectKey(sessionKey)) {
    continue; // or log + skip
  }
  // ...
  store[sessionKey] = entry;
  return store;
}
```

Also apply the same guard wherever filename-derived keys are assigned.

Option B: use a `Map<string, SessionEntry>` for the in-memory store instead of a plain object.
Greptile Summary

This PR introduces a directory-per-session store.

Issues found:
Confidence Score: 3/5
Last reviewed commit: 196d3fa
```ts
} catch (err) {
  if (getErrorCode(err) === "ENOENT") {
    // Parent dir may have been removed (e.g. in tests). Best-effort atomic retry.
    let retryTmp: string | undefined;
    try {
      await fs.promises.mkdir(storeDir, { recursive: true });
      retryTmp = `${filePath}.${process.pid}.${crypto.randomUUID()}.tmp`;
      await fs.promises.writeFile(retryTmp, json, { mode: 0o600, encoding: "utf-8" });
      await fs.promises.rename(retryTmp, filePath);
    } catch {
      if (retryTmp) {
        await fs.promises.rm(retryTmp, { force: true }).catch(() => undefined);
      }
    }
    return;
```
**Silent write failure swallowed in ENOENT retry path**
If the ENOENT retry itself fails (inner `catch` at line 158), the error is silently swallowed and the function executes the `return` on line 163 — meaning the session entry write is silently lost with no log, no exception, and no indication to the caller. This can cause real data loss if, for example, a `rename` fails due to a permissions issue or disk-full condition during the retry.
The inner `catch` should at minimum log the error, and ideally re-throw so the outer lock/retry mechanism can surface it:
```ts
} catch (err) {
  if (getErrorCode(err) === "ENOENT") {
    let retryTmp: string | undefined;
    try {
      await fs.promises.mkdir(storeDir, { recursive: true });
      retryTmp = `${filePath}.${process.pid}.${crypto.randomUUID()}.tmp`;
      await fs.promises.writeFile(retryTmp, json, { mode: 0o600, encoding: "utf-8" });
      await fs.promises.rename(retryTmp, filePath);
    } catch (retryErr) {
      if (retryTmp) {
        await fs.promises.rm(retryTmp, { force: true }).catch(() => undefined);
      }
      log.warn(`writeSessionEntryToDir: ENOENT retry also failed for ${filePath}: ${String(retryErr)}`);
      throw retryErr;
    }
    return;
  }
  throw err;
}
```
Path: `src/config/sessions/store.ts`, lines 149-163
```ts
} catch {
  return false;
}
```
**Silent return on corrupted `sessions.json` makes failures hard to diagnose**
If `sessions.json` exists but is unparseable (e.g., truncated due to a previous crash), the `catch` silently returns `false`. From the gateway startup perspective, this looks identical to "no file to migrate" — no warning is emitted, the file is left on disk, and the gateway falls back to broken legacy JSON mode without any indication of the problem.
A `log.warn` here would make post-incident debugging significantly easier:
```suggestion
} catch (err) {
  log.warn("migrateSessionStoreToDirectory: failed to parse legacy sessions.json, skipping migration", {
    storePath,
    err: String(err),
  });
  return false;
```
Path: `src/config/sessions/store.ts`, lines 233-235
```ts
const keys = Object.keys(store);
if (keys.length === 0) {
  return false;
}

const storeDir = resolveSessionStoreDir(storePath);
const dirAlreadyExists = isDirectoryStore(storePath);
log.info("migrating session store from JSON to directory layout", {
  entries: keys.length,
  storeDir,
  merge: dirAlreadyExists,
});

// Deduplicate case-variant keys: multiple legacy keys may normalize to the
// same key (e.g. "Foo" and "foo"). Keep the entry with the newest updatedAt.
const deduped = new Map<string, SessionEntry>();
for (const [key, entry] of Object.entries(store)) {
  if (!entry) {
    continue;
  }
  const normalizedKey = normalizeStoreSessionKey(key);
  const prev = deduped.get(normalizedKey);
  if (!prev || (entry.updatedAt ?? 0) > (prev.updatedAt ?? 0)) {
    deduped.set(normalizedKey, entry);
  }
}
```
**Migration proceeds and destroys `sessions.json` even when all entries are skipped**
If `sessions.json` contains keys but all their values are falsy (e.g., `{"key": null}`), `keys.length > 0` passes the guard on line 238, but the deduplication loop skips every entry due to `if (!entry) continue`, leaving `deduped` empty with `deduped.size === 0`.
The code then proceeds to:
1. Create the staging directory with zero files
2. Atomically rename it to `sessions.d/`
3. Back up and delete `sessions.json`
The net result is the gateway loads an empty store — all session data appears lost (though it's in the .bak file). A guard after the dedup loop would prevent this:
```ts
if (deduped.size === 0) {
  log.warn("migrateSessionStoreToDirectory: all entries were falsy/invalid, skipping migration", { storePath });
  return false;
}
```
Path: `src/config/sessions/store.ts`, lines 237-262
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 196d3fa176
```ts
// Fallback: migrate a custom session.store config path (may resolve outside the default agent dirs).
if (configSnapshot.config.session?.store) {
  try {
    const configStorePath = resolveStorePath(configSnapshot.config.session.store);
```
**Migrate templated `session.store` paths for every agent**
This fallback only resolves `config.session.store` once, and `resolveStorePath` defaults `{agentId}` to `main` when no agentId is passed. In setups that use a custom templated store path outside `STATE_DIR/agents` (for example `.../{agentId}/sessions.json`), non-`main` agent stores are never migrated, so those agents keep running on legacy `sessions.json` instead of the new directory layout after upgrade.
Summary
- The current `sessions.json` file uses a global write lock, causing cross-session write contention on multi-agent gateways. Every session update serializes behind a single lock even when sessions are independent.
- The new `sessions.d/` directory layout stores one JSON file per session key (percent-encoded filenames). Atomic writes via tmp+rename, diff-based updates using the existing `stableStringify` utility. One-time startup migration with staging-dir crash safety. Gateway auto-migrates all agent session directories on boot.

Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
- `sessions.json` is automatically migrated to a `sessions.d/` directory (one file per session). The original JSON is backed up as `sessions.json.bak.<timestamp>`.

Security Impact (required)
- Session files are written with mode `0o600` (owner-only read/write). Session key filenames are percent-encoded to prevent path traversal via colons, slashes, or backslashes.

Repro + Verification
Environment
Steps
- Start a gateway with a `sessions.json` containing multiple agent sessions
- Observe migration to `sessions.d/` on startup

Expected
- `sessions.d/` directory created with one `.json` file per session key
- `sessions.json` backed up and removed

Actual
Evidence
- `pnpm build` clean
- `pnpm check` clean
- `pnpm test` green

Human Verification (required)
Review Conversations
Compatibility / Migration
- `loadSessionStore` transparently reads from directory or legacy JSON
- Gateway startup runs `migrateSessionStoreToDirectory` for every agent session directory found on disk. Uses staging-dir approach: writes to `sessions.d.migrating/`, atomically renames to `sessions.d/`, backs up original JSON. If directory already exists (partial previous migration), new entries are merged without clobbering existing ones.
- To roll back: delete the `sessions.d/` directory and restore from `sessions.json.bak.<timestamp>` (rename back to `sessions.json`). Gateway falls back to legacy JSON mode when no `sessions.d/` exists.
- Individual `sessions.d/` files
- Hot-path single-key reads (`readSessionUpdatedAt`) skip the full directory scan entirely.
- Migration is crash-safe: entries are written to a staging dir (`sessions.d.migrating/`) that is renamed atomically to `sessions.d/`. Stale staging dirs are cleaned up on retry. Merge mode (dir already exists) writes directly without staging.

This PR was developed with AI assistance. I have reviewed all changes, verified the test suite, and validated migration behavior on a real multi-agent gateway.