Skip to content

feat(sessions): directory-per-session store to eliminate lock contention#39991

Closed
Diaspar4u wants to merge 1 commit intoopenclaw:mainfrom
Diaspar4u:fix/session-store-directory
Closed

feat(sessions): directory-per-session store to eliminate lock contention#39991
Diaspar4u wants to merge 1 commit intoopenclaw:mainfrom
Diaspar4u:fix/session-store-directory

Conversation

@Diaspar4u
Copy link
Copy Markdown
Contributor

@Diaspar4u Diaspar4u commented Mar 8, 2026

Problem

The monolithic sessions.json file causes cross-session lock contention. When one session writes (e.g., an agent doing many tool calls), all other sessions block waiting for the global write lock. This causes sessions_send timeouts and cross-topic blocking.

Solution

Add a directory-per-session store layout (sessions.d/) where each session key gets its own JSON file. In directory mode:

  • No global lock on writes - each session file is written atomically (temp file + rename), so writes to different sessions never block each other
  • Diff-based writes - updateSessionStore snapshots before the mutator runs, then writes only changed entries
  • Efficient single-session reads - readSessionUpdatedAt reads one file instead of parsing the entire store
  • Full maintenance parity - pruning, capping, disk budget, and transcript archival all run the same as legacy mode

Migration

On gateway startup, all agent session directories are discovered via disk scan (~/.openclaw/agents/*/sessions/) and each is migrated from legacy sessions.json to the directory layout. This covers the default agent, sub-agents, and any ephemeral agents that exist on disk. A config-resolved fallback also handles custom session.store paths that may live outside the default agent directories.

The old file is backed up as sessions.json.bak.<timestamp>. Migration is idempotent - already-migrated agents are skipped silently. Each agent's migration is independent (wrapped in try/catch) so one failure doesn't block others. Skipped in minimal test gateways.

Fresh installs continue using JSON mode until an existing sessions.json triggers migration.

Key sanitization

Session keys (e.g., agent:main:telegram:direct:james) are sanitized for filesystem safety using URL-encoding style: colons become %3A, slashes become %2F/%5C, percent signs become %25. Keys are normalized (lowercased) during migration to match runtime lookup behavior. Round-trips correctly.

Backward compatibility

  • Existing sessions.json stores work unchanged until gateway restart triggers migration
  • All public APIs (loadSessionStore, saveSessionStore, updateSessionStore, updateSessionStoreEntry, updateLastRoute) transparently support both modes
  • Cache layer works with both modes (TTL-only invalidation for directory mode since individual file mtimes don't propagate reliably to directory mtime)
  • saveSessionStore retains a global lock for bulk writes (rare operations like maintenance commands)

Supersedes #34686 (force-push after rebase broke reopen). Related to #19528 (stale community PR with same goal, but heavily conflicted with current main).

Validation

  • pnpm build -- clean
  • pnpm check -- clean
  • pnpm test src/config/sessions/store.directory.test.ts -- passed

AI Disclosure

AI-assisted (Claude). Fully tested and reviewed.

Closes #42160

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: XL labels Mar 8, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0dc7419bd5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/config/sessions/store.ts
Comment thread src/gateway/server.impl.ts Outdated
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 8, 2026

Greptile Summary

This PR replaces the monolithic sessions.json store with a sessions.d/ directory layout where each session key maps to its own JSON file, with the goal of eliminating cross-session write contention. The implementation is generally sound — key sanitization is correctly ordered, atomic temp+rename writes are properly handled, migration is idempotent, and the test suite is thorough.

Key findings:

  • Global in-process write lock is not eliminated — all write paths (updateSessionStore, updateSessionStoreEntry, updateLastRoute) still go through withSessionStoreLock(storePath, ...), the same per-storePath in-process queue as before. Concurrent session writes from the same process still serialize. The improvement is shorter lock hold time (writing one ~1 KB file vs. serializing the entire store), which reduces average blocking but does not eliminate cross-session contention within a single process. Additionally, updateSessionStoreEntry reads and deep-clones the entire directory store on every single-entry update, making it O(n) in the number of sessions rather than O(1). The PR description's "No global lock on writes" is accurate only for cross-process writes.

  • Redundant fallback migration with misleading error message — when configSnapshot.config.session?.store is undefined, the fallback migration attempts to migrate the default main-agent path, which is already covered by the listAgentSessionDirs scan immediately above. The redundant call is harmless and idempotent, but if resolveStorePath ever throws, the error message "Failed to migrate custom session store path" is misleading since it's not a custom path.

Confidence Score: 3/5

  • Safe to merge as an incremental improvement, but the primary stated goal (eliminating global lock contention) is not fully achieved — the hot path still holds the global in-process lock and reads the entire store on every single-session update.
  • The implementation is functionally correct and well-tested. Migration is idempotent, backward-compatible, and crash-safe. The real-world improvement (shorter lock hold time) is genuine and will reduce sessions_send timeouts. However, the O(n) read + structuredClone inside updateSessionStoreEntry and the persisting global lock mean the PR doesn't deliver the "no cross-session blocking" behavior described in the PR description. The redundant startup fallback and misleading error message are minor operational concerns. A score of 3 reflects that the PR is a solid improvement but falls short of its stated architectural goal, which may lead to follow-up work or misaligned expectations.
  • src/config/sessions/store.ts — specifically the updateSessionStoreEntry function and the full-store load it performs in directory mode on every invocation.

Last reviewed commit: 0dc7419

Comment thread src/config/sessions/store.ts
Comment thread src/gateway/server.impl.ts
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8dcc9ce86c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/config/sessions/store.ts Outdated
@Diaspar4u Diaspar4u force-pushed the fix/session-store-directory branch from 8dcc9ce to 12532a1 Compare March 10, 2026 02:10
@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot Bot commented Mar 10, 2026

🔒 Aisle Security Analysis

We found 4 potential security issue(s) in this PR:

# Severity Title
1 🟠 High Directory session store can write dotfile session entries that are skipped on load (hidden/unloadable sessions → persistent disk DoS/data loss)
2 🟡 Medium Symlink/path escape in gateway startup session-store auto-migration
3 🔵 Low Symlink-following read of session entry files in directory store (CWE-59)
4 🔵 Low Symlink-based arbitrary file overwrite via non-atomic ENOENT fallback in directory session writer (CWE-59)

1. 🟠 Directory session store can write dotfile session entries that are skipped on load (hidden/unloadable sessions → persistent disk DoS/data loss)

Property Value
Severity High
CWE CWE-400
Location src/config/sessions/store.ts:118-123

Description

The new directory-per-session store introduces a key→filename sanitization edge case that can make sessions permanently invisible/unloadable.

  • sanitizeSessionKey() does not encode leading . characters.
  • writeSessionEntryToDir() writes files as ${sanitizeSessionKey(sessionKey)}.json.
  • loadSessionStoreFromDir() skips dotfiles (fileName.startsWith(".")).

As a result, any session key beginning with . (e.g. .foo, ..foo) will be persisted as a dotfile (e.g. .foo.json) but will never be loaded back into the in-memory store.

Security/impact:

  • DoS / disk exhaustion: remote clients that can control sessionKey (e.g., OpenAI-compat HTTP via x-openclaw-session-key, or /hooks/agent when hooks.allowRequestSessionKey=true) can create many distinct dot-prefixed session keys. These dotfiles are:
    • skipped by the loader, so they are not subject to pruning/capping logic based on the loaded store
    • skipped by full-write stale cleanup (existingFiles/readdir also ignores dotfiles), so they can become effectively undeletable via normal code paths
  • Data loss / bypass of expected persistence: a dot-prefixed session may appear to “not exist” on subsequent reads (new session IDs created, state lost). Migration from sessions.json can also silently migrate dot-prefixed keys into invisible dotfiles.

Vulnerable code:

// load: dotfiles ignored
if (!fileName.endsWith(".json") || fileName.startsWith(".")) {
  continue;
}// write: dot-prefixed keys remain dot-prefixed filenames
const fileName = `${sanitizeSessionKey(sessionKey)}.json`;

Recommendation

Mitigate by ensuring no valid session key can map to a dotfile, or by ensuring dotfiles are not skipped when they are part of the session-store namespace.

Recommended approach (defensive):

  1. Reject or rewrite session keys that would produce a dotfile name.
  2. Normalize + validate at the boundary (HTTP headers, hooks payloads) and before persistence.

Concrete fixes:

  • Encode leading dots in sanitizeSessionKey (or prefix filenames with a constant safe prefix):
export function sanitizeSessionKey(key: string): string {
  const encoded = key
    .replace(/%/g, "%25")
    .replace(/\//g, "%2F")
    .replace(/\\/g, "%5C")
    .replace(/:/g, "%3A");// Prevent dotfiles (also avoid '.' and '..' confusion)
  if (encoded.startsWith(".")) {
    return `%2E${encoded.slice(1)}`; // or `_${encoded}`
  }
  return encoded;
}
  • Or, change directory loading/cleanup to only skip literal . and .. entries (or not skip dotfiles at all inside sessions.d/), and include dotfiles in stale cleanup.

Also add a length guard to prevent filesystem errors (optional but recommended):

  • enforce a max encoded filename length (e.g., 200 bytes) and fall back to hash(key) for the filename while storing the original key inside the JSON payload.

2. 🟡 Symlink/path escape in gateway startup session-store auto-migration

Property Value
Severity Medium
CWE CWE-59
Location src/gateway/server.impl.ts:340-362

Description

The new gateway startup auto-migration scans STATE_DIR/agents/*/sessions and migrates ${sessionsDir}/sessions.json without validating that the target path stays within STATE_DIR or is not reached through symlinks.

If STATE_DIR is placed in a shared/writable location (or otherwise writable by an attacker with lower privileges than the gateway process), an attacker can:

  • Create a real directory STATE_DIR/agents/<attacker>/
  • Create a symlink STATE_DIR/agents/<attacker>/sessions -> /some/other/location
  • Cause the gateway (running as a different/more-privileged user) to run migration against /some/other/location/sessions.json

migrateSessionStoreToDirectory() performs filesystem writes/renames relative to the provided storePath (creates/renames sessions.d, deletes sessions.d.migrating, and renames the legacy JSON to a backup). This can lead to arbitrary file/directory modification outside the state directory (at least DoS; potentially privilege-impacting depending on where the symlink points and gateway privileges).

Vulnerable code (startup migration):

const sessionDirs = await listAgentSessionDirs(STATE_DIR);
for (const sessionsDir of sessionDirs) {
  const storePath = path.join(sessionsDir, "sessions.json");
  await migrateSessionStoreToDirectory(storePath);
}

Related contributor: listAgentSessionDirs() only checks that STATE_DIR/agents/<name> is a directory; it does not validate that <name>/sessions is a real directory (non-symlink) under STATE_DIR.

Additionally, the migration of a custom session.store path uses resolveStorePath() which allows arbitrary absolute paths; if that config can be influenced by untrusted parties, startup will also attempt migration outside STATE_DIR.

Recommendation

Add realpath-based containment checks and reject symlinked session directories before migrating.

Example mitigation in startGatewayServer (or inside listAgentSessionDirs):

import fs from "node:fs/promises";
import path from "node:path";

const stateReal = await fs.realpath(STATE_DIR);

for (const sessionsDir of await listAgentSessionDirs(STATE_DIR)) {// Ensure sessionsDir exists and is a real directory, not a symlink
  const st = await fs.lstat(sessionsDir);
  if (!st.isDirectory() || st.isSymbolicLink()) continue;

  const sessionsReal = await fs.realpath(sessionsDir);
  const rel = path.relative(stateReal, sessionsReal);
  if (rel.startsWith("..") || path.isAbsolute(rel)) continue; // escape attempt

  const storePath = path.join(sessionsReal, "sessions.json");
  await migrateSessionStoreToDirectory(storePath);
}

For session.store (custom paths):

  • Consider restricting to paths within STATE_DIR by default, or require an explicit opt-in flag to migrate external paths.
  • If external paths are allowed, apply the same realpath containment checks against an allowlist (e.g., user’s home or configured data directory).

Also consider reducing log sensitivity by logging paths relative to STATE_DIR (when applicable) rather than absolute paths.


3. 🔵 Symlink-following read of session entry files in directory store (CWE-59)

Property Value
Severity Low
CWE CWE-59
Location src/config/sessions/store.ts:92-106

Description

loadSessionEntryFromDir() reads ${storeDir}/${fileName} with fs.readFileSync without checking for symlinks or file type/size.

If an attacker can create or replace files inside sessions.d/ (e.g., shared/mispermissioned STATE_DIR, or state dir on a writable volume), they can:

  • Point *.json at an arbitrary target via symlink and cause the process to read it (and potentially parse it as JSON)
  • Cause a local denial-of-service by linking to special files or very large files (e.g., device nodes) which readFileSync will try to fully read into memory

Vulnerable code:

const filePath = path.join(storeDir, fileName);
const raw = fs.readFileSync(filePath, "utf-8");

Recommendation

Harden directory-store file access against symlink/device-file abuse:

  1. Use lstat to reject symlinks and non-regular files before reading.
  2. Enforce a maximum file size for session entry JSON.
  3. (Preferred on POSIX) open the file with O_NOFOLLOW.

Example:

import { constants as FS } from "node:fs";

async function safeReadSessionFile(filePath: string, maxBytes = 1024 * 1024) {
  const st = await fs.promises.lstat(filePath);
  if (st.isSymbolicLink() || !st.isFile()) {
    throw new Error("refusing to read non-regular session entry file");
  }
  if (st.size > maxBytes) {
    throw new Error("session entry too large");
  }
  const fh = await fs.promises.open(filePath, FS.O_RDONLY | (FS.O_NOFOLLOW ?? 0));
  try {
    return await fh.readFile({ encoding: "utf-8" });
  } finally {
    await fh.close();
  }
}

Also ensure the session state directory is created with restrictive permissions (e.g., 0700) to prevent untrusted local users from modifying sessions.d.


4. 🔵 Symlink-based arbitrary file overwrite via non-atomic ENOENT fallback in directory session writer (CWE-59)

Property Value
Severity Low
CWE CWE-59
Location src/config/sessions/store.ts:148-156

Description

writeSessionEntryToDir() normally writes to a random temp file and renames it into place, which avoids following destination symlinks (the symlink itself would be replaced).

However, on ENOENT it falls back to a direct write to the final filePath:

  • The direct writeFile(filePath, ...) will follow a pre-existing symlink at filePath and truncate/write the symlink target.
  • mode: 0o600 is only applied on creation; if the attacker pre-creates filePath (or swaps it during the mkdir→write race) with permissive permissions, the fallback path will not restore 0600.
  • An attacker who can manipulate the session store directory (e.g., shared/overly-permissive STATE_DIR, or writable sessions.d/) can race-delete the directory to trigger this fallback, then create filePath as a symlink to an arbitrary target file writable by the gateway process, resulting in arbitrary file overwrite as the gateway user.

Vulnerable code:

if (getErrorCode(err) === "ENOENT") {
  ...
  await fs.promises.mkdir(storeDir, { recursive: true });
  await fs.promises.writeFile(filePath, json, { mode: 0o600, encoding: "utf-8" });
}

Recommendation

Avoid direct writes to the final path; keep the temp+rename pattern even on ENOENT, and add link/file-type checks.

Recommended changes:

  1. On ENOENT, recreate storeDir and retry the same temp-file + rename sequence (to ensure the final path is replaced, not followed).
  2. Before renaming, ensure storeDir is a real directory (not a symlink) via lstat and reject otherwise.
  3. Optionally open destination using safer primitives:
    • POSIX: open(tmp, O_CREAT|O_EXCL, 0o600) to avoid clobbering pre-existing tmp
    • POSIX: use O_NOFOLLOW when opening existing files if you ever open the final path directly

Example (retry preserving atomic semantics):

async function writeSessionEntryToDir(storeDir: string, sessionKey: string, entry: SessionEntry) {
  const fileName = `${sanitizeSessionKey(sessionKey)}.json`;
  const filePath = path.join(storeDir, fileName);
  const json = JSON.stringify(entry, null, 2);

  for (let attempt = 0; attempt < 2; attempt++) {
    await fs.promises.mkdir(storeDir, { recursive: true });
    const tmp = `${filePath}.${process.pid}.${crypto.randomUUID()}.tmp`;
    try {
      await fs.promises.writeFile(tmp, json, { mode: 0o600, encoding: "utf-8", flag: "wx" });
      await fs.promises.rename(tmp, filePath);
      await fs.promises.chmod(filePath, 0o600).catch(() => undefined);
      return;
    } catch (e) {
      if (getErrorCode(e) === "ENOENT" && attempt === 0) continue;
      throw e;
    } finally {
      await fs.promises.rm(tmp, { force: true }).catch(() => undefined);
    }
  }
}

Also ensure the state directory and sessions.d/ are created with restrictive permissions (e.g., 0700) to prevent untrusted local users from creating/removing files inside it.


Analyzed PR: #39991 at commit 61f5ebc

Last updated on: 2026-03-10T14:39:02Z

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12532a13cf

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/config/sessions/store.ts Outdated
@Diaspar4u
Copy link
Copy Markdown
Contributor Author

Security-model assessment per SECURITY.md: OpenClaw runs as a trusted-operator gateway, not an adversarial multi-tenant service. The operator controls the filesystem, config, and state directory.

  • Prototype pollution via session-key filenames: Session keys originate from authenticated Gateway callers who are treated as trusted operators (SECURITY.md). On-disk files in sessions.d/ are under STATE_DIR, which is trusted local state. An attacker who can write files to STATE_DIR has already crossed the trust boundary. That said, using Object.create(null) is good practice and worth adopting as defense-in-depth.

  • Symlink traversal in session-store auto-migration: Exploitability requires write access to STATE_DIR to plant symlinks. SECURITY.md Out of Scope: "Reports that require write access to trusted local state (~/.openclaw)." The operator already has full host access.

  • Case-insensitive filesystem collisions: Session keys are already normalized to lowercase via normalizeStoreSessionKey() at all ingress points. The finding describes a theoretical mismatch that does not occur with actual session key generation paths. Operational hardening, not a boundary bypass.

  • Insecure default permissions for sessions.d/: The recommended deployment is one user per host/gateway. On single-user systems the umask-default permissions do not cross a trust boundary. Adding explicit mode: 0o700 is reasonable defense-in-depth for shared-host edge cases, but per SECURITY.md, multi-user shared gateways are not a supported security boundary.

@Diaspar4u Diaspar4u force-pushed the fix/session-store-directory branch from d218ab6 to 61f5ebc Compare March 10, 2026 13:31
@Diaspar4u
Copy link
Copy Markdown
Contributor Author

Closing to resubmit with a clean commit history and updated PR description. New PR incoming from the same branch.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61f5ebc536

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +154 to +156
} catch {
// Ignore
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Re-throw fallback write failures for session entry writes

The ENOENT recovery path swallows every error from the retry write, so callers can get a successful save even when the fallback fails with non-ENOENT errors (for example EACCES/ENOSPC), leaving the session entry unwritten and causing silent state loss in directory mode. This should only suppress "file disappeared" cases and propagate other retry failures.

Useful? React with 👍 / 👎.

// Fallback: migrate a custom session.store config path (may resolve outside the default agent dirs).
if (configSnapshot.config.session?.store) {
try {
const configStorePath = resolveStorePath(configSnapshot.config.session.store);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Migrate templated custom store path for each agent

This fallback migrates session.store only once without an agentId, so a templated path like /custom/{agentId}/sessions.json outside STATE_DIR migrates only the main-agent store and leaves other agent stores in legacy JSON mode. Fresh evidence: resolveStorePath expands {agentId} using the default agent when options are omitted (src/config/sessions/paths.ts), and startup calls it here without passing an agent id.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gateway Gateway runtime size: XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Session store monolithic JSON with global lock causes cross-session contention

1 participant