Skip to content

feat(sessions): directory-per-session store#42524

Closed
Diaspar4u wants to merge 1 commit intoopenclaw:mainfrom
Diaspar4u:fix/session-store-directory
Closed

feat(sessions): directory-per-session store#42524
Diaspar4u wants to merge 1 commit intoopenclaw:mainfrom
Diaspar4u:fix/session-store-directory

Conversation

@Diaspar4u
Copy link
Copy Markdown
Contributor

Summary

  • Problem: The monolithic sessions.json file uses a global write lock, causing cross-session write contention on multi-agent gateways. Every session update serializes behind a single lock even when sessions are independent.
  • Why it matters: Multi-agent setups (3-10+ agents, dozens of concurrent sessions) experience write latency spikes and lock convoy effects during bursts of activity.
  • What changed: New sessions.d/ directory layout stores one JSON file per session key (percent-encoded filenames). Atomic writes via tmp+rename, diff-based updates using the existing stableStringify utility. One-time startup migration with staging-dir crash safety. Gateway auto-migrates all agent session directories on boot.
  • What did NOT change: Session store public API (loadSessionStore, updateSessionStore, updateSessionStoreEntry, etc.), in-memory caching behavior, lock serialization semantics (global lock still used for multi-key mutators).

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • On first gateway startup after upgrade, each agent's sessions.json is automatically migrated to a sessions.d/ directory (one file per session). The original JSON is backed up as sessions.json.bak.<timestamp>.
  • Fresh installs start in legacy JSON mode until the first gateway boot triggers migration.
  • No config changes required. No user action needed.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • Session files written with mode 0o600 (owner-only read/write). Session key filenames are percent-encoded to prevent path traversal via colons, slashes, or backslashes.

Repro + Verification

Environment

  • OS: macOS (also tested path logic for Windows atomic retry)
  • Runtime: Node 22+
  • Model/provider: N/A (storage layer)

Steps

  1. Start gateway with existing sessions.json containing multiple agent sessions
  2. Gateway auto-migrates to sessions.d/ on startup
  3. Concurrent session writes to different agents no longer block each other

Expected

  • sessions.d/ directory created with one .json file per session key
  • sessions.json backed up and removed
  • All session data preserved and accessible via existing APIs

Actual

  • Works as expected

Evidence

  • 35 new tests covering: key sanitization/round-trip, migration (idempotent, merge, case-variant dedup, staging-dir crash cleanup), CRUD operations, concurrent writes to different sessions, lock namespace serialization, readSessionUpdatedAt optimization, fresh install behavior, multi-agent migration via listAgentSessionDirs
  • pnpm build clean
  • pnpm check clean
  • pnpm test green

Human Verification (required)

  • Verified: migration of real multi-agent session store (3 agents, 40+ sessions), post-migration CRUD, concurrent writes across agents, gateway restart after migration
  • Edge cases: empty JSON, missing JSON, stale staging dir from crashed migration, case-variant key dedup, merge into existing directory, path traversal characters in keys
  • Not verified: Windows atomic write retry path (tested via code review only; existing Windows retry logic preserved unchanged)

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes - loadSessionStore transparently reads from directory or legacy JSON
  • Config/env changes? No
  • Migration needed? Yes - automatic on gateway startup, no user action required
  • Migration steps: Gateway calls migrateSessionStoreToDirectory for every agent session directory found on disk. Uses staging-dir approach: writes to sessions.d.migrating/, atomically renames to sessions.d/, backs up original JSON. If directory already exists (partial previous migration), new entries are merged without clobbering existing ones.

Failure Recovery (if this breaks)

  • How to disable/revert: Remove the sessions.d/ directory and restore from sessions.json.bak.<timestamp> (rename back to sessions.json). Gateway falls back to legacy JSON mode when no sessions.d/ exists.
  • Known bad symptoms: missing session data after migration (check backup file), permission errors on sessions.d/ files

Risks and Mitigations

  • Risk: Filesystem with many sessions (1000+) could slow directory reads
    • Mitigation: In-memory cache with TTL (existing mechanism) means directory scans happen infrequently. Individual entry reads (readSessionUpdatedAt) skip full directory scan entirely.
  • Risk: Interrupted migration leaves inconsistent state
    • Mitigation: Staging-dir approach (sessions.d.migrating/ renamed atomically to sessions.d/). Stale staging dirs are cleaned up on retry. Merge mode (dir already exists) writes directly without staging.

This PR was developed with AI assistance. I have reviewed all changes, verified the test suite, and validated migration behavior on a real multi-agent gateway.

Replace monolithic sessions.json with sessions.d/ directory layout where
each session is a separate file. Atomic writes via tmp+rename, diff-based
updates to minimize I/O, and one-time startup migration with staging-dir
crash safety. Use key-order-independent comparison (stableStringify) in
computeStoreDiff to prevent unnecessary writes from property ordering.
@openclaw-barnacle openclaw-barnacle Bot added the gateway Gateway runtime label Mar 10, 2026
@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot Bot commented Mar 10, 2026

🔒 Aisle Security Analysis

We found 5 potential security issue(s) in this PR:

# Severity Title
1 🟠 High Directory-mode session store cache is TTL-only, enabling cross-process stale reads of routing/security policy
2 🟠 High Potential CPU/memory DoS in directory session-store diffing via stableStringify + structuredClone
3 🟡 Medium Symlink traversal/arbitrary file write via directory-based session store
4 🔵 Low Startup session-store auto-migration can follow symlinks and operate on unintended paths
5 🔵 Low Prototype Pollution via session filename keys in directory-based session store

1. 🟠 Directory-mode session store cache is TTL-only, enabling cross-process stale reads of routing/security policy

Property Value
Severity High
CWE CWE-362
Location src/config/sessions/store.ts:459-479

Description

The directory-per-session session store (sessions.d/) changed cache coherence semantics in loadSessionStore().

In directory mode, cache validation intentionally skips mtime/size checks and relies only on a TTL:

  • The session store cache is per-process in-memory (Map in store-cache.ts)
  • invalidateSessionStoreCache(storePath) is only called in the writer process before writeSessionStoreDir()
  • Other processes will continue to serve cached session entries until TTL expiry (default 45 seconds)

This creates a security-relevant stale-read window in multi-process deployments sharing the same STATE_DIR:

  • Misrouting / information disclosure: updateLastRoute() persists deliveryContext/lastTo/lastAccountId used to decide where replies are sent. If process A updates routing but process B serves cached data, replies can be delivered to an old recipient/channel.
  • Policy bypass: session entries also include enforcement-relevant fields like sendPolicy and execSecurity. Revoking/tightening these in one process may not take effect in other processes for up to the TTL.

Vulnerable code (directory mode cache validation is TTL-only):

if (useDirectory) {
  const cached = readSessionStoreCache({
    storePath,
    ttlMs: getSessionStoreTtl(),
    mtimeMs: undefined,
    sizeBytes: undefined,
  });
  if (cached) {
    return cached;
  }
}

Because the prior JSON store used file mtimeMs/sizeBytes checks, this is a regression in cross-process cache coherence specifically for directory mode.

Recommendation

Make directory-mode cache coherence observable across processes.

Recommended options (pick one):

  1. Version/stamp file (strongly recommended)
    • Maintain a single file such as sessions.d/.version (or .stamp) whose contents is a monotonic counter/timestamp.
    • Update it atomically on every successful directory write under the same lock.
    • Use stat(mtimeMs,size) (or the file contents) of this version file in readSessionStoreCache params, restoring cross-process invalidation.

Example:

// On write (after all entry writes succeed)
const versionPath = path.join(resolveSessionStoreDir(storePath), ".version");
await fs.promises.writeFile(versionPath + ".tmp", String(Date.now()), { mode: 0o600 });
await fs.promises.rename(versionPath + ".tmp", versionPath);// On read
const versionStat = getFileStatSnapshot(versionPath);
const cached = readSessionStoreCache({
  storePath,
  ttlMs: getSessionStoreTtl(),
  mtimeMs: versionStat?.mtimeMs,
  sizeBytes: versionStat?.sizeBytes,
});
  1. Disable caching for directory mode by default (or set a very small default TTL), unless a robust versioning signal is implemented.

  2. Compute a cheap directory fingerprint (e.g., hash of sorted filenames + per-file mtime/size) and include it in the cache key/validation (more expensive than a stamp file).

Also consider documenting that multi-process deployments require the version file approach to avoid misdelivery and stale policy enforcement.


2. 🟠 Potential CPU/memory DoS in directory session-store diffing via stableStringify + structuredClone

Property Value
Severity High
CWE CWE-400
Location src/config/sessions/store.ts:186-198

Description

The new directory-per-session store writes only changed entries by taking a full pre-mutation snapshot (structuredClone(store)) and then computing a diff by deep stable-serializing every session entry.

This is security-relevant because it enables uncontrolled resource consumption / crashy behavior on common update paths (e.g., inbound messages updating last route):

  • CPU amplification: computeStoreDiff() iterates over all session keys and performs stableStringify(prev) and stableStringify(curr) per key. If the store contains up to the default 500 entries and/or entries contain large nested structures (e.g., systemPromptReport, skillsSnapshot), each update can become expensive.
  • Memory amplification: directory-mode writers additionally do previousSnapshot = structuredClone(store) before mutating, duplicating the entire in-memory store on every update.
  • Crash risk on unexpected shapes: stableStringify (see src/agents/stable-stringify.ts) has no cycle detection and no depth/size limits; if any entry ever contains a circular structure (or extremely deep nesting), diffing can throw/stack-overflow and prevent persistence, causing a denial-of-service.

Vulnerable code:

for (const key of Object.keys(current)) {
  const prev = previous[key];
  const curr = current[key];
  if (!prev || stableStringify(prev) !== stableStringify(curr)) {
    changed.push(key);
  }
}

This diff is triggered by directory-mode writers that snapshot the entire store:

const previousSnapshot = useDirectory ? structuredClone(store) : undefined;

Recommendation

Avoid deep full-store snapshot+diff on hot paths.

Recommended mitigations (choose one or combine):

  1. Track touched keys instead of diffing everything

    • For updateLastRoute / updateSessionStoreEntry, you already know the target session key. Pass that key to the save layer and write only that entry.
    • For generic updateSessionStore, have the mutator return { result, touchedKeys } (or wrap store in a Proxy to record writes).
  2. Replace deep stableStringify diff with a cheap, bounded comparison

    • Prefer comparing a small set of fields (e.g., updatedAt plus a version counter) or maintain an explicit per-entry etag/revision that changes on mutation.
  3. If you must serialize for comparison, make it safe and bounded

    • Implement cycle detection and maximum depth/size in stableStringify, and treat unsupported values explicitly.

Example: write only the mutated key for updateSessionStoreEntry:

// after resolving the normalized key
await saveSessionStoreUnlocked(storePath, store, opts, {
  changedKeys: [resolved.normalizedKey],
  removedKeys: resolved.legacyKeys,
});

And guard diffing as a fallback:

let previousSnapshot: Record<string, SessionEntry> | undefined;
try {
  previousSnapshot = useDirectory ? structuredClone(store) : undefined;
} catch {
  previousSnapshot = undefined; // fall back to full write
}

3. 🟡 Symlink traversal/arbitrary file write via directory-based session store

Property Value
Severity Medium
CWE CWE-59
Location src/config/sessions/store.ts:138-148

Description

The new directory-per-session store will follow symlinks for the store directory path (sessions.d) and then perform writes/renames/chmod operations inside that resolved target.

If an attacker can manipulate storeDir (or any parent component) — e.g., because session.store is configured to a path in a world-writable directory like /tmp, or because the state directory is group/world-writable — they can replace sessions.d with a symlink to an arbitrary directory. Subsequent session updates will then:

  • Write attacker-influenced files outside the intended state directory via writeFile(tmp) and rename(tmp, filePath).
  • Potentially change permissions of an attacker-chosen target due to a TOCTOU window between rename() and chmod(), where chmod() follows symlinks on POSIX.
  • Delete attacker-chosen files via unlink() in deleteSessionEntryFromDir if storeDir is symlinked.

Vulnerable code (directory store write path):

const filePath = path.join(storeDir, fileName);
const tmp = `${filePath}.${process.pid}.${crypto.randomUUID()}.tmp`;
await fs.promises.mkdir(storeDir, { recursive: true });
await fs.promises.writeFile(tmp, json, { mode: 0o600, encoding: "utf-8" });
await fs.promises.rename(tmp, filePath);
await fs.promises.chmod(filePath, 0o600)

This is exploitable because the code never verifies storeDir is a real directory (not a symlink) before writing, and fs.statSync() in isDirectoryStore() follows symlinks, enabling directory-mode activation even when sessions.d is a symlink.

Recommendation

Harden directory-store filesystem operations against symlink manipulation:

  1. Refuse symlinked store directories (and ideally ensure the parent directory is not symlinked/world-writable).
  2. Remove the post-rename chmod() (it’s not needed to restrict permissions because 0o600 is already applied at create-time and umask can only make permissions more restrictive). If keeping it, lstat() the path immediately before chmod and abort if it’s a symlink.
  3. Use safe creation flags for the temp file (defense-in-depth): O_EXCL and (where available) O_NOFOLLOW.

Example hardened approach:

import { constants as C } from "node:fs";

async function assertRealDir(p: string) {
  const st = await fs.promises.lstat(p).catch(() => null);
  if (st && st.isSymbolicLink()) {
    throw new Error(`Refusing to use symlinked session store dir: ${p}`);
  }
  if (st && !st.isDirectory()) {
    throw new Error(`Session store dir is not a directory: ${p}`);
  }
}

await fs.promises.mkdir(storeDir, { recursive: true, mode: 0o700 });
await assertRealDir(storeDir);

const fh = await fs.promises.open(
  tmp,
  C.O_WRONLY | C.O_CREAT | C.O_EXCL | (C.O_NOFOLLOW ?? 0),
  0o600,
);
try {
  await fh.writeFile(json, "utf-8");
} finally {
  await fh.close();
}
await fs.promises.rename(tmp, filePath);// omit chmod() (or verify with lstat before chmod)

Additionally, consider enforcing safe permissions/ownership for STATE_DIR (and rejecting session.store paths under world-writable directories) rather than only auditing/warning.


4. 🔵 Startup session-store auto-migration can follow symlinks and operate on unintended paths

Property Value
Severity Low
CWE CWE-59
Location src/gateway/server.impl.ts:340-361

Description

The new gateway startup auto-migration logic constructs storePath values from on-disk directory names and from session.store config, and passes them directly into migrateSessionStoreToDirectory() without validating the resolved (real) path.

This is risky because migrateSessionStoreToDirectory(storePath) performs filesystem mutations (mkdir/rm/rename) relative to storePath (e.g., creating sessions.d(.migrating) alongside it and renaming sessions.json to a backup). If an attacker can place a symlink inside the scanned state directory (e.g., .../agents/<id>/sessions -> /etc), the code will follow that symlink and migrate/rename files outside STATE_DIR.

Impact depends on the deployment’s filesystem permissions, but in a misconfigured/shared STATE_DIR scenario this can lead to arbitrary file rename and directory creation/deletion as the gateway user.

Vulnerable code (startup migration calls):

const sessionDirs = await listAgentSessionDirs(STATE_DIR);
for (const sessionsDir of sessionDirs) {
  const storePath = path.join(sessionsDir, "sessions.json");
  await migrateSessionStoreToDirectory(storePath);
}

if (configSnapshot.config.session?.store) {
  const configStorePath = resolveStorePath(configSnapshot.config.session.store);
  await migrateSessionStoreToDirectory(configStorePath);
}

Why this is exploitable:

  • listAgentSessionDirs() returns .../agents/<dir>/sessions without resolving or rejecting symlinks.
  • path.join() does not protect against filesystem symlink traversal.
  • migrateSessionStoreToDirectory() (in src/config/sessions/store.ts) performs rm(..., {recursive:true}) and rename(...) based on storePath and its parent directory, so if any path segment is a symlink, operations occur outside the intended sandbox.

Recommendation

Add path sandboxing and symlink defenses before migrating:

  1. Reject symlinked session directories and enforce containment via realpath:
import fs from "node:fs/promises";
import path from "node:path";
import { isPathWithin } from "../commands/cleanup-utils.js";

const agentsRoot = path.join(STATE_DIR, "agents");

for (const sessionsDir of await listAgentSessionDirs(STATE_DIR)) {// Ensure sessionsDir exists and is not a symlink
  const st = await fs.lstat(sessionsDir).catch(() => null);
  if (!st?.isDirectory() || st.isSymbolicLink()) continue;

  const realSessionsDir = await fs.realpath(sessionsDir).catch(() => null);
  if (!realSessionsDir || !isPathWithin(realSessionsDir, agentsRoot)) continue;

  const storePath = path.join(realSessionsDir, "sessions.json");
  await migrateSessionStoreToDirectory(storePath);
}
  1. For session.store, either:
  • Restrict to within STATE_DIR by default, or
  • Require an explicit opt-in flag (env/config) to allow migrating paths outside STATE_DIR.
  1. Optionally harden migrateSessionStoreToDirectory() itself by validating that storePath is not a symlink (lstat) and that resolveSessionStoreDir(storePath) remains within an allowed base directory.

5. 🔵 Prototype Pollution via session filename keys in directory-based session store

Property Value
Severity Low
CWE CWE-1321
Location src/config/sessions/store.ts:111-128

Description

The new directory-per-session store loader builds a plain object map and assigns entries using a key derived from on-disk filenames:

  • sessionKey is derived from fs.readdirSync(storeDir) filenames (after desanitizeSessionKey())
  • The code assigns into a normal object: store[sessionKey] = entry
  • If a crafted file named __proto__.json (or similar special key) exists in sessions.d/, then store["__proto__"] = entry will invoke the magic __proto__ setter and mutate the prototype of store (prototype pollution / object injection)
  • loadSessionEntryFromDir() does not validate the parsed JSON shape, so the attacker-controlled object used as the new prototype can contain arbitrary properties that will be visible via prototype inheritance during later store[someKey] lookups

Vulnerable code:

const store: Record<string, SessionEntry> = {};
...
const sessionKey = desanitizeSessionKey(fileName.slice(0, -5));
...
store[sessionKey] = entry;

This risk is increased by the directory store layout because filenames are now directly interpreted as object keys and assigned into a plain object. A malicious/poisoned on-disk sessions.d/__proto__.json can trigger prototype mutation during load.

Recommendation

Use a prototype-less dictionary (or Map) and explicitly block magic keys.

Option A: prototype-less object + key guard

import { isBlockedObjectKey } from "../../infra/prototype-keys.js";

function loadSessionStoreFromDir(storeDir: string): Record<string, SessionEntry> {
  const store: Record<string, SessionEntry> = Object.create(null);
  ...
  const sessionKey = desanitizeSessionKey(fileName.slice(0, -5));
  if (isBlockedObjectKey(sessionKey)) {
    continue; // or log + skip
  }
  ...
  store[sessionKey] = entry;
  return store;
}

Also apply the same isBlockedObjectKey() check in write/delete paths (writeSessionEntryToDir, deleteSessionEntryFromDir) and in migration from legacy JSON to prevent creating dangerous filenames.

Option B: use Map<string, SessionEntry>

Prefer Map for key/value stores to avoid special property names entirely.


Analyzed PR: #42524 at commit 196d3fa

Last updated on: 2026-03-10T23:45:53Z

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 10, 2026

Greptile Summary

This PR introduces a directory-per-session store (sessions.d/) to replace the monolithic sessions.json, eliminating global write-lock contention for independent sessions in multi-agent setups. The design is solid: atomic tmp+rename writes, percent-encoded filenames for path-traversal safety, diff-based updates, and a staging-directory migration approach for crash safety. The public session store API is unchanged and the migration is fully backward-compatible.

Issues found:

  • Silent write failure in ENOENT retry path (store.ts:149-163): When writeSessionEntryToDir's ENOENT retry fails (inner catch), the error is silently swallowed and the function returns successfully. This can cause session data to be lost without any log entry or exception propagated to the caller — the most significant correctness issue in this PR.

  • Corrupted sessions.json causes silent migration skip (store.ts:233-235): A sessions.json that fails JSON.parse causes migrateSessionStoreToDirectory to return false with no warning logged. From the gateway's perspective this is indistinguishable from "no file to migrate," making post-incident diagnosis difficult.

  • All-falsy entries bypass the keys.length === 0 guard (store.ts:237-262): If sessions.json has top-level keys but all values are null/falsy, deduped ends up empty after the deduplication loop, yet migration proceeds to create an empty sessions.d/ and back up/remove the original JSON — effectively making the store appear empty on next load.

Confidence Score: 3/5

  • The PR is mostly safe to merge but contains a silent-failure path in the atomic write retry that can cause session data loss without any diagnostic signal.
  • The overall architecture is well-designed and the test coverage is thorough. However, the ENOENT retry in writeSessionEntryToDir swallows failures silently, and the migration function has two edge cases (corrupted JSON, all-null entries) that can either leave the system in an unexpectedly empty state or make diagnosis difficult. These are not catastrophic but are real correctness concerns in a write path for persistent data.
  • Pay close attention to src/config/sessions/store.ts, specifically the writeSessionEntryToDir ENOENT retry block (lines 149-163) and the migrateSessionStoreToDirectory error handling (lines 233-262).

Last reviewed commit: 196d3fa

Comment on lines +149 to +163
} catch (err) {
if (getErrorCode(err) === "ENOENT") {
// Parent dir may have been removed (e.g. in tests). Best-effort atomic retry.
let retryTmp: string | undefined;
try {
await fs.promises.mkdir(storeDir, { recursive: true });
retryTmp = `${filePath}.${process.pid}.${crypto.randomUUID()}.tmp`;
await fs.promises.writeFile(retryTmp, json, { mode: 0o600, encoding: "utf-8" });
await fs.promises.rename(retryTmp, filePath);
} catch {
if (retryTmp) {
await fs.promises.rm(retryTmp, { force: true }).catch(() => undefined);
}
}
return;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent write failure swallowed in ENOENT retry path

If the ENOENT retry itself fails (inner catch at line 158), the error is silently swallowed and the function executes the return on line 163 — meaning the session entry write is silently lost with no log, no exception, and no indication to the caller. This can cause real data loss if, for example, a rename fails due to a permissions issue or disk-full condition during the retry.

The inner catch should at minimum log the error, and ideally re-throw so the outer lock/retry mechanism can surface it:

  } catch (err) {
    if (getErrorCode(err) === "ENOENT") {
      let retryTmp: string | undefined;
      try {
        await fs.promises.mkdir(storeDir, { recursive: true });
        retryTmp = `${filePath}.${process.pid}.${crypto.randomUUID()}.tmp`;
        await fs.promises.writeFile(retryTmp, json, { mode: 0o600, encoding: "utf-8" });
        await fs.promises.rename(retryTmp, filePath);
      } catch (retryErr) {
        if (retryTmp) {
          await fs.promises.rm(retryTmp, { force: true }).catch(() => undefined);
        }
        log.warn(`writeSessionEntryToDir: ENOENT retry also failed for ${filePath}: ${String(retryErr)}`);
        throw retryErr;
      }
      return;
    }
    throw err;
  }
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/config/sessions/store.ts
Line: 149-163

Comment:
**Silent write failure swallowed in ENOENT retry path**

If the ENOENT retry itself fails (inner `catch` at line 158), the error is silently swallowed and the function executes the `return` on line 163 — meaning the session entry write is **silently lost** with no log, no exception, and no indication to the caller. This can cause real data loss if, for example, a `rename` fails due to a permissions issue or disk-full condition during the retry.

The inner `catch` should at minimum log the error, and ideally re-throw so the outer lock/retry mechanism can surface it:

```
  } catch (err) {
    if (getErrorCode(err) === "ENOENT") {
      let retryTmp: string | undefined;
      try {
        await fs.promises.mkdir(storeDir, { recursive: true });
        retryTmp = `${filePath}.${process.pid}.${crypto.randomUUID()}.tmp`;
        await fs.promises.writeFile(retryTmp, json, { mode: 0o600, encoding: "utf-8" });
        await fs.promises.rename(retryTmp, filePath);
      } catch (retryErr) {
        if (retryTmp) {
          await fs.promises.rm(retryTmp, { force: true }).catch(() => undefined);
        }
        log.warn(`writeSessionEntryToDir: ENOENT retry also failed for ${filePath}: ${String(retryErr)}`);
        throw retryErr;
      }
      return;
    }
    throw err;
  }
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +233 to +235
} catch {
return false;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent return on corrupted sessions.json makes failures hard to diagnose

If sessions.json exists but is unparseable (e.g., truncated due to a previous crash), the catch silently returns false. From the gateway startup perspective, this looks identical to "no file to migrate" — no warning is emitted, the file is left on disk, and the gateway falls back to broken legacy JSON mode without any indication of the problem.

A log.warn here would make post-incident debugging significantly easier:

Suggested change
} catch {
return false;
}
} catch (err) {
log.warn("migrateSessionStoreToDirectory: failed to parse legacy sessions.json, skipping migration", {
storePath,
err: String(err),
});
return false;
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/config/sessions/store.ts
Line: 233-235

Comment:
**Silent return on corrupted `sessions.json` makes failures hard to diagnose**

If `sessions.json` exists but is unparseable (e.g., truncated due to a previous crash), the `catch` silently returns `false`. From the gateway startup perspective, this looks identical to "no file to migrate" — no warning is emitted, the file is left on disk, and the gateway falls back to broken legacy JSON mode without any indication of the problem.

A `log.warn` here would make post-incident debugging significantly easier:

```suggestion
    } catch (err) {
      log.warn("migrateSessionStoreToDirectory: failed to parse legacy sessions.json, skipping migration", {
        storePath,
        err: String(err),
      });
      return false;
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +237 to +262
const keys = Object.keys(store);
if (keys.length === 0) {
return false;
}

const storeDir = resolveSessionStoreDir(storePath);
const dirAlreadyExists = isDirectoryStore(storePath);
log.info("migrating session store from JSON to directory layout", {
entries: keys.length,
storeDir,
merge: dirAlreadyExists,
});

// Deduplicate case-variant keys: multiple legacy keys may normalize to the
// same key (e.g. "Foo" and "foo"). Keep the entry with the newest updatedAt.
const deduped = new Map<string, SessionEntry>();
for (const [key, entry] of Object.entries(store)) {
if (!entry) {
continue;
}
const normalizedKey = normalizeStoreSessionKey(key);
const prev = deduped.get(normalizedKey);
if (!prev || (entry.updatedAt ?? 0) > (prev.updatedAt ?? 0)) {
deduped.set(normalizedKey, entry);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migration proceeds and destroys sessions.json even when all entries are skipped

If sessions.json contains keys but all their values are falsy (e.g., {"key": null}), keys.length > 0 passes the guard on line 238, but the deduplication loop skips every entry due to if (!entry) continue, leaving deduped empty with deduped.size === 0.

The code then proceeds to:

  1. Create the staging directory with zero files
  2. Atomically rename it to sessions.d/
  3. Back up and delete sessions.json

The net result is the gateway loads an empty store — all session data appears lost (though it's in the .bak file). A guard after the dedup loop would prevent this:

    if (deduped.size === 0) {
      log.warn("migrateSessionStoreToDirectory: all entries were falsy/invalid, skipping migration", { storePath });
      return false;
    }
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/config/sessions/store.ts
Line: 237-262

Comment:
**Migration proceeds and destroys `sessions.json` even when all entries are skipped**

If `sessions.json` contains keys but all their values are falsy (e.g., `{"key": null}`), `keys.length > 0` passes the guard on line 238, but the deduplication loop skips every entry due to `if (!entry) continue`, leaving `deduped` empty with `deduped.size === 0`.

The code then proceeds to:
1. Create the staging directory with zero files
2. Atomically rename it to `sessions.d/`
3. Back up and delete `sessions.json`

The net result is the gateway loads an empty store — all session data appears lost (though it's in the `.bak` file). A guard after the dedup loop would prevent this:

```
    if (deduped.size === 0) {
      log.warn("migrateSessionStoreToDirectory: all entries were falsy/invalid, skipping migration", { storePath });
      return false;
    }
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 196d3fa176

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

// Fallback: migrate a custom session.store config path (may resolve outside the default agent dirs).
if (configSnapshot.config.session?.store) {
try {
const configStorePath = resolveStorePath(configSnapshot.config.session.store);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Migrate templated session.store paths for every agent

This fallback only resolves config.session.store once, and resolveStorePath defaults {agentId} to main when no agentId is passed. In setups that use a custom templated store path outside STATE_DIR/agents (for example .../{agentId}/sessions.json), non-main agent stores are never migrated, so those agents keep running on legacy sessions.json instead of the new directory layout after upgrade.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gateway Gateway runtime size: XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Session store monolithic JSON with global lock causes cross-session contention

1 participant