Commit 057024f
fix(models-config): fail closed on oversized fingerprint reads (review feedback)
Codex P2 / Aisle medium #1 on PR #73260 flagged that `safeHashRegularFile` returned a deterministic `oversize:${lst.size}` sentinel when a file exceeded the cap, so an attacker could swap the *contents* of an oversized `auth-profiles.json` or `models.json` without changing its byte length and still hit the readyCache via the size-only sentinel comparison (CWE-345).

Switch to fail-closed:

- `safeHashRegularFile` now returns `null` for oversize at lstat time and for grow-past-cap mid-read (the latter was already caught via the destroy-with-error path; the JSDoc/comment now states that explicitly).
- The return type narrows from `{ hash; raw: Buffer | null } | null` to `{ hash; raw: Buffer } | null` — the `raw === null` branch in `readAuthProfilesStableHash` is now unreachable and removed.
- JSDoc on the helper and `readAuthProfilesStableHash` now describes the fail-closed semantics and links them back to the threat model.

Also:

- Add a regression test that warms the cache with a small, hashable `auth-profiles.json`, then grows it past `MAX_AUTH_PROFILES_BYTES` (8 MiB) and asserts the implicit-provider-discovery pipeline re-runs (i.e. the fingerprint changed instead of stably hitting an `oversize:<size>` sentinel).
- Add the missing `Unreleased` CHANGELOG entry covering the content-hashed fingerprint, the post-write models.json drift hash, the oversize fail-closed change, and the existing `O_NOFOLLOW`/streaming/prototype-pollution hardening (Codex P3 on the same review).
- Backfill the new `resolveProviderEnvAuthEvidence`/`listProviderEnvAuthLookupKeys`/`resolveProviderEnvAuthLookupKeys` exports in the fingerprint test's `model-auth-env-vars.js` mock so the suite runs at all after the recent `origin/main` merge added those call sites in `model-auth-env.ts` / provider secret-helpers.
Validation (from ~/projects/worktrees/openclaw/perf-models-config-cache-fingerprint):

- `pnpm vitest run src/agents/models-config.fingerprint-cache.test.ts`: Test Files 2 passed (2), Tests 12 passed (12)
- `pnpm tsgo:core`: 0 errors
- `pnpm tsgo:core:test`: 0 errors
- `pnpm oxlint src/agents/models-config.ts src/agents/models-config.fingerprint-cache.test.ts`: Found 0 warnings and 0 errors.
- `git diff --check`: clean

Beads: openclaw-9i0
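The fail-closed contract the commit describes can be sketched in isolation. This is a simplified standalone sketch, not the repository's `safeHashRegularFile` (it buffers with `readFile` instead of the hardened `O_NOFOLLOW` streaming open, and the helper name and caps are invented for the demo):

```typescript
import { createHash } from "node:crypto";
import { promises as fs } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Sketch of the fail-closed shape: null for symlinks, non-regular
// files, and anything over the cap. Crucially, never a size-derived
// sentinel like `oversize:${size}` that a same-size swap could evade.
async function hashRegularFileOrNull(
  pathname: string,
  maxBytes: number,
): Promise<string | null> {
  const lst = await fs.lstat(pathname).catch(() => null);
  if (!lst || lst.isSymbolicLink() || !lst.isFile()) return null;
  if (lst.size > maxBytes) return null; // fail closed: no sentinel
  const raw = await fs.readFile(pathname).catch(() => null);
  if (raw === null || raw.length > maxBytes) return null; // grew past cap between lstat and read
  return createHash("sha256").update(raw).digest("hex");
}

async function demo(): Promise<void> {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "failclosed-"));
  const file = path.join(dir, "auth-profiles.json");
  await fs.writeFile(file, JSON.stringify({ version: 1 }));
  console.log((await hashRegularFileOrNull(file, 64)) !== null); // small file hashes
  await fs.writeFile(file, "x".repeat(128));
  console.log(await hashRegularFileOrNull(file, 64)); // oversize now returns null
}

void demo();
```

Because every abnormal path collapses to `null`, a caller that compares fingerprints for cache hits is forced onto its re-plan path whenever the file is unhashable.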
1 parent 7cd2e8a commit 057024f

3 files changed

Lines changed: 78 additions & 12 deletions


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ Docs: https://docs.openclaw.ai
 - Plugins/update: make package upgrades swap pnpm/npm-prefix installs cleanly, keep legacy plugin install runtime chunks working, and on the beta channel fall back default-line npm plugins to default/latest when plugin beta releases are missing or fail install validation. Thanks @vincentkoc and @joshavant.
 - Plugins/active-memory: skip session-store channel entries that contain `:` when resolving the recall subagent's channel, so QQ c2c agent IDs (e.g. `c2c:10D4F7C2…`) and other scoped conversation IDs do not reach bundled-plugin `dirName` validation and crash the recall run. The same guard already applied to explicit `channelId` params (#76704); this extends it to store-derived channels. (#77396) Thanks @hclsys.
 - Sandbox/Windows: accept drive-absolute Docker bind sources while keeping sandbox blocked-path and allowed-root policy comparisons Windows-case-insensitive. (#42174) Thanks @6607changchun.
+- Agents/models-config: switch the `models.json` ready cache from mtime-based fingerprints to content-hashed `auth-profiles.json` fingerprints with a separate post-write `models.json` drift hash, so OAuth token rotations no longer invalidate the implicit-provider-discovery cache while external `models.json` edits still force a re-plan. Oversized auth-profiles/models.json files now fail closed (return null) instead of producing a size-only sentinel that a same-size content swap could evade (CWE-345); reads remain bounded-memory streaming with `O_NOFOLLOW`-hardened opens (CWE-59/CWE-400) and prototype-pollution-safe walks (CWE-1321). (#73260)
 - Agents/subagents: preserve every grouped child result when direct completion fallback has to bypass the requester-agent announce turn. Thanks @vincentkoc.
 - Agents/verbose: use compact explain-mode tool summaries for `/verbose` and progress drafts by default, with `agents.defaults.toolProgressDetail: "raw"` and per-agent overrides for debugging raw command/detail output.
 - Gateway/startup: keep model-catalog test helpers, run-session lookup code, QR pairing helpers, and TypeBox memory-tool schema construction out of hot startup import paths, reducing default gateway benchmark plugin-load and memory pressure.

src/agents/models-config.fingerprint-cache.test.ts

Lines changed: 55 additions & 0 deletions
@@ -18,6 +18,9 @@ vi.mock("./model-auth-env-vars.js", () => ({
   listKnownProviderEnvApiKeyNames: () => ["OPENAI_API_KEY"],
   PROVIDER_ENV_API_KEY_CANDIDATES: { openai: ["OPENAI_API_KEY"] },
   resolveProviderEnvApiKeyCandidates: () => ({ openai: ["OPENAI_API_KEY"] }),
+  resolveProviderEnvAuthEvidence: () => ({}),
+  listProviderEnvAuthLookupKeys: () => ["openai"],
+  resolveProviderEnvAuthLookupKeys: () => ["openai"],
 }));

 vi.mock("../plugins/provider-runtime.js", () => ({
@@ -256,4 +259,56 @@ describe("ensureOpenClawModelsJson fingerprint cache", () => {
     await ensureOpenClawModelsJson(cfgTwo, agentDir);
     expect(resolveImplicitProvidersCallCount).toBe(2);
   });
+
+  it("invalidates the cache when auth-profiles.json transitions to oversize (Aisle/Codex P2 fail-closed on #73260)", async () => {
+    // Regression for the size-only sentinel bypass: previously an
+    // oversized auth-profiles.json yielded a deterministic
+    // `oversize:${size}` hash, so a same-size content swap would
+    // preserve the cache hit. After the follow-up,
+    // `safeHashRegularFile` returns null on oversize — transitioning
+    // to oversize must therefore change the fingerprint and force a
+    // re-plan.
+    const agentDir = await fixtureSuite.createCaseDir("agent");
+    const cfg = createOpenAiConfig();
+
+    // Start with a small, hashable profile so the first call lands
+    // a cached entry keyed by a content-derived fingerprint.
+    await writeAuthProfiles(agentDir, {
+      version: 1,
+      profiles: {
+        "anthropic:default": {
+          type: "token",
+          provider: "anthropic",
+          token: "sk-ant-small", // pragma: allowlist secret
+        },
+      },
+    });
+    await ensureOpenClawModelsJson(cfg, agentDir);
+    const firstCount = resolveImplicitProvidersCallCount;
+    expect(firstCount).toBe(1);
+
+    // Now grow auth-profiles.json past the 8 MiB cap. The previous
+    // implementation would still produce a deterministic
+    // `oversize:<size>` hash; the follow-up fix returns null,
+    // changing the fingerprint and forcing a re-plan.
+    const target = path.join(agentDir, "auth-profiles.json");
+    const padding = "x".repeat(10 * 1024 * 1024); // 10 MiB > MAX_AUTH_PROFILES_BYTES (8 MiB)
+    await fs.writeFile(
+      target,
+      JSON.stringify({
+        version: 1,
+        padding,
+        profiles: {
+          "anthropic:default": {
+            type: "token",
+            provider: "anthropic",
+            token: "sk-ant-small", // pragma: allowlist secret
+          },
+        },
+      }),
+    );
+
+    await ensureOpenClawModelsJson(cfg, agentDir);
+    expect(resolveImplicitProvidersCallCount).toBe(firstCount + 1);
+  }, 20_000);
 });
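The sentinel bypass this regression test guards against is easy to demonstrate in isolation: two different payloads of equal byte length collide under a size-only fingerprint but not under a content hash. A standalone illustration (the `sizeSentinel`/`contentHash` names are invented for the demo, not project code):

```typescript
import { createHash } from "node:crypto";

// A size-only "fingerprint" like the old `oversize:${size}` sentinel:
// any two payloads with the same byte length collide.
const sizeSentinel = (buf: Buffer): string => `oversize:${buf.length}`;

// A content hash distinguishes them, which is why the fix prefers
// returning null (forcing a re-plan) over any size-derived value.
const contentHash = (buf: Buffer): string =>
  createHash("sha256").update(buf).digest("hex");

const original = Buffer.from('{"provider":"anthropic","token":"AAAA"}');
const swapped = Buffer.from('{"provider":"anthropic","token":"BBBB"}');

console.log(original.length === swapped.length); // true: same byte length
console.log(sizeSentinel(original) === sizeSentinel(swapped)); // true: sentinel collides
console.log(contentHash(original) === contentHash(swapped)); // false: hash differs
```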

src/agents/models-config.ts

Lines changed: 22 additions & 12 deletions
@@ -92,7 +92,14 @@ const DANGEROUS_PROTO_KEYS: ReadonlySet<string> = new Set([
  * and non-regular files via lstat before opening. Uses O_NOFOLLOW
  * where supported so a symlink swap-in between lstat and open also
  * fails closed.
- * - Returns null on any abnormality so callers force a re-plan.
+ * - Aisle medium / Codex P2 followup on #73260: oversized files now
+ *   return null (fail closed, CWE-345). The previous size-only
+ *   sentinel `oversize:${size}` let an attacker swap the contents of
+ *   an oversized file without changing its byte length and still hit
+ *   the cache; returning null forces the caller's re-plan path so the
+ *   cache cannot be evaded by a same-size content swap. This is
+ *   consistent with the "return null forces re-plan" contract used
+ *   everywhere else in this helper.
  *
  * The streaming reader is destroyed if accumulated bytes exceed maxBytes,
  * so an attacker cannot grow the file between lstat and read past the
@@ -101,18 +108,19 @@ const DANGEROUS_PROTO_KEYS: ReadonlySet<string> = new Set([
 async function safeHashRegularFile(
   pathname: string,
   maxBytes: number,
-): Promise<{ hash: string; raw: Buffer | null } | null> {
+): Promise<{ hash: string; raw: Buffer } | null> {
   // lstat + isFile() + isSymbolicLink() rejects symlinks and any
   // non-regular file (directory, socket, FIFO, device).
   const lst = await fs.lstat(pathname).catch(() => null);
   if (!lst || lst.isSymbolicLink() || !lst.isFile()) {
     return null;
   }
   if (lst.size > maxBytes) {
-    // File too large at lstat time — don't even open it. Caller forces
-    // re-plan. We return a deterministic sentinel hash for size-
-    // exceeded so size changes still invalidate the cache.
-    return { hash: `oversize:${lst.size}`, raw: null };
+    // Oversize at lstat time — fail closed. Returning null forces the
+    // caller to treat this file as unhashable, so the readyCache cannot
+    // grant a cache hit based on a size-derived sentinel that ignores
+    // content. See the JSDoc above for the threat model.
+    return null;
   }
   // Open with O_NOFOLLOW (where the platform supports it) to close a
   // narrow TOCTOU window between lstat and open: if a symlink is
@@ -140,6 +148,9 @@ async function safeHashRegularFile(
     stream.on("data", (chunk: Buffer) => {
       seen += chunk.length;
       if (seen > maxBytes) {
+        // File grew past the cap mid-read. Destroy with an error so
+        // the surrounding try/catch returns null (fail closed) —
+        // matching the lstat-time oversize check above.
         stream.destroy(new Error("file grew past cap during read"));
         return;
       }
@@ -161,18 +172,17 @@ const DANGEROUS_PROTO_KEYS: ReadonlySet<string> = new Set([
  * Compute a content-based fingerprint for auth-profiles.json that is
  * stable across OAuth token rotations. Returns null if the file does
  * not exist or fails the safe-read checks (symlink, non-regular,
- * oversize). Falls back to a raw-content hash if JSON parsing fails
- * (so structural changes still register, just without canonicalization).
+ * oversize, or any I/O error). Oversize files fail closed (return
+ * null) rather than producing a size-derived sentinel — see the
+ * `safeHashRegularFile` JSDoc for the threat model. Falls back to a
+ * raw-content hash if JSON parsing fails (so structural changes still
+ * register, just without canonicalization).
  */
 async function readAuthProfilesStableHash(pathname: string): Promise<string | null> {
   const safe = await safeHashRegularFile(pathname, MAX_AUTH_PROFILES_BYTES);
   if (!safe) {
     return null;
   }
-  if (safe.raw === null) {
-    // Oversize sentinel — caller invalidates cache by mismatch.
-    return safe.hash;
-  }
   let parsed: unknown;
   try {
     parsed = JSON.parse(safe.raw.toString("utf8"));
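The mid-read cap shown in the third hunk can also be sketched standalone: keep a running byte count in the `data` handler and destroy the stream with an error once it passes the cap, so the promise rejects and the caller fails closed. A simplified sketch under those assumptions (`boundedStreamHash` is an invented name; it omits the `O_NOFOLLOW` open and lstat checks of the real helper):

```typescript
import { createHash } from "node:crypto";
import { createReadStream, promises as fs } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Bounded streaming hash: destroy-with-error once the byte count passes
// maxBytes, so a file that grows between stat and read cannot force
// unbounded buffering; the caller sees a rejection and fails closed.
function boundedStreamHash(pathname: string, maxBytes: number): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash("sha256");
    const stream = createReadStream(pathname);
    let seen = 0;
    stream.on("data", (chunk) => {
      seen += chunk.length;
      if (seen > maxBytes) {
        // Destroying with an error emits 'error', rejecting the promise.
        stream.destroy(new Error("file grew past cap during read"));
        return;
      }
      hash.update(chunk as Buffer);
    });
    stream.on("error", reject);
    stream.on("end", () => resolve(hash.digest("hex")));
  });
}

async function demo(): Promise<void> {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "bounded-"));
  const file = path.join(dir, "models.json");
  await fs.writeFile(file, "x".repeat(256));
  console.log((await boundedStreamHash(file, 1024)).length); // sha256 hex is 64 chars
  const oversize = await boundedStreamHash(file, 100).catch(() => null);
  console.log(oversize); // capped read rejects, caller maps it to null
}

void demo();
```

The key design point mirrors the diff: both the lstat-time check and the mid-read check converge on the same outcome (no hash, forced re-plan), so there is no path on which an oversized file yields a stable, content-independent fingerprint.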
