Commit 057024f
fix(models-config): fail closed on oversized fingerprint reads (review feedback)
Codex P2 / Aisle medium #1 on PR #73260 flagged that `safeHashRegularFile` returned a deterministic `oversize:${lst.size}` sentinel when a file exceeded the cap, so an attacker could swap the *contents* of an oversized `auth-profiles.json` or `models.json` without changing its byte length and still hit the readyCache via the size-only sentinel comparison (CWE-345).

Switch to fail-closed:

- `safeHashRegularFile` now returns `null` for oversize at lstat time and for grow-past-cap mid-read (the latter was already caught via the destroy-with-error path; the JSDoc/comment now states that explicitly).
- The return type narrows from `{ hash; raw: Buffer | null } | null` to `{ hash; raw: Buffer } | null` — the `raw === null` branch in `readAuthProfilesStableHash` is now unreachable and removed.
- JSDoc on the helper and `readAuthProfilesStableHash` now describes the fail-closed semantics and links them back to the threat model.

Also:

- Add a regression test that warms the cache with a small, hashable `auth-profiles.json`, then grows it past `MAX_AUTH_PROFILES_BYTES` (8 MiB) and asserts the implicit-provider-discovery pipeline re-runs (i.e. the fingerprint changed instead of stably hitting an `oversize:<size>` sentinel).
- Add the missing `Unreleased` CHANGELOG entry covering the content-hashed fingerprint, the post-write models.json drift hash, the oversize fail-closed change, and the existing `O_NOFOLLOW`/streaming/prototype-pollution hardening (Codex P3 on the same review).
- Backfill the new `resolveProviderEnvAuthEvidence`/`listProviderEnvAuthLookupKeys`/`resolveProviderEnvAuthLookupKeys` exports in the fingerprint test's `model-auth-env-vars.js` mock so the suite runs at all after the recent `origin/main` merge added those call sites in `model-auth-env.ts` / provider secret-helpers.
Validation (from ~/projects/worktrees/openclaw/perf-models-config-cache-fingerprint):

- `pnpm vitest run src/agents/models-config.fingerprint-cache.test.ts`: Test Files 2 passed (2), Tests 12 passed (12)
- `pnpm tsgo:core`: 0 errors
- `pnpm tsgo:core:test`: 0 errors
- `pnpm oxlint src/agents/models-config.ts src/agents/models-config.fingerprint-cache.test.ts`: Found 0 warnings and 0 errors.
- `git diff --check`: clean

Beads: openclaw-9i0
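The fail-closed contract the commit describes can be sketched in isolation. This is a simplified standalone sketch, not the repository's `safeHashRegularFile` (it buffers with `readFile` instead of the hardened `O_NOFOLLOW` streaming open, and the helper name and caps are invented for the demo):

```typescript
import { createHash } from "node:crypto";
import { promises as fs } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Sketch of the fail-closed shape: null for symlinks, non-regular
// files, and anything over the cap. Crucially, never a size-derived
// sentinel like `oversize:${size}` that a same-size swap could evade.
async function hashRegularFileOrNull(
  pathname: string,
  maxBytes: number,
): Promise<string | null> {
  const lst = await fs.lstat(pathname).catch(() => null);
  if (!lst || lst.isSymbolicLink() || !lst.isFile()) return null;
  if (lst.size > maxBytes) return null; // fail closed: no sentinel
  const raw = await fs.readFile(pathname).catch(() => null);
  if (raw === null || raw.length > maxBytes) return null; // grew past cap between lstat and read
  return createHash("sha256").update(raw).digest("hex");
}

async function demo(): Promise<void> {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "failclosed-"));
  const file = path.join(dir, "auth-profiles.json");
  await fs.writeFile(file, JSON.stringify({ version: 1 }));
  console.log((await hashRegularFileOrNull(file, 64)) !== null); // small file hashes
  await fs.writeFile(file, "x".repeat(128));
  console.log(await hashRegularFileOrNull(file, 64)); // oversize now returns null
}

void demo();
```

Because every abnormal path collapses to `null`, a caller that compares fingerprints for cache hits is forced onto its re-plan path whenever the file is unhashable.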
1 parent 7cd2e8a commit 057024f

3 files changed

Lines changed: 78 additions & 12 deletions


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ Docs: https://docs.openclaw.ai
 - Plugins/update: make package upgrades swap pnpm/npm-prefix installs cleanly, keep legacy plugin install runtime chunks working, and on the beta channel fall back default-line npm plugins to default/latest when plugin beta releases are missing or fail install validation. Thanks @vincentkoc and @joshavant.
 - Plugins/active-memory: skip session-store channel entries that contain `:` when resolving the recall subagent's channel, so QQ c2c agent IDs (e.g. `c2c:10D4F7C2…`) and other scoped conversation IDs do not reach bundled-plugin `dirName` validation and crash the recall run. The same guard already applied to explicit `channelId` params (#76704); this extends it to store-derived channels. (#77396) Thanks @hclsys.
 - Sandbox/Windows: accept drive-absolute Docker bind sources while keeping sandbox blocked-path and allowed-root policy comparisons Windows-case-insensitive. (#42174) Thanks @6607changchun.
+- Agents/models-config: switch the `models.json` ready cache from mtime-based fingerprints to content-hashed `auth-profiles.json` fingerprints with a separate post-write `models.json` drift hash, so OAuth token rotations no longer invalidate the implicit-provider-discovery cache while external `models.json` edits still force a re-plan. Oversized auth-profiles/models.json files now fail closed (return null) instead of producing a size-only sentinel that a same-size content swap could evade (CWE-345); reads remain bounded-memory streaming with `O_NOFOLLOW`-hardened opens (CWE-59/CWE-400) and prototype-pollution-safe walks (CWE-1321). (#73260)
 - Agents/subagents: preserve every grouped child result when direct completion fallback has to bypass the requester-agent announce turn. Thanks @vincentkoc.
 - Agents/verbose: use compact explain-mode tool summaries for `/verbose` and progress drafts by default, with `agents.defaults.toolProgressDetail: "raw"` and per-agent overrides for debugging raw command/detail output.
 - Gateway/startup: keep model-catalog test helpers, run-session lookup code, QR pairing helpers, and TypeBox memory-tool schema construction out of hot startup import paths, reducing default gateway benchmark plugin-load and memory pressure.

src/agents/models-config.fingerprint-cache.test.ts

Lines changed: 55 additions & 0 deletions
@@ -18,6 +18,9 @@ vi.mock("./model-auth-env-vars.js", () => ({
   listKnownProviderEnvApiKeyNames: () => ["OPENAI_API_KEY"],
   PROVIDER_ENV_API_KEY_CANDIDATES: { openai: ["OPENAI_API_KEY"] },
   resolveProviderEnvApiKeyCandidates: () => ({ openai: ["OPENAI_API_KEY"] }),
+  resolveProviderEnvAuthEvidence: () => ({}),
+  listProviderEnvAuthLookupKeys: () => ["openai"],
+  resolveProviderEnvAuthLookupKeys: () => ["openai"],
 }));

 vi.mock("../plugins/provider-runtime.js", () => ({
@@ -256,4 +259,56 @@ describe("ensureOpenClawModelsJson fingerprint cache", () => {
     await ensureOpenClawModelsJson(cfgTwo, agentDir);
     expect(resolveImplicitProvidersCallCount).toBe(2);
   });
+
+  it("invalidates the cache when auth-profiles.json transitions to oversize (Aisle/Codex P2 fail-closed on #73260)", async () => {
+    // Regression for the size-only sentinel bypass: previously an
+    // oversized auth-profiles.json yielded a deterministic
+    // `oversize:${size}` hash, so a same-size content swap would
+    // preserve the cache hit. After the follow-up,
+    // `safeHashRegularFile` returns null on oversize — transitioning
+    // to oversize must therefore change the fingerprint and force a
+    // re-plan.
+    const agentDir = await fixtureSuite.createCaseDir("agent");
+    const cfg = createOpenAiConfig();
+
+    // Start with a small, hashable profile so the first call lands
+    // a cached entry keyed by a content-derived fingerprint.
+    await writeAuthProfiles(agentDir, {
+      version: 1,
+      profiles: {
+        "anthropic:default": {
+          type: "token",
+          provider: "anthropic",
+          token: "sk-ant-small", // pragma: allowlist secret
+        },
+      },
+    });
+    await ensureOpenClawModelsJson(cfg, agentDir);
+    const firstCount = resolveImplicitProvidersCallCount;
+    expect(firstCount).toBe(1);
+
+    // Now grow auth-profiles.json past the 8 MiB cap. The previous
+    // implementation would still produce a deterministic
+    // `oversize:<size>` hash; the follow-up fix returns null,
+    // changing the fingerprint and forcing a re-plan.
+    const target = path.join(agentDir, "auth-profiles.json");
+    const padding = "x".repeat(10 * 1024 * 1024); // 10 MiB > MAX_AUTH_PROFILES_BYTES (8 MiB)
+    await fs.writeFile(
+      target,
+      JSON.stringify({
+        version: 1,
+        padding,
+        profiles: {
+          "anthropic:default": {
+            type: "token",
+            provider: "anthropic",
+            token: "sk-ant-small", // pragma: allowlist secret
+          },
+        },
+      }),
+    );
+
+    await ensureOpenClawModelsJson(cfg, agentDir);
+    expect(resolveImplicitProvidersCallCount).toBe(firstCount + 1);
+  }, 20_000);
 });
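The sentinel bypass this regression test guards against is easy to demonstrate in isolation: two different payloads of equal byte length collide under a size-only fingerprint but not under a content hash. A standalone illustration (the `sizeSentinel`/`contentHash` names are invented for the demo, not project code):

```typescript
import { createHash } from "node:crypto";

// A size-only "fingerprint" like the old `oversize:${size}` sentinel:
// any two payloads with the same byte length collide.
const sizeSentinel = (buf: Buffer): string => `oversize:${buf.length}`;

// A content hash distinguishes them, which is why the fix prefers
// returning null (forcing a re-plan) over any size-derived value.
const contentHash = (buf: Buffer): string =>
  createHash("sha256").update(buf).digest("hex");

const original = Buffer.from('{"provider":"anthropic","token":"AAAA"}');
const swapped = Buffer.from('{"provider":"anthropic","token":"BBBB"}');

console.log(original.length === swapped.length); // true: same byte length
console.log(sizeSentinel(original) === sizeSentinel(swapped)); // true: sentinel collides
console.log(contentHash(original) === contentHash(swapped)); // false: hash differs
```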

src/agents/models-config.ts

Lines changed: 22 additions & 12 deletions
@@ -92,7 +92,14 @@ const DANGEROUS_PROTO_KEYS: ReadonlySet<string> = new Set([
  * and non-regular files via lstat before opening. Uses O_NOFOLLOW
  * where supported so a symlink swap-in between lstat and open also
  * fails closed.
- * - Returns null on any abnormality so callers force a re-plan.
+ * - Aisle medium / Codex P2 followup on #73260: oversized files now
+ *   return null (fail closed, CWE-345). The previous size-only
+ *   sentinel `oversize:${size}` let an attacker swap the contents of
+ *   an oversized file without changing its byte length and still hit
+ *   the cache; returning null forces the caller's re-plan path so the
+ *   cache cannot be evaded by a same-size content swap. This is
+ *   consistent with the "return null forces re-plan" contract used
+ *   everywhere else in this helper.
  *
  * The streaming reader is destroyed if accumulated bytes exceed maxBytes,
  * so an attacker cannot grow the file between lstat and read past the
@@ -101,18 +108,19 @@ const DANGEROUS_PROTO_KEYS: ReadonlySet<string> = new Set([
 async function safeHashRegularFile(
   pathname: string,
   maxBytes: number,
-): Promise<{ hash: string; raw: Buffer | null } | null> {
+): Promise<{ hash: string; raw: Buffer } | null> {
   // lstat + isFile() + isSymbolicLink() rejects symlinks and any
   // non-regular file (directory, socket, FIFO, device).
   const lst = await fs.lstat(pathname).catch(() => null);
   if (!lst || lst.isSymbolicLink() || !lst.isFile()) {
     return null;
   }
   if (lst.size > maxBytes) {
-    // File too large at lstat time — don't even open it. Caller forces
-    // re-plan. We return a deterministic sentinel hash for size-
-    // exceeded so size changes still invalidate the cache.
-    return { hash: `oversize:${lst.size}`, raw: null };
+    // Oversize at lstat time — fail closed. Returning null forces the
+    // caller to treat this file as unhashable, so the readyCache cannot
+    // grant a cache hit based on a size-derived sentinel that ignores
+    // content. See the JSDoc above for the threat model.
+    return null;
   }
   // Open with O_NOFOLLOW (where the platform supports it) to close a
   // narrow TOCTOU window between lstat and open: if a symlink is
@@ -140,6 +148,9 @@ async function safeHashRegularFile(
     stream.on("data", (chunk: Buffer) => {
       seen += chunk.length;
       if (seen > maxBytes) {
+        // File grew past the cap mid-read. Destroy with an error so
+        // the surrounding try/catch returns null (fail closed) —
+        // matching the lstat-time oversize check above.
         stream.destroy(new Error("file grew past cap during read"));
         return;
       }
@@ -161,18 +172,17 @@ const DANGEROUS_PROTO_KEYS: ReadonlySet<string> = new Set([
  * Compute a content-based fingerprint for auth-profiles.json that is
  * stable across OAuth token rotations. Returns null if the file does
  * not exist or fails the safe-read checks (symlink, non-regular,
- * oversize). Falls back to a raw-content hash if JSON parsing fails
- * (so structural changes still register, just without canonicalization).
+ * oversize, or any I/O error). Oversize files fail closed (return
+ * null) rather than producing a size-derived sentinel — see the
+ * `safeHashRegularFile` JSDoc for the threat model. Falls back to a
+ * raw-content hash if JSON parsing fails (so structural changes still
+ * register, just without canonicalization).
  */
 async function readAuthProfilesStableHash(pathname: string): Promise<string | null> {
   const safe = await safeHashRegularFile(pathname, MAX_AUTH_PROFILES_BYTES);
   if (!safe) {
     return null;
   }
-  if (safe.raw === null) {
-    // Oversize sentinel — caller invalidates cache by mismatch.
-    return safe.hash;
-  }
   let parsed: unknown;
   try {
     parsed = JSON.parse(safe.raw.toString("utf8"));
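The mid-read cap shown in the third hunk can also be sketched standalone: keep a running byte count in the `data` handler and destroy the stream with an error once it passes the cap, so the promise rejects and the caller fails closed. A simplified sketch under those assumptions (`boundedStreamHash` is an invented name; it omits the `O_NOFOLLOW` open and lstat checks of the real helper):

```typescript
import { createHash } from "node:crypto";
import { createReadStream, promises as fs } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Bounded streaming hash: destroy-with-error once the byte count passes
// maxBytes, so a file that grows between stat and read cannot force
// unbounded buffering; the caller sees a rejection and fails closed.
function boundedStreamHash(pathname: string, maxBytes: number): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash("sha256");
    const stream = createReadStream(pathname);
    let seen = 0;
    stream.on("data", (chunk) => {
      seen += chunk.length;
      if (seen > maxBytes) {
        // Destroying with an error emits 'error', rejecting the promise.
        stream.destroy(new Error("file grew past cap during read"));
        return;
      }
      hash.update(chunk as Buffer);
    });
    stream.on("error", reject);
    stream.on("end", () => resolve(hash.digest("hex")));
  });
}

async function demo(): Promise<void> {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "bounded-"));
  const file = path.join(dir, "models.json");
  await fs.writeFile(file, "x".repeat(256));
  console.log((await boundedStreamHash(file, 1024)).length); // sha256 hex is 64 chars
  const oversize = await boundedStreamHash(file, 100).catch(() => null);
  console.log(oversize); // capped read rejects, caller maps it to null
}

void demo();
```

The key design point mirrors the diff: both the lstat-time check and the mid-read check converge on the same outcome (no hash, forced re-plan), so there is no path on which an oversized file yields a stable, content-independent fingerprint.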
