perf(models-config): content-hash auth-profiles + models.json drift detection #73260
zeroaltitude wants to merge 12 commits into openclaw:main
Conversation
…etection

Replaces mtime-based fingerprinting with content-based hashes for the implicit-provider-discovery cache key, plus a second-factor models.json content hash that catches external edits, partial corruption, and sibling-process writes between cache hits. Includes the security hardening findings raised by Aisle on the original combined PR (openclaw#72869).

# Why content hashes

The previous mtime-based key invalidated on every OAuth token refresh because auth-profiles.json gets rewritten with new access/refresh timestamps even when the set of available providers doesn't change. Same for models.json mtime: the file is the OUTPUT of this function, so each call observed its own write and invalidated the next call.

Now:

- auth-profiles.json: SHA-256 over a stable serialization that strips volatile OAuth session fields (access, refresh, expires*, issuedAt, refreshed/lastChecked/lastRefresh/lastValidatedAt). Token rotation no longer invalidates; structural changes (added/removed profiles, rotated static type:token credentials) still do.
- models.json: NOT included in the input fingerprint (that would cause self-invalidation). Instead its content hash is captured at write time and stored alongside the readyCache entry. Every cache check recomputes the file hash and compares; any external edit, corruption, or tamper invalidates the cache and forces a re-plan.

# Security hardening (Aisle review on PR openclaw#72869)

- 🟠 High #1 (CWE-59 symlink-following chmod): ensureModelsFileMode now lstats first and refuses to chmod symlinks or non-regular files. Closes the attack where a symlink at ${agentDir}/models.json points at a sensitive owned file and our best-effort chmod follows it.
- 🟡 Med #3 (CWE-1321 prototype pollution): stripAuthProfilesVolatile now uses Object.create(null) for the result and explicitly filters __proto__ / prototype / constructor keys (belt and suspenders).
- 🟡 Med #4 (DoS via unbounded fingerprinting): MAX_AUTH_PROFILES_BYTES = 8 MiB cap (above which we hash raw bytes), MAX_AUTH_PROFILES_DEPTH = 64 with a depth-cap marker so deeply nested JSON cannot stack-overflow the gateway.
- 🟡 Med #5 (CWE-312 secrets in cache): buildModelsJsonFingerprint now returns the SHA-256 hex of the canonical payload instead of the raw stable-stringified payload. The readyCache holds no verbatim secret material; heap snapshots, debug telemetry, and core dumps cannot leak apiKey strings via this path.

# Token-rotation correctness

"token" is intentionally NOT in AUTH_PROFILE_VOLATILE_FIELDS even though OAuth-style `token` fields rotate. Profiles with type: 'token' use the literal token key as a long-lived static credential, and stripping it would mask real auth-state changes when a user rotates that credential. Documented inline.

# Tests

Two tests in models-config.fingerprint-cache.test.ts:

- 'does not invalidate the cache when OAuth session fields rotate' (rotates access/refresh/expires; cache stays valid)
- 'DOES invalidate the cache when a static type:token credential rotates' (rotates the literal token field; cache invalidates)

# Compatibility

MODELS_JSON_STATE.readyCache value shape extended with modelsJsonHash: { fingerprint, modelsJsonHash, result }. All three plan return paths (skip/noop/write) capture the post-write hash. The refreshedFingerprint re-key path forwards modelsJsonHash through unchanged.

This is one of two PRs split out of openclaw#72869. The targetProvider short-circuit work landed separately.
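The volatile-field stripping, prototype-pollution guard, depth cap, and stable hashing described above could be sketched roughly as follows. The function and constant names come from this description; the bodies are illustrative assumptions, not the shipped implementation:

```typescript
import { createHash } from "node:crypto";

// Names mirror the PR description; bodies are a sketch, not the real source.
const AUTH_PROFILE_VOLATILE_FIELDS = new Set([
  "access", "refresh", "issuedAt",
  "refreshed", "lastChecked", "lastRefresh", "lastValidatedAt",
]);
const MAX_AUTH_PROFILES_DEPTH = 64;

function stripAuthProfilesVolatile(value: unknown, depth = 0): unknown {
  if (depth > MAX_AUTH_PROFILES_DEPTH) return "[depth-capped]"; // depth-cap marker
  if (Array.isArray(value)) return value.map((v) => stripAuthProfilesVolatile(v, depth + 1));
  if (value === null || typeof value !== "object") return value;
  const out: Record<string, unknown> = Object.create(null); // no inherited prototype
  for (const [key, v] of Object.entries(value)) {
    // belt-and-suspenders prototype-pollution filter (CWE-1321)
    if (key === "__proto__" || key === "prototype" || key === "constructor") continue;
    // volatile OAuth session fields, including the expires* family
    if (AUTH_PROFILE_VOLATILE_FIELDS.has(key) || key.startsWith("expires")) continue;
    // note: "token" is deliberately NOT stripped (static type:token credential)
    out[key] = stripAuthProfilesVolatile(v, depth + 1);
  }
  return out;
}

// Stable serialization: sort object keys so on-disk key order cannot flip the hash.
function stableHash(value: unknown): string {
  const canon = JSON.stringify(value, (_k, v) =>
    v && typeof v === "object" && !Array.isArray(v)
      ? Object.fromEntries(Object.entries(v).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)))
      : v,
  );
  return createHash("sha256").update(canon).digest("hex");
}
```

With this shape, rotating access/refresh/expires leaves the hash unchanged while rotating a static token credential changes it, which is exactly the pair of behaviours the two tests above assert.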
🔒 Aisle Security Analysis

We found 2 potential security issues in this PR:
1. 🟡 Cache integrity check bypass for oversized models.json/auth-profiles.json (sentinel hash depends only on size)
Description
As a result, when a file remains oversized and an attacker (or any external process) modifies its contents without changing its size, the “hash” remains constant and the cache-integrity mechanism can be bypassed:
The same issue applies to

Vulnerable code:

if (lst.size > maxBytes) {
  return { hash: `oversize:${lst.size}`, raw: null };
}

This undermines the intended "detect external edits / tampering" property of the readyCache for oversized files.

Recommendation

Ensure the oversize path cannot produce stable cache hits based only on size. Options (pick one consistent with desired behavior):
if (lst.size > maxBytes) {
  return null; // option (1): oversize is an automatic cache miss / fingerprint change
}

// option (2), pseudo: feed a bounded prefix plus the size into the hasher
hash.update(prefixBytes);
hash.update(`|truncated|size:${fst.size}`);
return { hash: hash.digest("hex"), raw: null };
Approach (1) is simplest and most robust for integrity: oversize becomes an automatic cache miss.

2. 🔵 Symlink-following / TOCTOU risk when O_NOFOLLOW is unavailable in models.json handling
Description
In particular:
This can lead to unintended reads (hashing) of arbitrary regular files (bounded by

Vulnerable code (fallback branch):

const flags =
  typeof FS_CONSTANTS.O_NOFOLLOW === "number"
    ? FS_CONSTANTS.O_RDONLY | FS_CONSTANTS.O_NOFOLLOW
    : FS_CONSTANTS.O_RDONLY;
fh = await fs.open(pathname, flags);
// ... fh.stat() checks the opened target, not whether the path was a symlink
await fh.chmod(0o600);

Recommendation

Treat platforms without
Example hardening for

const lst = await fs.lstat(pathname).catch(() => null);
if (!lst || lst.isSymbolicLink() || !lst.isFile()) return;
const flags = FS_CONSTANTS.O_RDONLY; // still best-effort on platforms lacking O_NOFOLLOW
const fh = await fs.open(pathname, flags);
try {
const fst = await fh.stat();
if (!fst.isFile()) return;
await fh.chmod(0o600);
} finally {
await fh.close().catch(() => undefined);
}

Note: without

Analyzed PR: #73260 at commit
Last updated on: 2026-04-28T05:34:22Z
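For reference, the fd-based variant this recommendation points toward (and which a later commit on this PR describes) might look like the following sketch. The name ensureModelsFileMode comes from the PR; the body is an assumption, not the shipped code:

```typescript
import { constants as FS } from "node:fs";
import { open } from "node:fs/promises";

// Sketch: open with O_NOFOLLOW so the open itself refuses symlinks atomically,
// then fchmod through the validated handle. This removes the lstat->chmod
// race (CWE-367); platforms without O_NOFOLLOW remain best-effort.
async function ensureModelsFileMode(pathname: string): Promise<void> {
  const flags =
    typeof FS.O_NOFOLLOW === "number"
      ? FS.O_RDONLY | FS.O_NOFOLLOW
      : FS.O_RDONLY; // best-effort fallback where O_NOFOLLOW is unavailable
  const fh = await open(pathname, flags).catch(() => null); // ELOOP on symlink, ENOENT, ...
  if (!fh) return;
  try {
    const fst = await fh.stat();
    if (!fst.isFile()) return; // refuse non-regular files
    await fh.chmod(0o600); // fchmod on the opened fd, not the path
  } finally {
    await fh.close().catch(() => undefined);
  }
}
```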
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ea3ff6c8ab
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Greptile Summary

This PR replaces mtime-based cache keys with content-based SHA-256 hashes for the

One P2 to note: the oversize sentinel returned by

Confidence Score: 5/5

Safe to merge — the single finding is P2 (impractical edge case with no security or data-correctness impact at realistic file sizes). No P0 or P1 issues found. The security hardening (O_NOFOLLOW, fchmod-through-fd, prototype-pollution guard, depth cap) is correctly implemented. The two-factor cache logic is sound. The only finding is a documentation-vs-implementation gap in the oversize sentinel path that is practically unreachable at the configured thresholds.

src/agents/models-config.ts lines 111–116 (oversize sentinel semantics).

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/agents/models-config.ts
Line: 111-116
Comment:
**Oversize branch returns size sentinel, not a content hash**
The PR description and the `readAuthProfilesStableHash` comment both claim "raw-hash above cap," but the oversize branch returns a size-only sentinel (`oversize:${lst.size}`) without reading the file at all. Two files of the same byte-count but different content receive the same "hash," so a content change that leaves the size unchanged would not invalidate the fingerprint.
For the models.json drift path this matters only above 1 MiB (very unusual), but the misleading comment at line 173 — "Caller invalidates cache by mismatch" — is only true if the size also changes. Streaming the first N bytes through SHA-256 when the file exceeds the cap (capping what's fed to the hasher, not what's read into memory) would satisfy both the DoS-resistance goal and the content-sensitivity claim.
How can I resolve this? If you propose a fix, please make it concise.

Reviews (2): Last reviewed commit: "fix(models-config): streaming hash, O_NO..."
Addresses Codex P1, Greptile P2, and Aisle Med #1/#2/#3 on openclaw#73260:

# Streaming bounded hash (Codex P1 / Aisle Med #2)

The previous oversize branch in readAuthProfilesStableHash still did fs.readFile(path), which loaded the entire file into memory — the MAX_AUTH_PROFILES_BYTES cap was effectively unenforced. Same problem for readModelsJsonContentHash (no cap at all).

New helper safeHashRegularFile():

- lstat first; reject symlinks and non-regular files
- bail at lstat time if size exceeds maxBytes (returns a sentinel hash so a size change still invalidates the cache)
- open with O_NOFOLLOW where supported (closes the lstat→open TOCTOU)
- fstat after open to verify it's still a regular file
- createReadStream(fd) with a bounded highWaterMark; destroy if accumulated bytes exceed maxBytes (handles the case where an attacker grows the file between lstat and stream completion)

Both readers now route through safeHashRegularFile. models.json gets a 1 MiB cap (MAX_MODELS_JSON_BYTES); auth-profiles keeps its 8 MiB cap.

# O_NOFOLLOW chmod (Aisle Med #1, CWE-367)

The previous lstat-then-chmod sequence was racy: an attacker who could rename/replace ${agentDir}/models.json between the two syscalls could have chmod() follow a swapped-in symlink to an arbitrary file. Replaced with fs.open(path, O_RDONLY | O_NOFOLLOW) → fchmod via the file handle. The open itself refuses symlinks atomically; the fchmod operates on the validated fd, eliminating the race. When O_NOFOLLOW is unavailable (rare but possible), falls back to plain O_RDONLY — the prior lstat protection was already best-effort.

# Symlink-safe reads (Aisle Med #3, CWE-59)

The previous fs.readFile in the hash readers followed symlinks. With the safeHashRegularFile refactor above, both auth-profiles.json and models.json now reject symlinks before reading.
# JSDoc fix (Greptile P2)

readModelsJsonContentHash's prior comment said "forces a re-plan" when the file is absent, but the call site does a null === null comparison, which is a cache hit, not a re-plan (the steady-state skip case for plan.action === 'skip'). Updated the comment to explain the actual semantics: stable absence is a stable result; drift is a re-plan.

# Tests

14/14 existing tests pass. No new test cases needed — the existing file-mode and fingerprint-cache tests cover both the symlink-rejection paths (via the lstat shortcut) and the modelsJsonHash second factor. Lint: 0 errors.
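Taken together, the bullet list above describes roughly this shape. safeHashRegularFile is the name from the commit message; details are assumed, and the sketch is shown in the fail-closed form a later round of this PR adopts (oversize returns null rather than a size sentinel):

```typescript
import { createHash } from "node:crypto";
import { constants as FS } from "node:fs";
import { open, lstat } from "node:fs/promises";

// Sketch of the hardened bounded-read helper described above (assumed body).
async function safeHashRegularFile(pathname: string, maxBytes: number): Promise<string | null> {
  const lst = await lstat(pathname).catch(() => null);
  // reject symlinks and non-regular files before opening (CWE-59)
  if (!lst || lst.isSymbolicLink() || !lst.isFile()) return null;
  if (lst.size > maxBytes) return null; // fail closed on oversize
  const flags =
    typeof FS.O_NOFOLLOW === "number"
      ? FS.O_RDONLY | FS.O_NOFOLLOW // closes the lstat->open TOCTOU window
      : FS.O_RDONLY; // best-effort where O_NOFOLLOW is absent
  const fh = await open(pathname, flags).catch(() => null);
  if (!fh) return null;
  try {
    const fst = await fh.stat(); // re-verify via the opened fd
    if (!fst.isFile() || fst.size > maxBytes) return null;
    const hash = createHash("sha256");
    let seen = 0;
    for await (const chunk of fh.createReadStream({ highWaterMark: 64 * 1024 })) {
      const buf = chunk as Buffer;
      seen += buf.length;
      if (seen > maxBytes) return null; // file grew mid-read: abort, fail closed
      hash.update(buf);
    }
    return hash.digest("hex");
  } finally {
    await fh.close().catch(() => undefined);
  }
}
```

The key property is that every exit which cannot prove "regular file, within cap, fully hashed" returns null instead of a guessable value.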
Three findings landed in
All three readers (auth-profiles hash, models.json hash, chmod) now go through the same hardened code paths.
@greptile-apps: please do re-review/re-score
Codex review: needs changes before merge.

Summary

Reproducibility: yes. Current main source shows the cache fingerprint still depends on

Real behavior proof

Next step before merge

Security Review findings
Review details

Best possible solution: Land the content-outcome cache design after preserving token-profile expiry in the stable hash, adding focused regression coverage, and clearing exact-head validation.

Do we have a high-confidence way to reproduce the issue? Yes. Current main source shows the cache fingerprint still depends on

Is this the best way to solve the issue? No. The content-hash and fail-closed outcome direction is the right maintainable fix, but expiry fields should be stripped only where they are truly OAuth-volatile or otherwise preserved for

Full review comments:
Overall correctness: patch is incorrect Security concerns:
Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 95a1c915312a.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e6a37b84f3
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7cd2e8af7a
…w feedback)

Codex P2 / Aisle medium #1 on PR openclaw#73260 flagged that `safeHashRegularFile` returned a deterministic `oversize:${lst.size}` sentinel when a file exceeded the cap, so an attacker could swap the *contents* of an oversized `auth-profiles.json` or `models.json` without changing its byte length and still hit the readyCache via the size-only sentinel comparison (CWE-345).

Switch to fail-closed:

- `safeHashRegularFile` now returns `null` for oversize at lstat time and for grow-past-cap mid-read (the latter was already caught via the destroy-with-error path; the JSDoc/comment now states that explicitly).
- The return type narrows from `{ hash; raw: Buffer | null } | null` to `{ hash; raw: Buffer } | null` — the `raw === null` branch in `readAuthProfilesStableHash` is now unreachable and removed.
- JSDoc on the helper and `readAuthProfilesStableHash` now describes the fail-closed semantics and links them back to the threat model.

Also:

- Add a regression test that warms the cache with a small, hashable `auth-profiles.json`, then grows it past `MAX_AUTH_PROFILES_BYTES` (8 MiB) and asserts the implicit-provider-discovery pipeline re-runs (i.e. the fingerprint changed instead of stably hitting an `oversize:<size>` sentinel).
- Add the missing `Unreleased` CHANGELOG entry covering the content-hashed fingerprint, post-write models.json drift hash, the oversize fail-closed change, and the existing `O_NOFOLLOW`/streaming/prototype-pollution hardening (Codex P3 on the same review).
- Backfill the new `resolveProviderEnvAuthEvidence`/`listProviderEnvAuthLookupKeys`/`resolveProviderEnvAuthLookupKeys` exports in the fingerprint test's `model-auth-env-vars.js` mock so the suite runs at all after the recent `origin/main` merge added those call sites in `model-auth-env.ts` / provider secret-helpers.
Validation (from ~/projects/worktrees/openclaw/perf-models-config-cache-fingerprint):

pnpm vitest run src/agents/models-config.fingerprint-cache.test.ts
  Test Files  2 passed (2)
  Tests  12 passed (12)
pnpm tsgo:core       # 0 errors
pnpm tsgo:core:test  # 0 errors
pnpm oxlint src/agents/models-config.ts \
  src/agents/models-config.fingerprint-cache.test.ts
  Found 0 warnings and 0 errors.
git diff --check     # clean

Beads: openclaw-9i0
Follow-up addressing clawsweeper review (Codex P2 + P3) and Aisle medium #1 landed in

Codex P2 / Aisle medium #1 — size-only oversize sentinel could mask same-size content swaps (CWE-345)

The previous
Codex P3 — missing

Added an

Regression coverage
Validation (from

Aisle low #2 (TOCTOU on platforms without

@clawsweeper please re-review.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 057024fa46
…ransport check + bounded compare (review feedback)

Codex P1+P2 (and matching Aisle High #2 / medium #3) on PR openclaw#73261. ClawSweeper rejected the previous round's "wrong solution" — the global readyCache write after a provider-scoped check, an unbounded recursive stableEqual on disk-controlled state, no per-model transport drift check, and a fail-open malformed-apiKey branch.

Redesign:

# Provider-scoped readyCache (Codex P1)

The previous version called MODELS_JSON_STATE.readyCache.set(cacheKey, ...) after the targetProvider short-circuit, where cacheKey was the GLOBAL fingerprint key. A later non-targeted ensureOpenClawModelsJson call would hit that entry and skip the full plan even though only one provider was validated. After the redesign, successful short-circuits populate a SCOPED key:

`${targetPath}\0${fingerprint}\0sc:${targetProvider}`

which non-targeted callers never compute. Targeted callers consult the scoped key first, hit it warm if present (with the same modelsJsonHash two-factor verification as the global cache), and fall through to the disk-based check otherwise. Non-targeted callers always run a full plan to validate every provider.

# Per-model transport drift (Codex P1 / Aisle High #2)

The runtime consumes per-model `baseUrl` / `api` / `headers` from models.json (`pi-embedded-runner/model.ts` falls back to `discoveredModel.baseUrl` / `.api` / `.headers` when no provider-level override is set). The previous comparison only covered provider-level fields, so an attacker who could write models.json could keep provider-level apiKey/baseUrl/headers/auth intact while injecting a per-model `baseUrl: "https://attacker.example/v1"`, and the short-circuit would accept it. After the redesign, any disk-side per-model transport field forces a full re-plan that re-applies provider/plugin defaults and rewrites the file.
This is intentionally strict: the common case (no per-model transport overrides) still gets the short-circuit; configurations that legitimately use per-model transport just skip the fast path.

# Bounded structural equality (Codex P2 / Aisle medium #3)

`stableEqual = stableStringify(a) === stableStringify(b)` recurses without depth bounds. A small but pathologically nested `models.json` (e.g. `{a: {a: {a: ...}}}`) could stack-overflow the gateway during the short-circuit comparison. Added `stableStringifyBounded` (throws on overrun) + `stableEqualBounded` (catches the overrun and returns false — fail closed to the full plan). Used for baseUrl/headers/auth comparisons against disk-controlled state with `SHORT_CIRCUIT_COMPARE_MAX_DEPTH = 64`. The original unbounded `stableEqual` is removed; the trusted-input fingerprint pipeline still uses unbounded `stableStringify`.

# Fail closed on malformed disk apiKey (Codex P2)

When config has no apiKey, the previous fail-open branch only rejected non-empty strings on disk — numbers, null, objects, etc. were silently accepted. Now anything other than `undefined` or empty string forces a re-plan.

# Reuse openclaw#73260's safeHashRegularFile

The disk read in `readExistingProviderMatchesConfig` now uses `safeHashRegularFile` (the fingerprint-cache primitive) for O_NOFOLLOW open + lstat/fstat regular-file check + size cap + fail-closed on grow-past-cap. Drops the redundant `MAX_MODELS_JSON_SHORT_CIRCUIT_BYTES` constant in favour of the shared `MAX_MODELS_JSON_BYTES` from openclaw#73260.

# Tests

13/13 pass in src/agents/models-config.target-provider-short-circuit.test.ts:

- existing 5 (hit-on-match, miss-on-rotated-key, miss-on-baseUrl, miss-on-headers, miss-on-cold-cache, hit-after-warm-fingerprint) still pass with the new code path.
- `short-circuit-populates-scoped-cache` rewritten: third call hits the SCOPED cache entry with no fs.readFile (modelsJsonHash uses createReadStream).
- `scoped-cache-isolation` (new): targeted short-circuit followed by a non-targeted call with the same fingerprint runs a full plan (does NOT hit the scoped entry).
- `miss-on-per-model-baseUrl` / `miss-on-per-model-headers` / `miss-on-per-model-api` (new): per-model transport overrides on disk reject the short-circuit and fall through to a full plan.
- `miss-on-deep-nested-disk` (new): 200-level-deep adversarial disk headers fall through cleanly without throwing/overflowing.
- `miss-on-malformed-disk-apiKey` (new): non-string disk apiKey rejects when config has no apiKey.

Also backfills the missing `resolveProviderEnvAuthEvidence`/`listProviderEnvAuthLookupKeys`/`resolveProviderEnvAuthLookupKeys` exports in the `model-auth-env-vars.js` mock so the test suite runs against the post-merge `model-auth-env.ts` surface (same fix openclaw#73260's follow-up landed for its fingerprint-cache test).

# Validation (from worktree ~/projects/worktrees/openclaw/perf-models-config-target-provider):

pnpm vitest run src/agents/models-config.target-provider-short-circuit.test.ts
  Test Files  1 passed (1)
  Tests  13 passed (13)
pnpm vitest run src/agents/models-config.fingerprint-cache.test.ts \
  src/agents/models-config.target-provider-short-circuit.test.ts
  Test Files  2 passed (2)
  Tests  19 passed (19)
pnpm tsgo:core       # 0 errors
pnpm tsgo:core:test  # 0 errors
pnpm oxlint src/agents/models-config.ts \
  src/agents/models-config.target-provider-short-circuit.test.ts
  Found 0 warnings and 0 errors.
git diff --check     # clean

Beads: openclaw-l5q
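The depth-bounded comparator described under "Bounded structural equality" could be sketched like this. The names stableStringifyBounded, stableEqualBounded, and SHORT_CIRCUIT_COMPARE_MAX_DEPTH come from the commit message; the implementation is assumed:

```typescript
const SHORT_CIRCUIT_COMPARE_MAX_DEPTH = 64;

class DepthOverrunError extends Error {}

// Canonical stringification with sorted keys; throws past the depth cap.
function stableStringifyBounded(value: unknown, depth = 0): string {
  if (depth > SHORT_CIRCUIT_COMPARE_MAX_DEPTH) throw new DepthOverrunError();
  if (Array.isArray(value)) {
    return `[${value.map((v) => stableStringifyBounded(v, depth + 1)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${stableStringifyBounded(v, depth + 1)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value) ?? "undefined";
}

// Fail closed: a depth overrun counts as "not equal", which falls
// through to a full re-plan rather than trusting disk-controlled state.
function stableEqualBounded(a: unknown, b: unknown): boolean {
  try {
    return stableStringifyBounded(a) === stableStringifyBounded(b);
  } catch (err) {
    if (err instanceof DepthOverrunError) return false;
    throw err;
  }
}
```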
…ache on uncacheable auth-profiles (review feedback)

Round-4 follow-up to Codex P1 + P2 on PR openclaw#73260. Both findings were real holes in the round-3 "return null on oversize" change:

1. Codex P1 — "Treat unhashable models.json as cache drift" (src/agents/models-config.ts:474): the cache-hit predicate accepted `currentModelsJsonHash === settled.modelsJsonHash`, but `readModelsJsonContentHash` returned `null` for several distinct failure modes (oversize >1 MiB, symlinked, non-regular, I/O error) IN ADDITION to the legitimate "file absent" case. Anyone who could put models.json into one of those states could then mutate its contents repeatedly while every read returned null, and the cache kept hitting via `null === null`.

2. Codex P2 — "Distinguish oversize auth-profiles from stable fingerprints" (src/agents/models-config.ts:123): `readAuthProfilesStableHash` returned `null` for any oversize auth-profiles.json, so all >8 MiB variants collapsed to a single fingerprint contribution. Once the file was already oversize, credential edits that kept it oversize no longer changed the fingerprint and therefore no longer invalidated the ready cache.

Fix: replace the `string | null` return with a discriminated outcome (`ContentHashOutcome` in `models-config-state.ts`):

  | { kind: "absent" }        // legitimate file-does-not-exist steady state
  | { kind: "hashed"; hash }  // successful bounded-stream read
  | { kind: "uncacheable" }   // oversize, symlink, non-regular, or I/O error

Predicate (`modelsContentOutcomesMatch`):

- `absent === absent` → cache hit (skip-noop on a missing file).
- `hashed === hashed` → cache hit iff hashes are identical.
- `uncacheable` on EITHER side → ALWAYS a miss. Two `uncacheable` outcomes do not match each other, because they could correspond to different attacker-controlled bodies that just happen to trip the same cap.
Auth-profiles cache bypass: when `readAuthProfilesStableOutcome` returns `uncacheable`, `ensureOpenClawModelsJson` bypasses the ready cache entirely — neither reading from it nor writing to it — and re-plans on every call. This stops an attacker who keeps the file oversize from accumulating dead entries in the in-memory cache, and stops same-size content swaps from riding a stale entry. When the file returns to a hashable state, the next call lands a fresh cache entry keyed by the new content fingerprint.

Defensive guard: `buildModelsJsonFingerprint` now refuses to be called with an `uncacheable` auth-profiles outcome. Callers must bypass the cache instead.

Also:

- Refactor `safeHashRegularFile` → `safeReadFileOutcome`, returning the discriminated outcome. ENOENT is the only lstat error treated as `absent`; every other error (EACCES, etc.) is `uncacheable`, so silent permission failures do not masquerade as a stable absence. All other behaviour is preserved (O_NOFOLLOW-hardened open, fstat re-check, bounded streaming hash, abort-on-grow-past-cap).
- Update `ContentHashOutcome` JSDoc + the `MODELS_JSON_STATE` cache shape from `modelsJsonHash: string | null` to `modelsJsonOutcome: ContentHashOutcome`.
- Rewrite the cache-flow JSDoc (Greptile P2 noted the previous JSDoc was the opposite of the actual behaviour).
- Add two regression tests:
  * "forces a re-plan on every call while auth-profiles.json stays oversize (Codex P2 round-4 on openclaw#73260)" — warms the cache with a small profile, verifies the repeat-call hit, then oversizes the file and asserts that EVERY subsequent call re-plans (including byte-identical and same-byte-length swapped contents). Restoring the file to a small state re-enables caching on the next call.
  * "invalidates the cache when models.json becomes unhashable (Codex P1 round-4 on openclaw#73260)" — warms the cache with a real write, verifies the repeat-call hit, then externally grows models.json past `MAX_MODELS_JSON_BYTES` and asserts the next call re-plans.
- Update CHANGELOG to describe the round-4 fail-closed contract.

Validation (from ~/projects/worktrees/openclaw/perf-models-config-cache-fingerprint):

pnpm vitest run src/agents/models-config.fingerprint-cache.test.ts
  Test Files  2 passed (2)
  Tests  16 passed (16)
pnpm tsgo:core       # 0 errors
pnpm tsgo:core:test  # 0 errors
pnpm oxlint src/agents/models-config.ts \
  src/agents/models-config-state.ts \
  src/agents/models-config.fingerprint-cache.test.ts
  Found 0 warnings and 0 errors.
git diff --check     # clean

Beads: openclaw-t9l
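The discriminated outcome and its match predicate described above reduce to a small amount of code. The shape is taken from the commit message and should be read as a sketch, not the verbatim source of `models-config-state.ts`:

```typescript
// Sketch of ContentHashOutcome / modelsContentOutcomesMatch (assumed shape).
type ContentHashOutcome =
  | { kind: "absent" } // legitimate file-does-not-exist steady state
  | { kind: "hashed"; hash: string } // successful bounded-stream read
  | { kind: "uncacheable" }; // oversize, symlink, non-regular, or I/O error

function modelsContentOutcomesMatch(a: ContentHashOutcome, b: ContentHashOutcome): boolean {
  // uncacheable never matches anything, not even another uncacheable:
  // two oversize files could hold different attacker-controlled bodies.
  if (a.kind === "uncacheable" || b.kind === "uncacheable") return false;
  // stable absence is a stable result (the skip-noop steady state)
  if (a.kind === "absent" && b.kind === "absent") return true;
  return a.kind === "hashed" && b.kind === "hashed" && a.hash === b.hash;
}
```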
Round 4 — addressing Codex P1 + P2 follow-up

Both findings on the round-3 commit were real holes; round-3 collapsed
Commit:

1. Codex P1 — "Treat unhashable models.json as cache drift"
Solution. Introduce

| { kind: "absent" }        // legitimate file-does-not-exist
| { kind: "hashed"; hash } // bounded-stream read succeeded
| { kind: "uncacheable" }   // oversize, symlink, non-regular, I/O error

Replace the
Stored cache shape:

2. Codex P2 — "Distinguish oversize auth-profiles from stable fingerprints"
Solution.
When the file returns to a hashable state, the next call lands a fresh

Validation

Two regression tests added:
Live runtime proof (no vitest, no
…hort-circuit (review feedback)

Round-5 follow-up to Codex P1 + P2 on PR openclaw#73261:

1. Codex P1 — "Compare provider-level api before short-circuiting" (openclaw#73261 (comment)): readExistingProviderMatchesConfig compared baseUrl, apiKey, headers, and auth, but NOT the provider-level `api` field. The runtime consumes `models.providers.<id>.api` at the same priority as the other transport fields (see pi-embedded-runner/model.ts and models-json/plan.ts), so an attacker who can write models.json could swap a provider's transport flavor (e.g. `"openai-completions" -> "openai-responses"`) and the short-circuit would re-bless it, because the per-model loop only flagged `api` set on disk-side MODEL rows, not on the provider itself.

2. Codex P2 — "Treat unhashable models.json as cache miss" (openclaw#73261 (comment)): the previous round-4 short-circuit used a `string | null` hash, so an `uncacheable` models.json (oversize, symlink, I/O error) collapsed with the legitimate "file absent" case via `null === null`. This overlapped with the round-4 fix on openclaw#73260 (commit 05fda3d), which introduced a discriminated `ContentHashOutcome` type; the short-circuit's scoped-cache compare and disk-based check both needed to consume that contract via `modelsContentOutcomesMatch` and `safeReadFileOutcome` to inherit the fail-closed semantics (CWE-345).

The merge of origin/perf/models-config-cache-fingerprint into this branch (commit 9f74081) wired the new `ContentHashOutcome` contract into the targetProvider short-circuit:

- readExistingProviderMatchesConfig now uses `safeReadFileOutcome` instead of the removed `safeHashRegularFile`, so an `uncacheable` models.json on a fresh process (cold readyCache) refuses to bypass the planner.
- Provider-level `api` comparison was added between the apiKey check and the headers check, with the same symmetric depth-bounded structural comparator already used for baseUrl/headers/auth. Drift (including config-undefined vs. disk-string) falls through to a full plan, which re-applies provider/plugin defaults and rewrites the file.
- The scoped-cache compare swapped `currentModelsJsonHash === settled.modelsJsonHash` for `modelsContentOutcomesMatch(currentModelsJsonOutcome, settled.modelsJsonOutcome)`, so an `uncacheable` outcome on either side never matches itself or a captured hash. The post-validation outcome capture also refuses to write a scoped entry when the file is `uncacheable`, preventing dead entries from accumulating during oversize states.
- Cache-bypass mode (`!cacheable`) covers the targetProvider hint via the same `planAndWrite` sentinel-fingerprint path, so an `uncacheable` auth-profiles.json forces a full re-plan even when the caller hinted a provider.

This commit adds:

- 4 targeted regression tests in src/agents/models-config.target-provider-short-circuit.test.ts:
  * "miss-on-provider-api-drift" — disk-side provider api flipped to a different transport family; planner rewrites.
  * "miss-on-provider-api-config-undefined" — config has no `api`; any disk-side `api` rejects the short-circuit.
  * "miss-on-unhashable-models-json" — warm scoped-cache entry + models.json grown past MAX_MODELS_JSON_BYTES forces a re-plan via the `uncacheable` outcome compare.
  * "miss-on-cold-uncacheable-models-json" — fresh process state + oversize models.json refuses the disk-based short-circuit via `safeReadFileOutcome`.
- A CHANGELOG.md update on the existing models-config bullet to list `api` in the compared provider fields and mention the fail-closed `ContentHashOutcome` contract.

Validation (from ~/projects/worktrees/openclaw/perf-models-config-target-provider):

pnpm vitest run \
  src/agents/models-config.target-provider-short-circuit.test.ts \
  src/agents/models-config.fingerprint-cache.test.ts \
  src/agents/models-config.write-serialization.test.ts \
  src/agents/models-config.skips-writing-models-json-no-env-token.test.ts \
  src/agents/models-config.runtime-source-snapshot.test.ts
  Test Files  10 passed (10)
  Tests  82 passed (82)
pnpm tsgo:core       # 0 errors
pnpm tsgo:core:test  # 0 errors
pnpm oxlint src/agents/models-config.ts \
  src/agents/models-config-state.ts \
  src/agents/models-config.target-provider-short-circuit.test.ts \
  src/agents/models-config.fingerprint-cache.test.ts
  Found 0 warnings and 0 errors.
git diff --check     # clean

Real-runtime proof was captured by driving the production `ensureOpenClawModelsJson` directly via tsx (no test harness, no mocks); all three scenarios — steady-state hit, provider-api drift forces re-plan, oversize models.json forces re-plan — printed PROVED lines and exited 0. Output recorded on the PR comment.

Beads: openclaw-l5q
…che-fingerprint

# Conflicts:
#   CHANGELOG.md
#   docs/.generated/plugin-sdk-api-baseline.sha256
#   src/agents/models-config.ts
@clawsweeper please re-review. Round 4 follow-up commit
…l-closed behavior (review feedback)

Clawsweeper P3 on PR openclaw#73260 round 4: the JSDoc on `MAX_AUTH_PROFILES_BYTES` still claimed "Above the cap we hash raw bytes instead" — a description that matched the pre-round-4 raw-hash behavior but contradicts the fail-closed model the round-4 commit (`05fda3d839`) actually implemented.

The current behavior is: when an auth-profiles.json file exceeds `MAX_AUTH_PROFILES_BYTES`, `safeReadFileOutcome` returns `{ kind: "uncacheable" }`, and `ensureOpenClawModelsJson` bypasses the ready cache entirely. The file is never partially hashed; an oversize auth-profiles.json cannot ride a stale cache entry.

The round-4 PR comment claimed this prose was already updated; it wasn't. Now updated with explicit references to `safeReadFileOutcome`, `ContentHashOutcome`, and `modelsContentOutcomesMatch` so future readers follow the fail-closed model rather than the obsolete raw-hash model.

No code-path change; comment-only.
Round 5 — actually addressing Clawsweeper P3 (oversize auth-profile cap comment)

Round-4 PR comment (2026-05-06T21:16) claimed the stale cap comment at [P3] Update the oversize auth-profile cap comment —
…tests

The checks-node-core failure on run 25480122830 (https://github.com/openclaw/openclaw/actions/runs/25480122830) is unrelated to this PR: it hits image/music/video-generation runtime tests that pass locally on this HEAD (`pnpm vitest run src/image-generation/runtime.test.ts` passes clean). No code changes.
Cyclical PR review loop: still alive, no new reviewer comments at 2026-05-08T03:04Z; re-checking on next heartbeat.
# Summary
Splits the cache-fingerprint half of #72869 into a standalone PR.
This PR replaces mtime-based cache-key inputs with content-based hashes for the `ensureOpenClawModelsJson` implicit-provider-discovery cache, plus a second-factor models.json content hash that catches external edits, partial corruption, and sibling-process writes between cache hits. Includes the security hardening findings raised by Aisle on the original combined PR.

# Why content hashes
The previous mtime-based key invalidated on every OAuth token refresh because `auth-profiles.json` gets rewritten with new access/refresh timestamps even when the set of available providers does not change. Same for the `models.json` mtime: the file is the OUTPUT of this function, so each call observed its own write and invalidated the next call.

Now:
- `auth-profiles.json`: SHA-256 over a stable serialization that strips volatile OAuth session fields (`access`, `refresh`, `expires*`, `issuedAt`, `refreshed`/`lastChecked`/`lastRefresh`/`lastValidatedAt`). Token rotation no longer invalidates; structural changes (added/removed profiles, rotated static `type:token` credentials) still do.
- `models.json`: NOT included in the input fingerprint (that would cause self-invalidation). Instead its content hash is captured at write time and stored alongside the readyCache entry; every cache check recomputes the file hash and compares, so any external edit, corruption, or tamper invalidates the cache and forces a re-plan.

# Security hardening (Aisle review on PR #72869)
- `ensureModelsFileMode` now lstats first; refuses to chmod symlinks or non-regular files
- `Object.create(null)` for the stripped result + explicit `__proto__`/`prototype`/`constructor` filter
- `MAX_AUTH_PROFILES_BYTES = 8 MiB` (raw-hash above cap), `MAX_AUTH_PROFILES_DEPTH = 64` with depth marker
- `buildModelsJsonFingerprint` returns SHA-256 hex of the canonical payload instead of the raw stable-stringified payload

# Token-rotation correctness
`token` is intentionally NOT in `AUTH_PROFILE_VOLATILE_FIELDS` even though OAuth-style `token` fields rotate. Profiles with `type: "token"` use the literal `token` key as a long-lived static credential, and stripping it would mask real auth-state changes when a user rotates that credential. Documented inline.

# Tests
Two tests in `models-config.fingerprint-cache.test.ts`:

- `does not invalidate the cache when OAuth session fields rotate` (rotates access/refresh/expires; cache stays valid)
- `DOES invalidate the cache when a static type:token credential rotates` (rotates the literal `token` field; cache invalidates)

The existing fingerprint-cache test suite still passes.
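What those two tests exercise can be sketched with a minimal stand-alone version of the volatile-field stripping. The field list and the names `AUTH_PROFILE_VOLATILE_FIELDS` and `stripAuthProfilesVolatile` follow the PR description, but the bodies here (and the helper name `fingerprintAuthProfiles`) are illustrative, not the shipped code:

```typescript
import { createHash } from "node:crypto";

// Volatile OAuth session fields (from the PR description); `token` is
// deliberately absent so static type:"token" credentials stay fingerprinted.
const AUTH_PROFILE_VOLATILE_FIELDS = new Set([
  "access", "refresh", "issuedAt", "refreshed",
  "lastChecked", "lastRefresh", "lastValidatedAt",
]);
const BLOCKED_KEYS = new Set(["__proto__", "prototype", "constructor"]);

function stripAuthProfilesVolatile(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(stripAuthProfilesVolatile);
  if (value === null || typeof value !== "object") return value;
  const out: Record<string, unknown> = Object.create(null); // no prototype to pollute
  for (const [key, v] of Object.entries(value as Record<string, unknown>)) {
    if (BLOCKED_KEYS.has(key)) continue;                 // CWE-1321 guard
    if (AUTH_PROFILE_VOLATILE_FIELDS.has(key)) continue; // session churn
    if (key.startsWith("expires")) continue;             // expires* wildcard
    out[key] = stripAuthProfilesVolatile(v);
  }
  return out;
}

// Stable serialization (recursive key sort) + SHA-256 over the stripped value.
function fingerprintAuthProfiles(profiles: unknown): string {
  const canon = (v: unknown): unknown => {
    if (Array.isArray(v)) return v.map(canon);
    if (v === null || typeof v !== "object") return v;
    return Object.fromEntries(
      Object.entries(v as Record<string, unknown>)
        .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
        .map(([k, x]) => [k, canon(x)]),
    );
  };
  const stable = JSON.stringify(canon(stripAuthProfilesVolatile(profiles)));
  return createHash("sha256").update(stable).digest("hex");
}
```

Rotating `access`/`refresh`/`expires*` leaves the digest unchanged, while rotating a `type: "token"` profile's `token` changes it — exactly the cache-invalidation split the two tests assert.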
# Compatibility
`MODELS_JSON_STATE.readyCache` value shape extended with `modelsJsonHash`: `{ fingerprint, modelsJsonHash, result }`. All three plan return paths (skip/noop/write) capture the post-write hash. The `refreshedFingerprint` re-key path forwards `modelsJsonHash` through unchanged.

# Related
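The extended cache-entry shape can be sketched as a TypeScript type; `fingerprint`, `modelsJsonHash`, and `result` come from the description above, while `ReadyCacheEntry` and the `ContentHashOutcome` variants are hypothetical names for this sketch:

```typescript
// Hypothetical sketch of the extended readyCache entry; the real
// MODELS_JSON_STATE shape lives in src/agents/models-config-state.ts and may differ.
type ContentHashOutcome =
  | { kind: "hash"; sha256: string }
  | { kind: "missing" }
  | { kind: "uncacheable" };

interface ReadyCacheEntry<R> {
  fingerprint: string;                // hash over stripped auth-profiles.json and other inputs
  modelsJsonHash: ContentHashOutcome; // captured at write time, re-checked on every cache hit
  result: R;                          // the planned skip/noop/write result
}

// Example entry of the shape all three plan return paths produce.
const example: ReadyCacheEntry<string> = {
  fingerprint: "deadbeef",
  modelsJsonHash: { kind: "hash", sha256: "abc123" },
  result: "write",
};
```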
# Real behavior proof
Behavior or issue addressed: Models-config cache-fingerprint fail-closed behavior for unhashable/oversize `models.json` and `auth-profiles.json` from PR perf(models-config): content-hash auth-profiles + models.json drift detection #73260.

Real environment tested: Local OpenClaw topic branch `perf/models-config-cache-fingerprint` at `05fda3d839`, isolated temporary agent directory, production `ensureOpenClawModelsJson` invoked through a `tsx` runtime driver with no Vitest mocks.

Exact steps or command run after this patch: Ran the proof driver after the patch to warm the cache on a small auth profile, grow `auth-profiles.json` past the 8 MiB cap, repeat calls with byte-identical and swapped oversize contents, restore a small file, then exercise oversize and symlinked `models.json`.

Evidence after fix: Full copied runtime trace is in the PR comment: perf(models-config): content-hash auth-profiles + models.json drift detection #73260 (comment)
Excerpt of copied live output:
Observed result after fix: Uncacheable models content never matches cached state, oversize auth profiles bypass the ready cache instead of collapsing to a stale fingerprint, and restoring cacheable content re-enables fast hits.
What was not tested: Nothing beyond the isolated local runtime proof and the supplemental automated validation for this PR's changed behavior.