
perf(agents): stabilize models.json cache fingerprint and add targetProvider short-circuit [claude, human developer oversight]#72869

Closed
zeroaltitude wants to merge 9 commits into openclaw:main from zeroaltitude:perf/models-config-startup-cache-upstream

Conversation


@zeroaltitude zeroaltitude commented Apr 27, 2026

Problem

The in-memory cache in ensureOpenClawModelsJson was effectively disabled in practice. Every call recomputed a fingerprint that differed from the previous call's fingerprint, so the implicit-provider-discovery pipeline (which takes several seconds to scan installed plugins for provider contributions) ran on every invocation instead of once per real config change.

Two root causes:

  1. The fingerprint included auth-profiles.json mtime. That file is rewritten whenever any OAuth token in it is refreshed (access token, refresh token, expiry), even though the set of providers the user can actually use is unchanged. Every refresh invalidated the cache on the next call.

  2. The fingerprint included models.json mtime. That file is the output of this function. When a call wrote models.json, the next call observed a newer mtime and invalidated its own cached result — a self-defeating fingerprint.

Fix 1 — Stabilize the cache fingerprint (commit 918f3f9173)

Replace the mtime-based inputs with:

  • A content-based hash of auth-profiles.json that strips volatile OAuth fields (access, refresh, token, expires, expiresAt, expiresIn, issuedAt, refreshedAt, lastCheckedAt, lastRefreshAt, lastValidatedAt). Structural changes (adding/removing a profile, changing provider or accountId) still invalidate the cache.
  • Drop modelsFileMtimeMs entirely. External edits to models.json are already detected downstream in the plan layer by comparing existingRaw against the computed plan.
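As a sketch of the strip-then-hash approach (helper names and the JSON shape of auth-profiles.json are illustrative, not the PR's actual code):

```typescript
import { createHash } from "node:crypto";

// Volatile OAuth fields per the PR description. Note: "token" was later
// removed from the real set in d505fa064c (see the review threads below).
const VOLATILE = new Set([
  "access", "refresh", "expires", "expiresAt", "expiresIn",
  "issuedAt", "refreshedAt", "lastCheckedAt", "lastRefreshAt", "lastValidatedAt",
]);

// Recursively drop volatile fields and sort keys, then hash a stable encoding.
function stripVolatile(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(stripVolatile);
  if (value && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(value).sort()) {
      if (VOLATILE.has(key)) continue;
      out[key] = stripVolatile((value as Record<string, unknown>)[key]);
    }
    return out;
  }
  return value;
}

function stableAuthProfilesHash(profiles: unknown): string {
  return createHash("sha256")
    .update(JSON.stringify(stripVolatile(profiles)))
    .digest("hex");
}

// A token refresh touches only volatile fields: hash unchanged, cache kept.
// Adding a profile is a structural change: hash changes, cache invalidated.
const before = { "openai:default": { provider: "openai", access: "a1", expiresAt: 100 } };
const after  = { "openai:default": { provider: "openai", access: "a2", expiresAt: 200 } };
const added  = { ...before, "groq:default": { provider: "groq" } };
console.log(stableAuthProfilesHash(before) === stableAuthProfilesHash(after)); // true
console.log(stableAuthProfilesHash(before) === stableAuthProfilesHash(added)); // false
```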

Measured impact: on a workspace with ~37 discovery plugins, each call to runEmbeddedPiAgent drops from ~5s to sub-millisecond in the steady-state path.

Fix 2 — targetProvider short-circuit (commit 74e4d04771)

When a caller knows the exact provider it intends to use (e.g. the pi-embedded runner, which always has a resolved provider/modelId before calling ensureOpenClawModelsJson), there is no reason to walk the full implicit-provider-discovery pipeline — models.json just needs to contain that one provider with usable credentials.

Adds an optional EnsureOpenClawModelsJsonOptions.targetProvider hint. When set, and the provider is present in both the in-memory config and on-disk models.json with some form of usable credential material (string apiKey, unresolved-secret apiKey object, populated headers, or explicit auth config), skip the fingerprint + plan + discovery path entirely and return immediately.

Threaded through from compactEmbeddedPiSessionDirect and runEmbeddedPiAgent so every embedded-pi dispatch benefits.

This is on top of the cache-fingerprint stabilization above and covers the cold-cache case (first call per gateway start, per-subagent spawn with a fresh cache, etc.) where even the fingerprint computation can be avoided.
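A minimal sketch of the short-circuit predicate as described (the type shapes and helper names are assumptions, not the PR's actual code):

```typescript
// Hypothetical provider-entry shape covering the credential forms listed above.
type ProviderEntry = {
  apiKey?: string | { secret: string };      // string, or unresolved-secret object
  headers?: Record<string, string>;
  auth?: Record<string, unknown>;
};

// "Usable credential material" per the description: string apiKey,
// unresolved-secret apiKey object, populated headers, or explicit auth config.
function hasUsableCredentials(entry: ProviderEntry | undefined): boolean {
  if (!entry) return false;
  if (typeof entry.apiKey === "string" && entry.apiKey.length > 0) return true;
  if (entry.apiKey && typeof entry.apiKey === "object") return true;
  if (entry.headers && Object.keys(entry.headers).length > 0) return true;
  if (entry.auth && Object.keys(entry.auth).length > 0) return true;
  return false;
}

// Short-circuit only when the target provider has usable credentials in BOTH
// the in-memory config and the on-disk models.json; otherwise fall through
// to the fingerprint + plan + discovery path.
function canShortCircuit(
  targetProvider: string,
  configProviders: Record<string, ProviderEntry>,
  diskProviders: Record<string, ProviderEntry>,
): boolean {
  return (
    hasUsableCredentials(configProviders[targetProvider]) &&
    hasUsableCredentials(diskProviders[targetProvider])
  );
}

console.log(canShortCircuit("openai",
  { openai: { apiKey: "sk-live" } },
  { openai: { apiKey: "sk-live" } })); // true
console.log(canShortCircuit("openai",
  { openai: { apiKey: "sk-live" } },
  { openai: {} })); // false: no usable credentials on disk
```

Note this presence check says nothing about whether the disk entry still matches the config; that gap is what the review threads below focus on.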

Tests

src/agents/models-config.fingerprint-cache.test.ts adds 226 lines covering:

  • Volatile OAuth field stripping (auth-profiles.json content hash stability across token refresh).
  • Structural changes still invalidate (adding/removing a profile).
  • models.json writes do not invalidate the cache on next read.
  • targetProvider short-circuit hits when provider is configured with usable credentials.
  • targetProvider short-circuit fall-through when credentials are missing.
  • targetProvider ignored when not set (existing path preserved).

AI/Vibe-Coded disclosure

This PR was prepared with the assistance of Claude under human developer oversight. The diagnosis came from observing 5-second startup pauses on every runEmbeddedPiAgent call in production despite no actual config changes; instrumenting ensureOpenClawModelsJson revealed both fingerprint instabilities. Tested against my live workspace (37 plugins, several OAuth-using providers) over multiple gateway restarts.

Appreciate a sanity check on whether the cache layer is still the right place for this optimisation, or if there's a different layer you'd prefer.

…freshes

The in-memory cache in ensureOpenClawModelsJson was effectively disabled in
practice: every call recomputed a fingerprint that differed from the previous
call's fingerprint, so the implicit-provider-discovery pipeline (which takes
several seconds to scan installed plugins for provider contributions) ran on
every invocation instead of once per real config change.

Two root causes:

1. The fingerprint included mtime of auth-profiles.json. That file is
   rewritten whenever any OAuth token in it is refreshed (access token,
   refresh token, expiry), even though the set of providers the user can
   actually use is unchanged. Every refresh invalidated the cache on the next
   call.

2. The fingerprint included mtime of models.json. That file is the OUTPUT of
   this function. When a call wrote models.json, the NEXT call observed a
   newer mtime and invalidated its own cached result.

Replace the mtime-based inputs with a content-based hash of auth-profiles.json
that strips volatile OAuth fields (access, refresh, token, expires,
expiresAt, expiresIn, issuedAt, refreshedAt, lastCheckedAt, lastRefreshAt,
lastValidatedAt). Structural changes (adding/removing a profile, changing
provider or accountId) still invalidate the cache. Drop the models.json input
entirely: external edits to that file are already detected downstream in the
plan layer by comparing existingRaw against the computed plan.

Measured impact: on a workspace with ~37 discovery plugins, each call to
runEmbeddedPiAgent dropped from ~5s to sub-millisecond in the models.json
ensure step after the first call per gateway start. This affects every
inbound message, every subagent spawn, every cron-dispatched agent turn,
and every Active Memory lookup.

Add src/agents/models-config.fingerprint-cache.test.ts covering:
 - cache hit on identical inputs
 - cache hit across simulated OAuth token rotation
 - cache invalidation when a profile is added/removed
 - cache invalidation when config changes
…s already configured

When a caller knows the exact provider it intends to use (e.g. the pi-embedded
runner, which always has a resolved provider/modelId before it calls
ensureOpenClawModelsJson), there is no reason to walk the full implicit-
provider-discovery pipeline — models.json just needs to contain that one
provider with usable credentials.

Add an optional EnsureOpenClawModelsJsonOptions.targetProvider hint. When set,
and the provider is present in both the in-memory config and on-disk
models.json with some form of usable credential material (string apiKey,
unresolved-secret apiKey object, populated headers, or explicit auth config),
skip the fingerprint + plan + discovery path entirely and return immediately.

Thread the hint through from compactEmbeddedPiSessionDirect and
runEmbeddedPiAgent so every embedded-pi dispatch benefits.

This is on top of the cache-fingerprint stabilization in 49f7450 and covers
the cold-cache case (first call per gateway start, per-subagent spawn with a
fresh cache, etc.) where even the fingerprint path can be avoided.
# Conflicts:
#	src/agents/pi-embedded-runner/run.ts
@openclaw-barnacle Bot added the agents (Agent runtime and tooling) and size: M labels on Apr 27, 2026
@zeroaltitude zeroaltitude marked this pull request as ready for review April 27, 2026 16:53
aisle-research-bot Bot commented Apr 27, 2026

🔒 Aisle Security Analysis

We found 4 potential security issue(s) in this PR:

  1. 🟠 High: Symlink/hardlink attack via predictable temp file in atomic models.json write
  2. 🟡 Medium: Potential DoS via unbounded fs.readFile() of oversized auth-profiles.json during fingerprinting
  3. 🟡 Medium: Unbounded read + JSON.parse of models.json allows memory/CPU exhaustion (DoS)
  4. 🟡 Medium: Incomplete disk-vs-config validation in models.json short-circuit allows persistent tampering of provider behavior
1. 🟠 Symlink/hardlink attack via predictable temp file in atomic models.json write
Severity: High
CWE: CWE-377
Location: src/agents/models-config.ts:284-287

Description

writeModelsFileAtomicForModelsJson writes to a predictable temporary path in the agent directory using fs.writeFile(). If an attacker can create files inside ${agentDir} (e.g., via an environment override of OPENCLAW_AGENT_DIR pointing to a shared/writable directory, or other local write access), they can pre-create tempPath as a symlink (or hardlink) to another file. fs.writeFile() will follow symlinks and overwrite the link target, enabling arbitrary file overwrite as the gateway user (CWE-59/CWE-377).

Vulnerable flow:

  • Input/control: attacker can influence filesystem state in agentDir (place a symlink at the predicted *.tmp path)
  • Sink: fs.writeFile(tempPath, ...) follows symlinks and writes attacker-controlled JSON content into the symlink target
  • Result: arbitrary file clobbering (and subsequent fs.rename behavior may also replace models.json)

Vulnerable code:

const tempPath = `${targetPath}.${process.pid}.${Date.now()}.tmp`;
await fs.writeFile(tempPath, contents, { mode: 0o600 });
await fs.rename(tempPath, targetPath);

Recommendation

Avoid writing via a predictable path with writeFile() that can follow symlinks.

Mitigations (combine):

  1. Create the temp file securely with O_EXCL (fails if it already exists) and never follow existing symlinks:
import { open, rename } from "node:fs/promises";
import * as path from "node:path";
import * as crypto from "node:crypto";

export async function writeModelsFileAtomicForModelsJson(targetPath: string, contents: string) {
  const dir = path.dirname(targetPath);
  const name = `.${path.basename(targetPath)}.${crypto.randomUUID()}.tmp`;
  const tempPath = path.join(dir, name);

  const fh = await open(tempPath, "wx", 0o600); // "wx" = O_CREAT | O_EXCL: fails if the path already exists
  try {
    await fh.writeFile(contents, { encoding: "utf8" });
    await fh.sync();
  } finally {
    await fh.close();
  }
  await rename(tempPath, targetPath);
}
  2. Optionally lstat the directory and/or targetPath before and after to ensure it is not a symlink and is within the expected directory.

This prevents a local attacker from using symlinks/hardlinks to redirect writes to unintended files.

2. 🟡 Potential DoS via unbounded fs.readFile() of oversized auth-profiles.json during fingerprinting
Severity: Medium
CWE: CWE-400
Location: src/agents/models-config.ts:97-105

Description

readAuthProfilesStableHash() attempts to cap expensive parsing when auth-profiles.json exceeds MAX_AUTH_PROFILES_BYTES, but in the oversized-file branch it still reads the entire file into memory and hashes it.

  • Input: auth-profiles.json located under agentDir (default under the user-writable state dir, and can also be overridden via env), so a local/untrusted party who can write to that path can supply an arbitrarily large file.
  • Sink: await fs.readFile(pathname) reads the whole file into memory, and createHash(...).update(raw) performs CPU work proportional to file size.
  • Impact: memory exhaustion and CPU exhaustion (DoS) when this fingerprint function is invoked.

Vulnerable code:

if (stat.size > MAX_AUTH_PROFILES_BYTES) {
  let raw: Buffer;
  try {
    raw = await fs.readFile(pathname);
  } catch {
    return null;
  }
  return createHash("sha256").update(raw).digest("hex");
}

Recommendation

Enforce a hard maximum read size for auth-profiles.json even in the oversized-file path.

Options:

  1. Reject oversized files outright (simplest):
if (stat.size > MAX_AUTH_PROFILES_BYTES) {
  return null; // or return a fixed marker hash like sha256("[too-large]")
}
  2. Stream the hash with an upper bound and abort once the limit is exceeded:
import { createReadStream } from "node:fs";
import { createHash } from "node:crypto";

const stream = createReadStream(pathname, { highWaterMark: 64 * 1024 });
const hash = createHash("sha256");
let total = 0;
for await (const chunk of stream) {
  total += chunk.length;
  if (total > MAX_AUTH_PROFILES_BYTES) {
    stream.destroy(new Error("auth-profiles.json too large"));
    return null; // or marker hash
  }
  hash.update(chunk);
}
return hash.digest("hex");

This prevents allocating/hashing attacker-controlled multi-GB files and keeps the fingerprinting work bounded.

3. 🟡 Unbounded read + JSON.parse of models.json allows memory/CPU exhaustion (DoS)
Severity: Medium
CWE: CWE-400
Location: src/agents/models-config.ts:225-463

Description

ensureOpenClawModelsJson() introduces new helpers that read models.json fully into memory (sometimes multiple times) and may parse it as JSON without any file-size cap.

If an attacker (or untrusted caller) can influence agentDir (e.g. via RunEmbeddedPiAgentParams.agentDir) or can tamper with ${agentDir}/models.json, they can place an extremely large file causing:

  • Large Buffer/string allocations via fs.readFile()
  • CPU spike and event-loop blocking via JSON.parse()
  • Repeated reads (hashing + short-circuit comparison + plan layer) amplifying the impact

Vulnerable code paths:

  • readModelsJsonContentHash() reads the entire file as a Buffer.
  • readExistingProviderMatchesConfig() reads the entire file as UTF-8 and performs JSON.parse().

Vulnerable code:

// reads whole file into memory
const raw = await fs.readFile(pathname);

// reads whole file into a string and parses it
raw = await fs.readFile(targetPath, "utf8");
parsed = JSON.parse(raw);

Recommendation

Add a strict size cap (and ideally streaming) for models.json reads.

Suggested approach:

  1. stat() the file first and refuse / re-plan / truncate if above a reasonable bound (e.g. 1–8 MiB).
  2. Avoid JSON.parse() on oversized content; treat as drift/corruption and rewrite from a computed plan.
  3. For hashing, prefer streaming (createReadStream) to avoid allocating a full buffer.

Example (size cap + streaming hash):

const MAX_MODELS_JSON_BYTES = 8 * 1024 * 1024;

async function readModelsJsonContentHash(pathname: string): Promise<string | null> {
  const stat = await fs.stat(pathname).catch(() => null);
  if (!stat) return null;
  if (stat.size > MAX_MODELS_JSON_BYTES) return null; // force re-plan / treat as corrupted

  return await new Promise((resolve, reject) => {
    const hash = createHash("sha256");
    const stream = createReadStream(pathname);
    stream.on("data", (chunk) => hash.update(chunk));
    stream.on("error", () => resolve(null));
    stream.on("end", () => resolve(hash.digest("hex")));
  });
}

async function safeReadModelsJsonUtf8(pathname: string): Promise<string | null> {
  const stat = await fs.stat(pathname).catch(() => null);
  if (!stat || stat.size > MAX_MODELS_JSON_BYTES) return null;
  return fs.readFile(pathname, "utf8");
}

Then in readExistingProviderMatchesConfig(), use the capped reader and return false (no short-circuit) when the file is too large/unreadable so the normal plan/write path can restore a valid small file.

4. 🟡 Incomplete disk-vs-config validation in models.json short-circuit allows persistent tampering of provider behavior
Severity: Medium
CWE: CWE-345
Location: src/agents/models-config.ts:444-515

Description

ensureOpenClawModelsJson() introduces a short-circuit path that skips full planning/writing of models.json when a target provider “matches” config. However, the match function only compares a subset of provider fields (baseUrl, apiKey, headers, auth).

As a result, an attacker (or accidental corruption) that can modify ${agentDir}/models.json can change other provider fields that affect runtime behavior (e.g. api transport selection, models list/model metadata, or other provider-specific knobs) while keeping those four checked fields unchanged. The gateway will then:

  • treat the disk state as valid and return early
  • skip the plan step that would otherwise normalize and overwrite models.json

This permits persistence of tampered provider settings across runs even when the caller provides a correct config, because the early return happens before the later “modelsJsonHash” cache second-factor logic.

Vulnerable code (short-circuit match omits other provider fields):

// compares only baseUrl, apiKey, headers, auth
if (!stableEqual(configuredProvider.headers, diskProvider.headers)) return false;
if (!stableEqual(configuredProvider.auth, diskProvider.auth)) return false;
return true;

Recommendation

Tighten the short-circuit criteria to ensure the on-disk provider entry is fully consistent with what planOpenClawModelsJson() would produce.

Options:

  1. Compare the entire provider object (minus known volatile/secret-marker fields):
// simplest: require exact structural equality of the normalized provider config
// (ideally after running the same normalization used by the planner)
if (!stableEqual(configuredProvider, diskProvider)) return false;
  2. At minimum include additional behavior-affecting fields such as api and models (and any provider-specific routing/config keys):
if (!stableEqual(configuredProvider.api, diskProvider.api)) return false;
if (!stableEqual(configuredProvider.models, diskProvider.models)) return false;
  3. Additionally (defense-in-depth), include a models.json content hash check even in the short-circuit path (or call into the normal cache+plan path) so any tampering anywhere in the file prevents early return.

Analyzed PR: #72869 at commit 1fadcde

Last updated on: 2026-04-28T04:15:41Z


greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR fixes two real performance bugs in ensureOpenClawModelsJson: replacing mtime-based fingerprint inputs with a content hash (stripping volatile OAuth fields) and adding a targetProvider short-circuit that skips the discovery pipeline when the provider is already on disk. The fingerprint stabilization is well-reasoned and the implementation is clean.

  • P1 — short-circuit returns stale on-disk credentials: The fast path at lines 283–293 only checks that the provider key exists in both the in-memory config and models.json; it does not check whether the on-disk credentials match the current config. After a credential rotation and gateway restart (empty in-memory cache, stale models.json), the short-circuit fires and resolveModelAsync reads the old key — exactly the scenario the fingerprint cache would have caught.
  • P2 — "token" in volatile fields: type: "token" auth profiles use the token field as a long-lived API credential; stripping it means rotating that credential won't invalidate the fingerprint cache.
  • P2 — missing tests: Despite the description listing short-circuit tests, only fingerprint-cache tests appear in the added test file.

Confidence Score: 3/5

Not safe to merge as-is — the targetProvider short-circuit can silently serve stale credentials after a config rotation.

A single P1 is present on a hot path (every runEmbeddedPiAgent call). The fingerprint-cache stabilization is sound, but the short-circuit bypasses the same config-change detection the cache relies on, creating a regression for the credential-rotation scenario.

src/agents/models-config.ts lines 283–293 (short-circuit logic); src/agents/models-config.fingerprint-cache.test.ts (missing short-circuit tests).

Review comments

src/agents/models-config.ts, lines 283–293:
**Short-circuit fires on stale on-disk credentials**

`explicitHasTarget` only checks that the provider key exists in the current config (`Boolean(explicitProviders[targetProvider])`); it says nothing about whether the *content* of that config (e.g. `apiKey`, `baseUrl`) matches what was written to `models.json`. `readExistingProviderIsConfigured` only checks that *some* non-empty credential exists on disk — not that it matches the current config.

Concrete failure path: the user rotates their API key in the config, restarts the gateway (clearing the in-memory fingerprint cache), and the first call after restart has `provider = "openai"`. `explicitHasTarget` is `true` (openai still exists), `onDiskHasTarget` is `true` (old key is still a non-empty string), so the short-circuit returns `{ wrote: false }` — but `models.json` still contains the old, revoked key. `resolveModelAsync` then reads the stale file and all requests fail silently until something else forces a cache miss.

The fingerprint-based cache (lines 295–307) would have caught this because the fingerprint includes the full `config` object. The short-circuit bypasses that check entirely.

A minimal fix would be to also verify that the provider's entry in models.json structurally matches what the current config would produce — or at minimum, not short-circuit when the fingerprint-cache is empty (cold start), because that is precisely the scenario where stale disk state needs to be reconciled.
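A sketch of that structural-match guard (hypothetical names; the PR's `stableEqual` helper presumably plays this role in the real code):

```typescript
// Order-independent canonical JSON encoding, so key order on disk is irrelevant.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  if (value && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return `{${keys
      .map((k) => `${JSON.stringify(k)}:${stableStringify((value as Record<string, unknown>)[k])}`)
      .join(",")}}`;
  }
  return JSON.stringify(value);
}

// Only allow the short-circuit when the on-disk provider entry structurally
// matches what the current config would produce; a rotated key falls through
// to the plan path, which rewrites models.json.
function diskMatchesConfig(configEntry: unknown, diskEntry: unknown): boolean {
  return stableStringify(configEntry) === stableStringify(diskEntry);
}

console.log(diskMatchesConfig(
  { baseUrl: "https://api.example.test", apiKey: "sk-new" },
  { apiKey: "sk-old", baseUrl: "https://api.example.test" })); // false: rotated key
console.log(diskMatchesConfig(
  { baseUrl: "https://api.example.test", apiKey: "sk-new" },
  { apiKey: "sk-new", baseUrl: "https://api.example.test" })); // true
```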

src/agents/models-config.fingerprint-cache.test.ts, lines 101–226:
**`targetProvider` short-circuit is untested**

The PR description lists six test cases, three of which cover the `targetProvider` short-circuit ("hits when provider is configured with usable credentials", "fall-through when credentials are missing", "ignored when not set"). None of those tests appear in this file — only the four fingerprint-cache tests are present.

The missing case that matters most is *short-circuit fires with stale on-disk credentials*: after a config credential rotation the short-circuit should fall through to the fingerprint path, not return early with the old key still on disk.


Reviews (1): Last reviewed commit: "Merge main into perf/models-config-start..."

Comment on lines +24 to +36
const AUTH_PROFILE_VOLATILE_FIELDS: ReadonlySet<string> = new Set([
"access",
"refresh",
"token",
"expires",
"expiresAt",
"expiresIn",
"issuedAt",
"refreshedAt",
"lastCheckedAt",
"lastRefreshAt",
"lastValidatedAt",
]);
Contributor

P2 "token" in volatile fields strips long-lived credentials

"token" is included as a volatile field, which is appropriate for OAuth session tokens. However, type: "token" profiles in auth-profiles.json (e.g., the "anthropic:default" profile in the test) use token as the long-lived API credential — not as a refresh-cycle access token. When that credential is revoked and replaced in auth-profiles.json, the content hash won't change (the field is stripped), so the fingerprint-based cache won't invalidate, and models.json won't be updated with the new credential until a full cache reset.

If "token" here is indeed a short-lived session token that gets auto-refreshed by Anthropic's CLI auth flow (similar to access/refresh), this is intentional and should be documented on the field. If it can also be a manually-managed long-lived key, it should be excluded from the volatile set (or handled conditionally based on type).

Contributor Author

Fixed in d505fa064c — same fix as the Codex thread above. token removed from the volatile set.

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6b2a26d37d


Comment on lines 118 to 122
config: params.config,
sourceConfigForSecrets: params.sourceConfigForSecrets,
envShape,
authProfilesMtimeMs,
modelsFileMtimeMs,
authProfilesHash,
});

P1: Include models.json drift in cache key

This fingerprint no longer includes any models.json state, so once MODELS_JSON_STATE.readyCache is populated, unchanged config/auth inputs return early and never re-run planOpenClawModelsJson. If models.json is edited externally (manual edit, other process, partial corruption) after the first run, ensureOpenClawModelsJson will keep returning cached success and leave the file stale for the rest of the process. The previous mtime-based key at least forced re-planning after file edits.


Contributor Author

Acknowledged. The models.json drift / external-edit issue overlaps with the targetProvider short-circuit semantics and the cache-key shape, so I'm holding it for a follow-up where we can settle the approach (always re-plan vs. validate-disk-matches-config vs. content-hash the file into the fingerprint). The security fixes that just landed (d505fa0) are independent and should be safe to land first.

Contributor Author

Fixed in 235e3c2dcf. The fingerprint cache now has a two-factor key: fingerprint (config + auth-profiles + plugin metadata, hashed) AND modelsJsonHash (SHA-256 of the file as captured immediately after the plan-and-write). On every cache check, we recompute the file hash and compare; any external edit / corruption / tamper invalidates. Captured at all three plan return paths (skip, noop, write) so the second factor is always recorded.

The earlier comment about 'never include models.json state' was about avoiding self-invalidation — the new design captures the hash AT WRITE TIME, not as part of the input fingerprint, so the loop that previously caused every run to invalidate its own cache is avoided.
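The two-factor key described here can be sketched as follows (cache shape and helper names are illustrative, not the code from 235e3c2dcf):

```typescript
import { createHash } from "node:crypto";

// Hypothetical cache shape for the two-factor key: input fingerprint AND the
// models.json content hash captured immediately after the plan-and-write.
type ReadyCache = { fingerprint: string; modelsJsonHash: string } | null;
let readyCache: ReadyCache = null;

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// Called at each plan return path (skip, noop, write): record both factors.
function recordCache(fingerprint: string, modelsJsonContents: string): void {
  readyCache = { fingerprint, modelsJsonHash: sha256(modelsJsonContents) };
}

// On every cache check: the input fingerprint must match AND the file on disk
// must still hash to what was captured, so any external edit invalidates.
function cacheHit(fingerprint: string, currentModelsJsonContents: string): boolean {
  return (
    readyCache !== null &&
    readyCache.fingerprint === fingerprint &&
    readyCache.modelsJsonHash === sha256(currentModelsJsonContents)
  );
}

recordCache("fp-1", `{"providers":{}}`);
console.log(cacheHit("fp-1", `{"providers":{}}`));       // true: steady state
console.log(cacheHit("fp-1", `{"providers":{"x":{}}}`)); // false: external edit
console.log(cacheHit("fp-2", `{"providers":{}}`));       // false: config changed
```

Because the file hash is an output captured at write time rather than an input to the fingerprint, writing models.json no longer invalidates the very cache entry that write created.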

Comment on lines +25 to +29
"access",
"refresh",
"token",
"expires",
"expiresAt",

P2: Keep token credentials in auth-profile fingerprint

Treating token as a volatile field masks real auth-state changes for type: "token" profiles. Provider discovery derives usable credentials from cred.token (via resolveApiKeyFromCredential), so token updates that do not add/remove profile IDs will no longer invalidate this cache. That can leave implicit providers stuck in an outdated configured/unconfigured state until the cache is reset or the process restarts.


Contributor Author

Fixed in d505fa064c. Removed 'token' from AUTH_PROFILE_VOLATILE_FIELDS. The volatile set now keeps only OAuth session fields (access, refresh) and timing fields. Static type: 'token' profiles' literal token key is now part of the fingerprint, so rotating a static API token invalidates the cache as expected. Documented the boundary inline.

…artup-cache-upstream

# Conflicts:
#	src/agents/models-config.ts
@openclaw-barnacle Bot added the app: web-ui, gateway (Gateway runtime), channel: feishu (Channel integration: feishu), and triage: dirty-candidate (broad unrelated surfaces; may need splitting or cleanup) labels on Apr 28, 2026
Five issues raised by Aisle / Codex / Greptile review on PR openclaw#72869,
addressed inline rather than deferred:

1. CWE-59 symlink-following chmod (Aisle high #1)
   ensureModelsFileModeForModelsJson called fs.chmod on a path that
   may be replaced by a symlink. If an attacker can write to the agent
   dir (or OPENCLAW_AGENT_DIR points there), the chmod followed the
   link and changed perms on an arbitrary owned file. Now lstat first
   and refuse to chmod symlinks or non-regular files.

2. Prototype pollution via JSON keys (Aisle medium #3 / CWE-1321)
   stripAuthProfilesVolatileFields() copied untrusted keys into a
   plain {} object. Special keys '__proto__', 'constructor',
   'prototype' could mutate the result's prototype chain. Now uses
   Object.create(null) for the result and explicitly filters those
   three keys (belt-and-suspenders).

3. DoS via unbounded auth-profiles fingerprinting (Aisle medium #4)
   readAuthProfilesStableHash had no size or depth limits.
   - Added MAX_AUTH_PROFILES_BYTES = 8 MiB. Above the cap we hash raw
     bytes instead of running JSON.parse + recursive transform +
     stable-stringify.
   - Added MAX_AUTH_PROFILES_DEPTH = 64 with a depth-cap marker so
     the recursive walk can't stack-overflow on pathologically nested
     input.

4. 'token' incorrectly stripped from fingerprint (Codex P2 / Greptile P2)
   AUTH_PROFILE_VOLATILE_FIELDS included 'token' to keep OAuth session
   token rotation from invalidating the cache. But profiles with
   type: 'token' use the literal 'token' key as a long-lived static
   credential — stripping it would mask real auth-state changes when
   a user rotates a static API token. Removed 'token' from the
   volatile set and documented the boundary inline. OAuth session
   fields ('access', 'refresh') and timing fields stay stripped.

Skipped from this commit (will reply on threads):
- Cache short-circuit on stale on-disk credentials (Codex/Greptile P1):
  separate concern from the security fixes; needs design discussion
  on whether to validate disk-vs-config or remove the short-circuit.
- models.json drift in cache key (Codex P1): same — touches the
  fingerprint shape and overlaps with the targetProvider short-circuit.
- targetProvider short-circuit untested (Greptile P2): test follow-up
  once the short-circuit semantics are settled.
- Aisle medium #5 (raw secrets in fingerprint cache): structurally
  larger refactor; needs to land separately to keep this commit's
  blast radius clear.

Lint: 0 errors. TS: clean.
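The caps in fix 3 can be sketched as follows. This is a hypothetical sketch, not the real readAuthProfilesStableHash: the constants mirror the commit message, while authProfilesStableHash and capDepth are illustrative names, and plain JSON.stringify stands in for the stable-stringify used in the real code:

```typescript
import { createHash } from "node:crypto";

const MAX_AUTH_PROFILES_BYTES = 8 * 1024 * 1024; // 8 MiB
const MAX_AUTH_PROFILES_DEPTH = 64;

// Replace anything nested deeper than the cap with a stable marker so
// the recursive walk cannot stack-overflow on pathological input.
function capDepth(value: unknown, depth = 0): unknown {
  if (depth >= MAX_AUTH_PROFILES_DEPTH) return "[depth-capped]";
  if (Array.isArray(value)) return value.map((v) => capDepth(v, depth + 1));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) out[k] = capDepth(v, depth + 1);
    return out;
  }
  return value;
}

function authProfilesStableHash(raw: Buffer): string {
  // Above the byte cap, skip JSON.parse + transform entirely and hash raw bytes.
  if (raw.byteLength > MAX_AUTH_PROFILES_BYTES) {
    return createHash("sha256").update(raw).digest("hex");
  }
  const parsed = JSON.parse(raw.toString("utf8"));
  const canonical = JSON.stringify(capDepth(parsed)); // stand-in for stable-stringify
  return createHash("sha256").update(canonical).digest("hex");
}
```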
@zeroaltitude
Contributor Author

Three of the five Aisle findings landed in d505fa064c:

  • 🟠 Aisle #1 High, symlink-following chmod (CWE-59): now lstats the path first; refuses to chmod symlinks or non-regular files. The O_NOFOLLOW open+fchmod variant from your recommendation is even stronger but adds an open-then-close on every cache hit; the lstat check closes the same vulnerability with no extra fd traffic.
  • 🟡 Aisle #3 Medium, prototype pollution: stripAuthProfilesVolatileFields() now uses Object.create(null) for the result and explicitly filters __proto__, prototype, constructor (belt and suspenders).
  • 🟡 Aisle #4 Medium, unbounded hashing DoS: added MAX_AUTH_PROFILES_BYTES = 8 MiB (above the cap we hash raw bytes instead of parsing) and MAX_AUTH_PROFILES_DEPTH = 64 (the recursive walk caps with a stable depth marker).
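The lstat-then-chmod guard for the symlink finding can be sketched like this. A hypothetical sketch: chmodIfRegularFile stands in for ensureModelsFileModeForModelsJson, whose actual signature is not shown on this thread:

```typescript
import {
  lstatSync, chmodSync,
  // The remaining imports are only needed by the usage example below.
  statSync, writeFileSync, mkdtempSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function chmodIfRegularFile(path: string, mode: number): boolean {
  let st;
  try {
    st = lstatSync(path); // lstat does NOT follow symlinks
  } catch {
    return false; // missing file: nothing to do
  }
  // Refuse symlinks and anything that is not a regular file, so a link
  // planted in the agent dir cannot redirect the chmod elsewhere.
  if (st.isSymbolicLink() || !st.isFile()) return false;
  chmodSync(path, mode);
  return true;
}
```

Usage on a regular file applies the mode; on a missing path (or a symlink) it returns false without touching anything.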

Holding two for follow-ups:

…ests

Three review-driven fixes per @zeroaltitude direction (b)+(c) + secret
hygiene + tests:

(b) Validate disk-vs-config before short-circuiting
    [Aisle High #2 / Codex P1 / Greptile P1 on PR openclaw#72869]

    The previous targetProvider short-circuit fired whenever the
    on-disk provider entry contained ANY non-empty credential. That
    silently bypassed:
      - rotated apiKey: cold start with new key, old key on disk,
        short-circuit fires, all calls fail until something else
        invalidates
      - attacker-tampered baseUrl: redirect to exfil endpoint kept
      - attacker-injected headers: arbitrary auth material kept

    New readExistingProviderMatchesConfig() does a strict structural
    comparison:
      apiKey  - resolved through resolveSecretInputRef (env-ref
                expansion via createConfigRuntimeEnv) before string
                equality vs. disk
      baseUrl - exact string equality
      headers - stable structural equality (key-order independent)
      auth    - stable structural equality

    Any mismatch (or any state we cannot conclusively verify, like
    a non-env secret ref) returns false and falls through to full
    planning. The short-circuit is now safe to use on cold start and
    after gateway restart.

(c) Hash models.json content into the cache key
    [Codex P1 on PR openclaw#72869]

    Previous fingerprint had no models.json input: once the cache
    was populated, unchanged config/auth returned cached success even
    after the file was edited externally / partially corrupted /
    manually tampered. Now readyCache stores both the input
    fingerprint AND the post-write models.json SHA-256. Cache hit
    requires both to match; any external edit invalidates.

    Captures the hash at three points (skip path, noop path, write
    path) so the second factor is always recorded.

Aisle medium #5: hash fingerprint before storage
    Raw stable-stringified config (including apiKey strings) used to
    sit verbatim in MODELS_JSON_STATE.readyCache. SHA-256 over the
    canonical payload is now the cache key: deterministic but not
    reversible, so heap snapshots / debug telemetry / core dumps
    can't leak secrets via the readyCache state.

Greptile P2: targetProvider short-circuit tests
    New file models-config.target-provider-short-circuit.test.ts
    with 6 cases:
      - hit-on-match (full structural match short-circuits)
      - miss-on-rotated-key (config apiKey change forces plan)
      - miss-on-baseUrl-change (tampered disk baseUrl rejects)
      - miss-on-tampered-headers (any header drift rejects)
      - miss-on-cold-cache (no disk file forces plan)
      - hit-after-warm-fingerprint + invalidation on external
        models.json edit (modelsJsonHash second-factor verified)

Existing fingerprint-cache test updated:
    The 'volatile fields rotate' test mixed type:oauth (correctly
    volatile) with type:token (now correctly NOT volatile after
    d505fa0). Split into two tests:
      - OAuth session-field rotation does NOT invalidate (existing
        intent, narrowed to oauth-only profiles)
      - Static type:token credential rotation DOES invalidate
        (Codex/Greptile P2 - new correct behavior)

State shape change:
    MODELS_JSON_STATE.readyCache value extended with modelsJsonHash:
      { fingerprint, modelsJsonHash, result }
    All three return paths in the plan closure capture this.

Tests: 13/13 (6 new + 7 existing fingerprint-cache + file-mode).
Lint: 0 errors. TS: clean.
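A minimal sketch of the disk-vs-config comparison in (b), with illustrative names: providerMatchesConfig stands in for readExistingProviderMatchesConfig, and secret-ref resolution (resolveSecretInputRef / createConfigRuntimeEnv) is assumed to have already happened on the `resolved` side:

```typescript
type ProviderEntry = {
  apiKey?: string;
  baseUrl?: string;
  headers?: Record<string, string>;
  auth?: Record<string, unknown>;
};

// Key-order-independent structural equality for plain JSON values.
function stableEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.length === b.length && a.every((v, i) => stableEqual(v, b[i]));
  }
  if (a && b && typeof a === "object" && typeof b === "object") {
    const ka = Object.keys(a).sort();
    const kb = Object.keys(b).sort();
    return stableEqual(ka, kb) &&
      ka.every((k) => stableEqual((a as Record<string, unknown>)[k], (b as Record<string, unknown>)[k]));
  }
  return false;
}

// Any mismatch => false => fall through to full planning.
function providerMatchesConfig(disk: ProviderEntry, resolved: ProviderEntry): boolean {
  return (
    disk.apiKey === resolved.apiKey &&   // apiKey: resolved secret vs. disk
    disk.baseUrl === resolved.baseUrl && // baseUrl: exact string equality
    stableEqual(disk.headers ?? {}, resolved.headers ?? {}) &&
    stableEqual(disk.auth ?? {}, resolved.auth ?? {})
  );
}
```

So a rotated apiKey, a tampered baseUrl, or any header drift on disk fails the check, while a byte-identical entry with reordered header keys still matches.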
@zeroaltitude
Contributor Author

🟡 #5 fixed in 235e3c2dcf. buildModelsJsonFingerprint now returns createHash('sha256').update(canonical).digest('hex') instead of the raw stable-stringified payload. The cache key only needs to be deterministic, not reversible, so the readyCache no longer holds verbatim config / sourceConfigForSecrets (including apiKey strings) in process memory. Heap snapshots / debug telemetry / core dumps can't leak secrets through this path anymore.

🟠 #2 (config drift bypass) also fixed in the same commit; see the Codex/Greptile P1 thread above for details (disk-vs-config structural comparison + models.json content hash as a second cache factor).
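The two ideas in this comment (hash the canonical fingerprint payload so readyCache never stores secrets verbatim, and require a models.json content hash as a second factor for a cache hit) combine as in this sketch. Names are illustrative and only mirror the discussion:

```typescript
import { createHash } from "node:crypto";

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// readyCache stores only digests: deterministic, not reversible.
type ReadyCache = { fingerprint: string; modelsJsonHash: string };

function cacheHit(
  cache: ReadyCache | undefined,
  canonicalPayload: string, // stable-stringified inputs (config + auth structure)
  modelsJsonRaw: string,    // current on-disk models.json content
): boolean {
  if (!cache) return false;
  // Both factors must match: inputs unchanged AND models.json not
  // edited externally since the cached result was written.
  return cache.fingerprint === sha256(canonicalPayload) &&
         cache.modelsJsonHash === sha256(modelsJsonRaw);
}
```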

…artup-cache-upstream

# Conflicts:
#	src/agents/models-config.ts
@zeroaltitude
Contributor Author

This PR has been split into two focused PRs for cleaner review:

Each new PR can land independently. The cache PR includes the security findings from the Aisle review on this thread; the short-circuit PR addresses Codex P1 / Greptile P1 / Aisle High #2 (the disk-trust-without-validation finding) with field-level comparison.

The branches are clean off origin/main (no merge cruft from this thread). Tests and lint are clean on both.

Closing this PR in favor of the split. Please move review attention to #73260 and #73261.

@zeroaltitude zeroaltitude deleted the perf/models-config-startup-cache-upstream branch April 28, 2026 05:19
@zeroaltitude
Contributor Author

Closing this PR. Branch archived as archive/perf/models-config-startup-cache-upstream (still recoverable from origin via that ref). Forward all review attention to #73260 (cache) and #73261 (short-circuit).


Labels

agents Agent runtime and tooling app: web-ui App: web-ui channel: feishu Channel integration: feishu channel: slack Channel integration: slack commands Command implementations gateway Gateway runtime size: XL triage: dirty-candidate Candidate: broad unrelated surfaces; may need splitting or cleanup.
