perf(models-config): targetProvider short-circuit with disk-vs-config validation#73261
perf(models-config): targetProvider short-circuit with disk-vs-config validation#73261zeroaltitude wants to merge 26 commits intoopenclaw:mainfrom
Conversation
…etection Replaces mtime-based fingerprinting with content-based hashes for the implicit-provider-discovery cache key, plus a second-factor models.json content hash that catches external edits / partial corruption / sibling process writes between cache hits. Includes the security hardening findings raised by Aisle on the original combined PR (openclaw#72869). # Why content hashes The previous mtime-based key invalidated on every OAuth token refresh because auth-profiles.json gets rewritten with new access/refresh timestamps even when the set of available providers doesn't change. Same for models.json mtime: the file is the OUTPUT of this function, so each call observed its own write and invalidated the next call. Now: - auth-profiles.json: SHA-256 over a stable serialization that strips volatile OAuth session fields (access, refresh, expires*, issuedAt, refreshed/lastChecked/lastRefresh/lastValidatedAt). Token rotation no longer invalidates; structural changes (added/removed profiles, rotated static type:token credentials) still do. - models.json: NOT included in the input fingerprint (would cause self-invalidation). Instead its content hash is captured at write time and stored alongside the readyCache entry. Every cache check recomputes the file hash and compares; any external edit / corruption / tamper invalidates the cache and forces re-plan. # Security hardening (Aisle review on PR openclaw#72869) - 🟠 High #1 (CWE-59 symlink-following chmod): ensureModelsFileMode now lstats first and refuses to chmod symlinks or non-regular files. Closes the attack where a symlink at ${agentDir}/models.json points at a sensitive owned file and our best-effort chmod follows. - 🟡 Med #3 (CWE-1321 prototype pollution): stripAuthProfilesVolatile now uses Object.create(null) for the result and explicitly filters __proto__ / prototype / constructor keys (belt-and-suspenders). - 🟡 Med #4 (DoS via unbounded fingerprinting): MAX_AUTH_PROFILES_BYTES = 8 MiB cap (above which we hash raw bytes), MAX_AUTH_PROFILES_DEPTH = 64 with a depth-cap marker so deeply-nested JSON cannot stack- overflow the gateway. - 🟡 Med #5 (CWE-312 secrets in cache): buildModelsJsonFingerprint now returns SHA-256 hex of the canonical payload instead of the raw stable-stringified payload. The readyCache holds no verbatim secret material; heap snapshots / debug telemetry / core dumps cannot leak apiKey strings via this path. # Token-rotation correctness "token" is intentionally NOT in AUTH_PROFILE_VOLATILE_FIELDS even though OAuth-style 'token' fields rotate. Profiles with type: 'token' use the literal token key as a long-lived static credential, and stripping it would mask real auth-state changes when a user rotates that credential. Documented inline. # Tests Two tests in models-config.fingerprint-cache.test.ts: - 'does not invalidate the cache when OAuth session fields rotate' (rotates access/refresh/expires; cache stays valid) - 'DOES invalidate the cache when a static type:token credential rotates' (rotates the literal token field; cache invalidates) # Compatibility MODELS_JSON_STATE.readyCache value shape extended with modelsJsonHash: { fingerprint, modelsJsonHash, result }. All three plan return paths (skip/noop/write) capture the post-write hash. The refreshedFingerprint re-key path forwards modelsJsonHash through unchanged. This is one of two PRs split out of openclaw#72869. The targetProvider short-circuit work landed separately.
… validation
Add a fast path to ensureOpenClawModelsJson for callers that know which
provider/model they're about to use (e.g. the pi-embedded runner).
When set, the implicit-provider-discovery pipeline can be skipped IF
the on-disk models.json provider entry STRUCTURALLY matches what the
current configuration would produce.
# Why structural comparison
A naive 'is configured' check (does any non-empty credential exist on
disk for this provider) silently bypasses several real failure modes:
- Rotated apiKey: cold start with new key in config, old key still on
disk, presence-only check fires, all calls fail with 401 until
something else invalidates.
- Attacker-tampered baseUrl: redirect to exfil endpoint kept across
restarts.
- Attacker-injected headers: arbitrary auth material kept across
restarts.
The new readExistingProviderMatchesConfig compares all four
security-relevant fields:
apiKey - resolved through resolveSecretInputRef (env-ref expansion
via createConfigRuntimeEnv) before string equality vs. disk
baseUrl - exact string equality
headers - stable structural equality (key-order independent)
auth - stable structural equality
Any mismatch falls through to full planning + writes. This is safe to
use on cold start and after gateway restart.
# API
Adds EnsureOpenClawModelsJsonOptions exported type with targetProvider
and targetModel, layered on top of the existing options shape (kept
all of pluginMetadataSnapshot, workspaceDir, providerDiscoveryProviderIds,
providerDiscoveryTimeoutMs).
# Tests
5 cases in models-config.target-provider-short-circuit.test.ts:
- hit-on-match: full structural match short-circuits planning
- miss-on-rotated-key: config apiKey change forces a full plan
- miss-on-baseUrl-change: tampered disk baseUrl rejects short-circuit
- miss-on-tampered-headers: any disk header drift rejects short-circuit
- miss-on-cold-cache: empty in-memory cache + missing disk file forces a plan
- hit-after-warm-fingerprint: warm in-memory cache reuse
# Dependencies
This is one of two PRs split out of the original combined openclaw#72869.
The cache-fingerprint work (auth-profiles content hash, models.json
drift detection, fingerprint hashing, security hardening) landed
separately as perf/models-config-cache-fingerprint.
This branch can land independently. When both branches land, the
original PR openclaw#72869 can be closed as superseded.
🔒 Aisle Security AnalysisWe found 4 potential security issue(s) in this PR:
1. 🟠 TOCTOU symlink swap enables arbitrary file read and chmod via models.json short-circuit path
DescriptionIn This creates a classic time-of-check/time-of-use window:
Additionally, Vulnerable code: const lst = await fs.lstat(targetPath).catch(() => null);
if (!lst || lst.isSymbolicLink() || !lst.isFile()) {
return false;
}
if (lst.size > MAX_MODELS_JSON_SHORT_CIRCUIT_BYTES) {
return false;
}
...
raw = await fs.readFile(targetPath, "utf8");and: await fs.chmod(pathname, 0o600).catch(() => {
// best-effort
});RecommendationMake the read and permission-setting operations symlink-safe and race-resistant by operating on an already-opened file descriptor and validating it with One approach (POSIX): import { open } from "node:fs/promises";
import { constants as FS } from "node:fs";
const fh = await open(targetPath, FS.O_RDONLY | FS.O_NOFOLLOW);
try {
const st = await fh.stat();
if (!st.isFile()) return false;
if (st.size > MAX_MODELS_JSON_SHORT_CIRCUIT_BYTES) return false;
const raw = await fh.readFile({ encoding: "utf8" });
...
} finally {
await fh.close();
}For mode enforcement, avoid
This removes the TOCTOU window and prevents symlink-following reads/chmods. 2. 🟠 SSRF / credential exfiltration via models.json per-model baseUrl/api not validated in targetProvider short-circuit
DescriptionThe However, This is security-relevant because the runtime later uses per-model fields originating from
Vulnerable logic (short-circuit trusts disk even if // Other provider fields (models[], ... compat, etc.) are NOT compared.
// ...
if (!stableEqual(configuredProvider.baseUrl, diskProvider.baseUrl)) return false;
// apiKey, headers, auth comparisons...
return true;Runtime sink consuming per-model baseUrl from models.json: baseUrl: providerConfig.baseUrl ?? discoveredModel.baseUrl,
...
baseUrl: requestConfig.baseUrl ?? discoveredModel.baseUrl,RecommendationDo not short-circuit unless all request-destination and request-shaping fields that can affect runtime network calls are proven consistent between config and disk. At minimum, include structural comparison of Example defensive approach (prefer correctness over speed): // Pseudocode: extend comparison to cover model-level transport fields
if (!stableEqual(configuredProvider.models, diskProvider.models)) return false;
if (!stableEqual(configuredProvider.request, diskProvider.request)) return false;
if (!stableEqual(configuredProvider.api, diskProvider.api)) return false;Alternatively, make the short-circuit conditional on Also consider validating that any resolved 3. 🟡 Potential stack/CPU DoS via recursive stableStringify on attacker-controlled models.json fields
Description
In
Vulnerable code: function stableEqual(a: unknown, b: unknown): boolean {
return stableStringify(a) === stableStringify(b);
}and the recursive serializer: function stableStringify(value: unknown): string {
if (value === null || typeof value !== "object") return JSON.stringify(value);
if (Array.isArray(value)) return `[${value.map((entry) => stableStringify(entry)).join(",")}]`;
const entries = Object.entries(value as Record<string, unknown>).toSorted(([a], [b]) =>
a.localeCompare(b),
);
return `{${entries.map(([key, entry]) => `${JSON.stringify(key)}:${stableStringify(entry)}`).join(",")}}`;
}Even though RecommendationAvoid unbounded recursive stringification on untrusted data. Options:
Example depth-limited stringify: function stableStringifySafe(value: unknown, depth = 0, maxDepth = 200): string {
if (depth > maxDepth) throw new Error("max depth exceeded");
if (value === null || typeof value !== "object") return JSON.stringify(value) ?? "null";
if (Array.isArray(value)) {
return `[${value.map(v => stableStringifySafe(v, depth + 1, maxDepth)).join(",")}]`;
}
const rec = value as Record<string, unknown>;
const keys = Object.keys(rec).sort();
return `{${keys.map(k => `${JSON.stringify(k)}:${stableStringifySafe(rec[k], depth + 1, maxDepth)}`).join(",")}}`;
}
function stableEqual(a: unknown, b: unknown): boolean {
try {
return stableStringifySafe(a) === stableStringifySafe(b);
} catch {
return false; // fall through to full plan
}
}This keeps the short-circuit check robust while preventing stack/CPU DoS. 4. 🔵 TOCTOU race in models.json short-circuit integrity check leads to stale readyCache
Description
Because there is no locking or post-check revalidation of the file’s contents/identity, another process (or thread) with concurrent write access to Impact:
Vulnerable code (short-circuit path): const matches = await readExistingProviderMatchesConfig(...);
if (matches) {
await ensureModelsFileModeForModelsJson(targetPath);
const result = { agentDir, wrote: false };
MODELS_JSON_STATE.readyCache.set(cacheKey, Promise.resolve({ fingerprint, result }));
return result;
}RecommendationMake the short-circuit check robust against concurrent modification before caching/returning: Option A (simplest): run the short-circuit path under the same Option B (stronger, cross-process TOCTOU mitigation): use an atomic check strategy and only cache if the file is unchanged:
Example sketch: const fd = await fs.open(targetPath, 'r');
try {
const s1 = await fd.stat();
const raw = await fd.readFile('utf8');
const parsed = JSON.parse(raw);
const ok = /* compare parsed vs config */;
const s2 = await fd.stat();
if (ok && s1.ino === s2.ino && s1.mtimeMs === s2.mtimeMs && s1.size === s2.size) {
MODELS_JSON_STATE.readyCache.set(cacheKey, Promise.resolve({ fingerprint, result }));
return result;
}
} finally {
await fd.close();
}Also consider avoiding caching the short-circuit result unless the check is performed under a lock or validated with a file-identity/mtime recheck. Analyzed PR: #73261 at commit Last updated on: 2026-04-28T05:45:21Z Re-review progress:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 682c12624c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Greptile SummaryAdds a Confidence Score: 5/5PR is safe to merge — no P0 or P1 findings; all previously flagged security issues confirmed resolved. All four security-critical field comparisons (apiKey, baseUrl, headers, auth) are symmetric and handle the undefined-vs-string drift case. The readyCache ordering fix is verified by the new fs.readFile spy test. Prototype injection, symlink, and oversized-file vectors are guarded. No new attack surfaces introduced. No files require special attention. Reviews (2): Last reviewed commit: "fix(models-config): symmetric baseUrl, e..." | Re-trigger Greptile |
Addresses Codex P1, Greptile P2, and Aisle Med #1/#2/#3 on openclaw#73260: # Streaming bounded hash (Codex P1 / Aisle Med #2) The previous oversize-branch in readAuthProfilesStableHash still did fs.readFile(path) which loaded the entire file into memory \u2014 the MAX_AUTH_PROFILES_BYTES cap was effectively unenforced. Same problem for readModelsJsonContentHash (no cap at all). New helper safeHashRegularFile(): - lstat first; reject symlinks and non-regular files - bail at lstat-time if size exceeds maxBytes (returns sentinel hash so size-change still invalidates the cache) - open with O_NOFOLLOW where supported (closes lstat\u2192open TOCTOU) - fstat after open to verify it's still a regular file - createReadStream(fd) with bounded highWaterMark; destroy if accumulated bytes exceed maxBytes (handles the case where attacker grows the file between lstat and stream completion) Both readers now route through safeHashRegularFile. Models.json gets a 1 MiB cap (MAX_MODELS_JSON_BYTES); auth-profiles keeps its 8 MiB cap. # O_NOFOLLOW chmod (Aisle Med #1, CWE-367) The previous lstat-then-chmod sequence was racy: an attacker who could rename/replace ${agentDir}/models.json between the two syscalls could have chmod() follow a swapped-in symlink to an arbitrary file. Replaced with fs.open(path, O_RDONLY | O_NOFOLLOW) \u2192 fchmod via the file handle. The open itself refuses symlinks atomically; the fchmod operates on the validated fd, eliminating the race. When O_NOFOLLOW is unavailable (rare but possible), falls back to plain O_RDONLY \u2014 the prior lstat protection was already best-effort. # Symlink-safe reads (Aisle Med #3, CWE-59) Previous fs.readFile in the hash readers followed symlinks. With the safeHashRegularFile refactor above, both auth-profiles.json and models.json now reject symlinks before reading. # JSDoc fix (Greptile P2) readModelsJsonContentHash's prior comment said "forces a re-plan" when the file is absent, but the call site does null === null comparison which is a cache hit, not a re-plan (the steady-state skip case for plan.action === 'skip'). Updated the comment to explain the actual semantics: stable absence is a stable result; drift is a re-plan. # Tests 14/14 existing tests pass. No new test cases needed \u2014 the existing file-mode and fingerprint-cache tests cover both the symlink-rejection paths (via lstat shortcut) and the modelsJsonHash second-factor. Lint: 0 errors.
…tegration Addresses Codex P2, Greptile P1+P2x2, and Aisle High #1 + Med #2/#3/#4 on PR openclaw#73261: # Greptile P1 / Aisle High #1: asymmetric baseUrl (CWE-918, SSRF) The previous guard: if (typeof configuredProvider.baseUrl === 'string' && configuredProvider.baseUrl !== diskProvider.baseUrl) short-circuited the check entirely when config omitted baseUrl (common for providers with compiled-in defaults). An attacker who could tamper with on-disk models.json could set baseUrl to an exfil endpoint and the short-circuit would accept it silently \u2014 exactly the SSRF/credential-exfil vector this PR was meant to close. Replaced with symmetric stableEqual() so config-undefined vs disk-string is a mismatch and falls through to full planning, which re-applies provider/plugin defaults and rewrites the file. # Codex P2: env-ref API key comparison resolveConfiguredApiKeyForCompare resolved env refs to the env-var VALUE (env[ref.id]). But planOpenClawModelsJson persists env-source api keys to models.json as the env-var NAME (e.g. 'OPENAI_API_KEY'), not the value \u2014 that's what resolveApiKeyFromCredential returns for env source. Comparing value-vs-name always mismatched, so the short-circuit never fired for the most common config form ('apiKey: "${env.OPENAI_API_KEY}"'). Now the helper returns the env-var NAME for env-source refs, while still verifying the env is currently populated (so we don't short-circuit on a misconfigured environment). Plaintext values still compare directly. # Greptile P2 (perf): readyCache integration The short-circuit returned before the readyCache check, so warm callers paid the disk read + JSON.parse cost on every call. Reordered: 1. compute fingerprint (cheap) 2. check readyCache \u2014 warm hit returns immediately, no disk I/O 3. if cold, attempt short-circuit on disk 4. if short-circuit succeeds, populate readyCache so subsequent calls take the warm path Net effect: warm callers now skip disk entirely; cold callers with intact disk state still get the short-circuit benefit; full plan fires only on real drift. # Aisle Med #2 (CWE-59): symlink-following short-circuit reads readExistingProviderMatchesConfig used fs.readFile on a possibly- symlinked target. Now lstat-checks first and refuses symlinks / non-regular files. # Aisle Med #3 (CWE-1321): prototype-chain key access explicitProviders[targetProvider] could fall through to a prototype key like '__proto__' / 'constructor' / 'prototype'. Now uses Object.hasOwn at both lookup sites (caller AND readExistingProviderMatchesConfig) to refuse inherited keys. Also explicit string check rejects the three known dangerous keys. # Aisle Med #4 (CWE-400): unbounded models.json read Added MAX_MODELS_JSON_SHORT_CIRCUIT_BYTES = 1 MiB cap on the disk read in the short-circuit path. Files above the cap fall through to full planning rather than being parsed. # Tests Updated hit-after-warm-fingerprint test (Greptile P2) to use fs.readFile spy and assert no disk read occurs on the warm path. Added short-circuit-populates-cache test that drops the in-memory cache between runs, fires the short-circuit, then verifies the third call takes the warm path with no disk read. 14/14 tests pass. Lint: 0 errors.
|
All five findings landed in
|
|
@greptile-apps: please do re-review/re-score |
|
Codex review: needs maintainer review before merge. Summary Reproducibility: yes. Source inspection shows current main lacks target-provider/model hints and the embedded callers pass only Real behavior proof Next step before merge Security Review detailsBest possible solution: Rebase and reconcile the branch with current main and the companion cache-fingerprint work, then land after normal maintainer review and exact-head gates. Do we have a high-confidence way to reproduce the issue? Yes. Source inspection shows current main lacks target-provider/model hints and the embedded callers pass only Is this the best way to solve the issue? Yes for the inspected implementation direction. The latest head binds scoped cache entries to the validated Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 4721ca8e459f. |
…rget-provider-short-circuit # Conflicts: # src/agents/models-config.ts
…rget-provider-short-circuit
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9e4c01b6b8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Populate readyCache so the next call with identical inputs | ||
| // takes the warm-cache path above without re-reading disk. | ||
| MODELS_JSON_STATE.readyCache.set(cacheKey, Promise.resolve({ fingerprint, result })); | ||
| return result; |
There was a problem hiding this comment.
Scope ready-cache entries to short-circuit mode
When a targetProvider short-circuit succeeds, this stores a global readyCache entry keyed only by fingerprint, even though only one provider was validated. A later ensureOpenClawModelsJson(...) call without targetProvider can hit that cache and skip full planning, leaving other providers stale on disk. This is reproducible whenever one caller does targeted prewarm and another later expects full reconciliation in the same process.
Useful? React with 👍 / 👎.
| } else if (typeof diskProvider.apiKey === "string" && diskProvider.apiKey.length > 0) { | ||
| return false; |
There was a problem hiding this comment.
Treat non-string disk apiKey as mismatch
In the resolvedConfiguredApiKey === undefined branch, only non-empty string diskProvider.apiKey values are rejected. If models.json has a non-string apiKey (for example tampered object/number) while config omits apiKey, this function still returns a structural match and short-circuits planning. That preserves malformed or attacker-modified provider state instead of forcing a rewrite.
Useful? React with 👍 / 👎.
…rget-provider-short-circuit # Conflicts: # CHANGELOG.md # docs/.generated/plugin-sdk-api-baseline.sha256 # src/agents/models-config.ts
💡 Codex Reviewopenclaw/src/agents/models-config.ts Lines 790 to 793 in 8f1c6dc When ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
…rget-provider-short-circuit
…g short-circuit (review feedback) Codex P2 round-6 on PR openclaw#73261: 'Accept implicit env apiKey markers in short-circuit compare' Paraphrased problem ------------------- When configuredProvider.apiKey is unset, the round-5 unset-config branch rejected ANY non-empty diskProvider.apiKey, but planOpenClawModelsJson legitimately persists env-derived markers even with no configured key (via resolveMissingProviderApiKey in src/agents/models-config.providers.normalize.ts -> secret-helpers.ts). In the dominant implicit-discovery setup (provider has models, auth comes from OPENAI_API_KEY env or AWS SDK env chain), the planner writes apiKey: "OPENAI_API_KEY" (the env-var name marker) to disk on the first pass. With round-5's strict rejection, every subsequent restart would miss the targetProvider short-circuit and re-run full implicit discovery, negating the new perf path despite disk and config being semantically aligned. Solution -------- Extend the unset-config branch to also accept disk apiKey values that match what resolveMissingProviderApiKey would have persisted. The decision tree mirrors the planner: - providerAuth === "aws-sdk": call resolveAwsSdkApiKeyVarName(env); accept iff disk equals that env-var name AND env[awsEnvVar] is populated (liveness check matches the existing resolveConfiguredApiKeyForCompare contract for env-source refs). - otherwise: call resolveEnvApiKeyVarName(providerKey, env); accept iff disk equals that env-var name AND env[envVarName] is populated. - profile-plaintext / non-default providerApiKeyResolver paths are NOT covered by this round (we don't have profile or resolver in scope at the comparator). Bounded perf cost: one full plan per restart that produces an unchanged disk shape; can be extended symmetrically in a later round if profile context plumbing makes it cheap. Anything else (number, null, object, array, string not derivable from the planner's env/aws-sdk paths, name pointing at an unrelated env var, env var no longer set) still falls through to full planning. Result ------ - src/agents/models-config.ts: + new helper diskApiKeyMatchesUnsetConfigPlannerOutput documents the planner-output decision tree and pins the env / aws-sdk branches with explicit liveness checks. + unset-config branch in readExistingProviderMatchesConfig now consults the helper before rejecting. + new imports for resolveAwsSdkApiKeyVarName / resolveEnvApiKeyVarName from models-config.providers.secret-helpers. - src/agents/models-config.target-provider-short-circuit.test.ts: three new regression tests pin the contract: + hit-on-env-marker-with-unset-config: short-circuit accepts disk apiKey: 'OPENAI_API_KEY' when config has no apiKey and env var is set. + miss-on-env-marker-with-unset-env: same setup but with the env var deleted between passes; short-circuit rejects (liveness). + miss-on-env-marker-name-mismatch: disk apiKey points at an unrelated env var that doesn't map to the openai provider; short-circuit rejects. Pre-existing miss-on-malformed-disk-apiKey kept and still passing. Validation gate --------------- - pnpm tsgo:core: clean - pnpm vitest run src/agents/models-config.target-provider-short-circuit.test.ts: 40/40 passed (37 prior + 3 new) - pnpm oxlint <touched files>: 0 warnings, 0 errors - pnpm plugin-sdk:api:check: OK - git diff --check: clean
Round 6 — addressing Codex P2 review feedbackCodex finding on Paraphrased problem: When Solution approach: Extend the unset-config branch to also accept disk
Anything else (number, null, object, array, string not derivable from the planner's env/aws-sdk paths, name pointing at an unrelated env var, env var no longer set) still falls through to full planning. Result: commit
Validation gate (last lines): |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 32200a3170
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if ( | ||
| !stableEqualBounded( | ||
| configuredProvider.headers, | ||
| diskProvider.headers, | ||
| SHORT_CIRCUIT_COMPARE_MAX_DEPTH, | ||
| ) | ||
| ) { |
There was a problem hiding this comment.
Normalize secret-ref headers before short-circuit compare
The target-provider fast path compares configuredProvider.headers directly against disk, but planner output normalizes secret-ref headers (e.g. ${ENV} / SecretRef objects) into marker strings via normalizeHeaderValues and source-managed enforcement. As a result, any provider configured with header SecretRefs will always fail this comparison and re-run full planning on every call, which defeats the intended targetProvider performance win for header-auth providers even when disk and config are semantically equivalent.
Useful? React with 👍 / 👎.
…ls (Clawsweeper P2 round-6 on openclaw#73261) Both pi-embedded production callers (compact.ts and run.ts) were passing only { workspaceDir } to ensureOpenClawModelsJson, so the targetProvider short-circuit (options.targetProvider being undefined) never fired in production embedded runs, negating the perf benefit of the entire PR. Fix: forward the resolved `provider` as `targetProvider` in both call sites so the scoped-cache short-circuit can skip a full plan when the provider's disk config already matches. Regression coverage: export ensureOpenClawModelsJsonMock from compact.hooks.harness.ts; add two focused tests that pin the targetProvider wiring for the compact direct path (default openai provider and caller-supplied provider override). tsgo:core, tsgo:core:test → clean git diff --check → clean
Round 7 — addressing Clawsweeper P2 (thread targetProvider into embedded callers)Commit: [P2] Pass target hints from embedded callersProblem (paraphrased): Both Fix: Forward the resolved
await ensureOpenClawModelsJson(params.config, agentDir, {
workspaceDir: resolvedWorkspace,
// Thread the resolved provider so the targetProvider short-circuit can skip
// a full plan when this provider's disk config already matches.
targetProvider: provider,
});
await ensureOpenClawModelsJson(params.config, agentDir, {
workspaceDir: resolvedWorkspace,
// Thread the resolved provider so the targetProvider short-circuit can skip
// a full plan when this provider's disk config already matches.
targetProvider: provider,
});Regression coverage: Exported
Validation gate: Note: the 2 |
|
@clawsweeper please re-review. Round 7 commit |
…t compare (review feedback) Codex P2 round-7 on PR openclaw#73261: 'Normalize secret-ref headers before short-circuit compare' Paraphrased problem ------------------- The targetProvider fast path compared configuredProvider.headers directly against disk, but planner output normalizes secret-ref headers (env refs like \${ENV_VAR} and SecretRef objects) into marker strings via normalizeHeaderValues: - env refs -> 'secretref-env:<ENV_VAR_NAME>' - non-env -> 'secretref-managed' As a result, any provider configured with header SecretRefs (the common shape for header-auth providers) would always fail the deep compare and re-run full implicit discovery on every call, defeating the targetProvider perf win for that entire class of providers. Solution -------- Pre-normalize configuredProvider.headers using the same normalizeHeaderValues call the planner uses, threading cfg.secrets?.defaults through readExistingProviderMatchesConfig so the normalization produces identical markers to what the planner persisted. normalizeHeaderValues is idempotent for plain literal values (returns headers unchanged when no entry resolves as a secret ref), so this is a safe no-op for the common no-header / literal-header case. Anything else (header drift, attacker-injected headers, markers pointing at unrelated env vars) still falls through to full planning via the existing stableEqualBounded compare. Out of scope (deliberately): source-managed enforcement (enforceSourceManagedProviderSecrets) layers additional marker rewrites when sourceProviders is set. The short-circuit doesn't have sourceProviders in scope today, and the perf cost of falling through for source-managed providers is bounded. Can be extended in a later round if profile/source plumbing makes it cheap. Result ------ - src/agents/models-config.ts: + readExistingProviderMatchesConfig now takes secretDefaults: SecretDefaults | undefined. + Before the headers compare, configuredProvider.headers (when record-shaped or undefined) is run through normalizeHeaderValues to produce the same marker strings the planner persists. + Call site in ensureOpenClawModelsJson passes cfg.secrets?.defaults. + New imports for normalizeHeaderValues and the SecretDefaults type from models-config.providers.secret-helpers. - src/agents/models-config.target-provider-short-circuit.test.ts: two new regression tests: + hit-on-env-ref-header-normalization: short-circuit accepts disk headers shaped { 'X-Custom-Auth': 'secretref-env:OPENAI_API_KEY', 'X-Static': 'literal-value' } when config holds { 'X-Custom-Auth': '\${OPENAI_API_KEY}', 'X-Static': 'literal-value' }. + miss-on-env-ref-header-mismatch: same env-ref shape in config, but disk marker points at UNRELATED_TOKEN \u2014 short-circuit rejects (deep compare catches the env-name mismatch). Validation gate --------------- - pnpm tsgo:core: clean - pnpm tsgo:core:test: clean - pnpm vitest run src/agents/models-config.target-provider-short-circuit.test.ts: 44/44 passed (40 prior + 2 new round-6 + 2 new round-7) - pnpm oxlint <touched files>: 0 warnings, 0 errors - pnpm plugin-sdk:api:check: OK - git diff --check: clean
Round 7 — addressing Codex P2 review feedbackCodex finding on Paraphrased problem: The
As a result, any provider configured with header SecretRefs (the common shape for header-auth providers) would always fail the deep compare and re-run full implicit discovery on every call, defeating the Solution approach: Pre-normalize Out of scope (deliberately): source-managed enforcement ( Result: commit
Validation gate (last lines): |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 83b72886f9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (Array.isArray(diskProvider.models)) { | ||
| for (const m of diskProvider.models) { | ||
| if (!isRecord(m)) { | ||
| return false; | ||
| } | ||
| for (const f of PER_MODEL_TRANSPORT_FIELDS) { | ||
| if (Object.hasOwn(m, f)) { | ||
| return false; |
There was a problem hiding this comment.
Compare configured models before short-circuiting provider hit
This branch only checks whether disk models contain transport override fields and never validates configuredProvider.models against diskProvider.models, so a config edit that changes model IDs/entries (but not apiKey/baseUrl/api/headers/auth) still returns a short-circuit match. With this commit wiring targetProvider into the embedded runner paths, those calls can now skip planOpenClawModelsJson after model-list changes, leaving stale models.json and causing resolveModelAsync to miss newly configured models until some later full reconcile happens.
Useful? React with 👍 / 👎.
…re short-circuit (review feedback) Codex P1 round-8 on PR openclaw#73261: 'Compare configured models before short-circuiting provider hit' Paraphrased problem ------------------- The prior per-model loop in readExistingProviderMatchesConfig only validated that disk-side models carried no transport-override fields; it never compared configuredProvider.models against diskProvider.models. Result: a config edit that adds a new model id (without touching apiKey / baseUrl / api / headers / auth) hits the short-circuit, leaves models.json stale, and resolveModelAsync misses the newly configured model until some later full reconcile happens. With the targetProvider path now wired into the embedded runner, that staleness window is reachable from gateway hot paths and is a real correctness bug, not just a perf miss. Solution -------- Add a model-id subset check after the existing transport-only loop: - collectShortCircuitModelIds normalizes a provider's models field into a Set<string> of ids. Accepts the two runtime shapes: bare string entries ('gpt-5') and record entries with a string id field ({ id: 'gpt-5', ... }). Returns null on adversarial / malformed shapes (non-array with non-undefined value, non-record non-string entries, missing id, prototype-key collisions) so the comparator can fail closed. - When configuredProvider.models has any entries, every configured id MUST appear in diskProvider.models. Disk may legitimately contain MORE models than config (implicit discovery / plugin-contributed entries), so subset check, not strict equality. - Implicit-only mode (configured models is empty array OR omitted) legitimately defers to the planner's discovery output. We skip the subset check in that case and rely on the transport-only comparisons \u2014 preserving the perf path for the dominant implicit-mode setup. - Adversarial shapes (configuredIds === null OR diskIds === null) fail closed: refuse short-circuit, fall through to full planning. Result ------ - src/agents/models-config.ts: + new helper collectShortCircuitModelIds documents the parser contract: returns null for fail-closed shapes, empty Set for legitimate implicit-only mode, populated Set otherwise. + readExistingProviderMatchesConfig adds the subset check after the existing per-model transport loop. - src/agents/models-config.target-provider-short-circuit.test.ts: four new round-8 regression tests: + miss-on-newly-configured-model: config adds { id: 'gpt-6-newly-configured' }; short-circuit rejects, full plan re-runs. + miss-on-config-only-bare-string-model-add: same shape but with bare-string entries; subset check still fires. + hit-on-implicit-only-models-list: empty configured.models with matching transport still short-circuits (perf path preserved for implicit-discovery mode). + miss-on-malformed-configured-model-entry: configured.models has an entry { id: 1234 }; collector returns null; short-circuit refuses (fail-closed for adversarial shape). Validation gate --------------- - pnpm tsgo:core: clean - pnpm tsgo:core:test: clean - pnpm vitest run src/agents/models-config.target-provider-short-circuit.test.ts: 52/52 passed (44 prior + 4 new) - pnpm oxlint <touched files>: 0 warnings, 0 errors - pnpm plugin-sdk:api:check: OK - git diff --check: clean
Round 8 — addressing Codex P1 review feedbackCodex finding on Paraphrased problem: The prior per-model loop in Solution approach: Add a model-id subset check after the existing transport-only loop:
Result: commit
Validation gate (last lines): |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fed465f466
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const configuredProvider = Object.hasOwn(explicitProviders, targetProvider) | ||
| ? explicitProviders[targetProvider] | ||
| : undefined; |
There was a problem hiding this comment.
Normalize provider defaults before short-circuit comparison
Use the normalized provider shape (the same shape planOpenClawModelsJson writes) before calling readExistingProviderMatchesConfig; this currently compares disk against raw cfg.models.providers[targetProvider], so providers that inject defaults during normalization (for example extensions/ollama/provider-policy-api.ts sets a default baseUrl) will always miss the fast path even when disk is already correct. In those configs, every targeted call falls back to full planning and the new performance path is effectively disabled.
Useful? React with 👍 / 👎.
| if (configuredIds.size > 0) { | ||
| const diskIds = collectShortCircuitModelIds(diskProvider.models); |
There was a problem hiding this comment.
Check requested model on implicit-only short-circuit hits
When configuredProvider.models is empty/omitted, this branch skips all model-id validation and can still return a short-circuit match, even if models.json does not contain the model the caller is about to resolve. After this commit, embedded runner callers pass targetProvider on cold start, so stale disk catalogs in implicit-discovery setups can bypass resolveImplicitProviders, then resolveModelAsync fails with Unknown model until another path forces a full reconcile.
Useful? React with 👍 / 👎.
|
@clawsweeper please re-review. Round 8 follow-up commit |
…ly short-circuit on targetModel (review feedback) What ---- Round-9 review feedback on PR openclaw#73261 addressing two Codex P2 findings on the round-8 commit `fed465f466`: 1. [P2] Normalize provider defaults before short-circuit comparison (`src/agents/models-config.ts:1282`). The structural compare ran `cfg.models.providers[targetProvider]` AS AUTHORED against `models.json` AS NORMALIZED — but `planOpenClawModelsJson` writes the normalized shape, so any provider whose policy hook injects defaults (e.g. `extensions/ollama/provider-policy-api.ts` defaults `baseUrl` to `http://localhost:11434`) would always miss the short-circuit even when disk was already correct. Every targeted call would fall through to a full plan, disabling the perf path for that provider. 2. [P2] Check requested model on implicit-only short-circuit hits (`src/agents/models-config.ts:997`). When `configuredProvider.models` is `[]` / omitted (implicit-discovery mode), the prior contract skipped ALL model-id validation and could bless a stale disk catalog. With this round wiring `targetProvider` into the embedded runner cold paths, a stale `models.json` would short-circuit and then `resolveModelAsync` would fail with `Unknown model: <provider>/<model>` until another path forced a full reconcile. Why --- 1. Apply the same `normalizeProviderSpecificConfig(targetProvider, cfg)` pre-pass that `normalizeProviders` runs before plan writes. The hook is a no-op for providers without an opinion, so existing tests stay green; ollama and any future bundled provider with a normalize-config policy now hit the perf path on the second pass. 2. Thread the resolved model id through the embedded runner via the already-defined `EnsureOpenClawModelsJsonOptions.targetModel` (was reserved). In implicit-only mode the structural compare now also checks `targetModelId` is present on disk; if it isn't, refuse the short-circuit and let the planner rediscover. Explicit-mode contract (subset check) is unchanged. Folded `targetModelId` into the scoped readyCache key so a hit cached for model X cannot be reused as a "match" for model Y under the same (provider, fingerprint, models.json content) tuple when Y isn't on disk. How --- - `readExistingProviderMatchesConfig`: - Apply `normalizeProviderSpecificConfig` to the configured provider after the `isRecord` guard, then compare normalized vs disk. - Accept `targetModelId?: string` and, in the implicit-only branch (configured `models` empty), require `targetModelId` to be on disk; lazily compute `diskIds` so adversarial disk model rows still fail closed in both branches without parsing twice. - `modelsJsonScopedShortCircuitCacheKey`: append `\\0m:<id>` when a `targetModelId` is provided so per-model cache scope follows the per-model contract above. Keys without `targetModel` keep the prior shape. - `pi-embedded-runner/run.ts` and `pi-embedded-runner/compact.ts`: forward the resolved `modelId` as `targetModel` alongside `targetProvider`. - `EnsureOpenClawModelsJsonOptions.targetModel` JSDoc updated from "reserved" to the implicit-only contract. Tests ----- 4 new cases in `src/agents/models-config.target-provider-short-circuit.test.ts`, pinning the regressions: - `hit-on-policy-injected-default-baseUrl` — ollama config without `baseUrl` short-circuits on second pass because policy normalizes both sides to the default. Without the round-9 fix this falls through. - `miss-on-implicit-only-target-model-not-on-disk` — implicit mode with `targetModel` missing from disk now misses the short-circuit and triggers a re-plan. - `hit-on-implicit-only-target-model-on-disk` — symmetric companion; short-circuit still hits when the model is on disk. - `scoped-cache-per-target-model` — model X cached hit does not satisfy a follow-up call for model Y under the same fingerprint and models.json content when Y is not on disk. Validation gate --------------- - `pnpm tsgo:core` clean - `pnpm tsgo:core:test` clean - `pnpm vitest run src/agents/models-config.target-provider-short-circuit.test.ts`: 60 passed (30 × 2 contexts) - `pnpm vitest run src/agents/models-config.fingerprint-cache.test.ts src/agents/models-config.providers.policy.test.ts`: 24 passed - `pnpm vitest run src/infra/changelog-unreleased.test.ts`: 3 passed - `pnpm config:schema:check`: ok - `pnpm plugin-sdk:api:check`: ok - `pnpm oxlint <touched files>`: 0 new errors (the pre-existing `__testing` underscore-dangle finding in `pi-embedded-runner/compact.ts:1425` is upstream and unrelated) - `git diff --check`: clean Beads: openclaw-8zp
Round 9 — addressing Codex P2 review feedbackCodex review on round-8 commit [P2] Normalize provider defaults before short-circuit comparison —
|
…rt-circuit cache (review feedback)
Round-10 follow-up on Clawsweeper P2 (outstanding since 2026-04-30T22:05,
missed in rounds 6-9): "Cache only the validated models.json snapshot."
Pre-fix, `readExistingProviderMatchesConfig` would safe-read +
structurally validate the on-disk `models.json` once via
`safeReadFileOutcome`, then return a bare `boolean`. The caller at
models-config.ts:1397-1401 would then issue a SECOND read via
`readModelsJsonContentOutcome(targetPath)` and store THAT outcome's
hash in the provider-scoped readyCache. Between the two reads, an
attacker (or any external editor) replacing `models.json` could land
hash(swappedBytes) in the scoped cache as "the validated snapshot." A
later targeted call hitting the cache would compare current disk
against hash(swappedBytes) and accept it as a match, blessing
attacker-controlled provider transport (api / baseUrl / headers
consumed per-call by `pi-embedded-runner/model.ts`). CWE-345.
This commit collapses the two reads into one by threading the
validated `ContentHashOutcome` straight back from the structural
check. Concretely:
- `readExistingProviderMatchesConfig` now returns a discriminated
union: `{ matches: true; validatedModelsJsonOutcome }` on success,
`{ matches: false }` on every failure path. The validated outcome
is derived directly from the `safeReadFileOutcome` the function
already runs (line 838) — same bytes, same hash, no second disk
trip.
- The caller uses `matchOutcome.validatedModelsJsonOutcome` for the
scoped-cache `set` and drops the redundant
`readModelsJsonContentOutcome(targetPath)` call.
- The `kind !== "uncacheable"` guard is preserved as
belt-and-suspenders even though, on the success path, the validated
outcome is provably `hashed` (uncacheable cases early-return
`{ matches: false }` before we get here).
Three new regression tests pin the contract from independent angles:
- `cache-stores-validated-outcome` — happy path: cold + stable disk +
reset-cache + targeted call must take the disk-based short-circuit
exactly once and a follow-up targeted call must hit the warm scoped
cache without re-running discovery.
- `toctou-swap-cannot-bless-unvalidated-content` — security: after a
cold short-circuit populates the scoped cache, swap on-disk
`models.json` to a config-disagreeing baseUrl; the next targeted
call MUST detect drift, refuse the cache, fail the structural
check, and fall through to a full plan (resolveImplicit fires).
- `no-cache-on-uncacheable-validated-outcome` — defense-in-depth:
oversize `models.json` forces a full plan and the post-plan disk
size returns under cap, confirming the uncacheable branch is
reachable end-to-end without the redundant second read.
Adds a real-runtime proof script
`scripts/proof-73261-validated-outcome-cache.ts` that drives the
production `ensureOpenClawModelsJson` against a real on-disk
`models.json` in a temp dir, instruments `fs.promises.lstat` to count
reads per scenario, and self-checks that the round-10 contract holds:
[1/3] cache-stores-validated-outcome
lstat(models.json) during short-circuit: 1
lstat(models.json) during scoped-cache hit + drift-check: 1
ok
[2/3] toctou-swap-cannot-bless-unvalidated-content
disk baseUrl after cold plan: https://api.openai.com/v1
swap was not silently blessed by stale scoped cache
ok
[3/3] no-cache-on-uncacheable-validated-outcome
bloated models.json size: 2102064 bytes (> 1 MiB cap)
post-plan models.json size: 9105 bytes
ok
All runtime assertions passed.
Validation gate (full output captured for the PR comment):
pnpm tsgo:core ok
pnpm tsgo:core:test ok
pnpm vitest run src/agents/models-config.target-provider-... 66/66
pnpm vitest run src/agents/models-config.fingerprint-cache,
src/agents/models-config.write-serialization 32/32
pnpm config:schema:check ok
pnpm plugin-sdk:api:check ok
pnpm oxlint <touched files> ok
git diff --check clean
pnpm tsx scripts/proof-73261-validated-outcome-cache.ts ok
Beads: openclaw-8zp
Round 10 — Clawsweeper P2 (long-outstanding) addressedAcknowledging up front: this finding has been outstanding since the 2026-04-30T22:05 review pass and was missed across rounds 6, 7, 8, and 9. Apologies for the delay — round-10 (
|
Summary
Splits the targetProvider short-circuit half of #72869 into a standalone PR.
Adds a fast path to
ensureOpenClawModelsJsonfor callers that know which provider/model they are about to use (e.g. the pi-embedded runner). When set, the implicit-provider-discovery pipeline can be skipped IF the on-diskmodels.jsonprovider entry structurally matches what the current configuration would produce.Why structural comparison
A naive "is configured" check (does any non-empty credential exist on disk for this provider) silently bypasses several real failure modes:
The new
readExistingProviderMatchesConfigcompares all four security-relevant fields:apiKeyresolveSecretInputRef(env-ref expansion viacreateConfigRuntimeEnv) before string equality vs. diskbaseUrlheadersauthAny mismatch falls through to full planning + write. This is safe to use on cold start and after gateway restart.
API
Adds
EnsureOpenClawModelsJsonOptionsexported type withtargetProviderandtargetModel, layered on top of the existing options shape (kept all ofpluginMetadataSnapshot,workspaceDir,providerDiscoveryProviderIds,providerDiscoveryTimeoutMs).Tests
6 cases in
models-config.target-provider-short-circuit.test.ts:hit-on-match— full structural match short-circuits planningmiss-on-rotated-key— configapiKeychange forces a full planmiss-on-baseUrl-change— tampered diskbaseUrlrejects short-circuitmiss-on-tampered-headers— any disk header drift rejects short-circuitmiss-on-cold-cache— empty in-memory cache + missing disk file forces a planhit-after-warm-fingerprint— warm in-memory cache reuseDependencies
This is one of two PRs split out of the original combined #72869. The cache-fingerprint work (auth-profiles content hash, models.json drift detection, fingerprint hashing, security hardening) is in #73260.
This branch can land independently of the cache PR; the two changes touch different code paths.
Related
Real behavior proof
Behavior or issue addressed: Target-provider models-config short-circuit safety from PR perf(models-config): targetProvider short-circuit with disk-vs-config validation #73261, including provider-level
apidrift and uncacheable models hash fail-closed behavior.Real environment tested: Local OpenClaw topic branch
perf/models-config-target-provider-short-circuitat2f89c7b99a, including final perf(models-config): content-hash auth-profiles + models.json drift detection #73260 ContentHashOutcome contract, isolated temporary agent directory, productionensureOpenClawModelsJsoninvoked through atsxruntime driver with no test harness or mocks.Exact steps or command run after this patch: Ran a real proof driver after the patch covering steady-state targetProvider hit, provider
apidrift, and oversizemodels.json.Evidence after fix: Full copied runtime output is in the PR comment: perf(models-config): targetProvider short-circuit with disk-vs-config validation #73261 (comment)
Excerpt of copied live output:
Observed result after fix: The production path keeps the short-circuit only for structurally matching provider config, refuses provider-level
apidrift, and refuses uncacheable models hashes instead of treating them as stable cache hits.What was not tested: Nothing else for this PR's changed behavior beyond the isolated local runtime proof and supplemental automated validation.