Skip to content

Commit 10361a0

Browse files
committed
fix(models-config): symmetric baseUrl, env-var-name compare, cache integration
Addresses Codex P2, Greptile P1+P2x2, and Aisle High #1 + Med #2/#3/#4 on PR #73261: # Greptile P1 / Aisle High #1: asymmetric baseUrl (CWE-918, SSRF) The previous guard: if (typeof configuredProvider.baseUrl === 'string' && configuredProvider.baseUrl !== diskProvider.baseUrl) short-circuited the check entirely when config omitted baseUrl (common for providers with compiled-in defaults). An attacker who could tamper with on-disk models.json could set baseUrl to an exfil endpoint and the short-circuit would accept it silently \u2014 exactly the SSRF/credential-exfil vector this PR was meant to close. Replaced with symmetric stableEqual() so config-undefined vs disk-string is a mismatch and falls through to full planning, which re-applies provider/plugin defaults and rewrites the file. # Codex P2: env-ref API key comparison resolveConfiguredApiKeyForCompare resolved env refs to the env-var VALUE (env[ref.id]). But planOpenClawModelsJson persists env-source api keys to models.json as the env-var NAME (e.g. 'OPENAI_API_KEY'), not the value \u2014 that's what resolveApiKeyFromCredential returns for env source. Comparing value-vs-name always mismatched, so the short-circuit never fired for the most common config form ('apiKey: "${env.OPENAI_API_KEY}"'). Now the helper returns the env-var NAME for env-source refs, while still verifying the env is currently populated (so we don't short-circuit on a misconfigured environment). Plaintext values still compare directly. # Greptile P2 (perf): readyCache integration The short-circuit returned before the readyCache check, so warm callers paid the disk read + JSON.parse cost on every call. Reordered: 1. compute fingerprint (cheap) 2. check readyCache \u2014 warm hit returns immediately, no disk I/O 3. if cold, attempt short-circuit on disk 4. if short-circuit succeeds, populate readyCache so subsequent calls take the warm path Net effect: warm callers now skip disk entirely; cold callers with intact disk state still get the short-circuit benefit; full plan fires only on real drift. # Aisle Med #2 (CWE-59): symlink-following short-circuit reads readExistingProviderMatchesConfig used fs.readFile on a possibly- symlinked target. Now lstat-checks first and refuses symlinks / non-regular files. # Aisle Med #3 (CWE-1321): prototype-chain key access explicitProviders[targetProvider] could fall through to a prototype key like '__proto__' / 'constructor' / 'prototype'. Now uses Object.hasOwn at both lookup sites (caller AND readExistingProviderMatchesConfig) to refuse inherited keys. Also explicit string check rejects the three known dangerous keys. # Aisle Med #4 (CWE-400): unbounded models.json read Added MAX_MODELS_JSON_SHORT_CIRCUIT_BYTES = 1 MiB cap on the disk read in the short-circuit path. Files above the cap fall through to full planning rather than being parsed. # Tests Updated hit-after-warm-fingerprint test (Greptile P2) to use fs.readFile spy and assert no disk read occurs on the warm path. Added short-circuit-populates-cache test that drops the in-memory cache between runs, fires the short-circuit, then verifies the third call takes the warm path with no disk read. 14/14 tests pass. Lint: 0 errors.
1 parent 682c126 commit 10361a0

2 files changed

Lines changed: 181 additions & 52 deletions

File tree

src/agents/models-config.target-provider-short-circuit.test.ts

Lines changed: 58 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -198,17 +198,71 @@ describe("ensureOpenClawModelsJson targetProvider short-circuit", () => {
198198
expect(resolveImplicitProvidersCallCount).toBe(1);
199199
});
200200

201-
it("hit-after-warm-fingerprint: identical inputs reuse the in-memory cache without re-running plan", async () => {
201+
it("hit-after-warm-fingerprint: warm in-memory cache hit takes the readyCache path", async () => {
202+
// After the first call populates readyCache (either via plan or
203+
// via short-circuit), the next call with identical inputs hits
204+
// the in-memory cache BEFORE any disk read. This validates the
205+
// ordering fix for Greptile P2: short-circuit runs after
206+
// readyCache check so warm callers don't re-read models.json.
202207
const agentDir = await fixtureSuite.createCaseDir("agent");
203208
const cfg = createOpenAiConfig();
204209

205210
await ensureOpenClawModelsJson(cfg, agentDir, { targetProvider: "openai" });
206211
expect(resolveImplicitProvidersCallCount).toBe(1);
207212

208-
// Same config + same disk + warm in-memory cache: both the
209-
// targetProvider short-circuit AND the fingerprint cache hit are
210-
// available. Either way, no re-plan should fire.
213+
// Spy on fs.readFile to verify the second call performs no disk
214+
// reads on the models-config codepath. Use the dynamic import
215+
// form so the spy installs against the same fs/promises instance
216+
// models-config is using.
217+
const fsPromises = await import("node:fs/promises");
218+
const readFileSpy = vi.spyOn(fsPromises.default, "readFile");
219+
try {
220+
await ensureOpenClawModelsJson(cfg, agentDir, { targetProvider: "openai" });
221+
expect(resolveImplicitProvidersCallCount).toBe(1);
222+
// No models.json read on the warm path.
223+
const modelsJsonReads = readFileSpy.mock.calls.filter((args) => {
224+
const arg = args[0];
225+
return typeof arg === "string" && arg.endsWith("/models.json");
226+
});
227+
expect(modelsJsonReads).toHaveLength(0);
228+
} finally {
229+
readFileSpy.mockRestore();
230+
}
231+
});
232+
233+
it("short-circuit-populates-cache: subsequent calls take the warm path even after a cold short-circuit", async () => {
234+
// Greptile P2 fix: when the targetProvider short-circuit fires,
235+
// it now populates readyCache so subsequent calls don't repeat
236+
// the disk + parse work.
237+
const agentDir = await fixtureSuite.createCaseDir("agent");
238+
const cfg = createOpenAiConfig();
239+
240+
// First call: cold start, plan runs and populates readyCache.
211241
await ensureOpenClawModelsJson(cfg, agentDir, { targetProvider: "openai" });
212242
expect(resolveImplicitProvidersCallCount).toBe(1);
243+
244+
// Drop the in-memory cache to simulate a fresh process. Disk
245+
// state remains intact, so the second call should fire the
246+
// short-circuit and populate readyCache.
247+
resetModelsJsonReadyCacheForTest();
248+
resolveImplicitProvidersCallCount = 0;
249+
await ensureOpenClawModelsJson(cfg, agentDir, { targetProvider: "openai" });
250+
expect(resolveImplicitProvidersCallCount).toBe(0); // short-circuit
251+
252+
// Third call: readyCache should now be populated by the short
253+
// circuit, and no disk read should occur.
254+
const fsPromises = await import("node:fs/promises");
255+
const readFileSpy = vi.spyOn(fsPromises.default, "readFile");
256+
try {
257+
await ensureOpenClawModelsJson(cfg, agentDir, { targetProvider: "openai" });
258+
expect(resolveImplicitProvidersCallCount).toBe(0);
259+
const modelsJsonReads = readFileSpy.mock.calls.filter((args) => {
260+
const arg = args[0];
261+
return typeof arg === "string" && arg.endsWith("/models.json");
262+
});
263+
expect(modelsJsonReads).toHaveLength(0);
264+
} finally {
265+
readFileSpy.mockRestore();
266+
}
213267
});
214268
});

src/agents/models-config.ts

Lines changed: 123 additions & 48 deletions
Original file line numberDiff line numberDiff line change
@@ -178,16 +178,26 @@ export type EnsureOpenClawModelsJsonOptions = {
178178
};
179179

180180
/**
181-
* Resolve a configured provider's `apiKey` reference into the literal
182-
* value that planOpenClawModelsJson would write to disk, so we can
183-
* compare config-vs-disk during the short-circuit check. Mirrors the
184-
* env-ref handling in `models-config.providers.secret-helpers.ts` but
185-
* narrowed to the comparison use case.
181+
* Resolve a configured provider's `apiKey` reference into the form that
182+
* planOpenClawModelsJson actually writes to disk, so we can compare
183+
* config-vs-disk during the short-circuit check.
184+
*
185+
* IMPORTANT: env-ref API keys are persisted to models.json as the
186+
* env-var **NAME** (e.g. `"OPENAI_API_KEY"`), not the env-var value.
187+
* That's the form `resolveApiKeyFromCredential` produces for env-source
188+
* credentials and the form the rest of the runtime expects. Comparing
189+
* against the resolved value would always mismatch and silently skip
190+
* the short-circuit on every call (Codex P2 on PR #73261).
191+
*
192+
* The env var is only consulted to verify it's currently set — if the
193+
* variable is missing or empty, no usable credential exists and the
194+
* caller should fall through to full planning rather than short-circuit.
186195
*
187196
* Returns:
188-
* - the literal string for plaintext / env-resolved values
197+
* - the env-var name for env-source secret refs
198+
* - the literal string for plaintext values
189199
* - undefined if no apiKey was configured
190-
* - null if a secret ref could not be resolved (e.g. env var unset OR
200+
* - null if a secret ref could not be resolved (env var unset OR
191201
* non-env source like keyring; in either case we can't safely match
192202
* against disk so the caller should NOT short-circuit)
193203
*/
@@ -201,13 +211,18 @@ function resolveConfiguredApiKeyForCompare(
201211
if (typeof apiKey === "string" && apiKey.length > 0) {
202212
const ref = resolveSecretInputRef({ value: apiKey }).ref;
203213
if (!ref || !ref.id.trim()) {
214+
// Plaintext literal value — disk holds the same literal.
204215
return apiKey;
205216
}
206217
if (ref.source !== "env") {
207218
return null;
208219
}
209-
const value = env[ref.id.trim()];
210-
return typeof value === "string" && value.length > 0 ? value : null;
220+
// Env source: disk holds the env var NAME, not the value. Verify
221+
// the env is currently populated so we don't short-circuit on a
222+
// misconfigured environment, but compare against the var name.
223+
const id = ref.id.trim();
224+
const value = env[id];
225+
return typeof value === "string" && value.length > 0 ? id : null;
211226
}
212227
if (isRecord(apiKey)) {
213228
const ref = resolveSecretInputRef({ value: apiKey, refValue: apiKey }).ref;
@@ -217,8 +232,9 @@ function resolveConfiguredApiKeyForCompare(
217232
if (ref.source !== "env") {
218233
return null;
219234
}
220-
const value = env[ref.id.trim()];
221-
return typeof value === "string" && value.length > 0 ? value : null;
235+
const id = ref.id.trim();
236+
const value = env[id];
237+
return typeof value === "string" && value.length > 0 ? id : null;
222238
}
223239
return null;
224240
}
@@ -233,23 +249,38 @@ function stableEqual(a: unknown, b: unknown): boolean {
233249
return stableStringify(a) === stableStringify(b);
234250
}
235251

252+
/**
253+
* Hard cap on the bytes we will read + parse from models.json during
254+
* the short-circuit check (Aisle medium #4 on PR #73261). Realistic
255+
* models.json sizes are dominated by listed models per provider; 1 MiB
256+
* is plenty of headroom while bounding the worst-case allocation.
257+
*/
258+
const MAX_MODELS_JSON_SHORT_CIRCUIT_BYTES = 1 * 1024 * 1024;
259+
236260
/**
237261
* Verify that the on-disk models.json provider entry STRUCTURALLY
238262
* matches what the current configuration would produce. Used by the
239263
* short-circuit fast path to skip the implicit-provider-discovery
240264
* pipeline only when the disk state is provably consistent with config.
241265
*
242-
* Compares:
266+
* Compares (all symmetric — either side undefined != string is a
267+
* mismatch):
243268
* apiKey — resolved through env-ref expansion before comparing
244-
* baseUrl — strict string equality
245-
* headers — stable structural equality (key-order independent)
269+
* (env-source values compare by env-var NAME, not value,
270+
* since that's what plan writes to disk)
271+
* baseUrl — stable structural equality (closes asymmetric-undef bug)
272+
* headers — stable structural equality
246273
* auth — stable structural equality
247274
*
275+
* Other provider fields (models[], maxTokens, contextWindow, cost,
276+
* compat, etc.) are NOT compared. Tampering with those would not
277+
* cause SSRF / credential exfil but might change inference behaviour;
278+
* accepting that trade-off keeps the short-circuit reachable. If a
279+
* field becomes security-critical later, add it here.
280+
*
248281
* Any mismatch (or any state we cannot conclusively verify, like a
249-
* non-env secret ref) returns false so the caller falls through to the
250-
* full plan + write path. This closes the "presence-only" check that
251-
* previously bypassed planning when on-disk credentials were stale or
252-
* attacker-tampered.
282+
* non-env secret ref) returns false so the caller falls through to
283+
* the full plan + write path.
253284
*/
254285
async function readExistingProviderMatchesConfig(
255286
targetPath: string,
@@ -260,6 +291,25 @@ async function readExistingProviderMatchesConfig(
260291
if (!isRecord(configuredProvider)) {
261292
return false;
262293
}
294+
// Reject prototype-chain key collisions for targetProvider (Aisle
295+
// medium #3 on PR #73261). String keys like "__proto__" /
296+
// "constructor" / "prototype" should not steer the short-circuit.
297+
if (
298+
targetProvider === "__proto__" ||
299+
targetProvider === "constructor" ||
300+
targetProvider === "prototype"
301+
) {
302+
return false;
303+
}
304+
// Symlink-safe + size-capped read (Aisle medium #2 + #4). Refuses
305+
// symlinks, non-regular files, and files larger than the cap.
306+
const lst = await fs.lstat(targetPath).catch(() => null);
307+
if (!lst || lst.isSymbolicLink() || !lst.isFile()) {
308+
return false;
309+
}
310+
if (lst.size > MAX_MODELS_JSON_SHORT_CIRCUIT_BYTES) {
311+
return false;
312+
}
263313
let raw: string;
264314
try {
265315
raw = await fs.readFile(targetPath, "utf8");
@@ -275,15 +325,25 @@ async function readExistingProviderMatchesConfig(
275325
if (!isRecord(parsed) || !isRecord(parsed.providers)) {
276326
return false;
277327
}
328+
// Use Object.hasOwn to refuse inherited keys — belt-and-suspenders
329+
// against prototype-chain access (Aisle medium #3).
330+
if (!Object.hasOwn(parsed.providers, targetProvider)) {
331+
return false;
332+
}
278333
const diskProvider = parsed.providers[targetProvider];
279334
if (!isRecord(diskProvider)) {
280335
return false;
281336
}
282337

283-
if (
284-
typeof configuredProvider.baseUrl === "string" &&
285-
configuredProvider.baseUrl !== diskProvider.baseUrl
286-
) {
338+
// Symmetric baseUrl comparison. The previous asymmetric check
339+
// (`typeof configuredProvider.baseUrl === "string" && ... !== ...`)
340+
// skipped validation entirely when config omitted baseUrl, letting
341+
// an attacker-injected disk baseUrl slip through (Greptile P1
342+
// security + Aisle High #1 on PR #73261). Now: any difference
343+
// between configured and disk baseUrl — including config-undefined
344+
// vs disk-string — falls through to full planning, which will
345+
// re-apply provider/plugin defaults and rewrite the file.
346+
if (!stableEqual(configuredProvider.baseUrl, diskProvider.baseUrl)) {
287347
return false;
288348
}
289349

@@ -336,32 +396,6 @@ export async function ensureOpenClawModelsJson(
336396
const agentDir = agentDirOverride?.trim() ? agentDirOverride.trim() : resolveOpenClawAgentDir();
337397
const targetPath = path.join(agentDir, "models.json");
338398

339-
// --- SHORT-CIRCUIT FAST PATH ---
340-
// If the caller specified a target provider AND the on-disk provider
341-
// entry STRUCTURALLY matches the current config (apiKey resolved
342-
// through env-refs, baseUrl/headers/auth via stable equality), skip
343-
// the implicit-discovery pipeline entirely. Any drift (rotated key,
344-
// attacker-tampered baseUrl/headers, missing fields) falls through to
345-
// full plan + write.
346-
const targetProvider = options?.targetProvider?.trim();
347-
if (targetProvider) {
348-
const explicitProviders = cfg.models?.providers ?? {};
349-
const configuredProvider = explicitProviders[targetProvider];
350-
if (configuredProvider) {
351-
const env = createConfigRuntimeEnv(cfg);
352-
const matches = await readExistingProviderMatchesConfig(
353-
targetPath,
354-
targetProvider,
355-
configuredProvider,
356-
env,
357-
);
358-
if (matches) {
359-
await ensureModelsFileModeForModelsJson(targetPath);
360-
return { agentDir, wrote: false };
361-
}
362-
}
363-
}
364-
365399
const fingerprint = await buildModelsJsonFingerprint({
366400
config: cfg,
367401
sourceConfigForSecrets: resolved.sourceConfigForSecrets,
@@ -378,11 +412,52 @@ export async function ensureOpenClawModelsJson(
378412
const cacheKey = modelsJsonReadyCacheKey(targetPath, fingerprint);
379413
const cached = MODELS_JSON_STATE.readyCache.get(cacheKey);
380414
if (cached) {
415+
// Warm in-memory cache hit: same inputs, already-planned result.
416+
// This is the fastest path — no disk I/O at all.
381417
const settled = await cached;
382418
await ensureModelsFileModeForModelsJson(targetPath);
383419
return settled.result;
384420
}
385421

422+
// --- TARGETPROVIDER SHORT-CIRCUIT FAST PATH ---
423+
// The fingerprint cache missed (cold start, gateway restart, or
424+
// input drift), but the caller hinted which provider it intends to
425+
// use. If the on-disk provider entry STRUCTURALLY matches the
426+
// current config (apiKey env-var name, baseUrl, headers, auth all
427+
// identical), skip the heavy implicit-discovery pipeline. Any
428+
// drift (rotated key, attacker-tampered baseUrl/headers, missing
429+
// fields) falls through to full plan + write.
430+
//
431+
// Order matters: we run AFTER the readyCache check so warm callers
432+
// skip the disk read entirely. We also POPULATE the readyCache
433+
// after a successful short-circuit so subsequent calls hit the
434+
// in-memory path instead of repeating the disk + parse work
435+
// (Greptile P2 on PR #73261).
436+
const targetProvider = options?.targetProvider?.trim();
437+
if (targetProvider) {
438+
const explicitProviders = cfg.models?.providers ?? {};
439+
const configuredProvider = Object.hasOwn(explicitProviders, targetProvider)
440+
? explicitProviders[targetProvider]
441+
: undefined;
442+
if (configuredProvider) {
443+
const env = createConfigRuntimeEnv(cfg);
444+
const matches = await readExistingProviderMatchesConfig(
445+
targetPath,
446+
targetProvider,
447+
configuredProvider,
448+
env,
449+
);
450+
if (matches) {
451+
await ensureModelsFileModeForModelsJson(targetPath);
452+
const result = { agentDir, wrote: false };
453+
// Populate readyCache so the next call with identical inputs
454+
// takes the warm-cache path above without re-reading disk.
455+
MODELS_JSON_STATE.readyCache.set(cacheKey, Promise.resolve({ fingerprint, result }));
456+
return result;
457+
}
458+
}
459+
}
460+
386461
const pending = withModelsJsonWriteLock(targetPath, async () => {
387462
// Ensure config env vars (e.g. AWS_PROFILE, AWS_ACCESS_KEY_ID) are
388463
// are available to provider discovery without mutating process.env.

0 commit comments

Comments
 (0)