fix(models-config): close cache-vs-config drift, hash fingerprint, +tests

zeroaltitude · zeroaltitude · commit 235e3c2dcf45 · 2026-04-27T20:53:04.000-07:00
Three review-driven fixes per @zeroaltitude direction (b)+(c) + secret hygiene + tests:

(b) Validate disk-vs-config before short-circuiting
    [Aisle High #2 / Codex P1 / Greptile P1 on PR #72869]

The previous targetProvider short-circuit fired whenever the on-disk
provider entry contained ANY non-empty credential. That silently bypassed:

  - rotated apiKey: cold start with new key, old key on disk,
    short-circuit fires, all calls fail until something else invalidates
  - attacker-tampered baseUrl: redirect to exfil endpoint kept
  - attacker-injected headers: arbitrary auth material kept

New readExistingProviderMatchesConfig() does a strict structural comparison:

  apiKey  - resolved through resolveSecretInputRef (env-ref expansion via
            createConfigRuntimeEnv) before string equality vs. disk
  baseUrl - exact string equality
  headers - stable structural equality (key-order independent)
  auth    - stable structural equality

Any mismatch (or any state we cannot conclusively verify, like a non-env
secret ref) returns false and falls through to full planning. The
short-circuit is now safe to use on cold start and after gateway restart.

(c) Hash models.json content into the cache key
    [Codex P1 on PR #72869]

The previous fingerprint had no models.json input: once the cache was
populated, unchanged config/auth returned cached success even after the
file was edited externally / partially corrupted / manually tampered.

Now readyCache stores both the input fingerprint AND the post-write
models.json SHA-256. A cache hit requires both to match; any external edit
invalidates. The hash is captured at three points (skip path, noop path,
write path) so the second factor is always recorded.

Aisle medium #5: hash fingerprint before storage

Raw stable-stringified config (including apiKey strings) used to sit
verbatim in MODELS_JSON_STATE.readyCache. SHA-256 over the canonical
payload is now the cache key: deterministic but not reversible, so heap
snapshots / debug telemetry / core dumps can't leak secrets via the
readyCache state.

Greptile P2: targetProvider short-circuit tests

New file models-config.target-provider-short-circuit.test.ts with 6 cases:

  - hit-on-match (full structural match short-circuits)
  - miss-on-rotated-key (config apiKey change forces plan)
  - miss-on-baseUrl-change (tampered disk baseUrl rejects)
  - miss-on-tampered-headers (any header drift rejects)
  - miss-on-cold-cache (no disk file forces plan)
  - hit-after-warm-fingerprint + invalidation on external models.json
    edit (modelsJsonHash second-factor verified)

Existing fingerprint-cache test updated:

The 'volatile fields rotate' test mixed type:oauth (correctly volatile)
with type:token (now correctly NOT volatile after d505fa0). Split into
two tests:

  - OAuth session-field rotation does NOT invalidate (existing intent,
    narrowed to oauth-only profiles)
  - Static type:token credential rotation DOES invalidate
    (Codex/Greptile P2 - new correct behavior)

State shape change:

MODELS_JSON_STATE.readyCache value extended with modelsJsonHash:

  { fingerprint, modelsJsonHash, result }

All three return paths in the plan closure capture this.

Tests: 13/13 (6 new + 7 existing fingerprint-cache + file-mode).
Lint: 0 errors. TS: clean.
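
For readers skimming the message, the two checks it describes (the key-order-independent disk-vs-config comparison and the two-factor cache hit) might be sketched roughly as below. This is an illustration under stated assumptions, not the code in models-config.ts: `stableStringify`, `sha256Hex`, `providerMatchesConfig`, `isReadyCacheHit` and the `ProviderFields` shape are hypothetical names, the config apiKey is assumed to be already resolved to a plain string, and a missing models.json is assumed to count as a miss.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: JSON with object keys sorted, so key order never matters.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const parts = Object.keys(record)
      .filter((key) => record[key] !== undefined)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// SHA-256 hex digest of a UTF-8 string.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Illustrative provider shape; only the four compared fields are modelled.
type ProviderFields = {
  apiKey?: string;
  baseUrl?: string;
  headers?: Record<string, string>;
  auth?: unknown;
};

// Assumption: config.apiKey was already resolved from any env ref.
// Anything that cannot be verified (e.g. no usable key) is treated as a mismatch.
function providerMatchesConfig(disk: ProviderFields, config: ProviderFields): boolean {
  if (typeof config.apiKey !== "string" || config.apiKey.length === 0) {
    return false;
  }
  return (
    disk.apiKey === config.apiKey &&
    disk.baseUrl === config.baseUrl &&
    stableStringify(disk.headers ?? null) === stableStringify(config.headers ?? null) &&
    stableStringify(disk.auth ?? null) === stableStringify(config.auth ?? null)
  );
}

type ReadyCacheEntry = { fingerprint: string; modelsJsonHash: string | null };

// Two-factor hit: the hashed inputs AND the hashed on-disk file must both match.
// Assumption: a missing file or an unrecorded hash is always a miss.
function isReadyCacheHit(
  entry: ReadyCacheEntry,
  inputs: unknown,
  modelsJsonText: string | null,
): boolean {
  if (entry.modelsJsonHash === null || modelsJsonText === null) {
    return false;
  }
  return (
    entry.fingerprint === sha256Hex(stableStringify(inputs)) &&
    entry.modelsJsonHash === sha256Hex(modelsJsonText)
  );
}
```

Because only digests are stored in the entry, the cache holds no apiKey text, and an external edit to the file changes the second digest even when the inputs are unchanged.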
diff --git a/src/agents/models-config-state.ts b/src/agents/models-config-state.ts
@@ -1,10 +1,28 @@
 const MODELS_JSON_STATE_KEY = Symbol.for("openclaw.modelsJsonState");
 
+/**
+ * Cache entry shape captured at write/plan completion. The `fingerprint` is
+ * a SHA-256 hex digest of the canonical input shape (config + auth-profiles
+ * stable hash + plugin metadata) — NOT the raw stable-stringified payload.
+ * Hashing it before storage keeps raw secrets out of the cached state
+ * (Aisle medium #5 on PR #72869) so heap snapshots / debug telemetry / core
+ * dumps cannot leak `apiKey` material via the readyCache.
+ *
+ * `modelsJsonHash` is captured immediately after the plan-and-write
+ * completes successfully. The cache check verifies that the current
+ * on-disk models.json still hashes to this value before treating the
+ * entry as a hit (Codex P1 on PR #72869). Any external edit / partial
+ * corruption / manual tamper changes the hash and invalidates the cache.
+ */
 type ModelsJsonState = {
   writeLocks: Map<string, Promise<void>>;
   readyCache: Map<
     string,
-    Promise<{ fingerprint: string; result: { agentDir: string; wrote: boolean } }>
+    Promise<{
+      fingerprint: string;
+      modelsJsonHash: string | null;
+      result: { agentDir: string; wrote: boolean };
+    }>
   >;
 };
 
@@ -17,7 +35,11 @@ export const MODELS_JSON_STATE = (() => {
       writeLocks: new Map<string, Promise<void>>(),
       readyCache: new Map<
         string,
-        Promise<{ fingerprint: string; result: { agentDir: string; wrote: boolean } }>
+        Promise<{
+          fingerprint: string;
+          modelsJsonHash: string | null;
+          result: { agentDir: string; wrote: boolean };
+        }>
       >(),
     };
   }
diff --git a/src/agents/models-config.fingerprint-cache.test.ts b/src/agents/models-config.fingerprint-cache.test.ts
@@ -110,18 +110,13 @@ describe("ensureOpenClawModelsJson fingerprint cache", () => {
     expect(resolveImplicitProvidersCallCount).toBe(firstCount);
   });
 
-  it("does not invalidate the cache when auth-profiles volatile fields rotate", async () => {
+  it("does not invalidate the cache when OAuth session fields rotate", async () => {
     const agentDir = await fixtureSuite.createCaseDir("agent");
     const cfg = createOpenAiConfig();
 
     await writeAuthProfiles(agentDir, {
       version: 1,
       profiles: {
-        "anthropic:default": {
-          type: "token",
-          provider: "anthropic",
-          token: "sk-ant-first-token-value", // pragma: allowlist secret
-        },
         "openai-codex:default": {
           type: "oauth",
           provider: "openai-codex",
@@ -137,17 +132,13 @@ describe("ensureOpenClawModelsJson fingerprint cache", () => {
     const firstCount = resolveImplicitProvidersCallCount;
     expect(firstCount).toBe(1);
 
-    // Simulate an OAuth token refresh: volatile fields (access/refresh/expires/token)
+    // Simulate an OAuth token refresh: access/refresh/expires fields
     // rotate, but the set of providers the user can use does not change.
+    // These fields stay in AUTH_PROFILE_VOLATILE_FIELDS.
     await new Promise((resolve) => setTimeout(resolve, 10));
     await writeAuthProfiles(agentDir, {
       version: 1,
       profiles: {
-        "anthropic:default": {
-          type: "token",
-          provider: "anthropic",
-          token: "sk-ant-rotated-token-value", // pragma: allowlist secret
-        },
         "openai-codex:default": {
           type: "oauth",
           provider: "openai-codex",
@@ -163,6 +154,48 @@ describe("ensureOpenClawModelsJson fingerprint cache", () => {
     expect(resolveImplicitProvidersCallCount).toBe(firstCount);
   });
 
+  it("DOES invalidate the cache when a static type:token credential rotates (Codex/Greptile P2)", async () => {
+    // Counterpart to the OAuth-rotation test above. Profiles with
+    // `type: "token"` use the literal `token` key as a long-lived static
+    // credential. The user rotating this credential must invalidate the
+    // cache so the implicit-provider-discovery pipeline re-runs against
+    // the new value (Codex/Greptile P2 on PR #72869: "token" used to be
+    // in the volatile fields set, masking real auth-state changes).
+    const agentDir = await fixtureSuite.createCaseDir("agent");
+    const cfg = createOpenAiConfig();
+
+    await writeAuthProfiles(agentDir, {
+      version: 1,
+      profiles: {
+        "anthropic:default": {
+          type: "token",
+          provider: "anthropic",
+          token: "sk-ant-first-token-value", // pragma: allowlist secret
+        },
+      },
+    });
+
+    await ensureOpenClawModelsJson(cfg, agentDir);
+    const firstCount = resolveImplicitProvidersCallCount;
+    expect(firstCount).toBe(1);
+
+    await new Promise((resolve) => setTimeout(resolve, 10));
+    await writeAuthProfiles(agentDir, {
+      version: 1,
+      profiles: {
+        "anthropic:default": {
+          type: "token",
+          provider: "anthropic",
+          token: "sk-ant-rotated-token-value", // pragma: allowlist secret
+        },
+      },
+    });
+
+    await ensureOpenClawModelsJson(cfg, agentDir);
+    // Static-credential rotation must trigger a re-plan.
+    expect(resolveImplicitProvidersCallCount).toBe(firstCount + 1);
+  });
+
   it("invalidates the cache when an auth profile is added or removed", async () => {
     const agentDir = await fixtureSuite.createCaseDir("agent");
     const cfg = createOpenAiConfig();
diff --git a/src/agents/models-config.target-provider-short-circuit.test.ts b/src/agents/models-config.target-provider-short-circuit.test.ts
@@ -0,0 +1,226 @@
+import fs from "node:fs/promises";
+import path from "node:path";
+import { afterAll, afterEach, beforeAll, describe, expect, it, vi } from "vitest";
+import type { OpenClawConfig } from "../config/types.openclaw.js";
+import { createFixtureSuite } from "../test-utils/fixture-suite.js";
+import {
+  installModelsConfigTestHooks,
+  MODELS_CONFIG_IMPLICIT_ENV_VARS,
+  unsetEnv,
+} from "./models-config.e2e-harness.js";
+
+vi.mock("../plugins/manifest-registry.js", () => ({
+  clearPluginManifestRegistryCache: () => undefined,
+  loadPluginManifestRegistry: () => ({ plugins: [] }),
+}));
+
+vi.mock("./model-auth-env-vars.js", () => ({
+  listKnownProviderEnvApiKeyNames: () => ["OPENAI_API_KEY"],
+  PROVIDER_ENV_API_KEY_CANDIDATES: { openai: ["OPENAI_API_KEY"] },
+  resolveProviderEnvApiKeyCandidates: () => ({ openai: ["OPENAI_API_KEY"] }),
+}));
+
+vi.mock("../plugins/provider-runtime.js", () => ({
+  applyProviderConfigDefaultsWithPlugin: (config: OpenClawConfig) => config,
+  applyProviderNativeStreamingUsageCompatWithPlugin: () => undefined,
+  normalizeProviderConfigWithPlugin: () => undefined,
+  resetProviderRuntimeHookCacheForTest: () => undefined,
+  resolveProviderConfigApiKeyWithPlugin: () => undefined,
+  resolveProviderSyntheticAuthWithPlugin: () => undefined,
+}));
+
+/**
+ * Track implicit-provider-discovery invocations so we can verify whether
+ * the targetProvider short-circuit fired (no call) or fell through to
+ * full planning (one call per ensureOpenClawModelsJson invocation).
+ */
+let resolveImplicitProvidersCallCount = 0;
+vi.mock("./models-config.providers.js", async () => {
+  const actual = await vi.importActual<typeof import("./models-config.providers.js")>(
+    "./models-config.providers.js",
+  );
+  return {
+    ...actual,
+    resolveImplicitProviders: async () => {
+      resolveImplicitProvidersCallCount += 1;
+      return {};
+    },
+  };
+});
+
+let clearConfigCache: typeof import("../config/config.js").clearConfigCache;
+let clearRuntimeConfigSnapshot: typeof import("../config/config.js").clearRuntimeConfigSnapshot;
+let ensureOpenClawModelsJson: typeof import("./models-config.js").ensureOpenClawModelsJson;
+let resetModelsJsonReadyCacheForTest: typeof import("./models-config.js").resetModelsJsonReadyCacheForTest;
+
+const fixtureSuite = createFixtureSuite("openclaw-models-target-provider-");
+
+function createOpenAiConfig(apiKey = "sk-test-static-value"): OpenClawConfig {
+  return {
+    models: {
+      providers: {
+        openai: {
+          baseUrl: "https://api.openai.com/v1",
+          // pragma: allowlist secret
+          apiKey,
+          api: "openai-completions" as const,
+          models: [],
+        },
+      },
+    },
+  };
+}
+
+beforeAll(async () => {
+  await fixtureSuite.setup();
+  ({ ensureOpenClawModelsJson, resetModelsJsonReadyCacheForTest } =
+    await import("./models-config.js"));
+  ({ clearConfigCache, clearRuntimeConfigSnapshot } = await import("../config/config.js"));
+  installModelsConfigTestHooks();
+});
+
+afterEach(() => {
+  clearRuntimeConfigSnapshot();
+  clearConfigCache();
+  resetModelsJsonReadyCacheForTest();
+  resolveImplicitProvidersCallCount = 0;
+  unsetEnv([...MODELS_CONFIG_IMPLICIT_ENV_VARS]);
+});
+
+afterAll(async () => {
+  await fixtureSuite.cleanup();
+});
+
+/**
+ * Six tests for the targetProvider short-circuit semantics on PR #72869
+ * (Greptile P2 + Aisle High #2 + Codex P1).
+ *
+ * The short-circuit was previously a "presence-only" check that fired when
+ * any non-empty credential was on disk for the requested provider. That
+ * silently bypassed configuration drift (rotated keys, attacker-tampered
+ * baseUrl/headers/auth). The fix structurally compares disk vs. config
+ * before short-circuiting and falls through to full planning on any
+ * mismatch.
+ */
+describe("ensureOpenClawModelsJson targetProvider short-circuit", () => {
+  it("hit-on-match: full disk-vs-config match short-circuits planning", async () => {
+    const agentDir = await fixtureSuite.createCaseDir("agent");
+    const cfg = createOpenAiConfig();
+
+    // First call: cold start, must run plan and write models.json.
+    await ensureOpenClawModelsJson(cfg, agentDir, { targetProvider: "openai" });
+    expect(resolveImplicitProvidersCallCount).toBe(1);
+
+    // Second call with identical config + intact disk state: short-circuit
+    // path now sees a structural match and returns without re-planning.
+    resetModelsJsonReadyCacheForTest();
+    resolveImplicitProvidersCallCount = 0;
+    await ensureOpenClawModelsJson(cfg, agentDir, { targetProvider: "openai" });
+    expect(resolveImplicitProvidersCallCount).toBe(0);
+  });
+
+  it("miss-on-rotated-key: config apiKey change forces a full plan", async () => {
+    const agentDir = await fixtureSuite.createCaseDir("agent");
+    // pragma: allowlist secret
+    const cfgOriginal = createOpenAiConfig("sk-test-original-key");
+
+    await ensureOpenClawModelsJson(cfgOriginal, agentDir, { targetProvider: "openai" });
+    expect(resolveImplicitProvidersCallCount).toBe(1);
+
+    // Rotate the key in config, simulate a gateway restart (clear in-memory
+    // cache), and verify the next call falls through to planning instead of
+    // returning stale on-disk state with the OLD key.
+    resetModelsJsonReadyCacheForTest();
+    resolveImplicitProvidersCallCount = 0;
+    // pragma: allowlist secret
+    const cfgRotated = createOpenAiConfig("sk-test-rotated-key");
+    await ensureOpenClawModelsJson(cfgRotated, agentDir, { targetProvider: "openai" });
+    expect(resolveImplicitProvidersCallCount).toBe(1);
+  });
+
+  it("miss-on-baseUrl-change: tampered disk baseUrl rejects the short-circuit", async () => {
+    const agentDir = await fixtureSuite.createCaseDir("agent");
+    const cfg = createOpenAiConfig();
+
+    await ensureOpenClawModelsJson(cfg, agentDir, { targetProvider: "openai" });
+    expect(resolveImplicitProvidersCallCount).toBe(1);
+
+    // Simulate an attacker editing models.json to redirect baseUrl to an
+    // exfiltration endpoint. Clear the in-memory cache (e.g. gateway
+    // restart) so the short-circuit path is the only thing that could
+    // trust this disk state.
+    const targetPath = path.join(agentDir, "models.json");
+    const raw = await fs.readFile(targetPath, "utf8");
+    const parsed = JSON.parse(raw);
+    parsed.providers.openai.baseUrl = "https://attacker.example/v1";
+    await fs.writeFile(targetPath, JSON.stringify(parsed));
+
+    resetModelsJsonReadyCacheForTest();
+    resolveImplicitProvidersCallCount = 0;
+    await ensureOpenClawModelsJson(cfg, agentDir, { targetProvider: "openai" });
+    // Falls through to plan, which will rewrite the file with the correct
+    // baseUrl from config.
+    expect(resolveImplicitProvidersCallCount).toBe(1);
+  });
+
+  it("miss-on-tampered-headers: any disk header drift rejects the short-circuit", async () => {
+    const agentDir = await fixtureSuite.createCaseDir("agent");
+    const cfg = createOpenAiConfig();
+
+    await ensureOpenClawModelsJson(cfg, agentDir, { targetProvider: "openai" });
+    expect(resolveImplicitProvidersCallCount).toBe(1);
+
+    // Inject attacker-supplied headers (e.g. Authorization override) onto
+    // the disk row. Config has none, so the structural comparison must
+    // reject this and force a full plan that overwrites with config shape.
+    const targetPath = path.join(agentDir, "models.json");
+    const raw = await fs.readFile(targetPath, "utf8");
+    const parsed = JSON.parse(raw);
+    parsed.providers.openai.headers = { "X-Injected-Auth": "attacker-token" };
+    await fs.writeFile(targetPath, JSON.stringify(parsed));
+
+    resetModelsJsonReadyCacheForTest();
+    resolveImplicitProvidersCallCount = 0;
+    await ensureOpenClawModelsJson(cfg, agentDir, { targetProvider: "openai" });
+    expect(resolveImplicitProvidersCallCount).toBe(1);
+  });
+
+  it("miss-on-cold-cache: empty in-memory cache + missing disk file forces a plan", async () => {
+    const agentDir = await fixtureSuite.createCaseDir("agent");
+    const cfg = createOpenAiConfig();
+
+    // No prior writes — disk has no models.json. Even with targetProvider
+    // set, the short-circuit cannot match against a non-existent file
+    // and must fall through to the full plan.
+    resetModelsJsonReadyCacheForTest();
+    resolveImplicitProvidersCallCount = 0;
+    await ensureOpenClawModelsJson(cfg, agentDir, { targetProvider: "openai" });
+    expect(resolveImplicitProvidersCallCount).toBe(1);
+  });
+
+  it("hit-after-warm-fingerprint: identical inputs reuse the in-memory cache without re-running plan", async () => {
+    const agentDir = await fixtureSuite.createCaseDir("agent");
+    const cfg = createOpenAiConfig();
+
+    await ensureOpenClawModelsJson(cfg, agentDir, { targetProvider: "openai" });
+    expect(resolveImplicitProvidersCallCount).toBe(1);
+
+    // Same config + same disk + warm in-memory cache: both the
+    // targetProvider short-circuit AND the fingerprint cache hit are
+    // available. Either way, no re-plan should fire.
+    await ensureOpenClawModelsJson(cfg, agentDir, { targetProvider: "openai" });
+    expect(resolveImplicitProvidersCallCount).toBe(1);
+
+    // External edit to models.json after the warm cache populated must
+    // invalidate the in-memory cache via the modelsJsonHash second factor
+    // (Codex P1: previously the fingerprint alone was the key, so external
+    // edits silently returned stale cached success).
+    const targetPath = path.join(agentDir, "models.json");
+    await fs.writeFile(
+      targetPath,
+      JSON.stringify({ providers: { openai: { baseUrl: "https://attacker.example/v1" } } }),
+    );
+    await ensureOpenClawModelsJson(cfg, agentDir, { targetProvider: "openai" });
+    expect(resolveImplicitProvidersCallCount).toBe(2);
+  });
+});
diff --git a/src/agents/models-config.ts b/src/agents/models-config.ts