
fix(models): provider auth pre-warm #85125

Merged
sjf merged 6 commits into main from sjf/discord-picker-pagination on May 22, 2026

Conversation

sjf commented May 21, 2026

Contributor

Fix auth-config pre-warm

Untitled.mp4

The gateway pre-warms a "prepared provider auth" map so every /models call (Discord, Telegram, CLI, status, pickers) can skip the per-provider auth-discovery sweep that costs ~22 s on a typical 30-provider config. Three things were wrong with the existing pre-warm:

1. TTL. The prepared map carried a 10 s TTL — any read more than 10 s after the warm completed fell through to the slow compute path. With warm taking ~50 s and /models typically used minutes apart, the map was useful for ~10 s and inert thereafter. The TTL was added with the intent of bounding staleness if auth state changes outside the gateway, but with no auto-rewarm it just guaranteed the perf benefit applied to almost no real call. Removed entirely. The map is invalidated explicitly on the events that actually change the answer — config reload, plugin reload, auth-profile logout, and now any observed auth-profile failure (see below). With no TTL, the prepared map is valid for the gateway's whole lifetime; freshness is event-driven rather than time-bounded.

2. workspaceDir scope mismatch. The warmer derived workspaceDir via resolveDefaultAgentWorkspaceDir() (lands on ~/.openclaw/workspace). The actual call path for agentId="main" derives it via resolveAgentWorkspaceDir(cfg, "main") (lands on the repo-local .openclaw/workspace on dev checkouts). Scope check failed → fell through every time, even though the map was populated. Fixed: warmer now uses resolveAgentWorkspaceDir(cfg, agentId), matching the call path. workspaceDir is no longer stored on the prepared state — it's recomputed at read time from (cfg, agentId) since both inputs are already matched.

3. Default-agent only. The prepared state was a single value scoped to the default agent. Calls for any non-default agentId always missed and paid the full price. Fixed: replaced with Map<agentId, PreparedProviderAuthState> populated by iterating listAgentIds(cfg) at warm time. Catalog load is shared across agents, so per-extra-agent cost is just the ~18 s auth sweep, not the full ~50 s.
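
The first and third fixes can be pictured together as a per-agent map that has no expiry and is only dropped by an explicit clear. The sketch below is a minimal illustration under assumptions: `PreparedAuthCache`, `Catalog`, `sweep`, and the `providersWithAuth` field are hypothetical names, and the real `PreparedProviderAuthState` shape is not shown in this PR text.

```typescript
// Hypothetical shapes; the real PreparedProviderAuthState is not shown here.
type PreparedProviderAuthState = { providersWithAuth: Set<string> };
type Catalog = { providers: string[] };

class PreparedAuthCache {
  private byAgent = new Map<string, PreparedProviderAuthState>();

  // Warm every agent from one shared catalog; only the auth sweep runs per agent.
  warm(
    agentIds: string[],
    catalog: Catalog,
    sweep: (agentId: string, provider: string) => boolean,
  ): void {
    const next = new Map<string, PreparedProviderAuthState>();
    for (const agentId of agentIds) {
      const providersWithAuth = new Set<string>(
        catalog.providers.filter((provider) => sweep(agentId, provider)),
      );
      next.set(agentId, { providersWithAuth });
    }
    this.byAgent = next;
  }

  // No TTL: an entry stays valid until clear() is called by an invalidation event.
  get(agentId: string): PreparedProviderAuthState | undefined {
    return this.byAgent.get(agentId);
  }

  clear(): void {
    this.byAgent.clear();
  }
}

const cache = new PreparedAuthCache();
cache.warm(
  ["main", "research"],
  { providers: ["openai", "anthropic"] },
  // Assumed auth layout for the demo: "main" has both, "research" only openai.
  (agentId, provider) => agentId === "main" || provider === "openai",
);
console.log(cache.get("research")?.providersWithAuth.has("anthropic")); // false
console.log(cache.get("unknown-agent")); // undefined, caller falls to the slow path
```

A miss returns `undefined` rather than a default, so the caller can tell "not warmed" apart from "warmed, no auth".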

API cleanup: hasAuthForModelProvider and createProviderAuthChecker now take agentId?: string instead of agentDir?: string. agentId is the canonical identifier; agentDir is a derived path and could theoretically collide across agents via config override. Slow path derives agentDir = resolveAgentDir(cfg, agentId) internally when it needs the filesystem store. Callers (commands-models.ts, model-catalog-visibility.ts, flows/model-picker.ts) updated to pass agentId.
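
To illustrate why keying by agentId is safer than keying by a derived path, here is a small sketch under assumptions: the `Config` shape, the default directory layout, and `resolveAgentDirSketch` are hypothetical stand-ins for `resolveAgentDir(cfg, agentId)`, whose real behavior is not shown in this PR text.

```typescript
// Hypothetical config shape: an agent may override its directory.
type Config = { agents: Record<string, { agentDir?: string }> };

// Stand-in for resolveAgentDir(cfg, agentId); the default layout is an assumption.
function resolveAgentDirSketch(cfg: Config, agentId: string): string {
  return cfg.agents[agentId]?.agentDir ?? `/home/user/.openclaw/agents/${agentId}/agent`;
}

const cfg: Config = {
  agents: {
    main: {},
    // Two agents deliberately pointed at the same directory via override.
    alpha: { agentDir: "/srv/shared-agent" },
    beta: { agentDir: "/srv/shared-agent" },
  },
};

const byDir = new Map<string, string>();
const byId = new Map<string, string>();
for (const agentId of Object.keys(cfg.agents)) {
  const agentDir = resolveAgentDirSketch(cfg, agentId);
  byDir.set(agentDir, agentId); // a later agent overwrites an earlier one on collision
  byId.set(agentId, agentDir); // the path is derived only when it is needed
}
console.log(byDir.size, byId.size); // 2 3
```

Keyed by directory, `alpha` and `beta` collapse into one entry; keyed by agentId, all three agents keep their own entry.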

Self-heal on observed auth failure

ClawSweeper's P1 finding flagged that removing the TTL leaves the prepared map stale if auth state changes outside the gateway's own reload/logout paths (token rotation, OAuth revoke, billing failure). The follow-up commit 961b5bc63f addresses this without reintroducing the TTL:

  • auth-profiles/usage.ts exposes a single mutable hook slot: setAuthProfileFailureHook(hook). markAuthProfileFailure calls the hook (wrapped in try/catch) on both success paths after recording the failure.
  • Gateway startup wires the hook to clearCurrentProviderAuthState. Any observed auth failure now clears the prepared map, forcing the next /models call to recompute against the real auth state.
  • Single-listener slot, no Set/registry — single-purpose seam between two modules that intentionally don't statically import each other.
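
The single-slot hook described in the bullets above might look roughly like this. It is a sketch only: the hook's argument, the `failures` array, and `markAuthProfileFailureSketch` are assumptions made for illustration, and the real `markAuthProfileFailure` signature is not shown in this PR text.

```typescript
type AuthProfileFailureHook = (profileId: string) => void;

let failureHook: AuthProfileFailureHook | undefined;

// Single mutable slot: setting a new hook replaces the previous one.
function setAuthProfileFailureHook(hook: AuthProfileFailureHook | undefined): void {
  failureHook = hook;
}

// Sketch of the call site: record the failure first, then notify the listener.
function markAuthProfileFailureSketch(profileId: string, failures: string[]): void {
  failures.push(profileId);
  try {
    failureHook?.(profileId);
  } catch {
    // A throwing listener must not break failure recording.
  }
}

let cleared = 0;
const failures: string[] = [];

// Gateway startup would wire the slot to the cache-clearing function.
setAuthProfileFailureHook(() => {
  cleared += 1;
});
markAuthProfileFailureSketch("openai:default", failures);

// A buggy listener is swallowed; the failure is still recorded.
setAuthProfileFailureHook(() => {
  throw new Error("listener bug");
});
markAuthProfileFailureSketch("anthropic:default", failures);

console.log(cleared, failures.length); // 1 2
```

Because the slot holds one listener, the module that records failures never imports the module that owns the prepared map; the dependency is injected at startup.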

This covers the stale TRUE direction (map says auth, reality says none — the user-visible case where Discord shows a model that 401s when picked).

Watch auth-profiles.json for external changes

Closes the stale FALSE direction (user added auth externally — codex login from another shell, hand-edit, etc. — and the gateway hasn't observed any request through that profile yet, so the auth-failure hook never fires). New src/agents/auth-profiles-watcher.ts uses chokidar (already a dep, same options as gateway/config-reload.ts) to watch <agentDir>/auth-profiles.json for every configured agent. Any change event fires clearCurrentProviderAuthState, so the next /models call recomputes against the on-disk state. Wired up in gateway startup right after the failure hook.
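
As a rough illustration of the watcher wiring, the sketch below maps agent directories to auth-profiles.json paths and invalidates on file events. It is written against a small injected interface so it runs without chokidar installed; `FileWatcher`, `watchAuthProfilesSketch`, `FakeWatcher`, and the choice of `add`/`change`/`unlink` events are assumptions, not the contents of src/agents/auth-profiles-watcher.ts.

```typescript
// Minimal watcher surface assumed by this sketch. chokidar's FSWatcher offers
// on(event, listener) and close() (its close() returns a promise);
// everything else here is hypothetical.
type WatchEvent = "add" | "change" | "unlink";
interface FileWatcher {
  on(event: WatchEvent, listener: (path: string) => void): FileWatcher;
  close(): void;
}

function watchAuthProfilesSketch(
  agentDirs: string[],
  createWatcher: (paths: string[]) => FileWatcher,
  onInvalidate: (path: string) => void,
): { paths: string[]; watcher: FileWatcher } {
  const paths = agentDirs.map((dir) => `${dir}/auth-profiles.json`);
  const watcher = createWatcher(paths);
  const events: WatchEvent[] = ["add", "change", "unlink"];
  for (const event of events) {
    watcher.on(event, onInvalidate); // any of these means the on-disk answer may differ
  }
  return { paths, watcher };
}

// In-memory stand-in so the sketch runs without touching the filesystem.
class FakeWatcher implements FileWatcher {
  private listeners = new Map<WatchEvent, Array<(path: string) => void>>();
  on(event: WatchEvent, listener: (path: string) => void): FileWatcher {
    const list = this.listeners.get(event) ?? [];
    list.push(listener);
    this.listeners.set(event, list);
    return this;
  }
  emit(event: WatchEvent, path: string): void {
    for (const listener of this.listeners.get(event) ?? []) listener(path);
  }
  close(): void {
    this.listeners.clear();
  }
}

const profilePath = "/home/user/.openclaw/agents/main/agent/auth-profiles.json";
const fake = new FakeWatcher();
const invalidated: string[] = [];
const watched = watchAuthProfilesSketch(
  ["/home/user/.openclaw/agents/main/agent"],
  () => fake,
  (path) => {
    invalidated.push(path); // the gateway would clear the prepared map here
  },
);
fake.emit("change", profilePath);
fake.emit("unlink", profilePath);
console.log(invalidated.length); // 2
```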

Together with the failure hook, the prepared map now self-heals both directions without a TTL: the events that change the answer (failed auth call, file write) explicitly invalidate; otherwise the map is valid for the gateway's whole lifetime.

Real behavior proof — hot path

Behavior addressed: /models (Discord, Telegram, CLI) latency on a real openclaw gateway. Before this PR the call took ~22 s for every invocation because the prepared auth map was either expired (10 s TTL) or scope-mismatched and every call fell back to the per-provider auth-discovery sweep. After this PR the call hits the warmed map and returns in single-digit ms.

Real environment tested: local openclaw dev gateway on macOS, single machine, same config, same plugin set, captured back-to-back. Catalog at capture time: 955 model entries across ~30 unique providers, 40 providers warmed.

Exact steps or command run after this patch:

./run.sh openclaw gateway run            # gateway start
# wait for "provider auth state pre-warmed in <ms>" log
# then exercise /models from Discord 12×
grep 'buildModelsProviderData done' /tmp/openclaw/openclaw-*.log
grep 'warmCurrentProviderAuthState done' /tmp/openclaw/openclaw-*.log

Evidence after fix — /tmp/openclaw/openclaw-2026-05-20.log:

BEFORE (HEAD before this PR, no prepared-state hits — 10 s TTL expired and falling through):

21:52:01  54,812 ms   first /models call after restart (catalog cold)
21:53:06  20,613 ms   subsequent calls land on catalog cache but
21:53:28  20,470 ms   resolveVisibleModelCatalog re-runs the full
21:53:50  20,403 ms   per-provider auth-discovery sweep every time
21:54:12  20,473 ms
21:54:34  20,766 ms
21:54:55  20,758 ms
21:55:17  20,449 ms
21:55:38  20,622 ms

| Metric | ms |
| --- | --- |
| First call (cold catalog) | 54,812 |
| Subsequent calls (n=8), min | 20,403 |
| Subsequent calls (n=8), median | ~20,543 |
| Subsequent calls (n=8), mean | ~20,569 |

Matching gateway liveness diagnostic captured during the BEFORE run shows the call blocks the event loop:

14:44:14 [diagnostic] liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu
  interval=30s
  eventLoopDelayP99Ms=24461.2
  eventLoopDelayMaxMs=24461.2
  eventLoopUtilization=0.987
  cpuCoreRatio=1
  active=1 waiting=0 queued=0
  work=[active=agent:main:discord:direct:1499113075284901960 (processing, q=1, age=28s)]

The 24.5 s blocked window aligns with the ~22.9 s buildModelsProviderData runtime. Discord's 3 s ack deadline runs on the same loop, so the deferral never gets sent → "This interaction failed."

AFTER (this PR's prepared-state with workspace/agentId fixes — gateway restarted with the branch applied, warmer ran at startup, /models exercised 12×):

22:01:07  warmCurrentProviderAuthState done in 49,203 ms providers=40   ← one-time
22:01:16   3 ms  preparedAuth=true agentId=main
22:01:19   4 ms  preparedAuth=true agentId=main
22:01:22   6 ms  preparedAuth=true agentId=main
22:01:24   6 ms  preparedAuth=true agentId=main
22:01:28   1 ms  preparedAuth=true agentId=main
22:01:33   2 ms  preparedAuth=true agentId=main
22:01:40   4 ms  preparedAuth=true agentId=main
22:01:46   6 ms  preparedAuth=true agentId=main
22:01:48   5 ms  preparedAuth=true agentId=main
22:01:51   5 ms  preparedAuth=true agentId=main
22:01:53   5 ms  preparedAuth=true agentId=main
22:01:54   5 ms  preparedAuth=true agentId=main

| Metric | ms |
| --- | --- |
| Hot-path call (n=12), min | 1 |
| Hot-path call (n=12), median | ~5 |
| Hot-path call (n=12), max | 6 |
| Hot-path call (n=12), mean | ~4.3 |
| One-time gateway-startup warm cost | 49,203 |
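
For readers who want to re-derive the hot-path summary, this short sketch recomputes it from the twelve AFTER durations logged above. The helper names are illustrative, and the speedup line uses the BEFORE mean of 20,569 ms reported earlier.

```typescript
// Hot-path durations (ms) copied from the AFTER log lines above.
const hotPathMs = [3, 4, 6, 6, 1, 2, 4, 6, 5, 5, 5, 5];

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const lo = Math.floor((sorted.length - 1) / 2);
  const hi = Math.ceil((sorted.length - 1) / 2);
  // One middle element for odd n, the average of two for even n.
  return mean(sorted.slice(lo, hi + 1));
}

const summary = {
  min: Math.min(...hotPathMs),
  median: median(hotPathMs),
  max: Math.max(...hotPathMs),
  mean: Number(mean(hotPathMs).toFixed(1)),
};
console.log(summary); // { min: 1, median: 5, max: 6, mean: 4.3 }

// Steady-state speedup: BEFORE mean divided by the AFTER median.
console.log(Math.round(20569 / summary.median)); // 4114
```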

Observed result after fix:

| Scenario | BEFORE | AFTER | Speedup |
| --- | --- | --- | --- |
| First /models after restart | 54,812 ms | ~5 ms (post-warm) | ~10,000× |
| Every subsequent /models | ~20,569 ms | ~5 ms | ~4,100× |
| One-time startup warm cost | n/a | 49,203 ms | |

Discord behavior: ack lands inside the 3 s deadline on every interaction; no more "This interaction failed." Event-loop liveness warnings (p99 ~24 s) gone.

Real behavior proof — file-change invalidation (fs watcher)

Captured against the latest build on sjf/discord-picker-pagination (fc49e59147). Gateway restarted, startup warm completed, then ~/.openclaw/agents/main/agent/auth-profiles.json was hand-edited three times in ~30 s (edit, delete, re-create) to exercise the chokidar watcher. Each event clears the prepared map and schedules a background re-warm.

/tmp/openclaw/openclaw-2026-05-21.log:

21:16:17 PDT  provider auth state pre-warmed in 23109ms             ← startup
21:16:45 PDT  provider auth state re-warmed (auth-profiles.json change) in 8189ms
21:17:06 PDT  provider auth state re-warmed (auth-profiles.json change) in 8153ms
21:17:17 PDT  provider auth state re-warmed (auth-profiles.json change) in 8202ms

| Metric | ms |
| --- | --- |
| Startup pre-warm (catalog cold) | 23,109 |
| Re-warm (n=3), min | 8,153 |
| Re-warm (n=3), mean | 8,181 |
| Re-warm (n=3), max | 8,202 |

Re-warm is ~3× faster than the cold pre-warm because the model catalog is reused across warms — only the per-agent auth-discovery sweep re-runs.

Note on chokidar unlink: deleting auth-profiles.json produces a normal change/unlink event, not an error event. The watcher's error handler is for transient OS-level failures (lost fsevents stream, permission errors); a deliberate rm is handled cleanly and the watcher picks the file up again on re-create.

What was not tested in this gateway run: the auth-failure self-heal hook under live API traffic (requires a real outbound call that 401s — covered by unit tests at src/agents/auth-profiles.markauthprofilefailure.test.ts); clean gateway-shutdown sequence (the watcher handle is now in postReadySidecars so stopPostReadySidecarsAfterCloseStarted closes it, but I didn't capture a shutdown log in this run); fs-watcher invalidation under high write churn.

Verification

  • pnpm test src/agents/model-provider-auth.test.ts — 5/5 pass.
  • pnpm test src/agents/auth-profiles.markauthprofilefailure.test.ts — 13/13 pass (added 2 tests covering the hook fires + survives a throwing listener).
  • Local build (./run.sh build) clean (exit 0).

Discord picker pagination is on the companion PR #85138.

@sjf sjf requested a review from a team as a code owner May 21, 2026 23:27
@sjf sjf force-pushed the sjf/discord-picker-pagination branch from 0acb0f6 to 2c791b9 Compare May 21, 2026 23:27
@openclaw-barnacle openclaw-barnacle Bot added the labels channel: discord (Channel integration: discord), agents (Agent runtime and tooling), size: M, and maintainer (Maintainer-authored PR) on May 21, 2026
@clawsweeper

clawsweeper Bot commented May 21, 2026

Contributor

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR removes the prepared provider-auth TTL, keys warmed auth by agentId, changes model auth checks to use agentId, and adds auth-failure plus auth-profiles.json watcher invalidation with tests.

Reproducibility: yes. Current main has the 10-second singleton prepared-auth state and scope guards that can make /models fall back to the slow provider-auth sweep, and the PR body includes before/after gateway logs; I did not run a live gateway in this read-only review.

PR rating
Overall: 🧂 unranked krab
Proof: 🦪 silver shellfish
Patch quality: 🦐 gold shrimp
Summary: Useful bug-fix direction, but the PR remains blocked by incomplete real invalidation proof and a reload-lifecycle defect in the watcher wiring.

Rank-up moves:

  • Fix the auth-profile watcher so hot reloads cannot stop it permanently or leave it watching stale agent paths.
  • Add redacted real gateway proof for auth-failure invalidation and auth-profiles.json file-change invalidation.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs stronger real behavior proof before merge: Copied live gateway logs show the hot /models speedup, but the PR still needs redacted after-fix logs, terminal output, or recording proof for auth-failure and auth-profiles.json invalidation. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Mantis proof suggestion
A real Discord /models run with redacted gateway logs would materially help verify the user-visible timeout fix, even though focused logs remain better for auth invalidation proof. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

visual task: verify Discord /models returns before timeout after provider-auth prewarm and capture redacted gateway logs showing preparedAuth=true.

Risk before merge

  • The PR changes cache freshness from TTL-backed refresh to event-driven invalidation, so stale auth answers now depend on every invalidation and watcher lifecycle path being correct.
  • The new watcher is tied to startup config and the shared post-ready sidecar list; config hot reloads that restart Gmail sidecars or change agent auth paths can leave auth-profile invalidation disabled until gateway restart.
  • Per-agent warming can add significant post-ready work for multi-agent configs, and the companion non-blocking slow-path PR remains separate at perf(models): make provider auth checks non-blocking #85152.
  • The PR body has real hot-path timing logs, but not real after-fix proof for auth-failure invalidation or auth-profiles.json watcher invalidation.

Maintainer options:

  1. Make auth watching reload-aware (recommended)
    Track the auth-profile watcher outside the Gmail sidecar lifecycle, refresh it when agent ids or auth paths change, and cover that behavior with focused tests before merge.
  2. Accept event-driven cache semantics explicitly
    Maintainers can choose to accept the no-TTL cache only after owning the stale-auth edge cases and documenting the invalidation model.
  3. Hold for the non-blocking slow path
    Pause this PR if maintainers want the prepared-cache fix to land together with the stacked slow-path availability work in perf(models): make provider auth checks non-blocking #85152.

Next step before merge
Needs contributor real behavior proof plus maintainer review of the event-driven auth-cache lifecycle; automation cannot supply the contributor's gateway invalidation evidence.

Security
Cleared: No concrete supply-chain, permission, credential-storage, or secret-exposure regression was found in the diff; the remaining concerns are auth-cache correctness and availability.

Review findings

  • [P2] Keep auth-profile watching reload-aware — src/gateway/server-startup-post-attach.ts:1046-1054
Review details

Best possible solution:

Keep the agentId-scoped prewarm direction, but make auth invalidation own a reload-aware watcher lifecycle and require redacted real gateway proof for both hot-path speedup and auth-state invalidation before merge.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main has the 10-second singleton prepared-auth state and scope guards that can make /models fall back to the slow provider-auth sweep, and the PR body includes before/after gateway logs; I did not run a live gateway in this read-only review.

Is this the best way to solve the issue?

Not yet. The agentId-scoped prewarm is the right boundary for the cache miss, but the no-TTL event-driven design needs reload-aware watcher lifecycle and real invalidation proof before it is the maintainable fix.

Label justifications:

  • P2: This is a normal-priority gateway/model-auth performance and correctness fix with meaningful but limited blast radius.
  • merge-risk: 🚨 compatibility: The PR changes the internal model auth checker contract from agentDir to agentId and removes TTL-based cache refresh behavior.
  • merge-risk: 🚨 auth-provider: The PR changes how provider auth availability is cached and invalidated, so stale answers can affect model/provider visibility.
  • merge-risk: 🚨 availability: Prepared-cache misses and per-agent warming can still affect gateway responsiveness, especially before the companion non-blocking slow-path work lands.
  • rating: 🧂 unranked krab: Current PR rating is 🧂 unranked krab because proof is 🦪 silver shellfish, patch quality is 🦐 gold shrimp, and the summary reads: useful bug-fix direction, but the PR remains blocked by incomplete real invalidation proof and a reload-lifecycle defect in the watcher wiring.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: Copied live gateway logs show the hot /models speedup, but the PR still needs redacted after-fix logs, terminal output, or recording proof for auth-failure and auth-profiles.json invalidation. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Full review comments:

  • [P2] Keep auth-profile watching reload-aware — src/gateway/server-startup-post-attach.ts:1046-1054
    This watcher is built once from cfgAtStart and then added to postReadySidecars. Current hot reload stops that shared sidecar list when the Gmail watcher restarts, and config reload can also change agent dirs before warmCurrentProviderAuthState(nextConfig) runs, so the new auth watcher can be stopped or left watching old paths while the no-TTL auth map stays dependent on it. Please give the auth-profile watcher its own reload-aware lifecycle or recreate it when the watched agent auth paths change.
    Confidence: 0.86
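
One possible shape for the reload-aware lifecycle requested above is sketched here. It is not what the PR implements: `ReloadAwareWatcher`, `Closable`, and `sync` are hypothetical names, and the idea is only that the watcher is owned separately from other sidecars and is recreated when the set of watched paths changes.

```typescript
interface Closable {
  close(): void;
}

class ReloadAwareWatcher {
  private current: { key: string; handle: Closable } | undefined;
  private readonly create: (paths: string[]) => Closable;

  constructor(create: (paths: string[]) => Closable) {
    this.create = create;
  }

  // Call at startup and again after every config reload.
  // Returns true when a watcher was (re)created.
  sync(paths: string[]): boolean {
    const key = [...paths].sort().join("\n");
    if (this.current?.key === key) return false; // same paths: keep the watcher
    this.current?.handle.close();
    this.current = { key, handle: this.create(paths) };
    return true;
  }

  stop(): void {
    this.current?.handle.close();
    this.current = undefined;
  }
}

let createCount = 0;
let closeCount = 0;
const manager = new ReloadAwareWatcher(() => {
  createCount += 1;
  return {
    close: () => {
      closeCount += 1;
    },
  };
});

manager.sync(["/a/auth-profiles.json"]); // startup: creates the watcher
manager.sync(["/a/auth-profiles.json"]); // reload with same paths: no-op
manager.sync(["/a/auth-profiles.json", "/b/auth-profiles.json"]); // agent added: recreate
console.log(createCount, closeCount); // 2 1
```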

Overall correctness: patch is incorrect
Overall confidence: 0.82

What I checked:

  • Protected PR state: The provided live GitHub context shows this PR has the protected maintainer label, so cleanup review must keep it open for explicit maintainer handling.
  • Current main prepared-auth baseline: Current main keeps a singleton prepared auth state with a 10-second TTL and a default-agent workspace guard, matching the performance miss the PR is trying to fix. (src/agents/model-provider-auth.ts:29, e399a92e6cc9)
  • PR watcher wiring: The PR wires setAuthProfileFailureHook, creates a watcher from params.cfgAtStart, and pushes that watcher into postReadySidecars. (src/gateway/server-startup-post-attach.ts:1046, fc49e5914731)
  • Current reload lifecycle conflict: Current main's hot reload path calls stopPostReadySidecars() when restarting the Gmail watcher, while the same reload handler rewarms provider auth with nextConfig; a startup-bound auth watcher in the shared sidecar list can therefore be stopped or left on old auth-profile paths. (src/gateway/server-reload-handlers.ts:432, e399a92e6cc9)
  • History provenance: git blame and git log --follow show the central prepared-auth, gateway startup, and reload lifecycle code in this checkout all trace to commit 48bb3b0a74761fa50e2ff5367132595dcff64121. (src/agents/model-provider-auth.ts:1, 48bb3b0a7476)
  • Related stacked work: The PR body and related context point to perf(models): make provider auth checks non-blocking #85152 as the companion non-blocking provider-auth slow-path work, so this PR still carries availability risk when the prepared map misses.

Likely related people:

  • steipete: Git blame and shortlog show Peter Steinberger authored the current-main prepared provider-auth, auth-profile failure, gateway startup, and reload lifecycle code in commit 48bb3b0a74761fa50e2ff5367132595dcff64121. (role: introduced behavior and recent area contributor; confidence: high; commits: 48bb3b0a7476; files: src/agents/model-provider-auth.ts, src/agents/auth-profiles/usage.ts, src/gateway/server-startup-post-attach.ts)
  • 100menotu001: The commit that introduced the current prepared-auth and gateway lifecycle surfaces is titled with thanks to @100menotu001, so they may have useful context even though the local commit author is Peter Steinberger. (role: credited contributor on introducing commit; confidence: medium; commits: 48bb3b0a7476; files: src/agents/model-provider-auth.ts, src/gateway/server-startup-post-attach.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against e399a92e6cc9.

@clawsweeper clawsweeper Bot added the following labels on May 21, 2026:
  • rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)
  • P2 (Normal backlog priority with limited blast radius.)
  • merge-risk: 🚨 compatibility (May break existing users, config, migrations, defaults, or upgrade paths.)
  • merge-risk: 🚨 auth-provider (May break OAuth, tokens, provider routing, model choice, or credentials.)
@clawsweeper

clawsweeper Bot commented May 21, 2026

Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@sjf sjf force-pushed the sjf/discord-picker-pagination branch from 2c791b9 to 8735f04 Compare May 21, 2026 23:37
@sjf sjf changed the title from "Remove ttl on auth config. Prewarm prepared config for each agent. Key by agent ID instead of agent dir" to "fix(models): provider auth pre-warm + Discord picker pagination" May 22, 2026
@sjf sjf force-pushed the sjf/discord-picker-pagination branch from 8735f04 to 5ed8a63 Compare May 22, 2026 00:29
@openclaw-barnacle openclaw-barnacle Bot added size: S and removed channel: discord Channel integration: discord size: M labels May 22, 2026
@sjf sjf changed the title from "fix(models): provider auth pre-warm + Discord picker pagination" to "fix(models): provider auth pre-warm" May 22, 2026
@sjf

sjf commented May 22, 2026

Contributor Author

@clawsweeper review

@clawsweeper

clawsweeper Bot commented May 22, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added the proof: 🎥 video Contributor real behavior proof includes video or recording evidence. label May 22, 2026
@sjf

sjf commented May 22, 2026

Contributor Author

Re: the P1 "Preserve agent-scoped auth in picker flows" finding — I don't think this is a real concern. Walked every production caller of promptDefaultModel / promptModelAllowlist:

src/wizard/setup.ts:632       → no agentDir, no agentId, workspaceDir from setup
src/wizard/setup.ts:691       → no agentDir, no agentId, workspaceDir from setup
src/commands/configure.gateway-auth.ts:213 → no agentDir, no agentId, workspaceDir = resolveDefaultAgentWorkspaceDir()
src/commands/configure.gateway-auth.ts:265 → no agentDir, no agentId, workspaceDir = resolveDefaultAgentWorkspaceDir()

No production caller passes agentDir to the picker — never has. They all run in wizard/configure contexts that operate before any specific agent is in scope. So:

  • Before this PR: picker forwarded agentDir: undefined → createProviderAuthChecker({ agentDir: undefined }) → hasAuthForModelProvider's old params.agentDir === undefined guard passed → hit the default-agent prepared map.
  • After this PR: picker passes no agentId → resolvePreparedStateForCaller falls back to resolveDefaultAgentId(cfg) → hits the default-agent prepared map.

Same path, same answer. The picker's own agentDir param is retained for unrelated uses (plugin-runtime gating at src/flows/model-picker.ts:516, 796); only the auth-checker side stopped consuming it.

If a future caller ever needs agent-scoped picker auth, the fix is additive: thread agentId?: string through PromptDefaultModelParams and pass it to createProviderAuthChecker. Not a regression introduced by this PR.
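
The fallback and the additive change described in this comment can be sketched as follows. All shapes here are assumptions for illustration: `Cfg`, `Prepared`, and the `...Sketch` functions stand in for the real `resolvePreparedStateForCaller` and `createProviderAuthChecker`, whose signatures are not shown in this thread.

```typescript
type Cfg = { defaultAgentId: string };
type Prepared = Map<string, Set<string>>; // agentId -> providers with auth

// No agentId from the caller means "use the default agent".
function resolvePreparedStateForCallerSketch(
  prepared: Prepared,
  cfg: Cfg,
  agentId?: string,
): Set<string> | undefined {
  return prepared.get(agentId ?? cfg.defaultAgentId);
}

// Returns undefined on a prepared-map miss so the caller can take the slow path.
function createProviderAuthCheckerSketch(params: {
  cfg: Cfg;
  prepared: Prepared;
  agentId?: string;
}): (provider: string) => boolean | undefined {
  const state = resolvePreparedStateForCallerSketch(
    params.prepared,
    params.cfg,
    params.agentId,
  );
  return (provider) => state?.has(provider);
}

const pickerCfg: Cfg = { defaultAgentId: "main" };
const prepared: Prepared = new Map([
  ["main", new Set(["openai", "anthropic"])],
  ["research", new Set(["openai"])],
]);

// Wizard/configure callers today: no agentId, so the default agent's state is used.
const wizardCheck = createProviderAuthCheckerSketch({ cfg: pickerCfg, prepared });
// A future agent-scoped caller only has to pass agentId.
const scopedCheck = createProviderAuthCheckerSketch({
  cfg: pickerCfg,
  prepared,
  agentId: "research",
});

console.log(wizardCheck("anthropic"), scopedCheck("anthropic")); // true false
```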

When markAuthProfileFailure observes an auth failure at request time
(token rotated, OAuth revoke, etc.), fire a hook that clears the
prepared provider-auth map so the next model-listing call recomputes
against the real auth state. Single mutable hook slot wired up at
gateway startup; no TTL or polling.

Addresses ClawSweeper's P1 freshness finding on #85125 without
reintroducing the TTL.
@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: M and removed size: S labels May 22, 2026
@clawsweeper clawsweeper Bot added the merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. label May 22, 2026
@sjf

sjf commented May 22, 2026

Contributor Author

@clawsweeper review

@clawsweeper

clawsweeper Bot commented May 22, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

sjf added 3 commits May 21, 2026 20:37
…e visible

Adds a chokidar watcher on every configured agent's auth-profiles.json.
Any change fires clearCurrentProviderAuthState so the next model-listing
call recomputes against the on-disk auth state. Closes the stale-FALSE
direction (user adds auth via codex login, hand-edit, etc.) that the
auth-failure hook can't catch on its own.
@sjf sjf force-pushed the sjf/discord-picker-pagination branch from 4b1e017 to 8ad489a Compare May 22, 2026 03:38
@sjf

sjf commented May 22, 2026

Contributor Author

@clawsweeper review

@clawsweeper

clawsweeper Bot commented May 22, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot removed the proof: 🎥 video Contributor real behavior proof includes video or recording evidence. label May 22, 2026
…nvalidate

Addresses three ClawSweeper findings on the fs-watcher commit:

- [P1] auth-profile watcher now handles chokidar 'error' events (logs +
  closes once) mirroring the gateway config-reload pattern. Without
  this, an unhandled error from chokidar can crash the gateway.

- [P2] auth-profile watcher handle is pushed into postReadySidecars so
  stopPostReadySidecarsAfterCloseStarted closes it on gateway shutdown.

- [P2] auth-failure and file-change invalidation paths now schedule a
  background rewarm (with a 'reason=' log line). Without this, the next
  /models call after an invalidation paid the slow per-provider path
  until the next reload. The warmer's existing generation counter
  handles concurrent rewarms safely.
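
A generation counter of the kind mentioned in the last bullet can be sketched as below. This is a generic illustration of the pattern, not the warmer's actual code: `GenerationGuardedState`, `begin`, and `commit` are hypothetical names. Each warm takes a token when it starts, and its result is committed only if nothing newer has started or cleared the state in the meantime.

```typescript
class GenerationGuardedState<T> {
  private generation = 0;
  private value: T | undefined;

  // Invalidate: drop the value and make every in-flight warm stale.
  clear(): void {
    this.generation += 1;
    this.value = undefined;
  }

  // Start a warm; the returned token identifies this attempt.
  begin(): number {
    this.generation += 1;
    return this.generation;
  }

  // Commit only if no clear() or newer begin() happened since the token was issued.
  commit(token: number, value: T): boolean {
    if (token !== this.generation) return false;
    this.value = value;
    return true;
  }

  get(): T | undefined {
    return this.value;
  }
}

const state = new GenerationGuardedState<string>();
const firstWarm = state.begin(); // e.g. rewarm triggered by an auth failure
const secondWarm = state.begin(); // e.g. rewarm triggered by a file change

// The older warm finishes late and is discarded; the newer one wins.
const firstCommitted = state.commit(firstWarm, "stale result");
const secondCommitted = state.commit(secondWarm, "fresh result");
console.log(firstCommitted, secondCommitted, state.get()); // false true fresh result
```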
@sjf

sjf commented May 22, 2026

Contributor Author

@clawsweeper review

@clawsweeper

clawsweeper Bot commented May 22, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@sjf sjf merged commit 55cfe00 into main May 22, 2026
105 checks passed
sjf added a commit that referenced this pull request May 22, 2026
When markAuthProfileFailure observes an auth failure at request time
(token rotated, OAuth revoke, etc.), fire a hook that clears the
prepared provider-auth map so the next model-listing call recomputes
against the real auth state. Single mutable hook slot wired up at
gateway startup; no TTL or polling.

Addresses ClawSweeper's P1 freshness finding on #85125 without
reintroducing the TTL.
@sjf sjf deleted the sjf/discord-picker-pagination branch May 22, 2026 04:52
Forks later pushed commits carrying the same commit message and referencing this pull request: SebTardif/openclaw (three on May 24, 2026 and two on May 26, 2026), Desicool/openclaw via github-actions Bot (May 24, 2026), and galiniliev/openclaw (May 25, 2026).
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
When markAuthProfileFailure observes an auth failure at request time
(token rotated, OAuth revoke, etc.), fire a hook that clears the
prepared provider-auth map so the next model-listing call recomputes
against the real auth state. Single mutable hook slot wired up at
gateway startup; no TTL or polling.

Addresses ClawSweeper's P1 freshness finding on openclaw#85125 without
reintroducing the TTL.
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
When markAuthProfileFailure observes an auth failure at request time
(token rotated, OAuth revoke, etc.), fire a hook that clears the
prepared provider-auth map so the next model-listing call recomputes
against the real auth state. Single mutable hook slot wired up at
gateway startup; no TTL or polling.

Addresses ClawSweeper's P1 freshness finding on openclaw#85125 without
reintroducing the TTL.
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
When markAuthProfileFailure observes an auth failure at request time
(token rotated, OAuth revoke, etc.), fire a hook that clears the
prepared provider-auth map so the next model-listing call recomputes
against the real auth state. Single mutable hook slot wired up at
gateway startup; no TTL or polling.

Addresses ClawSweeper's P1 freshness finding on openclaw#85125 without
reintroducing the TTL.
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
When markAuthProfileFailure observes an auth failure at request time
(token rotated, OAuth revoke, etc.), fire a hook that clears the
prepared provider-auth map so the next model-listing call recomputes
against the real auth state. Single mutable hook slot wired up at
gateway startup; no TTL or polling.

Addresses ClawSweeper's P1 freshness finding on openclaw#85125 without
reintroducing the TTL.

Labels

- `agents`: Agent runtime and tooling
- `gateway`: Gateway runtime
- `maintainer`: Maintainer-authored PR
- `merge-risk: 🚨 auth-provider 🚨`: May break OAuth, tokens, provider routing, model choice, or credentials.
- `merge-risk: 🚨 availability 🚨`: May cause crashes, hangs, restart loops, stalls, or process outages.
- `merge-risk: 🚨 compatibility 🚨`: May break existing users, config, migrations, defaults, or upgrade paths.
- `P2`: Normal backlog priority with limited blast radius.
- `rating: 🧂 unranked krab`: Not merge-ready due to missing proof or serious correctness/safety concerns.
- `size: M`
- `status: 📣 needs proof`: The PR needs real behavior proof before ClawSweeper can clear the contributor ask.
