Skip to content

perf(models): make provider auth checks non-blocking#85152

Merged
sjf merged 1 commit into
mainfrom
sjf/discord-picker-async-auth
May 22, 2026
Merged

perf(models): make provider auth checks non-blocking#85152
sjf merged 1 commit into
mainfrom
sjf/discord-picker-async-auth

Conversation

@sjf

@sjf sjf commented May 22, 2026

Copy link
Copy Markdown
Contributor

Summary

Make the provider-auth check path asynchronous without changing the auth decision logic.

When the prepared provider-auth map misses, model-listing surfaces can walk the catalog and compute auth for many providers. Each provider check can do sync filesystem reads and external-CLI discovery. hasAuthForModelProvider now yields once with setImmediate before that slow path, and the existing callers now await the same checker so long provider sweeps release the event loop between providers.

This PR intentionally does not change prepared-auth cache semantics, invalidation, auth-profile failure handling, watchers, TTL behavior, or provider eligibility rules.

Changes

  • hasAuthForModelProvider now returns Promise<boolean>.
  • createProviderAuthChecker now returns an async checker with the same per-provider cache.
  • warmCurrentProviderAuthState awaits the existing provider check while preparing the current auth map.
  • Model catalog visibility, /models provider data, gateway models.list, and model-picker call sites now await the async checker directly.
  • Tests that call these async paths now await the results.

Evidence

Before this change, logs showed slow /models calls when the prepared auth map missed:

14:38:39  buildModelsProviderData done in 22941ms preparedAuth=false agentId=main
14:39:14  buildModelsProviderData done in 22610ms preparedAuth=false agentId=main
14:39:44  buildModelsProviderData done in 22814ms preparedAuth=false agentId=main
14:40:17  buildModelsProviderData done in 22825ms preparedAuth=false agentId=main

The matching gateway diagnostic showed the event loop blocked for the same window:

14:44:14 [diagnostic] liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu
  eventLoopDelayP99Ms=24461.2
  eventLoopDelayMaxMs=24461.2
  eventLoopUtilization=0.987

Verification

  • node scripts/run-vitest.mjs src/agents/model-provider-auth.test.ts src/agents/model-catalog-visibility.test.ts src/agents/model-auth.profiles.test.ts src/agents/model-auth.workspace-plugin.test.ts
    • 8 test files passed, 126 tests passed.
  • node scripts/run-vitest.mjs src/commands/model-picker.test.ts src/auto-reply/reply/commands-models.test.ts src/gateway/server-methods/models.test.ts
    • 4 test files passed, 83 tests passed.
  • Live Discord /models interaction during a prepared-auth miss, to confirm the interaction ack is no longer starved while the provider sweep continues.

@sjf sjf requested a review from a team as a code owner May 22, 2026 01:15
@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime agents Agent runtime and tooling size: S maintainer Maintainer-authored PR labels May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge.

Latest ClawSweeper review: 2026-05-22 05:34 UTC / May 22, 2026, 1:34 AM ET.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR makes provider-auth/model-catalog helpers Promise-returning, yields once before slow auth discovery, and awaits the checker from model listing, gateway models.list, and model-picker paths.

Reproducibility: no. high-confidence live reproduction was run in this read-only review. Source inspection plus the PR's before logs make the cold-path event-loop starvation plausible, but after-fix Discord /models behavior still needs real proof.

PR rating
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🐚 platinum hermit
Summary: Overall is F because real behavior proof is absent, even though the current patch reads as a clean mechanical async migration with no blocking code finding.

Rank-up moves:

  • Attach redacted after-fix Discord /models cold-path proof with event-loop or liveness diagnostics.
  • Keep the focused auth/catalog/model-listing tests green on the current PR head.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs real behavior proof before merge: Missing: the PR body has before logs and test commands but no after-fix real Discord/gateway proof; add redacted logs, terminal output, screenshots, recordings, or a linked artifact showing /models ack and event-loop diagnostics after this patch, then update the PR body for automatic re-review or ask a maintainer to comment @clawsweeper re-review.

Mantis proof suggestion
A real Discord interaction timeout and event-loop liveness change is best verified with live transport proof plus diagnostics, and no narrower Discord Mantis lane fits this /models cold path. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

visual task: verify Discord /models during provider-auth cold path responds before timeout and captures reduced event-loop delay diagnostics.

Risk before merge

  • The after-fix live Discord /models cold-path behavior is not proven on this PR; the body contains before logs and unit-test commands but leaves the real transport proof unchecked.
  • Provider visibility is auth-sensitive, so a missed await or async cache mistake could hide available providers or expose unavailable model choices.
  • The underlying filesystem and external-CLI probes remain synchronous; yielding between providers mitigates multi-provider starvation but does not prove a single slow provider probe cannot still stall the gateway.

Maintainer options:

  1. Require live cold-path proof (recommended)
    Require redacted after-fix Discord /models proof with liveness or event-loop diagnostics before merge, while keeping the focused async auth/catalog tests green.
  2. Accept maintainer-owned transport proof
    Maintainers could run and own equivalent live gateway proof themselves if they are comfortable accepting the remaining single-probe synchronous risk.
  3. Pause for deeper auth-store async work
    Pause this PR if maintainers want the durable fix to convert auth-profile and external-CLI discovery itself to async rather than yielding between synchronous probes.

Next step before merge
Human handling is needed because the PR is auth-sensitive gateway availability work with a protected maintainer label and missing contributor real behavior proof, not a narrow repairable code defect.

Security
Cleared: Cleared: the diff changes async control flow around existing provider-auth checks and does not add dependencies, downloaded code, secret handling, or broader permissions.

Review details

Best possible solution:

Land the mechanical async/yield conversion only after redacted live Discord /models proof shows the cold miss no longer starves the interaction ack or event loop while focused auth visibility tests remain green.

Do we have a high-confidence way to reproduce the issue?

No high-confidence live reproduction was run in this read-only review. Source inspection plus the PR's before logs make the cold-path event-loop starvation plausible, but after-fix Discord /models behavior still needs real proof.

Is this the best way to solve the issue?

Yes, the current head looks like a narrow mechanical async/yield mitigation that preserves the existing auth decision path. The merge-safe path still requires live proof because this mitigates, rather than removes, synchronous provider probes.

Label justifications:

  • P1: The PR targets a gateway responsiveness path that can break real Discord /models interactions by missing the interaction ack window.
  • merge-risk: 🚨 auth-provider: The diff changes central provider-auth filtering helpers, so missed awaits or cache mistakes could change which authenticated model providers users see.
  • merge-risk: 🚨 availability: The change is meant to reduce event-loop starvation, but after-fix live liveness behavior is not yet proven and slow probes remain synchronous.
  • rating: 🧂 unranked krab: Current PR rating is 🧂 unranked krab because proof is 🧂 unranked krab, patch quality is 🐚 platinum hermit, and Overall is F because real behavior proof is absent, even though the current patch reads as a clean mechanical async migration with no blocking code finding.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: Missing: the PR body has before logs and test commands but no after-fix real Discord/gateway proof; add redacted logs, terminal output, screenshots, recordings, or a linked artifact showing /models ack and event-loop diagnostics after this patch, then update the PR body for automatic re-review or ask a maintainer to comment @clawsweeper re-review.

What I checked:

  • Live PR state: Live GitHub API shows the PR is open at head 2393543 with maintainer, P1, merge-risk: auth-provider, merge-risk: availability, and status: needs proof; the body still has the live Discord /models prepared-auth miss check unchecked. (23935436535b)
  • Current main slow path: On current main, hasAuthForModelProvider is synchronous and falls through from the prepared map to runtime auth, auth-profile store, and external-CLI-backed discovery on a miss. (src/agents/model-provider-auth.ts:79, e32e0f3f7f3e)
  • Current model listing callers: Current main model listing paths synchronously call resolveVisibleModelCatalog and per-provider auth filtering while building /models provider data and gateway models.list responses. (src/auto-reply/reply/commands-models.ts:160, e32e0f3f7f3e)
  • PR async conversion: The PR diff changes hasAuthForModelProvider and createProviderAuthChecker to return promises, adds an awaited setImmediate before the slow path, and awaits the checker from model catalog, /models, gateway, and picker call sites. (src/agents/model-provider-auth.ts:79, 23935436535b)
  • Current head test migration: The PR files API shows the current head has migrated the previously sync auth/catalog tests to await/.resolves, so the earlier stale review findings about unawaited tests no longer apply to this head. (src/agents/model-provider-auth.test.ts:73, 23935436535b)
  • Feature history: Blame and log show the current provider-auth/model-listing helpers were introduced in dca9cec and recently reshaped by the merged provider-auth prewarm work in 95343af. (src/agents/model-provider-auth.ts:79, 95343affbbee)

Likely related people:

  • sjf: Auth prewarm, invalidation, and related model-listing performance work on current main came from the same contributor's recent merged commits, and this PR is a follow-up on that exact path. (role: recent area contributor; confidence: high; commits: 95343affbbee, 55cfe00a3a40; files: src/agents/model-provider-auth.ts, src/agents/auth-profiles-watcher.ts, src/gateway/server-startup-post-attach.ts)
  • Dallin Romney: The current model-provider auth helper, catalog visibility helper, /models provider data, gateway models method, and model picker files trace back to the broad commit that introduced these surfaces in the checked-out history. (role: original introducer; confidence: medium; commits: dca9cecaeed9; files: src/agents/model-provider-auth.ts, src/agents/model-catalog-visibility.ts, src/auto-reply/reply/commands-models.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against e32e0f3f7f3e.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. labels May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

Base automatically changed from sjf/discord-picker-pagination to main May 22, 2026 04:52
@clawsweeper clawsweeper Bot added the merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. label May 22, 2026
@sjf sjf force-pushed the sjf/discord-picker-async-auth branch 2 times, most recently from 1f50e29 to 06d514a Compare May 22, 2026 05:05
@sjf sjf force-pushed the sjf/discord-picker-async-auth branch 2 times, most recently from 5acc480 to 69a4f80 Compare May 22, 2026 05:18
@sjf sjf force-pushed the sjf/discord-picker-async-auth branch from 69a4f80 to 2393543 Compare May 22, 2026 05:19
@sjf

sjf commented May 22, 2026

Copy link
Copy Markdown
Contributor Author

/review

Please review the current PR head for the async provider-auth change. Focus on whether this is a mechanical async conversion only and whether any provider-auth decision logic changed unintentionally.

@clawsweeper

clawsweeper Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@sjf sjf merged commit 62a330e into main May 22, 2026
111 checks passed
@sjf sjf deleted the sjf/discord-picker-async-auth branch May 22, 2026 05:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling gateway Gateway runtime maintainer Maintainer-authored PR merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. P1 High-priority user-facing bug, regression, or broken workflow. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: S status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant