fix(infra): wrap provider auth resolution in timeout for status --usage --json#74185
fix(infra): wrap provider auth resolution in timeout for status --usage --json#74185yelog wants to merge 8 commits into
Conversation
Greptile SummaryWraps Confidence Score: 4/5Safe to merge — the change is small, consistent with the existing timeout pattern, and the new test covers the regression path. No P0 or P1 findings. Two P2 style suggestions (complex type in test, missing log on timeout). Score stays at 4/5. No files require special attention. Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/infra/provider-usage.load.test.ts
Line: 176-181
Comment:
**Overly complex config type extraction**
The conditional type chain here resolves to `OpenClawConfig`, which is already exported from `../config/config.js` (the same import used in the source file). Importing it directly is cleaner and easier to understand.
```suggestion
config: {} as import("../config/config.js").OpenClawConfig,
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/infra/provider-usage.load.ts
Line: 94-105
Comment:
**No signal on auth timeout**
`withTimeout` silently returns `[]` when auth resolution exceeds `timeoutMs`. At the call site there's no observable difference between "no providers configured" and "auth timed out", which makes this hard to diagnose in the field. Consider emitting a debug/warning log before returning the fallback so operators can distinguish the two cases.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(infra): wrap provider auth resolutio..." | Re-trigger Greptile |
| config: {} as Parameters<typeof loadProviderUsageSummary>[0] extends infer O | ||
| ? O extends { config?: infer C } | ||
| ? NonNullable<C> | ||
| : never | ||
| : never, | ||
| env: {}, |
There was a problem hiding this comment.
Overly complex config type extraction
The conditional type chain here resolves to OpenClawConfig, which is already exported from ../config/config.js (the same import used in the source file). Importing it directly is cleaner and easier to understand.
| config: {} as Parameters<typeof loadProviderUsageSummary>[0] extends infer O | |
| ? O extends { config?: infer C } | |
| ? NonNullable<C> | |
| : never | |
| : never, | |
| env: {}, | |
| config: {} as import("../config/config.js").OpenClawConfig, |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/provider-usage.load.test.ts
Line: 176-181
Comment:
**Overly complex config type extraction**
The conditional type chain here resolves to `OpenClawConfig`, which is already exported from `../config/config.js` (the same import used in the source file). Importing it directly is cleaner and easier to understand.
```suggestion
config: {} as import("../config/config.js").OpenClawConfig,
```
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| const auths = await withTimeout( | ||
| resolveProviderAuths({ | ||
| providers: opts.providers ?? usageProviders, | ||
| auth: opts.auth, | ||
| agentDir: opts.agentDir, | ||
| config, | ||
| env, | ||
| skipPluginAuthWithoutCredentialSource: opts.skipPluginAuthWithoutCredentialSource, | ||
| }), | ||
| timeoutMs, | ||
| [], | ||
| ); |
There was a problem hiding this comment.
withTimeout silently returns [] when auth resolution exceeds timeoutMs. At the call site there's no observable difference between "no providers configured" and "auth timed out", which makes this hard to diagnose in the field. Consider emitting a debug/warning log before returning the fallback so operators can distinguish the two cases.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/provider-usage.load.ts
Line: 94-105
Comment:
**No signal on auth timeout**
`withTimeout` silently returns `[]` when auth resolution exceeds `timeoutMs`. At the call site there's no observable difference between "no providers configured" and "auth timed out", which makes this hard to diagnose in the field. Consider emitting a debug/warning log before returning the fallback so operators can distinguish the two cases.
How can I resolve this? If you propose a fix, please make it concise.|
Codex review: needs changes before merge. Reviewed June 1, 2026, 10:57 PM ET / 02:57 UTC. Summary PR surface: Source +35, Tests +162. Total +197 across 6 files. Reproducibility: Do we have a high-confidence way to reproduce the issue? Source inspection confirms current main has no timeout around resolveProviderAuths and the PR proof shows a non-TTY status command completing after the patch, but I did not run a live expired-OAuth subprocess reproduction in this checkout. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Copy recommended automerge instructionNext step before merge
Security Review findings
Review detailsBest possible solution: Land the auth-resolution timeout while preserving existing OAuth refresh behavior by default, or have provider/auth owners explicitly accept and document cached-only status usage with focused expired-token tests. Do we have a high-confidence way to reproduce the issue? Do we have a high-confidence way to reproduce the issue? Source inspection confirms current main has no timeout around resolveProviderAuths and the PR proof shows a non-TTY status command completing after the patch, but I did not run a live expired-OAuth subprocess reproduction in this checkout. Is this the best way to solve the issue? Is this the best way to solve the issue? No: wrapping auth resolution in the existing timeout is the right layer, but suppressing OAuth refresh for status usage is broader than needed and can remove existing OAuth usage visibility. Full review comments:
Overall correctness: patch is incorrect AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 9d9a6140a3e4. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +35, Tests +162. Total +197 across 6 files. View PR surface stats
Acceptance criteria:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
d619d4d to
680d2cd
Compare
889bbc8 to
9852b63
Compare
|
Rebased onto latest Validation on the rebased branch: The first formatter attempt was interrupted while pnpm retried optional binary downloads from the configured registry; retrying the same formatter command passed. Remaining merge gate: I still need to add redacted real behavior proof from a non-TTY |
|
After-fix real behavior proof from the PR branch, redacted to command status and JSON shape only. Behavior addressed: non-TTY Evidence after fix: {
"branch": "fix/status-json-usage-auth-timeout-74085",
"head": "9852b6356b6a8369bc4f9e19b71745466d6103c5",
"command": "node scripts/run-node.mjs status --usage --json",
"nonTty": true,
"exitStatus": 0,
"signal": null,
"timedOut": false,
"elapsedMs": 6513,
"stdoutBytes": 2757,
"stderrBytes": 0,
"parsedJson": true,
"topLevelKeys": [
"agents",
"channelSummary",
"gateway",
"gatewayService",
"heartbeat",
"lastHeartbeat",
"memory",
"memoryPlugin",
"nodeService",
"os",
"queuedSystemEvents",
"runtimeVersion",
"secretDiagnostics",
"sessions",
"taskAudit",
"tasks",
"update",
"updateChannel",
"updateChannelSource",
"usage"
],
"hasUsage": true,
"usageKeys": [
"providers",
"updatedAt"
],
"usageProviders": 1,
"stderrPreview": []
}Observed result after fix: The non-TTY subprocess exited successfully in about 6.5s, did not hit the 45s timeout, emitted parseable JSON on stdout, and included a top-level @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
ClawSweeper PR egg ✨ Hatched: 💎 rare Frosted Proofling Hatch commandComment Hatchability rules:
Rarity: 💎 rare. What is this egg doing here?
|
954a9c3 to
0d90f2c
Compare
…ge --json resolveProviderAuths was called without any timeout boundary, so plugin auth hooks (OAuth token refresh, secret exec resolvers) could hang indefinitely in non-TTY subprocess contexts. Wrap the call in the existing withTimeout helper using the same timeoutMs budget so the status JSON command returns empty providers on auth timeout instead of hanging. Fixes openclaw#74085
0d90f2c to
7a24973
Compare
|
Removed the release-owned The remaining merge decision is the auth/provider policy point ClawSweeper called out: status usage now avoids refresh-side effects and may omit expired OAuth-backed usage instead of refreshing credentials. If maintainers want a narrower policy, I can adjust the branch to keep only the auth-resolution timeout boundary. @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
Summary
resolveProviderAuths()in the existingwithTimeouthelper so auth resolution (OAuth token refresh, plugin auth hooks, secret exec resolvers) cannot hang indefinitelyProblem
openclaw status --usage --jsonhangs when called from a non-TTY subprocess (regression from 2026.4.23 to 2026.4.26). The individual provider usage fetches are bounded bywithTimeout, butresolveProviderAuths()— which iterates sequentially over 7 providers calling plugin auth hooks, OAuth token refresh, and secret ref resolvers — had no timeout boundary. In a non-TTY subprocess, any of these can block indefinitely.Changes
src/infra/provider-usage.load.tsresolveProviderAuths()call at line 94 inwithTimeout(…, timeoutMs, [])using the same timeout budget as the fetch phase[](empty auths) which causesloadProviderUsageSummaryto return{ providers: [] }instead of hangingsrc/infra/provider-usage.load.test.tsresolveProviderAuthsto support testing the timeout pathReal behavior proof
Behavior addressed: non-TTY
status --usage --jsonshould complete instead of hanging while provider usage auth is resolved.Real environment tested: local OpenClaw source checkout on branch
fix/status-json-usage-auth-timeout-74085, head9852b6356b6a8369bc4f9e19b71745466d6103c5.Exact steps or command run after this patch:
Evidence after fix:
{ "branch": "fix/status-json-usage-auth-timeout-74085", "head": "9852b6356b6a8369bc4f9e19b71745466d6103c5", "command": "node scripts/run-node.mjs status --usage --json", "nonTty": true, "exitStatus": 0, "signal": null, "timedOut": false, "elapsedMs": 6513, "stdoutBytes": 2757, "stderrBytes": 0, "parsedJson": true, "topLevelKeys": [ "agents", "channelSummary", "gateway", "gatewayService", "heartbeat", "lastHeartbeat", "memory", "memoryPlugin", "nodeService", "os", "queuedSystemEvents", "runtimeVersion", "secretDiagnostics", "sessions", "taskAudit", "tasks", "update", "updateChannel", "updateChannelSource", "usage" ], "hasUsage": true, "usageKeys": [ "providers", "updatedAt" ], "usageProviders": 1, "stderrPreview": [] }Observed result after fix: The non-TTY subprocess exited successfully in about 6.5s, did not hit the 45s timeout, emitted parseable JSON on stdout, and included a top-level
usageobject with provider usage data.What was not tested: This was a local source checkout proof rather than a packaged install, and it does not print provider names, auth profiles, credentials, or raw usage values.
Fixes #74085