[Bug]: sessions.list O(rows) plugin-metadata scans under concurrency: per-row read of a globally-mutated active workspaceDir (residual of #76562)

### Bug type

Regression (worked before, now fails)

### Beta release blocker

No

### Summary

`sessions.list` (and other per-row control-plane RPCs) becomes O(rows) slow — tens of seconds — **only when other agents/crons are running concurrently**. On an idle gateway it is fast. The cause is not cache size: the per-row plugin-metadata lookup reads a **process-global "active plugin-registry workspace dir"** that concurrent agent-turns/crons mutate while `sessions.list` is `await`-yielding between row batches. The metadata-snapshot memo key includes `workspaceDir`, so it changes on essentially every row, the memo never hits, and each row triggers a fresh full `loadPluginMetadataSnapshot` scan (~100 ms).

This is the residual concurrency facet of the now-closed #76562. That issue was closed as completed after the maintainer could not reproduce it on an idle/quiet 2026.5.28 gateway, but the failure mode persists under real multi-agent load. Filing separately so it is tracked on an open issue with a precise root cause and a fix.


### Steps to reproduce

1. Run a gateway with several concurrent actors (e.g. 1 main agent + a couple of crons actively taking turns).
2. While they are running, issue `sessions.list` (dashboard load, MCP client, or `openclaw` CLI).
3. Observe the call takes ~10 s even though the session store index is tiny (7–13 entries).
4. For contrast, stop all agents/crons (idle gateway) and repeat — the same `sessions.list` returns in milliseconds.

To capture evidence: `OPENCLAW_DIAGNOSTICS=1 OPENCLAW_DIAGNOSTICS_TIMELINE_PATH=/tmp/x.jsonl` and inspect the `gateway.sessions.list` span tree plus `plugins.metadata.scan` spans.
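
For anyone triaging from the timeline file, a small helper like the following can tally spans per name so the number of `plugins.metadata.scan` spans can be compared with the row count. This is only a sketch: it assumes each line of the JSONL file is a JSON object with a string `name` field, which may not match the real timeline schema, and `countSpansByName` is a hypothetical helper, not part of OpenClaw.

```typescript
// Assumption: one JSON object per line, with a string `name` field for the span name.
// The real diagnostics timeline schema may differ; adjust the field access if so.
function countSpansByName(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let record: unknown;
    try {
      record = JSON.parse(trimmed);
    } catch {
      continue; // skip partial or non-JSON lines
    }
    const name = (record as { name?: unknown } | null)?.name;
    if (typeof name === "string") {
      counts.set(name, (counts.get(name) ?? 0) + 1);
    }
  }
  return counts;
}
```

Feeding it the contents of `/tmp/x.jsonl` (for example via `readFileSync` from `node:fs`) and reading the `plugins.metadata.scan` entry should give roughly one scan per row on a busy gateway if the diagnosis here is right.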


### Expected behavior

A single `sessions.list` resolves plugin metadata **once** and reuses it for all rows, independent of concurrent agent/cron activity. Wall time should track the store size (milliseconds for a small store), not the number of concurrent actors.


### Actual behavior

One `sessions.list` call on a busy gateway (diagnostics timeline):


### OpenClaw version

2026.5.22 (also reproduced against a current `main` source checkout while developing the fix below).

### Operating system

Linux x64

### Install method

Source checkout / development workflow.

### Model

Anthropic-family + OpenAI-compat providers via a proxy; not model-specific — the hot path is plugin model-id normalization, which runs regardless of the routed model.

### Provider / routing chain

Multi-provider config (Anthropic messages API + OpenAI-compat). The slowdown is independent of the routing chain; it is driven by concurrent actors mutating the global active workspace dir, not by any provider call.

### Additional provider/model setup details

Multi-agent gateway: 1 main agent + a secondary agent + several crons.


### Logs, screenshots, and evidence

**Mechanism.** Call chain per row (lightweight list rows still hit this via model-ref/runtime resolution):

```text
buildGatewaySessionRow
  -> ... -> normalizeProviderModelIdWithRuntime
  -> normalizeProviderModelIdWithManifest
  -> resolveManifestModelIdNormalizationPolicy
  -> resolveMetadataSnapshotForPolicies        (src/plugins/manifest-model-id-normalization.ts)
       const workspaceDir = params.workspaceDir ?? getActivePluginRegistryWorkspaceDirFromState();
       const current = getCurrentPluginMetadataSnapshot({ config, env, workspaceDir });
       if (current) return current;
       return loadPluginMetadataSnapshot({ config: config ?? {}, env, workspaceDir });  // full scan
```

`listSessionsFromStoreAsync` deliberately yields every `SESSIONS_LIST_YIELD_BATCH_SIZE` rows:

```ts
if ((i + 1) % SESSIONS_LIST_YIELD_BATCH_SIZE === 0 && i + 1 < entries.length) {
  await new Promise((resolve) => setImmediate(resolve));   // gives concurrent agent-turns the loop
}
```

During each yield, a concurrent agent-turn/cron calls `setActivePluginRegistry(...)`, setting the **global** active-registry workspace to *its* workspace. So row N reads workspace A, row N+1 reads B, etc. `computePluginMetadataSnapshotMemoKey` includes `workspaceDir`, so the key differs per row, `getCurrentPluginMetadataSnapshot(...)` misses, and `loadPluginMetadataSnapshot(...)` runs a fresh ~100 ms scan. With derived/bundled registries (`registrySource: "derived"`) the result is not stored in the process memo, so the next row cannot reuse it either.

In short: **a single `sessions.list` reads a process-global workspace per row, and that global is mutated underneath it by concurrent work across its own `await` points** → O(rows) full plugin-metadata scans. This is why an idle benchmark looked fixed by #76655 but real multi-agent gateways still pin CPU.
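
The mechanism can be modeled in isolation with a toy version of the race. The sketch below is not OpenClaw code: every name in it (`activeWorkspaceDir`, `resolveSnapshotForRow`, `listRows`, `listRowsWhileBusy`) is illustrative. It assumes the memo holds one snapshot for the main workspace and that scan results on a miss are not stored, mirroring the derived-registry case described above.

```typescript
type Snapshot = { workspaceDir: string | undefined };

const BATCH_SIZE = 4; // stand-in for SESSIONS_LIST_YIELD_BATCH_SIZE
let activeWorkspaceDir: string | undefined = "/ws/main"; // stand-in for the process-global
const currentSnapshot: Snapshot = { workspaceDir: "/ws/main" }; // the only memoized snapshot
let scanCount = 0;

const yieldToLoop = () => new Promise<void>((resolve) => setImmediate(resolve));

function resolveSnapshotForRow(): Snapshot {
  const workspaceDir = activeWorkspaceDir; // per-row read of the global
  if (currentSnapshot.workspaceDir === workspaceDir) return currentSnapshot; // memo hit
  scanCount += 1; // memo miss: models the full scan, result is not stored
  return { workspaceDir };
}

// Builds `rowCount` rows, yielding between batches, and returns how many scans ran.
async function listRows(rowCount: number): Promise<number> {
  scanCount = 0;
  for (let i = 0; i < rowCount; i++) {
    resolveSnapshotForRow();
    if ((i + 1) % BATCH_SIZE === 0 && i + 1 < rowCount) await yieldToLoop();
  }
  return scanCount;
}

// Same listing, but with a concurrent actor that rewrites the global on every loop turn.
async function listRowsWhileBusy(rowCount: number): Promise<number> {
  let stopped = false;
  const actor = (async () => {
    for (let turn = 0; !stopped; turn++) {
      await yieldToLoop();
      activeWorkspaceDir = `/ws/cron-${turn}`;
    }
  })();
  const scans = await listRows(rowCount);
  stopped = true;
  await actor;
  activeWorkspaceDir = "/ws/main"; // restore the idle state
  return scans;
}
```

In this model an idle `listRows` never scans, while `listRowsWhileBusy` scans for every row built after the first mutation, which matches the idle-versus-busy contrast in the reproduction steps.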

### Impact and severity

High under load: control-plane RPCs (`sessions.list`, and any per-row resolver sharing this path) take tens of seconds and saturate the single gateway event loop, degrading UI/WebSocket responsiveness and channel turn latency for everyone on the gateway. Scales with both row count and concurrent-actor count.


### Additional information

**Suggested fix (implemented and validated locally).** Pin the active plugin-registry workspace dir for the duration of the row-building batch so every row in one `sessions.list` reads a stable value, immune to concurrent global mutation, while other concurrent async contexts still observe the live global. `AsyncLocalStorage` scopes the pin to the batch's async context only:

```ts
// runtime-workspace-state.ts
import { AsyncLocalStorage } from "node:async_hooks";

// Per-async-context pin; there is no store outside a pinned scope.
const pinnedWorkspaceDirStorage = new AsyncLocalStorage<{ workspaceDir: string | undefined }>();

export function getActivePluginRegistryWorkspaceDirFromState(): string | undefined {
  // Inside a pinned scope, return the pinned value (which may itself be undefined).
  const pinned = pinnedWorkspaceDirStorage.getStore();
  if (pinned) return pinned.workspaceDir;
  // Otherwise fall through to the live process-global.
  return (globalThis as ...)[PLUGIN_REGISTRY_STATE]?.workspaceDir ?? undefined;
}

export function withPinnedActivePluginRegistryWorkspaceDir<T>(fn: () => T): T {
  if (pinnedWorkspaceDirStorage.getStore()) return fn();         // nested: reuse outer pin
  // Snapshot the global once; everything awaited inside fn sees this value.
  const workspaceDir = (globalThis as ...)[PLUGIN_REGISTRY_STATE]?.workspaceDir ?? undefined;
  return pinnedWorkspaceDirStorage.run({ workspaceDir }, fn);
}
```

…then wrap the `listSessionsFromStoreAsync` row loop (including its inter-batch yields) in `await withPinnedActivePluginRegistryWorkspaceDir(async () => { ... })`. This collapses the O(rows) scans to one regardless of concurrent agent/cron activity.

A concurrency regression test locks it in: enter the pinned scope, mutate the active workspace via `setActivePluginRegistry` mid-scope across a `setImmediate` yield, and assert that reads inside the scope stay stable while reads after the scope exits observe the live (mutated) global.
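
As a rough illustration of that test's shape, here is a self-contained sketch. It does not use the real module: `state`, `setActiveWorkspaceDir`, `getActiveWorkspaceDir`, `withPinnedWorkspaceDir`, and `runPinnedScopeCheck` are hypothetical stand-ins for the registry state, `setActivePluginRegistry`, and the two functions above, and the workspace paths are made up.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Stand-in for the process-global registry state (the real code keeps it on globalThis).
const state: { workspaceDir: string | undefined } = { workspaceDir: "/ws/main" };
const pinnedStorage = new AsyncLocalStorage<{ workspaceDir: string | undefined }>();

// Stand-in for setActivePluginRegistry(...): only the workspace dir matters here.
function setActiveWorkspaceDir(dir: string | undefined): void {
  state.workspaceDir = dir;
}

function getActiveWorkspaceDir(): string | undefined {
  const pinned = pinnedStorage.getStore();
  return pinned ? pinned.workspaceDir : state.workspaceDir;
}

function withPinnedWorkspaceDir<T>(fn: () => T): T {
  if (pinnedStorage.getStore()) return fn(); // nested: reuse the outer pin
  return pinnedStorage.run({ workspaceDir: state.workspaceDir }, fn);
}

// Returns the reads taken inside the pinned scope and the read taken after it exits.
async function runPinnedScopeCheck(): Promise<{
  inside: Array<string | undefined>;
  after: string | undefined;
}> {
  setActiveWorkspaceDir("/ws/main");
  // Schedule the "concurrent actor" outside the scope so it runs in its own context.
  setImmediate(() => setActiveWorkspaceDir("/ws/cron"));
  const inside = await withPinnedWorkspaceDir(async () => {
    const reads = [getActiveWorkspaceDir()];
    await new Promise<void>((resolve) => setImmediate(resolve)); // the mutation lands here
    reads.push(getActiveWorkspaceDir());
    return reads;
  });
  return { inside, after: getActiveWorkspaceDir() };
}
```

The mutation is scheduled before the scope is entered so that it runs during the scope's own yield, which is the interleaving the issue describes. Both reads inside the scope should return the pinned value, and the read after the scope should return the mutated one.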

I'm happy to open a PR with this change, and can share the raw diagnostics timeline JSONL if useful.

