Skip to content

perf(plugins,gateway): thread metadata snapshot + discovery through hot paths + plugin owner fixes#84649

Merged
RomneyDa merged 5 commits into
mainfrom
dallin-perf-startup-trace
May 24, 2026
Merged

perf(plugins,gateway): thread metadata snapshot + discovery through hot paths + plugin owner fixes#84649
RomneyDa merged 5 commits into
mainfrom
dallin-perf-startup-trace

Conversation

@RomneyDa

@RomneyDa RomneyDa commented May 20, 2026

Copy link
Copy Markdown
Member

Threading + correctness slice of the original plugin cold-start performance work.

Context

The original PR also rewrote the loadPluginMetadataSnapshot memo (LRU, symmetric memoKeys, watched-file freshness, eligibility broadening). That work has been superseded by two PRs from @steipete that landed on main:

  • 12f82270c perf: cache stable gateway metadata — process-scoped install-records cache, switches channel-catalog-registry to use discovery candidate metadata instead of re-loading full plugin manifests
  • 4314674054 perf: reuse plugin metadata snapshots (perf: reuse plugin metadata snapshots #85843) — array-of-8 memo, removes per-call watched-file freshness machinery, adds resolvePluginMetadataSnapshot that consults the current-snapshot handoff, drops ~370 lines from plugin-metadata-snapshot.ts

Those PRs implement the same architectural direction this branch was going (and aligned with the new src/plugins/CLAUDE.md rule: "gateway plugin metadata is stable while gateway runs. Reuse current snapshots ... avoid per-call stat/read/hash freshness"). The memo commits from this branch were dropped during rebase; only the orthogonal pieces remain.

What's in the PR

1. PluginMetadataSnapshot + discovery threaded through hot paths

resolvePluginProviders, isPluginProvidersLoadInFlight, resolveOwningPluginIdsForProvider, and resolveExternalAuthProfilesWithPlugins all dispatch to helpers that have a manifestRegistry fast path. They now thread the snapshot through so the fast path actually fires instead of every helper rebuilding the manifest registry from disk. loadPluginManifestRegistryForInstalledIndex fires drop ~685 → ~10 per resolve chain.

A second plumb adds discovery?: PluginDiscoveryResult to PluginMetadataSnapshot and threads it through applyPluginAutoEnabledetectPluginAutoEnableCandidatesresolvePluginAutoEnableReadinesscollectConfiguredChannelIds → the bundled-channel leaf helpers, so the caller's already-resolved discovery short-circuits the cascade.

This is independent of memoization — it works because we don't redundantly call loadPluginMetadataSnapshot within a single resolve chain, and helpers reuse the manifest registry the entry-point already resolved.

2. Discovery threading through gateway applyPluginAutoEnable call sites

10+ gateway call sites now pass discovery into applyPluginAutoEnable so the bundled-channel cascade short-circuits:

  • Startup-time: server-plugin-bootstrap, server-startup-plugins, server-startup-config, server-plugins (two sites)
  • Per-RPC: server-methods/channels (status + start), server-methods/send, server.impl getRuntimeConfig callback

Per-RPC sites source discovery via getCurrentPluginMetadataSnapshot keyed on the runtime config; falls through to the original slow path when the snapshot is absent or policy-incompatible.

3. setup.cliBackends owner regression fix

75ab995b — the rewritten single owner-scan in resolveOwningPluginIdsForProvider was missing plugin.setup?.cliBackends. Pre-PR the no-registry fallback went through resolvePluginContributionOwners({ contribution: "cliBackends" }) which includes both top-level and setup CLI backends; the replacement scan needed the same coverage. Includes regression test in providers.test.ts.

4. Workspace inheritance fix

67ee6315isPluginProvidersLoadInFlight and resolvePluginProviders now resolve env and workspaceDir once at entry (falling back to getActivePluginRegistryWorkspaceDir) and pass them into both loadPluginMetadataSnapshot and resolvePluginProviderLoadBase. Pre-fix the snapshot used params.workspaceDir raw while the load base inherited the active workspace, so workspace-scoped provider plugins could be absent from the snapshot manifest registry. Includes regression test.

5. Test isolation for the snapshot auto-fetch

1abe536f — the threading commit added an auto-fetch fall-through in resolveOwningPluginIdsForProvider. Without a mock, providers tests would hit real disk reads. Adds vi.doMock("./plugin-metadata-snapshot.js") to wrap the existing fixture. Also clears the memo in io.write-config.test.ts afterEach and in resetGatewayModelPricingCacheForTest.

Verification

  • pnpm tsgo --noEmit --project tsconfig.core.json — clean
  • node scripts/run-vitest.mjs src/plugins/plugin-metadata-snapshot.memo.test.ts src/plugins/plugin-registry-snapshot.test.ts src/plugins/providers.test.ts src/plugins/runtime/load-context.test.ts src/config/io.write-config.test.ts src/gateway/server-plugin-bootstrap.test.ts src/gateway/server-startup-config.recovery.test.ts src/gateway/model-pricing-cache.test.ts — 201/201 pass

Not run: full pnpm test / pnpm check, sibling-process plugin-install interleaving.

Perf notes

Earlier measurements on this branch (pre-rebase, when the memo commits were still in) showed ~5s cold-start gateway improvement. With the memo work now provided by 4314674054 on main, re-measurement should attribute the cold-start win primarily to the threading + discovery short-circuit (this PR) layered on top of the memo (already on main). Warm-start gateway boot is bottlenecked on runProviderCatalog (network) and is unaffected by either set of changes.

@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed May 24, 2026, 4:36 PM ET / 20:36 UTC.

Summary
This PR threads plugin metadata snapshots and discovery through plugin provider, auto-enable, channel presence, and gateway call paths, plus focused provider-owner and test-isolation fixes.

PR surface: Source +156, Tests +47. Total +203 across 27 files.

Reproducibility: no. live current-main reproduction was run in this read-only review. Source inspection and the PR's profiling/proof context show the repeated metadata/discovery seams being addressed, but the exact cold-start delta was not remeasured here.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster ✨ media proof bonus
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Have a maintainer explicitly accept or reduce the provider-auth compatibility risk before merge.

Risk before merge

  • Provider owner and external-auth profile resolution now rely more heavily on metadata snapshot manifest registries; unusual workspace-scoped, setup.cliBackends, or external-auth plugin shapes are compatibility-sensitive even with the added focused regression test.
  • Gateway per-RPC and startup auto-enable paths now reuse current snapshot discovery when compatible, so snapshot compatibility and plugin-install or registry-refresh interleaving remain merge-relevant upgrade proof areas.

Maintainer options:

  1. Land with explicit risk acceptance (recommended)
    Merge once maintainers are satisfied that the green provider/gateway checks and focused owner tests cover the compatibility-sensitive paths changed here.
  2. Request install and registry interleaving proof
    Ask for a targeted plugin install, registry refresh, or stale persisted-registry scenario before merge if snapshot freshness is the deciding concern.
  3. Pause for plugin runtime owner review
    Hold the PR for a plugin-runtime owner to inspect the provider owner scan and external-auth profile path before accepting the change.

Next step before merge
Protected maintainer PR with compatibility-sensitive provider/gateway behavior should be handled by human maintainer review rather than an automated repair lane.

Security
Cleared: The diff stays in plugin metadata, discovery, gateway/provider wiring, and tests, with no dependency, workflow, secret, permission, or supply-chain change found.

Review details

Best possible solution:

Land the snapshot/discovery threading after maintainers accept the provider-auth compatibility tradeoff and the current green checks/proof are considered enough for this hot path.

Do we have a high-confidence way to reproduce the issue?

No live current-main reproduction was run in this read-only review. Source inspection and the PR's profiling/proof context show the repeated metadata/discovery seams being addressed, but the exact cold-start delta was not remeasured here.

Is this the best way to solve the issue?

Yes, the approach fits the repository direction because it carries prepared snapshot and discovery facts through existing seams instead of adding broad filesystem caches. The remaining question is maintainer risk acceptance, not a different repair shape.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 8bf4f7d4a88d.

Label changes

Label justifications:

  • P2: This is a normal-priority gateway/plugin cold-start performance and correctness PR with limited but nontrivial hot-path blast radius.
  • merge-risk: 🚨 compatibility: The PR changes plugin metadata snapshot reuse, discovery threading, and provider owner resolution that existing plugin setups depend on during startup and RPC handling.
  • merge-risk: 🚨 auth-provider: The diff touches provider owner matching and external-auth profile plugin resolution, so missed manifest or workspace edge cases could affect auth-provider routing.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (screenshot): The existing PR context and proof labels indicate screenshot-backed cold-start/profile evidence plus successful focused verification for the changed behavior.
  • proof: sufficient: Contributor real behavior proof is sufficient. The existing PR context and proof labels indicate screenshot-backed cold-start/profile evidence plus successful focused verification for the changed behavior.
  • proof: 📸 screenshot: Contributor real behavior proof includes screenshot evidence. The existing PR context and proof labels indicate screenshot-backed cold-start/profile evidence plus successful focused verification for the changed behavior.
Evidence reviewed

PR surface:

Source +156, Tests +47. Total +203 across 27 files.

View PR surface stats
Area Files Added Removed Net
Source 26 298 142 +156
Tests 1 47 0 +47
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 27 345 142 +203

What I checked:

  • Protected maintainer PR: Live pull metadata shows authorAssociation MEMBER, the maintainer label, head f160425, and current merge-risk labels, so cleanup review should not close it automatically. (f16042590fc6)
  • Provider owner scan changed: The PR replaces the no-registry provider/contribution fallback with a metadata snapshot manifest-registry scan and includes providers, top-level cliBackends, and setup.cliBackends in the match. (src/plugins/providers.ts:520, f16042590fc6)
  • Provider runtime snapshot threading: resolvePluginProviders and isPluginProvidersLoadInFlight now resolve one metadata snapshot, inherit workspaceDir, and pass the snapshot to setup/runtime provider load-state helpers. (src/plugins/providers.runtime.ts:310, f16042590fc6)
  • Gateway discovery reuse: Gateway request/startup call sites pass current snapshot manifestRegistry and discovery into applyPluginAutoEnable, matching the scoped plugin/gateway hot-path rule to carry prepared facts forward. (src/gateway/server.impl.ts:837, f16042590fc6)
  • Scoped policy fit: The plugin scoped guide says gateway plugin metadata is stable while the gateway runs and should reuse current snapshots, discovery, lookup tables, and bounded process caches instead of per-call filesystem freshness. (src/plugins/AGENTS.md:13, 8bf4f7d4a88d)
  • CI and proof state: Head f160425 has relevant provider/gateway/check/type/build/security checks completing successfully, and the proof workflow reports success. (f16042590fc6)

Likely related people:

  • steipete: Peter Steinberger authored the merged metadata snapshot reuse PR referenced by this PR and earlier provider auto-load/plugin auto-enable work in the affected paths. (role: feature-history owner; confidence: high; commits: 431467405486, fff7e610df31, 1afa076cfaf5; files: src/plugins/plugin-metadata-snapshot.ts, src/plugins/providers.ts, src/config/plugin-auto-enable.shared.ts)
  • vincentkoc: Vincent Koc has recent merged history centralizing provider scope, manifest owner trust policy, and provider load narrowing used by this PR's owner-resolution path. (role: adjacent provider/plugin-contract owner; confidence: high; commits: 12db6dfc8df6, 6a189eec0b1c, 0fd9aa8e001e; files: src/plugins/providers.ts, src/plugins/providers.runtime.ts, src/plugins/plugin-metadata-snapshot.ts)
  • gumadeiras: Gustavo Madeira Santana has merged work on gateway startup/runtime seams and preserving auto-enabled plugin config through startup, which overlaps the gateway auto-enable call sites touched here. (role: adjacent gateway/runtime contributor; confidence: medium; commits: 8de63ca26825, 470d6aee0f76, 6f4d13f3bd49; files: src/gateway/server.impl.ts, src/config/plugin-auto-enable.shared.ts, src/gateway/server-startup-config.ts)
  • RomneyDa: Dallin Romney owns this performance branch, has detailed maintainer review notes on the provider/discovery tradeoffs, and split related plugin cold-start work into companion branches in the PR discussion. (role: active PR/domain owner; confidence: medium; commits: c4663a1bc708, 75ab995b382e, f16042590fc6; files: src/plugins/providers.ts, src/plugins/providers.runtime.ts, src/gateway/server.impl.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 20, 2026
@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Sunspot Proofling

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: guards the happy path.
Image traits: location proof lagoon; accessory shell-shaped keyboard; palette plum, gold, and soft gray; mood celebratory; pose waving from a small platform; shell soft speckled shell; lighting bright celebratory glints; background quiet workflow signs.
Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Sunspot Proofling in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@clawsweeper clawsweeper Bot added P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. labels May 20, 2026
@openclaw-barnacle openclaw-barnacle Bot added size: M and removed gateway Gateway runtime size: L labels May 20, 2026
@RomneyDa

Copy link
Copy Markdown
Member Author

Pulled the installRecords threading bits out of this PR — they were:

  • adding installRecords?: Record<string, PluginInstallRecord> to PluginLoadOptions
  • making resolvePluginLoadCacheContext and buildProvenanceIndex use params.installRecords ?? loadInstalledPluginIndexInstallRecordsSync(...)
  • populating installRecords on the runtime load context + load-options builder from snapshot.index

That work eliminated ~264 loadInstalledPluginIndexInstallRecordsSync calls per cold start (~140ms of disk I/O) but didn't measurably move the headline freeze, so it's parked for a follow-on PR rather than mixed into this one.

Branch: dallin-perf-install-records-threading — branched off this PR's tip before the revert, so the diff against the post-merge state of this PR is exactly the install-records hunks.

@RomneyDa

Copy link
Copy Markdown
Member Author

@clawsweeper reconsider labels, explain the compatibility risk one or remove it.

@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

🦞👀
ClawSweeper is taking a look at your question.

I asked ClawSweeper to answer this maintainer mention in the next review comment. Tiny claws, bounded scope: this is a read-only assist pass unless it produces one of the existing structured safe-action markers.

Request: reconsider labels, explain the compatibility risk one or remove it.

@RomneyDa RomneyDa self-assigned this May 20, 2026
@RomneyDa

RomneyDa commented May 20, 2026

Copy link
Copy Markdown
Member Author

Addressing the three review concerns:

F1 — derived-cache invalidation gap. Resolved with watched-files widening + post-get freshness check.

  • For derived snapshots, the cached entry's watchedFiles now includes the derived index's plugin manifest paths (manifestPath, source, setupSource, package.json). Gathered via collectWatchedFilesForDerivedIndex — cheap in-memory walk over index.plugins, no disk reads.
  • At the cache-hit site, hashWatchedFiles(state.watchedFiles) is re-computed and compared against the stored hash. Stale entry → evict + re-derive.
  • The memoKey stays the lookup-time (persisted-state) key, so hit symmetry is preserved.
  • New regression test: refreshes derived snapshots when a derived-only plugin manifest changes.

F2 — resolveOwningPluginIdsForProvider slow-path replacement. Existing tests in providers.test.ts already cover both code paths (expectOwningPluginIds(\"claude-cli\", [\"anthropic\"]) exercises the cliBackends shape; dynamic-provider tests cover the providers shape). The remaining gap was test setup: the mock chain didn't mock loadPluginMetadataSnapshot, so the new auto-fetch fall-through was hitting real disk reads and returning empty registries — that's the 9 test failures the external reviewer flagged. Added a vi.doMock(\"./plugin-metadata-snapshot.js\", ...) that wraps the existing loadPluginManifestRegistryMock fixture. All 55 providers tests now pass.

F4 — eager discoverOpenClawPlugins in resolvePluginRuntimeLoadContext. Properly plumbed instead of removed.

The eager fetch was net-positive (saved ~34 downstream discoveries by short-circuiting the bundled-channel cascade), so just deleting it gave back a chunk of the headline freeze win. Fixed instead by capturing the discovery during snapshot building and reusing it at the consumer:

  • resolveInstalledPluginIndexRegistry already constructed the discovery internally; now returns it alongside registry and candidates.
  • New loadInstalledPluginIndexWithDiscovery sibling to loadInstalledPluginIndex — both share buildInstalledPluginIndex which captures discovery.
  • PluginRegistrySnapshotResult gets optional discovery; the derived branch forwards it.
  • PluginMetadataSnapshot type gets discovery?: PluginDiscoveryResult; loadPluginMetadataSnapshotImpl attaches it.
  • resolvePluginRuntimeLoadContext reads metadataSnapshot.discovery and passes it to applyPluginAutoEnable — no fresh scan.

Nothing is synthesized or derived; the discovery is the same object reference that was created during resolveInstalledPluginIndexRegistry, threaded back up through each layer's return type, and read at the consumer. When the snapshot is loaded from the persisted cache (no fresh discovery), metadataSnapshot.discovery is undefined and the downstream auto-enable chain handles its own discovery fallback — same as pre-PR behavior on that path.

Verification: 93/93 plugin tests pass, typecheck clean. Local cold-start freeze re-checked.

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 20, 2026
@RomneyDa RomneyDa force-pushed the dallin-perf-startup-trace branch from 406acb4 to 728b97d Compare May 21, 2026 22:28
@clawsweeper

clawsweeper Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

🦞👀
ClawSweeper assist is taking a look at your question.

I queued a lightweight read-only assist pass. It will post a separate answer comment and will not edit the durable ClawSweeper review comment or trigger close, merge, repair, label, or branch changes.

Request: update labels, provide explanation for any of the merge risk ones you added, or remove them

@clawsweeper

clawsweeper Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper assist: the current labels already reflect the requested update: the merge-risk labels were removed after the re-review of commit 406acb4c45e85b4e17bd6351e0a3b576a152e00b. The PR is currently labeled rating: 🐚 platinum hermit and status: 👀 ready for maintainer look.

Evidence:

  • merge-risk: 🚨 compatibility was removed at 2026-05-21T14:21:55Z.
  • merge-risk: 🚨 auth-provider was removed at 2026-05-21T14:22:00Z.
  • The rating was changed from rating: 🧂 unranked krab to rating: 🐚 platinum hermit at 2026-05-21T14:21:03Z / 14:21:06Z.
  • The status was changed from status: ⏳ waiting on author to status: 👀 ready for maintainer look at 2026-05-21T14:21:29Z / 14:21:42Z.
  • The risk labels were originally tied to provider owner matching: the PR had replaced the fallback owner path and initially missed plugin.setup?.cliBackends; commit 406acb4c45 added that case and a setup-only regression test.

Suggested next action: no label action is indicated from the supplied context; for a fresh correctness pass on head 728b97da7d281df048a6b52b343f101012f6e613, use @clawsweeper review.


Source: #84649 (comment)
Assist model: gpt-5.5, reasoning low.

@RomneyDa RomneyDa force-pushed the dallin-perf-startup-trace branch from 728b97d to bd58a70 Compare May 22, 2026 19:37
@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 22, 2026

@RomneyDa RomneyDa left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refreshed inline notes after rebasing past #85201 (Vincent's derived-snapshot freshness fix). Previous review comments were anchored to pre-rebase SHAs and have been replaced.

The PR now coexists with #85201: persisted snapshots get the cache-busting + LRU treatment, derived snapshots stay out of the memo per #85201's correctness rule.

Comment thread src/plugins/plugin-metadata-snapshot.ts Outdated
Comment thread src/plugins/plugin-metadata-snapshot.ts Outdated
Comment thread src/plugins/plugin-metadata-snapshot.ts
Comment thread src/plugins/plugin-metadata-snapshot.ts Outdated
Comment thread src/plugins/plugin-metadata-snapshot.ts Outdated
Comment thread src/plugins/providers.ts
Comment thread src/plugins/providers.ts
Comment thread src/plugins/providers.runtime.ts
Comment thread src/plugins/providers.runtime.ts
Comment thread src/plugins/provider-runtime.ts
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. labels May 22, 2026
@RomneyDa RomneyDa force-pushed the dallin-perf-startup-trace branch from 5a990ea to 3afed66 Compare May 22, 2026 21:36
@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 22, 2026
@RomneyDa

Copy link
Copy Markdown
Member Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 23, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

RomneyDa added 5 commits May 24, 2026 13:28
With the snapshot memo now actually hitting, route the snapshot's
manifestRegistry and discovery through the helper chains that already
had fast paths for them. Eliminates redundant per-call rebuilds at
two big amplifiers.

- Provider resolve paths (resolvePluginProviders /
  isPluginProvidersLoadInFlight / resolveOwningPluginIdsForProvider /
  resolveExternalAuthProfilesWithPlugins) self-service a snapshot once
  at the public entry, then thread it as a separate required arg
  through resolvePluginProviderLoadBase,
  resolveExplicitProviderOwnerPluginIds, and the setup/runtime load
  state helpers. Inner reads change from
  'params.pluginMetadataSnapshot?.x' to 'snapshot.x', no more
  enrichedParams clone. loadPluginManifestRegistryForInstalledIndex
  fires drop ~685 -> ~10 per cold start.

- Bundled-channel / auto-enable chain accepts an optional
  PluginDiscoveryResult. discoverOpenClawPlugins is fired once during
  snapshot building (resolveInstalledPluginIndexRegistry already
  produced it internally; now bubbled up through
  loadInstalledPluginIndexWithDiscovery, PluginRegistrySnapshotResult,
  and onto PluginMetadataSnapshot.discovery). load-context reads
  metadataSnapshot.discovery and passes it through
  applyPluginAutoEnable, so the bundled-channel cascade
  (collectConfiguredChannelIds, listBundledChannelIdsWith*,
  listPotentialConfiguredChannelPresenceSignals) short-circuits
  instead of each leaf re-firing discovery. Persisted-cache path is
  unchanged: no discovery on the snapshot, downstream chain handles
  its own fallback (pre-PR behavior on that path).
…registry

The snapshot memo is now process-scoped and effective (~98% hit rate).
Three test files were depending on cache misses (because the broken
cache returned them) — each test would set up its own
loadPluginManifestRegistry mock and expect a fresh derive. With the
cache fixed, an earlier test's mocked registry now leaks into later
tests in the same file.

- io.write-config.test.ts: afterEach now clears the snapshot memo so
  the 'demo' plugin mocked in the first test does not survive into
  'keeps shipped plugin install config records when index migration
  fails', which expects an empty registry to surface the 'plugin not
  found: demo' warning.

- gateway/model-pricing-cache.ts: resetGatewayModelPricingCacheForTest
  also clears the memo. Tests in model-pricing-cache.test.ts assert
  loadPluginManifestRegistryForInstalledIndex was called; the memo
  hit otherwise skips the call.

- providers.test.ts: vi.doMock loadPluginMetadataSnapshot to wrap the
  existing loadPluginManifestRegistryMock fixture. The plumbing
  commit added an auto-fetch fall-through in
  resolveOwningPluginIdsForProvider; without the mock, providers
  tests hit real disk reads and return empty registries (which is
  what surfaced as 9 unrelated-looking failures in the prior CI
  run).
resolveOwningPluginIdsForProvider now also checks plugin.setup?.cliBackends.
The pre-PR no-registry fallback used resolvePluginContributionOwners which
includes both top-level cliBackends and setup.cliBackends; the PR's manifest
scan replacement was missing the setup case.
…adata snapshot

isPluginProvidersLoadInFlight and resolvePluginProviders now resolve
env and workspaceDir once at the entry point (falling back to
getActivePluginRegistryWorkspaceDir) and pass them into both
loadPluginMetadataSnapshot and resolvePluginProviderLoadBase. Pre-fix
the snapshot used params.workspaceDir raw while the load base inherited
the active workspace, so workspace-scoped provider plugins could be
absent from the snapshot manifest registry even though owner resolution
expected them.

Regression test asserts the snapshot mock receives the active
workspaceDir when the caller omits it.
Every gateway applyPluginAutoEnable call now passes the snapshot's
PluginDiscoveryResult so the bundled-channel cascade (collectConfiguredChannelIds
→ listBundledChannelIdsWith* → listPotentialConfiguredChannelPresenceSignals)
short-circuits instead of each leaf re-firing discovery.

Startup-time sites pull discovery from the snapshot/lookup-table they already
hold:
- server-plugin-bootstrap.ts (pluginLookUpTable)
- server-startup-plugins.ts (pluginMetadataSnapshot)
- server-startup-config.ts (pluginMetadataSnapshot)
- server-plugins.ts (pluginLookUpTable, both call sites)

Per-RPC sites (server.impl getRuntimeConfig callback, server-methods/channels
status + start handlers, server-methods/send) source discovery via
getCurrentPluginMetadataSnapshot using the runtime config to validate
compatibility. Falls through to the original slow path when the snapshot is
absent or incompatible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gateway Gateway runtime maintainer Maintainer-authored PR merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. P2 Normal backlog priority with limited blast radius. proof: 📸 screenshot Contributor real behavior proof includes screenshot evidence. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: M status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants