fix: singleton DB init per dbPath + fallback provider config#302
Conversation
|
Cause is upstream changes in latest release around launch agent + registering plugins @jalehman we found the same bug in numerous plugins being fired continuously for every session in memory. Patch is two fold designed to prevent DB lockup and prevent provider call storms from LCM. |
There was a problem hiding this comment.
Pull request overview
This PR hardens the Lossless Claw (LCM) OpenClaw plugin against repeated register() calls within the same process by introducing a per-dbPath singleton init, preventing concurrent SQLite migrations/connection storms. It also adds explicit fallback provider configuration for compaction summarization, with improved fallback logging/backoff and circuit-breaker messaging.
Changes:
- Add a process-global singleton map keyed by normalized
dbPathto reuse a single DB connection + engine across repeatedregister()calls. - Introduce
fallbackProvidersconfiguration (env + plugin config) and extend summarization to try explicit fallback provider/model pairs with exponential backoff and clearer stderr logging. - Update circuit-breaker logs and add/adjust tests to cover singleton reuse and new fallback messaging.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/plugin/shared-init.ts | New globalThis + Symbol.for() singleton store for shared init closures per normalized DB path. |
| src/plugin/index.ts | Reuse existing shared init on repeated register(); extract wirePluginHandlers(); clear singleton on gateway_stop; log fallback providers on startup. |
| src/db/connection.ts | Export normalizePath() for consistent singleton keying. |
| src/db/config.ts | Add fallbackProviders to config and parse from LCM_FALLBACK_PROVIDERS / plugin config. |
| src/summarize.ts | Append configured fallback providers into candidate resolution; add provider-fallback stderr logs and exponential backoff; log when all providers are exhausted. |
| src/engine.ts | Add half-threshold early warning and revise circuit-breaker “OPEN” logging. |
| src/startup-banner-log.ts | Add new startup banner key for fallback-providers. |
| test/plugin-config-registration.test.ts | Add tests ensuring singleton reuse across repeated register() and fresh init after gateway_stop; clear shared init in teardown. |
| test/plugin-prompt-hook.test.ts | Clear shared init in teardown to avoid cross-test leakage. |
| test/summarize.test.ts | Update auth-fallback assertion to match new console.error “PROVIDER FALLBACK” messaging. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
1b8f90e to
d14af62
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
## Problem OpenClaw v2026.4.5+ calls plugin register() per-agent-context (main, subagents, cron lanes) — not once at startup. Each call opens a new DB connection and runs migrations, causing "Migration failed: database is locked" storms on large databases. PR Martian-Engineering#288's deferred-init fix was merged but does not address this per-context re-registration. ## Solution ### Singleton DB + engine (critical fix) Uses globalThis + Symbol.for() singleton (same pattern as startup-banner-log.ts) keyed on normalized dbPath. When register() is called again with the same DB path, it skips init entirely and wires handlers to the existing waitForEngine/waitForDatabase closures via wirePluginHandlers(). gateway_stop clears the singleton so a fresh init occurs on restart. The shared state stores only the closures (not mutable copies of database/lcm locals), avoiding stale-reference bugs. ### Fallback provider config (additive) - Add fallbackProviders config field (env: LCM_FALLBACK_PROVIDERS, format: provider/model,provider/model) for explicit compaction summarization fallbacks - Append to existing 5-level candidate chain with dedup - Exponential backoff (500ms→8s) between candidate retries - PROVIDER FALLBACK / ALL PROVIDERS EXHAUSTED messages on stderr - Half-threshold early warning and CIRCUIT BREAKER OPEN/CLOSED messages with cooldown time - Startup banner for configured fallback providers
d14af62 to
4e923c8
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/summarize.ts:1496
- The new "ALL PROVIDERS EXHAUSTED" block is effectively unreachable because the loop returns/throws inside the per-candidate try/catch on the final candidate. Also, despite the log claiming compaction is falling back to deterministic truncation, this path currently returns an empty string (or throws) rather than producing the deterministic fallback summary. Please refactor the control flow so exhaustion is actually detected and the function returns the deterministic truncation fallback (and/or emits this log) when all candidates fail.
console.error(
`[lcm] ALL PROVIDERS EXHAUSTED: ${resolvedCandidates.length} candidate(s) tried, none succeeded. Compaction falling back to deterministic truncation. Check provider keys and quotas.`,
);
if (lastAuthError) {
throw lastAuthError;
}
return "";
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const cooldownMin = Math.round(this.config.circuitBreakerCooldownMs / 60000); | ||
| console.error( | ||
| `[lcm] compaction circuit breaker OPEN: ${state.failures} consecutive auth failures for ${key}. Compaction halted. Will auto-retry after ${Math.round(this.config.circuitBreakerCooldownMs / 60000)}m or gateway restart.`, | ||
| `[lcm] CIRCUIT BREAKER OPEN: compaction disabled for ${key}. Auto-retry in ${cooldownMin}m. LCM is operating in degraded mode.`, | ||
| ); |
There was a problem hiding this comment.
cooldownMin is computed with Math.round(circuitBreakerCooldownMs / 60000), which can log Auto-retry in 0m for cooldowns under 30s (including tests/overrides). Consider using Math.ceil (or logging seconds when <60s) so the message never reports a zero-minute cooldown when the breaker is actually open.
Route terminal non-auth provider failures through the shared exhaustion handler so deterministic truncation actually runs, add regression coverage, and include a changeset for the runtime behavior fix. Regeneration-Prompt: | Address the PR review finding in the multi-provider summarizer fallback path. The existing code added an ALL PROVIDERS EXHAUSTED log after the candidate loop, but the loop always returned, continued, or threw before that block could execute. Preserve existing auth-failure behavior because LcmProviderAuthError is used intentionally by compaction and the circuit breaker, but make terminal non-auth failures fall through to one shared exhaustion path that logs clearly and returns buildDeterministicFallbackSummary instead of an empty string. Add a focused regression test that exhausts all resolved non-auth candidates and proves both the terminal log and deterministic fallback behavior. Add a patch changeset because this changes runtime behavior and logging for plugin summarization fallback.
|
Thank you! |
Summary
Prevents "Migration failed: database is locked" storms caused by OpenClaw v2026.4.5+ calling plugin
register()per-agent-context (main, subagents, cron lanes). Also adds explicit fallback provider configuration for compaction summarization.Follow-up to #288 — the deferred-init fix merged in #288 handles the two-process launchd race, but does not prevent the per-context re-registration that opens multiple DB connections and runs migrations concurrently within the same process.
Problem
On OpenClaw v2026.4.5, every subagent spawn calls
register()again. Each call:DatabaseSyncconnection to the samelcm.dbrunLcmMigrations()in theLcmContextEngineconstructorownsCompaction: false)Production logs show 478 re-registrations in a single gateway session, with repeated migration lock failures.
Solution
Part 1: Singleton DB + engine per dbPath
Uses
globalThis+Symbol.for()singleton (same pattern asstartup-banner-log.ts) keyed on normalizeddbPath. Whenregister()is called again with the same DB path, it skips init and wires handlers viawirePluginHandlers().The shared state stores the
waitForEngine/waitForDatabaseclosures (not mutable copies of locals), avoiding stale-reference bugs after deferred init resolves.Part 2: Fallback provider config
Explicit
fallbackProvidersconfig for compaction summarization with backoff and degradation logging.Changes by file
src/plugin/shared-init.tsSymbol.for(). StoreswaitForEngine/waitForDatabaseclosures andstoppedflag per normalized dbPath.src/plugin/index.tsregister()— reuse existing init or create fresh. ExtractwirePluginHandlers()for handler/tool/engine wiring.gateway_stopclears singleton. Fallback providers startup banner.src/db/connection.tsnormalizePath()for singleton key normalization.src/db/config.tsfallbackProviders: Array<{provider, model}>field. Parse from envLCM_FALLBACK_PROVIDERS(format:provider/model,provider/model) and plugin config array.openclaw.plugin.jsonfallbackProvidersto configSchema (required foradditionalProperties: false) and uiHints.src/summarize.ts[lcm] PROVIDER FALLBACK:andALL PROVIDERS EXHAUSTEDmessages on stderr.src/engine.tsCIRCUIT BREAKER OPENwith cooldown time.src/startup-banner-log.tsfallback-providersbanner key.test/plugin-config-registration.test.tsclearAllSharedInit()in afterEach.test/plugin-prompt-hook.test.tsclearAllSharedInit()in afterEach.test/summarize.test.tsPROVIDER FALLBACKmessage format (console.error instead of console.warn).Adversarial audit findings (addressed)
openclaw.plugin.jsonmissingfallbackProvidersin configSchema —additionalProperties: falsewould strip the fieldshared.lcmreference after deferred init resolves (from earlier version)console.warn→console.errormigration incomplete (some paths still warn)fallbackProvidersfield?? []in summarize.ts handles undefined safely; test uses type castTest plan
register()with same dbPath does NOT callcreateLcmDatabaseConnectiongateway_stop, nextregister()opens a fresh connection