Skip to content

fix: singleton DB init per dbPath + fallback provider config#302

Merged
jalehman merged 2 commits intoMartian-Engineering:mainfrom
electricsheephq:fix/singleton-db-init-fallback-providers
Apr 6, 2026
Merged

fix: singleton DB init per dbPath + fallback provider config#302
jalehman merged 2 commits intoMartian-Engineering:mainfrom
electricsheephq:fix/singleton-db-init-fallback-providers

Conversation

@100yenadmin
Copy link
Copy Markdown
Contributor

@100yenadmin 100yenadmin commented Apr 6, 2026

Summary

Prevents "Migration failed: database is locked" storms caused by OpenClaw v2026.4.5+ calling plugin register() per-agent-context (main, subagents, cron lanes). Also adds explicit fallback provider configuration for compaction summarization.

Follow-up to #288 — the deferred-init fix merged in #288 handles the two-process launchd race, but does not prevent the per-context re-registration that opens multiple DB connections and runs migrations concurrently within the same process.

Problem

On OpenClaw v2026.4.5, every subagent spawn calls register() again. Each call:

  1. Opens a new DatabaseSync connection to the same lcm.db
  2. Runs runLcmMigrations() in the LcmContextEngine constructor
  3. Multiple concurrent migrations on a 1.9GB DB → "database is locked" errors
  4. Engine runs degraded (ownsCompaction: false)

Production logs show 478 re-registrations in a single gateway session, with repeated migration lock failures.

Solution

Part 1: Singleton DB + engine per dbPath

Uses globalThis + Symbol.for() singleton (same pattern as startup-banner-log.ts) keyed on normalized dbPath. When register() is called again with the same DB path, it skips init and wires handlers via wirePluginHandlers().

The shared state stores the waitForEngine/waitForDatabase closures (not mutable copies of locals), avoiding stale-reference bugs after deferred init resolves.

Part 2: Fallback provider config

Explicit fallbackProviders config for compaction summarization with backoff and degradation logging.

Changes by file

File Change
src/plugin/shared-init.ts New. Process-global singleton map via Symbol.for(). Stores waitForEngine/waitForDatabase closures and stopped flag per normalized dbPath.
src/plugin/index.ts Singleton check at top of register() — reuse existing init or create fresh. Extract wirePluginHandlers() for handler/tool/engine wiring. gateway_stop clears singleton. Fallback providers startup banner.
src/db/connection.ts Export normalizePath() for singleton key normalization.
src/db/config.ts Add fallbackProviders: Array<{provider, model}> field. Parse from env LCM_FALLBACK_PROVIDERS (format: provider/model,provider/model) and plugin config array.
openclaw.plugin.json Add fallbackProviders to configSchema (required for additionalProperties: false) and uiHints.
src/summarize.ts Append fallback providers to candidate chain. Exponential backoff (500ms→8s) between retries. [lcm] PROVIDER FALLBACK: and ALL PROVIDERS EXHAUSTED messages on stderr.
src/engine.ts Half-threshold early warning for circuit breaker. CIRCUIT BREAKER OPEN with cooldown time.
src/startup-banner-log.ts Add fallback-providers banner key.
test/plugin-config-registration.test.ts Tests: singleton reuse (no new DB connection on 2nd register), fresh connection after gateway_stop. clearAllSharedInit() in afterEach.
test/plugin-prompt-hook.test.ts clearAllSharedInit() in afterEach.
test/summarize.test.ts Update auth-fallback test to match new PROVIDER FALLBACK message format (console.error instead of console.warn).

Adversarial audit findings (addressed)

Finding Severity Resolution
openclaw.plugin.json missing fallbackProviders in configSchema — additionalProperties: false would strip the field CRITICAL Added to configSchema with JSON Schema validation
Stale shared.lcm reference after deferred init resolves (from earlier version) CRITICAL Fixed by storing closures, not mutable copies
console.warnconsole.error migration incomplete (some paths still warn) LOW Intentional — only fallback/exhaustion paths upgraded, diagnostic paths stay warn
Test mocks missing fallbackProviders field LOW Defensive ?? [] in summarize.ts handles undefined safely; test uses type cast

Test plan

  • 298 tests passing (294 non-registration + 4 registration including 2 new singleton tests)
  • Pre-existing plugin-config-registration timeouts match main (1.9GB prod DB, not introduced by this PR)
  • Singleton test: 2nd register() with same dbPath does NOT call createLcmDatabaseConnection
  • Post-stop test: after gateway_stop, next register() opens a fresh connection
  • Adversarial audit: configSchema validated, stale-reference bug prevented

@100yenadmin
Copy link
Copy Markdown
Contributor Author

100yenadmin commented Apr 6, 2026

Cause is upstream changes in latest release around launch agent + registering plugins @jalehman we found the same bug in numerous plugins being fired continuously for every session in memory.

Patch is two fold designed to prevent DB lockup and prevent provider call storms from LCM.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the Lossless Claw (LCM) OpenClaw plugin against repeated register() calls within the same process by introducing a per-dbPath singleton init, preventing concurrent SQLite migrations/connection storms. It also adds explicit fallback provider configuration for compaction summarization, with improved fallback logging/backoff and circuit-breaker messaging.

Changes:

  • Add a process-global singleton map keyed by normalized dbPath to reuse a single DB connection + engine across repeated register() calls.
  • Introduce fallbackProviders configuration (env + plugin config) and extend summarization to try explicit fallback provider/model pairs with exponential backoff and clearer stderr logging.
  • Update circuit-breaker logs and add/adjust tests to cover singleton reuse and new fallback messaging.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/plugin/shared-init.ts New globalThis + Symbol.for() singleton store for shared init closures per normalized DB path.
src/plugin/index.ts Reuse existing shared init on repeated register(); extract wirePluginHandlers(); clear singleton on gateway_stop; log fallback providers on startup.
src/db/connection.ts Export normalizePath() for consistent singleton keying.
src/db/config.ts Add fallbackProviders to config and parse from LCM_FALLBACK_PROVIDERS / plugin config.
src/summarize.ts Append configured fallback providers into candidate resolution; add provider-fallback stderr logs and exponential backoff; log when all providers are exhausted.
src/engine.ts Add half-threshold early warning and revise circuit-breaker “OPEN” logging.
src/startup-banner-log.ts Add new startup banner key for fallback-providers.
test/plugin-config-registration.test.ts Add tests ensuring singleton reuse across repeated register() and fresh init after gateway_stop; clear shared init in teardown.
test/plugin-prompt-hook.test.ts Clear shared init in teardown to avoid cross-test leakage.
test/summarize.test.ts Update auth-fallback assertion to match new console.error “PROVIDER FALLBACK” messaging.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/db/config.ts
@100yenadmin 100yenadmin force-pushed the fix/singleton-db-init-fallback-providers branch from 1b8f90e to d14af62 Compare April 6, 2026 21:02
@100yenadmin 100yenadmin requested a review from Copilot April 6, 2026 21:07
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/db/config.ts Outdated
Comment thread src/db/config.ts Outdated
Comment thread src/plugin/index.ts Outdated
Comment thread src/summarize.ts
## Problem

OpenClaw v2026.4.5+ calls plugin register() per-agent-context (main,
subagents, cron lanes) — not once at startup. Each call opens a new
DB connection and runs migrations, causing "Migration failed: database
is locked" storms on large databases. PR Martian-Engineering#288's deferred-init fix
was merged but does not address this per-context re-registration.

## Solution

### Singleton DB + engine (critical fix)

Uses globalThis + Symbol.for() singleton (same pattern as
startup-banner-log.ts) keyed on normalized dbPath. When register()
is called again with the same DB path, it skips init entirely and
wires handlers to the existing waitForEngine/waitForDatabase closures
via wirePluginHandlers(). gateway_stop clears the singleton so a
fresh init occurs on restart.

The shared state stores only the closures (not mutable copies of
database/lcm locals), avoiding stale-reference bugs.

### Fallback provider config (additive)

- Add fallbackProviders config field (env: LCM_FALLBACK_PROVIDERS,
  format: provider/model,provider/model) for explicit compaction
  summarization fallbacks
- Append to existing 5-level candidate chain with dedup
- Exponential backoff (500ms→8s) between candidate retries
- PROVIDER FALLBACK / ALL PROVIDERS EXHAUSTED messages on stderr
- Half-threshold early warning and CIRCUIT BREAKER OPEN/CLOSED
  messages with cooldown time
- Startup banner for configured fallback providers
@100yenadmin 100yenadmin force-pushed the fix/singleton-db-init-fallback-providers branch from d14af62 to 4e923c8 Compare April 6, 2026 21:24
@100yenadmin 100yenadmin requested a review from Copilot April 6, 2026 21:25
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/summarize.ts:1496

  • The new "ALL PROVIDERS EXHAUSTED" block is effectively unreachable because the loop returns/throws inside the per-candidate try/catch on the final candidate. Also, despite the log claiming compaction is falling back to deterministic truncation, this path currently returns an empty string (or throws) rather than producing the deterministic fallback summary. Please refactor the control flow so exhaustion is actually detected and the function returns the deterministic truncation fallback (and/or emits this log) when all candidates fail.

    console.error(
      `[lcm] ALL PROVIDERS EXHAUSTED: ${resolvedCandidates.length} candidate(s) tried, none succeeded. Compaction falling back to deterministic truncation. Check provider keys and quotas.`,
    );
    if (lastAuthError) {
      throw lastAuthError;
    }
    return "";

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/engine.ts
Comment on lines +1346 to 1349
const cooldownMin = Math.round(this.config.circuitBreakerCooldownMs / 60000);
console.error(
`[lcm] compaction circuit breaker OPEN: ${state.failures} consecutive auth failures for ${key}. Compaction halted. Will auto-retry after ${Math.round(this.config.circuitBreakerCooldownMs / 60000)}m or gateway restart.`,
`[lcm] CIRCUIT BREAKER OPEN: compaction disabled for ${key}. Auto-retry in ${cooldownMin}m. LCM is operating in degraded mode.`,
);
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cooldownMin is computed with Math.round(circuitBreakerCooldownMs / 60000), which can log Auto-retry in 0m for cooldowns under 30s (including tests/overrides). Consider using Math.ceil (or logging seconds when <60s) so the message never reports a zero-minute cooldown when the breaker is actually open.

Copilot uses AI. Check for mistakes.
Route terminal non-auth provider failures through the shared exhaustion handler so deterministic truncation actually runs, add regression coverage, and include a changeset for the runtime behavior fix.

Regeneration-Prompt: |
  Address the PR review finding in the multi-provider summarizer fallback path. The existing code added an ALL PROVIDERS EXHAUSTED log after the candidate loop, but the loop always returned, continued, or threw before that block could execute. Preserve existing auth-failure behavior because LcmProviderAuthError is used intentionally by compaction and the circuit breaker, but make terminal non-auth failures fall through to one shared exhaustion path that logs clearly and returns buildDeterministicFallbackSummary instead of an empty string. Add a focused regression test that exhausts all resolved non-auth candidates and proves both the terminal log and deterministic fallback behavior. Add a patch changeset because this changes runtime behavior and logging for plugin summarization fallback.
@jalehman jalehman merged commit 558183d into Martian-Engineering:main Apr 6, 2026
1 check passed
@jalehman
Copy link
Copy Markdown
Contributor

jalehman commented Apr 6, 2026

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants