[Bug]: Gateway self-induced hot-reload loop corrupts manifest.db on Linux VPS #67436

@leythlahcene-max

Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Summary

On a Linux VPS running under systemd, the gateway auto-provisions plugins on boot (brave, anthropic, openai, memory-core) by writing to openclaw.json via atomic rename. That write triggers the gateway's own hot-reload watcher, which fires SIGUSR1 and restarts the gateway every 3-6 min. When a restart kills Manifest mid-SQLite write, manifest.db is corrupted ("database disk image is malformed").

Steps to reproduce

  1. Install OpenClaw 2026.4.14 via npm global on Ubuntu 24.04, systemd user service (root).
  2. Start openclaw-gateway.service.
  3. Run: inotifywait -m /root/.openclaw/openclaw.json
  4. Observe: within 60s, the gateway rewrites openclaw.json via a .tmp. atomic rename, adding plugins.allow entries (brave, anthropic, openai, memory-core) and plugins.entries.
  5. The gateway logs "config change detected (plugins.entries.brave.config.webSearch)" and schedules a hot-reload.
  6. SIGUSR1 fires → gateway drains → restarts.
  7. The sequence repeats from step 4 → infinite loop every 3-6 min.
  8. If Manifest (TypeORM/SQLite) is mid-write when SIGUSR1 arrives, manifest.db is corrupted.

Expected behavior

Gateway should not trigger hot-reload on self-induced config writes. The auto-provisioning of plugins (internal startup behavior) should be excluded from the hot-reload watcher, or writes should be fingerprinted to distinguish self-writes from user-writes.
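To illustrate the fingerprinting idea, here is a minimal sketch in Python (not OpenClaw's actual code). It assumes the writer and the watcher live in the same process and can share state; `SelfWriteTracker`, `write_config`, and `should_reload` are hypothetical names. The writer records a SHA-256 digest of the bytes it is about to rename into place, and the watcher skips the reload when the file on disk still matches that digest.

```python
import hashlib
import json
import os
import tempfile
from typing import Optional


class SelfWriteTracker:
    """Hypothetical helper: remembers the digest of the last config this process wrote."""

    def __init__(self) -> None:
        self._last_own_digest: Optional[str] = None

    @staticmethod
    def _digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def write_config(self, path: str, config: dict) -> None:
        """Write the config via temp file + atomic rename and record its fingerprint."""
        data = json.dumps(config, indent=2, sort_keys=True).encode("utf-8")
        self._last_own_digest = self._digest(data)
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # atomic on POSIX when both paths share a filesystem

    def should_reload(self, path: str) -> bool:
        """Called from the file watcher: True only if the content is not our own write."""
        with open(path, "rb") as f:
            return self._digest(f.read()) != self._last_own_digest
```

With this in place, a watcher event caused by the gateway's own provisioning write would return False from `should_reload`, while a manual edit to openclaw.json would still return True. Comparing content rather than timestamps also tolerates several watcher events firing for a single rename.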

Actual behavior

Gateway restarts every 3-6 min via SIGUSR1, triggered by its own openclaw.json writes. Confirmed by inotifywait: the file is written by the gateway PID itself (.tmp. rename). The log shows "[reload] config change detected (plugins.entries.brave.config.webSearch)" → SIGUSR1 → drain → restart. After 3-4 cycles, manifest.db reports "database disk image is malformed": the TypeORM/SQLite database is corrupted when SIGUSR1 interrupts a write.
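Separately from fixing the loop, the corruption itself could in principle be avoided if the drain step waited for in-flight database writes before restarting. The sketch below is an assumption about how such a gate might look, not a description of how the gateway or Manifest drains today; `DrainGate` and its methods are hypothetical names. Writers wrap each write in `gate.write()`, the SIGUSR1 handler would call `request_restart()`, and the restart path blocks in `wait_until_drained()`.

```python
import threading
from contextlib import contextmanager
from typing import Optional


class DrainGate:
    """Hypothetical gate: refuses new writes once a restart is requested
    and lets the restart path wait until running writes have finished."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self._restart_requested = False

    @contextmanager
    def write(self):
        """Wrap one database write; raises if a restart is already pending."""
        with self._cond:
            if self._restart_requested:
                raise RuntimeError("restart pending, write rejected")
            self._in_flight += 1
        try:
            yield
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()

    def request_restart(self) -> None:
        """Mark a restart as pending (e.g. from a SIGUSR1 handler)."""
        with self._cond:
            self._restart_requested = True

    def wait_until_drained(self, timeout: Optional[float] = None) -> bool:
        """Block until no writes are in flight; False if the timeout expires first."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

This only helps if the process doing the SQLite writes is the one handling the signal, or is told about the restart before it is killed. Whether that holds for Manifest is not established in this report.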

OpenClaw version

2026.4.14 (323493f)

Operating system

Ubuntu 24.04.4 LTS (VPS, systemd user service, running as root)

Install method

npm global

Model

manifest/auto (routes to openai-subscription GPT-5.4)

Provider / routing chain

openclaw → Manifest proxy (LiteLLM, port 2099) → openai-subscription (ChatGPT Plus OAuth)

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

Affected: All Linux VPS users with systemd service + Manifest plugin enabled
Severity: High — blocks agents from responding (Discord silent), corrupts manifest.db
Frequency: Always — reproduces on every clean start within 3-6 min
Consequence: manifest.db corrupted every few hours → full service outage until DB deleted and recreated manually

Additional information

Observed consistently on v2026.3.24, v2026.3.28, and v2026.4.14 — bug predates this report.

Root cause confirmed via inotifywait: the .tmp. suffix on the temporary file directly identifies the gateway process as the writer. Two consecutive writes observed 18s apart, then SIGUSR1 fires.

Diff between consecutive openclaw.json snapshots shows the gateway adds:

  • plugins.allow: ["acpx"] → ["acpx", "anthropic", "openai", "brave", "memory-core"]
  • plugins.entries.anthropic: {enabled: true}
  • plugins.entries.openai: {enabled: true}
  • plugins.entries.brave: {enabled: true, config: {...webSearch...}}
  • plugins.entries.memory-core: {enabled: true}
  • meta.lastTouchedAt: bumped
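The kind of snapshot comparison listed above can be sketched as follows. This is an illustrative Python helper, not the gateway's reload logic: `strip_volatile` and `changed_paths` are hypothetical names, and treating meta.lastTouchedAt as a field that should never trigger a reload on its own is an assumption.

```python
import copy

# Assumption: bookkeeping fields that should not count as a config change.
VOLATILE_PATHS = [("meta", "lastTouchedAt")]


def strip_volatile(config: dict) -> dict:
    """Return a deep copy of the config with volatile fields removed."""
    cleaned = copy.deepcopy(config)
    for path in VOLATILE_PATHS:
        node = cleaned
        for key in path[:-1]:
            node = node.get(key)
            if not isinstance(node, dict):
                break  # parent is missing, nothing to strip
        else:
            node.pop(path[-1], None)
    return cleaned


def changed_paths(old, new, prefix=""):
    """Return the dotted paths whose values differ between two snapshots."""
    if isinstance(old, dict) and isinstance(new, dict):
        paths = []
        for key in sorted(set(old) | set(new)):
            child = f"{prefix}.{key}" if prefix else key
            if key not in old or key not in new:
                paths.append(child)  # key added or removed
            else:
                paths.extend(changed_paths(old[key], new[key], child))
        return paths
    return [] if old == new else [prefix]
```

Running `changed_paths(strip_volatile(before), strip_volatile(after))` on two snapshots gives the list of paths that actually changed, which makes it easy to see whether a reload was caused by plugin provisioning or only by a timestamp bump.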

Workaround: enabling WAL mode on manifest.db (PRAGMA journal_mode=WAL) reduces corruption frequency but does not stop the reload loop. Schema migration writes (Discord streaming format) from v2026.4.14 appear to add extra writes on first boot after update.
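For anyone applying the WAL workaround, a small Python sketch using the standard sqlite3 module is below. The function names are hypothetical and the path to manifest.db is left to the caller, since its location is not stated in this report. `enable_wal` switches the journal mode (the setting persists in the database file), and `integrity_ok` runs PRAGMA integrity_check to detect a malformed database. Stop the gateway before running either against a live database.

```python
import sqlite3


def enable_wal(db_path: str) -> str:
    """Switch the database to WAL journaling and return the resulting mode."""
    conn = sqlite3.connect(db_path)
    try:
        # The PRAGMA returns the journal mode now in effect as a single row.
        return conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    finally:
        conn.close()


def integrity_ok(db_path: str) -> bool:
    """Return True if PRAGMA integrity_check reports a healthy database."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError:
        # Raised for errors such as "database disk image is malformed".
        return False
    finally:
        conn.close()
    return rows == [("ok",)]
```

Note that WAL mode creates -wal and -shm files next to the database, so any backup or "delete and recreate" procedure should handle those files together with manifest.db.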
