fix(gateway): debounce restart with single-flight queue#248

Merged
1186258278 merged 1 commit into main from fix/p0-gateway-throttle
Apr 24, 2026

Problem

Root cause for #243 / #244 / #240: when users edit models on the config page, api.restartGateway() is called with only a 300ms debounce. Fast consecutive edits stack up restart calls, which leads to zombie Gateway processes, failed restarts, and CPU fan spikes.

Fix — Layer A + B

Layer A (frontend, src/lib/gateway-restart-queue.js + src/pages/models.js)

  • New scheduleGatewayRestart with 3s debounce + single-flight lock
  • doAutoSave now writes config immediately but schedules restart via queue
  • User sees a single Apply now toast button instead of a restart storm
  • Event subscription pattern for unified success/failure UI
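
The Layer A behaviour described above (debounce, single-flight lock, reschedule when a request arrives mid-restart, and event subscription) could be sketched roughly as follows. This is an illustrative sketch, not the code in src/lib/gateway-restart-queue.js: only the name scheduleGatewayRestart comes from this PR, while createRestartQueue, applyNow, subscribe, the event names, and the injectable timer parameters are assumptions made for the example.

```javascript
// Hypothetical sketch of a debounced, single-flight restart queue.
// Only `scheduleGatewayRestart` is named in the PR; everything else is assumed.
function createRestartQueue({
  restart,                    // async function that performs the restart
  delayMs = 3000,             // debounce window (3s in the PR description)
  setTimer = setTimeout,      // injectable so the queue can be tested
  clearTimer = clearTimeout,
}) {
  let timer = null;           // pending debounce timer, if any
  let inFlight = false;       // single-flight lock
  let rerun = false;          // a request arrived while a restart was running
  const listeners = new Set();

  const emit = (event) => listeners.forEach((fn) => fn(event));

  async function run() {
    timer = null;
    if (inFlight) {
      rerun = true;           // never run two restarts at once; remember the request
      return;
    }
    inFlight = true;
    emit({ type: 'started' });
    try {
      await restart();
      emit({ type: 'succeeded' });
    } catch (error) {
      emit({ type: 'failed', error });
    } finally {
      inFlight = false;
      if (rerun) {
        rerun = false;
        scheduleGatewayRestart(); // pick up edits made during the restart
      }
    }
  }

  // Each call resets the debounce window, so a burst of edits yields one restart.
  function scheduleGatewayRestart() {
    if (timer !== null) clearTimer(timer);
    timer = setTimer(run, delayMs);
    emit({ type: 'queued' });
  }

  // Backs an "Apply now" button: skip the remaining debounce delay.
  function applyNow() {
    if (timer !== null) clearTimer(timer);
    return run();
  }

  // Returns an unsubscribe function.
  function subscribe(fn) {
    listeners.add(fn);
    return () => listeners.delete(fn);
  }

  return { scheduleGatewayRestart, applyNow, subscribe };
}
```

In this sketch a page would call scheduleGatewayRestart() after each config write and use subscribe() to drive a single toast from the queued, started, succeeded, and failed events, rather than reacting to every individual edit.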

Layer B (backend, config.rs + dev-api.js)

  • Wrap restart_gateway / reload_gateway with tokio::sync::Mutex (single-flight)
  • 2s cooldown prevents bypass attacks (programmatic request bursts)
  • Web mode (dev-api.js) mirrors the same guard using in-flight promise reuse
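
The Web-mode guard (in-flight promise reuse plus a 2s cooldown) might look something like the sketch below. It is a minimal illustration under stated assumptions, not the code in dev-api.js: createRestartGuard, the injectable now clock, and the { skipped: true } result shape are hypothetical, and the real guard may reject or delay during the cooldown instead of skipping.

```javascript
// Hypothetical sketch of a single-flight guard with a cooldown.
// Names and the "skipped" result shape are assumptions for illustration.
function createRestartGuard({ restart, cooldownMs = 2000, now = Date.now }) {
  let inFlight = null;             // promise of the restart currently running
  let lastFinishedAt = -Infinity;  // timestamp when the last restart settled

  return function guardedRestart() {
    // Concurrent callers share the restart that is already running.
    if (inFlight) return inFlight;

    // Calls that arrive too soon after the last restart are not executed.
    if (now() - lastFinishedAt < cooldownMs) {
      return Promise.resolve({ skipped: true, reason: 'cooldown' });
    }

    inFlight = new Promise((resolve) => resolve(restart()))
      .then((result) => ({ skipped: false, result }))
      .finally(() => {
        // Runs on success and on failure, so a failed restart also starts the cooldown.
        lastFinishedAt = now();
        inFlight = null;
      });
    return inFlight;
  };
}
```

Because the cooldown timestamp is recorded in finally, a failed restart also opens the idle gap, which is one way to avoid a retry storm when callers retry immediately.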

Config

  • Cargo.toml: add tokio[sync] feature
  • src/locales/modules/models.js: new i18n keys configQueued + applyNow

Effects

| Scenario | Before | After |
| --- | --- | --- |
| 10 rapid edits within 3s | 10+ restarts + races | 1 restart |
| Bypass throttle (concurrent requests) | Backend parallel spawn → zombies | Backend serializes + 2s cooldown |
| Failed restart retry storm | Cascading retries | Timestamp-guarded idle gap |

Verification

  • npm run build passes
  • cargo check passes
  • cargo fmt --check passes
  • cargo clippy -- -D warnings passes
  • Manual test on macOS / Windows / Linux (pending)

Out of scope (follow-up PRs)

Layers C / D / E from the design doc will land separately:

  • C: conservative Windows zombie cleanup with /health retries
  • D: Linux user-systemd service preference
  • E: listener PID as truth + descendant PID ownership check

Refs

Closes #243 (pending verification)
Fixes part of #244 (Hidden-start repeats — reduced via restart frequency cap)
Fixes #240 (auto-restart failures on Ubuntu — reduced restart pressure)

Root cause for #243 / #244 / #240: model edits trigger
api.restartGateway() with only 300ms debounce. Fast consecutive
edits stack up restart calls, creating zombie Gateway processes,
failed restarts, and CPU fan spikes.

Layer A (frontend):
- New src/lib/gateway-restart-queue.js: 3s debounce + single-flight
  lock + reschedule on in-flight request
- Refactor src/pages/models.js doAutoSave: write config immediately,
  schedule restart via queue with 'Apply now' toast button
- Subscribe to queue state for unified success/failure toast
- Add i18n: models.configQueued, models.applyNow

Layer B (backend):
- src-tauri/src/commands/config.rs: wrap restart_gateway /
  reload_gateway with tokio::sync::Mutex + 2s cooldown
- Cargo.toml: add tokio 'sync' feature
- scripts/dev-api.js: same guard for Web mode (inflight promise
  reuse + 2s cooldown)

Effects:
- 10 rapid edits within 3s -> 1 restart (was 10+ with races)
- Backend serializes concurrent restart calls, no zombie spawns
- User sees single 'Apply now' toast instead of restart storm

Refs #243 #244 #240
1186258278 merged commit 5235853 into main on Apr 24, 2026
3 checks passed
1186258278 deleted the fix/p0-gateway-throttle branch on April 24, 2026, 11:35