Problem
When ConfigWatcherService detects a config file change, it immediately requests a daemon restart. Active sessions that are mid-turn (LLM call in flight, tool execution in progress) lose all in-flight state because their turns have not yet been committed to persistence.
Incident Evidence
Session D0AC6CKBK5K/1774021483.588239 (0.7.0) — see #321 for full timeline:
- Bot overwrote netclaw.json at 15:48:22.488
- ConfigWatcherService detected change at 15:48:22.990 (about 500 ms later)
- Daemon restarted at 15:48:24
- Session rehydrated with zero history — 4 minutes of conversation lost
Current Behavior
ConfigWatcherService detects change → validate config → immediate restart
All in-flight session state (tool call chains, buffered messages, transient skill context) is lost.
Proposed Behavior
ConfigWatcherService detects change → validate config → signal sessions to passivate → grace period → restart → relaunch active sessions → inject restart system message
- Signal active sessions to passivate: Send a message to all active session actors telling them to stop accepting new turns, complete or abort in-flight work, and save snapshots
- Grace period: Allow N seconds (configurable, e.g., 10-30s) for sessions to flush
- Track active sessions: Record which sessions were active pre-restart (session IDs, turn counts, subscriber info)
- Restart daemon
- Relaunch active sessions: After restart, automatically rehydrate previously-active sessions
- Inject system message: Add a system nudge such as:
  [system] The daemon restarted due to a configuration change. Your session state has been recovered from the last checkpoint.
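
The proposed sequence could be prototyped roughly as follows. This is a language-neutral Python asyncio sketch under stated assumptions, not Netclaw's actor code: `Session`, `drain_before_restart`, `relaunch`, the manifest fields, and the `snapshots` mapping are all hypothetical names introduced for illustration, and subscriber info is left out of the manifest for brevity.

```python
import asyncio
from dataclasses import dataclass, field

RESTART_NOTICE = (
    "[system] The daemon restarted due to a configuration change. "
    "Your session state has been recovered from the last checkpoint."
)


@dataclass
class Session:
    """Hypothetical stand-in for a session actor."""
    session_id: str
    flush_seconds: float  # simulated time the in-flight work needs to finish
    turn_count: int = 0
    accepting_turns: bool = True
    snapshot_saved: bool = False
    history: list = field(default_factory=list)

    async def passivate(self) -> None:
        self.accepting_turns = False             # stop accepting new turns
        await asyncio.sleep(self.flush_seconds)  # let in-flight work complete
        self.snapshot_saved = True               # then persist a snapshot


async def drain_before_restart(sessions, grace_seconds):
    """Signal every session to passivate and wait at most grace_seconds.

    Returns a manifest of the sessions that were active, so they can be
    relaunched after the restart.
    """
    manifest = [
        {"session_id": s.session_id, "turn_count": s.turn_count}
        for s in sessions
    ]
    tasks = [asyncio.create_task(s.passivate()) for s in sessions]
    if tasks:  # asyncio.wait rejects an empty collection
        _done, pending = await asyncio.wait(tasks, timeout=grace_seconds)
        for task in pending:
            task.cancel()  # grace period expired: abort the remaining work
        await asyncio.gather(*pending, return_exceptions=True)
    return manifest


def relaunch(manifest, snapshots):
    """Rehydrate each previously active session and append the restart notice.

    `snapshots` maps session_id to the last persisted history (assumed shape).
    """
    restored = []
    for entry in manifest:
        history = list(snapshots.get(entry["session_id"], []))
        history.append(RESTART_NOTICE)
        restored.append(
            Session(entry["session_id"], 0.0, entry["turn_count"], history=history)
        )
    return restored
```

In this sketch, a session that does not finish within the grace period is cancelled without a fresh snapshot, so it would resume from whatever checkpoint was last persisted. Whether that trade-off is acceptable, and what grace period to use, are the configurable parts of the proposal.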
Relevant Code
src/Netclaw.Daemon/Services/ConfigWatcherService.cs — config change detection and restart
src/Netclaw.Actors/Sessions/LlmSessionActor.cs:226-237 — idle timeout passivation (pattern to follow)
src/Netclaw.Actors/Sessions/LlmSessionActor.cs:1594-1603 — PreRestart handler