fix(control,cli): survive SSH drops with mid-turn autosave and SIGHUP save #3943

Merged: esengine merged 1 commit into main-v2 from fix/3772-ssh-session-loss on Jun 11, 2026

@esengine

Owner

Root cause (two gaps)

  1. No mid-turn persistence. The session was saved only by the deferred snapshotActivityIfChanged call at turn end. The SSH workflow is exactly the one that breaks this: kick off a long autonomous turn, then disconnect. The process dies mid-turn and everything since the previous turn is gone, which is why resume offered an old conversation.
  2. The chat TUI caught no signals. tea.NewProgram(m) ran bare; only run/serve had NotifyContext(os.Interrupt). An SSH drop delivers SIGHUP, which nothing handled, so the process died on the spot.

Fix

  • runGuarded now starts a per-turn autosave ticker: while a turn runs, the session snapshots every 30s. Session.Save copies under the session lock and lands via atomic tmp+rename, so racing the turn's appends can neither tear the read nor corrupt the file; a snapshot that ends on a dangling tool call is repaired by the existing resume-side pairing sanitizer. The worst case after a hard kill is now ~30s of one turn, not the whole turn. (The same mechanism also narrows the desktop force-quit data-loss window.)
  • The chat TUI registers SIGHUP/SIGTERM: persist the conversation immediately, then p.Quit() so the normal close path runs.
  • run cancels on SIGTERM/SIGHUP too (its in-flight turn then saves via the turn-end defer); serve adds SIGTERM, matching what its own comment already claimed.

The interval is a package var; the new test shrinks it and asserts that the session file exists while the turn is still blocked. The auto-resume-to-last-session ask from the thread is a separate UX feature and is not covered here.

Closes #3772

@esengine requested a review from SivanCola as a code owner on Jun 11, 2026, 04:23
@github-actions (bot) added the labels tui (Terminal UI / CLI: internal/cli, internal/control), agent (Core agent loop: internal/agent, internal/control), and v2 (Go rewrite (1.x), main-v2 branch, active development) on Jun 11, 2026
@esengine force-pushed the fix/3772-ssh-session-loss branch from 6b47fb9 to b34723a on Jun 11, 2026, 04:29
… save

The session only persisted at turn end, so killing the process during
a long agent turn lost the whole turn, and the chat TUI caught no
signal at all - an SSH disconnect (SIGHUP) died without saving, which
is why resume offered a stale conversation (#3772).

- snapshot every 30s while a turn runs (atomic tmp+rename write, reads
  copy under the session lock, so racing the turn is safe)
- chat TUI: SIGHUP/SIGTERM persist the conversation, then quit through
  the normal close path
- run/serve contexts also cancel on SIGTERM (and SIGHUP for run)

Closes #3772
@esengine force-pushed the fix/3772-ssh-session-loss branch from b34723a to f7bd990 on Jun 11, 2026, 04:35
@esengine merged commit 80fc50d into main-v2 on Jun 11, 2026
13 checks passed
@esengine deleted the fix/3772-ssh-session-loss branch on Jun 11, 2026, 04:39
Development

Successfully merging this pull request may close these issues.

[Feature]: After an SSH disconnect the session is lost; even resume does not restore the session that was interrupted mid-turn
