Background jobs as detached processes: live log streaming, no default kill timer, reap on passivation by Aaronontheweb · Pull Request #1405 · netclaw-dev/netclaw

Aaronontheweb · 2026-06-12T20:53:00Z

Why

A production session hung trying to launch a Jekyll dev server to validate website changes: foreground shell_execute blocks until process exit, and background jobs blocked on EOF + WaitForExitAsync before reporting — so a process that runs indefinitely (dev server, watcher) could never be used by the agent. Investigation surfaced three more defects: a silent default kill timer on background routing (un-hinted jobs died early, and _timeout_seconds: 0 was normalized away — an un-timered job was unreachable by any input), the output log written only at process exit (the documented file_read/grep monitoring path was dead mid-run), and Lost jobs being silent on daemon restart.

OpenSpec change: background-jobs-detached-process-redesign (proposal, design, spec deltas for background-job-execution, tool-call-metadata, session-resume).

What changed

A background job is now a detached process with no expectation of completion. One unified path — no "server mode", no new tool schema.

Streaming output: new JobOutputLog pumps stdout/stderr to ~/.netclaw/jobs/{id}/output.log while the process runs — per-line secret redaction, 5 MB single-slot rotation (output.log + output.1.log), eager file creation so the path in the submit ACK is readable from t0. check_background_job tail queries switch to bounded seek-from-end. Output survives daemon crashes.
No default kill timer (BREAKING): omitted _timeout_seconds = no timer; a positive hint arms one. Protocol defaults (600s) removed from StartBackgroundJob/BackgroundJobDefinition.
Reap on passivation (BREAKING): passivating sessions send KillJobsForSession and await the ack before the final snapshot. New Reaped status (distinct from Cancelled); reaped jobs produce no delivery turn (would rehydrate the session being torn down) — they surface exactly once as status: reaped in [active-background-jobs] on next rehydration, then prune after the next completed turn. Handshake is bounded (5s) and fails loud-but-proceeds; kill is idempotent.
Latent drift fixed: SessionState.TrackBackgroundJob had no production caller — jobs were never tracked, so the active-jobs context block was always empty. Started jobs now flow back via ToolCallResult/ToolExecutionCompleted and the live turn-completion paths do the same bookkeeping as event replay (a live-path prune bug the integration test caught).
Lost notifications: reconcile delivers the standard termination turn (status, pre-crash log path) to owning sessions. Bounded by design — passivated sessions have no live jobs.
Approval deferral removed: sessions passivate with journaled approvals outstanding; the approval response rehydrates and re-drives the parked batch (existing session-resume machinery, now exercised routinely).
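
The streaming-output item above can be illustrated with a small sketch. This is not the PR's JobOutputLog (which is C# and not shown here); it is a Python approximation assuming a single hypothetical redaction pattern (`SECRET`) and the file names and 5 MB cap stated above. It shows eager file creation, per-line redaction, a per-line flush, and single-slot rotation.

```python
import os
import re
import tempfile

# Hypothetical redaction rule; the real redactor's patterns are not shown in this PR.
SECRET = re.compile(r"(?i)\b(token|password|api[_-]?key)=\S+")


class JobOutputLog:
    """Append-only job log with per-line redaction and single-slot rotation."""

    def __init__(self, directory, max_bytes=5 * 1024 * 1024):
        self.path = os.path.join(directory, "output.log")
        self.rotated = os.path.join(directory, "output.1.log")
        self.max_bytes = max_bytes
        os.makedirs(directory, exist_ok=True)
        # Eager creation: the file exists before the first line is written,
        # so a path handed out at submit time is readable immediately.
        self._file = open(self.path, "a", encoding="utf-8")
        self._size = os.path.getsize(self.path)

    def write_line(self, line):
        redacted = SECRET.sub(lambda m: m.group(1) + "=[REDACTED]", line)
        data = redacted.rstrip("\n") + "\n"
        encoded_len = len(data.encode("utf-8"))
        if self._size > 0 and self._size + encoded_len > self.max_bytes:
            self._rotate()
        self._file.write(data)
        self._file.flush()  # flush per line so readers see output mid-run
        self._size += encoded_len

    def _rotate(self):
        # Single slot: the previous output.1.log (if any) is overwritten.
        self._file.close()
        os.replace(self.path, self.rotated)
        self._file = open(self.path, "a", encoding="utf-8")
        self._size = 0

    def close(self):
        self._file.close()


# Usage: a tiny cap makes rotation easy to observe.
demo = JobOutputLog(tempfile.mkdtemp(), max_bytes=64)
demo.write_line("token=abc123 loaded")
demo.close()
```

Because every line is flushed as it is written, a reader that opens the file mid-run sees everything emitted so far, which is the property the file_read/grep monitoring path depends on.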

Docs / skills / evals

docs/runbooks/background-jobs.md rewritten for the new lifecycle (incl. termination table).
AGENTS.md identity template + netclaw-operations SKILL.md (v2.13.0) updated — no tool-schema changes, nothing new for the model to learn.
New eval regression case: background job submitted → monitored → cancelled (tool_background_job_lifecycle).

Testing

2406 tests green (Netclaw.Actors.Tests), including new coverage: JobOutputLogTests (streaming, rotation, redaction, write-failure drain), execution-actor mid-run observability, manager reap/no-delivery/Lost-notification tests, SessionState reap round-trip + prune, end-to-end reap-on-passivation handshake (incl. ack-timeout path), passivate-with-pending-approval resume.
dotnet slopwatch analyze: 0 issues. File headers verified.
⚠️ Eval suite run (./evals/run-evals.sh) still pending — the first run was invalidated by an environment hiccup; will re-run and report results on this PR before marking ready for review.
⚠️ Known flake to watch in CI: two process-spawning Jobs tests each failed once under cold parallel-suite load (fork pressure); a pre-reap Running guard was added for clearer diagnostics.

Deliberate trade-off

Jobs no longer outlive passivation — the long-build-while-idle use case is consciously traded away (product decision). The skill documents alternatives: check-back reminders keep the session warm; scheduled tasks for truly detached work.
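
What "jobs no longer outlive passivation" means mechanically can be sketched as a bounded handshake. The Python below is an illustration only: `JobManager` and `passivate` are hypothetical stand-ins, not the PR's actors. It assumes the 5s bound, the idempotent kill, and the loud-but-proceeds timeout behavior described under "What changed".

```python
import asyncio

ACK_TIMEOUT_SECONDS = 5.0  # the PR states the handshake is bounded at 5s


class JobManager:
    """Hypothetical stand-in for the job manager: job_id -> (session_id, status)."""

    def __init__(self, delay=0.0):
        self.jobs = {}
        self.delay = delay  # simulated time the manager needs to reply

    async def kill_jobs_for_session(self, session_id):
        await asyncio.sleep(self.delay)
        reaped = []
        for job_id, (owner, status) in list(self.jobs.items()):
            if owner == session_id and status == "running":
                self.jobs[job_id] = (owner, "reaped")
                reaped.append(job_id)
        # Idempotent: a repeat call finds nothing running and returns [].
        return reaped


async def passivate(session_id, manager, timeout=ACK_TIMEOUT_SECONDS):
    """Reap the session's jobs, then proceed to the final snapshot either way."""
    try:
        reaped = await asyncio.wait_for(
            manager.kill_jobs_for_session(session_id), timeout
        )
    except asyncio.TimeoutError:
        # Fail loud but proceed: passivation must not hang on a missing ack.
        print(f"WARN: reap ack for session {session_id} timed out; proceeding")
        reaped = []
    # ... the final snapshot would be taken here ...
    return reaped
```

A job owned by another session is left untouched, and a second passivation of the same session reaps nothing.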

Aaronontheweb · 2026-06-12T23:03:52Z

Eval suite results

Full suite against Qwen/Qwen3.6-35B-A3B-FP8 (spark2, openai-compatible): 49/54 cases (90.7%).

Green across Identity, Skill Activation (10/10 — the AGENTS.md/SKILL.md rewrites did not regress skill loading or tool steering), Grounding, Autonomy, Subagents, and Multi-Turn Conversation (7/7).

New regression case: `tool_background_job_lifecycle` — now 5/5

Initially 0/5, root-caused to instrumentation (per the repo's eval-debugging guidance — not the model):

Wrong harness primitive: the case used run_case, which treats multiple prompts as alternate phrasings (pick_variant) — sequential conversations need run_multi_turn_case. Each run saw only half the conversation.
Approval wall: even multi-turn, every background submission died at the approval gate — the headless container has no approval requester and sleep is not on the safe-command allowlist. The eval setup now pre-trusts the verb (netclaw approvals trust-verb sleep) against the bind-mounted tool-approvals.json, so the case exercises the real lifecycle.
Assertion tightened to require the actual "_background":true submission, not just any shell_execute call.

Transcript from the passing run — exactly the flow the original hung session couldn't perform:

[tool:call] shell_execute({"Command":"sleep 120",...,"_background":true})
[tool:result] → Background job 04a28fe50b16 submitted. Output streams to
  ~/.netclaw/jobs/04a28fe50b16/output.log while the job runs — file_read/grep it to monitor.
[tool:call] check_background_job({"JobId":"04a28fe50b16"})
[tool:result] → Job 04a28fe50b16: running (5.2s elapsed)
[tool:call] check_background_job({"Cancel":true,"JobId":"04a28fe50b16"})
[tool:result] → Cancellation request sent for job 04a28fe50b16.
[tool:call] check_background_job({"JobId":"04a28fe50b16"})
[tool:result] → Job 04a28fe50b16: cancelled (15.3s elapsed)
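
The tightened assertion can be approximated as a check over a transcript in the format above. This is a hedged Python sketch, not the repo's eval harness (whose implementation is not shown in this PR): `tool_calls` and `lifecycle_observed` are hypothetical helpers, and the sample transcript is abbreviated so that each call's arguments are valid JSON.

```python
import json
import re

# Matches lines such as: [tool:call] shell_execute({...})
CALL = re.compile(r"^\[tool:call\]\s+(\w+)\((\{.*\})\)\s*$")


def tool_calls(transcript):
    """Return (tool_name, args) for every parseable tool call in the transcript."""
    calls = []
    for line in transcript.splitlines():
        match = CALL.match(line.strip())
        if not match:
            continue
        try:
            args = json.loads(match.group(2))
        except json.JSONDecodeError:
            continue  # skip calls whose arguments were elided or truncated
        calls.append((match.group(1), args))
    return calls


def lifecycle_observed(transcript):
    """True only if a real background submission AND a cancel request occurred."""
    calls = tool_calls(transcript)
    submitted = any(
        name == "shell_execute" and args.get("_background") is True
        for name, args in calls
    )
    cancelled = any(
        name == "check_background_job" and args.get("Cancel") is True
        for name, args in calls
    )
    return submitted and cancelled


SAMPLE = """\
[tool:call] shell_execute({"Command":"sleep 120","_background":true})
[tool:result] → Background job 04a28fe50b16 submitted.
[tool:call] check_background_job({"JobId":"04a28fe50b16"})
[tool:call] check_background_job({"Cancel":true,"JobId":"04a28fe50b16"})
"""
```

A transcript in which the model only probes check_background_job with a made-up ID, or runs shell_execute in the foreground, fails this check. That is the vacuous-pass case the tightened assertion is meant to rule out.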

Remaining failures — none owned by this PR

| Case | Score | Assessment |
|---|---|---|
| `memory_identity_preference_routing` | 0/5 | Memory routing: untouched by this diff; no archived baseline exists (all prior runs were case-filtered). Needs separate investigation. |
| `skill_memory_knowledge` | 3/5 | `netclaw-memory` skill content, unchanged in this PR. |
| `approval_recovery_hint` | 2/5 | cwd-safe-spaces recovery hint, an untouched surface. |
| `complex_write_and_run` | 3/5 | Known variance (prior runs: 0/1, 5/5, 5/5). |

This was also the first unfiltered full-suite run in the archive, so it doubles as a baseline for future runs.

All Definition-of-Done items are now complete: behavior matches the OpenSpec change (background-jobs-detached-process-redesign, 4/4 artifacts, validates), 2406 tests green, slopwatch clean, headers verified, runbook + skills updated, eval suite run with the new regression case passing.

…ill timer, reap on passivation, Lost notifications

A background job is now a detached process with no expectation of completion (OpenSpec: background-jobs-detached-process-redesign). Fixes the hung-session class where a dev server (jekyll serve / npm run dev) could never be used: both execution paths blocked on process exit.

- Stream stdout/stderr to ~/.netclaw/jobs/{id}/output.log while the process runs (per-line secret redaction, 5MB single-slot rotation). The existing check_background_job tail query and file_read/grep monitoring now work mid-run; output survives daemon crashes. Completion tails read from disk.
- Remove the silent default kill timer on background routing: omitted _timeout_seconds now means no timer (was: synchronous default, killing un-hinted jobs early). Submit ACK includes the output log path.
- Reap on session passivation: KillJobsForSession handshake before the final snapshot; new Reaped status (distinct from Cancelled); no turn delivery on reap (would rehydrate the session being torn down); reaped entries surface exactly once in [active-background-jobs] on rehydration, then prune.
- Wire up session-side job tracking (TrackBackgroundJob had no production caller, so the active-jobs context block was always empty).
- Daemon-restart reconciliation now delivers Lost notifications to owning sessions with the pre-crash log path.
- Remove the vestigial pending-approval passivation deferral: approvals are journaled and the response path already rehydrates and resumes.
- AGENTS.md template, netclaw-operations SKILL.md (v2.13.0), and the background-jobs runbook document the new lifecycle; eval suite gains a background-job lifecycle regression case.
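
The timer rule in this commit (an omitted hint means no timer, a positive hint arms one) is small enough to sketch. The Python below is an illustration, not the PR's C# code; `arm_kill_timer` is a hypothetical helper. Treating zero or negative hints the same as an omitted hint is an assumption made here, since the PR text only specifies the omitted and positive cases.

```python
import subprocess
import sys
import threading


def arm_kill_timer(process, timeout_seconds=None):
    """Arm a kill timer only when a positive hint is supplied.

    Returns the started threading.Timer, or None when no timer was armed.
    Assumption: zero and negative hints are treated as "no timer".
    """
    if timeout_seconds is None or timeout_seconds <= 0:
        return None  # detached job: runs until cancelled or reaped
    timer = threading.Timer(timeout_seconds, process.kill)
    timer.daemon = True  # never keep the host process alive just for the timer
    timer.start()
    return timer


# Usage: a long-running child with a short positive hint is killed by the timer.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
timer = arm_kill_timer(child, timeout_seconds=0.2)
child.wait(timeout=10)
```

A caller that observes the process exiting on its own would normally call `timer.cancel()` so the timer does not fire against a finished process.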

…rb, tightened assertion

The new tool_background_job_lifecycle case scored 0/5 for instrumentation reasons, not model behavior (per the eval-debugging guidance):

1. run_case treats multiple prompts as alternate phrasings (pick_variant). Sequential conversations need run_multi_turn_case, which resumes one session and accumulates stdout across turns for the assertion.
2. Even then, every background submission died at the approval gate: the headless eval container has no approval requester and 'sleep' is not on the safe-command allowlist. Passing runs were vacuous (the model probed check_background_job with a made-up ID while flailing). The eval setup now pre-trusts the sleep verb via 'netclaw approvals trust-verb' against the bind-mounted tool-approvals.json before the container starts, so the case exercises the real lifecycle: submit -> job id -> status -> cancel.
3. The assertion now requires the actual "_background":true submission, not just any shell_execute call.

Result: 5/5, with transcripts showing the genuine flow (job id returned, ACK steering to the streaming log path, live status with elapsed time, cancel confirmed).

…ocess job tests

Two PR CI failures:

1. Slopwatch SW003: the write-failure path in JobOutputLog had an empty inner catch with the rationale as a body comment instead of the repo's 'catch // slopwatch-ignore: SW003 <reason>' marker convention. (Passed locally because slopwatch 0.4.1 only scans the git diff vs local HEAD; CI's PR-merge scans the whole new file.)
2. Test-ubuntu-latest flake: KillJobsForSession_ReapsOwnedJobs and BackgroundJob_Completes_And_DeliversResult_ViaGateway intermittently failed with the owning manager's freshly-created jobs showing 'Lost'. Root cause (reproduced reliably by running the Jobs test classes together): under heavy parallel load, concurrent process/FS pressure makes a manager's message handler throw transiently, the actor restarts, and startup reconciliation correctly marks its in-flight jobs Lost (a spurious restart to induce in a unit test). Fix: serialize the three real-process-spawning job test classes via a DisableParallelization collection (repo's established pattern) so they don't mutually starve. Verified: full assembly 4/4 green, the prior ~Jobs repro 3/3 green.

Also register TimeProvider in LlmSessionTestBase to mirror production DI (Daemon Program.cs). WithNetclawActors() constructs the background-job and reminder managers via the DI resolver, which need it; without it they died with ActorInitializationException at startup, adding restart churn.

Resolves the 11 findings from the /code-review pass:

- #1 Multi-line secret redaction: per-line redaction in JobOutputLog misses secrets spanning lines (e.g. PEM blocks). Re-redact the assembled tail at every LLM-surface point (execution-actor completion, manager HandleQuery, NotifyLostJob) so multi-line secrets can't reach the model.
- #2 Journaled reap event (SessionBackgroundJobsReaped): reap marks were snapshot-only and lost on recovery when the passivation snapshot is skipped (parked approval), rehydrating killed jobs as 'running'. FinishJobReap now persists the reap; recovery replays it. Full serializer plumbing + round-trip test.
- #3 Dispose the Process in BackgroundJobExecutionActor.PostStop: stops the kernel handle / wait-handle leak (amplified by the no-default-timeout).
- #4 Audience-gate the [active-background-jobs] block (commands, rationales, and the output-log path) for Public, matching WorkingContext.
- #5 JobOutputLog.ReadTail falls back to the rotated .1 file when the current log is momentarily absent mid-rotation, instead of returning an empty tail.
- #6 A transient File.Move failure in Rotate() is non-fatal: capture continues on the current log and retries next threshold, rather than permanently going silent.
- #7 Back WriteFailure with a volatile field (un-gated fast-path read crosses threads).
- #8 Correlate reap Ask replies with an epoch so a late reply from a superseded passivation can't resolve a newer handshake.
- #10 Centralize the reap-reply handler (CommandJobReapResolved) across all non-terminal phases so a future phase can't silently drop the reply.
- #11 Apply(TurnRecorded) now delegates job dedup/prune to the single shared CompleteTurnBackgroundJobBookkeeping helper so replay and live paths can't drift.
- #9 AutoFlush is kept (live monitoring requires per-line visibility; a write() to the page cache is cheap and a time-throttle risks an unflushed quiescent ready-line). Documented as a deliberate decision.

Tests: +6 (reaped-event round-trip, ReadTail rotation fallback + rethrow, SessionBackgroundJobsReaped apply, Public/Personal active-jobs gating); updated RotationFailure test to the new non-fatal contract. Full Actors suite 2412 green x2; slopwatch + headers clean.
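
Findings #1 and #5 combine naturally: read a bounded tail from disk, fall back to the rotated file if the current one is momentarily missing, then redact the assembled text as a whole. The Python below is a sketch under assumptions, not the PR's ReadTail. The file names follow the PR description, while `read_tail`, `redact_tail`, and the PEM pattern are hypothetical.

```python
import os
import re
import tempfile

# Hypothetical multi-line rule: a PEM block spans lines, so per-line redaction misses it.
PEM = re.compile(r"-----BEGIN [A-Z ]+-----.*?-----END [A-Z ]+-----", re.DOTALL)


def read_tail(directory, max_bytes=4096):
    """Return up to max_bytes from the end of the log without reading the whole file."""
    current = os.path.join(directory, "output.log")
    rotated = os.path.join(directory, "output.1.log")
    for path in (current, rotated):  # rotated file is the mid-rotation fallback
        try:
            with open(path, "rb") as f:
                f.seek(0, os.SEEK_END)
                size = f.tell()
                f.seek(max(0, size - max_bytes))  # bounded seek-from-end
                chunk = f.read()
        except FileNotFoundError:
            continue
        text = chunk.decode("utf-8", errors="replace")
        if size > max_bytes:
            # The cut usually lands mid-line; drop everything up to the first newline.
            _, _, text = text.partition("\n")
        return text
    return ""


def redact_tail(text):
    """Re-redact the assembled tail so secrets spanning lines are caught."""
    return PEM.sub("[REDACTED PEM BLOCK]", text)


# Usage: only the rotated file exists, as can happen briefly during rotation.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "output.1.log"), "w", encoding="utf-8") as f:
    f.write("a\nb\n")
tail = redact_tail(read_tail(demo_dir))
```

Applying `redact_tail` at the point where text is handed to the model, rather than only at write time, is what closes the multi-line gap described in finding #1.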

…inder_missing_trust_fields

Root cause (per akka-net + dotnet-concurrency analysis): the legacy-schema alert is emitted synchronously inside the actor's PreStart, and the test waited for it with a fixed 5s AwaitAssertAsync poll. Under heavy parallel CI load the shared ThreadPool is saturated (many TestKit ActorSystems, WithSerializationVerification overhead), so the actor's PreStart can be scheduled later than the 5s budget and the poll gives up with an empty sink. Not a logic/visibility bug: the sink is lock-guarded and the store records the rejection synchronously in its constructor.

Fix: await a deterministic readiness signal instead of polling a wall clock. An actor processes mailbox messages only after PreStart completes, so a successful Ask<ReminderHealthResponse>(GetReminderHealthQuery) reply guarantees the emit has run. This is the same readiness pattern already used elsewhere in this test file; the generous Ask timeout absorbs scheduling latency and returns as soon as the actor is ready (no wasted time in the common case).

No existing GitHub issue covers this test. Does not reproduce locally even at full-assembly parallelism (CI-runner-only starvation).
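
The readiness pattern in this commit does not depend on Akka.NET specifics, so it can be sketched with asyncio. The Python below is an analogy only: `ReminderActor` is a hypothetical toy whose startup side effect always runs before its mailbox loop, mirroring the PreStart ordering guarantee the commit relies on.

```python
import asyncio


class ReminderActor:
    """Toy actor: a startup side effect runs before any mailbox message is handled."""

    def __init__(self, sink):
        self.sink = sink
        self.mailbox = asyncio.Queue()

    async def run(self):
        # Stand-in for PreStart; the sleep simulates scheduling delay under load.
        await asyncio.sleep(0.05)
        self.sink.append("legacy-schema-alert")
        while True:
            query, reply = await self.mailbox.get()
            if query == "stop":
                reply.set_result("stopped")
                return
            reply.set_result("healthy")

    async def ask(self, query, timeout=30.0):
        # Generous timeout: it only bounds the worst case and costs nothing
        # when the actor is ready quickly.
        reply = asyncio.get_running_loop().create_future()
        await self.mailbox.put((query, reply))
        return await asyncio.wait_for(reply, timeout)


async def demo():
    sink = []
    actor = ReminderActor(sink)
    task = asyncio.create_task(actor.run())
    # A reply proves startup finished, so the sink can be asserted immediately
    # without polling against a fixed wall-clock budget.
    health = await actor.ask("get-health")
    observed = list(sink)
    await actor.ask("stop")
    await task
    return health, observed
```

The test waits exactly as long as startup takes, however slow the scheduler is, instead of guessing a budget that a saturated runner can exceed.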

Aaronontheweb

LGTM

Aaronontheweb · 2026-06-15T21:04:42Z

Aaronontheweb · 2026-06-15T21:05:09Z

+    public async Task RunningJob_OutputIsObservableOnDiskBeforeExit()
+    {
+        // The detached-process contract: a job that never exits (dev server)
+        // must still have its output readable from the log while it runs.


Aaronontheweb · 2026-06-15T21:08:37Z

+            return Task.CompletedTask;
+        }, duration: TimeSpan.FromSeconds(10), cancellationToken: TestContext.Current.CancellationToken);
+
+        var ack = await manager.Ask<SessionJobsReaped>(


LGTM - kills the jobs on purpose

Aaronontheweb · 2026-06-15T21:08:50Z

+            return Task.CompletedTask;
+        }, duration: TimeSpan.FromSeconds(10), cancellationToken: TestContext.Current.CancellationToken);
+
+        await gatewayProbe.ExpectNoMsgAsync(


Aaronontheweb · 2026-06-15T21:24:08Z

+    }
+
+    [Fact]
+    public async Task WriteLine_RedactsSecretsPerLine()


Aaronontheweb added 3 commits June 15, 2026 18:14

Aaronontheweb force-pushed the claude-wt-hung-session-external-shell branch from 5f2358a to f3e0096 Compare June 15, 2026 18:16

Aaronontheweb added 2 commits June 15, 2026 20:04

Merge branch 'dev' into claude-wt-hung-session-external-shell

593759c

Aaronontheweb added the tools Issues related to agent tools: file_read, web_search, shell_execute, image processing, etc. label Jun 15, 2026

Aaronontheweb marked this pull request as ready for review June 15, 2026 20:24

Aaronontheweb mentioned this pull request Jun 15, 2026

Flaky tests: actor-startup side effects awaited on fixed wall-clock budgets instead of deterministic signals (ThreadPool starvation under parallel CI) #1409

Closed

Aaronontheweb merged commit db98c15 into netclaw-dev:dev Jun 15, 2026
15 checks passed

Aaronontheweb mentioned this pull request Jun 16, 2026

Prepare release 0.24.0-beta.5 #1415

Merged

Aaronontheweb merged 6 commits into netclaw-dev:dev from Aaronontheweb:claude-wt-hung-session-external-shell