Add headless prompt mode and source-generated tool schemas by Aaronontheweb · Pull Request #10 · netclaw-dev/netclaw

Aaronontheweb · 2026-02-23T00:54:52Z

Summary

- Add `-p`/`--prompt` CLI flag for single-shot headless execution that streams tool calls, results, and assistant responses to stdout, then exits. This enables smoke-testing tool invocation against real LLMs without the interactive TUI.
- Replace M.E.AI's AIFunctionFactory reflection-based tool schema generation with a Roslyn incremental source generator that produces JSON schemas at compile time, giving full control over schema format and argument normalization across providers
- New Netclaw.Tools.Abstractions project: INetclawTool interface, NetclawTool<T> base class, [NetclawTool] attribute, ToolArgumentHelper
- New Netclaw.Tools.Generators project: Roslyn incremental source generator emitting ParseArguments and JSON schemas from typed Params records
- Migrate ShellTool, FileReadTool, FileWriteTool to the new [NetclawTool] pattern
- Refactor ToolRegistry and DispatchingToolExecutor to use INetclawTool
- Add ADR-001 documenting rationale for owning tool schemas

Test plan

- `dotnet build Netclaw.slnx`: 0 warnings, 0 errors
- `dotnet test Netclaw.slnx`: 110/110 tests passing
- Headless smoke test: `dotnet run --project src/Netclaw.App -- -p "list the files in /tmp"`. The LLM correctly called tools with PascalCase params from the generated schema, the tool executed, and the model responded.

Add `-p`/`--prompt` CLI flag for single-shot headless execution that streams tool calls, results, and assistant responses to stdout then exits. This enables smoke-testing tool invocation against real LLMs without interactive TUI.

Replace M.E.AI's AIFunctionFactory reflection-based tool schema generation with a Roslyn incremental source generator that produces JSON schemas at compile time. This gives us full control over the schema format (avoiding OllamaSharp nullable type incompatibilities) and argument normalization (handling JsonElement vs native CLR types from different providers).

New projects:
- Netclaw.Tools.Abstractions: INetclawTool interface, NetclawTool<T> base class, [NetclawTool] attribute, ToolArgumentHelper
- Netclaw.Tools.Generators: Roslyn incremental source generator that emits ParseArguments and JSON schemas from typed Params records

Migrated ShellTool, FileReadTool, FileWriteTool to the new pattern. Refactored ToolRegistry and DispatchingToolExecutor to use INetclawTool. Added ADR-001 documenting the rationale.
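As a rough, language-neutral illustration of the idea (not the generator in this PR, which is C# and runs at compile time), the sketch below derives a JSON schema from a typed params record and normalizes loosely typed arguments into it. Everything here is an assumption made for illustration: the `ShellParams` fields, the `build_schema` and `parse_arguments` helpers, and the choice to express optional fields by omitting them from `required` rather than emitting a nullable type union.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Optional, Union, get_args, get_origin, get_type_hints

# Hypothetical mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def _unwrap_optional(tp):
    """Return (inner_type, True) for Optional[T]; otherwise (tp, False)."""
    if get_origin(tp) is Union:
        non_none = [a for a in get_args(tp) if a is not type(None)]
        if len(non_none) == 1 and len(get_args(tp)) == 2:
            return non_none[0], True
    return tp, False


def build_schema(params_cls):
    """Build a JSON-schema-shaped dict from a dataclass (assumed design)."""
    hints = get_type_hints(params_cls)
    properties, required = {}, []
    for f in fields(params_cls):
        inner, optional = _unwrap_optional(hints[f.name])
        # Assumption: emit a plain type; optionality is carried by "required".
        properties[f.name] = {"type": _JSON_TYPES[inner]}
        has_default = f.default is not MISSING or f.default_factory is not MISSING
        if not optional and not has_default:
            required.append(f.name)
    return {"type": "object", "properties": properties, "required": required}


def parse_arguments(params_cls, raw):
    """Coerce provider-supplied values (native or string-encoded) to field types."""
    hints = get_type_hints(params_cls)
    kwargs = {}
    for f in fields(params_cls):
        if raw.get(f.name) is None:
            continue  # absent or null: fall back to the field default
        inner, _ = _unwrap_optional(hints[f.name])
        value = raw[f.name]
        if inner is bool and isinstance(value, str):
            value = value.strip().lower() == "true"  # bool("false") would be True
        else:
            value = inner(value)
        kwargs[f.name] = value
    return params_cls(**kwargs)


@dataclass
class ShellParams:  # hypothetical params record with PascalCase names
    Command: str
    TimeoutSeconds: Optional[int] = None


schema = build_schema(ShellParams)
params = parse_arguments(ShellParams, {"Command": "ls", "TimeoutSeconds": "30"})
```

Owning both halves means the schema the model sees and the parser that consumes its arguments are derived from the same record, so they cannot drift apart.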

…ator

Handle specific InvalidOperationException in process.Kill() catch block instead of bare catch (Slopwatch SW003). Add IsPackable=false to the source generator project since analyzer projects have no distributable NuGet content.

Add Debug.WriteLine to process.Kill catch block to satisfy Slopwatch empty catch detection. Use platform-appropriate commands in shell tests: cd instead of pwd on Windows, python one-liner instead of bash printf with brace expansion for output truncation test.

Resolves the 11 findings from the /code-review pass:

- #1 Multi-line secret redaction: per-line redaction in JobOutputLog misses secrets spanning lines (e.g. PEM blocks). Re-redact the assembled tail at every LLM-surface point (execution-actor completion, manager HandleQuery, NotifyLostJob) so multi-line secrets can't reach the model.
- #2 Journaled reap event (SessionBackgroundJobsReaped): reap marks were snapshot-only and lost on recovery when the passivation snapshot is skipped (parked approval), rehydrating killed jobs as 'running'. FinishJobReap now persists the reap; recovery replays it. Full serializer plumbing + round-trip test.
- #3 Dispose the Process in BackgroundJobExecutionActor.PostStop. This stops the kernel handle / wait-handle leak (amplified by the no-default-timeout).
- #4 Audience-gate the [active-background-jobs] block (commands, rationales, and the output-log path) for Public, matching WorkingContext.
- #5 JobOutputLog.ReadTail falls back to the rotated .1 file when the current log is momentarily absent mid-rotation, instead of returning an empty tail.
- #6 A transient File.Move failure in Rotate() is non-fatal: capture continues on the current log and retries next threshold, rather than permanently going silent.
- #7 Back WriteFailure with a volatile field (un-gated fast-path read crosses threads).
- #8 Correlate reap Ask replies with an epoch so a late reply from a superseded passivation can't resolve a newer handshake.
- #10 Centralize the reap-reply handler (CommandJobReapResolved) across all non-terminal phases so a future phase can't silently drop the reply.
- #11 Apply(TurnRecorded) now delegates job dedup/prune to the single shared CompleteTurnBackgroundJobBookkeeping helper so replay and live paths can't drift.
- #9 AutoFlush is kept (live monitoring requires per-line visibility; a write() to the page cache is cheap and a time-throttle risks an unflushed quiescent ready-line). This is documented as a deliberate decision.

Tests: +6 (reaped-event round-trip, ReadTail rotation fallback + rethrow, SessionBackgroundJobsReaped apply, Public/Personal active-jobs gating); updated RotationFailure test to the new non-fatal contract. Full Actors suite 2412 green x2; slopwatch + headers clean.
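To see why finding #1 matters, here is a minimal sketch of per-line redaction missing a secret that spans lines, and redaction of the assembled tail catching it. The regex, the `[REDACTED]` placeholder, and the function names are assumptions for illustration; the actual redaction rules in JobOutputLog are not shown in this PR.

```python
import re

# Hypothetical pattern: a PEM block, which always spans several lines.
PEM_BLOCK = re.compile(
    r"-----BEGIN [A-Z ]+-----.*?-----END [A-Z ]+-----", re.DOTALL
)


def redact(text):
    """Replace every PEM block found in `text` with a placeholder."""
    return PEM_BLOCK.sub("[REDACTED]", text)


def redact_per_line(lines):
    """Redact each line in isolation, as a streaming writer would."""
    return [redact(line) for line in lines]


def redact_tail(lines):
    """Assemble the tail first, then redact it as one string."""
    return redact("\n".join(lines))


log = [
    "starting server",
    "-----BEGIN PRIVATE KEY-----",
    "MIIBVgIBADANBgkqhkiG9w0BAQEFAASC",
    "-----END PRIVATE KEY-----",
    "ready",
]

leaky = "\n".join(redact_per_line(log))  # key body survives: no single line matches
safe = redact_tail(log)                  # whole block replaced
```

Per-line redaction still has value while streaming to disk; the second pass over the assembled tail is what keeps multi-line secrets away from the model.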

… kill timer, reap on passivation (#1405)

* Background jobs as detached processes: stream logs live, no default kill timer, reap on passivation, Lost notifications

A background job is now a detached process with no expectation of completion (OpenSpec: background-jobs-detached-process-redesign). Fixes the hung-session class where a dev server (jekyll serve / npm run dev) could never be used: both execution paths blocked on process exit.

- Stream stdout/stderr to ~/.netclaw/jobs/{id}/output.log while the process runs (per-line secret redaction, 5MB single-slot rotation). The existing check_background_job tail query and file_read/grep monitoring now work mid-run; output survives daemon crashes. Completion tails read from disk.
- Remove the silent default kill timer on background routing: omitted _timeout_seconds now means no timer (was: synchronous default, killing un-hinted jobs early). Submit ACK includes the output log path.
- Reap on session passivation: KillJobsForSession handshake before the final snapshot; new Reaped status (distinct from Cancelled); no turn delivery on reap (would rehydrate the session being torn down); reaped entries surface exactly once in [active-background-jobs] on rehydration, then prune.
- Wire up session-side job tracking (TrackBackgroundJob had no production caller, so the active-jobs context block was always empty).
- Daemon-restart reconciliation now delivers Lost notifications to owning sessions with the pre-crash log path.
- Remove the vestigial pending-approval passivation deferral: approvals are journaled and the response path already rehydrates and resumes.
- AGENTS.md template, netclaw-operations SKILL.md (v2.13.0), and the background-jobs runbook document the new lifecycle; eval suite gains a background-job lifecycle regression case.

* Fix background-job lifecycle eval: multi-turn harness, pre-trusted verb, tightened assertion

The new tool_background_job_lifecycle case scored 0/5 for instrumentation reasons, not model behavior (per the eval-debugging guidance):

1. run_case treats multiple prompts as alternate phrasings (pick_variant). Sequential conversations need run_multi_turn_case, which resumes one session and accumulates stdout across turns for the assertion.
2. Even then, every background submission died at the approval gate: the headless eval container has no approval requester and 'sleep' is not on the safe-command allowlist. Passing runs were vacuous (the model probed check_background_job with a made-up ID while flailing). The eval setup now pre-trusts the sleep verb via 'netclaw approvals trust-verb' against the bind-mounted tool-approvals.json before the container starts, so the case exercises the real lifecycle: submit -> job id -> status -> cancel.
3. The assertion now requires the actual "_background":true submission, not just any shell_execute call.

Result: 5/5, with transcripts showing the genuine flow (job id returned, ACK steering to the streaming log path, live status with elapsed time, cancel confirmed).

* Fix CI: SW003 empty-catch marker, parallel-test isolation for real-process job tests

Two PR CI failures:

1. Slopwatch SW003: the write-failure path in JobOutputLog had an empty inner catch with the rationale as a body comment instead of the repo's 'catch // slopwatch-ignore: SW003 <reason>' marker convention. (Passed locally because slopwatch 0.4.1 only scans the git diff vs local HEAD; CI's PR-merge scans the whole new file.)
2. Test-ubuntu-latest flake: KillJobsForSession_ReapsOwnedJobs and BackgroundJob_Completes_And_DeliversResult_ViaGateway intermittently failed with the owning manager's freshly-created jobs showing 'Lost'. Root cause (reproduced reliably by running the Jobs test classes together): under heavy parallel load, concurrent process/FS pressure makes a manager's message handler throw transiently, the actor restarts, and startup reconciliation correctly marks its in-flight jobs Lost. The restart itself is spurious, induced only by the unit-test load. Fix: serialize the three real-process-spawning job test classes via a DisableParallelization collection (repo's established pattern) so they don't mutually starve. Verified: full assembly 4/4 green, the prior ~Jobs repro 3/3 green.

Also register TimeProvider in LlmSessionTestBase to mirror production DI (Daemon Program.cs). WithNetclawActors() constructs the background-job and reminder managers via the DI resolver, which need it; without it they died with ActorInitializationException at startup, adding restart churn.

* Address code-review findings on background-jobs feature

Resolves the 11 findings from the /code-review pass:

- #1 Multi-line secret redaction: per-line redaction in JobOutputLog misses secrets spanning lines (e.g. PEM blocks). Re-redact the assembled tail at every LLM-surface point (execution-actor completion, manager HandleQuery, NotifyLostJob) so multi-line secrets can't reach the model.
- #2 Journaled reap event (SessionBackgroundJobsReaped): reap marks were snapshot-only and lost on recovery when the passivation snapshot is skipped (parked approval), rehydrating killed jobs as 'running'. FinishJobReap now persists the reap; recovery replays it. Full serializer plumbing + round-trip test.
- #3 Dispose the Process in BackgroundJobExecutionActor.PostStop. This stops the kernel handle / wait-handle leak (amplified by the no-default-timeout).
- #4 Audience-gate the [active-background-jobs] block (commands, rationales, and the output-log path) for Public, matching WorkingContext.
- #5 JobOutputLog.ReadTail falls back to the rotated .1 file when the current log is momentarily absent mid-rotation, instead of returning an empty tail.
- #6 A transient File.Move failure in Rotate() is non-fatal: capture continues on the current log and retries next threshold, rather than permanently going silent.
- #7 Back WriteFailure with a volatile field (un-gated fast-path read crosses threads).
- #8 Correlate reap Ask replies with an epoch so a late reply from a superseded passivation can't resolve a newer handshake.
- #10 Centralize the reap-reply handler (CommandJobReapResolved) across all non-terminal phases so a future phase can't silently drop the reply.
- #11 Apply(TurnRecorded) now delegates job dedup/prune to the single shared CompleteTurnBackgroundJobBookkeeping helper so replay and live paths can't drift.
- #9 AutoFlush is kept (live monitoring requires per-line visibility; a write() to the page cache is cheap and a time-throttle risks an unflushed quiescent ready-line). This is documented as a deliberate decision.

Tests: +6 (reaped-event round-trip, ReadTail rotation fallback + rethrow, SessionBackgroundJobsReaped apply, Public/Personal active-jobs gating); updated RotationFailure test to the new non-fatal contract. Full Actors suite 2412 green x2; slopwatch + headers clean.

* Fix racy ReminderManagerActorTests.Startup_emits_alert_for_legacy_reminder_missing_trust_fields

Root cause (per akka-net + dotnet-concurrency analysis): the legacy-schema alert is emitted synchronously inside the actor's PreStart, and the test waited for it with a fixed 5s AwaitAssertAsync poll. Under heavy parallel CI load the shared ThreadPool is saturated (many TestKit ActorSystems, WithSerializationVerification overhead), so the actor's PreStart can be scheduled later than the 5s budget and the poll gives up with an empty sink. This is not a logic/visibility bug: the sink is lock-guarded and the store records the rejection synchronously in its constructor.

Fix: await a deterministic readiness signal instead of polling a wall clock. An actor processes mailbox messages only after PreStart completes, so a successful Ask<ReminderHealthResponse>(GetReminderHealthQuery) reply guarantees the emit has run. This is the same readiness pattern already used elsewhere in this test file; the generous Ask timeout absorbs scheduling latency and returns as soon as the actor is ready (no wasted time in the common case).

No existing GitHub issue covers this test. Does not reproduce locally even at full-assembly parallelism (CI-runner-only starvation).
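The epoch correlation described in finding #8 can be sketched in a few lines. This is a language-neutral illustration under stated assumptions, not the actor code from the commit: the `ReapHandshake` class and its method names are hypothetical, and the real implementation exchanges messages between actors rather than calling methods directly.

```python
class ReapHandshake:
    """Tracks the current reap request and ignores replies to older ones."""

    def __init__(self):
        self._epoch = 0             # incremented for every new handshake
        self.resolved_epoch = None  # epoch of the reply that resolved us, if any

    def begin(self):
        """Start a new handshake; any earlier one is now superseded."""
        self._epoch += 1
        self.resolved_epoch = None
        return self._epoch          # the request carries this value

    def on_reply(self, reply_epoch):
        """Accept a reply only if it echoes the current epoch."""
        if reply_epoch != self._epoch:
            return False            # late reply from a superseded passivation
        self.resolved_epoch = reply_epoch
        return True


handshake = ReapHandshake()
first = handshake.begin()    # passivation attempt 1
second = handshake.begin()   # attempt 1 superseded by attempt 2
stale_accepted = handshake.on_reply(first)
fresh_accepted = handshake.on_reply(second)
```

Without the epoch check, the late reply to the first request would be indistinguishable from the reply to the second and could resolve the newer handshake prematurely.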

Aaronontheweb added 3 commits February 22, 2026 18:52

Aaronontheweb merged commit 14af6ba into dev Feb 23, 2026
3 checks passed

Aaronontheweb deleted the feature/headless-mode-and-tool-source-generator branch February 23, 2026 01:07

Aaronontheweb merged 3 commits into dev from feature/headless-mode-and-tool-source-generator