feat(subagents): single-file markdown format with contextual prompt #652
Merged
Adopt the markdown-with-YAML-frontmatter shape used by Claude Code, OpenCode, and Netclaw's own skill system. Each subagent is now a single .md file under ~/.netclaw/agents/ where the frontmatter carries metadata (name, description, tools, modelRole, timeoutSeconds, visibility, emitStructuredFindings) and the body is the system prompt verbatim. Replaces the JSON + .md sidecar pair that was the only two-file artifact type left in Netclaw.

Add an optional Context parameter to spawn_agent so the frontline LLM can pass per-invocation background (workspace details, the user's broader goal, facts the subagent would otherwise have to rediscover) without editing the agent file. Context is prefixed onto the subagent's first user message as a separate "Context:" block, keeping the agent's static system prompt verbatim and reproducible across calls. When Context is null the protocol is byte-identical to the pre-change shape.

Enrich the [available-subagents] discovery context layer to emit each agent's full description, tool list, and a usage example that includes the new Context field; today's one-line form did not teach the model how or when to delegate.

Key changes:
- Add YamlDotNet PackageReference to Netclaw.Configuration
- New SubAgentMarkdownParser with ExtractFrontmatter/ExtractBody matching the SkillScanner pattern; SubAgentFrontmatter DTO maps 1:1 to SubAgentProfile
- Rewrite FileSubAgentDefinitionLoader to scan *.md, fail loud on malformed frontmatter, missing required fields (name/description/tools/body), unknown tool names for user-facing agents, duplicate names across files
- Regenerate the three init-wizard seeds (research-assistant, code-analyst, summarizer) in the new format
- Thread an optional runtimeContext parameter through SpawnAgentTool.Params → SubAgentSpawner.SpawnAsync → RunSubAgent → SubAgentActor initial user message
- BuildUserMessage helper composes "Context:\n...\n\nTask:\n..." when context is present, raw task otherwise
- Enriched discovery layer produces per-agent description/tools/timeout plus a "how to delegate" block with the context field example

Tests:
- 13 loader tests covering full frontmatter parsing, required-field failures, duplicate-name rejection, disallowed-tool rejection, visibility variants
- 5 new SubAgentActor tests: BuildUserMessage unit cases (null/whitespace/populated) + end-to-end RuntimeContext-is-prefixed + null-context-leaves-raw-task; captures FakeChatClient.LastReceivedMessages for assertion
- All affected suites green: Configuration.Tests 189/189, Actors.Tests SubAgent+Session+ToolIndexUpdater filter 307/307, Daemon.Tests 436/436

Zero-migration observation: the three stock defaults are the only files that have ever existed in ~/.netclaw/agents/ on a fresh install, so the JSON → markdown migration carries no user-content risk.

Closes #647
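To make the single-file shape concrete, here is a minimal sketch of splitting an agent file into frontmatter and body and enforcing the required fields listed above. It is a Python illustration, not the C# `SubAgentMarkdownParser`: the real loader uses YamlDotNet, while this sketch only understands flat `key: value` lines. The sample file contents, the tool names `web_search` and `web_fetch`, the comma-separated `tools` syntax, and every function name here are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical agent file; field names come from the PR, values are made up.
SAMPLE_AGENT = """\
---
name: research-assistant
description: Researches a topic across several sources and reports findings.
tools: web_search, web_fetch
timeoutSeconds: 300
---
You are a research assistant. Gather sources, then summarize what you found.
"""

REQUIRED_FIELDS = ("name", "description", "tools")


@dataclass
class AgentProfile:
    name: str
    description: str
    tools: list[str]
    system_prompt: str
    extras: dict[str, str]  # optional fields, left as raw strings in this sketch


def split_frontmatter(text: str) -> tuple[str, str]:
    """Return (frontmatter, body); raise ValueError if the '---' fences are missing."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:]).strip()
    raise ValueError("unterminated frontmatter")


def parse_agent(text: str) -> AgentProfile:
    """Parse flat 'key: value' frontmatter and reject files missing required parts."""
    frontmatter, body = split_frontmatter(text)
    fields: dict[str, str] = {}
    for line in frontmatter.splitlines():
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        fields[key.strip()] = value.strip()
    for required in REQUIRED_FIELDS:
        if not fields.get(required):
            raise ValueError(f"missing required field: {required}")
    if not body:
        raise ValueError("missing required field: body")
    tools = [t.strip() for t in fields.pop("tools").split(",") if t.strip()]
    return AgentProfile(
        name=fields.pop("name"),
        description=fields.pop("description"),
        tools=tools,
        system_prompt=body,  # the body is used verbatim as the system prompt
        extras=fields,
    )
```

Calling `parse_agent(SAMPLE_AGENT)` yields a profile whose `system_prompt` is the text below the closing fence, which mirrors the "body is the system prompt verbatim" rule.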
Aaronontheweb added a commit that referenced this pull request on Apr 14, 2026
The loader test suite from #652 had nine near-identical tests that all reduced to `Assert.Empty(results)` after writing one malformed file. If the loader were rewritten to unconditionally return an empty list, all nine would still pass; they looked like they pinned nine distinct validation paths but actually pinned zero and never verified the "fail loud" contract (no assertion that a warning was logged, no check that the scanner actually reached the file).

Replace with one consolidated test that drops a valid agent alongside eight kinds of invalid siblings in the same directory, asserts the valid agent still loads, and verifies each invalid sibling produced a specific warning naming both the file and the reason. That exercises both "don't abort the scan on one bad file" and "fail loud" in a single test. Also asserts that stray .json and .txt files in the directory are ignored at the glob layer; they must produce no warning at all because the scanner never touches them.

Introduce a minimal `ListLogger<T>` that captures warning-level formatted messages so tests can assert on log contents. ~20 lines, lives in the same test file; if we need it elsewhere we can promote it.

Separately, delete three `BuildUserMessage_*` unit tests on `SubAgentActor`. They were textbook examples of what CLAUDE.md says to delete ("if the test is just asserting that $"{a}/{b}" equals "a/b", delete it"); they pinned a 4-line string interpolation helper whose behavior is already covered by the two end-to-end `RuntimeContext_*` tests. With the unit tests gone, `BuildUserMessage` has no external caller, so tighten its visibility from `internal` to `private`.

Net: Configuration.Tests 189 → 181, Actors.Tests SubAgent filter 30 → 27. Fewer tests, each covering real behavior. All green; slopwatch clean.
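The shape of the consolidated test can be sketched as follows. This is a Python illustration under stated assumptions, not the actual C# test: `ListHandler` plays the role of `ListLogger<T>`, `load_agents` is a toy loader that checks only four rejection reasons rather than eight, and all file names and warning texts are invented for the example.

```python
import logging
import tempfile
from pathlib import Path


class ListHandler(logging.Handler):
    """Collects formatted warning-or-higher messages so a test can assert on them."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def load_agents(directory: Path, log: logging.Logger) -> list[str]:
    """Toy loader: returns valid agent names, warns (file + reason) on each bad file."""
    names: list[str] = []
    for path in sorted(directory.glob("*.md")):  # non-.md files are never touched
        text = path.read_text(encoding="utf-8")
        header, sep, body = text[4:].partition("\n---\n")
        if not text.startswith("---\n") or not sep:
            log.warning("%s: missing frontmatter", path.name)
            continue
        name = next(
            (ln.split(":", 1)[1].strip() for ln in header.splitlines() if ln.startswith("name:")),
            "",
        )
        if not name:
            log.warning("%s: missing required field 'name'", path.name)
        elif not body.strip():
            log.warning("%s: empty body", path.name)
        elif name in names:
            log.warning("%s: duplicate name '%s'", path.name, name)
        else:
            names.append(name)
    return names


def run_consolidated_check() -> tuple[list[str], list[str]]:
    """Write one valid agent plus invalid siblings, then return (names, warnings)."""
    handler = ListHandler()
    log = logging.getLogger("loader-sketch")
    log.addHandler(handler)
    log.propagate = False  # keep the warnings out of stderr
    files = {
        "a-valid.md": "---\nname: helper\n---\nYou help.\n",
        "b-no-frontmatter.md": "just prose\n",
        "c-no-name.md": "---\ndescription: x\n---\nBody.\n",
        "d-empty-body.md": "---\nname: hollow\n---\n",
        "e-duplicate.md": "---\nname: helper\n---\nAlso helps.\n",
        "stray.json": "{}",
    }
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for filename, content in files.items():
            (root / filename).write_text(content, encoding="utf-8")
        names = load_agents(root, log)
    log.removeHandler(handler)
    return names, handler.messages
```

Asserting that the returned names equal `["helper"]` is what a loader that always returns an empty list would fail, and asserting on the captured messages is what pins the "fail loud" contract per file.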
Aaronontheweb added a commit that referenced this pull request on Apr 14, 2026
PR #652 migrated sub-agent definitions from JSON+MD sidecar pairs to single markdown files with YAML frontmatter, added an optional Context parameter to spawn_agent, and enriched the [available-subagents] discovery context layer to include per-agent description, tools, and an example call. None of that was reflected in the runbook; anyone following the existing "Defining subagents" section would write a .json file the loader rejects.

Rewrite docs/runbooks/subagents.md end to end:
- Discovery section shows the new enriched block (per-agent description + tools + timeout + a how-to-delegate block with the context field example) instead of the old one-line form.
- Invocation section documents all three spawn_agent arguments (agent, task, context) with one null-context and one context-populated example. Clarifies that Context is prefixed onto the subagent's first user message as a Context:/Task: block, and that the agent's system prompt stays verbatim from disk (not mutated by runtime context).
- Defining subagents section replaces the JSON schema + companion-file walkthrough with a single markdown-with-frontmatter template. Documents every frontmatter field with required/default columns: name, description, tools, modelRole, timeoutSeconds, visibility (hyphenated or PascalCase), emitStructuredFindings. Adds a loader-behavior-fail-loud subsection listing every rejection reason the scanner can log.
- Creating a custom agent walkthrough now writes a single .md file with frontmatter instead of a JSON+MD pair.
- Limitations section gains an entry noting that per-agent model selection is not yet supported and is blocked on the multi-model provider follow-on.

Also update IMPLEMENTATION_PLAN.md line 969: the "Subagent Roadmap" entry still listed the JSON file format. Updated minimally to reference the new format and cite both #226 (original disk-based design) and #652 (migration to frontmatter).

Historical references in RELEASE_NOTES.md are left alone; they accurately describe what shipped at #226 and shouldn't be retconned.
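The Context:/Task: composition the Invocation section describes can be sketched in a few lines. This is a Python illustration of the documented behavior, not the C# `BuildUserMessage` helper itself; treating whitespace-only context like null and passing the context through untrimmed are assumptions inferred from the null/whitespace/populated test cases named in #652.

```python
from typing import Optional


def build_user_message(task: str, context: Optional[str] = None) -> str:
    """Compose the subagent's first user message.

    With no usable context the raw task is returned unchanged, so the
    null-context path stays identical to the pre-change message.
    """
    if context is None or not context.strip():
        return task
    return f"Context:\n{context}\n\nTask:\n{task}"


# Null-context call: the message is just the task.
plain = build_user_message("Summarize README.md")

# Context-populated call: the background is prefixed as its own block.
with_context = build_user_message(
    "Summarize README.md",
    context="The user is evaluating this repo for a migration.",
)
```

Because the context only ever lands in the user message, the system prompt loaded from the agent file is the same on every call.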
Aaronontheweb added a commit that referenced this pull request on Apr 14, 2026
The init wizard seeds an AGENTS.md "Subagent Delegation" section that tells the frontline LLM when to delegate to spawn_agent. The current guidance sets the bar too high: "deep web research requiring multiple searches and synthesis" as the threshold excludes the bulk of research tasks that would benefit from delegation. Live smoke testing earlier in the sub-agent work confirmed this: Qwen performed ~50k tokens of in-process research on a clearly delegable task because the guidance didn't reach the delegation trigger.

Rewrite the section end to end, applying the proposal from #522 verbatim with two adjacent additions:

1. A context-window-protection bullet in "When to delegate". Delegation's biggest structural benefit isn't specialization or parallelism; it's keeping raw file/web content out of the main session's context window. The existing guidance never said that out loud.
2. A new "Per-call specialization" paragraph introducing the optional `context` argument that shipped in the single-file format work (#647 / #652). The discovery context layer advertises it, but AGENTS.md was still telling the model to call spawn_agent with only agent + task.

The expanded "When to delegate" list lowers the bar to:
- Research requiring 2+ sources or multiple searches (was: "deep research")
- Parallelizable tasks (multiple independent queries concurrently)
- Work that would otherwise pull large content into this session's context
- Background prep work that doesn't block immediate response
- Code analysis on large files or multiple files
- Summarization of long documents or web pages
- Preliminary passes on topics before diving deep

The expanded "When NOT to delegate" list adds:
- Tasks where coordination overhead outweighs parallelization benefits

Adds a parallelization tip pointing out that independent topics should be spawned as concurrent subagents, not serialized.

Note: this update affects the init-wizard seed template, so new installs pick it up on `netclaw init`. Existing users' AGENTS.md files are not mutated automatically; operators who want the new guidance should copy the new block into their `~/.netclaw/identity/AGENTS.md` manually.

Closes #522
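The parallelization tip is easier to see in code. The sketch below is a Python illustration only: `spawn_agent` here is a hypothetical async stand-in that sleeps briefly and returns a canned string, since the real tool is invoked by the frontline LLM rather than called as a function. It shows independent topics being started together instead of awaited one after another.

```python
import asyncio
from typing import Optional


async def spawn_agent(agent: str, task: str, context: Optional[str] = None) -> str:
    """Hypothetical stand-in for a spawn_agent call; the real tool runs a subagent."""
    await asyncio.sleep(0.01)  # simulate the subagent doing its work
    return f"[{agent}] findings for: {task}"


async def research_in_parallel(topics: list[str], context: str) -> list[str]:
    """Start one subagent per independent topic and wait for all of them."""
    calls = [spawn_agent("research-assistant", topic, context) for topic in topics]
    # gather runs the calls concurrently and returns results in input order.
    return await asyncio.gather(*calls)


results = asyncio.run(
    research_in_parallel(
        ["actor supervision strategies", "YAML frontmatter parsers"],
        context="Background for a design review; short summaries are enough.",
    )
)
```

Only the summaries returned by each call reach the parent, which is the context-window benefit the guidance calls out; the same `context` string is shared across calls while each task stays topic-specific.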
Summary

- Each subagent is now a single `.md` file under `~/.netclaw/agents/` where the frontmatter carries metadata (`name`, `description`, `tools`, `modelRole`, `timeoutSeconds`, `visibility`, `emitStructuredFindings`) and the body is the system prompt verbatim. Replaces the JSON + `.md` sidecar pair that was the only two-file artifact type left in Netclaw.
- Add an optional `Context` parameter to `spawn_agent` so the frontline LLM can pass per-invocation background (workspace details, the user's broader goal, facts the subagent would otherwise have to rediscover) without editing the agent file. Context is prefixed onto the subagent's first user message as a separate `Context:` block, keeping the agent's static system prompt verbatim and reproducible across calls. When `Context` is null the protocol is byte-identical to the pre-change shape.
- Enrich the `[available-subagents]` discovery context layer to emit each agent's full description, tool list, and a usage example that includes the new `Context` field; today's one-line form did not teach the model how or when to delegate.

Why

Netclaw's sub-agents were the only disk-based artifact in the install that still used a two-file JSON+MD pair. Claude Code, OpenCode, AgentSkills.io `SKILL.md`, and Aaron's own `local-ai-preferences/agents/*.md` library all use single markdown-with-frontmatter files. Adopting the de facto shape means existing libraries become drop-in portable (modulo tool-name translation, tracked separately) and new sub-agents are half the authoring cost.

The `Context` parameter addresses a separate but related problem: Phase 0.2 smoke testing earlier in this session confirmed Qwen won't organically delegate on research-worthy prompts, and even when it does, the sub-agent gets a cold start with no workspace/goal awareness. Runtime context lets the parent session specialize a single profile per invocation rather than authoring N tightly-scoped profiles.

Zero-migration observation: the three stock defaults are the only files that have ever existed in `~/.netclaw/agents/` on a fresh install, so the JSON → markdown migration carries no user-content risk.

Key changes

- Add `YamlDotNet` PackageReference to `Netclaw.Configuration`
- New `SubAgentMarkdownParser` with `ExtractFrontmatter`/`ExtractBody` matching the `SkillScanner` pattern; `SubAgentFrontmatter` DTO maps 1:1 to `SubAgentProfile`
- Rewrite `FileSubAgentDefinitionLoader` to scan `*.md`, fail loud on malformed frontmatter, missing required fields (name/description/tools/body), unknown tool names for user-facing agents, duplicate names across files
- Regenerate the three init-wizard seeds (`research-assistant`, `code-analyst`, `summarizer`) in the new format
- Thread an optional `runtimeContext` parameter through `SpawnAgentTool.Params` → `SubAgentSpawner.SpawnAsync` → `RunSubAgent` → `SubAgentActor` initial user message
- `BuildUserMessage` helper composes `Context:\n...\n\nTask:\n...` when context is present, raw task otherwise

Test plan

- 13 `FileSubAgentDefinitionLoader` tests covering full frontmatter parsing, required-field failures, duplicate-name rejection, disallowed-tool rejection, hyphenated+PascalCase visibility variants, empty body, missing frontmatter, non-`.md` files ignored
- 5 new `SubAgentActor` tests: `BuildUserMessage` unit cases (null/whitespace/populated) plus end-to-end `RuntimeContext_is_prefixed_onto_first_user_message` and `Null_RuntimeContext_leaves_first_user_message_as_raw_task` via a `FakeChatClient.LastReceivedMessages` capture
- `Netclaw.Configuration.Tests`: 189/189 green
- `Netclaw.Actors.Tests` (SubAgent + Session + ToolIndexUpdater filter): 307/307 green
- `Netclaw.Daemon.Tests`: 436/436 green
- `dotnet slopwatch analyze`: 0 issues

Related