Skip to content

test(subagents): consolidate anemic loader and actor tests#654

Merged
Aaronontheweb merged 1 commit into
devfrom
fix/consolidate-subagent-loader-tests
Apr 14, 2026
Merged

test(subagents): consolidate anemic loader and actor tests#654
Aaronontheweb merged 1 commit into
devfrom
fix/consolidate-subagent-loader-tests

Conversation

@Aaronontheweb

Copy link
Copy Markdown
Collaborator

Summary

  • Replace nine near-identical LoadAll_skips_X tests — each of which wrote one bad file and just asserted Assert.Empty(results) — with one consolidated LoadAll_loads_valid_agent_and_rejects_invalid_siblings_with_warnings that drops a valid agent alongside eight kinds of invalid siblings in the same directory and verifies (a) the valid agent still loads, (b) each invalid sibling produced a specific warning naming both the file and the reason, and (c) stray .json / .txt files produce no warning at all because the scanner never touches them.
  • Introduce a minimal ListLogger<T> (≈20 lines, inline in the test file) that captures warning-level formatted messages so tests can assert on log contents.
  • Delete three BuildUserMessage_* unit tests on SubAgentActor. They pinned a 4-line string interpolation helper whose behavior is already fully covered by the two end-to-end RuntimeContext_is_prefixed_onto_first_user_message and Null_RuntimeContext_leaves_first_user_message_as_raw_task tests. With the unit tests gone, BuildUserMessage has no external caller, so tighten its visibility from internal to private.

Why

The LoadAll_skips_X tests from #652 were a pattern the Netclaw constitution explicitly warns against: "If the test is just asserting that $"{a}/{b}" equals "a/b", delete it. Prefer fewer tests that cover real behavior over many tests that pad coverage." They looked like they pinned nine distinct validation paths but if the loader were rewritten to unconditionally return an empty list for every file, all nine would still pass. That's negative signal dressed up as coverage.

More importantly, the whole point of the format redesign was "fail loud on malformed files" (per CLAUDE.md's no-silent-fallbacks rule), and the old tests didn't actually verify any warning was logged. A silent-skip regression would have sailed through the suite.

The BuildUserMessage_* unit tests were the same smell applied to a string helper. The E2E tests exercise the same helper through the actor pipeline and catch any regression the unit tests would.

Test plan

  • dotnet test src/Netclaw.Configuration.Tests — 181/181 green (was 189 before; removed 8 redundant tests)
  • dotnet test src/Netclaw.Actors.Tests --filter FullyQualifiedName~SubAgent — 27/27 green (was 30; removed 3 redundant unit tests)
  • dotnet slopwatch analyze — 0 issues
  • GPG signed commit

Notes

The one consolidated test intentionally asserts on substring matches in captured warning messages rather than exact strings, so minor copy edits to the loader's warning text won't break the suite — but a silent skip (no warning at all) or a warning that omits the filename / reason absolutely will.

The loader test suite from #652 had nine near-identical tests that all reduced
to `Assert.Empty(results)` after writing one malformed file. If the loader
were rewritten to unconditionally return an empty list, all nine would still
pass — they looked like they pinned nine distinct validation paths but
actually pinned zero and never verified the "fail loud" contract (no
assertion that a warning was logged, no check that the scanner actually
reached the file).

Replace with one consolidated test that drops a valid agent alongside eight
kinds of invalid siblings in the same directory, asserts the valid agent
still loads, and verifies each invalid sibling produced a specific warning
naming both the file and the reason. That exercises both "don't abort the
scan on one bad file" and "fail loud" in a single test. Also asserts that
stray .json and .txt files in the directory are ignored at the glob layer —
they must produce no warning at all because the scanner never touches them.

Introduce a minimal `ListLogger<T>` that captures warning-level formatted
messages so tests can assert on log contents. ~20 lines, lives in the same
test file; if we need it elsewhere we can promote it.

Separately, delete three `BuildUserMessage_*` unit tests on `SubAgentActor`.
They were textbook examples of what CLAUDE.md says to delete ("if the test
is just asserting that $"{a}/{b}" equals "a/b", delete it") — they pinned a
4-line string interpolation helper whose behavior is already covered by the
two end-to-end `RuntimeContext_*` tests. With the unit tests gone,
`BuildUserMessage` has no external caller, so tighten its visibility from
`internal` to `private`.

Net: Configuration.Tests 189 → 181, Actors.Tests SubAgent filter 30 → 27.
Fewer tests, each covering real behavior. All green; slopwatch clean.
@Aaronontheweb Aaronontheweb added cleanup Code quality improvements and tech debt reduction sessions LLM session actor, turn lifecycle, pipelines labels Apr 14, 2026
@Aaronontheweb Aaronontheweb enabled auto-merge (squash) April 14, 2026 02:58
@Aaronontheweb Aaronontheweb merged commit fe497d2 into dev Apr 14, 2026
4 of 5 checks passed
@Aaronontheweb Aaronontheweb deleted the fix/consolidate-subagent-loader-tests branch April 14, 2026 03:02
@Aaronontheweb Aaronontheweb added the subagents spawn_agent, SubAgentActor, definition loader, discovery context layer, and related features label Apr 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cleanup Code quality improvements and tech debt reduction sessions LLM session actor, turn lifecycle, pipelines subagents spawn_agent, SubAgentActor, definition loader, discovery context layer, and related features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant