test: remove 169 change-detector tests (batch 1 of suite reduction) by teknium1 · Pull Request #11472 · NousResearch/hermes-agent

teknium1 · 2026-04-17T07:41:17Z

Summary

First batch of the test-suite reduction discussed with Teknium. Deletes 169 tests across 21 files that fell into confident change-detector patterns — tests that verify 'nothing was changed' rather than 'something works'.

Deletion categories

Source-grep tests (gateway/test_feishu.py, test_email.py) — tests that call inspect.getsource() on production modules and grep for string literals. Break on any refactor/rename even when behavior is correct.
Platform enum tautologies (every gateway/test_X.py) — Platform.X.value == 'x' duplicated across ~9 adapter test files.
Registry-presence checks (toolset/PLATFORM_HINTS/setup wizard) — tests that only verify a key exists in a dict. Data-layout tests, not behavior.
Argparse wiring tests (test_argparse_flag_propagation, test_subparser_routing_fallback): parser.parse_args([...]) → assert args.field. Tests Python's argparse, not our code.
Pure dispatch tests (test_plugins_cmd.TestPluginsCommandDispatch) — patch handler, call dispatcher with matching action, assert mock called. Tests the if/elif chain.
Kwarg-to-mock verification (test_auxiliary_client ~45 tests, test_web_tools_config, test_gemini_cloudcode, test_retaindb_plugin) — mock external API client, call our function, assert exact kwargs. Break on refactor.
Schedule-internal "function-was-called" tests (acp/test_server scheduling tests) — patch own helper method, assert it was called.
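
To make the distinction concrete, here is a minimal sketch of three of the patterns above next to behavioral counterparts. Every name in it (the stand-in Platform enum, normalize_sender, send, and the client's post method) is hypothetical and invented for illustration; none of it is taken from the hermes-agent codebase.

```python
import enum
import inspect
from unittest import mock


class Platform(enum.Enum):
    # Hypothetical stand-in for a platform enum.
    EMAIL = "email"


def normalize_sender(raw: str) -> str:
    """Hypothetical production helper: strip and lower-case an address."""
    return raw.strip().lower()


def send(client, text: str) -> bool:
    """Hypothetical production function wrapping an external client."""
    response = client.post(text=text, retries=3)
    return response["ok"]


# Change-detector: restates the enum definition and can only fail on a rename.
def test_platform_value_tautology():
    assert Platform.EMAIL.value == "email"


# Change-detector: greps the implementation text instead of running it.
def test_source_grep():
    assert ".lower()" in inspect.getsource(normalize_sender)


# Change-detector: pins the exact kwargs passed to a mock, so adding or
# renaming an internal argument fails the test even if callers see no change.
def test_kwargs_pinned():
    client = mock.Mock()
    client.post.return_value = {"ok": True}
    send(client, "hi")
    client.post.assert_called_once_with(text="hi", retries=3)


# Behavioral: asserts on what the caller observes.
def test_normalize_sender_behavior():
    assert normalize_sender("  Alice@Example.COM ") == "alice@example.com"


# Behavioral: the mock only supplies a response; the assertion is on our result.
def test_send_reports_failure():
    client = mock.Mock()
    client.post.return_value = {"ok": False}
    assert send(client, "hi") is False
```

The first three tests keep passing when behavior breaks in ways they do not mention, and fail on harmless refactors; the last two fail only when the observable result changes.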

What was kept

Error paths (pytest.raises)
Security tests (path traversal, SSRF, redaction, injection scanning)
Message alternation invariants
Provider API format conversion tests (Anthropic adapter, Bedrock adapter, Codex Responses), entirely untouched
Streaming logic tests, entirely untouched
Memory provider contract tests, entirely untouched
Credential pool tests, entirely untouched
Real config load/merge tests (profile-awareness, root-level legacy fallback)
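
For contrast with the deleted patterns, the following is a rough sketch of the kind of security test that stays: it feeds hostile input to the code and asserts on the result. The helpers redact_phones and is_inside are hypothetical and written only for this example; the project's actual redaction and path-validation code may look quite different.

```python
import re
from pathlib import Path

# Hypothetical pattern: a plus sign followed by 7 to 15 digits.
_PHONE = re.compile(r"\+\d{7,15}")


def redact_phones(text: str) -> str:
    """Hypothetical redactor: mask all but the last two digits of each number."""
    def _mask(match):
        number = match.group()
        return "+" + "*" * (len(number) - 3) + number[-2:]
    return _PHONE.sub(_mask, text)


def is_inside(base: Path, user_path: str) -> bool:
    """Hypothetical guard: True only if user_path resolves to a location under base."""
    base = base.resolve()
    target = (base / user_path).resolve()
    return target == base or base in target.parents


def test_phone_is_redacted_in_log_line():
    out = redact_phones("sender=+15551234567 ok")
    assert "15551234567" not in out
    assert out.endswith("67 ok")


def test_traversal_is_rejected():
    base = Path("/srv/data")
    assert is_inside(base, "notes/a.txt")
    assert not is_inside(base, "../secrets.txt")
    assert not is_inside(base, "/etc/passwd")
```

Neither test cares how the helpers are implemented, so a refactor that preserves behavior leaves them green.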

Metrics

Metric	Before	After	Delta
Tests collected	12,522	12,353	-169
Empty classes	-	38 removed
Test LOC	194,963	193,018	-1,945
CI test runtime (main, last run)	3m57s	TBD

Methodology

Three parallel subagents audited tests/gateway/, tests/hermes_cli/+tools/+cli/, tests/agent/+run_agent/+acp/+plugins/+cron/+skills/ for change-detector patterns.
Strict "when in doubt, keep" rule: each deletion has a specific 1-line reason in the manifest.
AST-based deletion script removed named test functions plus any class that became empty of tests (38 classes).
tests/run_agent/test_run_agent.py was OFF LIMITS throughout, so core agent loop coverage is preserved.

Test plan

All 21 modified files compile (py_compile).
Running the 21 affected files gives 988/991 passing locally; the 3 failures are pre-existing cross-test pollution in TestSignalPhoneRedaction (caplog/logger state). They are the same class of flake fixed in PR #11453 (fix(tests): attach caplog to specific logger in 3 order-dependent tests) and are unrelated to these deletions.
CI is the source of truth; will monitor the Tests job on this PR.
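
The py_compile check in the first step can be reproduced with a few lines of standard-library Python. This is a minimal sketch, not the command actually run for this PR; the function name find_compile_errors is made up for the example.

```python
import py_compile


def find_compile_errors(paths):
    """Byte-compile each file and return {path: error message} for failures."""
    errors = {}
    for path in paths:
        try:
            # doraise=True turns syntax errors into PyCompileError
            # instead of printing them to stderr.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            errors[str(path)] = exc.msg
    return errors
```

Calling find_compile_errors on the list of modified test files and getting back an empty dict corresponds to "all files compile". This only catches syntax-level breakage, such as a class body left empty by a deletion; it says nothing about whether the remaining tests pass.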

First pass of test-suite reduction to address flaky CI and bloat. Removed tests that fall into these change-detector patterns:

1. Source-grep tests (tests/gateway/test_feishu.py, test_email.py): tests that call inspect.getsource() on production modules and grep for string literals. Break on any refactor/rename even when behavior is correct.
2. Platform enum tautologies (every gateway/test_X.py): assertions like `Platform.X.value == 'x'` duplicated across ~9 adapter test files.
3. Toolset/PLATFORM_HINTS/setup-wizard registry-presence checks: tests that only verify a key exists in a dict. Data-layout tests, not behavior.
4. Argparse wiring tests (test_argparse_flag_propagation, test_subparser_routing_fallback): tests that do parser.parse_args([...]) then assert args.field. Tests Python's argparse, not our code.
5. Pure dispatch tests (test_plugins_cmd.TestPluginsCommandDispatch): patch cmd_X, call plugins_command with matching action, assert mock called. Tests the if/elif chain, not behavior.
6. Kwarg-to-mock verification (test_auxiliary_client ~45 tests, test_web_tools_config, test_gemini_cloudcode, test_retaindb_plugin): tests that mock the external API client, call our function, and assert exact kwargs. Break on refactor even when behavior is preserved.
7. Schedule-internal "function-was-called" tests (acp/test_server scheduling tests): tests that patch own helper method, then assert it was called.

Kept behavioral tests throughout: error paths (pytest.raises), security tests (path traversal, SSRF, redaction), message alternation invariants, provider API format conversion, streaming logic, memory contract, real config load/merge tests.

Net reduction: 169 tests removed. 38 empty classes cleaned up.
Collected before: 12,522 tests
Collected after: 12,353 tests

teknium1 force-pushed the fix/test-reduction-batch-1 branch from e3583ab to 69440dd Compare April 17, 2026 07:54

teknium1 merged commit 2367c6f into main Apr 17, 2026
5 checks passed

teknium1 deleted the fix/test-reduction-batch-1 branch April 17, 2026 08:05

This was referenced Apr 18, 2026

fix(gemini): route Google AI Studio auth via x-goog-api-key header (#7893) #11961

Merged

fix: suppress Authorization: Bearer for Gemini provider to prevent HT… #8530

Closed
