ci: split Tests workflow into 4 parallel shards (target <2min) by teknium1 · Pull Request #11619 · NousResearch/hermes-agent

teknium1 · 2026-04-17T13:25:55Z

Summary

CI test wall time drops from ~4min to ~90s by running the suite as 4 parallel matrix jobs instead of a single job. Previously blocked by cross-test pollution; #11577 made the suite hermetic, unblocking this.

Changes

pyproject.toml: add pytest-split>=0.9,<1 to dev extras
.github/workflows/tests.yml: test job matrix-splits into 4 groups. Each shard runs pytest --splits 4 --group N, composes with -n auto xdist inside
fail-fast: false — all shards finish even if one fails

Validation

	Before	After (expected)
Wall time	243s test step + ~25s setup = ~4m	slowest shard ~60-90s + ~25s setup \u2248 ~90-115s
CPU parallelism	4 workers (1 job, -n auto)	16 workers (4 shards \u00d7 -n auto)
Tests per shard	12,098	~3,025

Shard 1 ran locally in 37s on my 16-core box; CI runners are 4-core so expect 60-90s per shard.

Context

#11566 closed — same approach but shard 3 hung at 97% due to cross-test pollution (sys.modules['dotenv'] stub bombs + env var leakage exposing hidden test dependencies). #11453 fixed the caplog flakes. #11577 made conftest hermetic and removed the dotenv stubs. This PR is only viable because that foundation is in place.

github-actions · 2026-04-17T13:26:08Z

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: CI/CD workflow files modified

Changes to workflow files can alter build pipelines, inject steps, or modify permissions. Verify no unauthorized actions or secrets access were added.

Files:

.github/workflows/tests.yml

⚠️ WARNING: Dependency manifest files modified

Changes to dependency files can introduce new packages or change version pins. Verify all dependency changes are intentional and from trusted sources.

Files:

pyproject.toml

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

Target: <2min CI test wall time. Runs the Tests workflow as a 4-way matrix instead of one job. Each shard runs ~3,000 tests on its own ubuntu-latest runner (4 cores) with -n auto xdist inside. Total effective parallelism: 16 workers across 4 machines (vs 4 workers on 1 machine today). Was previously tried in #11566 and closed — shard 3 hung at 97% complete for 100+ seconds with dozens of E/F markers. Root cause was cross-test pollution exposed by splitting test files across shards (e.g. the three test files that mutated sys.modules['dotenv'] at import time poisoned whichever shard they landed in). That's now fixed by #11453 and #11577: conftest is hermetic, the dotenv stub bombs are removed, and tests no longer depend on each other's env-var side effects. Changes: - pyproject.toml: add pytest-split>=0.9,<1 to dev extras - .github/workflows/tests.yml: 'test' job becomes matrix-split into 4 groups with fail-fast: false. Runs 'pytest --splits 4 --group N'. pytest-split composes with -n auto from pyproject addopts. e2e job is unchanged (already small, 20s). Expected timing: Before: ~4m total (243s test step + ~25s setup) After: ~90-115s total (shard wall time ~60-90s + ~25s setup) Hash-based split is deterministic; no .test_durations file needed yet. Can add one later via --store-durations for better shard balance.

Tests that construct AIAgent(api_key=..., ...) without base_url were relying on provider-resolver fallback state from other tests in the same xdist worker. When matrix-split distributed them to different shards, the resolver found no env vars and no config and raised 'No LLM provider configured'. Fix: add base_url='https://openrouter.ai/api/v1' to every AIAgent construction that passes api_key. AIAgent.__init__ with both args set takes the direct-construction path (line 960 in run_agent.py) and skips resolver fallback entirely, making these tests self-contained. 7 files, 16 call sites updated via AST-based fixup. One call site (test_none_base_url_passed_as_none) left alone — that test's intent is to verify base_url=None behavior, so adding base_url defeats the test. Validation: - tests/run_agent/ full run: 760 passed, 0 failed (was 1 failure under the AST script's over-application, now clean) - Matrix shard 3 local run: 3083 passed, 0 failed, 1m44s

github-actions · 2026-04-17T14:00:00Z

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: CI/CD workflow files modified

Changes to workflow files can alter build pipelines, inject steps, or modify permissions. Verify no unauthorized actions or secrets access were added.

Files:

.github/workflows/tests.yml

⚠️ WARNING: Dependency manifest files modified

Changes to dependency files can introduce new packages or change version pins. Verify all dependency changes are intentional and from trusted sources.

Files:

pyproject.toml

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

… tests Same root cause as previous commit — tests that construct AIAgent() without base_url rely on provider-resolver fallback state that doesn't exist in hermetic CI / shard-split runs. Previously hidden because other tests in the same xdist worker happened to prime module state. Covered by the previous fix: calls that passed api_key but not base_url. This covers calls that pass NEITHER (model=... only) — test_streaming.py especially (24 call sites). Plus test_860_dedup, test_compression_ persistence, test_create_openai_client_*, test_provider_parity. One call site (test_none_base_url_passed_as_none) remains explicitly unmodified — it asserts None/empty base_url behavior, so adding base_url would defeat the test's intent. Validation: - tests/run_agent/: 760 passed, 0 failed (local) - Matrix shard 3 subset: 3098 passed, 0 failed, 1m49s (local)

github-actions · 2026-04-17T14:18:19Z

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: CI/CD workflow files modified

Changes to workflow files can alter build pipelines, inject steps, or modify permissions. Verify no unauthorized actions or secrets access were added.

Files:

.github/workflows/tests.yml

⚠️ WARNING: Dependency manifest files modified

Changes to dependency files can introduce new packages or change version pins. Verify all dependency changes are intentional and from trusted sources.

Files:

pyproject.toml

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

…rect path) My last fix added base_url but not api_key. AIAgent.__init__ takes the direct-construction path only when BOTH are set — with only base_url it still calls resolve_provider_client and fails in hermetic CI. Same 31 call sites, now with both kwargs.

github-actions · 2026-04-17T14:42:09Z

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: CI/CD workflow files modified

Changes to workflow files can alter build pipelines, inject steps, or modify permissions. Verify no unauthorized actions or secrets access were added.

Files:

.github/workflows/tests.yml

⚠️ WARNING: Dependency manifest files modified

Changes to dependency files can introduce new packages or change version pins. Verify all dependency changes are intentional and from trusted sources.

Files:

pyproject.toml

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

teknium1 · 2026-04-17T19:23:22Z

Closing — matrix split gets 3 of 4 shards fast (<2m) but shard 3 is 5x slower on GHA than locally (9m41s vs 1m48s), making total CI wall time worse than the 4m baseline. Landing just the AIAgent test-self-contained-ness fixes in a separate PR; will investigate the CI-specific slowdown in shard 3 before retrying matrix split.

…AIAgent CI on main had 7 failing tests. Five were stale test fixtures; one (agent cache spillover timeout) was covering up a real perf regression in AIAgent construction. The perf bug: every AIAgent.__init__ calls _check_compression_model_feasibility → resolve_provider_client('auto') → _resolve_api_key_provider which iterates PROVIDER_REGISTRY. When it hits 'zai', it unconditionally calls resolve_api_key_provider_credentials → _resolve_zai_base_url → probes 8 Z.AI endpoints with an empty Bearer token (all 401s), ~2s of pure latency per agent, even when the user has never touched Z.AI. Landed in 9e84416 (PR for credential-pool Z.AI auto-detect) — the short-circuit when api_key is empty was missing. _resolve_kimi_base_url had the same shape; fixed too. Test fixes: - tests/gateway/test_voice_command.py: _make_adapter helpers were missing self._voice_locks (added in PR #12644, 7 call sites — all updated). - tests/test_toolsets.py: test_hermes_platforms_share_core_tools asserted equality, but hermes-discord has discord_server (DISCORD_BOT_TOKEN-gated, discord-only by design). Switched to subset check. - tests/run_agent/test_streaming.py: test_tool_name_not_duplicated_when_resent_per_chunk missing api_key/base_url — classic pitfall (PR #11619 fixed 16 of these; this one slipped through on a later commit). - tests/tools/test_discord_tool.py: TestConfigAllowlist caplog assertions fail in parallel runs because AIAgent(quiet_mode=True) globally sets logging.getLogger('tools').setLevel(ERROR) and xdist workers are persistent. Autouse fixture resets the 'tools' and 'tools.discord_tool' levels per test. Validation: tests/cron + voice + agent_cache + streaming + toolsets + command_guards + discord_tool: 550/550 pass tests/hermes_cli + tests/gateway: 5713/5713 pass AIAgent construction without Z.AI creds: 2.2s → 0.24s (9x)

…AIAgent CI on main had 7 failing tests. Five were stale test fixtures; one (agent cache spillover timeout) was covering up a real perf regression in AIAgent construction. The perf bug: every AIAgent.__init__ calls _check_compression_model_feasibility → resolve_provider_client('auto') → _resolve_api_key_provider which iterates PROVIDER_REGISTRY. When it hits 'zai', it unconditionally calls resolve_api_key_provider_credentials → _resolve_zai_base_url → probes 8 Z.AI endpoints with an empty Bearer token (all 401s), ~2s of pure latency per agent, even when the user has never touched Z.AI. Landed in 0154413 (PR for credential-pool Z.AI auto-detect) — the short-circuit when api_key is empty was missing. _resolve_kimi_base_url had the same shape; fixed too. Test fixes: - tests/gateway/test_voice_command.py: _make_adapter helpers were missing self._voice_locks (added in PR NousResearch#12644, 7 call sites — all updated). - tests/test_toolsets.py: test_hermes_platforms_share_core_tools asserted equality, but hermes-discord has discord_server (DISCORD_BOT_TOKEN-gated, discord-only by design). Switched to subset check. - tests/run_agent/test_streaming.py: test_tool_name_not_duplicated_when_resent_per_chunk missing api_key/base_url — classic pitfall (PR NousResearch#11619 fixed 16 of these; this one slipped through on a later commit). - tests/tools/test_discord_tool.py: TestConfigAllowlist caplog assertions fail in parallel runs because AIAgent(quiet_mode=True) globally sets logging.getLogger('tools').setLevel(ERROR) and xdist workers are persistent. Autouse fixture resets the 'tools' and 'tools.discord_tool' levels per test. Validation: tests/cron + voice + agent_cache + streaming + toolsets + command_guards + discord_tool: 550/550 pass tests/hermes_cli + tests/gateway: 5713/5713 pass AIAgent construction without Z.AI creds: 2.2s → 0.24s (9x)

…AIAgent CI on main had 7 failing tests. Five were stale test fixtures; one (agent cache spillover timeout) was covering up a real perf regression in AIAgent construction. The perf bug: every AIAgent.__init__ calls _check_compression_model_feasibility → resolve_provider_client('auto') → _resolve_api_key_provider which iterates PROVIDER_REGISTRY. When it hits 'zai', it unconditionally calls resolve_api_key_provider_credentials → _resolve_zai_base_url → probes 8 Z.AI endpoints with an empty Bearer token (all 401s), ~2s of pure latency per agent, even when the user has never touched Z.AI. Landed in 9e84416 (PR for credential-pool Z.AI auto-detect) — the short-circuit when api_key is empty was missing. _resolve_kimi_base_url had the same shape; fixed too. Test fixes: - tests/gateway/test_voice_command.py: _make_adapter helpers were missing self._voice_locks (added in PR NousResearch#12644, 7 call sites — all updated). - tests/test_toolsets.py: test_hermes_platforms_share_core_tools asserted equality, but hermes-discord has discord_server (DISCORD_BOT_TOKEN-gated, discord-only by design). Switched to subset check. - tests/run_agent/test_streaming.py: test_tool_name_not_duplicated_when_resent_per_chunk missing api_key/base_url — classic pitfall (PR NousResearch#11619 fixed 16 of these; this one slipped through on a later commit). - tests/tools/test_discord_tool.py: TestConfigAllowlist caplog assertions fail in parallel runs because AIAgent(quiet_mode=True) globally sets logging.getLogger('tools').setLevel(ERROR) and xdist workers are persistent. Autouse fixture resets the 'tools' and 'tools.discord_tool' levels per test. Validation: tests/cron + voice + agent_cache + streaming + toolsets + command_guards + discord_tool: 550/550 pass tests/hermes_cli + tests/gateway: 5713/5713 pass AIAgent construction without Z.AI creds: 2.2s → 0.24s (9x)

…AIAgent CI on main had 7 failing tests. Five were stale test fixtures; one (agent cache spillover timeout) was covering up a real perf regression in AIAgent construction. The perf bug: every AIAgent.__init__ calls _check_compression_model_feasibility → resolve_provider_client('auto') → _resolve_api_key_provider which iterates PROVIDER_REGISTRY. When it hits 'zai', it unconditionally calls resolve_api_key_provider_credentials → _resolve_zai_base_url → probes 8 Z.AI endpoints with an empty Bearer token (all 401s), ~2s of pure latency per agent, even when the user has never touched Z.AI. Landed in d44cf13 (PR for credential-pool Z.AI auto-detect) — the short-circuit when api_key is empty was missing. _resolve_kimi_base_url had the same shape; fixed too. Test fixes: - tests/gateway/test_voice_command.py: _make_adapter helpers were missing self._voice_locks (added in PR NousResearch#12644, 7 call sites — all updated). - tests/test_toolsets.py: test_hermes_platforms_share_core_tools asserted equality, but hermes-discord has discord_server (DISCORD_BOT_TOKEN-gated, discord-only by design). Switched to subset check. - tests/run_agent/test_streaming.py: test_tool_name_not_duplicated_when_resent_per_chunk missing api_key/base_url — classic pitfall (PR NousResearch#11619 fixed 16 of these; this one slipped through on a later commit). - tests/tools/test_discord_tool.py: TestConfigAllowlist caplog assertions fail in parallel runs because AIAgent(quiet_mode=True) globally sets logging.getLogger('tools').setLevel(ERROR) and xdist workers are persistent. Autouse fixture resets the 'tools' and 'tools.discord_tool' levels per test. Validation: tests/cron + voice + agent_cache + streaming + toolsets + command_guards + discord_tool: 550/550 pass tests/hermes_cli + tests/gateway: 5713/5713 pass AIAgent construction without Z.AI creds: 2.2s → 0.24s (9x)

teknium1 added 2 commits April 17, 2026 06:59

teknium1 force-pushed the ci/matrix-split-v2 branch from 541a5db to c2559b8 Compare April 17, 2026 13:59

teknium1 closed this Apr 17, 2026

teknium1 mentioned this pull request Apr 17, 2026

fix(tests): make AIAgent constructor calls self-contained #11755

Merged

teknium1 mentioned this pull request Apr 20, 2026

fix(ci): unblock test suite + cut ~2s of dead Z.AI probes from every AIAgent #12765

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: split Tests workflow into 4 parallel shards (target <2min)#11619

ci: split Tests workflow into 4 parallel shards (target <2min)#11619
teknium1 wants to merge 4 commits into
mainfrom
ci/matrix-split-v2

teknium1 commented Apr 17, 2026

Uh oh!

github-actions Bot commented Apr 17, 2026

Uh oh!

github-actions Bot commented Apr 17, 2026

Uh oh!

github-actions Bot commented Apr 17, 2026

Uh oh!

github-actions Bot commented Apr 17, 2026

Uh oh!

teknium1 commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

teknium1 commented Apr 17, 2026

Summary

Changes

Validation

Context

Uh oh!

github-actions Bot commented Apr 17, 2026

⚠️ Supply Chain Risk Detected

⚠️ WARNING: CI/CD workflow files modified

⚠️ WARNING: Dependency manifest files modified

Uh oh!

github-actions Bot commented Apr 17, 2026

⚠️ Supply Chain Risk Detected

⚠️ WARNING: CI/CD workflow files modified

⚠️ WARNING: Dependency manifest files modified

Uh oh!

github-actions Bot commented Apr 17, 2026

⚠️ Supply Chain Risk Detected

⚠️ WARNING: CI/CD workflow files modified

⚠️ WARNING: Dependency manifest files modified

Uh oh!

github-actions Bot commented Apr 17, 2026

⚠️ Supply Chain Risk Detected

⚠️ WARNING: CI/CD workflow files modified

⚠️ WARNING: Dependency manifest files modified

Uh oh!

teknium1 commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant