feat(tools/wot_engine): add Web-of-Thought multi-agent reasoning #20158
Open
Abd0r wants to merge 2 commits into
Adds a self-contained multi-agent reasoning engine that coordinates 3-7 LLM agents through a shared message bus. Generic agents (no role taxonomy) talk to each other across four communication modes (parallel, streaming, sequential, queue) over any OpenAI-compatible backend.

The engine is exposed as a single Hermes tool, `wot_chat`, registered under a new `wot` toolset. Caller passes agent specs (name + system_prompt) and a task; the engine orchestrates the conversation and returns a structured transcript for the outer agent to synthesize.

Files:
- tools/wot_engine.py: engine + tool registration (~1040 lines)
- tests/tools/test_wot_engine.py: 36 unit tests, all green
- skills/coordination/web-of-thought/SKILL.md: methodology guidance (when to invoke, how to design agents, mode selection, cost discipline)

Engine specifics:
- Backend probe at startup: detects llama.cpp / Ollama / vLLM / OpenAI-compat. Uses id_slot pinning + cache_prompt: true on llama.cpp for KV-cache reuse across agents.
- Reasoning content extraction: handles delta.reasoning_content from thinking-mode models (DeepSeek-R1, QwQ, etc.) separately from content, so peer messages can choose to propagate raw / strip / summarize CoT.
- Per-agent timeout via asyncio.wait_for, per-channel token budget, monotonic seq counter on Message envelope for stream debugging.
- AgentSpec auto-sanitizes whitespace in names (real LLMs emit "Critical Thinker" / "Agent A"); raises only when sanitized name is empty.
- /v1 suffix is stripped from base_url at client init so callers can pass either form (http://host:8088 OR http://host:8088/v1) without doubling.
- Hermes tool registration is at module top-level (not wrapped in try/except) so tools/registry.py:_module_registers_tools picks it up via AST scan.

Skill methodology:
- When to invoke wot_chat (multi-perspective questions, tradeoffs, decisions with real downside) and when NOT to (lookups, single-fact, simple chat).
- Agent design rule: minimal differentiating system_prompt, no scripted personas, no role-cargo names. Engine remains role-agnostic.
- Mode selection: parallel (default) / streaming / sequential / queue.
- Cost discipline: max_rounds: 2-3 for most cases, set token_budget for hard caps.
- Reading the result: errors first, then agents_done, then transcript.

Validated end-to-end on Ubuntu 24.04 + RTX 4050 with two model configurations:
1. Local llama.cpp + Qwen3-4B-Instruct-Q4_K_M (--parallel 4 --jinja): 5/5 sessions completed, 48 WoT messages across runs, 0 inner errors.
2. OpenRouter + DeepSeek-V4-Flash (with skill loaded): 5/5 sessions, skill methodology measurably moved model behavior toward leaner invocations (avg agent name ~10 chars vs ~22 unloaded; max_rounds explicitly set 5/5; 43% latency drop).

License: MIT (auto per CONTRIBUTING.md).
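The two input-normalization steps listed under "Engine specifics" (agent-name sanitization and `/v1` stripping) can be sketched as below. This is a minimal illustration, not the code in tools/wot_engine.py: the function names, the allowed-character set, and the choice to map whitespace to underscores are all assumptions.

```python
import re

# Hypothetical policy: keep letters, digits, underscore and hyphen.
# The real engine's allowed-character set is not stated in the PR.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_-]")


def sanitize_agent_name(raw: str) -> str:
    """Turn whitespace runs into underscores and drop other disallowed characters.

    Raises ValueError only when nothing usable is left.
    """
    name = re.sub(r"\s+", "_", raw.strip())
    name = _DISALLOWED.sub("", name)
    if not name:
        raise ValueError(f"agent name {raw!r} is empty after sanitization")
    return name


def normalize_base_url(base_url: str) -> str:
    """Strip trailing slashes and a single trailing /v1.

    The client can then always append /v1/... itself, so both
    http://host:8088 and http://host:8088/v1 yield the same request path.
    """
    url = base_url.rstrip("/")
    if url.endswith("/v1"):
        url = url[: -len("/v1")]
    return url
```

With these helpers, `sanitize_agent_name("Critical Thinker")` gives a single token that is safe to use in @-mentions, and both URL forms mentioned above normalize to the same base.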
Adds per-agent base_url + api_key fields to AgentSpec, enabling a single WoT session to mix backends (e.g. one agent on local Ollama, another on OpenRouter). _LLMClient caches backend probes per-base_url so each unique target is only probed once across the run.

Engine changes:
- AgentSpec: new optional fields base_url + api_key
- _LLMClient: _probe_cache: Dict[str, BackendInfo], ensure_probed() now takes optional base_url_override and caches per-target
- _resolve_target() helper composes the right URL + auth headers per call
- _openai_payload_for(backend, ...) takes backend explicitly (so id_slot + cache_prompt only land when THIS request actually targets llama-server)
- complete() and stream() take base_url_override + api_key_override kwargs
- _stream_openai and _stream_ollama_native take per-call base + headers
- Agent.turn_batch + turn_streaming pass spec.base_url + spec.api_key
- wot_chat_tool boundary strips model + base_url + api_key from outer-Hermes args (defensive: outer model hallucinates these); direct Python callers using AgentSpec(base_url=..., api_key=...) still work

Tests:
- 39/39 unit tests passing (up from 36)
- New: MultiBackendMixTests verifies per-agent base_url threads to client
- New: WotChatToolStripsCallerControlFields verifies tool boundary strips caller-supplied model/base_url/api_key

Validated end-to-end:
- One WoT session with 2 agents on different backends:
  - alpha on DeepSeek-V4-Flash via OpenRouter
  - beta on deepseek-r1:1.5b via local Ollama
- Probe cache shows both targets:
  https://openrouter.ai/api → openai-compat
  http://127.0.0.1:11434 → ollama
- 0 engine errors, both transcripts assembled with correct from-attribution
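The per-base_url probe cache described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the `_LLMClient` implementation: the `ProbeCache` class, the injected `probe` callable, the single lock, and the fake probe that classifies by port are all hypothetical. Only the names `BackendInfo`, `ensure_probed`, and `base_url_override` come from the commit message.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class BackendInfo:
    kind: str  # e.g. "llama.cpp", "ollama", "openai-compat"


class ProbeCache:
    """Probe each distinct base URL at most once (hypothetical helper).

    Construct it inside a running event loop so the lock binds correctly
    on older Python versions.
    """

    def __init__(self, default_base_url: str,
                 probe: Callable[[str], Awaitable[BackendInfo]]) -> None:
        self._default = default_base_url
        self._probe = probe
        self._cache: Dict[str, BackendInfo] = {}
        self._lock = asyncio.Lock()

    async def ensure_probed(self, base_url_override: Optional[str] = None) -> BackendInfo:
        target = base_url_override or self._default
        # The lock keeps concurrent agents from probing the same target twice.
        async with self._lock:
            if target not in self._cache:
                self._cache[target] = await self._probe(target)
            return self._cache[target]


async def demo() -> Tuple[List[str], List[str]]:
    calls: List[str] = []

    async def fake_probe(url: str) -> BackendInfo:
        # Stand-in for a real HTTP probe: classify by port only.
        calls.append(url)
        await asyncio.sleep(0)
        return BackendInfo("ollama" if ":11434" in url else "openai-compat")

    cache = ProbeCache("https://openrouter.ai/api", fake_probe)
    infos = await asyncio.gather(
        cache.ensure_probed(),
        cache.ensure_probed("http://127.0.0.1:11434"),
        cache.ensure_probed("http://127.0.0.1:11434"),
        cache.ensure_probed(),
    )
    return [info.kind for info in infos], calls
```

Running `asyncio.run(demo())` issues four lookups but only two probes, one per distinct target. A single lock serializes all probes; a per-target lock would allow different backends to be probed concurrently, at the cost of a little more bookkeeping.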
Summary
Adds a self-contained multi-agent reasoning engine that coordinates 3-7 LLM agents through a shared message bus. Generic agents (no role taxonomy hardcoded) talk to each other across four communication modes (`parallel`, `streaming`, `sequential`, `queue`) over any OpenAI-compatible backend. Exposed as a single Hermes tool (`wot_chat`) under a new `wot` toolset, plus a methodology skill at `skills/coordination/web-of-thought/`.

This is a separate concern from PRs #19607 / #19796 (free-tier search backends). They touch different surfaces and are independently reviewable.
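To make the `parallel` mode named in the summary concrete, here is a minimal sketch of round-based fan-out with a per-agent timeout. It is not the engine's code: `run_parallel`, the `Message` fields, the callable-based agent interface, and the single global `seq` counter are assumptions for illustration. The only detail taken from the PR is that the timeout is enforced with `asyncio.wait_for` and that timeouts surface in an `errors` list.

```python
import asyncio
import itertools
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Optional, Tuple


@dataclass
class Message:
    seq: int      # monotonic counter (assumed global here)
    round: int
    sender: str
    content: str


# Hypothetical agent interface: (task, visible transcript) -> reply text.
AgentFn = Callable[[str, List[Message]], Awaitable[str]]


async def run_parallel(agents: Dict[str, AgentFn], task: str,
                       max_rounds: int = 2, turn_timeout: float = 2.0) -> dict:
    transcript: List[Message] = []
    errors: List[str] = []
    seq = itertools.count(1)

    async def one_turn(name: str, fn: AgentFn,
                       visible: List[Message]) -> Tuple[str, Optional[str], Optional[str]]:
        try:
            text = await asyncio.wait_for(fn(task, visible), timeout=turn_timeout)
            return name, text, None
        except asyncio.TimeoutError:
            return name, None, f"{name}: timed out after {turn_timeout}s"

    for rnd in range(1, max_rounds + 1):
        # Snapshot: agents only see messages completed in earlier rounds.
        visible = list(transcript)
        results = await asyncio.gather(
            *(one_turn(name, fn, visible) for name, fn in agents.items())
        )
        for name, text, err in results:
            if err is not None:
                errors.append(err)
            else:
                transcript.append(Message(next(seq), rnd, name, text))
    return {"errors": errors, "transcript": transcript}


async def quick(task: str, visible: List[Message]) -> str:
    return f"saw {len(visible)} peer messages"


async def stalled(task: str, visible: List[Message]) -> str:
    await asyncio.sleep(60)  # never finishes within the timeout
    return "never delivered"
```

Calling `asyncio.run(run_parallel({"alpha": quick, "beta": quick, "slow": stalled}, "task", turn_timeout=0.05))` lets the two quick agents complete both rounds while the stalled agent contributes one error per round, so one slow backend does not block the session.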
Files
- `tools/wot_engine.py`: engine + tool registration (1,034 lines)
- `tests/tools/test_wot_engine.py`: 36 unit tests, all green
- `skills/coordination/web-of-thought/SKILL.md`: methodology guidance for callers (when to invoke, how to design agents, mode selection, cost discipline)

Engine design
- Agents are specified by `name` + `system_prompt`. Engine is content-agnostic; the model decides agent personalities dynamically.
- Four communication modes:
  - `parallel`: all agents react to the task simultaneously and see peers' completed messages on round boundaries
  - `streaming`: same as parallel, but agents see partial CoT tokens as they're generated
  - `sequential`: round-robin; each agent gets the full prior transcript
  - `queue`: tag-driven pull (agents declare `interests` and only act when a relevant tag appears)
- Backend probe at startup detects llama.cpp / Ollama / vLLM / OpenAI-compat. Uses `id_slot` pinning + `cache_prompt: true` on llama.cpp for KV-cache reuse across agents. Strips trailing `/v1` from `base_url` so callers can pass either form.
- Handles `delta.reasoning_content` separately from `delta.content` for thinking-mode models (DeepSeek-R1, QwQ, Qwen3.5/3.6 with thinking on). Stored in `Message.reasoning` so peer messages can choose to propagate raw / strip / summarize CoT (the `propagate_reasoning` knob; `summary` is currently stubbed to `strip`, which is flagged in the docstring).
- Per-channel `token_budget`, per-agent `turn_timeout` via `asyncio.wait_for`, monotonic `seq` per agent on the `Message` envelope.
- AgentSpec auto-sanitizes agent names: whitespace becomes `_`, disallowed characters are dropped. Raises only if the result is empty.
- The `model` field is stripped from inner agent specs at the `wot_chat_tool` boundary, because outer Hermes tends to hallucinate names like `gpt-4o`. Engine uses `LLM_DEFAULT_MODEL` (env-driven) for all inner agents.

Hermes integration
- Tool registration is at module top-level (not wrapped in `try/except`) so the `tools/registry.py:_module_registers_tools` AST scanner picks it up.
- The `wot` toolset is auto-created at module load time via `toolsets.create_custom_toolset(...)` so `-t wot` validates without modifying `toolsets.py`.
- The skill loads via `--skills coordination/web-of-thought` and prefixes the system prompt with methodology guidance.

How this fits next to existing Hermes multi-agent surfaces
Hermes already ships several multi-agent / delegation primitives. WoT is additive, not redundant — it fills a specific gap none of them serve.
[Comparison table: surviving cells are `delegate_task`, `mixture_of_agents`, (`wot_chat`), and @name-mentions.]

What WoT specifically adds: in-process live multi-agent reasoning where inner agents can address each other directly and the outer agent sees the full transcript as it forms. That's the niche the existing surfaces don't fill: `delegate_task` deliberately hides intermediate output, MoA's reference models don't see each other, and Kanban's polling comments aren't real-time. Several long-open feature requests (#412 consensus/voting, #376 adversarial debate, #479 best-of-N + judge, #5876 multi-agent council) all reduce to this missing primitive.

WoT does not replace any of the above. They compose: outer Hermes can call `delegate_task` for durable cross-process work, dispatch `wot_chat` for live debate within its own turn, and use Kanban for cross-session orchestration. They're complementary.

Validated end-to-end
Setup: Ubuntu 24.04 + RTX 4050. Isolated Hermes install (separate `HERMES_HOME`, no overlap with any production setup).

1. Local llama.cpp + Qwen3-4B-Instruct-Q4_K_M (`--parallel 4 --jinja --ctx-size 65536`):
- 5/5 sessions completed, 48 WoT messages across runs, 0 inner errors
2. OpenRouter + DeepSeek-V4-Flash (with skill loaded):
- 5/5 sessions ([…] `errors[]`)
- `max_rounds: 3` explicitly set on 5/5, `token_budget` on 3/5, ~43% latency drop vs no-skill baseline

Coverage: honest framing
Integration-validated end-to-end (with V4 Flash via OpenRouter as inner agents, full session JSONL captured):
- `parallel` mode: 5/5 sessions clean, 48 WoT messages, multi-round @-mention emergence
- `streaming` mode: 22 streaming chunks + 3 final messages produced, `stop_reason=all_done` clean
- `sequential` mode: agent ordering preserved across 2 rounds with cross-round @-mentions (round-2 alpha addresses round-1 beta)
- `queue` mode: interests tags drove tag-prefixed output ([design]→[code]→[review]), 3 rounds completed
- `turn_timeout`: standalone test, 2/2 agents timed out at 2.0s as configured, errors surfaced via `errors[]`
- `model` stripping (saved a run when V4 Flash hallucinated `gpt-4o`)
- `/v1` suffix doubling fix (caught the OpenRouter 404)

Routing-validated (engine routes correctly; downstream model output quality is upstream's concern):
- Ollama `/api/chat` path for thinking models: backend probe identifies `kind='ollama'`, request hits `/api/chat` (not `/v1/`), and the engine parses both `message.content` and `message.thinking` fields. Tested live with `deepseek-r1:1.5b` and `deepseek-r1:7b`. Output quality of small R1 distills and Ollama's template handling are known to be broken upstream; the engine correctly returns whatever Ollama emits.

Unit-test only (no integration run on this PR):
Multi-backend mix: integration-validated (added in second commit `b1e8872`):
- `AgentSpec` now has optional `base_url` + `api_key` fields for per-agent backend override
- `_LLMClient` caches backend probes per-base_url so each unique target is only probed once
- alpha on DeepSeek-V4-Flash via OpenRouter (`https://openrouter.ai/api/v1`)
- beta on `deepseek-r1:1.5b` via local Ollama (`http://127.0.0.1:11434`)
- Probe cache shows both targets: `https://openrouter.ai/api → openai-compat` and `http://127.0.0.1:11434 → ollama`
- 0 engine errors, both transcripts assembled with correct `from`-attribution
- `wot_chat_tool` boundary strips `model` + `base_url` + `api_key` from outer-Hermes-supplied args (Hermes hallucinates them); direct Python callers using `AgentSpec(base_url=..., api_key=...)` still work

Stubbed:
- `propagate_reasoning="summary"`: currently behaves identically to `"strip"`. A real summary mode would distill peer CoT through a small model; deferred to a follow-up. The docstring says so.

Linked issues
Closes (auto-close on merge):
- `parallel` mode + a synthesizer agent delivers exactly the AgentWorkflows-style consensus pattern.
- `sequential` mode with two agents is iterative-refinement debate.
- The `coordination/web-of-thought` skill is the council methodology; the engine is the substrate.
- `parallel` mode with a synthesizer/judge agent is the Best-of-N pattern.

Refs (does not auto-close; partial coverage):
- The `Channel` primitive is the shared memory pool; CAMEL-AI-specific patterns are out of scope here.
- […] `Channel`) and per-agent persona (`system_prompt`).

Test plan
- `pytest -p no:xdist tests/tools/test_wot_engine.py`: 36/36 passing on this branch
- […] `pytest tests/` […] flags)

Backwards compatibility
Pure-add. New tool, new toolset, new skill, new test file. Zero changes to existing code paths.
License
MIT (auto per `CONTRIBUTING.md`).