
docs(AGENTS): add quick reference section for common tasks and architecture decisions #4

Merged
Arvuno merged 7183 commits into main from contrib/hermes-agent/agents-clarity
May 20, 2026

Conversation

@Arvuno commented May 20, 2026

Owner

Summary

Add Quick Reference section at the end of AGENTS.md with two tables:

  1. Common Tasks — run, test, add tool/skill/toolset/config, MCP, gateway, cron, kanban, delegate, skill curator, models, auth
  2. Architecture Decision Quick Links — pointers to key sections by approximate line number

Why this matters

AGENTS.md is 1104 lines. New contributors have to search through it to find the right section for common tasks. A quick-reference table at the end makes the guide more navigable without changing any existing content.

Validation

Content is factual (existing commands and file paths verified against codebase).

Risk

Low — documentation only, no code changes.

helix4u and others added 30 commits May 18, 2026 20:36
Salvages NousResearch#21585 by @helix4u. Documents the protocol_violation event
(worker exits successfully while task is still running), adds
--max-retries to the create flag list and --failure-limit to dispatch.
Salvages NousResearch#23368 by @uzunkuyruk. Oneshot workers (e.g. kanban workers
spawned via 'hermes -p <profile> chat -q ...') were not honouring the
profile's fallback_providers / fallback_model chain because oneshot.py
never read the config and never passed fallback_model= to AIAgent.

Reads cfg.get('fallback_providers') (new list format) or
cfg.get('fallback_model') (legacy single-dict) with the same
normalization cli.py applies, then forwards as fallback_model=_fb.
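
The two config shapes mentioned above could be normalized roughly as follows. This is a minimal sketch, not the code in oneshot.py or cli.py: the helper name `normalize_fallback` and the shape of each entry are assumptions, and only the two config keys come from the commit message.

```python
from typing import Any, Dict, List


def normalize_fallback(cfg: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return fallback entries as a list, whichever config format is present.

    Hypothetical helper: prefers the new list format and falls back to the
    legacy single-dict format.
    """
    providers = cfg.get("fallback_providers")  # new list format
    if isinstance(providers, list) and providers:
        # Keep only well-formed (dict) entries.
        return [entry for entry in providers if isinstance(entry, dict)]
    legacy = cfg.get("fallback_model")  # legacy single-dict format
    if isinstance(legacy, dict) and legacy:
        return [legacy]
    return []
```

With a helper like this, the caller can forward one normalized value regardless of which key the profile uses.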
Salvages NousResearch#24050 by @kronexoi. The single-task PATCH already rejects
direct status='running' since it bypasses the dispatcher/claim invariant,
but the bulk-update endpoint still accepted it. Aligns bulk with single
by emitting an error result row for any 'running' entry.
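
A sketch of the bulk behaviour described above, assuming each bulk entry is a dict with `id` and `status` keys and that results are returned as one row per entry. The function name, row shape, and error text are hypothetical, not taken from the endpoint.

```python
from typing import Any, Dict, List


def bulk_update_sketch(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return one result row per entry, rejecting direct status='running'."""
    results = []
    for entry in entries:
        task_id = entry.get("id")
        if entry.get("status") == "running":
            # Mirror the single-task rule: only the dispatcher may claim a task.
            results.append({
                "id": task_id,
                "ok": False,
                "error": "status 'running' can only be set by the dispatcher",
            })
            continue
        # A real implementation would apply the update here.
        results.append({"id": task_id, "ok": True})
    return results
```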
Salvages NousResearch#27526 by @shunsuke-hikiyama. Adds an --initial-status flag
(running|blocked, default running) to 'kanban create', threaded through
kanban_db.create_task() and the kanban_create tool schema. 'blocked'
parks the task directly in the blocked column for R3 human-ops review,
skipping the brief running-to-blocked transition.

Dropped the unrelated 'add' alias, WIFEXITED Windows compat, and
slash-handler error formatting changes that were bundled in the
original PR — those should ship as their own focused changes if still
wanted.
…etion

Salvages NousResearch#27369 by @LeonJS. complete_task() now calls _cleanup_workspace()
and _cleanup_worker_tmux() after marking a task complete.

Scratch workspaces (used by swarm agents) accumulate on disk — hundreds
of MB per task, never released. Stale tmux sessions from completed
agents also persist indefinitely.

Both gates are safe:
- workspace_kind == 'scratch' gate preserves user worktree/dir workspaces
- tmux #{pane_dead} == 1 gate only kills sessions where the worker has
  already exited
- best-effort: cleanup failures never block task completion
Salvages NousResearch#26431 by @LeonSGP43. Dashboard plugin_api list_diagnostics
was using exact-match (severity == filter), so '--severity warning'
hid 'error' and 'critical' diagnostics. Adds severity_at_or_above()
helper to kanban_diagnostics and uses it in the dashboard endpoint
(CLI already used SEVERITY_ORDER comparison correctly).
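
The difference between exact-match and threshold filtering can be illustrated with a short sketch. The contents of `SEVERITY_ORDER` and the helper's signature are assumptions here; the commit message only names them.

```python
# Assumed ordering, lowest to highest; the real SEVERITY_ORDER may differ.
SEVERITY_ORDER = ("info", "warning", "error", "critical")


def severity_at_or_above(severity: str, threshold: str) -> bool:
    """True when `severity` ranks at or above `threshold`."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(threshold)


diagnostics = [
    {"msg": "slow worker", "severity": "warning"},
    {"msg": "spawn failed", "severity": "error"},
    {"msg": "db corrupt", "severity": "critical"},
    {"msg": "tick ok", "severity": "info"},
]

# Exact match (the old behaviour) keeps only the warning row.
exact = [d for d in diagnostics if d["severity"] == "warning"]
# Threshold match keeps warning, error, and critical.
at_or_above = [d for d in diagnostics if severity_at_or_above(d["severity"], "warning")]
```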
Salvages the substantive part of NousResearch#22295 by @steezkelly. Adds the
missing HERMES_KANBAN_HOME, HERMES_KANBAN_RUN_ID, HERMES_KANBAN_CLAIM_LOCK,
HERMES_KANBAN_DISPATCH_IN_GATEWAY entries to _HERMES_BEHAVIORAL_VARS so
ambient developer-shell pins on those vars don't bleed into pytest runs.

The frozenset extraction + standalone regression test from the original
PR were dropped to keep the change minimal — main already maintains the
list inline.
Salvages NousResearch#22981 by @SimbaKingjoe. Adds 'kanban.max_in_progress' config
that caps simultaneously running tasks. When the board already has N
running, dispatcher skips spawning so slow workers (local LLMs,
resource-constrained hosts) don't pile up and time out.

Threads through dispatch_once(max_in_progress=) and gateway dispatcher
config parsing with validation (warns on invalid/below-1 values).
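
A minimal sketch of the cap described above, assuming an unset or invalid value means "no cap". The helper names `parse_max_in_progress` and `should_spawn` are hypothetical; only the `kanban.max_in_progress` key and the warn-on-invalid behaviour come from the commit message.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def parse_max_in_progress(raw: Any) -> Optional[int]:
    """Return a positive cap, or None (no cap) when unset or invalid."""
    if raw is None:
        return None
    try:
        value = int(raw)
    except (TypeError, ValueError):
        logger.warning("ignoring invalid kanban.max_in_progress: %r", raw)
        return None
    if value < 1:
        logger.warning("ignoring kanban.max_in_progress below 1: %r", raw)
        return None
    return value


def should_spawn(running_count: int, max_in_progress: Optional[int]) -> bool:
    """Skip spawning once the board already has `max_in_progress` running."""
    return max_in_progress is None or running_count < max_in_progress
```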
Salvages NousResearch#23738 by @LeonSGP43. Wheel installs were missing skills/ and
optional-skills/ because pyproject's [tool.setuptools.packages.find]
only includes Python packages — the skills directories don't have
__init__.py so they were silently dropped from the wheel.

Adds setup.py with data_files spec emitting skills/* and optional-skills/*
under hermes_agent-<v>.data/data/, and a get_bundled_skills_dir() helper
in hermes_constants that discovers the wheel-installed location via
sysconfig before falling back to a source-checkout path. tools/skills_sync
uses the helper so 'hermes update' works for pip-installed users.
Salvages NousResearch#23302 by @Bartok9. Four independent one-area fixes:

1. kanban boards delete alias now hard-deletes (not archives) — the
   alias didn't carry --delete, so getattr(args, 'delete', False)
   returned False. Detect boards_action=='delete' explicitly.
2. Gateway auto-title failures no longer leak as user-visible
   warnings — debug-log only since they're not actionable.
3. Background process completion notification snaps truncation to
   the next newline boundary, prepends a marker when content is
   dropped.
4. _cprint() schedules the run_in_terminal coroutine via
   asyncio.ensure_future so output isn't silently dropped from
   background threads (fixes NousResearch#23185 Bug A). Skips the
   double-print fallback that would fire for mock paths.
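
Item 3 can be read as keeping the tail of the output and starting it at a line boundary. The sketch below follows that reading; the function name, the marker text, and the choice to keep the tail are assumptions, not details from the original PR.

```python
def tail_on_line_boundary(
    text: str,
    max_chars: int,
    marker: str = "[earlier output truncated]\n",
) -> str:
    """Keep roughly the last `max_chars` characters, starting on a full line."""
    if len(text) <= max_chars:
        return text  # nothing dropped, so no marker
    cut = len(text) - max_chars
    newline = text.find("\n", cut)
    if newline != -1:
        cut = newline + 1  # snap forward to the start of the next line
    return marker + text[cut:]
```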
Salvages NousResearch#24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens)
is session-static — the dispatcher decides at spawn time whether the
process is a kanban worker via the kanban_show tool's check_fn (gated
on HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in
valid_tool_names and re-loading the reference on every system-prompt
rebuild (init + each context compression) is wasted work.

Caches the resolved string on agent._kanban_worker_guidance once in
agent_init and consumes it in system_prompt.build_system_prompt(),
with a getattr fallback for code paths that bypass agent_init.
Salvages NousResearch#25745 by @LizerAIDev. Adds --sort {created,created-desc,
priority,priority-desc,status,assignee,title,updated} to 'hermes kanban
list'. Validated against VALID_SORT_ORDERS map; invalid values raise
ValueError. Default behaviour (priority DESC, created ASC) is unchanged
when --sort is omitted.
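
Validation against a map of allowed values might look like the sketch below. The SQL fragments and column names are assumptions; the sort keys, the `ValueError`, and the default ordering are taken from the commit message.

```python
from typing import Optional

# Hypothetical mapping from --sort values to ORDER BY fragments.
VALID_SORT_ORDERS = {
    "created": "created_at ASC",
    "created-desc": "created_at DESC",
    "priority": "priority ASC",
    "priority-desc": "priority DESC",
    "status": "status ASC",
    "assignee": "assignee ASC",
    "title": "title ASC",
    "updated": "updated_at DESC",
}

DEFAULT_ORDER_BY = "priority DESC, created_at ASC"


def resolve_order_by(sort: Optional[str]) -> str:
    """Map a --sort value to an ORDER BY clause, rejecting unknown values."""
    if sort is None:
        return DEFAULT_ORDER_BY  # unchanged default when --sort is omitted
    try:
        return VALID_SORT_ORDERS[sort]
    except KeyError:
        raise ValueError(f"invalid sort order: {sort!r}") from None
```

Looking the value up in a fixed map, rather than interpolating it into SQL, also keeps user input out of the query text.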
… inspect)

Adds three read-only endpoints to the kanban dashboard plugin so the
SwitchUI workspace (and any other dashboard consumer) can track
workers across tasks without N+1 round-trips through /tasks/{task_id}.

- GET /workers/active
  Single SQL JOIN of task_runs + tasks where ended_at IS NULL,
  worker_pid IS NOT NULL, status='running'. Returns
  {workers: [...], count, checked_at}.

- GET /runs/{run_id}
  Direct lookup of any task_run row by id. Reuses existing
  kanban_db.get_run() helper and _run_dict() serialiser. 404 when
  not found. Mirrors GET /tasks/{task_id} 404 shape.

- GET /runs/{run_id}/inspect
  Live PID stats via psutil.Process.as_dict() — cpu_percent,
  memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status,
  create_time, cmdline. Short-circuits with alive:false when run
  has ended, has no worker_pid, the pid is gone, or psutil is
  unavailable. AccessDenied surfaces as alive:true with error
  rather than a 500.

11 new tests in tests/plugins/test_kanban_worker_runs.py cover the
empty-board case, running-task case, ended-run filtering,
missing-pid filtering, 404 paths, already-ended inspect, no-pid
inspect, dead-pid inspect, and live-pid inspect (psutil mocked).
All pass.

Companion termination endpoint (POST /runs/{run_id}/terminate) is
intentionally out of scope here — opening a separate issue first
since the RBAC and dispatcher-mediated soft-cancel design needs
maintainer input before code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ng (NousResearch#26744)

- Existing ``test_patch_drag_drop_move_todo_to_ready`` now asserts the
  enriched 409 detail names the blocking parent (id, quoted title, and
  current status), so the dashboard always has something actionable to
  render.
- New bundle-assertion test ``test_dashboard_surfaces_ready_blocked_error_inline``
  pins the frontend wiring: the ``parseApiErrorMessage`` helper exists,
  the drag/drop banner runs through it, and the drawer maintains a
  visible ``patchErr`` state that's cleared between PATCHes and tasks.
…usResearch#27941)

Update the Codex app-server runtime guide's Kanban section to reflect
the new behaviour:

  * The sandbox override now adds the board DB directory plus every
    Kanban path the dispatcher pinned (HERMES_KANBAN_WORKSPACES_ROOT,
    HERMES_KANBAN_WORKSPACE, legacy HERMES_KANBAN_ROOT) -- deduplicated,
    DB-dir first.
  * The motivation note now includes the cross-mount artifact-write
    scenario (e.g. ``/media/.../kanban-workspaces/...`` on a separate
    drive) and links to issue NousResearch#27941 so readers can find the original
    bug report.
Salvages substantive part of NousResearch#26490 by @aqilaziz. Detects corrupt board
DBs ("file is not a database" / "database disk image is malformed")
and disables them by fingerprint until they're repaired, instead of
flooding the gateway log with repeated logger.exception tracebacks every
tick.

Cherry-picked the substantive commit (ea5b4ec); the tip commit was
an unrelated _is_dir OSError fix for service-path lookup. Dropped a
small test reformat that was bundled in the same commit.
Salvages NousResearch#28199 by @bensargotest-sys. Aligns Kanban docs with current
tool registration: dispatcher-spawned task workers get task tools,
profiles that explicitly enable the kanban toolset get orchestrator
routing tools (kanban_list, kanban_unblock). Corrects failure-limit
text to current default of 2. Hardens the e2e subprocess script to
resolve repo root and use the spawnable default assignee. Updates the
diagnostics severity fixture to assert error below the critical
threshold.
Salvages NousResearch#26897 by @loicnico96. The per-task model_override DB column
already exists on main, but it wasn't exposed in user-facing surfaces.
This adds:
- 'kanban show' prints 'model: <name>' when model_override is set
- kanban_show / kanban_list tool responses include the model_override field

Original branch was stale (PR was authored against an older field name
'model'); applied the substantive surface exposure manually using the
current 'model_override' field name.
Salvages NousResearch#26791 by @Niraven. Adds 'hermes kanban swarm' to create a
durable Kanban Swarm v1 graph: a completed root/blackboard card,
parallel worker cards, a verifier gated on all workers, and a
synthesizer gated on the verifier. Stores shared swarm blackboard
updates as structured JSON comments on the root card.

Self-contained: new hermes_cli/kanban_swarm.py module + CLI wiring +
unit tests.
Salvages NousResearch#27598 by @nnnet. Adds optional 'board' parameter to all 9
kanban_* MCP tools via shared _connect helper. Backwards compatible —
omitting board keeps current pinned-board behavior. Useful for
orchestrator profiles that route across multiple boards.

Two-file scope: tools/kanban_tools.py + tests.
Salvages NousResearch#23208 by @awizemann. Tracks which chat session created a
kanban task so clients can render a per-session board without falling
back to tenant + time-window heuristics.

- Schema: tasks gains nullable session_id TEXT column with index
  (additive migration in _migrate_add_optional_columns).
- ACP: server.py exposes the originating session id via HERMES_SESSION_ID
  with save/restore around the agent loop.
- Tool: kanban_create reads HERMES_SESSION_ID (with explicit override).
- CLI: 'hermes kanban list --session <id>' filter; JSON output exposes
  session_id.
…olumn

Salvages NousResearch#23772 by @thewillhuang. Adds 'review' as a valid kanban task
status and extends dispatch_once to monitor the review column as a
second dispatch source (in addition to the existing ready column).

- Adds 'review' to VALID_STATUSES
- Adds claim_review_task() — atomically transitions review → running
- Adds has_spawnable_review() — health telemetry mirror
- Extends dispatch_once with a review column dispatch loop
- Review agents get 'sdlc-review' skill auto-loaded

Resolved 2 conflicts (VALID_STATUSES merge with main's 'scheduled' state,
test file additions). Adapted claim_review_task to main's
ttl_seconds: Optional[int] = None convention (matches claim_task).
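
An atomic review → running transition can be expressed as a single guarded UPDATE, so that two dispatchers cannot both claim the same task. The sketch below uses an in-memory SQLite table with a simplified, assumed schema; it omits `ttl_seconds` and is not the real `claim_review_task`.

```python
import sqlite3


def claim_review_task_sketch(conn: sqlite3.Connection, task_id: str) -> bool:
    """Move a task from 'review' to 'running'; True only if this call won."""
    cur = conn.execute(
        "UPDATE tasks SET status = 'running' WHERE id = ? AND status = 'review'",
        (task_id,),
    )
    conn.commit()
    # rowcount is 1 only when the row was still in 'review' at update time.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO tasks VALUES ('t1', 'review')")

first = claim_review_task_sketch(conn, "t1")   # wins the claim
second = claim_review_task_sketch(conn, "t1")  # task is already running
```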
Salvages NousResearch#23790 by @thewillhuang. Adds detect_stale_running() to
the dispatcher cycle. Running tasks that have been started for longer
than dispatch_stale_timeout_seconds (default 14400 = 4h) without a
heartbeat in the last hour are auto-reclaimed to ready.

- New config kanban.dispatch_stale_timeout_seconds (default 14400, 0 disables)
- New 'stale' field on DispatchResult
- detect_stale_running() in kanban_db.py with heartbeat freshness check
- Records outcome='stale' on run close + 'stale' event; ticks failure counter
- Wires config through gateway embedded dispatcher
- Updates _cmd_dispatch verbose/JSON output and daemon logging

Resolved test-file end-of-file conflict by appending both halves.
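
The staleness rule described above combines two checks: run age and heartbeat freshness. A minimal sketch, assuming timestamps are epoch seconds and that "the last hour" is a fixed 3600-second window; the function and constant names are hypothetical.

```python
from typing import Optional

HEARTBEAT_FRESH_SECONDS = 3600  # assumed fixed "last hour" window


def is_stale(
    started_at: float,
    last_heartbeat_at: Optional[float],
    now: float,
    timeout_seconds: int = 14400,  # 4 hours, the documented default
) -> bool:
    """True when a running task should be reclaimed to ready."""
    if timeout_seconds <= 0:
        return False  # 0 disables stale detection
    if now - started_at <= timeout_seconds:
        return False  # not running long enough yet
    if last_heartbeat_at is not None and now - last_heartbeat_at < HEARTBEAT_FRESH_SECONDS:
        return False  # a recent heartbeat means the worker is still alive
    return True
```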
Salvages NousResearch#26745 by @nehaaprasaad. Exposes filtering for the existing
workflow_template_id and current_step_key columns:

- list_tasks() accepts workflow_template_id and current_step_key kwargs
- 'hermes kanban list' adds matching CLI flags
- dashboard plugin_api also exposes the filters

Resolved a small conflict in list_tasks signature alongside main's
session_id and order_by additions; combined all three into the single
filter list.
Salvages NousResearch#27484 by @fardoche6. Adds a respawn guard that skips worker
spawn for tasks where:
- a recent run already succeeded (recent_success — within guard window)
- the previous run hit a quota/auth error (blocker_auth, also auto-blocks)
- a recent task comment includes a GitHub PR URL (active_pr)

The guard prevents repeat worker storms on the same bug/task. Includes
the contributor's review-findings fixup (regex hardening, observability,
auth coverage).

Resolved a small DispatchResult conflict alongside main's 'stale' field;
kept both. Authorship preserved via rebase merge.
Salvages NousResearch#27568 by @SerenityTn. Dashboard cron page now lists cron
jobs from all profiles, with profile-aware filter UI and storage
routing. Includes test coverage for cross-profile listing, mutation,
deletion, and validation.

Also fixes orphan conflict markers in config.py left by an earlier
salvage merge (kanban.dispatch_stale_timeout_seconds was double-nested
in HEAD/PR markers from NousResearch#28452 salvage of NousResearch#23790).
…rch#28458)

PR NousResearch#28452 (salvage of NousResearch#23790, stale detection) merged with leftover
git conflict markers in hermes_cli/config.py around the
`dispatch_stale_timeout_seconds` config block, breaking config import
and any code path that loads it. Cleans up the markers and keeps both
config blocks (worker log rotation/orchestrator + stale detection).

Resolves a self-introduced regression.
…rch#28459)

PR NousResearch#28454 (salvage of NousResearch#26745, workflow filter) merged with leftover
git conflict markers in hermes_cli/kanban.py at three sites:
- _task_to_dict() (session_id alongside workflow_template_id/current_step_key)
- p_list parser (--sort alongside --workflow-template-id/--step-key)
- _cmd_list (order_by alongside the new filter kwargs)

Cleans up the markers and keeps both halves at each site.

Resolves a self-introduced regression.
Julientalbot and others added 27 commits May 20, 2026 09:18
state.db now stores every message field the JSON snapshot stored. Removed
the method, all 7 call-sites, and ~13 test stubs that suppressed its file I/O.
Body is in git history if it ever needs to come back.
The attribute no longer exists; nothing to re-point.
Only caller was the removed _save_session_log. Also removes the unused
convert_scratchpad_to_think and has_incomplete_scratchpad imports from
run_agent.py (both still used elsewhere via their own imports).
…tespace

Adds TestNoSessionJsonSnapshot to lock the contract that session_log_file
attribute, _save_session_log method, and the per-session JSON snapshot
writer are gone. logs_dir is retained for request_dump_*.json.

Also cleans up stray trailing whitespace in test_run_agent_codex_responses
introduced when the _save_session_log stub line was deleted.
The email "jonny@nousresearch.com" belongs to @yoniebans (GitHub id
5584832, display name "jonny"), not to Jeffrey Quesnelle (@jquesnelle,
id 687076, who commits as emozilla@nousresearch.com).  Verified across
all 60 historical commits on the repo authored from this email — every
one of them was a yoniebans commit being mis-credited to jquesnelle in
the changelog.

Surfaced while salvaging PR NousResearch#29182 (yoniebans's session-log refactor).
PR NousResearch#29182 deleted the per-session JSON snapshot writer outright because
state.db is canonical and the snapshots had no in-tree consumer.  Some
users have external tooling that reads `~/.hermes/sessions/session_{sid}.json`
directly, so reintroduce the writer behind a config flag that defaults
to off.

- Add `sessions.write_json_snapshots` (default False) to DEFAULT_CONFIG
- Restore `AIAgent._save_session_log` + `_clean_session_content` as
  gated methods.  When the flag is off the call is a fast no-op; when
  on, the writer behaves as before (atomic write, truncation guard
  preserved, REASONING_SCRATCHPAD → think tag normalization)
- Re-derive the target path from `agent.session_id` on each call so
  `/branch` and `/compress` re-points happen automatically — no need
  to restore the explicit re-point bookkeeping at call sites
- Wire the single call site in `_persist_session` (the cleanup-on-exit
  hook).  Did NOT restore the 7 intra-turn calls the original PR deleted
  — those were redundant writes within the same turn that doubled disk
  I/O without adding any persistence guarantee `_persist_session` does
  not already provide
- Read the flag once at agent init via `load_config()`, cache as
  `agent._session_json_enabled`
- Update `TestNoSessionJsonSnapshot` → `TestSessionJsonSnapshotOptIn`
  to pin behavior: default off (no file), opt-in true (file written),
  no-op method on default agents, logs_dir retained unconditionally
- Update CONTRIBUTING.md and the bundled `hermes-agent` skill to
  document the flag and its default
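
The gating and path re-derivation described above can be sketched as follows. The class is a hypothetical stand-in for the agent, not `AIAgent` itself; the atomic write uses a temp file plus `os.replace`, which is an assumption about how "atomic write" is achieved.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional


class SnapshotWriterSketch:
    def __init__(self, sessions_dir: Path, enabled: bool = False) -> None:
        self.sessions_dir = sessions_dir
        self.enabled = enabled  # read once at init, default off
        self.session_id = "s1"

    def save(self, messages: List[Dict[str, Any]]) -> Optional[Path]:
        if not self.enabled:
            return None  # fast no-op when the flag is off
        # Re-derive the path on each call so a changed session_id is picked up.
        target = self.sessions_dir / f"session_{self.session_id}.json"
        self.sessions_dir.mkdir(parents=True, exist_ok=True)
        fd, tmp_name = tempfile.mkstemp(dir=self.sessions_dir, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(messages, fh)
        os.replace(tmp_name, target)  # atomic rename over the final path
        return target
```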
…ousResearch#29426)

`splitReasoning()` strips paired `<think>…</think>` blocks first, then runs
an unclosed-trailing regex to catch reasoning that hasn't yet streamed its
closer. That second regex was unanchored and greedy:

    new RegExp(`<${tag}>([\\s\\S]*)$`, 'i')

So any literal `<think>` somewhere in prose — a model quoting the tag, a
code example, or a stream-mid-tag before the closer arrives — consumed
every paragraph after it to EOF. User-visible symptom: "TUI eats last
paragraph of output," both during streaming and on settled turns.

Real reasoning streams always lead the message (that's the only place an
unclosed opener can legitimately appear during streaming). Anchor the
regex to `^\s*` so mid-prose mentions of the tag are preserved.

Empirical repro before the fix:

    splitReasoning('final answer paragraph one.\n\n<think>internal note\n\nfinal answer paragraph two.')
    → text: 'final answer paragraph one.'        ← paragraph two GONE

After:

    → text: 'final answer paragraph one.\n\n<think>internal note\n\nfinal answer paragraph two.'

Updated the existing trailing-unclosed test to lead with `<think>` (the
real-world shape) and added a regression test pinning the mid-text case.

ui-tui type-check clean, 808/808 vitest pass.
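
The effect of anchoring can be reproduced outside the TUI with a Python port of the two patterns. This is an illustration of the regex change only, not the `splitReasoning()` implementation; the helper name is hypothetical.

```python
import re

TAG = "think"

# Old pattern: matches an opener anywhere and swallows everything to the end.
UNANCHORED = re.compile(rf"<{TAG}>([\s\S]*)$", re.IGNORECASE)
# New pattern: only matches when the opener leads the message.
ANCHORED = re.compile(rf"^\s*<{TAG}>([\s\S]*)$", re.IGNORECASE)


def strip_unclosed(text: str, pattern: "re.Pattern[str]") -> str:
    """Remove an unclosed trailing reasoning block according to `pattern`."""
    return pattern.sub("", text).strip()


sample = (
    "final answer paragraph one.\n\n"
    "<think>internal note\n\n"
    "final answer paragraph two."
)

before = strip_unclosed(sample, UNANCHORED)
after = strip_unclosed(sample, ANCHORED)
leading = strip_unclosed("<think>still reasoning", ANCHORED)
```

Here `before` loses paragraph two, `after` keeps the whole message, and a message that leads with an unclosed opener is still treated as reasoning.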
NousResearch#28975)

Bumps [ws](https://github.com/websockets/ws) from 8.20.0 to 8.20.1.
- [Release notes](https://github.com/websockets/ws/releases)
- [Commits](websockets/ws@8.20.0...8.20.1)

---
updated-dependencies:
- dependency-name: ws
  dependency-version: 8.20.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…h#28889)

Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.5.6 to 7.6.0.
- [Release notes](https://github.com/protobufjs/protobuf.js/releases)
- [Changelog](https://github.com/protobufjs/protobuf.js/blob/protobufjs-v7.6.0/CHANGELOG.md)
- [Commits](protobufjs/protobuf.js@protobufjs-v7.5.6...protobufjs-v7.6.0)

---
updated-dependencies:
- dependency-name: protobufjs
  dependency-version: 7.6.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [idna](https://github.com/kjd/idna) from 3.11 to 3.15.
- [Release notes](https://github.com/kjd/idna/releases)
- [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.md)
- [Commits](kjd/idna@v3.11...v3.15)

---
updated-dependencies:
- dependency-name: idna
  dependency-version: '3.15'
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ault (NousResearch#29021)

* fix(deps): bump pydantic to 2.13.4 to avoid pydantic-core thread segfault

pydantic-core 2.41.5 (pulled by pydantic==2.12.5) segfaults when the
OpenAI SDK's Responses API resource (client.responses.create /
client.responses.stream) is exercised from a non-main threading.Thread.

Hermes always dispatches codex_responses calls from a daemon thread in
agent/chat_completion_helpers.py:_call, so the crash is 100%
reproducible whenever the active provider is xai-oauth or openai-codex.
Symptom: `hermes -z "ping"` (or any oneshot path) dies with SIGSEGV /
exit 139 and zero output — hermes_cli/oneshot.py redirects stderr to
/dev/null, hiding the crash.

Bumping pydantic to 2.13.4 pulls in pydantic-core 2.46.4, which
eliminates the crash. Verified end-to-end: `hermes -z "ping"` against
xai-oauth/grok-4.3 now returns the expected response.

Minimal repro (any OpenAI base_url; not xAI-specific):

    import threading
    from openai import OpenAI
    cli = OpenAI(api_key="sk-bogus", base_url="https://api.openai.com/v1")
    def go():
        try: cli.responses.create(model="gpt-4o", input="ping")
        except BaseException as e: print(type(e).__name__)
    threading.Thread(target=go).start()
    # → SIGSEGV with pydantic-core 2.41.5; clean 401 with 2.46.4

* chore(deps): regenerate uv.lock for pydantic 2.13.4 bump
state.db is canonical. The 'use whichever source is longer' branch was
defensive code for the pre-DB migration; on every real DB it has not
fired (verified on a session corpus with 27 jsonl files / 950 sessions —
zero jsonl-bigger cases).

Test changes:
- TestLoadTranscriptCorruptLines: deleted (tested dead JSONL code path)
- TestLoadTranscriptPreferLongerSource: deleted (tested removed fallback)
- Replaced with TestLoadTranscriptDBOnly (DB-only reads)
- TestSessionStoreRewriteTranscript: fixture now creates DB session
- test_gateway_retry_replaces_last_user_turn: fixture uses real DB
Yuanbao's recall feature was reading the gateway JSONL directly to look up
messages by platform message_id, which state.db does not preserve. Migrated
to use load_transcript() which returns DB messages.

Recall branch A1 (message_id match) now falls through to A2 (content match)
or B (system note) for all sessions — a documented degradation. Follow-up
issue: add platform_message_id column to state.db messages to restore
exact-id matching.
…te_transcript

state.db is canonical. JSONL transcripts were a transition fallback;
the fallback was removed in the previous commit. Existing *.jsonl files
on disk are left untouched.
Mirror messages are persisted via _append_to_sqlite. JSONL writer was
a redundant dual-write. Updated test assertions from JSONL file checks
to SQLite mock verification.
…db writes

Fixtures that instantiate SessionStore() trigger SessionDB() with no args,
which resolves to ~/.hermes/state.db via the DEFAULT_DB_PATH module constant
(snapshot of get_hermes_home() at hermes_state import time).

The autouse _hermetic_environment fixture in tests/conftest.py monkeypatches
HERMES_HOME env, but DEFAULT_DB_PATH is already cached by then. Per-test
monkeypatch.setattr(hermes_state, 'DEFAULT_DB_PATH', tmp_path/'state.db')
forces the DB into tmp_path so the tests can't leak into the real profile.

Verified by counting u1-prefixed sessions in real state.db before/after:
delta=0.
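
The caching problem described above is ordinary Python evaluation order: a module-level constant is computed once, so a later environment change does not affect it. A self-contained sketch, using a made-up `DEMO_HOME` variable and `get_home()` function in place of the real `HERMES_HOME` and `get_hermes_home()`:

```python
import os
from pathlib import Path


def get_home() -> Path:
    """Resolve the home directory from the environment on each call."""
    return Path(os.environ.get("DEMO_HOME", "/default-home"))


os.environ.pop("DEMO_HOME", None)

# Evaluated once, like a module constant at import time.
DEFAULT_DB_PATH = get_home() / "state.db"

# Later, a test fixture changes the environment...
os.environ["DEMO_HOME"] = "/tmp/test-home"

# ...which affects fresh lookups but not the cached constant.
late_lookup = get_home() / "state.db"

os.environ.pop("DEMO_HOME", None)  # clean up the demo variable
```

This is why patching the environment alone is not enough and the constant itself has to be replaced per test.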
…fixture

PR NousResearch#29211 review findings:

1. test_retry_replacement: pin DEFAULT_DB_PATH so SessionDB() doesn't write
   to the real ~/.hermes/state.db. Same fix as the other DB-only fixtures.

2. yuanbao recall branch A1 (message_id exact match) was structurally dead
   once load_transcript() became DB-only — state.db never preserves the
   platform message_id. Removed the dead loop, consolidated to a single
   content-match branch (renamed 'A: content match'). Branch B (system
   note) unchanged. Updated the test name + docstring to reflect this.

Note: self._lock is no longer taken in append_to_transcript (was guarding
the JSONL file append). SQLite append_message handles its own concurrency
via WAL mode, so this is safe; flagging for awareness.
… recall

PR NousResearch#29211 dropped JSONL gateway transcripts and noted that the platform's
own `message_id` field (used by Yuanbao's recall guard to redact a
message by exact platform id) was no longer preserved — falling back to
content-match.  That fallback works for the common case but redacts the
wrong row when two messages share text (or fails to match when content
is post-processed).

Restore exact-id matching by giving state.db a column for it:

- New `platform_message_id TEXT` column on the messages table
  (SCHEMA_VERSION bump 11 → 12; column added via declarative reconciler
  on existing DBs, no version-gated migration block needed)
- Partial index `idx_messages_platform_msg_id` on
  (session_id, platform_message_id) to keep recall's point-lookup cheap
  even on large sessions
- `append_message()` and `replace_messages()` accept the new value:
  the gateway-facing `append_to_transcript` in `gateway/session.py`
  forwards either `message["platform_message_id"]` or the legacy
  `message["message_id"]` key (yuanbao's existing convention)
- `get_messages_as_conversation()` surfaces the column back on the
  message dict as `message_id` so platform code reads the same shape
  it used to read from JSONL
- Yuanbao `_patch_transcript`: restore branch A1 (exact id match)
  ahead of A2 (content match) ahead of B (system-note).  Both branches
  log which one fired so operators can tell from gateway.log whether
  recall hit the canonical path or had to fall back.

Tests:
- New low-level round-trip tests in `test_hermes_state.py` for both
  `append_message` and `replace_messages` paths
- The PR's `test_yuanbao_recall_db_only.py` was rewritten to assert
  the new contract: branch A1 (id match) works against DB-only
  transcripts, and branch A2 (content match) still recovers rows that
  were observed without a platform id (e.g. agent-processed @bot
  messages where run.py doesn't carry msg_id through)
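
The column and index described above can be tried against an in-memory SQLite database. The table is reduced to the columns needed for the demo, and the `WHERE platform_message_id IS NOT NULL` predicate is an assumption about what makes the index partial; the lookup helper is hypothetical.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        session_id TEXT NOT NULL,
        content TEXT,
        platform_message_id TEXT
    );
    CREATE INDEX idx_messages_platform_msg_id
        ON messages (session_id, platform_message_id)
        WHERE platform_message_id IS NOT NULL;
    """
)
conn.executemany(
    "INSERT INTO messages (session_id, content, platform_message_id) VALUES (?, ?, ?)",
    [
        ("s1", "hello", "m-1"),
        ("s1", "hello", "m-2"),  # same text, different platform id
        ("s1", "no platform id", None),
    ],
)


def find_by_platform_id(
    conn: sqlite3.Connection, session_id: str, platform_message_id: str
) -> Optional[int]:
    """Return the row id for an exact platform-id match, or None."""
    row = conn.execute(
        "SELECT id FROM messages WHERE session_id = ? AND platform_message_id = ?",
        (session_id, platform_message_id),
    ).fetchone()
    return row[0] if row else None
```

Two rows share the text "hello", so a content match could not tell them apart, while the id lookup selects exactly one.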
…ecture decisions

Add Quick Reference: Common Tasks table covering run, test, add tool/skill/toolset/config, MCP, gateway, cron, kanban, delegate, skill curator, models, auth.

Add Architecture Decision Quick Links table pointing to key sections by approximate line number.
@github-actions


🚨 CRITICAL Supply Chain Risk Detected

This PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging.

🚨 CRITICAL: Install-hook file added or modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py
setup.py
skills/productivity/google-workspace/scripts/setup.py

Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally — if you're seeing this comment, the finding is worth inspecting.

@github-actions


🔎 Lint report: contrib/hermes-agent/agents-clarity vs origin/main

ruff

Total: 0 on HEAD, 1121 on base (✅ -1121)

🆕 New issues: none

✅ Fixed issues (672):

| Rule | Count |
|---|---|
| F401 | 446 |
| E402 | 73 |
| F841 | 72 |
| F541 | 39 |
| F811 | 15 |
| invalid-syntax | 10 |
| E741 | 7 |
| E401 | 4 |
| F821 | 3 |
| E731 | 2 |
| E702 | 1 |

First entries
../../../../../tmp/lint-base/tools/vision_tools.py:456: [E402] Module level import not at top of file
../../../../../tmp/lint-base/tools/tts_tool.py:717: [E402] Module level import not at top of file
../../../../../tmp/lint-base/tools/clarify_tool.py:129: [E402] Module level import not at top of file
../../../../../tmp/lint-base/tests/cron/test_jobs.py:324: [F841] Local variable `job` is assigned to but never used
../../../../../tmp/lint-base/tools/terminal_tool.py:1200: [F401] `daytona.Daytona` imported but unused
../../../../../tmp/lint-base/tools/web_tools.py:1214: [F541] f-string without any placeholders
../../../../../tmp/lint-base/tools/rl_training_tool.py:1362: [E402] Module level import not at top of file
../../../../../tmp/lint-base/gateway/platforms/whatsapp.py:27: [F401] `typing.List` imported but unused
../../../../../tmp/lint-base/tests/conftest.py:7: [F401] `tempfile` imported but unused
../../../../../tmp/lint-base/tools/delegate_tool.py:747: [E402] Module level import not at top of file
../../../../../tmp/lint-base/tests/agent/test_auxiliary_client.py:19: [F401] `agent.auxiliary_client._resolve_auto` imported but unused
../../../../../tmp/lint-base/tests/gateway/test_slack.py:59: [E402] Module level import not at top of file
../../../../../tmp/lint-base/tests/gateway/test_config_cwd_bridge.py:14: [F401] `pytest` imported but unused
../../../../../tmp/lint-base/tests/test_provider_parity.py:8: [F401] `os` imported but unused
../../../../../tmp/lint-base/tests/gateway/test_email.py:23: [F401] `unittest.mock.AsyncMock` imported but unused
../../../../../tmp/lint-base/tools/terminal_tool.py:1221: [F541] f-string without any placeholders
../../../../../tmp/lint-base/tests/test_insights.py:717: [F841] Local variable `text` is assigned to but never used
../../../../../tmp/lint-base/mini_swe_runner.py:536: [F541] f-string without any placeholders
../../../../../tmp/lint-base/hermes_cli/auth.py:25: [F401] `subprocess` imported but unused
../../../../../tmp/lint-base/tools/web_tools.py:1220: [E402] Module level import not at top of file
../../../../../tmp/lint-base/tests/test_interactive_interrupt.py:12: [F401] `io` imported but unused
../../../../../tmp/lint-base/tests/tools/test_skill_view_traversal.py:9: [F401] `pathlib.Path` imported but unused
../../../../../tmp/lint-base/hermes_cli/auth.py:21: [F401] `shutil` imported but unused
../../../../../tmp/lint-base/tools/code_execution_tool.py:768: [E402] Module level import not at top of file
../../../../../tmp/lint-base/tests/hermes_cli/test_session_browse.py:398: [F401] `argparse` imported but unused
... and 647 more

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8982 on HEAD, 2034 on base (🆕 +6948)

🆕 New issues (3915):

| Rule | Count |
|---|---|
| unresolved-import | 1146 |
| unresolved-attribute | 1098 |
| invalid-argument-type | 832 |
| invalid-assignment | 405 |
| unsupported-operator | 96 |
| invalid-method-override | 71 |
| not-subscriptable | 69 |
| invalid-parameter-default | 43 |
| invalid-return-type | 31 |
| no-matching-overload | 26 |
| call-non-callable | 24 |
| unresolved-reference | 21 |
| unused-type-ignore-comment | 19 |
| invalid-type-form | 12 |
| not-iterable | 6 |

+6 more rules
First entries
tests/gateway/test_google_chat.py:2633: [unresolved-attribute] unresolved-attribute: Class `Platform` has no attribute `GOOGLE_CHAT`
tests/test_tui_gateway_server.py:3430: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["result"]` and `dict[Unknown, Unknown] | None`
gateway/platforms/api_server.py:2912: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `bound method Top[dict[Unknown, Unknown]].__getitem__(key: Never, /) -> object` cannot be called with key of type `Literal["content"]` on object of type `Top[dict[Unknown, Unknown]]`
tests/hermes_cli/test_model_switch_variant_tags.py:11: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
gateway/platforms/dingtalk.py:1238: [unresolved-attribute] unresolved-attribute: Attribute `RobotRecallEmotionRequestTextEmotion` is not defined on `None` in union `Unknown | None`
gateway/platforms/_http_client_limits.py:34: [unresolved-import] unresolved-import: Cannot resolve imported module `httpx`
tests/gateway/test_restart_notification.py:144: [invalid-assignment] invalid-assignment: Object of type `MagicMock` is not assignable to attribute `request_restart` of type `def request_restart(self, *, detached: bool = False, via_service: bool = False) -> bool`
tests/hermes_cli/test_destructive_slash_confirm_gate.py:26: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> Unknown, (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[Unknown]]` cannot be called with key of type `Literal["destructive_slash_confirm"]` on object of type `list[Unknown]`
hermes_cli/nous_subscription.py:86: [no-matching-overload] no-matching-overload: No overload of `dict.__init__` matches arguments
tests/gateway/test_run_progress_topics.py:665: [call-non-callable] call-non-callable: Object of type `None` is not callable
tests/gateway/test_feishu.py:1738: [invalid-assignment] invalid-assignment: Object of type `AsyncMock` is not assignable to attribute `send_image_file` of type `def send_image_file(self, chat_id: str, image_path: str, caption: str | None = None, reply_to: str | None = None, metadata: dict[str, Any] | None = None, **kwargs) -> CoroutineType[Any, Any, SendResult]`
tests/gateway/test_telegram_topic_mode.py:11: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
hermes_cli/plugins_cmd.py:731: [invalid-assignment] invalid-assignment: Object of type `None` is not assignable to `<module 'yaml'>`
tools/web_tools.py:49: [unresolved-import] unresolved-import: Cannot resolve imported module `httpx`
gateway/platforms/mattermost.py:352: [invalid-method-override] invalid-method-override: Invalid override of method `send_image_file`: Definition is incompatible with `BasePlatformAdapter.send_image_file`
tests/agent/test_compressor_historical_media.py:215: [invalid-argument-type] invalid-argument-type: Argument to function `_strip_historical_media` is incorrect: Expected `list[dict[str, Any]]`, found `list[str | dict[str, str | list[dict[str, str] | dict[str, str | dict[str, str]]]] | dict[str, str]]`
tests/cli/test_cli_preloaded_skills.py:8: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tools/send_message_tool.py:815: [unresolved-import] unresolved-import: Cannot resolve imported module `telegram.constants`
gateway/platforms/bluebubbles.py:541: [invalid-method-override] invalid-method-override: Invalid override of method `send_voice`: Definition is incompatible with `BasePlatformAdapter.send_voice`
hermes_cli/kanban_db.py:4898: [invalid-assignment] invalid-assignment: Object of type `None` is not assignable to `def profile_exists(name: str) -> bool`
tests/cli/test_fast_command.py:481: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["service_tier"]` on object of type `str`
tests/gateway/test_feishu.py:1891: [invalid-assignment] invalid-assignment: Object of type `AsyncMock` is not assignable to attribute `_dispatch_inbound_event` of type `def _dispatch_inbound_event(self, event: MessageEvent) -> CoroutineType[Any, Any, None]`
tests/cli/test_cli_init.py:51: [invalid-assignment] invalid-assignment: Object of type `ModuleType` is not assignable to `<module 'cli'>`
cli.py:7084: [invalid-argument-type] invalid-argument-type: Argument to function `switch_model` is incorrect: Expected `str`, found `Unknown | None`
tests/cli/test_cli_provider_resolution.py:72: [unresolved-attribute] unresolved-attribute: Unresolved attribute `FileHistory` on type `ModuleType`
... and 3890 more

✅ Fixed issues (440):

| Rule | Count |
| --- | --- |
| unresolved-import | 123 |
| invalid-argument-type | 106 |
| unresolved-attribute | 74 |
| invalid-assignment | 69 |
| invalid-syntax | 10 |
| invalid-return-type | 9 |
| not-subscriptable | 8 |
| unsupported-operator | 8 |
| invalid-parameter-default | 7 |
| no-matching-overload | 7 |
| not-iterable | 6 |
| unresolved-reference | 4 |
| unknown-argument | 2 |
| invalid-method-override | 2 |
| too-many-positional-arguments | 2 |

+3 more rules

First entries
tools/browser_tool.py:590: [invalid-return-type] invalid-return-type: Return type does not match returned value: expected `dict[str, str]`, found `dict[str, str | dict[str, bool]]`
tests/test_setup_model_selection.py:6: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
rl_cli.py:359: [invalid-assignment] invalid-assignment: Object of type `str | None` is not assignable to `str`
environments/benchmarks/terminalbench_2/terminalbench2_env.py:516: [invalid-syntax] invalid-syntax: Expected `except` or `finally` after `try` block
environments/agent_loop.py:217: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["extra_body"]` and value of type `dict[str, Any] & ~AlwaysFalsy` on object of type `dict[str, list[dict[str, Any]] | int | float]`
tests/honcho_integration/test_async_memory.py:417: [invalid-assignment] invalid-assignment: Object of type `def flaky_flush(s) -> Unknown` is not assignable to attribute `_flush_session` of type `def _flush_session(self, session: HonchoSession) -> bool`
tests/test_resume_display.py:13: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
environments/hermes_swe_env/hermes_swe_env.py:45: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.base`
tools/image_generation_tool.py:213: [invalid-return-type] invalid-return-type: Return type does not match returned value: expected `dict[str, Any]`, found `None`
acp_adapter/server.py:286: [unresolved-attribute] unresolved-attribute: Function `terminal_tool` has no attribute `set_approval_callback`
tests/test_run_agent_codex_responses.py:5: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tools/rl_training_tool.py:1130: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> dict[str, str | int | float], (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[dict[str, str | int | float]]]` cannot be called with key of type `Literal["max_token_length"]` on object of type `list[dict[str, str | int | float]]`
trajectory_compressor.py:50: [unresolved-import] unresolved-import: Cannot resolve imported module `dotenv`
environments/tool_call_parsers/__init__.py:25: [unresolved-import] unresolved-import: Cannot resolve imported module `openai.types.chat.chat_completion_message_tool_call`
gateway/config.py:126: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["token"]` and value of type `str & ~AlwaysFalsy` on object of type `dict[str, bool | dict[str, Any]]`
cli.py:294: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["env_type"]` and value of type `str | int | list[Unknown] | ... omitted 4 union elements` on object of type `dict[str, int]`
tools/session_search_tool.py:282: [invalid-return-type] invalid-return-type: Return type does not match returned value: expected `list[str | Exception]`, found `list[str | None | BaseException]`
environments/tool_call_parsers/qwen3_coder_parser.py:24: [unresolved-import] unresolved-import: Cannot resolve imported module `openai.types.chat.chat_completion_message_tool_call`
acp_adapter/session.py:203: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `list[dict[str, Any]]`, found `str | list[str] | bool`
cli.py:1432: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `str`, found `None`
tests/test_agent_loop_vllm.py:80: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.server_handling.server_manager`
tools/environments/local.py:327: [invalid-argument-type] invalid-argument-type: Argument to function `_sanitize_subprocess_env` is incorrect: Expected `dict[Unknown, Unknown] | None`, found `_Environ[str]`
tools/rl_training_tool.py:1132: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> dict[str, str | int | float], (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[dict[str, str | int | float]]]` cannot be called with key of type `Literal["max_batches_offpolicy"]` on object of type `list[dict[str, str | int | float]]`
run_agent.py:2402: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["summary"]` and value of type `list[Unknown]` on object of type `dict[str, str]`
tools/environments/singularity.py:263: [unresolved-attribute] unresolved-attribute: Attribute `close` is not defined on `None` in union `IO[str] | None`
... and 415 more

Unchanged: 825 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.
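
For readers who want to reproduce the new / fixed / unchanged split locally, the following is a minimal sketch, not the CI job's actual implementation. It assumes diagnostics arrive one per line in the `path:line: [rule] message` shape shown above, and it assumes a diagnostic's identity is its path, rule, and message with the line number ignored, so that code shifting up or down does not register as a new issue. The names `Diagnostic`, `parse_line`, `classify`, and `rule_counts` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Matches lines such as "cli.py:7084: [invalid-argument-type] message text".
LINE_RE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+): \[(?P<rule>[^\]]+)\] (?P<message>.*)$"
)


class Diagnostic(NamedTuple):
    """Identity of a diagnostic; the line number is deliberately left out."""
    path: str
    rule: str
    message: str


def parse_line(line: str) -> Optional[Diagnostic]:
    """Parse one report line, returning None for lines that are not diagnostics."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return Diagnostic(match["path"], match["rule"], match["message"])


def _collect(lines: Iterable[str]) -> Counter:
    """Build a multiset of diagnostics so repeated identical issues are counted."""
    return Counter(d for d in map(parse_line, lines) if d is not None)


def classify(head_lines: Iterable[str], base_lines: Iterable[str]) -> dict:
    """Split diagnostics into new (HEAD only), fixed (base only), and unchanged."""
    head, base = _collect(head_lines), _collect(base_lines)
    return {
        "new": head - base,        # multiset difference, positive counts only
        "fixed": base - head,
        "unchanged": head & base,  # multiset intersection
    }


def rule_counts(diagnostics: Counter) -> Counter:
    """Aggregate a multiset of diagnostics into per-rule totals."""
    return Counter(d.rule for d in diagnostics.elements())


if __name__ == "__main__":
    head = ["cli.py:7084: [invalid-argument-type] Expected `str`, found `None`"]
    base = ["rl_cli.py:359: [invalid-assignment] `str | None` is not assignable to `str`"]
    for bucket, items in classify(head, base).items():
        print(bucket, dict(rule_counts(items)))
```

Because the process never raises or exits non-zero on findings, a script like this only reports, which matches the warning-only behaviour described above.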

@Arvuno Arvuno merged commit cbad173 into main May 20, 2026
17 of 20 checks passed