Skip to content

Add skills.loading: lazy config option (#2045)#13980

Open
AllenChen-Xingan wants to merge 1 commit into
NousResearch:mainfrom
AllenChen-Xingan:lazy-skill-loading
Open

Add skills.loading: lazy config option (#2045)#13980
AllenChen-Xingan wants to merge 1 commit into
NousResearch:mainfrom
AllenChen-Xingan:lazy-skill-loading

Conversation

@AllenChen-Xingan

Copy link
Copy Markdown

Fixes #2045.

Summary

Adds a skills.loading config option. When set to lazy, the <available_skills> block is replaced with a one-sentence hint that tells the agent to call skills_list() on demand. Default remains eager (current behavior) for backward compatibility.

Why

With N installed skills, the <available_skills> block costs ~30-50 tokens per skill on every turn — regardless of whether any skill will actually be used. On a 72-skill installation this is ~4,800 tokens per API call, re-sent every message. The cost scales linearly with skill count and becomes a hard ceiling on how many skills a user can reasonably install, especially for messaging-platform users (Telegram / Discord) where prompt caching is unavailable and every token is billed.

Issue #2045 proposed lazy loading: since skill_view() + skills_list() already exist, the index itself can be made on-demand. This PR implements that.

Measured impact (72 skills installed)

Mode Skill-index tokens / turn
eager (default) ~4,800
lazy ~120
Savings ~97%

Trade-off: the first domain-specific turn in a session spends one extra tool call (skills_list) before the agent picks a skill. Subsequent turns are cached. For chat-heavy workloads (personal assistants, messaging bots) the saving dominates; for one-shot CLI sessions where every installed skill could plausibly be needed, eager stays preferable — which is why it remains the default.

Changes

  • hermes_cli/config.py — new skills.loading key, default "eager".
  • agent/prompt_builder.py — add _is_skills_lazy() helper; when lazy, build_skills_system_prompt() returns a one-sentence hint instead of the full index.
  • website/docs/user-guide/features/skills.md — document the new option with token impact and trade-offs.

No existing behavior changes; the feature is opt-in.

Backward compatibility

  • Default is eager — existing installations behave exactly as before.
  • skills_list() and skill_view() already exist; no new tools added.
  • LRU prompt cache still works — lazy mode has a single cache key.

Test plan

  • Syntax check on modified Python files.
  • Manual end-to-end on a 72-skill install: system prompt size confirmed to drop from ~4,800 to ~120 tokens.
  • Agent correctly invokes skills_list() when asked domain-specific questions; correctly skips when asked casual/emotional questions.
  • Add unit test under tests/agent/test_prompt_builder.py to verify both modes — happy to add if maintainers prefer.

Future work

Not in this PR, but natural follow-ups:

  • Auto-switch to lazy when installed-skill count crosses a threshold (e.g. > 30).
  • platform_loading map so Telegram could be lazy while CLI stays eager for the same user.
  • A skills_search(query, k=5) tool that does semantic retrieval over skill descriptions — would further reduce first-turn latency in lazy mode.

Happy to iterate on wording, config key naming, or any of the above.

Implements the lazy-loading proposal from issue NousResearch#2045. When
`skills.loading: lazy` is set in config.yaml, the full
`<available_skills>` block is replaced with a one-sentence hint that
instructs the agent to call `skills_list()` on demand. This cuts
thousands of tokens per turn for users with 50+ skills installed while
keeping all skills discoverable.

Eager loading remains the default for backward compatibility.

Measured on a 72-skill install:
  eager: ~4,800 tokens per turn
  lazy:  ~120 tokens per turn (-97%)

Trade-off: the first domain-specific turn in a session spends one extra
tool call (`skills_list`) before the agent picks a skill; subsequent
turns are cached. For chat-heavy workloads (personal assistants,
messaging bots) the saving is substantial; for one-shot CLI sessions
where every installed skill could plausibly be needed, `eager` is
still preferable.

Changes:
  - hermes_cli/config.py: add `skills.loading` default ("eager")
  - agent/prompt_builder.py: `_is_skills_lazy()` helper +
    branch in `build_skills_system_prompt()`
  - website/docs/user-guide/features/skills.md: document the option
    with token impact + trade-offs

Refs: NousResearch#2045
@alt-glitch alt-glitch added type/feature New feature or request P3 Low — cosmetic, nice to have comp/agent Core agent loop, run_agent.py, prompt builder comp/cli CLI entry point, hermes_cli/, setup wizard tool/skills Skills system (list, view, manage) labels Apr 22, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Likely duplicate of #12379 — same feature (lazy skill loading via config option), same approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/cli CLI entry point, hermes_cli/, setup wizard P3 Low — cosmetic, nice to have tool/skills Skills system (list, view, manage) type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Lazy skill loading: remove skill listing from system prompt, use on-demand tool instead

2 participants