fix: enforce context_tokens budget on Honcho peer representation by AzothZephyr · Pull Request #1878 · NousResearch/hermes-agent

AzothZephyr · 2026-03-18T06:34:17Z

What does this PR do?

Enforces the user's configured contextTokens budget on Honcho's peer_representation and peer_card fields, which were previously injected into the system prompt at full size regardless of the token limit.

The Honcho SDK's session.context(tokens=N) parameter only limits message history retrieval. The peer_representation and peer_card fields — containing Explicit Observations, Deductive Observations, Inductive Observations, and the structured peer card — are returned in full by the Honcho server regardless of the tokens value. This meant setting contextTokens: 50 in ~/.honcho/config.json had zero effect, and users would see 4000-5000+ tokens of Honcho context injected every turn.

The fix adds client-side truncation in _honcho_prefetch() after assembling the full context block, capping it to context_tokens * 4 characters. Ideally the Honcho API itself should respect the tokens param for all returned fields, but until that's addressed server-side this prevents runaway token usage.
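The truncation described above can be sketched roughly as follows. This is a minimal illustration, not the code in run_agent.py: the helper name `truncate_to_token_budget` and the constants are hypothetical, and only the 4 chars/token estimate, the marker text, and the "None means no truncation" behaviour come from this PR.

```python
from typing import Optional

CHARS_PER_TOKEN = 4  # heuristic stated in the PR description
TRUNCATION_MARKER = "\n\n[… truncated to fit token budget]"


def truncate_to_token_budget(assembled: str, context_tokens: Optional[int]) -> str:
    """Cap `assembled` at roughly `context_tokens` tokens (hypothetical helper)."""
    if context_tokens is None:
        # No budget configured: pass the block through untouched.
        return assembled
    char_budget = context_tokens * CHARS_PER_TOKEN
    if len(assembled) <= char_budget:
        return assembled
    # Slicing a Python str works on code points, so this cannot split a character.
    return assembled[:char_budget] + TRUNCATION_MARKER


block = "observation " * 1000  # 12,000 characters of stand-in context
capped = truncate_to_token_budget(block, 500)
print(len(block), len(capped) - len(TRUNCATION_MARKER))  # 12000 2000
```

Note that the marker is appended after the cut, so the returned string is slightly longer than the character budget itself.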

Related Issue

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

run_agent.py: After assembling the Honcho context block in _honcho_prefetch(), read self._honcho._context_tokens and truncate the assembled string to token_budget * 4 characters if it exceeds the budget. Adds a [… truncated to fit token budget] marker when truncation occurs.
tests/test_run_agent.py: Three new unit tests:
- test_honcho_prefetch_truncates_to_token_budget — verifies large context is truncated at budget
- test_honcho_prefetch_no_truncation_within_budget — verifies small context passes through intact
- test_honcho_prefetch_no_truncation_when_no_budget — verifies None budget means no truncation
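
The three cases can be illustrated with plain assertions. This is only a sketch: `cap` is a hypothetical stand-in for the truncation step, whereas the real tests in tests/test_run_agent.py exercise _honcho_prefetch() itself, and their fixtures are not reproduced here.

```python
from typing import Optional

MARKER = "\n\n[… truncated to fit token budget]"


def cap(block: str, context_tokens: Optional[int]) -> str:
    """Stand-in for the truncation step (assumed shape, 4 chars per token)."""
    if context_tokens is None:
        return block
    limit = context_tokens * 4
    return block if len(block) <= limit else block[:limit] + MARKER


def test_truncates_to_token_budget() -> None:
    out = cap("x" * 10_000, 500)
    assert out.endswith(MARKER)
    assert len(out) == 500 * 4 + len(MARKER)


def test_no_truncation_within_budget() -> None:
    block = "short context"
    assert cap(block, 500) == block


def test_no_truncation_when_no_budget() -> None:
    block = "x" * 10_000
    assert cap(block, None) == block
```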

How to Test

Set contextTokens: 500 in ~/.honcho/config.json (under hosts.hermes)
Start a conversation with Honcho enabled ("enabled": true)
Before fix: Honcho context block is 4000-5000+ tokens regardless of the setting
After fix: Honcho context block is truncated to ~2000 chars (500 tokens × 4 chars/token)
Run the new tests: pytest tests/test_run_agent.py::TestHonchoPrefetchScheduling::test_honcho_prefetch_truncates_to_token_budget tests/test_run_agent.py::TestHonchoPrefetchScheduling::test_honcho_prefetch_no_truncation_within_budget tests/test_run_agent.py::TestHonchoPrefetchScheduling::test_honcho_prefetch_no_truncation_when_no_budget -v
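
For orientation, the expected character cap can be derived from the config value along these lines. The JSON layout is an assumption: the PR only names the keys `hosts`, `hermes`, `contextTokens`, and `enabled`, and the placement of `enabled` next to `contextTokens` is a guess. `read_context_tokens` is a hypothetical helper, not part of Hermes or the Honcho SDK.

```python
import json
from typing import Optional

# Assumed shape of ~/.honcho/config.json; only the key names come from the PR.
EXAMPLE_CONFIG = """
{
  "hosts": {
    "hermes": {
      "enabled": true,
      "contextTokens": 500
    }
  }
}
"""


def read_context_tokens(raw: str, host: str = "hermes") -> Optional[int]:
    """Return hosts.<host>.contextTokens, or None if it is absent (hypothetical)."""
    config = json.loads(raw)
    value = config.get("hosts", {}).get(host, {}).get("contextTokens")
    return value if isinstance(value, int) else None


budget = read_context_tokens(EXAMPLE_CONFIG)
char_budget = budget * 4 if budget is not None else None
print(budget, char_budget)  # 500 2000
```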

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(agent): enforce context_tokens budget on Honcho peer representation)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: macOS (Apple Silicon)

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

Before fix (contextTokens: 50, completely ignored):

# Honcho Memory (persistent cross-session context)
## User representation
## Explicit Observations
[2026-03-18 05:52:52] azoth is a senior software architect...
[2026-03-18 05:52:52] azoth wants to change the 'enabled' setting...
... (17 Explicit Observations)

## Deductive Observations
... (8 Deductive Observations)

## Inductive Observations
... (17 Inductive Observations, ~500 words each)

Name: Hermes
Role: Autonomous AI agent...
... (full peer card)

Total: ~5000 tokens injected every turn

After fix (contextTokens: 500, properly enforced):

# Honcho Memory (persistent cross-session context)
## User representation
## Explicit Observations
[2026-03-18 05:52:52] azoth is a senior software architect...
[2026-03-18 05:52:52] azoth prefers dark mode in all IDEs...
... (truncated to ~500 tokens)

[… truncated to fit token budget]

The Honcho SDK's context() tokens parameter only limits message history retrieval, not the peer_representation and peer_card fields, which grow without bound as Honcho accumulates observations about the user and AI peer. This meant setting contextTokens: 50 in ~/.honcho/config.json had no effect on the massive peer representation blocks (Explicit Observations, Deductive Observations, Inductive Observations, peer card) that were injected into every system prompt, often 4000-5000+ tokens. Fix: after assembling the Honcho context block in _honcho_prefetch(), truncate the total output to fit within the configured context_tokens budget (estimated at 4 chars per token). This ensures the user's configured budget is respected regardless of how much data Honcho's server returns. Ideally the Honcho API itself should respect the tokens param for all returned fields, but until then this client-side enforcement prevents runaway token usage.

The Honcho SDK's context() tokens parameter only limits message history retrieval, not the peer_representation and peer_card fields, which grow without bound as Honcho accumulates observations about the user and AI peer. This meant setting contextTokens in ~/.honcho/config.json had no effect on the massive peer representation blocks (Explicit Observations, Deductive Observations, Inductive Observations, peer card) injected into every system prompt, often 4000-5000+ tokens. Fix: after assembling the Honcho context block in _honcho_prefetch(), truncate the total output to fit within the configured context_tokens budget (estimated at 4 chars per token). Adds three unit tests covering truncation, within-budget passthrough, and no-budget (None) passthrough.

Original work from PR NousResearch#1878.

nidhishgajjar · 2026-04-15T17:06:12Z

Orb Code Review (powered by GLM 5.1 on Orb Cloud)

Summary

Enforces the context_tokens budget on Honcho's peer representation (peer_card + representation + dialectic), which the Honcho API's tokens parameter does not limit. Without this fix, the assembled Honcho block could grow unbounded and consume the entire LLM context window. The fix uses a character-based truncation (1 token ≈ 4 chars) after assembling the block.

Architecture

Cleanly placed at the return point of _honcho_prefetch(), after all parts are assembled. Reads _context_tokens from the existing HonchoSessionManager configuration (set at plugins/memory/honcho/session.py:92). The hasattr guard ensures backward compatibility when _honcho exists but _context_tokens is not set.
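
A guarded read of this kind might look like the sketch below. It is illustrative only: `read_token_budget` is a hypothetical name, and the `SimpleNamespace` objects merely stand in for a HonchoSessionManager with and without the attribute.

```python
from types import SimpleNamespace
from typing import Any, Optional


def read_token_budget(honcho: Any) -> Optional[int]:
    """Return the configured budget, or None when it cannot be determined."""
    if honcho is not None and hasattr(honcho, "_context_tokens"):
        return honcho._context_tokens
    return None


# Stand-ins for the session manager: one with the attribute, one without.
with_budget = SimpleNamespace(_context_tokens=500)
without_budget = SimpleNamespace()

print(read_token_budget(with_budget))     # 500
print(read_token_budget(without_budget))  # None
print(read_token_budget(None))            # None
```

Returning None in the fallback cases lines up with the "no budget means no truncation" behaviour covered by the tests.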

Issues

Warning (character-based truncation can cut mid-word or mid-sentence):

assembled = assembled[:char_budget] + "\n\n[… truncated to fit token budget]"

assembled[:char_budget] operates on Python string code points, so this won't split a UTF-8 byte sequence (Python strings are Unicode). However, it can cut mid-word or mid-sentence, which degrades the quality of the context fed to the LLM. Consider truncating at the last paragraph or sentence boundary:

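if len(assembled) > char_budget:
    # Prefer the last paragraph break that falls inside the budget.
    cut = assembled[:char_budget].rfind('\n\n')
    # Use it only if it keeps more than half the budget;
    # rfind() returns -1 when there is no break, which fails this check.
    if cut > char_budget // 2:
        assembled = assembled[:cut]
    else:
        assembled = assembled[:char_budget]  # fall back to a hard cut
    assembled += '\n\n[… truncated to fit token budget]'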

This is a suggestion, not a blocker — the current approach works, just produces slightly rougher truncation boundaries.
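
To see how the suggested boundary-aware cut behaves, the snippet can be wrapped in a function and run on small inputs. The function name `truncate_at_paragraph` and the sample strings are made up for illustration; the logic follows the reviewer's suggestion.

```python
MARKER = "\n\n[… truncated to fit token budget]"


def truncate_at_paragraph(assembled: str, char_budget: int) -> str:
    """Cut at the last paragraph break inside the budget, else hard-cut."""
    if len(assembled) <= char_budget:
        return assembled
    cut = assembled[:char_budget].rfind("\n\n")
    if cut > char_budget // 2:
        # A break exists in the second half of the budget: keep whole paragraphs.
        assembled = assembled[:cut]
    else:
        # No usable break (rfind gave -1 or an early position): hard cut.
        assembled = assembled[:char_budget]
    return assembled + MARKER


three_paragraphs = "A" * 30 + "\n\n" + "B" * 30 + "\n\n" + "C" * 30
result = truncate_at_paragraph(three_paragraphs, 70)
# The second break sits at index 62, so the partial third paragraph is dropped.
print(result == "A" * 30 + "\n\n" + "B" * 30 + MARKER)  # True

no_breaks = truncate_at_paragraph("x" * 100, 40)
print(no_breaks == "x" * 40 + MARKER)  # True
```

The half-budget threshold trades completeness for cleanliness: a break very early in the block would discard most of the allowed context, so the hard cut is used instead.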

Suggestion: The 1 token ≈ 4 chars heuristic is reasonable for English text but underestimates the token count for code-heavy or multilingual content (where tokens can be 2-3 chars each), so the truncated block can exceed the configured budget there, by up to roughly 2x at 2 chars per token. If the Honcho client has access to a tokenizer (e.g., tiktoken), using it would be more accurate. For now, the 4x multiplier is an acceptable approximation.
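
The size of the heuristic's error can be worked out directly. The sketch below assumes a fixed average chars-per-token ratio for the content, which is itself a simplification; `effective_tokens` is a hypothetical helper.

```python
def effective_tokens(
    context_tokens: int,
    actual_chars_per_token: float,
    assumed_chars_per_token: int = 4,
) -> float:
    """Tokens that actually fit in a cap of context_tokens * assumed chars."""
    char_budget = context_tokens * assumed_chars_per_token
    return char_budget / actual_chars_per_token


for ratio in (4.0, 3.0, 2.0):
    print(ratio, round(effective_tokens(500, ratio)))
# 4.0 500
# 3.0 667
# 2.0 1000
```

With a 500-token budget, content that averages 2 chars per token would come through at about 1000 tokens, twice the configured limit.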

Suggestion — The private attribute access self._honcho._context_tokens works but couples to implementation details. A public property on HonchoSessionManager (e.g., context_tokens) would be cleaner. Not blocking.
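
A read-only property along these lines would be one way to do that. The class below is a simplified stand-in, not the real HonchoSessionManager, and its constructor signature is an assumption made for the example.

```python
from typing import Optional


class HonchoSessionManager:  # simplified stand-in for illustration only
    def __init__(self, context_tokens: Optional[int] = None) -> None:
        self._context_tokens = context_tokens

    @property
    def context_tokens(self) -> Optional[int]:
        """Configured token budget, or None when no budget is set."""
        return self._context_tokens


manager = HonchoSessionManager(context_tokens=500)
print(manager.context_tokens)  # 500
```

Callers such as run_agent.py could then read `manager.context_tokens` without depending on the private attribute name.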

Cross-file impact

None — _context_tokens is already set in plugins/memory/honcho/session.py and used in session.py:225 for session.context(tokens=...). This PR adds a second consumer in run_agent.py.

Assessment

approve ✅ — Straightforward fix for an unbounded-growth issue. Good test coverage (truncation, no-truncation-within-budget, no-truncation-when-no-budget). The character heuristic is reasonable for a first pass.

alt-glitch · 2026-05-03T04:21:42Z

Likely a duplicate of #3265 (closed), which enforced the contextTokens budget on Honcho prefetch. If that fix was merged, this may already be resolved.

teknium1 · 2026-05-11T04:57:57Z

Already fixed on main. The Honcho integration was extracted into a plugin (PR #5295) and the same context_tokens * 4 char truncation now lives in plugins/memory/honcho/__init__.py _truncate_to_budget() (commit 9e0fc62 by @erosika via PR #3265). Your diagnosis was correct and submitted ~2 weeks earlier — credit to you both for spotting it. Closing as superseded by the plugin port.

Rafa-Ross added 2 commits March 18, 2026 05:49

This was referenced Mar 26, 2026

fix(honcho): enforce contextTokens budget on prefetched Honcho context #3265

Closed

Honcho PR map: open integration work across community #3276

Closed

erosika pushed a commit to erosika/hermes-agent that referenced this pull request Mar 27, 2026

fix(agent): enforce context_tokens budget on Honcho peer representation

ab00f0c

Original work from PR NousResearch#1878.

erosika mentioned this pull request Apr 24, 2026

fix(honcho): bug-fix consolidation — 7 upstream PRs, preserved authorship #15381

Merged

alt-glitch added labels type/bug (Something isn't working), P3 (Low: cosmetic, nice to have), comp/plugins (Plugin system and bundled plugins), and comp/agent (Core agent loop, run_agent.py, prompt builder) on May 3, 2026

teknium1 closed this May 11, 2026

AzothZephyr wants to merge 2 commits into NousResearch:main from AzothZephyr:fix/honcho-context-token-limit
