fix: enforce context_tokens budget on Honcho peer representation #1878

Closed
AzothZephyr wants to merge 2 commits into NousResearch:main from AzothZephyr:fix/honcho-context-token-limit

Conversation

@AzothZephyr

What does this PR do?

Enforces the user's configured contextTokens budget on Honcho's peer_representation and peer_card fields, which were previously injected into the system prompt at full size regardless of the token limit.

The Honcho SDK's session.context(tokens=N) parameter only limits message history retrieval. The peer_representation and peer_card fields — containing Explicit Observations, Deductive Observations, Inductive Observations, and the structured peer card — are returned in full by the Honcho server regardless of the tokens value. This meant setting contextTokens: 50 in ~/.honcho/config.json had zero effect, and users would see 4000-5000+ tokens of Honcho context injected every turn.

The fix adds client-side truncation in _honcho_prefetch() after assembling the full context block, capping it to context_tokens * 4 characters. Ideally the Honcho API itself should respect the tokens param for all returned fields, but until that's addressed server-side this prevents runaway token usage.
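
The truncation step described above can be sketched roughly as follows. This is a minimal illustration, not the code from run_agent.py: the function name `truncate_to_budget` and the two constants are hypothetical, and treating a non-positive budget like `None` is an assumption the PR does not state.

```python
from typing import Optional

# Marker text taken from the PR description; the constant names are illustrative.
TRUNCATION_MARKER = "\n\n[… truncated to fit token budget]"
CHARS_PER_TOKEN = 4  # rough heuristic from the PR, not a real tokenizer


def truncate_to_budget(assembled: str, context_tokens: Optional[int]) -> str:
    """Cap an assembled context block at roughly `context_tokens` tokens."""
    # Assumption: None (or a non-positive value) means "no budget configured".
    if context_tokens is None or context_tokens <= 0:
        return assembled
    char_budget = context_tokens * CHARS_PER_TOKEN
    if len(assembled) <= char_budget:
        return assembled
    # Slicing a Python str works on code points, so no character is split.
    return assembled[:char_budget] + TRUNCATION_MARKER
```

With a budget of 500 tokens, a 5000-character block is cut to its first 2000 characters plus the marker, while a block already under 2000 characters is returned unchanged.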

Related Issue

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • run_agent.py: After assembling the Honcho context block in _honcho_prefetch(), read self._honcho._context_tokens and truncate the assembled string to token_budget * 4 characters if it exceeds the budget. Adds a [… truncated to fit token budget] marker when truncation occurs.
  • tests/test_run_agent.py: Three new unit tests:
    • test_honcho_prefetch_truncates_to_token_budget — verifies large context is truncated at budget
    • test_honcho_prefetch_no_truncation_within_budget — verifies small context passes through intact
    • test_honcho_prefetch_no_truncation_when_no_budget — verifies None budget means no truncation
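
The three cases above could look something like the sketch below. It does not reproduce the tests in tests/test_run_agent.py: `apply_budget` is a hypothetical stand-in for the truncation step, and the `SimpleNamespace` objects stand in for whatever the real tests use to model `self._honcho`.

```python
from types import SimpleNamespace

MARKER = "[… truncated to fit token budget]"


def apply_budget(honcho, assembled):
    """Hypothetical stand-in for the truncation step; not the code in run_agent.py."""
    budget = getattr(honcho, "_context_tokens", None)
    if budget and len(assembled) > budget * 4:
        return assembled[: budget * 4] + "\n\n" + MARKER
    return assembled


def test_truncates_to_token_budget():
    honcho = SimpleNamespace(_context_tokens=50)  # 50 tokens -> 200 chars
    result = apply_budget(honcho, "A" * 1000)
    assert result.startswith("A" * 200)
    assert result.endswith(MARKER)
    assert len(result) == 200 + len("\n\n" + MARKER)


def test_no_truncation_within_budget():
    honcho = SimpleNamespace(_context_tokens=500)
    assert apply_budget(honcho, "small context") == "small context"


def test_no_truncation_when_no_budget():
    honcho = SimpleNamespace(_context_tokens=None)
    big = "A" * 10_000
    assert apply_budget(honcho, big) == big
```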

How to Test

  1. Set contextTokens: 500 in ~/.honcho/config.json (under hosts.hermes)
  2. Start a conversation with Honcho enabled ("enabled": true)
  3. Before fix: Honcho context block is 4000-5000+ tokens regardless of the setting
  4. After fix: Honcho context block is truncated to ~2000 chars (500 tokens × 4 chars/token)
  5. Run the new tests: pytest tests/test_run_agent.py::TestHonchoPrefetchScheduling::test_honcho_prefetch_truncates_to_token_budget tests/test_run_agent.py::TestHonchoPrefetchScheduling::test_honcho_prefetch_no_truncation_within_budget tests/test_run_agent.py::TestHonchoPrefetchScheduling::test_honcho_prefetch_no_truncation_when_no_budget -v
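
To work out the expected cap for step 4, the configured value can be read back from the config file. The helper below is a sketch under assumptions: `load_context_tokens` is a hypothetical name, and the only structure it relies on is the `hosts.hermes.contextTokens` path mentioned in step 1; any other layout of the file is not assumed.

```python
import json
from pathlib import Path
from typing import Optional


def load_context_tokens(config_path: Path, host: str = "hermes") -> Optional[int]:
    """Return hosts.<host>.contextTokens from a JSON config, or None if absent."""
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None
    if not isinstance(config, dict):
        return None
    hosts = config.get("hosts")
    host_cfg = hosts.get(host) if isinstance(hosts, dict) else None
    value = host_cfg.get("contextTokens") if isinstance(host_cfg, dict) else None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    return None
```

For example, `load_context_tokens(Path.home() / ".honcho" / "config.json")` multiplied by 4 gives the approximate character cap to compare against the injected block.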

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(agent): enforce context_tokens budget on Honcho peer representation)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS (Apple Silicon)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

Before fix (contextTokens: 50, completely ignored):

# Honcho Memory (persistent cross-session context)
## User representation
## Explicit Observations
[2026-03-18 05:52:52] azoth is a senior software architect...
[2026-03-18 05:52:52] azoth wants to change the 'enabled' setting...
... (17 Explicit Observations)

## Deductive Observations
... (8 Deductive Observations)

## Inductive Observations
... (17 Inductive Observations, ~500 words each)

Name: Hermes
Role: Autonomous AI agent...
... (full peer card)

Total: ~5000 tokens injected every turn

After fix (contextTokens: 500, properly enforced):

# Honcho Memory (persistent cross-session context)
## User representation
## Explicit Observations
[2026-03-18 05:52:52] azoth is a senior software architect...
[2026-03-18 05:52:52] azoth prefers dark mode in all IDEs...
... (truncated to ~500 tokens)

[… truncated to fit token budget]

The Honcho SDK's context() tokens parameter only limits message history
retrieval, not the peer_representation and peer_card fields which grow
unbounded as Honcho accumulates observations about the user and AI peer.

This meant setting contextTokens: 50 in ~/.honcho/config.json had no
effect on the massive peer representation blocks (Explicit Observations,
Deductive Observations, Inductive Observations, peer card) that were
injected into every system prompt — often 4000-5000+ tokens.

Fix: after assembling the Honcho context block in _honcho_prefetch(),
truncate the total output to fit within the configured context_tokens
budget (estimated at 4 chars per token). This ensures the user's
configured budget is respected regardless of how much data Honcho's
server returns.

Ideally the Honcho API itself should respect the tokens param for all
returned fields, but until then this client-side enforcement prevents
runaway token usage.
The Honcho SDK's context() tokens parameter only limits message history
retrieval, not the peer_representation and peer_card fields which grow
unbounded as Honcho accumulates observations about the user and AI peer.

This meant setting contextTokens in ~/.honcho/config.json had no effect
on the massive peer representation blocks (Explicit Observations,
Deductive Observations, Inductive Observations, peer card) injected
into every system prompt — often 4000-5000+ tokens.

Fix: after assembling the Honcho context block in _honcho_prefetch(),
truncate the total output to fit within the configured context_tokens
budget (estimated at 4 chars per token).

Adds three unit tests covering truncation, within-budget passthrough,
and no-budget (None) passthrough.
@nidhishgajjar

Orb Code Review (powered by GLM 5.1 on Orb Cloud)

Summary

Enforces the context_tokens budget on Honcho's peer representation (peer_card + representation + dialectic), which the Honcho API's tokens parameter does not limit. Without this fix, the assembled Honcho block could grow unbounded and consume the entire LLM context window. The fix uses a character-based truncation (1 token ≈ 4 chars) after assembling the block.

Architecture

Cleanly placed at the return point of _honcho_prefetch(), after all parts are assembled. Reads _context_tokens from the existing HonchoSessionManager configuration (set at plugins/memory/honcho/session.py:92). The hasattr guard ensures backward compatibility when _honcho exists but _context_tokens is not set.

Issues

Warning — Character-based truncation can cut mid-word or mid-sentence:

assembled = assembled[:char_budget] + "\n\n[… truncated to fit token budget]"

assembled[:char_budget] operates on Python string code points, so this won't split a UTF-8 byte sequence (Python strings are Unicode). However, it can cut mid-word or mid-sentence, which degrades the quality of the context fed to the LLM. Consider truncating at the last paragraph or sentence boundary:

if len(assembled) > char_budget:
    cut = assembled[:char_budget].rfind('\n\n')
    if cut > char_budget // 2:
        assembled = assembled[:cut]
    else:
        assembled = assembled[:char_budget]
    assembled += '\n\n[… truncated to fit token budget]'

This is a suggestion, not a blocker — the current approach works, just produces slightly rougher truncation boundaries.
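
Wrapped as a standalone function, the boundary-aware variant suggested above might look like this. It is a sketch only: the name `truncate_at_paragraph` is hypothetical, and the "more than half the budget" threshold is carried over from the snippet in this review rather than from the merged code.

```python
MARKER = "\n\n[… truncated to fit token budget]"


def truncate_at_paragraph(assembled: str, char_budget: int) -> str:
    """Truncate to char_budget, preferring the last paragraph break if it is
    far enough into the text to keep at least half of the budget."""
    if len(assembled) <= char_budget:
        return assembled
    head = assembled[:char_budget]
    cut = head.rfind("\n\n")
    # Only back up to the paragraph break when it keeps over half the budget;
    # otherwise fall back to the hard cut so little content is lost.
    if cut > char_budget // 2:
        head = head[:cut]
    return head + MARKER
```

The threshold trades cleaner boundaries against lost content: a break early in the block is ignored, so one long trailing paragraph cannot cause most of the budget to be discarded.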

Suggestion — The 1 token ≈ 4 chars heuristic is reasonable for English text but underestimates the token count for code-heavy or multilingual content (where tokens can be 2-3 chars each), so the truncated block can exceed the configured budget in those cases. If the Honcho client has access to a tokenizer (e.g., tiktoken), using it would be more accurate. For now, the 4x multiplier is an acceptable approximation.
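
A tokenizer-based cut could be sketched as below, with the tokenizer passed in so the function does not depend on any particular library. The name `truncate_by_tokens` is hypothetical, and which tokenizer actually matches the target model is an open assumption.

```python
from typing import Callable, List

MARKER = "\n\n[… truncated to fit token budget]"


def truncate_by_tokens(
    text: str,
    token_budget: int,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
) -> str:
    """Keep at most token_budget tokens of text, measured by the given tokenizer."""
    tokens = encode(text)
    if len(tokens) <= token_budget:
        return text
    return decode(tokens[:token_budget]) + MARKER


# Optional usage with tiktoken, if it is installed:
#   import tiktoken
#   enc = tiktoken.get_encoding("cl100k_base")
#   truncate_by_tokens(block, 500, enc.encode, enc.decode)
```

Injecting `encode` and `decode` keeps the 4-chars-per-token heuristic available as a fallback when no tokenizer is installed.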

Suggestion — The private attribute access self._honcho._context_tokens works but couples to implementation details. A public property on HonchoSessionManager (e.g., context_tokens) would be cleaner. Not blocking.
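
The public accessor suggested above could be as small as the following. `SessionManagerSketch` is a hypothetical stand-in, not the real HonchoSessionManager; only the budget attribute is modeled.

```python
from typing import Optional


class SessionManagerSketch:
    """Hypothetical stand-in for HonchoSessionManager; only the budget is modeled."""

    def __init__(self, context_tokens: Optional[int] = None) -> None:
        self._context_tokens = context_tokens

    @property
    def context_tokens(self) -> Optional[int]:
        """Read-only view of the configured token budget (None means no limit)."""
        return self._context_tokens
```

Callers would then read `manager.context_tokens` instead of reaching into `_context_tokens`, and the property has no setter, so the budget cannot be reassigned through it by accident.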

Cross-file impact

None — _context_tokens is already set in plugins/memory/honcho/session.py and used in session.py:225 for session.context(tokens=...). This PR adds a second consumer in run_agent.py.

Assessment

approve ✅ — Straightforward fix for an unbounded-growth issue. Good test coverage (truncation, no-truncation-within-budget, no-truncation-when-no-budget). The character heuristic is reasonable for a first pass.

@alt-glitch added the type/bug, P3, comp/plugins, and comp/agent labels on May 3, 2026
@alt-glitch

Collaborator

Likely duplicate of #3265 (closed) which enforced contextTokens budget on Honcho prefetch. If that fix was merged, this may already be resolved.

@teknium1

Contributor

Already fixed on main. The Honcho integration was extracted into a plugin (PR #5295) and the same context_tokens * 4 char truncation now lives in plugins/memory/honcho/__init__.py _truncate_to_budget() (commit 9e0fc62 by @erosika via PR #3265). Your diagnosis was correct and submitted ~2 weeks earlier — credit to you both for spotting it. Closing as superseded by the plugin port.

@teknium1 closed this on May 11, 2026