feat(honcho): configurable prefetch cadence, injection granularity, and cost-efficient defaults #3422

@erosika

Problem

Every turn currently makes 2 unconditional Honcho API calls:

  1. honcho_session.context() — fetches user representation, peer card, AI representation, AI card
  2. peer.chat() — runs the dialectic with auto-bumped reasoning level

This is expensive and mostly redundant. The user's representation doesn't change between consecutive turns. The dialectic runs full inference even for "ok thanks." A user reported ~$20 in 3 days of normal usage — that's ~$200/mo on managed Honcho, which is prohibitive.

Additionally, _dynamic_reasoning_level auto-bumps from low to medium/high based on message length (>400 chars = +2 levels). Users cannot cap this. The dialecticReasoningLevel config sets a floor, not a ceiling.

Proposed changes

1. Configurable injection granularity

Let users choose which prefetch components get injected:

# honcho.yaml
prefetch:
  representation: true      # user conclusions/synthesis
  card: true                 # structured peer card
  aiRepresentation: false    # AI peer model (default off — rarely needed)
  aiCard: false              # AI peer card (default off)
  dialectic: true            # continuity synthesis

Community ask: "I like the observations but would prefer not to have the agent receiving the summary." Currently the only lever is recallMode: tools, which disables all injection. This proposal adds per-component control.
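A minimal sketch of how the per-component flags could gate injection. The function name `filter_prefetch` and the shape of the fetched data (a dict of component name to text) are assumptions for illustration, not the existing `_honcho_prefetch` code:

```python
from typing import Mapping, Optional

# Proposed defaults; keys mirror the honcho.yaml names above.
DEFAULT_PREFETCH = {
    "representation": True,
    "card": True,
    "aiRepresentation": False,
    "aiCard": False,
    "dialectic": True,
}


def filter_prefetch(
    components: Mapping[str, str],
    prefetch_cfg: Optional[Mapping[str, bool]] = None,
) -> dict:
    """Return only the non-empty components whose flag is enabled.

    `components` maps a component name to its fetched text (hypothetical shape).
    Flags missing from `prefetch_cfg` fall back to the proposed defaults.
    """
    flags = {**DEFAULT_PREFETCH, **(prefetch_cfg or {})}
    return {
        name: text
        for name, text in components.items()
        if text and flags.get(name, False)
    }
```

With no config, only representation, card, and dialectic pass through. Setting `representation: false` covers the community ask without touching the other components.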

2. Smart prefetch cadence

Replace unconditional per-turn fetching with event-driven refresh:

Context (representation + card):

  • Fetch once at session start
  • Refresh after observation/conclusion writes (the data actually changed)
  • Configurable interval fallback: prefetchInterval: 10 (every N turns, default 0 = session-start only)
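The context rules above could be sketched as a small state tracker. The class and method names are hypothetical, and counting the interval in turns since the last fetch is one possible reading of "every N turns":

```python
from dataclasses import dataclass


@dataclass
class ContextCadence:
    """Decides when to (re)fetch representation + card. Hypothetical helper."""

    prefetch_interval: int = 0   # 0 = session start only
    fetched: bool = False        # has the session-start fetch happened?
    dirty: bool = False          # set after an observation/conclusion write
    turns_since_fetch: int = 0

    def mark_write(self) -> None:
        """Record that stored data changed, so the next turn refreshes."""
        self.dirty = True

    def should_fetch(self) -> bool:
        """Call once per turn; True means fetch context on this turn."""
        interval_due = (
            self.prefetch_interval > 0
            and self.turns_since_fetch >= self.prefetch_interval
        )
        due = (not self.fetched) or self.dirty or interval_due
        if due:
            self.fetched = True
            self.dirty = False
            self.turns_since_fetch = 0
        self.turns_since_fetch += 1
        return due
```

With the default interval of 0, only the first turn and turns following a `mark_write()` trigger a fetch. With `prefetch_interval=1`, every turn does, which matches the current behavior.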

Dialectic:

  • First turn of a session (always — continuity matters here)
  • Skip for short/trivial messages (<80 chars, no session-reference keywords)
  • Configurable: dialecticInterval: 0 (0 = first-turn only, N = every N turns)
  • dialecticTrigger: auto | first-turn | manual | every-turn for full control
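One way the dialectic rules could compose, as a sketch. The keyword list is a placeholder (the issue does not specify the session-reference keywords), and how the trigger, the interval, and the short-message skip interact is an assumption:

```python
# Placeholder keywords; the real list is not specified in this issue.
SESSION_KEYWORDS = ("last time", "earlier", "previously", "remember")

TRIGGERS = ("auto", "first-turn", "manual", "every-turn")


def is_trivial(message: str, min_chars: int = 80) -> bool:
    """Short message with no reference to earlier sessions."""
    text = message.lower()
    return len(message) < min_chars and not any(k in text for k in SESSION_KEYWORDS)


def should_run_dialectic(
    turn: int, message: str, trigger: str = "auto", interval: int = 0
) -> bool:
    """Decide whether to call the dialectic on this turn (turn is 1-based)."""
    if trigger not in TRIGGERS:
        raise ValueError(f"unknown dialecticTrigger: {trigger!r}")
    if trigger == "manual":
        return False              # only explicit requests run the dialectic
    if trigger == "every-turn":
        return True
    if turn == 1:
        return True               # first turn always runs, for continuity
    if trigger == "first-turn" or interval <= 0:
        return False
    # auto with interval N: every N turns, skipping trivial messages
    return (turn - 1) % interval == 0 and not is_trivial(message)
```

Under the proposed defaults (`auto`, interval 0), "ok thanks" on turn 2 does not run the dialectic, while the first message of a session always does.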

3. Reasoning level cap

Add a ceiling to stop auto-bump cost escalation:

dialecticReasoningLevel: low    # floor (existing)
dialecticReasoningCap: low      # ceiling (new — pins it, no auto-bump)

When cap equals level, auto-bump is effectively disabled. When cap is higher, bump can go up to but not past it.

Default: dialecticReasoningCap: low (no auto-bump unless the user opts in).
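A sketch of the clamp, assuming only the three levels named in this issue and only the one bump rule it states (>400 chars = +2 levels). The real `_dynamic_reasoning_level` may have more levels and thresholds. Treating a cap below the floor as equal to the floor is also an assumption:

```python
# Only the levels mentioned in this issue; the real list may be longer.
LEVELS = ("low", "medium", "high")


def bump_for_length(message: str) -> int:
    """The one bump rule stated in the issue: >400 chars = +2 levels."""
    return 2 if len(message) > 400 else 0


def capped_reasoning_level(message: str, floor: str = "low", cap: str = "low") -> str:
    """Apply the auto-bump on top of the floor, then clamp to the cap."""
    lo = LEVELS.index(floor)
    hi = max(lo, LEVELS.index(cap))  # assumption: cap never drops below floor
    return LEVELS[min(lo + bump_for_length(message), hi)]
```

With the proposed defaults, a 500-character message stays at `low`. With `cap="medium"` it reaches `medium`, and with `cap="high"` it reaches `high`.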

4. Cost-efficient defaults

Current defaults are optimized for quality. Proposed defaults optimize for usefulness-per-dollar:

| Setting | Current | Proposed | Why |
|---|---|---|---|
| dialecticReasoningLevel | low | low | (same) |
| dialecticReasoningCap | (none/unlimited) | low | stop auto-bump |
| aiRepresentation | injected | false | rarely useful, costs tokens |
| aiCard | injected | false | rarely useful, costs tokens |
| context fetch cadence | every turn | session start + on-write | rep is stable within a session |
| dialectic cadence | every turn | first turn only | continuity only matters at session start |

This would reduce a typical session from 2 API calls/turn to 1 context fetch + 1 dialectic call for the entire session, with on-demand refresh when data actually changes.
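The call-count claim can be checked with simple arithmetic. This sketch counts API calls only, not dollars (per-call cost also depends on reasoning level), and assumes each write triggers exactly one context refresh:

```python
def calls_current(turns: int) -> int:
    """Current behavior: context fetch + dialectic on every turn."""
    return 2 * turns


def calls_proposed(turns: int, writes: int = 0) -> int:
    """Proposed defaults: one context fetch and one dialectic call per
    session, plus one context refresh per observation/conclusion write."""
    return 0 if turns == 0 else 2 + writes
```

For a 30-turn session with 3 writes, that is 60 calls today versus 5 under the proposed defaults.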

5. Observability (follow-up)

  • Show Honcho token consumption in the status bar (separate from main context %)
  • hermes honcho tokens should show actual usage, not just config
  • Honcho-specific log level so users can tail -f without --verbose firehose

Files touched

  • honcho_integration/client.py: new config fields
  • honcho_integration/session.py: cadence logic in prefetch_context, prefetch_dialectic, _dynamic_reasoning_level
  • run_agent.py: injection filtering in _honcho_prefetch
  • honcho_integration/cli.py: hermes honcho tokens display updates

Backward compatibility

All new fields have defaults matching proposed behavior. Existing honcho.yaml files with no new fields get the cost-efficient defaults automatically. Users who want the current behavior can set:

prefetch:
  aiRepresentation: true
  aiCard: true
  dialecticInterval: 1      # every turn
  prefetchInterval: 1       # every turn
dialecticReasoningCap: high  # allow auto-bump


Labels: P3 (Low: cosmetic, nice to have), comp/plugins (Plugin system and bundled plugins), type/feature (New feature or request)
