Problem
Every turn currently makes 2 unconditional Honcho API calls:
honcho_session.context() — fetches user representation, peer card, AI representation, AI card
peer.chat() — runs the dialectic with auto-bumped reasoning level
This is expensive and mostly redundant. The user's representation doesn't change between consecutive turns. The dialectic runs full inference even for "ok thanks." A user reported ~$20 in 3 days of normal usage — that's ~$200/mo on managed Honcho, which is prohibitive.
Additionally, _dynamic_reasoning_level auto-bumps from low to medium/high based on message length (>400 chars = +2 levels). Users cannot cap this. The dialecticReasoningLevel config sets a floor, not a ceiling.
Proposed changes
1. Configurable injection granularity
Let users choose which prefetch components get injected:
# honcho.yaml
prefetch:
  representation: true     # user conclusions/synthesis
  card: true               # structured peer card
  aiRepresentation: false  # AI peer model (default off, rarely needed)
  aiCard: false            # AI peer card (default off)
  dialectic: true          # continuity synthesis
Community ask: "I like the observations but would prefer not to have the agent receiving the summary." Currently the only lever is recallMode: tools, which disables all injection. The prefetch block gives per-component control instead.
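A minimal sketch of how the per-component filter could work, assuming the prefetch result is a plain dict keyed by the same names as the config. `PREFETCH_DEFAULTS` and `filter_prefetch` are hypothetical names for illustration, not the existing `_honcho_prefetch` code.

```python
from typing import Dict, Mapping

# Proposed defaults; keys mirror the prefetch block in honcho.yaml.
PREFETCH_DEFAULTS: Dict[str, bool] = {
    "representation": True,
    "card": True,
    "aiRepresentation": False,
    "aiCard": False,
    "dialectic": True,
}


def filter_prefetch(fetched: Mapping[str, str], prefetch_cfg: Mapping[str, bool]) -> Dict[str, str]:
    """Keep only the non-empty fetched components that the config enables."""
    enabled = {**PREFETCH_DEFAULTS, **prefetch_cfg}  # user settings override defaults
    return {
        name: text
        for name, text in fetched.items()
        if text and enabled.get(name, False)  # unknown components are dropped
    }
```

With an empty config, only `representation`, `card`, and `dialectic` would reach the prompt. Setting `representation: false` covers the community ask quoted above.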
2. Smart prefetch cadence
Replace unconditional per-turn fetching with event-driven refresh:
Context (representation + card):
- Fetch once at session start
- Refresh after observation/conclusion writes (the data actually changed)
- Configurable interval fallback: prefetchInterval: 10 (every N turns; default 0 = session-start only)
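The three context rules above can be sketched as a single predicate. This assumes a 1-based turn counter and a flag that the session sets whenever an observation or conclusion is written. `should_refresh_context` and `wrote_since_fetch` are hypothetical names.

```python
def should_refresh_context(turn: int, wrote_since_fetch: bool, prefetch_interval: int = 0) -> bool:
    """Decide whether to re-fetch representation + card on this turn (turn is 1-based)."""
    if turn == 1:
        return True  # session start: always fetch
    if wrote_since_fetch:
        return True  # an observation/conclusion write changed the data
    if prefetch_interval > 0:
        # Interval fallback, counted from the session-start fetch.
        return (turn - 1) % prefetch_interval == 0
    return False  # interval 0 means session-start only
```

Under this reading, `prefetchInterval: 1` reproduces the current every-turn behavior, and `prefetchInterval: 10` refreshes on turns 1, 11, 21, and so on.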
Dialectic:
- First turn of a session (always — continuity matters here)
- Skip for short/trivial messages (<80 chars, no session-reference keywords)
- Configurable:
  - dialecticInterval: 0 (0 = first-turn only, N = every N turns)
  - dialecticTrigger: auto | first-turn | manual | every-turn, for full control
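One possible reading of the dialectic rules, as a sketch. The proposal does not spell out how `dialecticTrigger` and `dialecticInterval` interact, so the combination below is an assumption: `manual` never runs automatically, and in `auto` mode the trivial-message skip applies only to interval turns. The keyword list and the function names are hypothetical.

```python
# Hypothetical keyword list; the real session-reference keywords are not specified.
SESSION_KEYWORDS = ("earlier", "last time", "previously", "remember")


def is_trivial(message: str) -> bool:
    """Short message (<80 chars) with no session-reference keyword."""
    text = message.strip().lower()
    return len(text) < 80 and not any(k in text for k in SESSION_KEYWORDS)


def should_run_dialectic(turn: int, message: str, trigger: str = "auto", interval: int = 0) -> bool:
    """Decide whether to call the dialectic on this turn (turn is 1-based)."""
    if trigger == "manual":
        return False  # assumed: only explicit calls run the dialectic
    if trigger == "every-turn":
        return True
    if turn == 1:
        return True  # first turn of a session always runs
    if trigger == "first-turn" or interval <= 0:
        return False
    # auto mode with an interval: run on interval turns, skipping trivial messages.
    return (turn - 1) % interval == 0 and not is_trivial(message)
```

With the defaults (`auto`, interval 0), an "ok thanks" on turn 2 costs nothing, while the first turn still gets its continuity synthesis.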
3. Reasoning level cap
Add a ceiling to stop auto-bump cost escalation:
dialecticReasoningLevel: low # floor (existing)
dialecticReasoningCap: low # ceiling (new — pins it, no auto-bump)
When dialecticReasoningCap equals dialecticReasoningLevel, auto-bump is effectively disabled. When the cap is higher, the bump can go up to the cap but not past it.
Default: dialecticReasoningCap: low (no auto-bump unless the user opts in).
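A sketch of the clamp, assuming a three-step ladder and only the one bump rule quoted in the Problem section (>400 chars = +2 levels). The real `_dynamic_reasoning_level` may use more levels and more rules. `LEVELS` and `capped_reasoning_level` are hypothetical names.

```python
LEVELS = ("low", "medium", "high")  # assumed ladder; the real list may be longer


def capped_reasoning_level(message: str, floor: str = "low", cap: str = "low") -> str:
    """Apply the length-based bump to the floor, then clamp the result to the cap."""
    floor_i = LEVELS.index(floor)
    cap_i = max(LEVELS.index(cap), floor_i)  # a cap below the floor is treated as the floor
    bump = 2 if len(message) > 400 else 0    # the only bump rule stated in this proposal
    return LEVELS[min(floor_i + bump, cap_i)]
```

For a 500-character message, the default `cap="low"` keeps the level at `low`, `cap="medium"` stops the bump at `medium`, and `cap="high"` allows the full two-level bump.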
4. Cost-efficient defaults
Current defaults are optimized for quality. Proposed defaults optimize for usefulness-per-dollar:
| Setting | Current | Proposed | Why |
|---|---|---|---|
| dialecticReasoningLevel | low | low | (same) |
| dialecticReasoningCap | (none/unlimited) | low | stop auto-bump |
| aiRepresentation | injected | false | rarely useful, costs tokens |
| aiCard | injected | false | rarely useful, costs tokens |
| context fetch cadence | every turn | session start + on-write | rep is stable within a session |
| dialectic cadence | every turn | first turn only | continuity only matters at session start |
This would reduce a typical session from 2 API calls/turn to 1 context fetch + 1 dialectic call for the entire session, with on-demand refresh when data actually changes.
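To make the reduction concrete, here is the call-count arithmetic as a small sketch. It assumes each observation/conclusion write triggers exactly one extra context refresh and that no interval fallback is configured. `calls_per_session` is a hypothetical helper.

```python
from typing import Tuple


def calls_per_session(turns: int, writes: int = 0) -> Tuple[int, int]:
    """Return (current, proposed) Honcho API call counts for one session."""
    current = 2 * turns   # context + dialectic on every turn
    proposed = 2 + writes  # one context fetch + one dialectic call, plus one refresh per write
    return current, proposed
```

For a 30-turn session with 3 writes, this gives 60 calls today versus 5 under the proposed defaults.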
5. Observability (follow-up)
- Show Honcho token consumption in the status bar (separate from main context %)
- hermes honcho tokens should show actual usage, not just config
- Honcho-specific log level so users can tail -f without the --verbose firehose
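The Honcho-specific log level could be as small as a dedicated logger whose level is set independently of the root logger. A sketch, assuming the integration modules log under a `honcho_integration` logger name, which this proposal does not confirm. `set_honcho_log_level` is a hypothetical helper.

```python
import logging


def set_honcho_log_level(level: str = "INFO") -> logging.Logger:
    """Set the level of the Honcho logger without touching the root logger."""
    logger = logging.getLogger("honcho_integration")  # assumed logger name
    logger.setLevel(level.upper())  # setLevel accepts level names such as "DEBUG"
    return logger
```

Child loggers such as `honcho_integration.session` would inherit this level, so Honcho fetch decisions can be logged at DEBUG while the rest of the agent stays quiet.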
Files touched
honcho_integration/client.py — new config fields
honcho_integration/session.py — cadence logic in prefetch_context, prefetch_dialectic, _dynamic_reasoning_level
run_agent.py — injection filtering in _honcho_prefetch
honcho_integration/cli.py — hermes honcho tokens display updates
Backward compatibility
All new fields have defaults matching proposed behavior. Existing honcho.yaml files with no new fields get the cost-efficient defaults automatically. Users who want the current behavior can set:
prefetch:
  aiRepresentation: true
  aiCard: true
dialecticInterval: 1         # every turn
prefetchInterval: 1          # every turn
dialecticReasoningCap: high  # allow auto-bump
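A sketch of how the new fields might be loaded so that a honcho.yaml with none of them still yields the proposed defaults. It assumes the YAML has already been parsed into a dict. The class and function names are hypothetical, and the `auto` default for `dialecticTrigger` is an assumption, since the proposal does not state one.

```python
from dataclasses import dataclass, field, fields


@dataclass
class PrefetchFlags:
    representation: bool = True
    card: bool = True
    aiRepresentation: bool = False
    aiCard: bool = False
    dialectic: bool = True


@dataclass
class HonchoCostConfig:
    prefetch: PrefetchFlags = field(default_factory=PrefetchFlags)
    prefetchInterval: int = 0
    dialecticInterval: int = 0
    dialecticTrigger: str = "auto"  # assumed default
    dialecticReasoningLevel: str = "low"
    dialecticReasoningCap: str = "low"


def load_cost_config(raw: dict) -> HonchoCostConfig:
    """Build the config from a parsed honcho.yaml dict; missing keys keep their defaults."""
    flag_names = {f.name for f in fields(PrefetchFlags)}
    flags = {k: v for k, v in (raw.get("prefetch") or {}).items() if k in flag_names}
    top_names = {f.name for f in fields(HonchoCostConfig)} - {"prefetch"}
    top = {k: v for k, v in raw.items() if k in top_names}  # unrelated keys are ignored
    return HonchoCostConfig(prefetch=PrefetchFlags(**flags), **top)
```

Passing the opt-out snippet above through `load_cost_config` would turn both AI components back on and raise the cap to `high`, while an empty dict yields the cost-efficient defaults.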