
feat(honcho): configurable prefetch cadence, injection toggles, and reasoning cap #3425

Closed
erosika wants to merge 3 commits into NousResearch:main from erosika:eri/honcho-cost-awareness


erosika (Contributor) commented Mar 27, 2026

Summary

  • Default prefetch cadence changed from every-turn to first-turn-only for both context and dialectic, eliminating two redundant Honcho API calls on every turn after the first
  • Per-component injection toggles: users can independently disable the representation, card, AI peer data, or dialectic in honcho.yaml
  • Reasoning level cap: prevents _dynamic_reasoning_level from auto-bumping past a configured ceiling, so long messages no longer escalate cost

Config

All new fields live in honcho.yaml and default to cost-efficient values:

```yaml
# Cadence: "first-turn" (default), "every-turn", or integer N
contextCadence: first-turn
dialecticCadence: first-turn

# Injection toggles
injectRepresentation: true       # user conclusions
injectCard: true                 # structured peer card
injectAiRepresentation: false    # AI peer model (default off)
injectAiCard: false              # AI peer card (default off)
injectDialectic: true            # continuity synthesis

# Reasoning cap (default: same as floor = no auto-bump)
dialecticReasoningLevel: low
dialecticReasoningCap: low
```
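The five inject* toggles map naturally onto a prompt-assembly step. A minimal sketch, assuming each Honcho component has already been fetched as a plain string; the section keys, their order, and build_honcho_block are hypothetical, not the PR's code.

```python
from typing import Dict, List, Optional

# Section keys in the order they would be injected (hypothetical naming and order).
SECTION_ORDER = ("representation", "card", "ai_representation", "ai_card", "dialectic")

# Defaults mirror the injection toggles in honcho.yaml.
DEFAULT_TOGGLES = {
    "representation": True,      # injectRepresentation
    "card": True,                # injectCard
    "ai_representation": False,  # injectAiRepresentation
    "ai_card": False,            # injectAiCard
    "dialectic": True,           # injectDialectic
}


def build_honcho_block(
    sections: Dict[str, Optional[str]], toggles: Dict[str, bool]
) -> str:
    """Join the enabled, non-empty Honcho sections into one system-prompt block."""
    parts: List[str] = []
    for name in SECTION_ORDER:
        text = sections.get(name)
        # A section is injected only if its toggle is on and it has content.
        if toggles.get(name, False) and text:
            parts.append(text)
    return "\n\n".join(parts)


block = build_honcho_block(
    {
        "representation": "User prefers concise answers.",
        "ai_card": "AI peer card text",
        "dialectic": "Last session ended mid-refactor.",
    },
    DEFAULT_TOGGLES,
)
```

With the defaults, the AI peer card is dropped even though it was fetched, which is where the token savings come from.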

Restoring legacy behavior takes one config change: set both cadences to every-turn and raise the cap to high.
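One way to get the "existing honcho.yaml works unchanged" property is to resolve every missing key to its documented default. A minimal sketch, assuming the YAML has already been parsed into a plain dict; HonchoCostConfig, resolve_config, and bool_opt are hypothetical names (bool_opt only loosely echoes the _bool_opt helper mentioned in the downstream port), and the "low" default floor is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Union

Cadence = Union[str, int]  # "first-turn", "every-turn", or an integer N


def bool_opt(raw: Mapping[str, Any], key: str, default: bool) -> bool:
    """Read a boolean option, falling back to `default` when the key is absent."""
    value = raw.get(key, default)
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower() == "true"
    raise ValueError(f"{key} must be a boolean, got {value!r}")


@dataclass(frozen=True)
class HonchoCostConfig:
    context_cadence: Cadence
    dialectic_cadence: Cadence
    inject_representation: bool
    inject_card: bool
    inject_ai_representation: bool
    inject_ai_card: bool
    inject_dialectic: bool
    reasoning_level: str
    reasoning_cap: str


def resolve_config(raw: Mapping[str, Any]) -> HonchoCostConfig:
    """Fill every missing field with the PR's documented cost-efficient default."""
    level = raw.get("dialecticReasoningLevel", "low")  # assumed default floor
    return HonchoCostConfig(
        context_cadence=raw.get("contextCadence", "first-turn"),
        dialectic_cadence=raw.get("dialecticCadence", "first-turn"),
        inject_representation=bool_opt(raw, "injectRepresentation", True),
        inject_card=bool_opt(raw, "injectCard", True),
        inject_ai_representation=bool_opt(raw, "injectAiRepresentation", False),
        inject_ai_card=bool_opt(raw, "injectAiCard", False),
        inject_dialectic=bool_opt(raw, "injectDialectic", True),
        reasoning_level=level,
        # Cap defaults to the floor, so the level never auto-bumps unless raised.
        reasoning_cap=raw.get("dialecticReasoningCap", level),
    )


old_file = resolve_config({})  # honcho.yaml written before these fields existed
legacy = resolve_config({
    "contextCadence": "every-turn",
    "dialecticCadence": "every-turn",
    "dialecticReasoningCap": "high",
})
```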

Motivation

Community-reported cost: $20 over 3 days (about $200/month) on managed Honcho with normal chat usage. Root cause: an unconditional per-turn context fetch plus dialectic inference at auto-bumped reasoning levels. The user representation doesn't change between consecutive turns, and the dialectic is only useful at session boundaries.
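To make the per-turn overhead concrete: if every firing issues one context fetch and one dialectic call, a session of T turns costs 2T Honcho calls under every-turn cadence but only 2 under first-turn. The sketch below just counts calls under that assumption; it ignores reasoning level and token pricing, and honcho_calls is a hypothetical helper.

```python
def honcho_calls(turns: int, cadence: str) -> int:
    """Count Honcho API calls for one session (context + dialectic per firing)."""
    if turns <= 0:
        return 0
    calls_per_firing = 2  # one context fetch + one dialectic call
    if cadence == "every-turn":
        return calls_per_firing * turns
    if cadence == "first-turn":
        return calls_per_firing
    raise ValueError(f"unknown cadence: {cadence!r}")


for turns in (1, 10, 50):
    # prints: "1 2 2", then "10 20 2", then "50 100 2"
    print(turns, honcho_calls(turns, "every-turn"), honcho_calls(turns, "first-turn"))

monthly = 20 * 30 / 3  # $20 per 3 days extrapolated to a 30-day month: 200.0
```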

Closes #3422

Test plan

  • Verify default config produces first-turn-only prefetch (no API calls on turn 2+)
  • Verify dialecticCadence: every-turn restores legacy behavior
  • Verify dialecticReasoningCap: low pins reasoning level regardless of message length
  • Verify injectRepresentation: false suppresses user rep from system prompt
  • Verify existing honcho.yaml with no new fields works unchanged

erosika added 3 commits March 30, 2026 13:45
…easoning cap

Cost-awareness defaults for Honcho integration:

- Prefetch cadence: context and dialectic default to "first-turn" instead
  of unconditional per-turn fetches. Configurable via contextCadence /
  dialecticCadence in honcho.yaml ("first-turn", "every-turn", or int N).

- Injection toggles: per-component control over what gets injected into
  the system prompt. injectRepresentation, injectCard, injectDialectic
  default true; injectAiRepresentation, injectAiCard default false
  (rarely needed, saves tokens).

- Reasoning cap: dialecticReasoningCap (default: same as floor) prevents
  _dynamic_reasoning_level from auto-bumping past the configured ceiling.
  Stops cost escalation from long messages triggering high reasoning.
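The cap can be read as a clamp on the auto-bumped level. A minimal sketch, assuming an ordered low/medium/high scale and a hypothetical length-based bump of one level per 500 characters; the real _dynamic_reasoning_level heuristic and level names may differ.

```python
LEVELS = ("low", "medium", "high")  # assumed ordering, lowest to highest


def capped_reasoning_level(floor: str, cap: str, message: str) -> str:
    """Bump reasoning with message length, but never below floor or above cap."""
    floor_i, cap_i = LEVELS.index(floor), LEVELS.index(cap)
    if cap_i < floor_i:
        raise ValueError("cap must not be below the floor")
    # Hypothetical bump rule: +1 level per 500 characters of user message.
    bumped = floor_i + len(message) // 500
    return LEVELS[min(bumped, cap_i)]


pinned = capped_reasoning_level("low", "low", "x" * 2000)  # cap == floor: stays "low"
raised = capped_reasoning_level("low", "high", "x" * 600)  # one bump: "medium"
```

Setting the cap equal to the floor (the default) turns the bump off entirely, which is what the "Verify dialecticReasoningCap: low pins reasoning level" test item checks.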

Existing honcho.yaml files with no new fields get cost-efficient defaults
automatically. Users who want legacy per-turn behavior can set cadence
to "every-turn" and raise the reasoning cap.
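The cadence values from the commit message ("first-turn", "every-turn", or an integer N) can be checked per turn with a small predicate. A minimal sketch; should_fire is a hypothetical name, and reading an integer N as "fire on turn 1 and every N turns after" is an assumption, since the PR does not spell out integer semantics for cadence.

```python
from typing import Union

Cadence = Union[str, int]


def should_fire(cadence: Cadence, turn: int) -> bool:
    """Decide whether a Honcho call fires on a 1-indexed turn.

    Assumption: an integer cadence N fires on turn 1 and every N turns after.
    """
    if turn < 1:
        raise ValueError("turns are 1-indexed")
    if cadence == "every-turn":
        return True
    if cadence == "first-turn":
        return turn == 1
    if isinstance(cadence, int) and not isinstance(cadence, bool) and cadence >= 1:
        return (turn - 1) % cadence == 0
    raise ValueError(f"invalid cadence: {cadence!r}")


fired = [t for t in range(1, 8) if should_fire(3, t)]  # turns 1, 4, 7
```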

Closes NousResearch#3422
…s commands

hermes honcho status now shows: prefetch cadence, injection toggles,
reasoning level with cap info.

hermes honcho tokens now shows: cadence per component, injection
summary (enabled/suppressed), reasoning cap alongside level.

Users can see exactly what Honcho does on each turn and what is
costing them money.
…xt stays in prompt

Cadence controls when Honcho API calls fire. injectionFrequency controls
how many turns the cached result stays in the system prompt, which is
where the main LLM input token cost comes from.

Options: "every-turn" (default, legacy), "first-turn" (inject once then
drop), or integer N (inject for first N turns then suppress).
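Because injectionFrequency is separate from cadence, an integer N here means "keep the cached block for the first N turns", not "refetch every N turns". A minimal sketch of that rule plus a rough token tally, assuming a fixed-size injected block; keep_injected, injected_tokens, and the 800-token block size are hypothetical.

```python
from typing import Union

Frequency = Union[str, int]


def keep_injected(frequency: Frequency, turn: int) -> bool:
    """True if the cached Honcho block stays in the system prompt on this 1-indexed turn."""
    if turn < 1:
        raise ValueError("turns are 1-indexed")
    if frequency == "every-turn":
        return True
    if frequency == "first-turn":
        return turn == 1
    if isinstance(frequency, int) and not isinstance(frequency, bool) and frequency >= 1:
        return turn <= frequency  # inject for the first N turns, then suppress
    raise ValueError(f"invalid injectionFrequency: {frequency!r}")


def injected_tokens(block_tokens: int, frequency: Frequency, turns: int) -> int:
    """Total prompt tokens spent on the Honcho block over a session of `turns` turns."""
    return block_tokens * sum(keep_injected(frequency, t) for t in range(1, turns + 1))


every = injected_tokens(800, "every-turn", 20)   # 16000
once = injected_tokens(800, "first-turn", 20)    # 800
first_three = injected_tokens(800, 3, 20)        # 2400
```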

Surfaced in hermes honcho status and hermes honcho tokens.
erosika force-pushed the eri/honcho-cost-awareness branch from 0692ba2 to 0de4baa on March 30, 2026 17:45
zebster-cmd added a commit to zebster-cmd/hermes-agent that referenced this pull request Apr 7, 2026
…easoning cap (NousResearch#3425)

Ported from erosika's PR NousResearch#3425 to plugin architecture.
- Add dialectic_reasoning_cap config field (ceiling for auto-bump)
- Add dialectic_cadence and context_cadence (first-turn/every-turn/N)
- Add injection_frequency and per-component injection toggles
- Add _bool_opt helper for boolean config resolution
- Add _should_prefetch() and increment_turn() to session manager
- Update CLI status and tokens display with cadence/injection info
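The session-manager changes in the port (_should_prefetch() and increment_turn()) suggest a turn counter paired with a per-component cache. The class below is a hypothetical sketch of that shape, not the port's actual code; `fetch` stands in for whatever Honcho call a component makes, and integer cadences are assumed to mean "every N turns".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

Cadence = Union[str, int]


@dataclass
class PrefetchSession:
    """Hypothetical per-session state: a 1-indexed turn counter plus cached results."""

    cadences: Dict[str, Cadence]
    turn: int = 1
    cache: Dict[str, str] = field(default_factory=dict)

    def should_prefetch(self, component: str) -> bool:
        cadence = self.cadences.get(component, "first-turn")
        if cadence == "every-turn":
            return True
        if cadence == "first-turn":
            return self.turn == 1
        # Assumed: integer N fires on turn 1 and every N turns after.
        return (self.turn - 1) % int(cadence) == 0

    def get(self, component: str, fetch: Callable[[], str]) -> str:
        """Call fetch() only when the cadence allows it; otherwise reuse the cache."""
        if self.should_prefetch(component) or component not in self.cache:
            self.cache[component] = fetch()
        return self.cache[component]

    def increment_turn(self) -> None:
        self.turn += 1


calls: List[str] = []


def fetch_context() -> str:
    calls.append("context")  # stands in for a real Honcho API call
    return "user representation"


session = PrefetchSession({"context": "first-turn"})
for _ in range(3):
    session.get("context", fetch_context)
    session.increment_turn()
# Only turn 1 hit the API; turns 2 and 3 reused the cached result.
```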
erosika (Contributor, Author) commented Apr 15, 2026

Superseded by #9884: cadence, injection toggles, and reasoning cap are fully reimplemented in the new plugin architecture.

erosika closed this Apr 15, 2026

Development

Successfully merging this pull request may close these issues.

feat(honcho): configurable prefetch cadence, injection granularity, and cost-efficient defaults

1 participant