feat(honcho): configurable prefetch cadence, injection toggles, and reasoning cap #3425
Closed
erosika wants to merge 3 commits into
…easoning cap
Cost-awareness defaults for Honcho integration:
- Prefetch cadence: context and dialectic default to "first-turn" instead
of unconditional per-turn fetches. Configurable via contextCadence /
dialecticCadence in honcho.yaml ("first-turn", "every-turn", or int N).
- Injection toggles: per-component control over what gets injected into
the system prompt. injectRepresentation, injectCard, injectDialectic
default true; injectAiRepresentation, injectAiCard default false
(rarely needed, saves tokens).
- Reasoning cap: dialecticReasoningCap (default: same as floor) prevents
_dynamic_reasoning_level from auto-bumping past the configured ceiling.
Stops cost escalation from long messages triggering high reasoning.
Existing honcho.yaml files with no new fields get cost-efficient defaults
automatically. Users who want legacy per-turn behavior can set cadence
to "every-turn" and raise the reasoning cap.
Closes NousResearch#3422
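The cadence rule described in this commit message can be sketched as a small predicate. This is a minimal illustration, not the PR's code: the function name `should_prefetch` and the 1-indexed `turn` argument are hypothetical, and the meaning of an integer N (fetch on turn 1 and then every N turns) is an assumption, since the commit message only lists the accepted values.

```python
from typing import Union

# Accepted values per the commit message: "first-turn", "every-turn", or int N.
Cadence = Union[str, int]


def should_prefetch(cadence: Cadence, turn: int) -> bool:
    """Return True when a Honcho fetch should fire on this 1-indexed turn.

    Hypothetical helper for illustration only.
    """
    if turn < 1:
        raise ValueError("turn is 1-indexed")
    if cadence == "every-turn":
        # Legacy behavior: fetch unconditionally on every turn.
        return True
    if cadence == "first-turn":
        # Cost-efficient default: fetch once at the start of the session.
        return turn == 1
    if isinstance(cadence, int) and not isinstance(cadence, bool) and cadence > 0:
        # ASSUMPTION: integer N means turns 1, N+1, 2N+1, ...
        return (turn - 1) % cadence == 0
    raise ValueError(f"unsupported cadence: {cadence!r}")


if __name__ == "__main__":
    for cadence in ("first-turn", "every-turn", 3):
        fired = [t for t in range(1, 8) if should_prefetch(cadence, t)]
        print(cadence, fired)
```

Under these assumptions, a ten-turn session with "first-turn" makes one fetch per component instead of ten, which is the cost reduction the commit message is after.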
…s commands
hermes honcho status now shows: prefetch cadence, injection toggles, and reasoning level with cap info.
hermes honcho tokens now shows: cadence per component, injection summary (enabled/suppressed), and reasoning cap alongside level.
Users can see exactly what Honcho is doing per turn and what is costing them money.
…xt stays in prompt
Cadence controls when Honcho API calls fire. injectionFrequency controls how many turns the cached result stays in the system prompt, which is where the main LLM input token cost comes from. Options:
- "every-turn" (default, legacy)
- "first-turn" (inject once, then drop)
- integer N (inject for the first N turns, then suppress)
Surfaced in hermes honcho status and hermes honcho tokens.
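The difference between cadence and injectionFrequency is easier to see in code. The sketch below follows the three options as this commit message states them; the function name `should_inject` and the 1-indexed `turn` argument are hypothetical and not taken from the PR.

```python
from typing import Union

# Options per the commit message: "every-turn", "first-turn", or integer N.
InjectionFrequency = Union[str, int]


def should_inject(frequency: InjectionFrequency, turn: int) -> bool:
    """Return True when the cached Honcho result belongs in the system prompt.

    Hypothetical helper for illustration only; `turn` is 1-indexed.
    """
    if turn < 1:
        raise ValueError("turn is 1-indexed")
    if frequency == "every-turn":
        # Default, legacy: the cached result stays in the prompt on every turn.
        return True
    if frequency == "first-turn":
        # Inject once, then drop.
        return turn == 1
    if isinstance(frequency, int) and not isinstance(frequency, bool) and frequency > 0:
        # Inject for the first N turns, then suppress.
        return turn <= frequency
    raise ValueError(f"unsupported injectionFrequency: {frequency!r}")


if __name__ == "__main__":
    for frequency in ("every-turn", "first-turn", 2):
        kept = [t for t in range(1, 6) if should_inject(frequency, t)]
        print(frequency, kept)
```

Note that this predicate only decides whether already-fetched text is placed in the prompt; it makes no API calls, which is why it affects LLM input tokens rather than Honcho usage.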
zebster-cmd added a commit to zebster-cmd/hermes-agent that referenced this pull request on Apr 7, 2026
…easoning cap (NousResearch#3425)
Ported from erosika's PR NousResearch#3425 to plugin architecture.
- Add dialectic_reasoning_cap config field (ceiling for auto-bump)
- Add dialectic_cadence and context_cadence (first-turn/every-turn/N)
- Add injection_frequency and per-component injection toggles
- Add _bool_opt helper for boolean config resolution
- Add _should_prefetch() and increment_turn() to session manager
- Update CLI status and tokens display with cadence/injection info
Comment from the PR author (Contributor):
Superseded by #9884: cadence, injection toggles, and reasoning cap are fully reimplemented in the new plugin architecture.
Summary
- Reasoning cap: dialecticReasoningCap prevents _dynamic_reasoning_level from auto-bumping past a configured ceiling, stopping cost escalation from long messages

Config
All new fields are in honcho.yaml, with cost-efficient defaults. Legacy behavior is one config change: set cadences to every-turn and cap to high.

Motivation
Community-reported cost: $20/3 days ($200/mo) on managed Honcho with normal chat usage. Root cause: unconditional per-turn context fetch plus dialectic inference with auto-bumped reasoning levels. The user representation doesn't change between consecutive turns, and dialectic is only useful at session boundaries.

Closes #3422
Test plan
- dialecticCadence: every-turn restores legacy behavior
- dialecticReasoningCap: low pins reasoning level regardless of message length
- injectRepresentation: false suppresses user rep from system prompt
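The second test-plan item (a low cap pins the reasoning level regardless of message length) can be illustrated with a rough sketch of a capped auto-bump. Everything specific here is an assumption: the ordered level names, the length thresholds, and the function names are hypothetical stand-ins, and the real `_dynamic_reasoning_level` heuristic is not shown in this PR page. The only behavior taken from the PR is that the cap defaults to the floor and that the bumped level never exceeds it.

```python
from typing import Optional

# ASSUMPTION: an ordered set of level names; the actual set may differ.
LEVELS = ("minimal", "low", "medium", "high")


def dynamic_level(message: str, floor: str) -> str:
    """Bump the floor based on message length (hypothetical thresholds)."""
    if len(message) >= 400:
        bump = 2
    elif len(message) >= 120:
        bump = 1
    else:
        bump = 0
    index = min(LEVELS.index(floor) + bump, len(LEVELS) - 1)
    return LEVELS[index]


def capped_level(message: str, floor: str, cap: Optional[str] = None) -> str:
    """Apply the reasoning cap; the cap defaults to the floor, as in the PR."""
    cap = floor if cap is None else cap
    if LEVELS.index(cap) < LEVELS.index(floor):
        raise ValueError("cap must not be below the floor")
    level = dynamic_level(message, floor)
    # Take whichever of the bumped level and the cap comes first in LEVELS.
    return min(level, cap, key=LEVELS.index)


if __name__ == "__main__":
    long_message = "x" * 1000
    print(capped_level(long_message, floor="low"))              # cap defaults to floor
    print(capped_level(long_message, floor="low", cap="high"))  # legacy-style ceiling
```

With the default cap, message length has no effect on the level, so the auto-bump is effectively disabled unless the user raises the cap.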