feat(honcho): configurable prefetch cadence, injection granularity, and cost-efficient defaults

## Problem

Every turn currently makes 2 unconditional Honcho API calls:

1. `honcho_session.context()` — fetches user representation, peer card, AI representation, AI card
2. `peer.chat()` — runs the dialectic with auto-bumped reasoning level

This is expensive and mostly redundant: the user's representation usually doesn't change between consecutive turns, and the dialectic runs full inference even for a message like "ok thanks." One user reported ~$20 over 3 days of normal usage, which extrapolates to ~$200/mo on managed Honcho. That is prohibitive.

Additionally, `_dynamic_reasoning_level` auto-bumps the reasoning level from `low` to `medium`/`high` based on message length (a message over 400 chars is bumped by 2 levels). Users cannot cap this, because the `dialecticReasoningLevel` config sets a floor, not a ceiling.

## Proposed changes

### 1. Configurable injection granularity

Let users choose which prefetch components get injected:

```yaml
# honcho.yaml
prefetch:
  representation: true      # user conclusions/synthesis
  card: true                 # structured peer card
  aiRepresentation: false    # AI peer model (default off — rarely needed)
  aiCard: false              # AI peer card (default off)
  dialectic: true            # continuity synthesis
```

Community ask: "I like the observations but would prefer not to have the agent receiving the summary." Currently the only lever is `recallMode: tools`, which disables all injection. The `prefetch` block gives per-component control instead.
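
A minimal sketch of how the injection filter could work, assuming the fetched components arrive as a dict keyed by the same names as the YAML flags. `DEFAULT_PREFETCH` and `select_components` are hypothetical names for illustration, not existing code in `run_agent.py`:

```python
from typing import Dict, Optional

# Proposed defaults from the YAML above (keys mirror honcho.yaml).
DEFAULT_PREFETCH: Dict[str, bool] = {
    "representation": True,
    "card": True,
    "aiRepresentation": False,
    "aiCard": False,
    "dialectic": True,
}


def select_components(
    fetched: Dict[str, str],
    prefetch_cfg: Optional[Dict[str, bool]] = None,
) -> Dict[str, str]:
    """Return only the fetched components that are enabled and non-empty.

    `fetched` maps a component name to its text (assumed shape).
    `prefetch_cfg` is the user's `prefetch:` block; missing keys fall
    back to DEFAULT_PREFETCH, unknown component names are dropped.
    """
    flags = {**DEFAULT_PREFETCH, **(prefetch_cfg or {})}
    return {
        name: text
        for name, text in fetched.items()
        if flags.get(name, False) and text
    }
```

With no user config, only `representation`, `card`, and `dialectic` would be injected. Passing `{"representation": False}` drops the representation while keeping the rest.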

### 2. Smart prefetch cadence

Replace unconditional per-turn fetching with event-driven refresh:

**Context (representation + card):**
- Fetch once at session start
- Refresh after observation/conclusion writes (the data actually changed)
- Configurable interval fallback: `prefetchInterval: N` refreshes every N turns (e.g. `10`); the default `0` disables interval refresh, leaving only the session-start fetch and on-write refreshes
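
The context rules above can be sketched as a single predicate. This is an illustration under assumptions: turns are counted from 1 within a session, and a `dirty` flag is set whenever an observation or conclusion has been written since the last fetch. `should_refresh_context` is a hypothetical helper, not the current `prefetch_context` code:

```python
def should_refresh_context(turn: int, dirty: bool, prefetch_interval: int = 0) -> bool:
    """Decide whether to call honcho_session.context() on this turn.

    turn              -- 1-based turn number within the session (assumed)
    dirty             -- True if a write happened since the last fetch (assumed flag)
    prefetch_interval -- 0 disables interval refresh; N refreshes every N turns
    """
    if turn == 1:
        return True  # always fetch once at session start
    if dirty:
        return True  # the stored data actually changed
    # Interval fallback: turns N+1, 2N+1, ... after the initial fetch.
    return prefetch_interval > 0 and (turn - 1) % prefetch_interval == 0
```

Setting `prefetch_interval=1` in this sketch refreshes on every turn, which matches the current behavior.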

**Dialectic:**
- First turn of a session (always — continuity matters here)
- Skip for short/trivial messages (<80 chars, no session-reference keywords)
- Configurable: `dialecticInterval: 0` (0 = first-turn only, N = every N turns)
- `dialecticTrigger: auto | first-turn | manual | every-turn` for full control
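
One possible reading of the dialectic rules, as a sketch. How `dialecticTrigger` and `dialecticInterval` interact is not specified above, so the precedence here is an assumption: `manual` and `every-turn` override everything, the first turn always runs otherwise, and in `auto` mode the interval picks candidate turns from which trivial messages are skipped. The keyword list and `should_run_dialectic` are hypothetical:

```python
# Hypothetical session-reference keywords; the real list is not defined in this proposal.
SESSION_KEYWORDS = ("earlier", "last time", "previously", "remember")


def should_run_dialectic(
    turn: int,
    message: str,
    trigger: str = "auto",
    interval: int = 0,
    min_chars: int = 80,
) -> bool:
    """Decide whether to call peer.chat() on this turn (turn is 1-based)."""
    if trigger == "manual":
        return False  # only explicit tool calls run the dialectic
    if trigger == "every-turn":
        return True
    if turn == 1:
        return True  # continuity matters at session start
    if trigger == "first-turn":
        return False
    # trigger == "auto": interval 0 means first-turn only.
    if interval <= 0 or (turn - 1) % interval != 0:
        return False
    text = message.lower()
    is_trivial = len(message) < min_chars and not any(
        keyword in text for keyword in SESSION_KEYWORDS
    )
    return not is_trivial
```

Under these assumptions, `should_run_dialectic(2, "ok thanks", interval=1)` skips the call, while a short message that mentions "last time" still triggers it.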

### 3. Reasoning level cap

Add a ceiling to stop auto-bump cost escalation:

```yaml
dialecticReasoningLevel: low    # floor (existing)
dialecticReasoningCap: low      # ceiling (new — pins it, no auto-bump)
```

When `dialecticReasoningCap` equals `dialecticReasoningLevel`, auto-bump is effectively disabled. When the cap is higher, the bump can go up to but not past it.

Default: `dialecticReasoningCap: low` (no auto-bump unless the user opts in).
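
The floor/ceiling interaction can be sketched as a clamp. This models only the one bump rule stated in this issue (>400 chars = +2 levels) and assumes the three levels named here in ascending order; the real `_dynamic_reasoning_level` may use more levels or other rules. `resolve_reasoning_level` is a hypothetical name:

```python
# Assumed ordering of the levels mentioned in this issue.
LEVELS = ("low", "medium", "high")


def resolve_reasoning_level(
    message: str,
    floor: str = "low",
    cap: str = "low",
    bump_threshold: int = 400,
) -> str:
    """Apply the length-based bump to the floor, then clamp to the cap."""
    floor_i = LEVELS.index(floor)
    # Assumption: a cap below the floor is treated as equal to the floor.
    cap_i = max(LEVELS.index(cap), floor_i)
    bump = 2 if len(message) > bump_threshold else 0
    return LEVELS[min(floor_i + bump, cap_i)]
```

With the proposed defaults a 500-character message stays at `low`. With `cap="medium"` it stops at `medium`, and with `cap="high"` it reaches `high` as it does today.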

### 4. Cost-efficient defaults

Current defaults are optimized for quality. Proposed defaults optimize for usefulness-per-dollar:

| Setting | Current | Proposed | Why |
|---|---|---|---|
| `dialecticReasoningLevel` | `low` | `low` | (same) |
| `dialecticReasoningCap` | (none/unlimited) | `low` | stop auto-bump |
| `aiRepresentation` | injected | `false` | rarely useful, costs tokens |
| `aiCard` | injected | `false` | rarely useful, costs tokens |
| context fetch cadence | every turn | session start + on-write | rep is stable within a session |
| dialectic cadence | every turn | first turn only | continuity only matters at session start |

This would reduce a typical session from 2 API calls/turn to 1 context fetch + 1 dialectic call for the entire session, with on-demand refresh when data actually changes.
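
As a rough illustration of the call-count arithmetic, assuming each observation/conclusion write triggers exactly one context refresh and no interval refresh is configured (`calls_per_session` is a hypothetical helper, and the numbers are call counts, not dollar costs):

```python
from typing import Tuple


def calls_per_session(turns: int, writes: int = 0) -> Tuple[int, int]:
    """Return (current, proposed) Honcho API call counts for one session."""
    current = 2 * turns    # context() + peer.chat() on every turn
    proposed = 2 + writes  # one context fetch + one dialectic call, plus on-write refreshes
    return current, proposed


# A 20-turn session with 3 writes: 40 calls today, 5 under the proposed defaults.
print(calls_per_session(20, writes=3))
```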

### 5. Observability (follow-up)

- Show Honcho token consumption in the status bar (separate from main context %)
- `hermes honcho tokens` should show actual usage, not just config
- Honcho-specific log level so users can `tail -f` without `--verbose` firehose

## Files touched

- `honcho_integration/client.py` — new config fields
- `honcho_integration/session.py` — cadence logic in `prefetch_context`, `prefetch_dialectic`, `_dynamic_reasoning_level`
- `run_agent.py` — injection filtering in `_honcho_prefetch`
- `honcho_integration/cli.py` — `hermes honcho tokens` display updates

## Backward compatibility

All new fields are optional, and their defaults match the proposed behavior, so existing `honcho.yaml` files that set none of them get the cost-efficient defaults automatically. This changes runtime behavior for existing installs. Users who want the current behavior can set:

```yaml
prefetch:
  aiRepresentation: true
  aiCard: true
  dialecticInterval: 1      # every turn
  prefetchInterval: 1       # every turn
dialecticReasoningCap: high  # allow auto-bump
```
