Feature: Iteration Budget Pressure — Warn the LLM Before Max Iterations Hit

## Overview

When the agent approaches its max iteration limit, it currently gets no advance warning — it simply hits the wall and `_handle_max_iterations()` makes one final tool-less API call asking for a summary. This means the LLM has no opportunity to proactively wrap up, consolidate its findings, or produce a quality final response before being cut off.

This idea comes from [Utah (Inngest's agent harness)](https://github.com/inngest/utah), which implements a two-tier **budget pressure** system that injects system messages into the LLM context as iterations run low. The pattern is simple, zero-dependency, and addresses a real failure mode where agents exhaust iterations doing tool calls without ever producing a response.

---

## Research Findings

### How Utah's Budget Pressure Works

Utah injects ephemeral system messages at two tiers based on remaining iterations (default `maxIterations = 20`):

```typescript
// CAUTION tier — 10 iterations before the end
if (iterations >= maxIterations - 10) {
  "[SYSTEM: Iteration N/M. Start wrapping up — respond with text soon.]"
}

// WARNING tier — last 3 iterations
if (iterations >= maxIterations - 3) {
  "[SYSTEM: You are on iteration N of M. You MUST respond with your final
   answer NOW. Do not call any more tools.]"
}
```

Key design decisions:
- Messages are appended to `messagesForLLM` (the copy sent to the API), **not** to the persistent `messages` array — they don't pollute session history
- Two tiers provide graduated pressure: first a nudge, then urgency
- If the loop exhausts all iterations anyway, a static fallback response is returned: `"(Reached max iterations: 20)"`

### Current State in Hermes Agent

In `run_agent.py`, the agent loop (`while api_call_count < self.max_iterations`) has:
- **No pre-warning** to the LLM about approaching the limit
- A post-hoc `_handle_max_iterations()` (lines 2640-2757) that:
  - Injects a user message asking for a summary after the limit is hit
  - Makes one final API call with NO tools
  - Returns whatever the LLM produces or an error string
- `max_iterations` defaults to 60, displayed in progress output but never communicated to the LLM

The step_callback fires per-iteration (line 2941) and could be extended, but budget warnings are better handled as message injection into the API call.

---

## Implementation Plan

### Skill vs. Tool Classification

This is a **core codebase change** to `run_agent.py`, not a skill or tool. It modifies the agent loop's message preparation logic.

### What We'd Need

1. Configurable thresholds for warning tiers
2. Message injection into `api_messages` (not persisted `messages`)
3. Integration with existing `_handle_max_iterations()` as a fallback

### Phased Rollout

**Phase 1: Basic two-tier budget warnings**
- Add `BUDGET_CAUTION_THRESHOLD = 0.7` and `BUDGET_WARNING_THRESHOLD = 0.9` (fraction of max_iterations)
- Before each API call, check `api_call_count / self.max_iterations` against thresholds
- Inject ephemeral system messages into the messages sent to the API:
  - Caution (70%): `"[BUDGET: Iteration {N}/{max}. You have {remaining} iterations left. Start consolidating your work and prepare to provide a final response.]"`
  - Warning (90%): `"[BUDGET: Iteration {N}/{max}. You MUST provide your final response NOW. Do not make additional tool calls unless absolutely critical.]"`
- Messages injected into `api_messages` copy only, never persisted to `messages` or session DB
- Injection point: after line ~3000 in `run_agent.py`, before the API call

**Phase 2: Adaptive thresholds**
- Scale thresholds based on actual `max_iterations` value (a 10-turn session needs earlier warnings than a 60-turn one)
- Consider task complexity signals (number of tools called, context size) to adjust pressure timing
- Add config options in `cli.py` CLI_CONFIG for threshold customization

**Phase 3: Smart wrap-up behavior**
- When budget warning fires, optionally reduce the available toolset (e.g., remove heavy tools like delegate_task, browser)
- Track whether the LLM acknowledged the warning (produced text alongside tool calls) vs. ignored it
- If warning was ignored, escalate to injecting the message as a forced user turn on next iteration

---

## Pros & Cons

### Pros
- **Trivial to implement** — ~20 lines of code in the main loop, no new dependencies
- **Prevents silent exhaustion** — the most common failure mode where agents loop endlessly doing tool calls
- **Better response quality** — the LLM can thoughtfully conclude vs. being abruptly asked to summarize after cutoff
- **No architectural changes** — works within the existing loop structure
- **Ephemeral injection** — doesn't pollute session history or affect context compression

### Cons / Risks
- **May cause premature wrap-up** — aggressive thresholds might make the LLM stop working too early
- **Threshold tuning** — the right thresholds likely depend on task type; a fixed percentage may not be optimal for all cases
- **Token cost** — injected messages consume context tokens (minimal, but nonzero)

---

## Open Questions

- Should the budget message format be a system message or a user message? (Utah uses user-role messages, but system role avoids confusing the LLM about who's speaking)
- Should the thresholds be absolute (e.g., "last 5 iterations") or relative (e.g., "last 10%")? Relative works better across different `max_iterations` values
- Should Phase 3 tool reduction be opt-in or default?
- Should this interact with `_handle_max_iterations()` — e.g., if the budget warning successfully caused a response, skip the post-hoc summary call?

---

## References

- [Utah source: budget warnings in agent-loop.ts](https://github.com/inngest/utah/blob/main/src/agent-loop.ts) (lines 207-223)
- [Blog post: "Your Agent Needs a Harness, Not a Framework"](https://www.inngest.com/blog/your-agent-needs-a-harness-not-a-framework)
- Hermes current implementation: `run_agent.py` lines 2640-2757 (`_handle_max_iterations`), line 2919 (main loop condition)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: Iteration Budget Pressure — Warn the LLM Before Max Iterations Hit #414

Overview

Research Findings

How Utah's Budget Pressure Works

Current State in Hermes Agent

Implementation Plan

Skill vs. Tool Classification

What We'd Need

Phased Rollout

Pros & Cons

Pros

Cons / Risks

Open Questions

References

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Feature: Iteration Budget Pressure — Warn the LLM Before Max Iterations Hit #414

Description

Overview

Research Findings

How Utah's Budget Pressure Works

Current State in Hermes Agent

Implementation Plan

Skill vs. Tool Classification

What We'd Need

Phased Rollout

Pros & Cons

Pros

Cons / Risks

Open Questions

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions