Feature: Granular Improvements from Roo Code Deep-Dive — Tool Output, Patch Refinements, Anti-Hallucination, Prompt Methodology #507

@teknium1

Overview

A deep architectural comparison between Roo Code (Apache 2.0, TypeScript VS Code AI agent) and Hermes Agent reveals several subtle but high-impact patterns we should adopt. These are not new features — they are refinements to existing systems that improve reliability, quality, and developer experience.

Each section below is independently implementable. They're grouped in one issue because they share a common origin (the Roo Code comparison), but could be split into separate PRs.

Related issues with Roo Code context added via comments: #499 (Context Compaction — condensation prompt), #481 (Loop Guard — repetition detection), #476 (Agent Modes), #344 (Multi-Agent — orchestrator pattern), #452 (Checkpoints — shadow git approach)


1. Tool Output Truncation: Head/Tail Split (HIGH IMPACT)

Current state: run_agent.py:2606-2613 — Global 100K char limit, HEAD-ONLY truncation:

function_result = function_result[:MAX_TOOL_RESULT_CHARS]

Problem: Build errors, test failures, and stack traces appear at the END of output. Head-only truncation loses exactly the information the agent needs most. A make build that emits 120K chars of success output followed by a 5K error message is cut off at 100K, so the agent sees only success output and never sees the error.

Roo Code's approach: Head/tail buffer with 50/50 split. Full output spills to disk; model can retrieve via read_command_output tool with byte-offset pagination and regex search.

Proposed fix:

MAX_TOOL_RESULT_CHARS = 100_000
if len(function_result) > MAX_TOOL_RESULT_CHARS:
    head_size = MAX_TOOL_RESULT_CHARS * 2 // 5   # 40K head
    tail_size = MAX_TOOL_RESULT_CHARS * 3 // 5   # 60K tail
    function_result = (
        function_result[:head_size]
        + f"\n\n[... {len(function_result) - head_size - tail_size:,} chars truncated ...]\n\n"
        + function_result[-tail_size:]
    )

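Effort: ~10 lines changed in run_agent.py. Risk is negligible: the only behavioral difference is that a truncated result can exceed MAX_TOOL_RESULT_CHARS by the length of the truncation marker.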
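To make the failure mode concrete, here is a minimal, self-contained sketch of the proposed split wrapped in a function, run against a simulated build log. The function name truncate_head_tail and the sample output are hypothetical and not taken from run_agent.py.

```python
MAX_TOOL_RESULT_CHARS = 100_000


def truncate_head_tail(text: str, limit: int = MAX_TOOL_RESULT_CHARS) -> str:
    """Keep roughly the first 40% and last 60% of `limit`, dropping the middle."""
    if len(text) <= limit:
        return text
    head_size = limit * 2 // 5
    tail_size = limit - head_size  # remainder goes to the tail
    omitted = len(text) - head_size - tail_size
    marker = f"\n\n[... {omitted:,} chars truncated ...]\n\n"
    # Guard the tail slice: text[-0:] would return the whole string.
    tail = text[-tail_size:] if tail_size else ""
    return text[:head_size] + marker + tail


# Simulated build: 120K chars of success output, then a late error.
build_output = "ok\n" * 40_000 + "ERROR: undefined reference to foo\n"
result = truncate_head_tail(build_output)

print("ERROR" in build_output[:MAX_TOOL_RESULT_CHARS])  # False: head-only loses it
print("ERROR" in result)                                # True: the tail keeps it
```

The marker is added on top of the head and tail budgets, so the returned string is slightly longer than the limit.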


2. Patch Tool Refinements (MEDIUM-HIGH IMPACT)

2a. Indentation Preservation on Fuzzy Match

Current state: When fuzzy_match.py matches via strategies 2-8 (non-exact), it maps positions back to the original content but replaces with the LLM's new_string literally. If the LLM's indentation differs from the file's, the replacement has wrong indentation.

Roo Code's approach (multi-search-replace.ts:466-500): After fuzzy matching, calculates the relative indentation between the search block's first line and the file's matched region, then adjusts all lines of the replacement to maintain the file's indentation pattern.

Example:

File has (4-space indent):
    def foo():
        return bar

LLM sends (2-space indent):
  def foo():
    return baz

With indentation preservation:
    def foo():
        return baz    # 4-space indent preserved
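One way to get the result in the example above is sketched below. Note that a pure first-line offset (+2 spaces here) would place `return baz` at 6 spaces, so this sketch also rescales nesting depth by comparing the indent step of the search block with that of the matched region. All function names are hypothetical, the step inference is an assumption rather than Roo Code's algorithm, and tabs are not handled.

```python
def indent_width(line: str) -> int:
    """Number of leading spaces (tabs are not handled in this sketch)."""
    return len(line) - len(line.lstrip(" "))


def indent_step(lines: list[str]) -> int:
    """Smallest indentation increase above the block's shallowest line, or 0."""
    widths = [indent_width(line) for line in lines if line.strip()]
    if not widths:
        return 0
    base = min(widths)
    deeper = [w - base for w in widths if w > base]
    return min(deeper) if deeper else 0


def reindent(replacement: str, search: str, matched: str) -> str:
    """Re-indent `replacement` so it follows the indentation of `matched`.

    `search` is the LLM's old_string; `matched` is the file region it
    fuzzy-matched. Both are assumed to use spaces only.
    """
    search_lines = search.splitlines()
    matched_lines = matched.splitlines()
    search_base = indent_width(next((l for l in search_lines if l.strip()), ""))
    file_base = indent_width(next((l for l in matched_lines if l.strip()), ""))
    search_step = indent_step(search_lines)
    file_step = indent_step(matched_lines)

    out = []
    for line in replacement.splitlines():
        if not line.strip():
            out.append("")  # keep blank lines blank
            continue
        relative = indent_width(line) - search_base
        if search_step and file_step:
            # Rescale nesting depth from the LLM's step to the file's step.
            relative = relative * file_step // search_step
        out.append(" " * max(file_base + relative, 0) + line.lstrip(" "))
    return "\n".join(out)


file_region = "    def foo():\n        return bar"
llm_search = "  def foo():\n    return bar"
llm_replace = "  def foo():\n    return baz"
fixed = reindent(llm_replace, llm_search, file_region)
```

When either block has only one indentation level, the sketch falls back to a plain offset from the first line.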

2b. Line Number Hint for Replace Mode

Current state: patch(mode="replace") searches the entire file for old_string. In large files with repeated patterns, this can match the wrong location.

Roo Code's approach: apply_diff accepts :start_line:N which narrows the search to a ±40 line buffer around line N. Dramatically reduces false matches in large files.

Proposed: Add optional start_line parameter to patch tool. When provided, search only within ±50 lines of that line number first, falling back to full-file search if not found. Zero breaking change — parameter is optional.
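A rough sketch of the windowed search with full-file fallback follows. The helper name find_with_line_hint, the 1-indexed start_line convention, and the return value (a character offset, or -1) are assumptions for illustration; the real patch tool would route this through its existing matching strategies.

```python
from typing import Optional


def find_with_line_hint(
    content: str,
    old_string: str,
    start_line: Optional[int] = None,  # 1-indexed hint (assumed convention)
    window: int = 50,
) -> int:
    """Return the char offset of old_string, preferring matches near start_line."""
    if start_line is not None:
        lines = content.splitlines(keepends=True)
        lo = max(start_line - 1 - window, 0)
        hi = min(start_line + window, len(lines))
        window_start = sum(len(line) for line in lines[:lo])
        idx = "".join(lines[lo:hi]).find(old_string)
        if idx != -1:
            return window_start + idx
    # No hint, or nothing inside the window: search the whole file.
    return content.find(old_string)


# The same statement appears on lines 10 and 500 of a 600-line file.
sample_lines = [f"line {i}" for i in range(1, 601)]
sample_lines[9] = "x = 1"
sample_lines[499] = "x = 1"
content = "\n".join(sample_lines)


def line_of(offset: int) -> int:
    """Convert a character offset back to a 1-indexed line number."""
    return content.count("\n", 0, offset) + 1


print(line_of(find_with_line_hint(content, "x = 1")))                  # 10
print(line_of(find_with_line_hint(content, "x = 1", start_line=495)))  # 500
```

A multi-line old_string that starts inside the window but runs past its end is not found by the windowed pass and is picked up by the fallback instead.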

2c. CRLF Line Ending Preservation

Current state: No explicit CRLF handling. If a file uses \r\n line endings (Windows), edits may silently convert to \n.

Roo Code's approach (EditFileTool.ts): Detects original line endings before edit, normalizes for matching, then restores original endings after replacement.
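The detect, normalize, restore sequence could look roughly like this in Python. The function name is hypothetical, and treating any file that contains `\r\n` as a CRLF file is a simplifying assumption (files with mixed endings would be rewritten as all CRLF).

```python
def _to_lf(text: str) -> str:
    """Normalize CRLF to LF so matching ignores line-ending differences."""
    return text.replace("\r\n", "\n")


def replace_preserving_newlines(content: str, old: str, new: str) -> str:
    """Replace the first occurrence of `old`, keeping the file's line endings.

    `content` must be read without newline translation, e.g.
    open(path, newline=""), otherwise Python has already converted CRLF to LF.
    """
    uses_crlf = "\r\n" in content
    edited = _to_lf(content).replace(_to_lf(old), _to_lf(new), 1)
    return edited.replace("\n", "\r\n") if uses_crlf else edited


windows_file = "a = 1\r\nb = 2\r\n"
edited = replace_preserving_newlines(windows_file, "b = 2\n", "b = 3\nc = 4\n")
print(repr(edited))  # 'a = 1\r\nb = 3\r\nc = 4\r\n'
```

Writing the result back needs the same care: open the file with newline="" so the restored `\r\n` sequences are written unchanged.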

2d. Unicode Normalization Strategy

Current state: fuzzy_match.py strategy 5 handles \\n/\\t/\\r escape sequences but not Unicode variants.

Roo Code's approach (seek-sequence.ts): Normalizes smart quotes (curly quotes to straight), em-dashes to hyphens, and non-breaking spaces to regular spaces. LLMs frequently generate these Unicode variants instead of ASCII equivalents.
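As a sketch of what such a strategy might look like in fuzzy_match.py, the mapping below covers the three cases named above. The names are hypothetical and the character list is deliberately limited to what the description mentions. Because every mapping is one character to one character, an offset found in the normalized text is also valid in the original.

```python
# Curly quotes, em-dash, and non-breaking space mapped to ASCII equivalents.
_ASCII_EQUIVALENTS = str.maketrans({
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u2014": "-",   # em-dash
    "\u00a0": " ",   # non-breaking space
})


def normalize_unicode(text: str) -> str:
    """Replace common typographic characters with ASCII (length-preserving)."""
    return text.translate(_ASCII_EQUIVALENTS)


def find_normalized(content: str, old_string: str) -> int:
    """Offset of old_string in content, ignoring typographic variants; -1 if absent."""
    return normalize_unicode(content).find(normalize_unicode(old_string))


file_text = 'print("hello")'
llm_text = "print(\u201chello\u201d)"  # the LLM emitted curly quotes
print(file_text.find(llm_text))            # -1: exact match fails
print(find_normalized(file_text, llm_text))  # 0: normalized match succeeds
```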

2e. Detailed Error Messages with Recovery Suggestions

Current state: Patch failures return simple strings like "Could not find unique match for old_string."

Roo Code's approach (EditFileTool.ts): Returns structured error details with numbered recovery steps:

Error: Could not find match.
1. Verify the file exists and the content is current (use read_file)
2. Check that the search string matches exactly
3. The file may have been modified since you last read it
4. Try using a more unique search string with more context

2f. Consecutive Failure Tracking Per File

Current state: No per-file error tracking. The agent can fail to edit a file 10 times in a row with no escalating guidance.

Roo Code's approach: Maintains consecutiveMistakeCountForApplyDiff map keyed by file path. After 2+ consecutive failures on the same file, shows escalating warnings.

Proposed: Track per-file failure counts in the agent loop. After 3 consecutive failed edits to the same file, inject a system hint: "You've failed to edit {filename} 3 times. Consider: read the file first to verify current content, use a more unique search string, or try write_file to replace the entire file."
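A minimal sketch of the per-file counter, assuming the agent loop can call one method after every edit attempt. The class and method names are hypothetical; the threshold of 3 and the hint wording come from the proposal above.

```python
from collections import defaultdict
from typing import Optional


class EditFailureTracker:
    """Counts consecutive failed edits per file and emits an escalation hint."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self._counts = defaultdict(int)

    def record(self, path: str, success: bool) -> Optional[str]:
        """Record one edit attempt; return a hint once the threshold is reached."""
        if success:
            self._counts.pop(path, None)  # a success resets the streak
            return None
        self._counts[path] += 1
        count = self._counts[path]
        if count < self.threshold:
            return None
        return (
            f"You've failed to edit {path} {count} times. Consider: read the "
            "file first to verify current content, use a more unique search "
            "string, or try write_file to replace the entire file."
        )


tracker = EditFailureTracker()
tracker.record("app.py", success=False)
tracker.record("app.py", success=False)
hint = tracker.record("app.py", success=False)  # third failure triggers the hint
```

Counts are kept per path, so failures on one file do not affect the streak of another.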


3. Anti-Hallucination Prompt Instructions (MEDIUM-HIGH IMPACT)

Current state: prompt_builder.py has NO anti-hallucination instructions. The system prompt describes the agent's identity and tool-specific guidance but never tells the model to verify tool outcomes or avoid fabrication.

Roo Code's approach: System prompt includes explicit rules:

  • "Do not assume the outcome of any tool use. Each step must be informed by the previous step's result."
  • "If one of the values for a required parameter is missing, DO NOT invoke the tool (not even with fillers for the missing params)."
  • "Do not use the ~ character or $HOME to refer to the home directory."

Proposed addition to prompt_builder.py:

TOOL_USE_RULES = """
Important rules for tool usage:
- Do not assume the outcome of any tool call. Each step must be informed by the previous step's result.
- Do not invoke tools with placeholder or filler values for required parameters.
- After editing a file, verify the edit succeeded before making dependent changes.
- Prefer calling multiple independent tools in a single response when possible.
"""

Effort: ~5 lines added to prompt_builder.py. The rules are adapted from the system prompt of Roo Code, a widely used production agent.


4. Lightweight Task Methodology Prompt (MEDIUM IMPACT)

Current state: Hermes relies on the model's native reasoning capabilities. No explicit task methodology in the system prompt.

Roo Code's approach: OBJECTIVE section gives a 5-step methodology:

  1. Analyze the task and set clear goals
  2. Work through goals sequentially, using available tools
  3. Before each tool call, assess what information you have and what you need
  4. Verify each step's results before proceeding
  5. Present results when complete

Why this matters: Non-reasoning models (no CoT) and smaller models benefit significantly from explicit methodology guidance. Reasoning models can ignore it (they already do this internally), but it provides a floor for all models.

Proposed: Add a lightweight methodology section to the system prompt, included by default and omitted when the model has native reasoning support.
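The conditional could be as small as the sketch below. The constant, the function name, and the has_native_reasoning flag are all hypothetical; how Hermes actually detects reasoning support is not specified in this issue.

```python
TASK_METHODOLOGY = """\
How to approach a task:
1. Analyze the task and set clear goals.
2. Work through goals sequentially, using available tools.
3. Before each tool call, assess what information you have and what you need.
4. Verify each step's results before proceeding.
5. Present results when complete."""


def with_methodology(base_prompt: str, has_native_reasoning: bool) -> str:
    """Append the methodology section unless the model reasons natively."""
    if has_native_reasoning:
        return base_prompt
    return f"{base_prompt.rstrip()}\n\n{TASK_METHODOLOGY}"


small_model_prompt = with_methodology("You are Hermes.", has_native_reasoning=False)
reasoning_model_prompt = with_methodology("You are Hermes.", has_native_reasoning=True)
```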


5. Behavioral Rules (LOW-MEDIUM IMPACT)

Current state: No explicit behavioral constraints. The model can start responses with filler words, end with unnecessary questions, or be overly conversational.

Roo Code's approach: Explicit rules like "NEVER start messages with affirmative phrases like 'Great', 'Certainly', 'Of course'" and "Your goal is to accomplish the user's task, NOT engage in back and forth conversation."

Consideration: These are opinionated and may conflict with Hermes's personality system. Could be added as defaults that personalities override, or as opt-in behavioral presets.
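One possible shape for "defaults that personalities override" is sketched here. The rule keys, the override format (replacement text, or None to disable a rule), and the function name are assumptions and do not reflect the existing personality system.

```python
from typing import Dict, List, Optional

# Default rules, paraphrased from the Roo Code examples quoted above.
DEFAULT_BEHAVIOR_RULES: Dict[str, str] = {
    "no_filler_openers": (
        "Do not start messages with affirmative phrases like "
        "'Great', 'Certainly', 'Of course'."
    ),
    "task_focused": (
        "Your goal is to accomplish the user's task, "
        "not to engage in back and forth conversation."
    ),
}


def resolve_behavior_rules(
    overrides: Optional[Dict[str, Optional[str]]] = None,
) -> List[str]:
    """Merge personality overrides into the defaults.

    A string value replaces the default rule; None removes it.
    """
    rules = dict(DEFAULT_BEHAVIOR_RULES)
    for key, text in (overrides or {}).items():
        if text is None:
            rules.pop(key, None)
        else:
            rules[key] = text
    return list(rules.values())


# A chatty personality opts out of the filler-word rule only.
chatty_rules = resolve_behavior_rules({"no_filler_openers": None})
```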


Implementation Priority

| # | Improvement | Impact | Effort | Risk |
|---|---|---|---|---|
| 1 | Head/tail truncation | HIGH | Very Low (~10 lines) | None |
| 3 | Anti-hallucination instructions | MEDIUM-HIGH | Very Low (~5 lines) | None |
| 2f | Per-file failure tracking | MEDIUM | Low (~30 lines) | None |
| 2e | Detailed error messages | MEDIUM | Low | None |
| 2a | Indentation preservation | MEDIUM-HIGH | Medium | Low |
| 2d | Unicode normalization | MEDIUM | Low | None |
| 4 | Task methodology prompt | MEDIUM | Low | Low |
| 2b | Line number hint | MEDIUM | Medium | None |
| 2c | CRLF preservation | LOW-MEDIUM | Low | Low |
| 5 | Behavioral rules | LOW-MEDIUM | Very Low | Opinionated |

References

  • Roo Code GitHub (Apache 2.0)
  • Roo Code source: src/core/diff/strategies/multi-search-replace.ts (indentation preservation), src/core/tools/EditFileTool.ts (error messages, CRLF), src/core/tools/apply-patch/seek-sequence.ts (Unicode normalization), src/core/prompts/sections/rules.ts (behavioral rules), src/core/prompts/sections/tool-use-guidelines.ts (anti-hallucination), src/core/prompts/sections/objective.ts (task methodology)
  • Hermes source: run_agent.py:2606-2613 (truncation), tools/fuzzy_match.py (fuzzy matching), tools/patch_parser.py (V4A parser), prompt_builder.py (system prompt), agent/context_compressor.py (compression prompt)
