Refactor reflection system to use structured tool calls #17

Merged
v3g42 merged 1 commit into main from claude/reflection-tool-call-CRAjG
Mar 9, 2026

v3g42 commented Feb 14, 2026

Summary

This PR refactors the reflection agent system to use structured tool calls instead of text pattern matching. The reflection agent now uses a dedicated reflect tool to report its analysis decision, making the system more robust and type-safe.

Key Changes

Reflection Configuration & Agent Definition

  • Added reflection_agent field to ReflectionConfig to allow custom reflection agents (defaults to built-in)
  • Replaced enable_reflection: bool with reflection: Option<ReflectionConfig> in StandardDefinition for more flexible configuration
  • Added validate_reflection_agent() method to ensure reflection agents have the "reflect" tool configured
  • Updated the is_reflection_enabled() helper and added a reflection_config() getter
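
A minimal sketch of how the configuration shape described above might look. Only the names `ReflectionConfig`, `StandardDefinition`, `reflection`, `reflection_agent`, `is_reflection_enabled()`, `reflection_config()` and `validate_reflection_agent()` come from this PR; the field types, the function signature and the error message are assumptions made for illustration.

```rust
/// Hypothetical stand-in for the PR's ReflectionConfig.
/// Only the field named in the PR is modeled; its type is an assumption.
#[derive(Debug, Clone, Default)]
pub struct ReflectionConfig {
    /// Name of a custom reflection agent; `None` selects the built-in one.
    pub reflection_agent: Option<String>,
}

/// Hypothetical stand-in for StandardDefinition, reduced to the reflection field.
#[derive(Debug, Clone, Default)]
pub struct StandardDefinition {
    /// Replaces the old `enable_reflection: bool`.
    pub reflection: Option<ReflectionConfig>,
}

impl StandardDefinition {
    /// Reflection is enabled whenever a config is present.
    pub fn is_reflection_enabled(&self) -> bool {
        self.reflection.is_some()
    }

    /// Borrow the reflection config, if any.
    pub fn reflection_config(&self) -> Option<&ReflectionConfig> {
        self.reflection.as_ref()
    }
}

/// Assumed signature: checks the builtin tool names configured on an agent.
pub fn validate_reflection_agent(builtin_tools: &[&str]) -> Result<(), String> {
    if builtin_tools.iter().any(|tool| *tool == "reflect") {
        Ok(())
    } else {
        Err("reflection agent must configure the \"reflect\" builtin tool".to_string())
    }
}
```

Moving from a boolean to an `Option` lets "enabled" and "how it is configured" travel together, so callers cannot see reflection enabled without a config to read.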

Reflect Tool Implementation

  • Implemented new ReflectTool as a built-in tool with structured parameters:
    • quality: enum (excellent/good/fair/poor)
    • completeness: enum (complete/partial/incomplete)
    • should_continue: boolean (retry decision)
    • reason: optional explanation
  • Tool stores its structured result as the final result in ExecutorContext for downstream processing
  • Added to builtin tools list in get_builtin_tools()
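
The parameter set above could be modeled roughly as follows. This is a dependency-free sketch: the real tool presumably receives JSON arguments, while this version parses plain strings. The type names `Quality`, `Completeness` and `ReflectArgs`, and the `from_raw` constructor, are hypothetical; only the parameter names and their allowed values come from the PR.

```rust
use std::str::FromStr;

/// Allowed values for the `quality` parameter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Quality { Excellent, Good, Fair, Poor }

/// Allowed values for the `completeness` parameter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Completeness { Complete, Partial, Incomplete }

impl FromStr for Quality {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "excellent" => Ok(Quality::Excellent),
            "good" => Ok(Quality::Good),
            "fair" => Ok(Quality::Fair),
            "poor" => Ok(Quality::Poor),
            other => Err(format!("unknown quality: {other}")),
        }
    }
}

impl FromStr for Completeness {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "complete" => Ok(Completeness::Complete),
            "partial" => Ok(Completeness::Partial),
            "incomplete" => Ok(Completeness::Incomplete),
            other => Err(format!("unknown completeness: {other}")),
        }
    }
}

/// Hypothetical typed form of the reflect tool's arguments.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ReflectArgs {
    pub quality: Quality,
    pub completeness: Completeness,
    pub should_continue: bool,
    pub reason: Option<String>,
}

impl ReflectArgs {
    /// Validate raw values; any value outside the enums is rejected.
    pub fn from_raw(
        quality: &str,
        completeness: &str,
        should_continue: bool,
        reason: Option<&str>,
    ) -> Result<Self, String> {
        Ok(ReflectArgs {
            quality: quality.parse()?,
            completeness: completeness.parse()?,
            should_continue,
            reason: reason.map(str::to_string),
        })
    }
}
```

Because the retry decision is a boolean field and the ratings are closed enums, a malformed answer fails at parse time instead of being silently misread.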

Reflection Agent Loop

  • Updated run_reflection_agent() to return ReflectionResult containing both text content and structured tool call result
  • Modified reflection decision logic to extract should_continue from the tool call output instead of text pattern matching ("Should Continue: YES/NO")
  • Removed reliance on string matching in reflection agent output
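
The decision step might then reduce to reading a field, as sketched below. `ReflectionResult` and `final_result` are named in this PR; the `ReflectOutput` type, the `content` field name, the `should_retry` helper, and the choice to stop when no tool call was made are assumptions for illustration.

```rust
/// Hypothetical structured output captured from the reflect tool call.
#[derive(Debug, Clone)]
pub struct ReflectOutput {
    pub should_continue: bool,
    pub reason: Option<String>,
}

/// Sketch of ReflectionResult: free text plus the structured tool result.
#[derive(Debug, Clone)]
pub struct ReflectionResult {
    /// Text produced by the reflection agent (field name assumed).
    pub content: String,
    /// Structured result of the reflect tool call, if the agent made one.
    pub final_result: Option<ReflectOutput>,
}

/// Decide whether to retry using only the structured result.
/// Assumption: with no tool call, the loop does not retry.
pub fn should_retry(result: &ReflectionResult) -> bool {
    result
        .final_result
        .as_ref()
        .map(|output| output.should_continue)
        .unwrap_or(false)
}
```

Note that the text content no longer influences the decision: a reply containing "Should Continue: YES" without a tool call does not trigger a retry in this sketch.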

Reflection Agent Markdown

  • Updated reflection_agent.md to:
    • Configure the "reflect" tool in its tools.builtin
    • Instruct agent to use the reflect tool instead of text output
    • Update decision rules to match tool parameter values

Execution Result Formatting

  • Added truncation logic to ExecutionResult::as_observation() to prevent token bloat:
    • Text parts: max 1000 chars
    • Data/ToolResult parts: max 500 chars
    • Includes truncation indicator with total character count
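
One way the truncation could be written is sketched below. The limits (1000 and 500 characters) come from the PR; the function name, the constant names and the exact wording of the truncation indicator are assumptions. The sketch counts Unicode scalar values rather than bytes so that it never cuts inside a multi-byte character.

```rust
/// Limit for text parts, per the PR description.
pub const MAX_TEXT_CHARS: usize = 1000;
/// Limit for Data/ToolResult parts, per the PR description.
pub const MAX_DATA_CHARS: usize = 500;

/// Keep at most `max_chars` characters and append an indicator that
/// reports the original length. Indicator format is hypothetical.
pub fn truncate_for_observation(text: &str, max_chars: usize) -> String {
    let total = text.chars().count();
    if total <= max_chars {
        return text.to_string();
    }
    // Collect by chars, not bytes, to stay on character boundaries.
    let head: String = text.chars().take(max_chars).collect();
    format!("{head}... [truncated, {total} chars total]")
}
```

Reporting the total length lets the model see how much was cut, while the full payload stays in storage.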

Cleanup & Removals

  • Removed unused BrowserHooksConfig enum and related field from StandardDefinition
  • Removed browser_hooks configuration option

Test Improvements

  • Added test_store_config() helper to create in-memory SQLite databases for tests
  • Updated tests to use in-memory stores instead of filesystem dependencies
  • Added API key environment variable checks to skip tests when credentials are unavailable
  • Fixed test isolation issues in prompt store and auth provider tests
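
The credential guard could look something like the sketch below. The helper names `missing_env_vars` and `skip_reason` are hypothetical; the PR only states that tests requiring `OPENAI_API_KEY` or `TAVILY_API_KEY` are guarded. Treating an empty value as missing is also an assumption.

```rust
use std::env;

/// Return the names from `required` that are unset or empty.
pub fn missing_env_vars<'a>(required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|name| env::var(*name).map(|v| v.is_empty()).unwrap_or(true))
        .collect()
}

/// Build a skip message when credentials are missing, or `None` to proceed.
pub fn skip_reason(required: &[&str]) -> Option<String> {
    let missing = missing_env_vars(required);
    if missing.is_empty() {
        None
    } else {
        Some(format!("skipping test, missing env vars: {}", missing.join(", ")))
    }
}
```

A test would call `skip_reason(&["OPENAI_API_KEY"])` first and return early, printing the message, when it yields `Some`.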

Implementation Details

  • The reflection agent now signals its decision through a tool call rather than text parsing, improving reliability
  • Custom reflection agents can be specified but must explicitly include the "reflect" tool
  • The built-in reflection agent automatically gets the reflect tool
  • Execution results are now truncated in observations to prevent context window bloat while preserving full data in storage

https://claude.ai/code/session_019JhhSoRA6gQZNvpWsGWxNN

…x all tests

- Add ReflectTool as a builtin tool for structured reflection decisions
  instead of parsing "Should Continue" from agent text output
- Update reflection_agent.md to use the reflect tool
- Add reflection_agent field to ReflectionConfig for custom reflection agents
- Add validation for reflection agent having reflect tool configured
- Remove unused browser_hooks from StandardDefinition and BrowserHooksConfig
- Update agent_loop reflect() to extract should_continue from tool call result
- Return ReflectionResult with structured final_result from reflection agent

Test fixes:
- Add env var guards for tests requiring OPENAI_API_KEY or TAVILY_API_KEY
- Use in-memory SQLite (file:{uuid}?mode=memory) for all test orchestrators
- Fix a2a_types serialize test with round-trip instead of exact string match
- Fix parse_agent_definition assertion to match actual max_iterations=30
- Fix provider_registry test to load providers before querying
- Fix live_llm_execute test to use LlmExecuteOptions builder pattern
- Fix code execution test tracing_subscriber double-init panic
- Add truncation to execution as_observation() for large tool results
- Rewrite prompt store tests to use HashMapPromptStore (no filesystem deps)
- Guard typescript plugin test for missing sample directory

https://claude.ai/code/session_019JhhSoRA6gQZNvpWsGWxNN
@v3g42 v3g42 merged commit 0585d0e into main Mar 9, 2026
@v3g42 v3g42 deleted the claude/reflection-tool-call-CRAjG branch March 9, 2026 02:17