Refactor reflection system to use structured tool calls #17
Merged
…x all tests
- Add ReflectTool as a builtin tool for structured reflection decisions
instead of parsing "Should Continue" from agent text output
- Update reflection_agent.md to use the reflect tool
- Add reflection_agent field to ReflectionConfig for custom reflection agents
- Add validation for reflection agent having reflect tool configured
- Remove unused browser_hooks from StandardDefinition and BrowserHooksConfig
- Update agent_loop reflect() to extract should_continue from tool call result
- Return ReflectionResult with structured final_result from reflection agent
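To illustrate the move from text parsing to a structured decision, here is a minimal sketch of how reflect tool arguments could be turned into a typed value. The parameter names and enum values come from the PR summary; the `Arg` type, `ReflectDecision`, `parse_reflect_args`, and `has_reflect_tool` are hypothetical names chosen for this example and are not taken from the PR's code, which presumably works on JSON values instead.

```rust
use std::collections::HashMap;

/// Rating enums using the values listed in the PR summary.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Quality { Excellent, Good, Fair, Poor }

#[derive(Debug, Clone, Copy, PartialEq)]
enum Completeness { Complete, Partial, Incomplete }

/// Structured decision reported through the reflect tool (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct ReflectDecision {
    quality: Quality,
    completeness: Completeness,
    should_continue: bool,
    reason: Option<String>,
}

/// Assumption: a tiny stand-in for a JSON argument value.
#[derive(Debug, Clone, PartialEq)]
enum Arg {
    Str(String),
    Bool(bool),
}

/// Fetches a required string argument or explains what is wrong.
fn str_arg<'a>(args: &'a HashMap<String, Arg>, key: &str) -> Result<&'a str, String> {
    match args.get(key) {
        Some(Arg::Str(s)) => Ok(s.as_str()),
        Some(_) => Err(format!("argument {} must be a string", key)),
        None => Err(format!("missing argument {}", key)),
    }
}

/// Converts raw tool-call arguments into a typed decision.
fn parse_reflect_args(args: &HashMap<String, Arg>) -> Result<ReflectDecision, String> {
    let quality = match str_arg(args, "quality")? {
        "excellent" => Quality::Excellent,
        "good" => Quality::Good,
        "fair" => Quality::Fair,
        "poor" => Quality::Poor,
        other => return Err(format!("unknown quality {}", other)),
    };
    let completeness = match str_arg(args, "completeness")? {
        "complete" => Completeness::Complete,
        "partial" => Completeness::Partial,
        "incomplete" => Completeness::Incomplete,
        other => return Err(format!("unknown completeness {}", other)),
    };
    // The retry decision is a real boolean, not a "YES"/"NO" substring.
    let should_continue = match args.get("should_continue") {
        Some(Arg::Bool(b)) => *b,
        _ => return Err("should_continue must be a boolean".to_string()),
    };
    // The reason is optional, so a missing value is not an error.
    let reason = match args.get("reason") {
        Some(Arg::Str(s)) => Some(s.clone()),
        _ => None,
    };
    Ok(ReflectDecision { quality, completeness, should_continue, reason })
}

/// Checks that an agent's tool list includes the "reflect" tool.
fn has_reflect_tool(tool_names: &[&str]) -> Result<(), String> {
    if tool_names.contains(&"reflect") {
        Ok(())
    } else {
        Err("reflection agent must have the reflect tool configured".to_string())
    }
}
```

With this shape, a malformed or missing `should_continue` surfaces as an explicit error rather than silently defaulting, which is one plausible reading of the "more robust" claim in the summary.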
Test fixes:
- Add env var guards for tests requiring OPENAI_API_KEY or TAVILY_API_KEY
- Use in-memory SQLite (file:{uuid}?mode=memory) for all test orchestrators
- Fix a2a_types serialize test with round-trip instead of exact string match
- Fix parse_agent_definition assertion to match actual max_iterations=30
- Fix provider_registry test to load providers before querying
- Fix live_llm_execute test to use LlmExecuteOptions builder pattern
- Fix code execution test tracing_subscriber double-init panic
- Add truncation to execution as_observation() for large tool results
- Rewrite prompt store tests to use HashMapPromptStore (no filesystem deps)
- Guard typescript plugin test for missing sample directory
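The truncation added to as_observation() can be pictured with a small sketch like the one below. The function name `truncate_for_observation`, the character-based limit, and the marker format are all assumptions made for illustration; the PR text does not state the actual limit or wording.

```rust
/// Truncates `text` to at most `max_chars` characters and appends a marker
/// noting how much was dropped. Hypothetical helper, not the PR's code.
fn truncate_for_observation(text: &str, max_chars: usize) -> String {
    let total = text.chars().count();
    if total <= max_chars {
        return text.to_string();
    }
    // Cut on character boundaries so multi-byte UTF-8 input never panics,
    // which slicing by byte index could do.
    let kept: String = text.chars().take(max_chars).collect();
    format!("{}... [truncated {} of {} chars]", kept, total - max_chars, total)
}
```

Counting characters rather than bytes is only a rough proxy for token usage, but it keeps the cut safe for non-ASCII tool output.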
https://claude.ai/code/session_019JhhSoRA6gQZNvpWsGWxNN
Summary
This PR refactors the reflection agent system to use structured tool calls instead of text pattern matching. The reflection agent now uses a dedicated reflect tool to report its analysis decision, making the system more robust and type-safe.
Key Changes
Reflection Configuration & Agent Definition
- Added reflection_agent field to ReflectionConfig to allow custom reflection agents (defaults to built-in)
- Replaced enable_reflection: bool with reflection: Option<ReflectionConfig> in StandardDefinition for more flexible configuration
- Added validate_reflection_agent() method to ensure reflection agents have the "reflect" tool configured
- is_reflection_enabled() and new reflection_config() getter
Reflect Tool Implementation
- Added ReflectTool as a built-in tool with structured parameters:
  - quality: enum (excellent/good/fair/poor)
  - completeness: enum (complete/partial/incomplete)
  - should_continue: boolean (retry decision)
  - reason: optional explanation
- get_builtin_tools()
Reflection Agent Loop
- Updated run_reflection_agent() to return ReflectionResult containing both text content and structured tool call result
- Extract should_continue from the tool call output instead of text pattern matching ("Should Continue: YES/NO")
Reflection Agent Markdown
- Updated reflection_agent.md to:
Execution Result Formatting
- Added truncation to ExecutionResult::as_observation() to prevent token bloat:
Cleanup & Removals
- Removed BrowserHooksConfig enum and related field from StandardDefinition
- Removed browser_hooks configuration option
Test Improvements
- Added test_store_config() helper to create in-memory SQLite databases for tests
Implementation Details
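As a rough illustration of the test-side changes, the sketch below builds a unique in-memory SQLite URI in the file:{uuid}?mode=memory shape from the commit message and guards a test body behind an environment variable. All function names here (`in_memory_db_uri`, `has_env`, `run_if_configured`) are hypothetical, and a process id plus counter stands in for the UUID so the example needs only the standard library; the real test_store_config() helper is not shown in the PR text and may differ.

```rust
use std::env;
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};

// Counter that keeps database names distinct within one process.
static NEXT_DB_ID: AtomicU64 = AtomicU64::new(0);

/// Builds a unique URI of the form `file:{id}?mode=memory`.
/// Assumption: process id + counter replaces the UUID used in the PR.
fn in_memory_db_uri() -> String {
    let id = NEXT_DB_ID.fetch_add(1, Ordering::Relaxed);
    format!("file:test-{}-{}?mode=memory", process::id(), id)
}

/// Returns true when the named variable is set to a non-empty value.
fn has_env(name: &str) -> bool {
    env::var(name).map(|v| !v.is_empty()).unwrap_or(false)
}

/// Runs `body` only when `var` is set; returns whether it ran.
/// Skipping (rather than failing) keeps the suite green without credentials.
fn run_if_configured(var: &str, body: impl FnOnce()) -> bool {
    if !has_env(var) {
        eprintln!("skipping: {} is not set", var);
        return false;
    }
    body();
    true
}
```

A test that needs OPENAI_API_KEY or TAVILY_API_KEY could wrap its body in `run_if_configured`, while each orchestrator under test would get its own URI so that tests do not share state.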
https://claude.ai/code/session_019JhhSoRA6gQZNvpWsGWxNN