Skip to content

New features#488

Merged
DoganK01 merged 10 commits intomasterfrom
new-features
Dec 2, 2025
Merged

New features#488
DoganK01 merged 10 commits intomasterfrom
new-features

Conversation

@DoganK01
Copy link
Copy Markdown
Member

  • DeepAgent refactored
  • Tool adding capability via Agent class
  • Gmail, Whatsapp, Slack custom tools and interfaces
  • Tool handling centralized and removed unncessary things
  • Added feature of KnowledgeBase to be injected as a tool. (RAG as tool)
  • Some test fixes
  • Groq client import fix
  • import_error added for missing places

@claude
Copy link
Copy Markdown

claude bot commented Nov 28, 2025

Pull Request Review: New Features (#488)

Overview

This PR introduces significant architectural improvements and new features to the Upsonic agent framework. The changes are substantial (+7591/-2681 lines) and touch core components of the system.

✅ Strengths

1. Architecture Improvements

  • DeepAgent Refactoring: The new modular architecture in src/upsonic/agent/deepagent/ is well-structured with clear separation of concerns (backends, tools, constants).
  • Tool Management Centralization: The ToolManager improvements provide better organization and the new add_tools() method on the Agent class is a clean API.
  • KnowledgeBase as Tool: Inheriting from ToolKit (line 24 in knowledge_base.py) is an elegant design that allows RAG capabilities to be injected as tools.

2. Security Enhancements

  • Tool Policy Manager: New ToolPolicyManager (src/upsonic/agent/tool_policy_manager.py) provides robust pre/post-execution validation for tools - excellent security addition.
  • Policy Integration: Proper integration with existing safety engine through tool_policy_pre and tool_policy_post parameters.
  • Async Validation: Proper async/await pattern in policy validation (lines 85-188, 189-288 in tool_policy_manager.py).

3. New Integrations

  • Gmail, WhatsApp, Slack Tools: Well-documented with clear setup instructions and environment variable handling.
  • Error Handling: Proper use of error_log when credentials are missing (whatsapp.py:36, 40).

⚠️ Issues & Concerns

1. Critical Bugs

Event Loop Management Issue (agent.py:514-541)

try:
    loop = asyncio.get_running_loop()
    import nest_asyncio
    nest_asyncio.apply()
    validation_result = asyncio.run(...)  # ❌ DANGEROUS
except RuntimeError:
    validation_result = asyncio.run(...)

Problem: Calling asyncio.run() while inside a running event loop will fail even with nest_asyncio. This appears in both sequential and parallel tool execution paths.

Solution: Use await directly or loop.create_task():

validation_result = await self.tool_policy_pre_manager.execute_tool_validation_async(...)

Duplicate Validation Logic (agent.py:787-816, 847-873)

The tool call validation code is duplicated for sequential and parallel execution. This violates DRY and makes maintenance harder.

Recommendation: Extract to a helper method:

async def _validate_tool_call(self, tool_call, tool_def):
    # validation logic here
    pass

2. Potential Performance Issues

Tool Registration on Every Task (agent.py:560-621)

Pre-execution tool validation runs synchronously for ALL tools on EVERY task execution. For agents with many tools, this could cause significant latency.

Recommendations:

  • Cache validation results per tool definition hash
  • Make validation truly async to avoid blocking
  • Consider moving to agent initialization for agent-level tools

Agent Tools Re-registration (agent.py:324-334, 483-506)

_register_agent_tools() is called in both __init__ and add_tools(), which could lead to duplicate registrations if not handled carefully in ToolManager.register_tools().

3. Breaking Changes Not Documented

  • Removed create_deep_agent function: This was in the public API (__all__) but is now removed without deprecation warning.
  • DeepAgent Internal API Changes: The old DeepAgentState class and related internal APIs are completely changed, which could break existing code.

Recommendation: Add deprecation warnings or migration guide in the PR description.

4. Security Concerns

Credential Handling (gmail.py, whatsapp.py, slack.py)

  • ✅ Good: Uses environment variables
  • ⚠️ Warning: No validation of credential format before use
  • ⚠️ Missing: No guidance on secure storage (e.g., using secrets managers)

Tool Policy Validation Timing

The naming is confusing:

  • tool_policy_pre runs during _setup_task_tools() which is AFTER agent init
  • tool_policy_post runs BEFORE tool execution (not after)

Recommendation: Rename for clarity:

  • tool_policy_pretool_registration_policy
  • tool_policy_posttool_execution_policy

5. Test Coverage Concerns

Looking at tests/unit_tests/agent/test_deep_agent.py:

  • ✅ Good: Comprehensive unit tests for DeepAgent
  • ⚠️ Many tests are skipped or have pass implementations (lines 165, 262-266, 268-272, 286-290)
  • ❌ Missing: Tests for the new tool policy managers
  • ❌ Missing: Integration tests for Gmail/WhatsApp/Slack tools
  • ❌ Missing: Tests for add_tools() method
  • ❌ Missing: Tests for KnowledgeBase as a tool

6. Code Quality Issues

Import Error Handling

The PR description mentions "import_error added for missing places" but I don't see consistent try/except ImportError blocks for optional dependencies (Gmail, WhatsApp, Slack tools).

Recommendation: Wrap optional imports:

try:
    import google.auth
except ImportError:
    raise ImportError("Gmail tools require 'google-auth' package. Install with: pip install upsonic[gmail]")

Type Hints

  • Missing return type hint for add_tools() (agent.py:510)
  • Missing type hints in several new tool files

Documentation

  • Missing docstrings for tool_policy_pre and tool_policy_post parameters in many places
  • The new tools parameter on Agent class (line 145) could use examples in the docstring

🧪 Testing Recommendations

  1. Add tests for:

    • Tool policy managers (pre and post execution)
    • Agent.add_tools() with various tool types
    • KnowledgeBase registration as a tool
    • Error cases for custom tools (missing credentials, etc.)
  2. Integration tests:

    • End-to-end test with tool policies blocking malicious tools
    • Test tool validation performance with 50+ tools
  3. Fix skipped tests:

    • Complete the implementations in test_deep_agent.py or remove if not applicable

📋 Recommendations Summary

Must Fix (Blocking Issues)

  1. Fix async/await bug in tool validation (agent.py:514-541, 787-816, 847-873)
  2. Add tests for new tool policy managers
  3. Remove duplicate validation logic (DRY violation)

Should Fix (Important)

  1. ⚠️ Add deprecation warnings for removed create_deep_agent API
  2. ⚠️ Clarify naming: tool_policy_pre/posttool_registration_policy/execution_policy
  3. ⚠️ Add import error handling for optional dependencies
  4. ⚠️ Complete or remove skipped tests in test_deep_agent.py

Nice to Have

  1. 💡 Add caching for tool validation results
  2. 💡 Add type hints consistently across new files
  3. 💡 Add migration guide for DeepAgent API changes

📈 Overall Assessment

This is a substantial and valuable contribution that adds important features (tool policies, custom integrations, better tool management). However, there are critical bugs (async/await issues) and missing test coverage that should be addressed before merging.

Recommendation: Request changes for the async bug fixes and test additions, then approve.


Code Quality: ⭐⭐⭐⭐ (4/5)
Security: ⭐⭐⭐⭐ (4/5)
Test Coverage: ⭐⭐ (2/5)
Documentation: ⭐⭐⭐ (3/5)

Overall: ⭐⭐⭐ (3.25/5) - Good work, but needs critical fixes before merge.

@claude
Copy link
Copy Markdown

claude bot commented Nov 28, 2025

Pull Request Review - New Features (#488)

Overview

This is a substantial PR introducing several valuable features. The overall direction is good, but there are some critical issues that need to be addressed before merging.


🎯 Major Changes Reviewed

✅ Positive Changes

  1. DeepAgent Refactoring: The refactoring from deep_agent/ to deepagent/ with ToolKit-based architecture is a solid improvement. The new structure is cleaner and more maintainable.

  2. Agent-Level Tools: The ability to add tools directly to agents via add_tools() and tools parameter is a great UX improvement that provides more flexibility.

  3. Tool Safety Framework: The new ToolPolicyManager with pre/post-execution validation adds important security layers.

  4. KnowledgeBase as ToolKit: Making KnowledgeBase inherit from ToolKit is elegant and allows seamless RAG integration as a tool.

  5. Import Error Handling: Consistent use of import_error() helper across the codebase improves user experience for missing dependencies.


🔴 Critical Issues

1. nest_asyncio Usage is Problematic (src/upsonic/agent/agent.py:595-610)

Issue: The code unconditionally imports and applies nest_asyncio when detecting an existing event loop:

try:
    loop = asyncio.get_running_loop()
    import nest_asyncio
    nest_asyncio.apply()  # Applied every time!
    validation_result = asyncio.run(...)

Problems:

  • nest_asyncio.apply() patches the global event loop and should only be called once
  • This code calls it on every tool validation, causing potential side effects
  • The logic is backwards: if a loop is already running, you should await the coroutine, not use asyncio.run()
  • asyncio.run() will fail if called within an existing event loop, even with nest_asyncio

Recommended Fix:

# Option 1: Just await if in async context
try:
    loop = asyncio.get_running_loop()
    validation_result = await self.tool_policy_pre_manager.execute_tool_validation_async(...)
except RuntimeError:
    # No event loop - safe to use asyncio.run()
    validation_result = asyncio.run(...)

# Option 2: If you must use nest_asyncio, apply it once at module level

Impact: High - this could cause subtle bugs in async code and performance issues.


2. Naming Confusion: tool_policy_pre vs tool_policy_post

Issue: The parameters are named tool_policy_pre and tool_policy_post, but:

  • "Pre-execution" validation happens during tool registration (setup time)
  • "Post-execution" validation happens before tool execution (not after)

This is confusing. The validation labeled "post" actually blocks execution, so it's still "pre-execution" in the runtime sense.

Recommendation: Consider renaming to clarify intent:

  • tool_registration_policy (validates at registration time)
  • tool_invocation_policy (validates before each call)

Or add clear documentation explaining the terminology.


3. Missing Error Handling for String Task Auto-Conversion

Issue: In agent.py:1464-1465 and 1575-1576, string tasks are auto-converted to Task objects:

if isinstance(task, str):
    task = TaskClass(description=task)

Problems:

  • No validation that the string is non-empty
  • No handling of edge cases (very long strings, special characters, etc.)
  • Silent conversion may hide user errors

Recommended Fix:

if isinstance(task, str):
    if not task.strip():
        raise ValueError("Task description cannot be empty")
    task = TaskClass(description=task)

4. SlackTools Missing ToolKit Inheritance

Issue: SlackTools class (src/upsonic/tools/custom_tools/slack.py:17) does not inherit from ToolKit, unlike WhatsAppTools and GmailTools.

Impact: This inconsistency means SlackTools won't work with the new tool processing system that expects ToolKit instances.

Fix: Change class SlackTools: to class SlackTools(ToolKit):


5. Tool Registration Duplication Risk

Issue: In agent.py:_register_agent_tools(), agent tools are registered every time add_tools() is called:

def add_tools(self, tools):
    self.tools.extend(tools)
    self._register_agent_tools()  # Re-registers ALL tools, not just new ones

The _register_agent_tools() method processes all self.tools, meaning previously registered tools get processed again.

Impact: Could lead to duplicate tool registrations or wasted processing.

Recommended Fix: Either:

  1. Make register_tools() idempotent (detect duplicates)
  2. Only register the new tools in add_tools()

⚠️ Security Concerns

1. Gmail OAuth Token Storage

The Gmail implementation stores OAuth tokens in token.json with no path specification or security guidance. Consider:

  • Documenting secure storage practices
  • Adding option to customize token storage location
  • Warning about token security in production

2. WhatsApp/Slack Credential Exposure

Both tools accept credentials as parameters and fall back to environment variables. Ensure:

  • Clear documentation about not hardcoding credentials
  • Consider adding validation that credentials aren't being passed as literals in code

🧪 Test Coverage Concerns

Issues Found:

  1. No Tests for New Features: Couldn't identify tests for:

    • Agent-level tools (add_tools(), get_tool_defs())
    • Tool policy pre/post validation
    • String-to-Task auto-conversion
    • New Gmail/WhatsApp/Slack tools
  2. DeepAgent Test Updates: Noticed test_deep_agent.py in modified files - please verify these tests cover the refactored architecture

  3. Integration Tests: Consider adding integration tests that verify:

    • Agent + KnowledgeBase as tool
    • Agent + custom tool toolkits
    • Tool policy enforcement

📝 Code Quality Issues

1. Inconsistent Documentation

Some new parameters lack proper docstring documentation:

  • tool_policy_pre and tool_policy_post in Agent.init
  • The behavior of enable_subagents auto-detection could be clearer

2. Magic Numbers

tool_call_limit: int = 20 in DeepAgent - consider making this a constant or explaining why 20 is chosen

3. Type Hints

Some return types are missing or use str when they should be more specific (e.g., JSON strings could be typed as str with a comment)


🚀 Performance Considerations

  1. Tool Validation Overhead: Pre-execution tool validation runs synchronously during _setup_task_tools(). For agents with many tools, this could be slow. Consider:

    • Making this async
    • Caching validation results
    • Making it optional with a flag
  2. nest_asyncio Impact: If the nest_asyncio issue isn't fixed, it could cause event loop overhead


✅ Recommendations Summary

Must Fix Before Merge:

  1. ✅ Fix nest_asyncio usage pattern (Critical)
  2. ✅ Add ToolKit inheritance to SlackTools (Critical)
  3. ✅ Add validation for empty task strings (Important)
  4. ✅ Fix tool registration duplication in add_tools() (Important)

Should Address:

  1. Add tests for new features
  2. Clarify tool_policy naming/documentation
  3. Add security documentation for OAuth tokens
  4. Document magic numbers and design decisions

Nice to Have:

  1. Optimize tool validation performance
  2. Improve type hints
  3. Add integration tests

📊 Overall Assessment

Code Quality: 7/10 - Good architecture, but implementation issues
Security: 6/10 - Basic security in place, needs documentation
Test Coverage: 4/10 - Many new features lack tests
Documentation: 6/10 - Some areas need clarification

Recommendation: Request Changes - Address critical issues before merging, especially the nest_asyncio bug and SlackTools inheritance.


Great work on this feature set! The tool system improvements are valuable. Please address the critical issues above, and this will be ready to merge. Let me know if you need clarification on any feedback.

 and rag as tool
@claude
Copy link
Copy Markdown

claude bot commented Dec 2, 2025

Pull Request Review: New Features (#488)

Overview

This is a substantial PR with 11,029 additions and 3,889 deletions (+7,140 net), introducing several major features:

  • DeepAgent refactoring with modular architecture
  • Dynamic tool management via Agent class
  • Custom communication tools (Gmail, WhatsApp, Slack)
  • Tool safety policies (pre/post execution validation)
  • RAG as a tool capability
  • Centralized tool handling

Code Quality & Best Practices

✅ Strengths

  1. Excellent Architecture Refactoring

    • DeepAgent moved from monolithic structure to modular toolkits (Planning, Filesystem, Subagent)
    • New backend abstraction (BackendProtocol, StateBackend, MemoryBackend) provides flexibility
    • Clean separation of concerns with dedicated ToolPolicyManager for safety validation
  2. Dynamic Tool Management

    • New add_tools() and remove_tools() methods enable runtime tool configuration
    • Proper tool deduplication and lifecycle management
    • Support for multiple tool types (functions, classes, agents, MCP handlers)
  3. Enhanced Safety

    • Tool policy validation at both pre-execution (registration) and post-execution (call-time)
    • Dedicated ToolPolicyManager with proper async handling
    • Good separation between user/agent/tool policies
  4. Better Developer Experience

    • Support for string tasks: agent.do("task description") auto-converts to Task
    • Version bump to 0.68.0 appropriately reflects major changes

⚠️ Issues & Concerns

1. Critical: Async Event Loop Handling Bug (src/upsonic/agent/agent.py:518-538)

try:
    loop = asyncio.get_running_loop()
    import nest_asyncio
    nest_asyncio.apply()
    validation_result = asyncio.run(...)  # ❌ This will ALWAYS fail
except RuntimeError:
    validation_result = asyncio.run(...)

Problem: If get_running_loop() succeeds, calling asyncio.run() inside that loop will raise RuntimeError even with nest_asyncio. The correct pattern is:

try:
    loop = asyncio.get_running_loop()
    validation_result = await self.tool_policy_pre_manager.execute_tool_validation_async(...)
except RuntimeError:
    validation_result = asyncio.run(...)

Impact: Tool validation will likely fail in async contexts, potentially blocking all tool registration.

Recommendation: Either make _validate_tools_with_policy_pre async, or use a proper sync wrapper.

2. Dependency Missing (pyproject.toml:77)

The PR adds outlines>=1.2.9 as a dependency but I don't see it being imported anywhere in the diff. Is this:

  • A premature addition for future features?
  • Used in unchanged parts of the codebase?
  • Actually needed for this PR?

Please clarify or remove if unused.

3. Test Coverage Concerns

The PR description mentions "Some test fixes" but:

  • No new test files are added for the major new features
  • Dynamic tool management (add_tools, remove_tools) needs tests
  • Tool policy validation flows need coverage
  • DeepAgent refactoring should have migration tests

Recommendation: Add comprehensive tests for:

  • add_tools() / remove_tools() edge cases
  • Tool policy validation (pre/post execution)
  • DeepAgent with different backend configurations
  • String-to-Task conversion

4. Breaking API Changes

The DeepAgent refactoring moves from deep_agent/ to deepagent/ and removes create_deep_agent():

# Old (removed)
from upsonic.agent.deep_agent import create_deep_agent

# New
from upsonic.agent.deepagent import DeepAgent

Recommendations:

  • Add deprecation warnings in old module before removing
  • Update documentation and migration guide
  • Consider keeping create_deep_agent() as an alias for one release cycle

5. Documentation Missing

For a PR this large, the description is quite brief. Missing:

  • Migration guide for existing DeepAgent users
  • Examples of new tool management APIs
  • Documentation for tool safety policies
  • How to use the new communication tools (Gmail, WhatsApp, Slack)

6. Code Style Issues

Inconsistent tool list handling (agent.py:560-561):

final_tools = list(self.tools)  # Defensive copy

# Later...
tools_to_add = list(tools)  # Redundant if tools is already a list

Magic string usage (agent.py:508):

if tool_def.name == 'plan_and_execute':  # Use constant
    continue

Potential None attribute access (agent.py:843):

task_builtin_tools = getattr(task, 'task_builtin_tools', [])

Better to initialize task.task_builtin_tools explicitly when task is created.

Security Concerns

⚠️ Moderate Issues

  1. Tool Policy Bypass Risk
    The policy validation happens AFTER tool registration in ToolManager. If registration has side effects, they occur before validation:

    newly_registered = self.tool_manager.register_tools(...)  # ← Side effects here
    self._validate_tools_with_policy_pre(...)  # ← Validation after

    Recommendation: Validate before registration or ensure registration is idempotent.

  2. Exception Handling in Parallel Tool Execution
    In _execute_tool_calls(), if a tool call validation fails in parallel execution, other parallel calls may still proceed. Consider whether this is desired behavior.

  3. Missing Input Sanitization
    The new string-to-Task conversion doesn't validate or sanitize the input:

    if isinstance(task, str):
        task = TaskClass(description=task)  # What if task contains injection attempts?

Performance Considerations

  1. Redundant Tool Extraction
    extract_builtin_tools() is called multiple times for the same tools in add_tools(). Consider caching.

  2. O(n²) Tool Deduplication
    Checking if tool.unique_id not in existing_ids for each tool. For large tool sets, use set intersection.

  3. Synchronous Policy Validation in Hot Path
    Tool policy validation using asyncio.run() blocks the thread. Consider making the entire flow async or using a thread pool.

Recommendations

High Priority

  1. Fix the async event loop handling bug (Critical)
  2. Add comprehensive test coverage for new features
  3. Provide migration guide and deprecation warnings
  4. Clarify/remove the outlines dependency if unused

Medium Priority

  1. Document the new tool management APIs with examples
  2. Add validation before tool registration (security)
  3. Make tool policy validation properly async
  4. Initialize task.task_builtin_tools explicitly

Low Priority

  1. Use constants instead of magic strings
  2. Optimize tool deduplication
  3. Add comprehensive docstrings for new methods

Overall Assessment

This PR introduces valuable features with a well-architected refactoring. The core design decisions are sound:

  • Modular DeepAgent architecture
  • Dynamic tool management
  • Layered safety policies

However, the async event loop handling bug is critical and will likely cause runtime failures. The lack of tests for such a large refactoring is also concerning.

Recommendation: Fix the critical bug and add test coverage before merging. Consider splitting this PR into smaller chunks (DeepAgent refactor, tool management, communication tools) for easier review and testing.


Review generated using Claude Code. Let me know if you need clarification on any points!

@DoganK01 DoganK01 merged commit 98c78f2 into master Dec 2, 2025
4 checks passed
@DoganK01 DoganK01 deleted the new-features branch December 2, 2025 15:45
@claude
Copy link
Copy Markdown

claude bot commented Dec 2, 2025

Pull Request Review: New Features (PR 488) - This is a major refactoring PR with approximately 8000 lines of changes across 103 files. The PR introduces significant architectural improvements to the Upsonic agent framework, including centralized tool management, new interfaces, enhanced safety policies, and a complete DeepAgent rewrite. STRENGTHS: (1) Excellent Tool Management Architecture - centralized registration, dynamic tool management, smart deduplication, clear separation; (2) Robust Safety Engine Integration - pre/post-execution validation, LLM-powered analysis; (3) Clean DeepAgent Refactoring - modular architecture, better encapsulation; (4) Professional Gmail Interface - OAuth 2.0, API protection, error handling; (5) Import Error Handling across 73 files. CRITICAL ISSUES: (1) Gmail Input Validation at gmail.py:88-92 - email validation too permissive, no length validation (DoS risk), missing validation for lists. Recommendation: Strengthen with RFC 5321 compliance. (2) Agent String Task Conversion at agent.py:1591-1593 - auto-conversion happens silently, breaking API change. Recommendation: Add deprecation warning and documentation. MEDIUM ISSUES: (1) Async Event Loop Handling at agent.py:522-545 - nest_asyncio modifies global state. Use create_task instead. (2) Excessive Tool Validation at agent.py:487-555 - no caching, O(n) overhead. Recommendation: Cache by unique_id. (3) Missing Test Coverage - only 1 test file found. Need tests for tool policy, dynamic tools, Gmail interface. (4) Breaking Changes Not Documented - need migration guide. SECURITY: Email validation needs strengthening, no rate limiting on Gmail, consider input sanitization. BEFORE MERGE REQUIRED: Fix email validation, fix async event loop, add core tests, document breaking changes, add deprecation warning for string-to-Task. ASSESSMENT: Quality 4/5, Architecture 5/5, Security 3/5, Test Coverage 2/5. This is high-quality architectural work with excellent design decisions. However, critical security issues and insufficient test coverage must be addressed before merge. RECOMMENDATION: Approve with changes - address critical issues and add tests, then merge. Great work on this major refactoring!

DoganK01 added a commit that referenced this pull request Jan 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant