
refactor: refactor agent printing hierarchy#515

Merged
DoganK01 merged 3 commits into master from printing-fix-v2
Jan 31, 2026

@DoganK01
Member

  • Adjust agent printing order

@claude

claude bot commented Jan 30, 2026

Code Review: Refactor Agent Printing Hierarchy

Overview

This PR refactors the agent printing mechanism to establish a clear hierarchy for the print flag. The implementation is generally well-designed, but there are several areas that need attention.


Strengths

  1. Clear Hierarchy: The three-level priority system (ENV > constructor param > method default) is well-documented and logical
  2. Backward Compatibility: Good attempt to maintain compatibility by keeping the self.print attribute
  3. Code Organization: The _resolve_print_flag() helper method is clean and well-documented
  4. Utility Function: get_env_bool_optional() is a useful addition for distinguishing between "not set" and "false"
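
The three-level priority described above can be sketched as a small pure function. This is only an illustration of the ordering, not the PR's actual `_resolve_print_flag()`; the name `resolve_print_flag` and its three parameters are assumptions made for the example.

```python
from typing import Optional


def resolve_print_flag(
    env_value: Optional[bool],
    param_value: Optional[bool],
    method_default: bool,
) -> bool:
    """Hypothetical stand-in for the hierarchy: ENV > constructor param > method default."""
    if env_value is not None:      # 1. the environment variable wins whenever it is set
        return env_value
    if param_value is not None:    # 2. otherwise an explicit constructor argument
        return param_value
    return method_default          # 3. otherwise the calling method's own default


# ENV set to false overrides a constructor value of True.
env_wins = resolve_print_flag(False, True, True)
# ENV unset: a constructor value of False overrides print_do()'s default of True.
param_wins = resolve_print_flag(None, False, True)
# Nothing set: do() stays silent and print_do() prints.
do_default = resolve_print_flag(None, None, False)
print_do_default = resolve_print_flag(None, None, True)
```

Keeping "unset" as `None` rather than `False` is what lets each level fall through to the next one.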

🐛 Critical Issues

1. Breaking Behavioral Change in print_do() (HIGH PRIORITY)

The old implementation explicitly printed the result:

# Old behavior
def print_do(self, task, ...):
    result = self.do(task, ...)
    success_log(f"{result}", "Agent")  # Explicit print
    return result

The new implementation delegates to do_async() with _print_method_default=True, relying on internal printing logic. This is a breaking change if the internal printing behavior differs from the old explicit success_log() call.

Impact: Users upgrading may see different output formatting or missing output entirely.

Recommendation: Add regression tests to verify print_do() produces identical output to the old implementation, or document this as a breaking change in release notes.


2. State Mutation Side Effect (HIGH PRIORITY)

# Line 2529 in do_async()
self.print = self._resolve_print_flag(_print_method_default)

Problem: This mutates instance state on every do_async() call, creating race conditions in concurrent scenarios:

agent = Agent(print=True)
# Thread 1 calls agent.do() -> sets self.print = False
# Thread 2 calls agent.print_do() -> sets self.print = True  
# Thread 1 uses self.print (now True!) -> unexpected behavior

Recommendation: Either:

  • Option A (Preferred): Pass the resolved print flag as a parameter through the call chain instead of mutating state
  • Option B: Document that Agent instances are not thread-safe and recommend creating separate instances per thread
  • Option C: Use thread-local storage for the print flag
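
As a rough sketch of Option A, the flag can be resolved into a local variable and handed down the call chain, so concurrent calls never share it. `SketchAgent`, `_finish`, and the `emitted` list are hypothetical names invented for this example and only model the print-flag plumbing, not the real Agent.

```python
import asyncio
from typing import List, Optional, Tuple


class SketchAgent:
    """Hypothetical stand-in for Agent; only the print-flag plumbing is modeled."""

    def __init__(self, print: Optional[bool] = None) -> None:
        self._print_param = print                     # configuration, never mutated later
        self.emitted: List[Tuple[str, bool]] = []     # records which flag each call saw

    def _resolve_print_flag(self, method_default: bool) -> bool:
        return self._print_param if self._print_param is not None else method_default

    async def do_async(self, task: str, _print_method_default: bool = False) -> str:
        print_flag = self._resolve_print_flag(_print_method_default)  # local to this call
        await asyncio.sleep(0)                        # yield so concurrent calls interleave
        return self._finish(task, print_flag)

    def _finish(self, task: str, print_flag: bool) -> str:
        # The flag arrives as an argument, so another call cannot overwrite it.
        self.emitted.append((task, print_flag))
        if print_flag:
            print(f"[Agent] {task}")
        return task


async def main() -> List[Tuple[str, bool]]:
    agent = SketchAgent()
    await asyncio.gather(
        agent.do_async("quiet task"),
        agent.do_async("loud task", _print_method_default=True),
    )
    return sorted(agent.emitted)


results = asyncio.run(main())
```

Each call keeps the flag it resolved, even though both calls interleave on the same instance.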

3. Inconsistent Default Value (MEDIUM PRIORITY)

# Line 274
self.print: bool = False  # Default is False

But the old behavior was:

# Old code
self.print = get_env_bool("UPSONIC_AGENT_PRINT", default=True)

Impact: If code accesses agent.print directly before calling do() or print_do(), it will get False instead of the environment variable value or True.

Recommendation: Set an initial value using _resolve_print_flag(method_default=True) or document this as a breaking change.


⚠️ Design Concerns

4. Code Duplication (MEDIUM PRIORITY)

Lines 2712-2719 and 2680-2684 are nearly identical:

# Duplicated in both do() and print_do()
from upsonic.tasks.tasks import Task as TaskClass
if isinstance(task, str):
    task = TaskClass(description=task)

task.price_id_ = None
_ = task.price_id
task._tool_calls = []

Recommendation: Extract this into a helper method like _prepare_task() to follow DRY principles.


5. Unclear Method Return Behavior (LOW PRIORITY)

The print_do() docstring says it returns task content or AgentRunOutput based on return_output, but the old implementation always returned the full result without a return_output parameter.

Recommendation: Add migration notes if this changes user-facing behavior.


🧪 Test Coverage Concerns

  1. Missing Tests: No tests verify the new hierarchy behavior (ENV > param > method)
  2. Missing Edge Cases:
    • What happens when UPSONIC_AGENT_PRINT=false (string "false")?
    • Concurrent calls to do() and print_do()
    • Direct access to agent.print attribute

Recommendation: Add tests for:

def test_print_hierarchy_env_overrides_all():
    os.environ['UPSONIC_AGENT_PRINT'] = 'false'
    agent = Agent(print=True)
    # Should use ENV value (False) not constructor param (True)
    
def test_print_hierarchy_param_overrides_method():
    agent = Agent(print=False)
    # agent.print_do() should still respect print=False

🔒 Security & Performance

  • No security issues identified
  • Performance: The _resolve_print_flag() method is called on every do_async() execution. This is negligible overhead but could be optimized by caching when ENV/param don't change.

📝 Documentation & Style

  1. Mixed Language Comments: Line 117 has a Turkish comment (Environment variable'dan boolean değer al, i.e. "Get a boolean value from an environment variable") while lines 123-127 use English. Comments should be consistent (preferably English for an open-source project).
  2. Unclear Internal Parameter: At line 2505, the docstring for _print_method_default could state more clearly that the parameter is for internal use only.

🎯 Recommendations Summary

Must Fix Before Merge:

  1. Address the state mutation race condition (Issue #2)
  2. Add tests for the new hierarchy behavior
  3. Verify backward compatibility of print_do() output format

Should Fix:

  1. Fix inconsistent self.print default value (Issue #3)
  2. Extract duplicated task preparation code (Issue #4)
  3. Standardize comments to English

Nice to Have:

  1. Document thread-safety considerations
  2. Add comprehensive edge case tests

Overall, this is a well-intentioned refactor that improves the print flag architecture, but needs fixes for thread-safety and backward compatibility before merging. The concept is solid, execution needs refinement.

Suggested Action: Request changes for Issues #2 and #3, plus test coverage before approval.

@claude

claude bot commented Jan 31, 2026

Comprehensive Code Review: PR #515 - Refactor Agent Printing Hierarchy

Executive Summary

This PR introduces a well-designed printing hierarchy system but has critical thread-safety issues and several design concerns that must be addressed before merging. The concept is solid, but the implementation needs refinement.


🎯 Architecture & Design

✅ Strengths

  1. Clear Three-Level Hierarchy: The ENV → constructor → method default priority is well-documented and intuitive
  2. Type-Safe Optional Handling: get_env_bool_optional() correctly distinguishes between unset and false values
  3. Improved Test Coverage: Excellent comprehensive memory tests with separate direct/streaming flow coverage
  4. Documentation: Good inline comments explaining the hierarchy at src/upsonic/agent/agent.py:267-270
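
To make the "unset versus false" distinction concrete, here is a minimal sketch of a tri-state environment reader. It is not the PR's `get_env_bool_optional()`; the function name `get_env_bool_optional_sketch` and the set of strings treated as true are assumptions for illustration.

```python
import os
from typing import Optional

# Assumed set of spellings that count as "true"; the real helper may accept others.
_TRUE_STRINGS = {"1", "true", "yes", "on"}


def get_env_bool_optional_sketch(key: str) -> Optional[bool]:
    """Return None when the variable is unset, otherwise a bool parsed from its value."""
    raw = os.environ.get(key)
    if raw is None:
        return None                       # unset: let the next level of the hierarchy decide
    return raw.strip().lower() in _TRUE_STRINGS   # any other value is treated as False


os.environ.pop("SKETCH_AGENT_PRINT", None)
unset_value = get_env_bool_optional_sketch("SKETCH_AGENT_PRINT")

os.environ["SKETCH_AGENT_PRINT"] = "false"
false_value = get_env_bool_optional_sketch("SKETCH_AGENT_PRINT")

os.environ.pop("SKETCH_AGENT_PRINT", None)   # leave the environment as we found it
```

A plain `bool` return type could not express the first case, which is why the optional variant matters for the hierarchy.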

⚠️ Design Concerns

1. State Mutation Creates Race Conditions (🔴 CRITICAL)

Location: src/upsonic/agent/agent.py:2488

self.print = self._resolve_print_flag(_print_method_default)

Problem: Mutating instance state on every method call breaks thread-safety:

agent = Agent(print=True)

# Concurrent scenario:
# Thread 1: agent.do(task1) → sets self.print = False
# Thread 2: agent.print_do(task2) → sets self.print = True
# Thread 1: uses self.print (now True!) → wrong behavior

Impact:

  • Multi-threaded applications get unpredictable printing behavior
  • Agent instances cannot be safely shared across async tasks
  • Violates principle of immutable configuration

Recommendation:

# Option A (Preferred): Pass resolved value through pipeline
async def do_async(self, task, ..., _print_method_default: bool = False):
    print_flag = self._resolve_print_flag(_print_method_default)
    # Pass print_flag as parameter instead of mutating self.print
    
# Option B: Use local variable and avoid self.print mutation
# Option C: Add @synchronized decorator or locks (adds complexity)

2. Inconsistent Default Behavior (🟡 MEDIUM)

Location: src/upsonic/agent/agent.py:274

self.print: bool = False  # Always False initially

Old Behavior:

self.print = get_env_bool("UPSONIC_AGENT_PRINT", default=True)  # Respects ENV, defaults to True

Impact:

  • Code that reads agent.print before calling methods gets False instead of ENV/True
  • Breaking change for introspection use cases
  • Example broken code:
    agent = Agent()
    if agent.print:  # Always False now, was True before
        configure_logger()

Recommendation: Initialize with resolved value:

self.print: bool = self._print_env if self._print_env is not None else (print if print is not None else True)

3. Code Duplication (🟡 MEDIUM)

Locations:

  • src/upsonic/agent/agent.py:2680-2686 (do method)
  • src/upsonic/agent/agent.py:2712-2719 (print_do method)

Identical task preparation logic repeated:

from upsonic.tasks.tasks import Task as TaskClass
if isinstance(task, str):
    task = TaskClass(description=task)
task.price_id_ = None
_ = task.price_id
task._tool_calls = []

Recommendation: Extract to helper method:

def _prepare_task_for_execution(self, task: Union[str, "Task"]) -> "Task":
    """Prepare task object for execution."""
    if isinstance(task, str):
        from upsonic.tasks.tasks import Task as TaskClass
        task = TaskClass(description=task)
    task.price_id_ = None
    _ = task.price_id
    task._tool_calls = []
    return task

🐛 Potential Bugs

4. Memory Test Helper Missing Error Handling (🟡 MEDIUM)

Location: tests/smoke_tests/memory/test_storage_full_memory.py:448-494

The consume_stream() helper doesn't validate storage/session parameters:

async def consume_stream(agent, task, storage=None, session_id=None):
    # What if storage is None but session_id is provided?
    # What if session fails to deserialize?

Recommendation: Add defensive checks:

if storage and session_id:
    session_before = storage.get_session(...)
    if not session_before:
        logger.warning(f"Session {session_id} not found")

5. Pipeline Step Comment Mismatch (🟢 LOW)

Location: src/upsonic/agent/pipeline/steps.py:2627-2629

# Add tool results to chat_history (response already added above)
context.chat_history.append(ModelRequest(parts=tool_results))

Comment says "response already added above" but this was refactored. Should update comment for clarity.


🧪 Test Coverage

✅ Excellent Additions

  1. Comprehensive Memory Tests: Great separation of direct vs streaming flows (lines 503-949)
  2. Helper Functions: count_message_types(), assert_valid_message_pairs() are well-designed
  3. Mixed Flow Testing: test_mixed_direct_and_streaming_flows() catches edge cases

❌ Missing Test Coverage

  1. No tests for the new print hierarchy:

    # Missing tests like:
    def test_env_overrides_constructor_param(monkeypatch):
        monkeypatch.setenv("UPSONIC_AGENT_PRINT", "false")
        agent = Agent(print=True)
        # Verify ENV wins
    
    def test_constructor_overrides_method_default():
        agent = Agent(print=False)
        # Verify print_do respects constructor param
  2. No concurrent execution tests for race condition verification

  3. Smoke tests don't verify actual output:
    tests/smoke_tests/agent/test_agent_print_behavior.py just calls methods without asserting output

Recommendation: Add assertions like:

def test_print_do_default_prints(capsys):
    agent = Agent()
    task = Task("test")
    agent.print_do(task)
    captured = capsys.readouterr()
    assert len(captured.out) > 0, "print_do should produce output"

🔒 Security & Performance

Security

✅ No security vulnerabilities identified

Performance

🟡 Minor Concern: _resolve_print_flag() called on every do_async() invocation

  • Current overhead: ~3 conditional checks per call (negligible)
  • Could optimize with caching if ENV/param never change, but current approach is simpler and safer

📝 Code Quality & Style

1. Type Hints

Good type annotations throughout. Minor improvement:

# Current (line 271)
self._print_env: Optional[bool] = get_env_bool_optional("UPSONIC_AGENT_PRINT")

# Could add return type to _resolve_print_flag for completeness (already has params typed)

2. Documentation 🟡

  • Method docstrings updated appropriately
  • Missing migration notes for breaking changes
  • print_do() behavior change not documented in CHANGELOG

3. Naming

  • _print_env, _print_param clearly indicate internal use
  • _resolve_print_flag follows conventions
  • _print_method_default parameter name is descriptive

🎯 Recommendations Summary

🔴 Must Fix Before Merge

  1. Fix thread-safety issue: Remove state mutation or add synchronization (Issue #1)
  2. Fix inconsistent default: Initialize self.print correctly (Issue #2)
  3. Add hierarchy tests: Verify ENV > param > method priority works correctly

🟡 Should Fix

  1. Extract duplicated task preparation code (Issue #3)
  2. Add output verification to smoke tests (missing coverage #3)
  3. Add defensive error handling in test helpers (Issue #4)

🟢 Nice to Have

  1. Add CHANGELOG/migration notes for breaking changes
  2. Document thread-safety guarantees in Agent docstring
  3. Consider caching resolved print flag for performance

📊 Overall Assessment

| Category | Rating | Notes |
| --- | --- | --- |
| Code Quality | 7/10 | Well-structured but has duplication |
| Test Coverage | 8/10 | Excellent memory tests, missing hierarchy tests |
| Security | 10/10 | No issues identified |
| Thread-Safety | 3/10 | Critical state mutation issue |
| Documentation | 6/10 | Good inline comments, missing migration notes |
| Overall | 6.5/10 | Good concept, needs fixes before merge |

🚦 Verdict: ⚠️ REQUEST CHANGES

Blocking Issues:

  1. State mutation race condition (Issue #1) - MUST FIX
  2. Missing test coverage for print hierarchy - MUST ADD
  3. Inconsistent default behavior (Issue #2) - SHOULD FIX

Recommended Action: Fix the critical thread-safety issue, add tests for the new behavior, then re-review. The architecture is sound, but implementation needs refinement for production readiness.


References

  • CLAUDE.md guidelines followed ✅
  • Testing structure adheres to tests/ organization ✅
  • uv/pytest conventions maintained ✅

Great work on improving the print flag architecture! The hierarchical approach is the right direction. Looking forward to the fixes. 🚀

@claude

claude bot commented Jan 31, 2026

Pull Request Review: Refactor Agent Printing Hierarchy

Summary

This PR refactors the agent printing hierarchy to establish a clear priority order (ENV > Agent param > Method default) and fixes memory state mutation issues in streaming mode. The changes span 6 files with 933 additions and 111 deletions.


✅ Strengths

1. Clear Hierarchy and Design

The new three-tier priority system is well-documented and intuitive:

  • ENV variable (highest priority)
  • Agent constructor print parameter
  • Method name (print_do=True, do=False)

The _resolve_print_flag() method cleanly implements this hierarchy.

2. Thread Safety Improvements

  • Storing print_flag on AgentRunOutput (line 321 in output.py) instead of mutating self.print prevents race conditions in concurrent scenarios
  • Local resolution of the print flag per-run is a solid architectural choice

3. Comprehensive Test Coverage

  • New test_agent_print_behavior.py covers all priority combinations
  • Extensive memory tests (717 additions!) validate both direct and streaming flows
  • Tests verify messages vs chat_history semantics correctly

4. Memory Bug Fix

The fix in steps.py:228-236 ensures chat_history is set to historical messages BEFORE adding new messages, which prevents the race condition where new_messages() would incorrectly capture historical messages.
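
The ordering requirement can be illustrated with a toy context object. This is a sketch under stated assumptions, not the code in steps.py: `SketchContext`, `load_history()`, and the idea that `new_messages()` slices past a recorded history length are all hypothetical and chosen only to show why history must be installed first.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchContext:
    """Hypothetical run context; the real pipeline context is not shown in this review."""

    chat_history: List[str] = field(default_factory=list)
    _history_len: int = 0   # how many entries were already present before this run

    def load_history(self, historical: List[str]) -> None:
        # Install historical messages first and remember where this run starts.
        self.chat_history = list(historical)
        self._history_len = len(self.chat_history)

    def new_messages(self) -> List[str]:
        # Everything appended after load_history() belongs to the current run.
        return self.chat_history[self._history_len:]


ctx = SketchContext()
ctx.load_history(["user: hi", "assistant: hello"])   # step 1: history goes in first
ctx.chat_history.append("user: what is 2+2?")        # step 2: then the new exchange
ctx.chat_history.append("assistant: 4")

fresh = ctx.new_messages()
```

If the new exchange were appended before the history was installed, the boundary would be recorded in the wrong place and historical entries could be reported as new.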

5. Streaming Response Tracking

Adding context.chat_history.append(final_response) at line 277 in steps.py ensures streaming responses are properly tracked in session memory.


⚠️ Issues & Concerns

1. Missing Test File Newline (Minor)

test_agent_print_behavior.py is missing a newline at end of file (line 92). This will cause warnings from pre-commit hooks.

Fix:

echo "" >> tests/smoke_tests/agent/test_agent_print_behavior.py

2. Test Assertions Are Weak (Medium)

The tests in test_agent_print_behavior.py don't actually verify that output was printed or not printed - they just call the methods. These are smoke tests that would pass even if the printing logic is broken.

Recommendation:
Consider capturing stdout/stderr or checking output visibility:

import contextlib
import io

def test_print_do_default_prints():
    captured_output = io.StringIO()
    # redirect_stdout restores the previous stream on exit, even if the call raises,
    # so it does not clobber stdout capture installed by the test runner
    with contextlib.redirect_stdout(captured_output):
        my_agent = Agent()
        task = Task("test task")
        my_agent.print_do(task)
    output = captured_output.getvalue()
    assert len(output) > 0, "Expected output to be printed"

3. Potential Breaking Change (High - Security/Behavior)

The new hierarchy reverses the previous priority. Before, the Agent constructor print parameter overrode the ENV variable. Now ENV overrides everything.

Previous behavior:

os.environ["UPSONIC_AGENT_PRINT"] = "false"
agent = Agent(print=True)  # Would print (param overrides ENV)

New behavior:

os.environ["UPSONIC_AGENT_PRINT"] = "false"
agent = Agent(print=True)  # Won't print (ENV overrides param)

This could break existing user code that relies on the parameter to override environment settings. Consider:

  • Documenting this as a breaking change in release notes
  • Adding a deprecation warning in the transition period
  • Or reconsidering if ENV should truly override explicit constructor parameters
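
One way the transition-period warning might look is sketched below. The function name `resolve_with_warning` and the message text are assumptions. The sketch uses `FutureWarning` in place of `DeprecationWarning`, because Python's default filters hide `DeprecationWarning` outside `__main__` while `FutureWarning` is shown to end users.

```python
import warnings
from typing import Optional


def resolve_with_warning(
    env_value: Optional[bool],
    param_value: Optional[bool],
    method_default: bool,
) -> bool:
    """Hypothetical resolver that warns when ENV silently overrides the constructor."""
    if env_value is not None:
        if param_value is not None and param_value != env_value:
            # Only warn on a real conflict, so agreeing settings stay quiet.
            warnings.warn(
                "UPSONIC_AGENT_PRINT now takes precedence over the 'print' "
                "constructor argument; unset it to let the argument apply.",
                FutureWarning,
                stacklevel=2,
            )
        return env_value
    if param_value is not None:
        return param_value
    return method_default


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    conflicted = resolve_with_warning(False, True, True)    # ENV=false vs print=True
    agreed = resolve_with_warning(True, True, False)        # same value, no warning

conflict_warnings = [w for w in caught if issubclass(w.category, FutureWarning)]
```

Warning only on conflicting values keeps the noise limited to users whose output actually changes under the new priority.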

4. Memory Test Complexity (Medium)

The memory tests are comprehensive but very complex (717 additions). Consider:

  • Breaking into smaller test files by feature area (streaming, direct, mixed)
  • Some helper functions like count_message_types() could be moved to a test utilities module
  • The StreamingRunResult class could be in a shared test fixtures file

5. Missing Type Hints (Low)

Line 122 in logging_config.py:

def get_env_bool_optional(key: str) -> "Optional[bool]":

The quotes around Optional[bool] suggest forward references, but Optional should be imported from typing. Verify the import is present.
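
The reason this check matters can be shown in a few lines: a quoted annotation is not evaluated when the function is defined or called, so a missing import only surfaces when something resolves the hint. The function `get_flag` below is a hypothetical stand-in, and passing an empty `globalns` is used here purely to simulate a module that forgot the import.

```python
import typing
from typing import Optional


def get_flag(key: str) -> "Optional[bool]":
    """Hypothetical stand-in with the same quoted return annotation."""
    return None


# With Optional imported in this module, the string annotation resolves normally.
hints = typing.get_type_hints(get_flag)

# Simulate a module without the import by resolving against empty globals.
try:
    typing.get_type_hints(get_flag, globalns={})
    unresolved = False
except NameError:
    unresolved = True   # the name Optional could not be found
```

Tools that introspect annotations at runtime would hit the same error, even though calling the function itself works fine.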

6. Code Duplication in print_do Methods (Medium)

Lines 126-164 in agent.py show significant duplication between print_do() and print_do_async(). The sync version manually handles the event loop and calls do_async(). Consider if this could be simplified or if comments explaining the necessity would help.
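
If the duplication is to be reduced, one common pattern is to keep a single async implementation and route the sync method through a small runner. The sketch below is an assumption about how that could look, not a description of the PR's code; `run_sync`, `do_async_sketch`, `print_do_sketch`, and `print_do_async_sketch` are hypothetical names.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Coroutine


async def do_async_sketch(task: str, _print_method_default: bool = False) -> str:
    """Hypothetical shared implementation used by both entry points."""
    await asyncio.sleep(0)
    return f"done: {task} (print={_print_method_default})"


def run_sync(make_coro: Callable[[], Coroutine[Any, Any, Any]]) -> Any:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe.
        return asyncio.run(make_coro())
    # A loop is already running here; use a worker thread with its own loop.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(lambda: asyncio.run(make_coro())).result()


def print_do_sketch(task: str) -> str:
    return run_sync(lambda: do_async_sketch(task, _print_method_default=True))


async def print_do_async_sketch(task: str) -> str:
    return await do_async_sketch(task, _print_method_default=True)


sync_result = print_do_sketch("report")
async_result = asyncio.run(print_do_async_sketch("report"))
```

With this shape, the event-loop handling lives in one place and both wrappers reduce to a single line each. Note that the thread fallback blocks the calling loop until the work finishes.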

7. State Mutation Warning Needed (Low)

The code correctly avoids mutating self.print during resolution, but a comment explaining why would help future maintainers:

# Store locally - don't mutate self.print to ensure thread-safety
resolved_print_flag = self._resolve_print_flag(_print_method_default)

🔒 Security Considerations

✅ No security concerns identified. The changes don't introduce:

  • Command injection vulnerabilities
  • SQL injection risks
  • XSS vulnerabilities
  • Sensitive data exposure

⚡ Performance Considerations

Positive Impact:

  • Thread-safe design prevents lock contention
  • No new synchronous blocking operations introduced

⚠️ Minor Concern:

  • _resolve_print_flag() is called per-run (lines 90, 109, 204) - this is acceptable overhead but could be optimized if profiling shows it's a hotspot

📊 Test Coverage

New Tests Added:

  • 7 print behavior tests in test_agent_print_behavior.py
  • 15+ new memory tests covering direct/streaming flows

Missing Tests:

  • No tests for concurrent access to print_flag
  • No tests verifying actual printed output
  • Edge case: What if ENV var is set to invalid value like "maybe"? (Currently returns False, which is correct)

Test Quality: ⭐⭐⭐⭐ (4/5)

  • Comprehensive coverage of memory scenarios
  • Print tests could be stronger with actual output verification

🎯 Recommendations

Must Fix Before Merge:

  1. ✅ Add newline at end of test_agent_print_behavior.py
  2. 📝 Document the breaking change in ENV/param priority
  3. 🧪 Consider strengthening print tests to verify actual output

Should Consider:

  1. 📚 Add comments explaining the thread-safety rationale
  2. 🔧 Verify Optional import in logging_config.py
  3. 📖 Update user documentation about the print hierarchy

Nice to Have:

  1. ♻️ Refactor test helpers into shared utilities
  2. 🧪 Add concurrent access tests
  3. 📊 Profile _resolve_print_flag() performance in high-throughput scenarios

Overall Assessment

Code Quality: ⭐⭐⭐⭐ (4/5)

  • Clean implementation
  • Good separation of concerns
  • Thread-safe design

Risk Level: 🟡 MEDIUM

  • Breaking change in priority hierarchy needs communication
  • Test assertions could be stronger

Recommendation: APPROVE with minor changes

The refactoring is solid and fixes real concurrency issues. The main concern is the breaking change in priority order - ensure this is intentional and well-documented. Once the newline is added and the breaking change is acknowledged, this is ready to merge.


Great work on the comprehensive test coverage and the thread-safety improvements! 🚀

@DoganK01 DoganK01 merged commit fd729d1 into master Jan 31, 2026
5 checks passed
@DoganK01 DoganK01 deleted the printing-fix-v2 branch January 31, 2026 13:55