refactor: refactor agent printing hierarchy by DoganK01 · Pull Request #515 · Upsonic/Upsonic

DoganK01 · 2026-01-30T11:14:17Z

Adjust agent printing order

claude · 2026-01-30T11:16:07Z

Code Review: Refactor Agent Printing Hierarchy

Overview

This PR refactors the agent printing mechanism to establish a clear hierarchy for the print flag. The implementation is generally well-designed, but there are several areas that need attention.

✅ Strengths

Clear Hierarchy: The three-level priority system (ENV > constructor param > method default) is well-documented and logical
Backward Compatibility: Good attempt to maintain compatibility by keeping the self.print attribute
Code Organization: The _resolve_print_flag() helper method is clean and well-documented
Utility Function: get_env_bool_optional() is a useful addition for distinguishing between "not set" and "false"
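A tri-state helper is easiest to understand in code. Below is a minimal sketch of how a helper like get_env_bool_optional() could behave. It is not the PR's implementation, and the set of accepted truthy strings is an assumption. Any other value, such as "maybe", is treated as False.

```python
import os
from typing import Optional

# Assumed set of truthy spellings; the PR's actual list may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def get_env_bool_optional(key: str) -> Optional[bool]:
    """Return None when `key` is unset, otherwise parse it as a boolean.

    Unlike a helper that takes a default, this lets callers tell
    "not set" apart from an explicit "false".
    """
    raw = os.environ.get(key)
    if raw is None:
        return None  # unset: let the next level of the hierarchy decide
    return raw.strip().lower() in _TRUTHY
```

With this shape, `None` means "fall through to the constructor param". Both `False` and `True` are explicit ENV decisions.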

🐛 Critical Issues

1. Breaking Behavioral Change in `print_do()` (HIGH PRIORITY)

The old implementation explicitly printed the result:

# Old behavior
def print_do(self, task, ...):
    result = self.do(task, ...)
    success_log(f"{result}", "Agent")  # Explicit print
    return result

The new implementation delegates to do_async() with _print_method_default=True, relying on internal printing logic. This is a breaking change if the internal printing behavior differs from the old explicit success_log() call.

Impact: Users upgrading may see different output formatting or missing output entirely.

Recommendation: Add regression tests to verify print_do() produces identical output to the old implementation, or document this as a breaking change in release notes.

2. State Mutation Side Effect (HIGH PRIORITY)

# Line 2529 in do_async()
self.print = self._resolve_print_flag(_print_method_default)

Problem: This mutates instance state on every do_async() call, creating race conditions in concurrent scenarios:

agent = Agent()  # no print param and UPSONIC_AGENT_PRINT unset, so the method default decides
# Thread 1 calls agent.do() -> sets self.print = False
# Thread 2 calls agent.print_do() -> sets self.print = True
# Thread 1 uses self.print (now True!) -> unexpected behavior

Recommendation: Either:

Option A (Preferred): Pass the resolved print flag as a parameter through the call chain instead of mutating state
Option B: Document that Agent instances are not thread-safe and recommend creating separate instances per thread
Option C: Use thread-local storage for the print flag
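The race can be reproduced with asyncio alone, because every await inside do_async() gives another run a chance to mutate self.print. The toy below is a sketch under several assumptions. SharedStateAgent and LocalFlagAgent are hypothetical stand-ins for Agent. The ENV level is omitted. No print param is passed, so the method default decides. asyncio.sleep(0) stands in for any real await, such as a model call. LocalFlagAgent illustrates Option A by keeping the resolved flag in a local variable.

```python
import asyncio
from typing import Optional


class SharedStateAgent:
    """Toy agent that mutates self.print on every call (the pattern under review)."""

    def __init__(self, print: Optional[bool] = None) -> None:
        self._print_param = print
        self.print = False

    def _resolve_print_flag(self, method_default: bool) -> bool:
        # ENV level omitted for brevity; the constructor param beats the method default.
        if self._print_param is not None:
            return self._print_param
        return method_default

    async def do_async(self, method_default: bool) -> bool:
        self.print = self._resolve_print_flag(method_default)
        await asyncio.sleep(0)  # any await lets another run overwrite self.print
        return self.print  # the flag this run actually uses after the await


class LocalFlagAgent(SharedStateAgent):
    """Option A: resolve into a local variable and pass it along; never touch self.print."""

    async def do_async(self, method_default: bool) -> bool:
        print_flag = self._resolve_print_flag(method_default)
        await asyncio.sleep(0)
        return print_flag


async def run_concurrently(agent: SharedStateAgent) -> list:
    # do() passes method_default=False, print_do() passes True.
    return await asyncio.gather(agent.do_async(False), agent.do_async(True))


shared = asyncio.run(run_concurrently(SharedStateAgent()))
local = asyncio.run(run_concurrently(LocalFlagAgent()))
print(shared)  # [True, True]: the do() run picked up print_do()'s flag
print(local)   # [False, True]
```

Option A needs no locks. The cost is threading print_flag through whatever helpers need it.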

3. Inconsistent Default Value (MEDIUM PRIORITY)

# Line 274
self.print: bool = False  # Default is False

But the old behavior was:

# Old code
self.print = get_env_bool("UPSONIC_AGENT_PRINT", default=True)

Impact: If code accesses agent.print directly before calling do() or print_do(), it will get False instead of the environment variable value or True.

Recommendation: Set an initial value using _resolve_print_flag(method_default=True) or document this as a breaking change.
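Here is a minimal sketch of that recommendation. It assumes the hierarchy described in this PR (ENV > constructor param > method default). ToyAgent and _env_bool_optional are hypothetical stand-ins, and the truthy strings are assumed. The only point is that self.print starts with the value print_do() would use, not a hard-coded False.

```python
import os
from typing import Optional


def _env_bool_optional(key: str) -> Optional[bool]:
    # Hypothetical tri-state parser: None when unset, else an assumed truthy set.
    raw = os.environ.get(key)
    if raw is None:
        return None
    return raw.strip().lower() in {"1", "true", "yes", "on"}


class ToyAgent:
    """Hypothetical stand-in for Agent showing only the print-flag plumbing."""

    def __init__(self, print: Optional[bool] = None) -> None:
        self._print_env = _env_bool_optional("UPSONIC_AGENT_PRINT")
        self._print_param = print
        # Initialize with the value print_do() would use, so reading agent.print
        # before any call gives the old ENV-or-True value when no param is passed.
        self.print: bool = self._resolve_print_flag(method_default=True)

    def _resolve_print_flag(self, method_default: bool) -> bool:
        if self._print_env is not None:    # 1. ENV wins
            return self._print_env
        if self._print_param is not None:  # 2. constructor param
            return self._print_param
        return method_default              # 3. method default
```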

⚠️ Design Concerns

4. Code Duplication (MEDIUM PRIORITY)

Lines 2712-2719 and 2680-2684 are nearly identical:

# Duplicated in both do() and print_do()
from upsonic.tasks.tasks import Task as TaskClass
if isinstance(task, str):
    task = TaskClass(description=task)

task.price_id_ = None
_ = task.price_id
task._tool_calls = []

Recommendation: Extract this into a helper method like _prepare_task() to follow DRY principles.

5. Unclear Method Return Behavior (LOW PRIORITY)

The print_do() docstring says it returns task content or AgentRunOutput based on return_output, but the old implementation always returned the full result without a return_output parameter.

Recommendation: Add migration notes if this changes user-facing behavior.

🧪 Test Coverage Concerns

Missing Tests: No tests verify the new hierarchy behavior (ENV > param > method)
Missing Edge Cases:
- What happens when UPSONIC_AGENT_PRINT=false (string "false")?
- Concurrent calls to do() and print_do()
- Direct access to agent.print attribute

Recommendation: Add tests for:

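def test_print_hierarchy_env_overrides_all(monkeypatch):
    # monkeypatch restores the environment afterwards instead of leaking it into other tests
    monkeypatch.setenv("UPSONIC_AGENT_PRINT", "false")
    agent = Agent(print=True)
    # Should use ENV value (False), not constructor param (True)
    assert not agent._resolve_print_flag(True)

def test_print_hierarchy_param_overrides_method(monkeypatch):
    monkeypatch.delenv("UPSONIC_AGENT_PRINT", raising=False)
    agent = Agent(print=False)
    # print_do()'s method default (True) should not override print=False
    assert not agent._resolve_print_flag(True)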
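The resolution is a pure function of three inputs, so it can also be checked exhaustively with a small truth table. This sketch assumes a standalone resolve_print_flag(env, param, method_default). That function is hypothetical; in the PR the logic lives in Agent._resolve_print_flag(). The sketch uses plain asserts so it runs without pytest, and the same table could feed @pytest.mark.parametrize.

```python
from itertools import product
from typing import Optional


def resolve_print_flag(env: Optional[bool], param: Optional[bool], method_default: bool) -> bool:
    """ENV > constructor param > method default (hypothetical standalone version)."""
    if env is not None:
        return env
    if param is not None:
        return param
    return method_default


def expected(env: Optional[bool], param: Optional[bool], method_default: bool) -> bool:
    # Independent restatement of the rule: the first value that is not None wins.
    return next(v for v in (env, param, method_default) if v is not None)


def check_all_combinations() -> int:
    """Assert the resolver against every combination; return how many were checked."""
    count = 0
    levels = product((None, True, False), (None, True, False), (True, False))
    for env, param, method_default in levels:
        got = resolve_print_flag(env, param, method_default)
        assert got == expected(env, param, method_default), (env, param, method_default)
        count += 1
    return count
```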

🔒 Security & Performance

No security issues identified
Performance: The _resolve_print_flag() method is called on every do_async() execution. This is negligible overhead but could be optimized by caching when ENV/param don't change.

📝 Documentation & Style

Mixed Language Comments: Line 117 has a Turkish comment (Environment variable'dan boolean değer al, "get a boolean value from the environment variable"), while lines 123-127 use English. Comments should be consistent, preferably in English for an open-source project.
Internal Parameter Documentation: _print_method_default (line 2505) already has a leading underscore. Its docstring could also state explicitly that it is internal and not part of the public API.

🎯 Recommendations Summary

Must Fix Before Merge:

Address the state mutation race condition (Issue #2)
Add tests for the new hierarchy behavior
Verify backward compatibility of print_do() output format

Should Fix:

Fix inconsistent self.print default value (Issue #3)
Extract duplicated task preparation code (Issue #4)
Standardize comments to English

Nice to Have:

Document thread-safety considerations
Add comprehensive edge case tests

Overall, this is a well-intentioned refactor that improves the print flag architecture, but needs fixes for thread-safety and backward compatibility before merging. The concept is solid, execution needs refinement.

Suggested Action: Request changes for Issues #2 and #3, plus test coverage before approval.

claude · 2026-01-31T11:47:28Z

Comprehensive Code Review: PR #515 - Refactor Agent Printing Hierarchy

Executive Summary

This PR introduces a well-designed printing hierarchy system but has critical thread-safety issues and several design concerns that must be addressed before merging. The concept is solid, but the implementation needs refinement.

🎯 Architecture & Design

✅ Strengths

Clear Three-Level Hierarchy: The ENV → constructor → method default priority is well-documented and intuitive
Type-Safe Optional Handling: get_env_bool_optional() correctly distinguishes between unset and false values
Improved Test Coverage: Excellent comprehensive memory tests with separate direct/streaming flow coverage
Documentation: Good inline comments explaining the hierarchy at src/upsonic/agent/agent.py:267-270

⚠️ Design Concerns

1. State Mutation Creates Race Conditions (🔴 CRITICAL)

Location: src/upsonic/agent/agent.py:2488

self.print = self._resolve_print_flag(_print_method_default)

Problem: Mutating instance state on every method call breaks thread-safety:

agent = Agent()  # no print param and UPSONIC_AGENT_PRINT unset, so the method default decides

# Concurrent scenario:
# Thread 1: agent.do(task1) → sets self.print = False
# Thread 2: agent.print_do(task2) → sets self.print = True
# Thread 1: uses self.print (now True!) → wrong behavior

Impact:

Multi-threaded applications get unpredictable printing behavior
Agent instances cannot be safely shared across async tasks
Violates principle of immutable configuration

Recommendation:

# Option A (Preferred): Pass resolved value through pipeline
async def do_async(self, task, ..., _print_method_default: bool = False):
    print_flag = self._resolve_print_flag(_print_method_default)
    # Pass print_flag as parameter instead of mutating self.print
    
# Option B: Use local variable and avoid self.print mutation
# Option C: Serialize runs with a threading.Lock (adds complexity and removes concurrency)

2. Inconsistent Default Behavior (🟡 MEDIUM)

Location: src/upsonic/agent/agent.py:274

self.print: bool = False  # Always False initially

Old Behavior:

self.print = get_env_bool("UPSONIC_AGENT_PRINT", default=True)  # Respects ENV, defaults to True

Impact:

Code that reads agent.print before calling methods gets False instead of ENV/True
Breaking change for introspection use cases

Example broken code:

agent = Agent()
if agent.print:  # Always False now, was True before
    configure_logger()

Recommendation: Initialize with resolved value:

self.print: bool = self._print_env if self._print_env is not None else (print if print is not None else True)

3. Code Duplication (🟡 MEDIUM)

Locations:

src/upsonic/agent/agent.py:2680-2686 (do method)
src/upsonic/agent/agent.py:2712-2719 (print_do method)

Identical task preparation logic repeated:

from upsonic.tasks.tasks import Task as TaskClass
if isinstance(task, str):
    task = TaskClass(description=task)
task.price_id_ = None
_ = task.price_id
task._tool_calls = []

Recommendation: Extract to helper method:

def _prepare_task_for_execution(self, task: Union[str, "Task"]) -> "Task":
    """Prepare task object for execution."""
    if isinstance(task, str):
        from upsonic.tasks.tasks import Task as TaskClass
        task = TaskClass(description=task)
    task.price_id_ = None
    _ = task.price_id
    task._tool_calls = []
    return task

🐛 Potential Bugs

4. Memory Test Helper Missing Error Handling (🟡 MEDIUM)

Location: tests/smoke_tests/memory/test_storage_full_memory.py:448-494

The consume_stream() helper doesn't validate storage/session parameters:

async def consume_stream(agent, task, storage=None, session_id=None):
    # What if storage is None but session_id is provided?
    # What if session fails to deserialize?
    ...

Recommendation: Add defensive checks:

if storage and session_id:
    session_before = storage.get_session(...)
    if not session_before:
        logger.warning(f"Session {session_id} not found")
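A self-contained version of those checks might look like the sketch below. InMemoryStorage and load_session_before() are hypothetical, since the real storage call is elided above as get_session(...). Raising on a session_id without storage is one possible policy, not something the test suite currently does.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


class InMemoryStorage:
    """Hypothetical dict-backed storage; only get_session() is modeled."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Any] = {}

    def get_session(self, session_id: str) -> Optional[Any]:
        return self._sessions.get(session_id)


def load_session_before(storage: Optional[InMemoryStorage], session_id: Optional[str]) -> Optional[Any]:
    """Validate the storage/session_id pair before a streaming run."""
    if session_id is not None and storage is None:
        # Inconsistent arguments: fail early instead of silently skipping the memory checks.
        raise ValueError("session_id was given without a storage backend")
    if storage is None or session_id is None:
        return None  # nothing to compare against
    session = storage.get_session(session_id)
    if session is None:
        logger.warning("Session %s not found", session_id)
    return session
```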

5. Pipeline Step Comment Mismatch (🟢 LOW)

Location: src/upsonic/agent/pipeline/steps.py:2627-2629

# Add tool results to chat_history (response already added above)
context.chat_history.append(ModelRequest(parts=tool_results))

The comment says "response already added above", but that code was refactored. The comment should be updated to match.

🧪 Test Coverage

✅ Excellent Additions

Comprehensive Memory Tests: Great separation of direct vs streaming flows (lines 503-949)
Helper Functions: count_message_types(), assert_valid_message_pairs() are well-designed
Mixed Flow Testing: test_mixed_direct_and_streaming_flows() catches edge cases

❌ Missing Test Coverage

No tests for the new print hierarchy:

# Missing tests like:
def test_env_overrides_constructor_param(monkeypatch):
    monkeypatch.setenv("UPSONIC_AGENT_PRINT", "false")
    agent = Agent(print=True)
    # Verify ENV wins

def test_constructor_overrides_method_default():
    agent = Agent(print=False)
    # Verify print_do respects constructor param

No concurrent execution tests for race condition verification
Smoke tests don't verify actual output:
tests/smoke_tests/agent/test_agent_print_behavior.py just calls methods without asserting output

Recommendation: Add assertions like:

def test_print_do_default_prints(capsys):
    agent = Agent()
    task = Task("test")
    agent.print_do(task)
    captured = capsys.readouterr()
    assert len(captured.out) > 0, "print_do should produce output"

🔒 Security & Performance

Security

✅ No security vulnerabilities identified

Performance

🟡 Minor Concern: _resolve_print_flag() called on every do_async() invocation

Current overhead: ~3 conditional checks per call (negligible)
Could optimize with caching if ENV/param never change, but current approach is simpler and safer
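If profiling ever did flag it, one cheap option follows from the fact that the ENV value and the constructor param are read once, at construction. The only per-call input is then a bool, so both outcomes can be precomputed. CachedPrintFlag below is a hypothetical sketch of that idea and is valid only under the read-once assumption.

```python
from typing import Optional, Tuple


class CachedPrintFlag:
    """Hypothetical helper: precompute the resolved flag for both method defaults.

    Valid only if ENV and the constructor param are read once (at construction)
    and never change afterwards.
    """

    def __init__(self, env: Optional[bool], param: Optional[bool]) -> None:
        # Index 0 holds the result for method_default=False, index 1 for True.
        self._by_default: Tuple[bool, bool] = (
            self._resolve(env, param, False),
            self._resolve(env, param, True),
        )

    @staticmethod
    def _resolve(env: Optional[bool], param: Optional[bool], method_default: bool) -> bool:
        if env is not None:
            return env
        if param is not None:
            return param
        return method_default

    def get(self, method_default: bool) -> bool:
        return self._by_default[method_default]  # a bool indexes a tuple as 0 or 1
```

The saving is two branches per call, which supports the review's view that the simpler uncached version is fine.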

📝 Code Quality & Style

1. Type Hints ✅

Good type annotations throughout. Minor improvement:

# Current (line 271)
self._print_env: Optional[bool] = get_env_bool_optional("UPSONIC_AGENT_PRINT")

# Could add return type to _resolve_print_flag for completeness (already has params typed)

2. Documentation 🟡

Method docstrings updated appropriately
Missing migration notes for breaking changes
print_do() behavior change not documented in CHANGELOG

3. Naming ✅

_print_env, _print_param clearly indicate internal use
_resolve_print_flag follows conventions
_print_method_default parameter name is descriptive

🎯 Recommendations Summary

🔴 Must Fix Before Merge

Fix thread-safety issue: Remove state mutation or add synchronization (Issue #1)
Fix inconsistent default: Initialize self.print correctly (Issue #2)
Add hierarchy tests: Verify ENV > param > method priority works correctly

🟡 Should Fix

Extract duplicated task preparation code (Issue #3)
Add output verification to smoke tests (Missing Test Coverage, item 3)
Add defensive error handling in test helpers (Issue #4)

🟢 Nice to Have

Add CHANGELOG/migration notes for breaking changes
Document thread-safety guarantees in Agent docstring
Consider caching resolved print flag for performance

📊 Overall Assessment

Category	Rating	Notes
Code Quality	7/10	Well-structured but has duplication
Test Coverage	8/10	Excellent memory tests, missing hierarchy tests
Security	10/10	No issues identified
Thread-Safety	3/10	Critical state mutation issue
Documentation	6/10	Good inline comments, missing migration notes
Overall	6.5/10	Good concept, needs fixes before merge

🚦 Verdict: ⚠️ REQUEST CHANGES

Blocking Issues:

State mutation race condition (Issue #1) - MUST FIX
Missing test coverage for print hierarchy - MUST ADD
Inconsistent default behavior (Issue #2) - SHOULD FIX

Recommended Action: Fix the critical thread-safety issue, add tests for the new behavior, then re-review. The architecture is sound, but implementation needs refinement for production readiness.

References

CLAUDE.md guidelines followed ✅
Testing structure adheres to tests/ organization ✅
uv/pytest conventions maintained ✅

Great work on improving the print flag architecture! The hierarchical approach is the right direction. Looking forward to the fixes. 🚀


claude · 2026-01-31T12:18:39Z

Pull Request Review: Refactor Agent Printing Hierarchy

Summary

This PR refactors the agent printing hierarchy to establish a clear priority order (ENV > Agent param > Method default) and fixes memory state mutation issues in streaming mode. The changes span 6 files with 933 additions and 111 deletions.

✅ Strengths

1. Clear Hierarchy and Design

The new three-tier priority system is well-documented and intuitive:

ENV variable (highest priority)
Agent constructor print parameter
Method default (True for print_do(), False for do())

The _resolve_print_flag() method cleanly implements this hierarchy.

2. Thread Safety Improvements

Storing print_flag on AgentRunOutput (line 321 in output.py) instead of mutating self.print prevents race conditions in concurrent scenarios
Local resolution of the print flag per-run is a solid architectural choice
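The following toy shows why per-run storage is safe. Each call builds its own output object, so one shared agent can serve concurrent calls that use different flags. RunOutput and ToyAgent are hypothetical stand-ins, because the review does not show AgentRunOutput's fields. The appended string stands in for the real print call.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RunOutput:
    """Hypothetical per-run state; AgentRunOutput's real fields are not shown in the review."""
    task: str
    print_flag: bool
    printed: List[str] = field(default_factory=list)


class ToyAgent:
    def __init__(self, print: Optional[bool] = None) -> None:
        self._print_param = print  # read-only after construction

    def _resolve_print_flag(self, method_default: bool) -> bool:
        return self._print_param if self._print_param is not None else method_default

    def run(self, task: str, method_default: bool) -> RunOutput:
        # The flag lives on the per-run object, so concurrent runs never share it.
        output = RunOutput(task=task, print_flag=self._resolve_print_flag(method_default))
        if output.print_flag:
            output.printed.append(f"result for {task}")  # stands in for success_log()
        return output


agent = ToyAgent()  # one instance shared by every worker thread
with ThreadPoolExecutor(max_workers=8) as pool:
    outputs = list(pool.map(lambda i: agent.run(f"task-{i}", method_default=i % 2 == 0), range(100)))
```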

3. Comprehensive Test Coverage

New test_agent_print_behavior.py covers all priority combinations
Extensive memory tests (717 additions!) validate both direct and streaming flows
Tests verify messages vs chat_history semantics correctly

4. Memory Bug Fix

The fix in steps.py:228-236 ensures chat_history is set to historical messages BEFORE adding new messages. This prevents an ordering bug in which new_messages() would incorrectly capture historical messages.

5. Streaming Response Tracking

Adding context.chat_history.append(final_response) at line 277 in steps.py ensures streaming responses are properly tracked in session memory.
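The ordering in points 4 and 5 can be modeled with plain lists. The sketch below is a toy, not the code in steps.py. RunContext, load_history() and run_streaming() are hypothetical, and messages are plain strings. The sketch loads history first, records where the current run starts, and appends the assembled streaming response. As a result, new_messages() returns exactly this run's request/response pair.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunContext:
    """Hypothetical stand-in for the pipeline context; messages are plain strings here."""
    chat_history: List[str] = field(default_factory=list)
    _new_start: int = 0

    def load_history(self, history: List[str]) -> None:
        # Set historical messages FIRST and remember where this run's messages begin.
        self.chat_history = list(history)
        self._new_start = len(self.chat_history)

    def new_messages(self) -> List[str]:
        return self.chat_history[self._new_start:]


def run_streaming(ctx: RunContext, history: List[str], prompt: str, chunks: List[str]) -> List[str]:
    ctx.load_history(history)
    ctx.chat_history.append(f"request: {prompt}")
    final_response = "response: " + "".join(chunks)  # assembled once the stream ends
    ctx.chat_history.append(final_response)  # without this, the streamed reply never reaches memory
    return ctx.new_messages()
```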

⚠️ Issues & Concerns

1. Missing Test File Newline (Minor)

test_agent_print_behavior.py is missing a newline at the end of the file (line 92). This will trip end-of-file checks such as pre-commit's end-of-file-fixer hook, if the repository uses one.

Fix:

echo "" >> tests/smoke_tests/agent/test_agent_print_behavior.py

2. Test Assertions Are Weak (Medium)

The tests in test_agent_print_behavior.py don't actually verify that output was printed or not printed - they just call the methods. These are smoke tests that would pass even if the printing logic is broken.

Recommendation:
Consider capturing stdout/stderr or checking output visibility:

import io
from contextlib import redirect_stdout

def test_print_do_default_prints():
    captured_output = io.StringIO()
    # redirect_stdout restores the previous sys.stdout even if print_do() raises,
    # unlike resetting to sys.__stdout__, which bypasses pytest's own capture
    with redirect_stdout(captured_output):
        my_agent = Agent()
        task = Task("test task")
        my_agent.print_do(task)
    output = captured_output.getvalue()
    assert len(output) > 0, "Expected output to be printed"

3. Potential Breaking Change (High - Behavior)

The new hierarchy reverses the previous priority. Before, the Agent constructor print parameter overrode the ENV variable. Now ENV overrides everything.

Previous behavior:

os.environ["UPSONIC_AGENT_PRINT"] = "false"
agent = Agent(print=True)  # Would print (param overrides ENV)

New behavior:

os.environ["UPSONIC_AGENT_PRINT"] = "false"
agent = Agent(print=True)  # Won't print (ENV overrides param)

This could break existing user code that relies on the parameter to override environment settings. Consider:

Documenting this as a breaking change in release notes
Adding a deprecation warning in the transition period
Or reconsidering if ENV should truly override explicit constructor parameters
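If ENV-over-param is intentional, a transition-period warning is one way to surface the change. The helper below is a hypothetical sketch with simplified ENV parsing, and it warns only when both levels are set and disagree. It uses FutureWarning rather than DeprecationWarning. Python's default filters show FutureWarning to end users, while DeprecationWarning is hidden outside __main__ and test runners.

```python
import os
import warnings
from typing import Optional


def resolve_with_transition_warning(param: Optional[bool], method_default: bool) -> bool:
    """ENV > param > method default, warning when ENV overrides a conflicting param.

    Hypothetical helper for a transition period; the env parsing is simplified.
    """
    raw = os.environ.get("UPSONIC_AGENT_PRINT")
    env = None if raw is None else raw.strip().lower() in {"1", "true", "yes", "on"}
    if env is not None:
        if param is not None and param != env:
            warnings.warn(
                f"UPSONIC_AGENT_PRINT={raw!r} overrides Agent(print={param}); "
                "the environment variable now takes precedence.",
                FutureWarning,
                stacklevel=2,
            )
        return env
    return param if param is not None else method_default
```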

4. Memory Test Complexity (Medium)

The memory tests are comprehensive but very complex (717 additions). Consider:

Breaking into smaller test files by feature area (streaming, direct, mixed)
Some helper functions like count_message_types() could be moved to a test utilities module
The StreamingRunResult class could be in a shared test fixtures file

5. Missing Type Hints (Low)

Line 122 in logging_config.py:

def get_env_bool_optional(key: str) -> "Optional[bool]":

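The quotes around Optional[bool] make it a string annotation, so the function still defines fine at runtime even if Optional is never imported. Type checkers and typing.get_type_hints(), however, need Optional to be importable from typing in that module. Verify the import is present.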

6. Code Duplication in `print_do` Methods (Medium)

Lines 126-164 in agent.py show significant duplication between print_do() and print_do_async(). The sync version manually handles the event loop and calls do_async(). Consider if this could be simplified or if comments explaining the necessity would help.
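One common way to keep sync and async variants from drifting apart is to make the sync method a thin shell over the async one. The sketch below uses a hypothetical run_sync() helper and ToyAgent. Instead of managing the loop manually, it fails loudly when called from inside a running event loop, where asyncio.run() cannot be used. This is a design choice for illustration, not how the PR's print_do() handles the loop.

```python
import asyncio
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine from synchronous code (hypothetical helper).

    asyncio.run() cannot be used while an event loop is already running in
    this thread (e.g. inside Jupyter or an async web handler), so raise a
    clear error in that case.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)  # no loop running: safe to start one
    coro.close()  # avoid a "coroutine was never awaited" warning
    raise RuntimeError("print_do() called inside a running event loop; use print_do_async() instead")


class ToyAgent:
    async def do_async(self, task: str, _print_method_default: bool = False) -> str:
        await asyncio.sleep(0)  # stands in for the real pipeline
        return f"done: {task} (print={_print_method_default})"

    async def print_do_async(self, task: str) -> str:
        return await self.do_async(task, _print_method_default=True)

    def print_do(self, task: str) -> str:
        # The sync API is a thin shell: all logic lives in the async path.
        return run_sync(self.print_do_async(task))
```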

7. State Mutation Warning Needed (Low)

The code correctly avoids mutating self.print during resolution, but a comment explaining why would help future maintainers:

# Store locally - don't mutate self.print to ensure thread-safety
resolved_print_flag = self._resolve_print_flag(_print_method_default)

🔒 Security Considerations

✅ No security concerns identified. The changes don't introduce:

Command injection vulnerabilities
SQL injection risks
XSS vulnerabilities
Sensitive data exposure

⚡ Performance Considerations

✅ Positive Impact:

Resolving the flag locally per run needs no locks, so there is no lock contention
No new synchronous blocking operations introduced

⚠️ Minor Concern:

_resolve_print_flag() is called per-run (lines 90, 109, 204) - this is acceptable overhead but could be optimized if profiling shows it's a hotspot

📊 Test Coverage

New Tests Added:

7 print behavior tests in test_agent_print_behavior.py
15+ new memory tests covering direct/streaming flows

Missing Tests:

No tests for concurrent access to print_flag
No tests verifying actual printed output
Edge case: What if ENV var is set to invalid value like "maybe"? (Currently returns False, which is correct)

Test Quality: ⭐⭐⭐⭐ (4/5)

Comprehensive coverage of memory scenarios
Print tests could be stronger with actual output verification

🎯 Recommendations

Must Fix Before Merge:

✅ Add newline at end of test_agent_print_behavior.py
📝 Document the breaking change in ENV/param priority
🧪 Consider strengthening print tests to verify actual output

Should Consider:

📚 Add comments explaining the thread-safety rationale
🔧 Verify Optional import in logging_config.py
📖 Update user documentation about the print hierarchy

Nice to Have:

♻️ Refactor test helpers into shared utilities
🧪 Add concurrent access tests
📊 Profile _resolve_print_flag() performance in high-throughput scenarios

Overall Assessment

Code Quality: ⭐⭐⭐⭐ (4/5)

Clean implementation
Good separation of concerns
Thread-safe design

Risk Level: 🟡 MEDIUM

Breaking change in priority hierarchy needs communication
Test assertions could be stronger

Recommendation: ✅ APPROVE with minor changes

The refactoring is solid and fixes real concurrency issues. The main concern is the breaking change in priority order - ensure this is intentional and well-documented. Once the newline is added and the breaking change is acknowledged, this is ready to merge.

Great work on the comprehensive test coverage and the thread-safety improvements! 🚀

refactor: refactor agent printing hierarchy

0919fe4

fix: fix memory error for streaming

edbc6f2

fix: fix state mutuation race condition and inconsistent default behavior

70e9740

DoganK01 merged commit fd729d1 into master Jan 31, 2026
5 checks passed

DoganK01 deleted the printing-fix-v2 branch January 31, 2026 13:55

DoganK01 merged 3 commits into master from printing-fix-v2
