Skip to content

feat: state graph, recursive tool handling for streaming logic#455

Merged
onuratakan merged 5 commits intomasterfrom
graphv2
Oct 31, 2025
Merged

feat: state graph, recursive tool handling for streaming logic#455
onuratakan merged 5 commits intomasterfrom
graphv2

Conversation

@DoganK01
Copy link
Copy Markdown
Member

  • Runnable state graph
  • Recursive tool handling for streaming logic in Agent class

@claude
Copy link
Copy Markdown

claude bot commented Oct 23, 2025

PR Review: State Graph & Recursive Tool Handling

This PR introduces a substantial new feature - a LangGraph-style stateful graph system (~3,300 additions) along with improvements to streaming tool call handling.


✅ Strengths

1. Excellent Architecture & Design

  • Well-structured module organization with clear separation of concerns
  • Strong abstraction layers (BaseCheckpointer, BaseCache, BaseStore)
  • Comprehensive documentation with excellent docstrings and examples
  • Smart integration with existing LCEL Runnable interface

2. Robust Feature Set

  • Checkpointing system with SQLite and in-memory implementations
  • Node-level caching with TTL support
  • Cross-thread memory via Store system
  • Retry policies with exponential backoff and jitter
  • Powerful control flow primitives (Command, Send, interrupt)

3. Streaming Logic Improvements (steps.py:963-1093)

  • Recursive tool handling mirrors non-streaming behavior correctly
  • Proper tool_call_limit enforcement with notifications
  • Correct _stop_execution flag processing
  • Reduced code duplication

⚠️ Critical Issues

1. No Test Coverage 🔴

Impact: High | Priority: Critical

The PR adds 3,311 lines with zero tests. Need tests for:

  • StateGraph compilation, execution, edge routing
  • Checkpoint save/restore
  • Interrupt/resume workflows
  • Cache hit/miss scenarios
  • Retry policies and error handling
  • New streaming logic

2. Potential Bugs

a) Shallow Copy in _merge_state (state_graph.py:525-535)

  • Using result.copy() could lose nested dict/list data
  • Should use deepcopy() for safety

b) Missing Reducer Validation (_extract_reducers, state_graph.py:145-158)

  • No validation that metadata is callable
  • Could fail silently at runtime

c) Recursion Limit Logic (state_graph.py:625-630)

  • Doesn't account for parallel Send operations
  • Could incorrectly limit legitimate workflows

3. Security Concerns 🔒

a) Pickle Usage (cache.py:49, 245)

  • Pickle can execute arbitrary code on untrusted data
  • Cache SQLite files could be attack vector
  • Consider JSON serialization + integrity checks (HMAC)

b) Namespace Collisions (checkpoint.py:253-256)

  • Using pipe delimiter without escaping
  • Could cause collisions if namespace contains pipes

4. Performance Issues

a) Missing Compound Index (cache.py:208-217)

  • Index on timestamp but queries use namespace + key
  • Should add compound index: (namespace, key, timestamp)

b) Unbounded Checkpoint History (checkpoint.py:127-143)

  • No automatic cleanup mechanism
  • Will cause storage growth issues

c) Expensive Deepcopy (state_graph.py:650, 734)

  • Deepcopy on every step for large states
  • Consider copy-on-write or immutable patterns

📋 Recommendations

Must-Have Before Merge:

  1. ✅ Add comprehensive test coverage (minimum 80%)
  2. ✅ Address pickle security concerns
  3. ✅ Add 2-3 working examples
  4. ✅ Fix shallow copy bug in _merge_state

Should-Have:
5. Add integration tests with Agent/Task
6. Add performance benchmarks
7. Document migration path
8. Add checkpoint cleanup mechanism


🎯 Overall Assessment

This is ambitious and well-designed, with solid architecture and good code quality. However, lack of tests is a critical blocker and security/performance concerns need addressing.

Recommendation: 🟡 Request Changes - Add tests and fix critical issues before merging.

Great work! The graph system looks very promising. Looking forward to seeing this with proper test coverage! 🚀

@claude
Copy link
Copy Markdown

claude bot commented Oct 24, 2025

Pull Request Review: StateGraph and Recursive Tool Handling

This is a significant PR introducing a new Direct class for simplified LLM interactions and a comprehensive StateGraph system. Overall high-quality work with good architecture, but several critical issues need attention.

Summary

  • Additions: ~4,028 lines | Deletions: ~110 lines
  • New Direct class for high-speed LLM interactions
  • Complete StateGraph v2 with checkpointing, caching, interrupts
  • Refactored streaming with recursive tool handling

Strengths

  1. Excellent Architecture - Clean APIs, good separation of concerns
  2. Comprehensive Features - Checkpointing, interrupts, caching, retry policies, parallel execution
  3. Good Documentation - Clear docstrings and examples
  4. Proper Recursive Tool Handling - Handles limits and stop flags correctly

CRITICAL Issues

1. Missing Test Coverage

NO tests added for Direct class, StateGraph, or recursive streaming. Cannot verify correctness or prevent regressions.
MUST ADD: tests/unit_tests/test_direct.py, test_state_graph.py, test_streaming_recursive_tools.py

2. Breaking Change Without Migration

Removed Direct = Agent alias without deprecation warnings. Users will break.
MUST: Keep legacy alias with deprecation OR rename new class (e.g. DirectCall)

3. File Read Security Issue (src/upsonic/direct.py:170-182)

  • No path validation (could read arbitrary files)
  • No file size limits (memory exhaustion)
  • Silent failures
    MUST: Add path validation, 50MB size limit, proper error handling

4. Recursion Limit Missing (steps.py:966-1089)

_stream_with_tool_calls recursively calls itself without depth tracking. Risk of infinite loops.
MUST: Add recursion_depth parameter with max limit (e.g. 100)

High Priority Issues

  1. Hardcoded openai/gpt-4o default (direct.py:148) - forces OpenAI dependency
  2. Provider parameter stored but never used
  3. Mutates shared model._settings (could affect other code)
  4. Deep copy on every checkpoint - expensive for large states
  5. No checkpoint size limits
  6. Async checkpoint writes not tracked (race condition on crash)
  7. Poor error context in parallel execution

Minor Issues

  • Code duplication in streaming logic
  • Inconsistent naming conventions (do/do_async vs ainvoke)
  • Missing node validation when adding to graph
  • Overly broad type hints (Any instead of specific types)
  • References to missing print utility functions

Security Concerns

  1. File system access without validation (CRITICAL)
  2. No limits on state/message size (resource exhaustion risk)
  3. Potential checkpoint injection if using pickle

Performance Notes

Good: Async-first design, parallel execution, lazy imports
Concerns: Deep copy overhead, sync file I/O, no concurrency limits on parallel tasks

Missing Documentation

  • No migration guide for Direct alias change
  • No StateGraph examples with actual agents
  • No guidance on when to use Direct vs Agent
  • No performance guidelines for durability modes
  • No changelog entry

Recommendations Before Merge

MUST HAVE:

  1. Add comprehensive tests for all new features
  2. Fix file read security vulnerabilities
  3. Add recursion limits to streaming
  4. Handle breaking change properly (deprecation or rename)
  5. Add or remove missing print functions

SHOULD HAVE:
6. Fix hardcoded model default
7. Implement or remove provider parameter
8. Add checkpoint size limits and warnings
9. Improve error messages in parallel execution
10. Add migration guide to PR description

Final Verdict

Excellent architecture and impressive features, but BLOCKING ISSUES:

  • No tests (critical gap)
  • Breaking changes without migration path
  • Security vulnerabilities in file handling
  • Missing recursion limits

Recommendation: Request changes to address critical issues, then ready to merge.

The StateGraph implementation is particularly impressive and shows deep understanding of workflow patterns. The recursive tool handling properly fixes streaming issues. Great work overall - looking forward to seeing this land after addressing the concerns above!

@claude
Copy link
Copy Markdown

claude bot commented Oct 27, 2025

Pull Request Review: State Graph, Durable Execution & OCR Engine

I've completed a comprehensive review of PR #455. This is a substantial feature addition with ~8,700 lines of new code across 3 major feature areas.

Overview

Scope: 3 major features + infrastructure improvements

  • StateGraph (graphv2) - ~3,000 LOC
  • Durable Execution - ~2,500 LOC
  • OCR Engine - ~3,000 LOC
  • Direct class - ~400 LOC

Files Changed: 44 files (42 additions, 2 modifications)

Strengths

1. Well-Structured Architecture

  • Clean separation of concerns across all new modules
  • Good use of abstract base classes (OCRProvider, BaseCheckpointer, BaseStore)
  • Follows existing Upsonic patterns and conventions

2. Comprehensive Documentation

  • Excellent docstrings with examples in all major classes
  • Clear usage patterns demonstrated in code comments
  • Good inline documentation explaining complex logic

3. Durable Execution Design

The durable execution system is particularly well-designed:

  • JSON-based serialization (avoiding pickle security issues)
  • Multiple storage backends (Memory, File, SQLite, Redis)
  • Proper checkpoint management at each pipeline step
  • Good error recovery mechanisms

4. StateGraph Implementation

  • Clean graph-based workflow API similar to LangGraph
  • Good separation: builder pattern → compiled graph
  • Support for conditional edges and interrupts

Critical Issues

1. Missing Test Coverage (HIGH PRIORITY)

This is the most significant concern:

  • Zero tests for StateGraph/graphv2 (1,550 LOC in state_graph.py alone)
  • Zero tests for Durable Execution (2,500+ LOC)
  • Zero tests for Direct class (395 LOC)
  • Zero tests for OCR engine (3,000+ LOC)

Recommendation: Add minimum test coverage for all new features before merge.

2. Type Safety Issue (CRITICAL BUG)

src/upsonic/agent/pipeline/manager.py:166

for step_index, step in enumerate[Step](self.steps):
  • Incorrect syntax: enumerate[Step] is not valid Python
  • Should be: for step_index, step in enumerate(self.steps):

3. Security Concerns

File Path Handling (src/upsonic/direct.py:170-183)

with open(attachment_path, "rb") as attachment_file:
    attachment_data = attachment_file.read()
  • No path validation or size limits
  • Potential path traversal vulnerability
  • Could read arbitrary files or exhaust memory

Recommendation: Add path validation and size limits (e.g., max 100MB)

4. Error Handling Gaps

src/upsonic/durable/serializer.py:36-42
The comment admits class reconstruction isn't possible for Pydantic models, which could fail silently on resume if class is unavailable. Needs proper error handling.

src/upsonic/agent/pipeline/manager.py:268-295
Saving checkpoint in exception handler could fail if storage is unavailable, masking the original error. Needs try/except wrapper.

Performance Considerations

1. Checkpoint Frequency

  • Currently saves after EVERY pipeline step (15-20 checkpoints per execution)
  • Could impact performance with remote storage (Redis, PostgreSQL)
  • Recommendation: Add configurable checkpoint frequency

2. OCR Memory Usage

  • Multiple OCR providers load large models (PaddleOCR, EasyOCR, DeepSeek)
  • No model caching or lazy loading strategy
  • Could exhaust memory with concurrent requests
  • Recommendation: Implement model pooling/caching

3. StateGraph State Copying

  • Deep copies state for every node execution
  • Could be expensive for large state objects
  • Recommendation: Use shallow copy where possible or copy-on-write

Code Quality Issues

1. Code Duplication

Extensive sync/async method duplication throughout. Consider using decorators to reduce boilerplate.

2. Magic Numbers

Hardcoded default model "openai/gpt-4o" in Direct class should be a configurable constant.

3. Inconsistent Naming

  • durable_execution_id vs execution_id
  • graphv2 vs graph_v2
  • OCR (all caps) vs Direct (PascalCase)

Checklist Before Merge

  • Add comprehensive test coverage (minimum 70% for new code)
  • Fix enumerate[Step] syntax error
  • Add input validation for file paths and sizes
  • Implement state version checking for serialization
  • Add resource cleanup for OCR providers
  • Document breaking changes (Direct is no longer an alias for Agent)
  • Run mypy type checking on new code
  • Performance testing for checkpoint overhead

Priority Recommendations

Must Fix Before Merge:

  1. Add basic test coverage for all new features
  2. Fix enumerate[Step] type annotation bug
  3. Add file path validation in Direct class
  4. Add recursion limits for tool calls

Should Fix Before Merge:

  1. Add state version checking for durable execution
  2. Implement proper resource cleanup for OCR
  3. Add error handling for checkpoint save failures
  4. Document the breaking change (Direct is no longer an alias for Agent)

Nice to Have:

  1. Reduce sync/async code duplication
  2. Add performance metrics for checkpointing
  3. Implement model caching for OCR

Final Thoughts

This is ambitious, high-quality work that adds significant value to Upsonic. The architecture is well-designed and the code is generally clean. However, the lack of tests is a critical blocker for a production-ready AI framework.

Estimated Risk: HIGH without tests, MEDIUM with basic tests

Recommendation:

  • Do not merge until basic test coverage is added
  • Consider splitting this into 3 smaller PRs for easier review
  • Add integration tests demonstrating the full workflow

Great work overall! Looking forward to seeing this land with proper test coverage.


Reviewed by Claude Code following Upsonic CLAUDE.md guidelines

@claude
Copy link
Copy Markdown

claude bot commented Oct 27, 2025

Comprehensive PR Review: State Graph & Recursive Tool Handling

Executive Summary

This PR introduces three major features to Upsonic with 15,169 additions across 45 files:

  1. StateGraph (graphv2) - A LangGraph-inspired stateful workflow engine
  2. Durable Execution - Checkpoint-based fault recovery for agent tasks
  3. OCR Module - Multi-provider OCR with 5+ backends
  4. Recursive Tool Handling - Improved streaming logic for tool calls in Agent class

Code Quality Assessment

Strengths

1. StateGraph Implementation (src/upsonic/graphv2/state_graph.py)

  • Excellent architecture: Clean separation of builder (StateGraph) and runtime (CompiledStateGraph)
  • Comprehensive features: Checkpointing, interrupts, conditional edges, Send-based dynamic parallelization
  • Well-documented: Clear docstrings and examples
  • Type safety: Proper use of TypedDict for state schemas
  • Performance optimizations: _build_routing_maps() for O(1) edge lookups
  • Durability modes: Three checkpoint strategies (sync, async, exit)

2. Durable Execution System (src/upsonic/durable/)

  • Security-conscious: Uses JSON serialization instead of pickle to avoid untrusted data vulnerabilities
  • Good abstraction: DurableExecutionStorage interface supports multiple backends (Memory, File, SQLite, Redis)
  • Recovery logic: Proper checkpoint saving at pipeline steps with retry on failure
  • Clean integration: Seamlessly integrated into pipeline manager

3. Recursive Tool Handling (src/upsonic/agent/agent.py & steps.py)

  • Critical fix: Addresses tool call handling in streaming mode (lines 963-1085 in steps.py)
  • Proper recursion: New _stream_with_tool_calls() method correctly handles tool call loops
  • Maintains state: Tracks accumulated text and timing across recursive calls

4. Direct Class (src/upsonic/direct.py)

  • Simplified API: Provides lightweight alternative to full Agent for speed-critical tasks
  • Good separation: No memory/tools/knowledge base overhead

Issues & Concerns

Critical Issues

1. Missing Test Coverage

  • ZERO tests for new StateGraph functionality
  • ZERO tests for durable execution
  • For a PR adding 15k+ lines, this is a major red flag
  • Required: tests/unit_tests/graphv2/ and tests/unit_tests/durable/ directories with comprehensive coverage

2. Error Handling in Checkpoint Saving (manager.py:265-293)

  • Issue: If checkpoint saving fails during error handling, the original exception is lost
  • Recommendation: Wrap checkpoint save in try/except to preserve original error

3. Recursion Limit Handling (state_graph.py:576-579)

  • Issue: No checkpoint saved before raising recursion error
  • Impact: Cannot resume from recursion errors

Major Issues

4. Type Annotation Bug (manager.py:166)

  • Invalid syntax: enumerate[Step] is not valid Python
  • Fix: Use enumerate(self.steps)

5. Async/Sync Confusion (execution.py:146-161)

  • Issue: Using create_task without awaiting means checkpoints may not complete
  • Impact: Race conditions where execution continues before state is saved

6. Memory Leak Potential (state_graph.py:1158-1161)

  • Issue: _pending_checkpoints list grows unbounded in exit durability mode
  • Impact: Long-running graphs could accumulate checkpoints in memory

7. Security: Deserialization Risks

  • Good: Uses JSON instead of pickle
  • Concern: No validation of deserialized data structure
  • Recommendation: Add schema validation before reconstructing objects

Performance Considerations

Positive:

  • Async checkpointing reduces blocking
  • Routing maps provide O(1) lookups
  • Parallel node execution properly utilized

Concerns:

  • Checkpoint overhead on every pipeline step
  • JSON serialization cost for complex objects
  • No mention of batch writes or compression

Security Assessment

Good:

  • JSON over pickle
  • No eval/exec usage
  • Input validation in StateGraph

Concerns:

  • OCR module reads arbitrary file paths (path traversal risk)
  • No verification that resumed execution matches original context
  • Redis storage auth patterns not shown

Test Coverage Gap

Current: 41 test files exist, but 0 tests for new features (11k+ LOC)

Required minimum:

  • StateGraph: 15+ test cases
  • Durable execution: 10+ test cases
  • OCR: 20+ test cases (5 per provider)
  • Recursive tool handling: 5+ test cases

Estimated missing: 50+ test cases

Recommendations

Before Merging (BLOCKERS)

  1. Add comprehensive test suite - non-negotiable for 15k LOC
  2. Fix type annotation bug - enumerate[Step] syntax error
  3. Fix async/sync checkpoint save - prevent race conditions
  4. Add recursion error checkpointing - enable recovery

High Priority

  1. Error handling in checkpoint save during exceptions
  2. Bounds checking for _pending_checkpoints
  3. Schema validation in deserializer
  4. Security tests for path traversal in OCR
  5. Document migration from Direct = Agent removal

Nice to Have

  1. Performance benchmarks
  2. Compression for checkpoints
  3. Example projects for StateGraph
  4. Type stubs for IDE support

Conclusion

This is an ambitious and well-architected PR with high code quality and thoughtful design.

However, the complete absence of tests for 15k+ lines is a critical blocker.

Recommendation:

  • Do NOT merge until test coverage added
  • Architecture is sound once validated
  • Fix type errors and async issues

Estimated work remaining: 2-3 days for tests + fixes

Great work! StateGraph design is excellent. Once tests are added, this will be a major feature. 🚀

@onuratakan onuratakan self-requested a review October 29, 2025 12:28
Copy link
Copy Markdown
Member

@onuratakan onuratakan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Loading info message during OCR model loading section (For example, easyocr)
  • Direct info message when language support is not available
  • Why aren't things like Pillow and pdf2image included in OCR
  • Info message for: pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH

@claude
Copy link
Copy Markdown

claude bot commented Oct 29, 2025

Pull Request Review: State Graph & Durable Execution

Overview

This is a significant PR introducing two major features:

  1. State Graph: Runnable state graph implementation for workflow orchestration
  2. Durable Execution: Checkpoint-based recovery system for long-running agent tasks

Changes: 48 files changed, 15,553 additions, 4,177 deletions


✅ Strengths

Architecture & Design

  • Well-structured abstraction: The durable execution system follows a clean separation of concerns with DurableExecution, DurableExecutionStorage, and DurableStateSerializer
  • Multiple storage backends: Provides flexibility with In-Memory, File, SQLite, and Redis storage options
  • Pipeline integration: Checkpointing is elegantly integrated into the existing pipeline manager without major disruptions
  • Graph workflow support: State graph implementation with decision nodes (LLM-based and function-based) enables sophisticated workflows

Code Quality

  • Type hints: Good use of type annotations throughout the new code
  • Documentation: Classes and methods include comprehensive docstrings with examples
  • Error handling: Appropriate try-catch blocks in critical sections
  • Lazy loading: Smart use of lazy imports in __init__.py to avoid circular dependencies

⚠️ Critical Issues

1. MISSING TEST COVERAGE 🔴

Severity: High

The PR adds ~15k lines of code but NO tests were found for:

  • State graph execution (src/upsonic/graph/)
  • Durable execution checkpointing (src/upsonic/durable/)
  • Recovery/resumption logic
  • Storage backend implementations

Recommendation: Add comprehensive tests before merging:

# Required test coverage:
tests/test_graph_basic.py           # Basic graph execution
tests/test_graph_decisions.py       # Decision nodes
tests/test_durable_execution.py     # Checkpoint save/load
tests/test_durable_recovery.py      # Failure recovery
tests/test_durable_storages.py      # All storage backends

2. Type Safety Issues 🟡

File: src/upsonic/agent/pipeline/manager.py:166

for step_index, step in enumerate[Step](self.steps):

This syntax is incorrect. enumerate doesn't accept type parameters in Python. Should be:

for step_index, step in enumerate(self.steps):

3. Incomplete Error Handling in Durable Recovery 🟡

File: src/upsonic/agent/agent.py:1751-2000

The continue_durable_async method doesn't handle edge cases:

  • What if the checkpoint is corrupted?
  • What if the storage backend is unavailable during recovery?
  • What if the task definition has changed since checkpoint was saved?

Recommendation: Add validation and error recovery:

# Validate checkpoint compatibility
if checkpoint.get('version') != CURRENT_VERSION:
    raise IncompatibleCheckpointError(...)

# Add timeout for storage operations
try:
    checkpoint = await asyncio.wait_for(
        durable.load_checkpoint_async(),
        timeout=30.0
    )
except asyncio.TimeoutError:
    raise DurableRecoveryError("Storage timeout")

4. Streaming Tool Call Recursion Concerns 🟡

File: src/upsonic/agent/pipeline/steps.py:966

The new _stream_with_tool_calls method implements recursive tool calling for streaming. Potential issues:

  • Stack overflow risk: Deep recursion with many tool calls could exhaust stack
  • No recursion depth limit: Unlike non-streaming which uses _tool_call_count, streaming doesn't track depth clearly
  • Memory accumulation: Each recursive call creates a new stream context

Recommendation:

async def _stream_with_tool_calls(self, context, model_params, accumulated_text, first_token_time, depth=0):
    MAX_RECURSION_DEPTH = 50  # Match agent's tool_limit
    if depth >= MAX_RECURSION_DEPTH:
        # Yield final event and stop
        yield FinalResultEvent(...)
        return
    # ... rest of implementation
    # When recursing:
    async for event in self._stream_with_tool_calls(..., depth=depth+1):

5. OCR Dependency Management 🟡

File: pyproject.toml:123-132

Added OCR dependencies with both easyocr and paddleocr, but:

  • No OCR tests added
  • OCR integration not documented in the PR description
  • ocr-gpu extra is empty (line 131)
  • Heavy dependencies (~1GB combined) without clear justification in this PR

Questions:

  • Is OCR functionality complete or in-progress?
  • Should this be a separate PR?
  • What's the plan for ocr-gpu extra?

🐛 Potential Bugs

6. Race Condition in Checkpoint Saving 🟡

File: src/upsonic/durable/execution.py:150

if loop.is_running():
    asyncio.create_task(self.save_checkpoint_async(...))

Creating a task without awaiting means:

  • No error handling if checkpoint fails
  • No guarantee checkpoint completes before process exits
  • Potential lost checkpoints

Recommendation: Either await the task or use a task group with proper error handling.

7. Inconsistent Checkpoint Status Logic 🟡

File: src/upsonic/agent/pipeline/manager.py:206-224

The logic for determining resume point is complex:

if checkpoint_status == "failed":
    resume_from_index = step_index  # Retry failed step
else:
    resume_from_index = step_index + 1  # Next step

But there's ambiguity about whether step_index refers to:

  • The last successful step
  • The failed step
  • The last attempted step

The comments in lines 268-287 suggest this was recently fixed, but the logic should be validated with tests.

8. Missing Cleanup on Graph Execution Errors 🟡

File: src/upsonic/graph/graph.py (inferred from diff)

Graph execution may leave orphaned resources if exceptions occur mid-execution:

  • Unclosed database connections
  • Unreleased locks
  • Incomplete durable states

Recommendation: Use context managers and ensure cleanup in finally blocks.


🔒 Security Concerns

9. Deserialization of Untrusted Data 🟡

The durable execution system serializes and deserializes complex objects including:

  • Task objects with arbitrary context
  • Function references in decision nodes
  • Agent state

If checkpoints are loaded from untrusted sources (shared storage, user uploads), this could enable:

  • Code injection via pickle
  • Resource exhaustion via crafted objects

Recommendation:

  • Use safe serialization (JSON when possible, not pickle)
  • Validate checkpoint signatures
  • Add a security note in documentation about checkpoint storage security

⚡ Performance Considerations

10. Checkpoint Overhead 🟡

Checkpoints are saved after every pipeline step (17+ steps per agent execution):

if result.status == StepStatus.SUCCESS:
    await self._save_durable_checkpoint(...)

For fast operations, checkpoint I/O could exceed execution time. Consider:

  • Configurable checkpoint frequency
  • Batch checkpoints for fast tasks
  • Skip checkpoints for read-only steps

11. Graph Execution Memory Usage 📊

State graph execution maintains:

  • Full execution history for each node
  • All intermediate outputs
  • Parallel execution contexts

For large workflows, this could consume significant memory.

Recommendation: Add configuration for:

  • Maximum history retention
  • Output size limits
  • Memory budgets per graph execution

📋 Additional Recommendations

Code Quality

  1. Add logging: Durable execution needs more observability (execution IDs, checkpoint locations, recovery stats)
  2. Metrics: Track checkpoint size, save/load times, recovery success rate
  3. Configuration: Make checkpoint frequency and storage options configurable per task
  4. Validation: Add schema validation for checkpoint compatibility across versions

Documentation

  1. Add migration guide for updating to this version
  2. Document durable execution architecture in README
  3. Provide examples for common graph patterns
  4. Add troubleshooting guide for recovery failures

Testing Strategy

Priority 1 (Must have before merge):

  • Basic graph execution tests
  • Durable checkpoint save/load
  • Recovery from failure
  • Storage backend tests

Priority 2 (Nice to have):

  • Integration tests with actual LLMs
  • Performance benchmarks
  • Stress tests for graph complexity
  • Concurrent execution tests

📊 Summary

Category Status Count
Critical Issues 🔴 1
High Priority 🟡 8
Medium Priority 📊 2
Total Concerns ⚠️ 11

Verdict: ⏸️ NEEDS WORK

This PR introduces valuable features but should not be merged without:

  1. Comprehensive test coverage (blocking) 🔴
  2. Fix type safety issue in enumerate (blocking) 🔴
  3. Address streaming recursion limits (high priority) 🟡
  4. Improve error handling in recovery (high priority) 🟡
  5. Clarify OCR integration scope (medium priority) 🟡

Recommended Action Plan

  1. Phase 1: Add core tests (graph, durable execution, recovery)
  2. Phase 2: Fix identified bugs and type issues
  3. Phase 3: Address performance and security concerns
  4. Phase 4: Improve documentation and examples
  5. Phase 5: Consider splitting OCR into separate PR

Great work on this ambitious feature! The architecture is solid, but the implementation needs polish before production use. Happy to discuss any of these points further.

🤖 Generated with Claude Code

@onuratakan onuratakan merged commit 7ccd9e8 into master Oct 31, 2025
4 checks passed
@onuratakan onuratakan deleted the graphv2 branch October 31, 2025 19:04
DoganK01 pushed a commit that referenced this pull request Jan 5, 2026
feat: state graph, recursive tool handling for streaming logic
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants