Skip to content

feat(engine): implement context budget management in execution loops#520

Merged
Aureliolo merged 14 commits intomainfrom
feat/context-budget-management
Mar 17, 2026
Merged

feat(engine): implement context budget management in execution loops#520
Aureliolo merged 14 commits intomainfrom
feat/context-budget-management

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

  • Add context window capacity tracking to AgentContextcontext_fill_tokens, context_capacity_tokens, compression_metadata fields with context_fill_percent computed property
  • Create ContextBudgetIndicator model with formatted soft budget display ([Context: 12,450/16,000 tokens (78%) | 0 archived blocks]) injected into system prompts via new _SECTION_CONTEXT_BUDGET
  • Extend PromptTokenEstimator protocol with estimate_conversation_tokens() for per-message conversation size estimation
  • Create compaction/ subpackage with CompactionCallback type alias, CompactionConfig, CompactionResult, CompressionMetadata models, and make_compaction_callback() factory (oldest-turns summarizer)
  • Wire compaction_callback into ReactLoop and PlanExecuteLoop constructors with turn-boundary invocation and error isolation (matching checkpoint_callback/stagnation_detector patterns)
  • Update checkpoint/resume.py — pass through compaction_callback in make_loop_with_callback(), compression-aware reconciliation messages for compressed checkpoint recovery
  • Add context_budget observability event constants

Test plan

  • 35 new tests: unit tests for indicator formatting, fill estimation, compaction models, summarizer logic
  • Hypothesis property-based tests for indicator format safety and token estimation
  • All 8721 existing tests pass (no regressions)
  • ruff lint + format clean
  • mypy strict mode clean
  • Pre-push hooks pass (mypy + pytest + commitizen)

Review notes

  • Pre-reviewed by automated checks (ruff, mypy, pytest 8721 passed)
  • Review agents launched but timed out — manual review recommended
  • Design follows existing hook patterns (checkpoint_callback, stagnation_detector, approval_gate)
  • ModelCapabilities.max_context_tokens already existed — no provider layer changes needed
  • No system prompt rebuild per turn — indicator set at build time, fill tracking drives compaction programmatically

Closes #416

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 17, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1f247c3d-fac2-4e12-b1c4-e5c973e60a40

📥 Commits

Reviewing files that changed from the base of the PR and between 8063e25 and f454896.

📒 Files selected for processing (1)
  • src/synthorg/engine/compaction/summarizer.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax without parentheses for exception handling on Python 3.14 — ruff enforces PEP 758 except syntax
Type hints required on all public functions and classes — enforce with mypy strict mode
Docstrings must use Google style on all public classes and functions — enforced by ruff D rules
Line length must be 88 characters — enforced by ruff

Files:

  • src/synthorg/engine/compaction/summarizer.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections use copy.deepcopy() at construction and MappingProxyType wrapping.
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with model_copy(update=...) for runtime state
Use @computed_field for derived values instead of storing and validating redundant fields in Pydantic models
Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Every module with business logic MUST import logger: from synthorg.observability import get_logger then assign logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code — only use the centralized logger from synthorg.observability
Logger variable must always be named logger (not _logger, not log)
Event names must always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured kwargs in logging: always logger.info(EVENT, key=value) — never logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging for object creation, internal flow, entry/exit of key functions
NEVER use real vendor names (Anthropic, OpenAI,...

Files:

  • src/synthorg/engine/compaction/summarizer.py
src/synthorg/engine/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Task engine must use centralized single-writer TaskEngine with task lifecycle, recovery, and shutdown

Files:

  • src/synthorg/engine/compaction/summarizer.py
🧠 Learnings (2)
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Functions must be less than 50 lines, files must be less than 800 lines

Applied to files:

  • src/synthorg/engine/compaction/summarizer.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Functions: < 50 lines, files < 800 lines.

Applied to files:

  • src/synthorg/engine/compaction/summarizer.py
🧬 Code graph analysis (1)
src/synthorg/engine/compaction/summarizer.py (3)
src/synthorg/engine/compaction/models.py (2)
  • CompactionConfig (10-39)
  • CompressionMetadata (42-73)
src/synthorg/engine/token_estimation.py (4)
  • estimate_conversation_tokens (33-45)
  • estimate_conversation_tokens (69-101)
  • estimate_tokens (22-31)
  • estimate_tokens (58-67)
src/synthorg/engine/context.py (3)
  • AgentContext (104-412)
  • context_fill_percent (180-187)
  • with_compression (308-336)
🔇 Additional comments (6)
src/synthorg/engine/compaction/summarizer.py (6)

1-36: LGTM!

Clean module setup with proper imports from the observability system, correct logger initialization, and appropriate use of TYPE_CHECKING to avoid circular imports.


38-62: LGTM!

Clean factory pattern with proper type hints and documentation. The async wrapper around synchronous core logic is a sensible design choice.


65-131: Function length now compliant after refactoring.

The function is approximately 41 executable lines, well under the 50-line limit. The delegation to _split_conversation and _compress helpers keeps this coordinator function focused and readable.

Based on learnings: "Functions must be less than 50 lines".

,


133-172: LGTM - correctly preserves all leading SYSTEM messages.

The while loop at lines 153-157 properly handles multiple leading SYSTEM messages (including prior compaction summaries), addressing the earlier concern about repeated compactions dropping context.


175-205: LGTM!

Clean compression logic with proper metadata accumulation. The turn counting correctly uses ASSISTANT message count (line 198) instead of the previous len // 2 assumption.


208-244: LGTM!

The fallback path correctly uses CONTEXT_BUDGET_COMPACTION_FALLBACK with execution_id for proper observability correlation. The summary construction is straightforward and appropriately capped.


📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Automatic conversation compaction (oldest-turns summarization) with configurable thresholds and a pluggable compaction hook; runs at turn boundaries and preserves recent turns.
    • Context budget tracking (tokens, capacity, percent) with a human-readable indicator injected into system prompts.
    • Pluggable token estimation with a default heuristic.
  • Documentation

    • Design docs updated with Context Budget Management and compaction details.
  • Tests

    • Extensive unit and property tests for compaction, token estimation, and context budget.
  • Bug Fixes

    • Memory/recursion errors now propagate during cleanup; compaction failures logged and treated non-fatal.

Walkthrough

Adds pluggable conversation compaction and context-budget tracking: new compaction package (models, protocol, summarizer), token-estimation utilities, context budget indicator/estimation/update helpers, wiring of compaction into ReactLoop/PlanExecuteLoop and checkpoint resume, observability events, tests, and docs updates.

Changes

Cohort / File(s) Summary
Compaction core
src/synthorg/engine/compaction/__init__.py, src/synthorg/engine/compaction/models.py, src/synthorg/engine/compaction/protocol.py, src/synthorg/engine/compaction/summarizer.py
New compaction package: exports CompactionCallback, CompactionConfig, CompressionMetadata; adds oldest-turns summarizer factory make_compaction_callback producing compressed conversation and CompressionMetadata.
Loop & checkpoint integration
src/synthorg/engine/react_loop.py, src/synthorg/engine/plan_execute_loop.py, src/synthorg/engine/loop_helpers.py, src/synthorg/engine/checkpoint/resume.py
Propagates optional compaction_callback into loop constructors; adds invoke_compaction helper that awaits callbacks and re-raises memory/recursion errors while logging others; checkpoint resume composes reconciliation compaction note and passes compaction_callback into created loops.
Context model & budget utilities
src/synthorg/engine/context.py, src/synthorg/engine/context_budget.py, src/synthorg/engine/token_estimation.py
Extends AgentContext/snapshot with fill tokens, capacity, compression_metadata, computed context_fill_percent; adds with_context_fill/with_compression. Adds ContextBudgetIndicator, estimate_context_fill, update_context_fill, and PromptTokenEstimator + DefaultTokenEstimator.
Prompt plumbing & templates
src/synthorg/engine/prompt.py, src/synthorg/engine/prompt_template.py, src/synthorg/engine/__init__.py
Threads context_budget_indicator into system prompt rendering and template (new section); moves token estimator types to token_estimation and updates imports; updates PROMPT_TEMPLATE_VERSION.
Observability events
src/synthorg/observability/events/context_budget.py
Adds context-budget event constants (started, skipped, completed, failed, fallback, indicator-injected, fill-updated).
Checkpoint cleanup behavior
src/synthorg/engine/checkpoint/resume.py
Checkpoint/heartbeat cleanup now re-raises MemoryError/RecursionError; other exceptions logged as warnings.
Tests
tests/unit/engine/compaction/*, tests/unit/engine/test_context_budget*, tests/unit/engine/test_prompt.py, tests/unit/memory/org/test_prompt_integration.py, tests/unit/observability/test_events.py
Adds unit tests for compaction models/summarizer, context-budget utilities and properties; updates prompt/version tests and token-estimator test imports.
Docs
docs/design/engine.md, docs/roadmap/open-questions.md, CLAUDE.md
Design, roadmap, and CLAUDE.md updated to document context budget management, compaction hook, and compressed checkpoint recovery (duplicate block noted).

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Loop as Loop (ReactLoop / PlanExecuteLoop)
participant Invoker as invoke_compaction
participant Compactor as CompactionCallback / Summarizer
participant Checkpoint as CheckpointManager
Loop->>Invoker: request compaction(ctx, callback, turn)
Invoker->>Compactor: await callback(ctx)
Compactor-->>Invoker: AgentContext | None (may include CompressionMetadata)
Invoker-->>Loop: return compacted context or None
alt compacted
Loop->>Checkpoint: persist compressed checkpoint (includes CompressionMetadata)
end

mermaid
sequenceDiagram
participant Resume as Checkpoint.resume
participant Deserialize as deserialize_and_reconcile
participant LoopFactory as make_loop_with_callback
Resume->>Deserialize: load snapshot (+ CompressionMetadata)
Deserialize->>Resume: build reconciliation ChatMessage with compaction_note
Resume->>LoopFactory: create loop with compaction_callback propagated
LoopFactory-->>Resume: loop instance (with compaction wiring)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.51% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly and concisely summarizes the main objective: implementing context budget management in execution loops, which is the primary focus of all changes.
Description check ✅ Passed The PR description is comprehensive and directly related to the changeset, covering all major changes including context tracking, indicators, compaction hooks, and test coverage.
Linked Issues check ✅ Passed The changeset fully implements the requirements from issue #416: soft budget indicators, compaction hooks, compressed checkpoint recovery, and hook patterns matching existing callbacks.
Out of Scope Changes check ✅ Passed All code changes are directly aligned with the #416 objectives; no out-of-scope modifications detected. Changes are focused on context budget management infrastructure.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/context-budget-management
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/context-budget-management
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive system for managing the context budget within the engine's execution loops. It addresses the challenge of large language model context window limitations by implementing token usage tracking, dynamic conversation compaction, and clear visibility into the current context fill level. This ensures that agents can operate more efficiently within their token limits, preventing overflows and maintaining conversational coherence over longer interactions through intelligent summarization of older turns.

Highlights

  • Context Window Capacity Tracking: Added context_fill_tokens, context_capacity_tokens, and compression_metadata fields to AgentContext, along with a context_fill_percent computed property for tracking the context window usage.
  • Context Budget Indicator: Introduced ContextBudgetIndicator model to display formatted soft budget information (e.g., [Context: 12,450/16,000 tokens (78%) | 0 archived blocks]) directly within system prompts via a new _SECTION_CONTEXT_BUDGET.
  • Prompt Token Estimator Extension: Extended the PromptTokenEstimator protocol with estimate_conversation_tokens() to allow for more accurate per-message conversation size estimation.
  • Conversation Compaction Framework: Created a new compaction/ subpackage, including CompactionCallback type alias, CompactionConfig, CompactionResult, and CompressionMetadata models, along with a make_compaction_callback() factory for oldest-turns summarization.
  • Execution Loop Integration: Wired the compaction_callback into ReactLoop and PlanExecuteLoop constructors, enabling turn-boundary invocation and error isolation for context compaction.
  • Checkpoint Reconciliation Update: Updated checkpoint/resume.py to pass through the compaction_callback in make_loop_with_callback() and added compression-aware reconciliation messages for recovering from compressed checkpoints.
  • Observability Events: Added new observability event constants related to context budget management, including fill updates, compaction lifecycle, and indicator injection.
Changelog
  • src/synthorg/engine/checkpoint/resume.py
    • Modified reconciliation message to include compression details if conversation was compacted.
    • Updated make_loop_with_callback to pass compaction_callback to new loop instances.
  • src/synthorg/engine/compaction/init.py
    • Added new subpackage for context compaction.
    • Exported CompactionConfig, CompactionResult, CompressionMetadata, and CompactionCallback.
  • src/synthorg/engine/compaction/models.py
    • Added CompactionConfig model for configuring compaction behavior.
    • Added CompactionResult model for capturing compaction outcomes.
    • Added CompressionMetadata model for storing details about conversation compression.
  • src/synthorg/engine/compaction/protocol.py
    • Added CompactionCallback type alias for asynchronous compaction functions.
  • src/synthorg/engine/compaction/summarizer.py
    • Added make_compaction_callback factory to create compaction callbacks.
    • Implemented core compaction logic to summarize oldest turns when context fill exceeds a threshold.
    • Included logic to preserve system messages and recent turns during compaction.
  • src/synthorg/engine/context.py
    • Imported CompressionMetadata.
    • Added context_fill_tokens, context_capacity_tokens, and compression_metadata fields to AgentContext.
    • Added context_fill_percent as a computed property to AgentContext.
    • Updated from_identity method to accept context_capacity_tokens.
    • Added with_context_fill method to update estimated context fill level.
    • Added with_compression method to replace conversation with a compressed version and update metadata.
    • Updated to_snapshot method to include context_fill_tokens and context_fill_percent.
  • src/synthorg/engine/context_budget.py
    • Added ContextBudgetIndicator model for displaying context usage information.
    • Added make_context_indicator function to create an indicator from AgentContext.
    • Added estimate_context_fill function to calculate total context fill in tokens.
    • Added update_context_fill function to re-estimate and update context fill in AgentContext.
  • src/synthorg/engine/plan_execute_loop.py
    • Imported CONTEXT_BUDGET_COMPACTION_FAILED event.
    • Imported CompactionCallback type.
    • Added compaction_callback parameter to PlanExecuteLoop constructor and as a property.
    • Integrated compaction_callback invocation at turn boundaries within the _execute_step method.
    • Added _invoke_compaction method to safely call the compaction callback and log errors.
  • src/synthorg/engine/prompt.py
    • Imported ChatMessage.
    • Extended PromptTokenEstimator protocol with estimate_conversation_tokens method.
    • Added _PER_MESSAGE_OVERHEAD constant to DefaultTokenEstimator.
    • Implemented estimate_conversation_tokens in DefaultTokenEstimator.
    • Added _SECTION_CONTEXT_BUDGET constant for prompt sections.
    • Updated build_system_prompt to accept context_budget_indicator.
    • Modified internal prompt building functions (_build_template_context, _compute_sections, _trim_sections, _render_with_trimming, _build_prompt_result, _render_and_estimate) to incorporate the context budget indicator.
  • src/synthorg/engine/prompt_template.py
    • Added a new Context Budget section to the default prompt template, conditionally rendered if context_budget is provided.
  • src/synthorg/engine/react_loop.py
    • Imported CONTEXT_BUDGET_COMPACTION_FAILED event.
    • Imported CompactionCallback type.
    • Added compaction_callback parameter to ReactLoop constructor and as a property.
    • Integrated compaction_callback invocation at turn boundaries within the execute method.
    • Added _invoke_compaction method to safely call the compaction callback and log errors.
  • src/synthorg/observability/events/context_budget.py
    • Added new constants for context budget related observability events, including fill updates, compaction lifecycle, and indicator injection.
  • tests/unit/engine/compaction/test_models.py
    • Added unit tests for CompactionConfig, CompactionResult, and CompressionMetadata models, covering validation, defaults, and immutability.
  • tests/unit/engine/compaction/test_summarizer.py
    • Added unit tests for make_compaction_callback and its compaction logic, covering scenarios like below/above threshold, insufficient messages, and multiple compactions.
  • tests/unit/engine/test_context_budget.py
    • Added unit tests for ContextBudgetIndicator formatting and fill percentage calculation.
    • Added unit tests for estimate_context_fill, make_context_indicator, and update_context_fill functions.
    • Added unit tests for DefaultTokenEstimator.estimate_conversation_tokens.
  • tests/unit/engine/test_context_budget_properties.py
    • Added Hypothesis property-based tests for ContextBudgetIndicator to ensure format safety and non-negative fill percentages.
    • Added Hypothesis property-based tests for DefaultTokenEstimator.estimate_conversation_tokens to ensure non-negative token counts.
  • tests/unit/engine/test_prompt.py
    • Updated CountingEstimator mock to include estimate_conversation_tokens for prompt tests.
  • tests/unit/observability/test_events.py
    • Added context_budget to the list of discovered domain modules for observability events.
Activity
  • 35 new unit and property-based tests were added, covering indicator formatting, fill estimation, compaction models, and summarizer logic.
  • Hypothesis property-based tests were implemented for indicator format safety and token estimation.
  • All 8721 existing tests passed, indicating no regressions.
  • Ruff linting and formatting checks passed.
  • Mypy strict mode checks passed.
  • Pre-push hooks (mypy, pytest, commitizen) passed.
  • Automated checks pre-reviewed the pull request.
  • Review agents were launched but timed out, suggesting manual review is recommended.
  • The design aligns with existing hook patterns like checkpoint_callback and stagnation_detector.
  • Existing ModelCapabilities.max_context_tokens was leveraged, requiring no provider layer changes.
  • The system prompt is not rebuilt per turn; the indicator is set at build time, and fill tracking drives compaction programmatically.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 17, 2026 10:40 — with GitHub Actions Inactive
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 88.41463% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.12%. Comparing base (b7d8172) to head (f454896).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/engine/loop_helpers.py 27.27% 7 Missing and 1 partial ⚠️
src/synthorg/engine/plan_execute_loop.py 66.66% 2 Missing and 2 partials ⚠️
src/synthorg/engine/context.py 83.33% 2 Missing and 1 partial ⚠️
src/synthorg/engine/checkpoint/resume.py 71.42% 2 Missing ⚠️
src/synthorg/engine/react_loop.py 71.42% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #520      +/-   ##
==========================================
- Coverage   93.16%   93.12%   -0.04%     
==========================================
  Files         506      513       +7     
  Lines       24490    24644     +154     
  Branches     2333     2344      +11     
==========================================
+ Hits        22815    22950     +135     
- Misses       1328     1342      +14     
- Partials      347      352       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is a great feature that adds much-needed context budget management to the engine. The implementation is thorough, with new models, protocols, and extensive tests. The use of a pluggable callback for compaction is a clean design. I have a few suggestions to improve maintainability and correctness.

Comment on lines +72 to +88
compression = checkpoint_ctx.compression_metadata
if compression is not None:
reconciliation_content = (
f"Execution resumed from checkpoint at turn "
f"{checkpoint_ctx.turn_count}. "
f"Note: conversation was previously compacted "
f"(archived {compression.archived_turns} turns). "
f"Previous error: {error_message}. "
"Review progress and continue."
)
else:
reconciliation_content = (
f"Execution resumed from checkpoint at turn "
f"{checkpoint_ctx.turn_count}. Previous error: "
f"{error_message}. "
"Review progress and continue."
),
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's some duplicated code for constructing the reconciliation_content. The base message is the same in both branches of the if/else statement. This can be simplified to reduce duplication and improve maintainability.

    compression = checkpoint_ctx.compression_metadata
    compaction_note = (
        f"Note: conversation was previously compacted "
        f"(archived {compression.archived_turns} turns). "
        if compression is not None
        else ""
    )
    reconciliation_content = (
        f"Execution resumed from checkpoint at turn "
        f"{checkpoint_ctx.turn_count}. {compaction_note}"
        f"Previous error: {error_message}. "
        "Review progress and continue."
    )

Comment on lines +129 to +134
metadata = CompressionMetadata(
compression_point=ctx.turn_count,
archived_turns=prior_archived + len(archivable),
summary_tokens=summary_tokens,
compactions_performed=compactions_count,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's an inconsistency in the unit for archived_turns. The field name and its usage in other parts of the system (like the checkpoint reconciliation message) imply the unit is "turns" (i.e., user/assistant pairs). However, you're assigning len(archivable), which is the number of messages. This will lead to incorrect reporting (e.g., "archived 6 turns" instead of "archived 3 turns").

Assuming a turn is a pair of messages, you should divide the message count by two. This logic should be applied consistently where archived_turns is calculated and logged. You'll also need to update the archived_turns value in the log event on line 146, and consider updating the summary string in _build_summary (lines 178 and 184) to report the number of turns instead of messages for consistency.

Suggested change
metadata = CompressionMetadata(
compression_point=ctx.turn_count,
archived_turns=prior_archived + len(archivable),
summary_tokens=summary_tokens,
compactions_performed=compactions_count,
)
archived_turn_count = len(archivable) // 2
metadata = CompressionMetadata(
compression_point=ctx.turn_count,
archived_turns=prior_archived + archived_turn_count,
summary_tokens=summary_tokens,
compactions_performed=compactions_count,
)

Comment on lines +843 to +871
async def _invoke_compaction(
self,
ctx: AgentContext,
turn_number: int,
) -> AgentContext | None:
"""Invoke compaction callback if configured.

Errors are logged but never propagated — compaction must
not interrupt execution.

Args:
ctx: Current agent context.
turn_number: Current turn number for logging.

Returns:
Compacted context or ``None``.
"""
try:
return await self._compaction_callback(ctx) # type: ignore[misc]
except MemoryError, RecursionError:
raise
except Exception as exc:
logger.exception(
CONTEXT_BUDGET_COMPACTION_FAILED,
execution_id=ctx.execution_id,
turn=turn_number,
error=f"{type(exc).__name__}: {exc}",
)
return None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This _invoke_compaction method is identical to the one in ReactLoop. To avoid code duplication and improve maintainability, this logic should be extracted into a shared helper function. A good place for it would be in src/synthorg/engine/loop_helpers.py, alongside other shared loop utilities like check_stagnation and execute_tool_calls. You could then call this helper from both loops.

Comment on lines +311 to +339
async def _invoke_compaction(
self,
ctx: AgentContext,
turn_number: int,
) -> AgentContext | None:
"""Invoke compaction callback if configured.

Errors are logged but never propagated — compaction must
not interrupt execution.

Args:
ctx: Current agent context.
turn_number: Current turn number for logging.

Returns:
Compacted context or ``None``.
"""
try:
return await self._compaction_callback(ctx) # type: ignore[misc]
except MemoryError, RecursionError:
raise
except Exception as exc:
logger.exception(
CONTEXT_BUDGET_COMPACTION_FAILED,
execution_id=ctx.execution_id,
turn=turn_number,
error=f"{type(exc).__name__}: {exc}",
)
return None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This _invoke_compaction method is identical to the one in PlanExecuteLoop. To avoid code duplication and improve maintainability, this logic should be extracted into a shared helper function. A good place for it would be in src/synthorg/engine/loop_helpers.py, alongside other shared loop utilities like check_stagnation and execute_tool_calls. You could then call this helper from both loops.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/context_budget.py`:
- Around line 141-175: The logger in update_context_fill logs
ctx.context_fill_percent which reflects the old state; instead compute or obtain
the new percentage after computing fill and log that instead. Specifically,
after calculating fill (the local variable) either compute new_fill_percent =
fill / ctx.context_capacity_tokens (or call ctx.with_context_fill(fill) and read
its context_fill_percent) and use that value in the logger.debug call (replace
ctx.context_fill_percent with the new_fill_percent), keeping the same message
keys (execution_id, fill_tokens, capacity_tokens, fill_percent) so logs reflect
the updated context fill.

In `@src/synthorg/engine/context.py`:
- Around line 289-324: Reject impossible values before creating the new
AgentContext: in with_context_fill(validate that fill_tokens is >= 0, log an
error and raise ValueError if not) and in with_compression(validate that
fill_tokens is >= 0, metadata.archived_turns is >= 0, and
metadata.compression_point is not greater than the total turns implied by
archived_turns + len(compressed_conversation); if any check fails log a clear
error mentioning compression_metadata/archived_turns/compression_point and raise
ValueError), otherwise proceed to return the model_copy updating conversation,
compression_metadata, and context_fill_tokens as before.

In `@src/synthorg/engine/plan_execute_loop.py`:
- Around line 841-871: Extract the duplicated compaction logic into a shared
helper function (e.g., invoke_compaction) and have both _invoke_compaction
methods delegate to it: create loop_helpers.invoke_compaction(callback, ctx,
turn_number) that awaits callback(ctx), re-raises MemoryError and
RecursionError, logs other exceptions with the same logger.exception
call/arguments (use CONTEXT_BUDGET_COMPACTION_FAILED,
execution_id=ctx.execution_id, turn=turn_number, error=f"{type(exc).__name__}:
{exc}") and returns None on error; then replace
SynthOrgEngine._invoke_compaction and ReactLoop._invoke_compaction bodies to
call that helper with the existing compaction callback (preserving signature and
behavior).

In `@src/synthorg/engine/prompt.py`:
- Around line 155-176: The estimator in estimate_conversation_tokens currently
omits msg.tool_calls, undercounting tokens for ASSISTANT messages with tool
invocations; update estimate_conversation_tokens to, for each message that has a
non-empty msg.tool_calls tuple, iterate its ToolCall entries and add an
estimated token count for each (include the ToolCall.id, ToolCall.name and a
serialized form of ToolCall.arguments), e.g. serialize arguments (json.dumps or
similar), measure lengths and convert to token estimates (length // 4) and add
those to total (you may also include per-call overhead similar to
self._PER_MESSAGE_OVERHEAD); use the existing ChatMessage, ToolCall attributes
(msg.tool_calls, msg.tool_result, msg.content) to locate where to add this
logic.

In `@tests/unit/engine/compaction/test_models.py`:
- Around line 13-14: The test module currently uses per-class decorators like
`@pytest.mark.unit` (e.g., TestCompactionConfig); add a module-level pytestmark =
[pytest.mark.unit, pytest.mark.timeout(30)] to enforce the 30s timeout per
guidelines and then remove individual `@pytest.mark.unit` decorators from test
classes such as TestCompactionConfig so the module-level marks apply
consistently.

In `@tests/unit/engine/test_context_budget_properties.py`:
- Around line 13-14: Add a 30-second timeout decorator to the test class by
applying pytest.mark.timeout(30) to TestContextBudgetIndicatorProperties; ensure
pytest is imported and annotate the class with `@pytest.mark.timeout`(30) so all
tests in the class inherit the timeout requirement.

In `@tests/unit/engine/test_context_budget.py`:
- Around line 105-112: Update the test_with_tools test to import the
_TOOL_DEFINITION_TOKEN_OVERHEAD constant from synthorg.engine.context_budget and
use it in the assertion instead of the hardcoded 50; locate the test function
test_with_tools and the call to estimate_context_fill, add the import for
_TOOL_DEFINITION_TOKEN_OVERHEAD, and change the assertion to assert result ==
100 + 3 * _TOOL_DEFINITION_TOKEN_OVERHEAD so the test stays correct if the
constant changes.
- Around line 20-21: Add a 30-second pytest timeout to the test module by
declaring a module-level pytest mark (e.g., pytestmark =
pytest.mark.timeout(30)) so all tests in this file, including the
TestContextBudgetIndicator class, inherit the timeout; place the pytestmark
declaration near the top of the file alongside existing pytest imports to
satisfy the coding guideline.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ebca776b-3cfb-4d4b-b02c-7a84fe639e8c

📥 Commits

Reviewing files that changed from the base of the PR and between 32f553d and 8c7224a.

📒 Files selected for processing (19)
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/compaction/__init__.py
  • src/synthorg/engine/compaction/models.py
  • src/synthorg/engine/compaction/protocol.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/context.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/prompt.py
  • src/synthorg/engine/prompt_template.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/observability/events/context_budget.py
  • tests/unit/engine/compaction/__init__.py
  • tests/unit/engine/compaction/test_models.py
  • tests/unit/engine/compaction/test_summarizer.py
  • tests/unit/engine/test_context_budget.py
  • tests/unit/engine/test_context_budget_properties.py
  • tests/unit/engine/test_prompt.py
  • tests/unit/observability/test_events.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Build Sandbox
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) — PEP 758 except syntax enforced by ruff on Python 3.14.
Add type hints to all public functions; enforce mypy strict mode compliance.
Use Google-style docstrings on all public classes and functions; enforce via ruff D rules.
Keep functions under 50 lines and files under 800 lines.
Handle errors explicitly and never silently swallow exceptions.
Validate at system boundaries (user input, external APIs, config files).
Set line length to 88 characters (ruff enforces).
Use uv for Python dependency management. Install with uv sync, run tests with -n auto for parallelism, lint with ruff check, format with ruff format, type-check with mypy.
Scripts in scripts/ have relaxed ruff rules: print and deferred imports are allowed.

Files:

  • tests/unit/engine/compaction/test_models.py
  • src/synthorg/engine/prompt_template.py
  • tests/unit/observability/test_events.py
  • tests/unit/engine/compaction/test_summarizer.py
  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/compaction/protocol.py
  • tests/unit/engine/test_prompt.py
  • src/synthorg/engine/context.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/compaction/models.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/prompt.py
  • src/synthorg/engine/compaction/__init__.py
  • tests/unit/engine/test_context_budget_properties.py
  • tests/unit/engine/test_context_budget.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow. Maintain 80% minimum code coverage.
Use asyncio_mode = 'auto' for async tests — no manual @pytest.mark.asyncio needed. Set test timeout to 30 seconds per test. Always include -n auto for parallelism via pytest-xdist.
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned test code, docstrings, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, test-provider, test-small-001, etc. Vendor names may only appear in Operations design page, .claude/ files, or third-party import paths.
Use Hypothesis for property-based testing with @given + @settings. Profiles: ci (200 examples, default) and dev (1000 examples) via HYPOTHESIS_PROFILE env var. Never skip, dismiss, or ignore flaky tests — fix them fully. Mock time.monotonic() and asyncio.sleep() for deterministic timing-sensitive tests instead of widening margins.

Files:

  • tests/unit/engine/compaction/test_models.py
  • tests/unit/observability/test_events.py
  • tests/unit/engine/compaction/test_summarizer.py
  • tests/unit/engine/test_prompt.py
  • tests/unit/engine/test_context_budget_properties.py
  • tests/unit/engine/test_context_budget.py
**/*.{md,py,ts,tsx,vue,go,yml,yaml,json,toml}

📄 CodeRabbit inference engine (CLAUDE.md)

Use conventional commits: <type>: <description> where types are feat, fix, refactor, docs, test, chore, perf, ci. Commits must be GPG/SSH signed on main branch.

Files:

  • tests/unit/engine/compaction/test_models.py
  • src/synthorg/engine/prompt_template.py
  • tests/unit/observability/test_events.py
  • tests/unit/engine/compaction/test_summarizer.py
  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/compaction/protocol.py
  • tests/unit/engine/test_prompt.py
  • src/synthorg/engine/context.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/compaction/models.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/prompt.py
  • src/synthorg/engine/compaction/__init__.py
  • tests/unit/engine/test_context_budget_properties.py
  • tests/unit/engine/test_context_budget.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Create new objects instead of mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping. For dict/list fields in frozen Pydantic models, rely on frozen=True and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Adopt Pydantic v2 conventions: use @computed_field for derived values instead of storing redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Every module with business logic must have: from synthorg.observability import get_logger then logger = get_logger(__name__). Never use import logging, logging.getLogger(), or print() in application code.
Always use logger as the variable name (not _logger, not log). Use event name constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Always use structured kwargs for logging: logger.info(EVENT, key=value) — never use string interpolation like logger.info('msg %s', val).
Log all error paths at WARNING or ERROR with context before raising. Log all state transitions at INFO. Log object creation, internal flow, and entry/exit of key functions at DEBUG.
Do NOT use import logging or logging.getLogger(). Pure data models, ...

Files:

  • src/synthorg/engine/prompt_template.py
  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/compaction/protocol.py
  • src/synthorg/engine/context.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/compaction/models.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/prompt.py
  • src/synthorg/engine/compaction/__init__.py
src/synthorg/engine/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Agent orchestration with TaskEngine (centralized single-writer task state), task lifecycle, recovery, shutdown, workspace isolation. Coordination: multi-agent pipeline via TopologyDispatcher protocol (4 dispatchers: SAS/centralized/decentralized/context-dependent), wave execution, workspace lifecycle, CoordinationSectionConfig bridge, build_coordinator factory. Approval gate: escalation detection, context parking/resume, EscalationInfo/ResumePayload. Stagnation detection: StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection. Checkpoint recovery: per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy. AgentRuntimeState: lightweight per-agent execution status for dashboard + recovery.

Files:

  • src/synthorg/engine/prompt_template.py
  • src/synthorg/engine/compaction/protocol.py
  • src/synthorg/engine/context.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/compaction/models.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/prompt.py
  • src/synthorg/engine/compaction/__init__.py
🧠 Learnings (14)
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Always use `logger` as the variable name (not `_logger`, not `log`). Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/budget/**/*.py : Cost tracking with pre-flight/in-flight budget checks, auto-downgrade, billing periods, cost tiers, quota/subscription tracking. CFO cost optimization: anomaly detection, efficiency analysis, downgrade recommendations, approval decisions. Budget errors: `BudgetExhaustedError`, `DailyLimitExceededError`, `QuotaExhaustedError`.

Applied to files:

  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/plan_execute_loop.py
  • tests/unit/engine/test_context_budget.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/engine/context.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR with context before raising. Log all state transitions at INFO. Log object creation, internal flow, and entry/exit of key functions at DEBUG.

Applied to files:

  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/tools/**/*.py : Tool registry with built-in tools: file_system/, git (SSRF prevention via git_url_validator), sandbox/, code_runner. MCP bridge (mcp/). Role-based access. Approval tool (request_human_approval).

Applied to files:

  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/engine/**/*.py : Agent orchestration with TaskEngine (centralized single-writer task state), task lifecycle, recovery, shutdown, workspace isolation. Coordination: multi-agent pipeline via TopologyDispatcher protocol (4 dispatchers: SAS/centralized/decentralized/context-dependent), wave execution, workspace lifecycle, CoordinationSectionConfig bridge, build_coordinator factory. Approval gate: escalation detection, context parking/resume, EscalationInfo/ResumePayload. Stagnation detection: StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection. Checkpoint recovery: per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy. AgentRuntimeState: lightweight per-agent execution status for dashboard + recovery.

Applied to files:

  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/security/**/*.py : SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).

Applied to files:

  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Create new objects instead of mutating existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping. For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).

Applied to files:

  • src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

Applied to files:

  • tests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing with `given` + `settings`. Profiles: `ci` (200 examples, default) and `dev` (1000 examples) via `HYPOTHESIS_PROFILE` env var. Never skip, dismiss, or ignore flaky tests — fix them fully. Mock `time.monotonic()` and `asyncio.sleep()` for deterministic timing-sensitive tests instead of widening margins.

Applied to files:

  • tests/unit/engine/test_context_budget_properties.py
🧬 Code graph analysis (11)
tests/unit/engine/compaction/test_models.py (1)
src/synthorg/engine/compaction/models.py (3)
  • CompactionConfig (10-39)
  • CompactionResult (42-69)
  • CompressionMetadata (72-103)
tests/unit/engine/compaction/test_summarizer.py (3)
src/synthorg/engine/compaction/models.py (1)
  • CompactionConfig (10-39)
src/synthorg/engine/compaction/summarizer.py (1)
  • make_compaction_callback (34-58)
src/synthorg/engine/context.py (2)
  • AgentContext (104-400)
  • from_identity (190-225)
src/synthorg/engine/compaction/protocol.py (1)
src/synthorg/engine/context.py (1)
  • AgentContext (104-400)
src/synthorg/engine/context.py (1)
src/synthorg/engine/compaction/models.py (1)
  • CompressionMetadata (72-103)
src/synthorg/engine/react_loop.py (1)
src/synthorg/engine/plan_execute_loop.py (2)
  • compaction_callback (138-140)
  • _invoke_compaction (843-871)
src/synthorg/engine/checkpoint/resume.py (2)
src/synthorg/engine/plan_execute_loop.py (4)
  • compaction_callback (138-140)
  • config (123-125)
  • approval_gate (128-130)
  • stagnation_detector (133-135)
src/synthorg/engine/react_loop.py (3)
  • compaction_callback (111-113)
  • approval_gate (101-103)
  • stagnation_detector (106-108)
src/synthorg/engine/compaction/summarizer.py (5)
tests/unit/engine/conftest.py (1)
  • engine (449-460)
src/synthorg/engine/compaction/models.py (2)
  • CompactionConfig (10-39)
  • CompressionMetadata (72-103)
src/synthorg/engine/prompt.py (6)
  • DefaultTokenEstimator (134-176)
  • PromptTokenEstimator (101-131)
  • estimate_tokens (108-117)
  • estimate_tokens (144-153)
  • estimate_conversation_tokens (119-131)
  • estimate_conversation_tokens (155-176)
src/synthorg/engine/context.py (3)
  • AgentContext (104-400)
  • context_fill_percent (180-187)
  • with_compression (302-324)
src/synthorg/engine/context_budget.py (1)
  • fill_percent (56-60)
src/synthorg/engine/plan_execute_loop.py (3)
src/synthorg/engine/react_loop.py (3)
  • compaction_callback (111-113)
  • approval_gate (101-103)
  • stagnation_detector (106-108)
src/synthorg/engine/stagnation/detector.py (1)
  • config (46-48)
src/synthorg/engine/context.py (1)
  • AgentContext (104-400)
src/synthorg/engine/compaction/__init__.py (1)
src/synthorg/engine/compaction/models.py (3)
  • CompactionConfig (10-39)
  • CompactionResult (42-69)
  • CompressionMetadata (72-103)
tests/unit/engine/test_context_budget_properties.py (2)
src/synthorg/engine/context_budget.py (3)
  • ContextBudgetIndicator (29-80)
  • format (62-80)
  • fill_percent (56-60)
src/synthorg/engine/prompt.py (3)
  • DefaultTokenEstimator (134-176)
  • estimate_conversation_tokens (119-131)
  • estimate_conversation_tokens (155-176)
tests/unit/engine/test_context_budget.py (3)
tests/unit/engine/conftest.py (2)
  • engine (449-460)
  • sample_agent_context (183-191)
src/synthorg/engine/context_budget.py (5)
  • ContextBudgetIndicator (29-80)
  • estimate_context_fill (112-138)
  • make_context_indicator (83-109)
  • update_context_fill (141-175)
  • fill_percent (56-60)
src/synthorg/engine/prompt.py (3)
  • DefaultTokenEstimator (134-176)
  • estimate_conversation_tokens (119-131)
  • estimate_conversation_tokens (155-176)
🔇 Additional comments (32)
tests/unit/observability/test_events.py (1)

229-229: LGTM!

The addition of "context_budget" to the expected domain modules set correctly aligns with the new src/synthorg/observability/events/context_budget.py module introduced in this PR for context budget observability events.

tests/unit/engine/compaction/test_models.py (1)

17-102: LGTM!

Comprehensive test coverage for compaction models:

  • CompactionConfig: Validates defaults, threshold bounds (gt=0.0, le=100.0), minimum turns (ge=2), preserve recent (ge=1), and frozen immutability.
  • CompactionResult: Validates construction and frozen behavior.
  • CompressionMetadata: Validates default compactions_performed=1, custom values, and frozen behavior.

Tests align well with the model field constraints defined in src/synthorg/engine/compaction/models.py.

tests/unit/engine/test_prompt.py (1)

475-479: LGTM!

The estimate_conversation_tokens stub correctly satisfies the expanded PromptTokenEstimator protocol interface. Returning 0 is appropriate for this test double since the test (test_custom_estimator_used) only validates that estimate_tokens is called during prompt construction.

src/synthorg/engine/prompt_template.py (1)

186-191: LGTM!

The Context Budget section follows the established template pattern for optional sections (conditional rendering with {% if context_budget %}). Placing it at the end of the template is appropriate for auxiliary runtime information that shouldn't interrupt core agent instructions.

tests/unit/engine/test_context_budget_properties.py (1)

17-88: LGTM!

Well-designed property-based tests using Hypothesis:

  • Strategies appropriately model the domain (non-negative fill tokens, optional capacity with minimum 1).
  • @settings(max_examples=200) aligns with the ci profile per coding guidelines.
  • Tests cover key invariants: formatting never crashes, fill percent is non-negative when capacity exists, and token estimation is non-negative.

Based on learnings: "Use Hypothesis for property-based testing with given + settings. Profiles: ci (200 examples, default)..."

src/synthorg/engine/compaction/protocol.py (1)

1-24: LGTM!

Clean type alias definition following the established CheckpointCallback pattern. The signature correctly models the compaction callback semantics:

  • Input: current AgentContext
  • Output: new AgentContext with compressed conversation (compaction ran) or None (no compaction needed)

The TYPE_CHECKING guard appropriately avoids runtime circular imports.

src/synthorg/engine/checkpoint/resume.py (2)

72-93: LGTM!

The compression-aware reconciliation message enhancement correctly informs the agent about prior compaction when resuming from a checkpoint. The conditional logic appropriately:

  • Includes archived turn count when compression_metadata exists
  • Falls back to the simpler message format when no compression occurred

This aligns with the PR objective for "compressed checkpoint recovery."


128-142: LGTM!

The compaction_callback is correctly propagated when creating new loop instances during checkpoint recovery. This ensures that:

  • ReactLoop (line 133) and PlanExecuteLoop (line 141) preserve compaction behavior
  • The pattern matches existing callback propagation for approval_gate and stagnation_detector
tests/unit/engine/test_context_budget.py (1)

24-200: LGTM!

Comprehensive test coverage for context budget functionality:

  • ContextBudgetIndicator: Validates formatting with known/unknown capacity, fill percent calculation edge cases (including zero fill), and frozen immutability.
  • estimate_context_fill: Correctly tests empty conversation, messages with overhead, and tool definitions.
  • make_context_indicator: Tests factory with both capacity scenarios.
  • update_context_fill: Validates fill token updates.
  • estimate_conversation_tokens: Edge cases including empty tuple, single message arithmetic, and empty content overhead.

Good use of pytest.approx for floating-point comparisons.

src/synthorg/engine/compaction/__init__.py (1)

1-20: LGTM!

Clean package initialization with well-structured public API exports. The docstring clearly describes the compaction hook's purpose, and the __all__ correctly exposes the four public symbols from the subpackage.

tests/unit/engine/compaction/test_summarizer.py (2)

1-36: LGTM!

Well-structured test helpers. _msg provides a clean factory for chat messages, and _build_context properly uses from_identity followed by model_copy(update=...) to set up test contexts with specific fill levels and conversation states.


39-232: LGTM!

Comprehensive test coverage for the compaction callback factory. Tests exercise:

  • Threshold logic (below/above)
  • Edge cases (too few messages, nothing to archive, unknown capacity)
  • Multi-step compaction with metadata increment verification

The assertions correctly verify system message preservation, summary content, and compression metadata state.

src/synthorg/engine/compaction/models.py (1)

1-103: LGTM!

Well-designed frozen Pydantic models following project conventions:

  • All models use ConfigDict(frozen=True) for immutability
  • Field constraints are appropriate (e.g., gt=0.0, le=100.0 for percentages, ge=2 for minimum turns)
  • Google-style docstrings with Attributes sections
  • Sensible defaults aligned with conservative compaction behavior

Based on learnings: "Use frozen Pydantic models for config/identity; use separate mutable-via-copy models for runtime state."

src/synthorg/engine/react_loop.py (4)

13-15: LGTM!

Clean integration of compaction callback into ReactLoop:

  • Event constant imported from the correct domain-specific module
  • Docstring properly documents the new parameter
  • Constructor stores the callback following the same pattern as other optional callbacks

Also applies to: 53-53, 82-98


110-113: LGTM!

Property accessor follows the established pattern for approval_gate and stagnation_detector.


208-215: LGTM!

Compaction is correctly invoked at turn boundaries after stagnation detection. The placement ensures the turn is fully processed before attempting compression, and the context is updated only if compaction succeeds.


311-339: LGTM!

Error isolation pattern correctly mirrors the checkpoint callback handling:

  • MemoryError and RecursionError are re-raised (non-recoverable)
  • Other exceptions are logged with structured kwargs and swallowed to avoid interrupting execution
  • Uses logger.exception for full traceback capture

The # type: ignore[misc] comment is appropriate since the None-check happens at the call site (line 209).

src/synthorg/engine/compaction/summarizer.py (5)

1-31: LGTM!

Clean module structure with proper observability setup. The _MAX_SUMMARY_CHARS constant appropriately limits summary length to prevent unbounded growth.


34-58: LGTM!

Factory follows a clean closure pattern, capturing config and estimator. The inner _compact function is marked async to satisfy the CompactionCallback protocol even though _do_compaction is synchronous—this allows future async operations if needed.


61-109: LGTM!

Guard conditions are well-ordered:

  1. Check fill percentage exists and exceeds threshold
  2. Check minimum message count
  3. Check that there are messages to archive after preservation

Logging at appropriate levels (DEBUG for skips, INFO for start).


111-155: LGTM!

Core compression logic is sound:

  • System message preserved at index 0
  • Archivable messages extracted correctly
  • Summary injected as a SYSTEM message (appropriate for context)
  • CompressionMetadata properly accumulates archived_turns across multiple compactions
  • Fill re-estimated after compression

158-184: LGTM!

_build_summary handles edge cases well:

  • Only extracts from ASSISTANT messages (user queries don't need summarization)
  • Truncates individual snippets to 100 chars
  • Falls back to a generic archival message when no suitable content exists
  • Final summary capped at _MAX_SUMMARY_CHARS
src/synthorg/observability/events/context_budget.py (1)

1-16: LGTM!

Well-structured event constants following the project's observability conventions:

  • Final[str] typing for immutability
  • Consistent dot-separated naming (context_budget.<category>.<action>)
  • Comprehensive coverage of fill tracking, compaction lifecycle, and indicator injection
src/synthorg/engine/plan_execute_loop.py (3)

17-19: LGTM!

Clean integration following the same pattern as ReactLoop. The compaction_callback parameter is properly documented and stored.

Also applies to: 76-76, 102-120


137-140: LGTM!

Property accessor follows the established pattern.


692-699: LGTM!

Compaction correctly invoked at turn boundaries within step execution. The placement before stagnation detection differs slightly from ReactLoop (which does it after), but this ordering is acceptable—both approaches process the complete turn before compacting.

src/synthorg/engine/context_budget.py (4)

1-27: LGTM!

Clean module setup with appropriate observability infrastructure. The _TOOL_DEFINITION_TOKEN_OVERHEAD constant (50 tokens per tool) is a reasonable heuristic for tool definition serialization overhead.


29-80: LGTM!

Well-designed frozen model with:

  • Proper @computed_field for fill_percent (avoids storing redundant data)
  • Null-safe percentage calculation
  • Human-readable format() output with comma-separated numbers and percentage display
  • Handles unknown capacity gracefully in formatting

83-109: LGTM!

Factory correctly constructs the indicator from context fields and logs at DEBUG level with structured kwargs.


112-138: LGTM!

Clean estimation function with keyword-only arguments for clarity. The formula correctly sums system prompt tokens, conversation tokens, and tool definition overhead.

src/synthorg/engine/context.py (1)

176-188: Nice use of @computed_field for context_fill_percent.

Keeping the percentage derived from the authoritative token counts/capacity avoids stale state as the context evolves. Based on learnings "Use computed_field for derived values instead of storing + validating redundant fields."

src/synthorg/engine/prompt.py (1)

705-758: Nice: the budget indicator is threaded through both render and trim paths.

That keeps estimated_tokens aligned with the final prompt after trimming.

Comment on lines +155 to +176
def estimate_conversation_tokens(
self,
messages: tuple[ChatMessage, ...],
) -> int:
"""Estimate total tokens across all messages.

Sums ``len(content) // 4 + overhead`` per message.
Tool results use their content field for estimation.

Args:
messages: The conversation messages to estimate.

Returns:
Estimated total token count (minimum 0).
"""
total = 0
for msg in messages:
content = msg.content or ""
if msg.tool_result is not None:
content = msg.tool_result.content or ""
total += len(content) // 4 + self._PER_MESSAGE_OVERHEAD
return total
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the ChatMessage model definition. If it exposes token-bearing fields
# beyond `content` / `tool_result`, fold them into estimate_conversation_tokens().
rg -n -A120 -B5 --type=py '^class ChatMessage\b' .

Repository: Aureliolo/synthorg

Length of output: 9326


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find ToolCall definition to see what fields it contains
rg -n -A30 -B2 '^class ToolCall\b' src/synthorg/providers/models.py

Repository: Aureliolo/synthorg

Length of output: 1242


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for tests of estimate_conversation_tokens
rg -n "estimate_conversation_tokens" --type=py .

# Also check if there are any comments in the prompt.py file about tool_calls estimation
echo "---"
rg -n -B5 -A5 "tool_call" src/synthorg/engine/prompt.py

Repository: Aureliolo/synthorg

Length of output: 1364


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Look at the main test for estimate_conversation_tokens
sed -n '172,220p' tests/unit/engine/test_context_budget.py

echo "---"

# Check if any test creates ChatMessage with tool_calls
rg -n "tool_calls=" tests/unit/engine/test_context_budget.py -A3 -B3

Repository: Aureliolo/synthorg

Length of output: 1018


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check all tests for messages with tool_calls
rg -n "tool_calls" tests/unit/engine/test_context_budget.py -A3 -B3

echo "---"

# Check if there's a test for tool_calls in the test_prompt.py mock
rg -n "tool_calls" tests/unit/engine/test_prompt.py -A3 -B3

Repository: Aureliolo/synthorg

Length of output: 44


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# View the full estimate_conversation_tokens implementation and surrounding context
sed -n '150,180p' src/synthorg/engine/prompt.py

echo "---"

# Check if there are any comments in the class about limitations
sed -n '85,160p' src/synthorg/engine/prompt.py | grep -A10 -B10 "DefaultTokenEstimator"

Repository: Aureliolo/synthorg

Length of output: 1518


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Look for any other token estimation for tool_calls elsewhere in the codebase
rg -n "tool_calls" src/synthorg/engine/ --type=py

echo "---"

# Check if there's any serialization/JSON encoding of tool_calls to understand token impact
rg -n "json" src/synthorg/engine/prompt.py -i -A2 -B2

Repository: Aureliolo/synthorg

Length of output: 3404


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check if there are any assistant messages with tool_calls created in tests
rg -n "MessageRole.ASSISTANT" tests/unit/engine/test_context_budget.py -A5 -B2 | head -60

echo "---"

# Check the broader test structure to see if tool_calls scenarios are tested anywhere
rg -n "ToolCall" tests/unit/engine/ -l

Repository: Aureliolo/synthorg

Length of output: 1004


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check if tool_calls are intentionally omitted with any comments
rg -n "tool_call" src/synthorg/engine/prompt.py -B3 -A3

# Verify tool_calls do get sent to LLM providers (proof they consume tokens)
rg -n "tool_calls" src/synthorg/providers/ --type=py | head -20

Repository: Aureliolo/synthorg

Length of output: 44


🏁 Script executed:

#!/bin/bash
set -euo pipefochk

# Check how tool_calls are serialized/sent to providers
rg -n "to_dict\|model_dump\|serialize" src/synthorg/providers/ --type=py | grep -i tool

echo "---"

# Check if tool_calls on assistant messages actually get sent to LLM providers
rg -n "tool_calls" src/synthorg/providers/anthropic_provider.py -B2 -A2

echo "---"

# Verify completion request structure includes tool_calls
rg -n "class.*Request" src/synthorg/providers/models.py | head -5

Repository: Aureliolo/synthorg

Length of output: 116


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check how tool_calls are serialized/sent to providers
rg -n "to_dict\|model_dump\|serialize" src/synthorg/providers/ --type=py | grep -i tool

echo "---"

# Check if tool_calls on assistant messages actually get sent to LLM providers
rg -n "tool_calls" src/synthorg/providers/anthropic_provider.py -B2 -A2

echo "---"

# Verify completion request structure includes tool_calls
rg -n "class.*Request" src/synthorg/providers/models.py | head -5

Repository: Aureliolo/synthorg

Length of output: 44


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find CompletionRequest definition to see if it includes messages with tool_calls
rg -n "class CompletionRequest" src/synthorg/providers/models.py -A30

echo "---"

# Check an actual provider implementation to see how messages are used
fd "openai_provider.py" --type f src/ 2>/dev/null | head -1 | xargs -I {} sed -n '1,100p' {}

Repository: Aureliolo/synthorg

Length of output: 44


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Simple check: look for CompletionRequest definition
rg -n "^class CompletionRequest" src/synthorg/ --type=py -A20

Repository: Aureliolo/synthorg

Length of output: 44


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for any TODOs, FIXMEs, or limitations mentioning tool_calls in token estimation
rg -n "TODO\|FIXME\|limitation\|XXX" src/synthorg/engine/prompt.py -i -A2 -B2

echo "---"

# Check the entire DefaultTokenEstimator class definition
sed -n '85,210p' src/synthorg/engine/prompt.py

Repository: Aureliolo/synthorg

Length of output: 44


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Read the entire prompt.py file to see the full context
wc -l src/synthorg/engine/prompt.py

echo "---"

# Read the DefaultTokenEstimator class
head -250 src/synthorg/engine/prompt.py | tail -150

Repository: Aureliolo/synthorg

Length of output: 5094


Include tool_calls in token estimation for ASSISTANT messages.

The estimator currently ignores msg.tool_calls, which are token-bearing on ASSISTANT messages (id, name, arguments dict). When ASSISTANT messages include tool calls, the conversation token count is undercounted, causing context compaction to trigger later than planned. Update estimate_conversation_tokens() to sum token estimates for each ToolCall in the tuple alongside content and tool_result.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/prompt.py` around lines 155 - 176, The estimator in
estimate_conversation_tokens currently omits msg.tool_calls, undercounting
tokens for ASSISTANT messages with tool invocations; update
estimate_conversation_tokens to, for each message that has a non-empty
msg.tool_calls tuple, iterate its ToolCall entries and add an estimated token
count for each (include the ToolCall.id, ToolCall.name and a serialized form of
ToolCall.arguments), e.g. serialize arguments (json.dumps or similar), measure
lengths and convert to token estimates (length // 4) and add those to total (you
may also include per-call overhead similar to self._PER_MESSAGE_OVERHEAD); use
the existing ChatMessage, ToolCall attributes (msg.tool_calls, msg.tool_result,
msg.content) to locate where to add this logic.

Comment on lines +13 to +14
@pytest.mark.unit
class TestCompactionConfig:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Add pytest.mark.timeout(30) to comply with test guidelines.

Per coding guidelines, tests should have a 30-second timeout. Consider adding pytestmark at the module level for consistency with other test files.

💡 Suggested fix
+pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]
+
+
-@pytest.mark.unit
 class TestCompactionConfig:

Then remove individual @pytest.mark.unit decorators from each class.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@pytest.mark.unit
class TestCompactionConfig:
pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]
class TestCompactionConfig:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/compaction/test_models.py` around lines 13 - 14, The test
module currently uses per-class decorators like `@pytest.mark.unit` (e.g.,
TestCompactionConfig); add a module-level pytestmark = [pytest.mark.unit,
pytest.mark.timeout(30)] to enforce the 30s timeout per guidelines and then
remove individual `@pytest.mark.unit` decorators from test classes such as
TestCompactionConfig so the module-level marks apply consistently.

Comment on lines +13 to +14
@pytest.mark.unit
class TestContextBudgetIndicatorProperties:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Add pytest.mark.timeout(30) for test timeout compliance.

Per coding guidelines, tests should have a 30-second timeout.

💡 Suggested fix
+pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]
+
+
-@pytest.mark.unit
 class TestContextBudgetIndicatorProperties:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_context_budget_properties.py` around lines 13 - 14,
Add a 30-second timeout decorator to the test class by applying
pytest.mark.timeout(30) to TestContextBudgetIndicatorProperties; ensure pytest
is imported and annotate the class with `@pytest.mark.timeout`(30) so all tests in
the class inherit the timeout requirement.

Comment on lines +20 to +21
@pytest.mark.unit
class TestContextBudgetIndicator:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Add pytest.mark.timeout(30) for test timeout compliance.

Per coding guidelines, tests should have a 30-second timeout. Consider using pytestmark at module level.

💡 Suggested fix
+pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]
+
+
-@pytest.mark.unit
 class TestContextBudgetIndicator:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_context_budget.py` around lines 20 - 21, Add a
30-second pytest timeout to the test module by declaring a module-level pytest
mark (e.g., pytestmark = pytest.mark.timeout(30)) so all tests in this file,
including the TestContextBudgetIndicator class, inherit the timeout; place the
pytestmark declaration near the top of the file alongside existing pytest
imports to satisfy the coding guideline.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/engine/plan_execute_loop.py (1)

684-698: ⚠️ Potential issue | 🟠 Major

Compaction is skipped on step-completion turns.

When _run_step_turn() returns (ctx, bool) (Line 686), _execute_step() returns immediately and never reaches the compaction block (Lines 690-697). This means compaction only runs on continuing turns, not on terminal turns of a step.

♻️ Proposed fix
         while ctx.has_turns_remaining:
             result = await self._run_step_turn(
                 ctx,
                 provider,
                 executor_model,
                 config,
                 tool_defs,
                 tool_invoker,
                 turns,
                 budget_checker,
                 shutdown_checker,
             )
             if isinstance(result, ExecutionResult):
                 return result
             if isinstance(result, tuple):
-                return result
-            ctx = result
+                ctx, step_ok = result
+                compacted = await invoke_compaction(
+                    ctx,
+                    self._compaction_callback,
+                    ctx.turn_count,
+                )
+                if compacted is not None:
+                    ctx = compacted
+                return ctx, step_ok
+            ctx = result
 
             # Context compaction at turn boundaries
             compacted = await invoke_compaction(
                 ctx,
                 self._compaction_callback,
                 ctx.turn_count,
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/plan_execute_loop.py` around lines 684 - 698, When
_execute_step() receives a tuple result from _run_step_turn() (i.e., (ctx,
bool)), it currently returns immediately and skips context compaction; modify
_execute_step() so it extracts ctx from the tuple result, calls
invoke_compaction(ctx, self._compaction_callback, ctx.turn_count), updates ctx
if compaction returns a new context, and then returns the original tuple with
the potentially compacted ctx and the original boolean flag. Make this change in
the branch that handles tuple results so invoke_compaction is applied to the ctx
before returning, using the same invoke_compaction, _compaction_callback, and
ctx.turn_count symbols already present.
♻️ Duplicate comments (4)
tests/unit/engine/test_context_budget_properties.py (1)

13-14: 🧹 Nitpick | 🔵 Trivial

Add pytest.mark.timeout(30) for test timeout compliance.

Per coding guidelines, tests should have a 30-second timeout. Consider using pytestmark at module level to apply both markers.

💡 Suggested fix
+pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]
+
+
-@pytest.mark.unit
 class TestContextBudgetIndicatorProperties:
     """Hypothesis property tests for ContextBudgetIndicator."""

Then remove the individual @pytest.mark.unit decorators from both classes.

Also applies to: 69-70

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_context_budget_properties.py` around lines 13 - 14,
Add a module-level pytestmark to apply both the unit marker and a 30-second
timeout by defining pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)] at
the top of the test module, then remove the per-class decorator(s) such as
`@pytest.mark.unit` from TestContextBudgetIndicatorProperties (and the other test
class referenced) so the markers are applied consistently via pytestmark.
tests/unit/engine/test_context_budget.py (2)

107-114: 🧹 Nitpick | 🔵 Trivial

Import _TOOL_DEFINITION_TOKEN_OVERHEAD instead of hardcoding 150.

The assertion hardcodes 100 + 150 where 150 = 3 × 50 (tool overhead). If _TOOL_DEFINITION_TOKEN_OVERHEAD changes, this test will fail with a confusing message. Import the constant to make the relationship explicit.

♻️ Proposed fix
 from synthorg.engine.context_budget import (
     ContextBudgetIndicator,
+    _TOOL_DEFINITION_TOKEN_OVERHEAD,
     estimate_context_fill,
     make_context_indicator,
     update_context_fill,
 )
     def test_with_tools(self) -> None:
         result = estimate_context_fill(
             system_prompt_tokens=100,
             conversation=(),
             tool_definitions_count=3,
         )
-        # 3 tools * 50 overhead = 150
-        assert result == 100 + 150
+        # 3 tools * overhead
+        assert result == 100 + 3 * _TOOL_DEFINITION_TOKEN_OVERHEAD
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_context_budget.py` around lines 107 - 114, The test
test_with_tools currently hardcodes the tool overhead (100 + 150) instead of
using the constant; update the test to import and use
_TOOL_DEFINITION_TOKEN_OVERHEAD and compute expected as system_prompt_tokens +
tool_definitions_count * _TOOL_DEFINITION_TOKEN_OVERHEAD (referencing the test
function test_with_tools and the constant _TOOL_DEFINITION_TOKEN_OVERHEAD) so
the assertion becomes dynamic and will remain correct if the overhead value
changes.

22-23: 🧹 Nitpick | 🔵 Trivial

Add pytest.mark.timeout(30) for test timeout compliance.

Per coding guidelines, tests should have a 30-second timeout.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_context_budget.py` around lines 22 - 23, The test
class TestContextBudgetIndicator is missing a timeout decorator; add
pytest.mark.timeout(30) as a decorator on the TestContextBudgetIndicator class
(above its declaration) so all tests in that class inherit the 30-second timeout
per guidelines; ensure you import pytest at top if not already present and apply
the decorator directly to the class definition.
src/synthorg/engine/prompt.py (1)

36-39: ⚠️ Potential issue | 🟠 Major

Imported estimator still undercounts assistant tool_calls payloads.

build_system_prompt() now relies on DefaultTokenEstimator from synthorg.engine.token_estimation, whose conversation estimator currently counts content/tool_result but not tool_calls. Prompt size is therefore under-estimated when assistant tool calls are present, which delays trimming.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/prompt.py` around lines 36 - 39, build_system_prompt()
switched to DefaultTokenEstimator which currently omits assistant tool_calls
payloads when estimating conversation tokens; update the estimator used or its
implementation so tool_calls are counted: either revert to PromptTokenEstimator
in build_system_prompt() or modify DefaultTokenEstimator (and its conversation
estimator logic) to include assistant.tool_calls (and nested tool_result
contents) in the token count. Locate the estimator classes in
synthorg.engine.token_estimation (DefaultTokenEstimator, PromptTokenEstimator)
and the build_system_prompt function, and ensure the conversation tokenization
logic accounts for the assistant message field "tool_calls" the same way it
handles "content" and "tool_result".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 185-190: The code in _build_summary currently emits
CONTEXT_BUDGET_COMPACTION_SKIPPED when no assistant snippets are found but still
returns a fallback summary and allows compaction to proceed, which mislabels the
event; update the logging so that CONTEXT_BUDGET_COMPACTION_SKIPPED is only used
when compaction truly is skipped, and for the "no assistant content" path either
remove that skipped log or replace it with a distinct event (e.g.,
CONTEXT_BUDGET_COMPACTION_FALLBACK or CONTEXT_BUDGET_NO_ASSISTANT_CONTENT) and
keep the fallback return behavior; modify the logger.debug call in
_build_summary (and related constants if needed) so metrics/alerts reflect the
actual outcome.
- Around line 101-126: The code incorrectly assumes conversation[0] is a system
message; instead, scan conversation to find the first message with role
MessageRole.SYSTEM (e.g., index_system = next((i for i,m in
enumerate(conversation) if m.role == MessageRole.SYSTEM), None)). If a system
message exists, keep it (use that index to build archivable =
conversation[index_system+1:-preserve_count] and recent =
conversation[-preserve_count:]) and place the summary after the existing system
message; if no system message is found, treat all but the preserved recent turns
as archivable (archivable = conversation[:-preserve_count]) and prepend the
generated summary_msg (role=MessageRole.SYSTEM) before recent turns. Update
construction of compressed_conversation accordingly and ensure _build_summary,
summary_msg, summary_tokens and estimator usage remain the same.

In `@src/synthorg/engine/context_budget.py`:
- Around line 121-147: The estimate_context_fill function should validate
numeric inputs at its boundary: check that system_prompt_tokens and
tool_definitions_count are non-negative before computing the sum (and optionally
validate estimator is not an invalid type); if either is negative, raise a
ValueError with a clear message; keep the existing logic (use estimator or
DefaultTokenEstimator, call estimate_conversation_tokens, and apply
_TOOL_DEFINITION_TOKEN_OVERHEAD) and reference the same symbols
(estimate_context_fill, DefaultTokenEstimator, estimator,
_TOOL_DEFINITION_TOKEN_OVERHEAD, conversation) so callers receive immediate,
deterministic errors for invalid inputs.

In `@tests/unit/engine/test_context_budget.py`:
- Around line 204-206: The test method test_capacity_tokens_zero_rejected is
testing ContextBudgetIndicator validation but is currently placed with
DefaultTokenEstimator tests; move the test into the TestContextBudgetIndicator
test class so it lives alongside other ContextBudgetIndicator tests, keeping the
same body (with pytest.raises(ValidationError):
ContextBudgetIndicator(fill_tokens=100, capacity_tokens=0)) and adjusting
indentation/placement so it becomes a method of TestContextBudgetIndicator and
is removed from the DefaultTokenEstimator class.

---

Outside diff comments:
In `@src/synthorg/engine/plan_execute_loop.py`:
- Around line 684-698: When _execute_step() receives a tuple result from
_run_step_turn() (i.e., (ctx, bool)), it currently returns immediately and skips
context compaction; modify _execute_step() so it extracts ctx from the tuple
result, calls invoke_compaction(ctx, self._compaction_callback, ctx.turn_count),
updates ctx if compaction returns a new context, and then returns the original
tuple with the potentially compacted ctx and the original boolean flag. Make
this change in the branch that handles tuple results so invoke_compaction is
applied to the ctx before returning, using the same invoke_compaction,
_compaction_callback, and ctx.turn_count symbols already present.

---

Duplicate comments:
In `@src/synthorg/engine/prompt.py`:
- Around line 36-39: build_system_prompt() switched to DefaultTokenEstimator
which currently omits assistant tool_calls payloads when estimating conversation
tokens; update the estimator used or its implementation so tool_calls are
counted: either revert to PromptTokenEstimator in build_system_prompt() or
modify DefaultTokenEstimator (and its conversation estimator logic) to include
assistant.tool_calls (and nested tool_result contents) in the token count.
Locate the estimator classes in synthorg.engine.token_estimation
(DefaultTokenEstimator, PromptTokenEstimator) and the build_system_prompt
function, and ensure the conversation tokenization logic accounts for the
assistant message field "tool_calls" the same way it handles "content" and
"tool_result".

In `@tests/unit/engine/test_context_budget_properties.py`:
- Around line 13-14: Add a module-level pytestmark to apply both the unit marker
and a 30-second timeout by defining pytestmark = [pytest.mark.unit,
pytest.mark.timeout(30)] at the top of the test module, then remove the
per-class decorator(s) such as `@pytest.mark.unit` from
TestContextBudgetIndicatorProperties (and the other test class referenced) so
the markers are applied consistently via pytestmark.

In `@tests/unit/engine/test_context_budget.py`:
- Around line 107-114: The test test_with_tools currently hardcodes the tool
overhead (100 + 150) instead of using the constant; update the test to import
and use _TOOL_DEFINITION_TOKEN_OVERHEAD and compute expected as
system_prompt_tokens + tool_definitions_count * _TOOL_DEFINITION_TOKEN_OVERHEAD
(referencing the test function test_with_tools and the constant
_TOOL_DEFINITION_TOKEN_OVERHEAD) so the assertion becomes dynamic and will
remain correct if the overhead value changes.
- Around line 22-23: The test class TestContextBudgetIndicator is missing a
timeout decorator; add pytest.mark.timeout(30) as a decorator on the
TestContextBudgetIndicator class (above its declaration) so all tests in that
class inherit the 30-second timeout per guidelines; ensure you import pytest at
top if not already present and apply the decorator directly to the class
definition.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a8ac89dc-7fcb-4e10-ba46-bd9f8dc30786

📥 Commits

Reviewing files that changed from the base of the PR and between 8c7224a and bf10489.

📒 Files selected for processing (20)
  • CLAUDE.md
  • docs/design/engine.md
  • docs/roadmap/open-questions.md
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/compaction/__init__.py
  • src/synthorg/engine/compaction/models.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/prompt.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/token_estimation.py
  • src/synthorg/observability/events/context_budget.py
  • tests/unit/engine/compaction/test_models.py
  • tests/unit/engine/compaction/test_summarizer.py
  • tests/unit/engine/test_context_budget.py
  • tests/unit/engine/test_context_budget_properties.py
  • tests/unit/engine/test_prompt.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Deploy Preview
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Web
  • GitHub Check: Build Backend
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) — PEP 758 except syntax enforced by ruff on Python 3.14.
Add type hints to all public functions; enforce mypy strict mode compliance.
Use Google-style docstrings on all public classes and functions; enforce via ruff D rules.
Keep functions under 50 lines and files under 800 lines.
Handle errors explicitly and never silently swallow exceptions.
Validate at system boundaries (user input, external APIs, config files).
Set line length to 88 characters (ruff enforces).
Use uv for Python dependency management. Install with uv sync, run tests with -n auto for parallelism, lint with ruff check, format with ruff format, type-check with mypy.
Scripts in scripts/ have relaxed ruff rules: print and deferred imports are allowed.

Files:

  • tests/unit/engine/test_context_budget_properties.py
  • src/synthorg/engine/checkpoint/resume.py
  • tests/unit/engine/compaction/test_models.py
  • tests/unit/engine/compaction/test_summarizer.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/plan_execute_loop.py
  • tests/unit/engine/test_context_budget.py
  • tests/unit/engine/test_prompt.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/prompt.py
  • src/synthorg/engine/compaction/models.py
  • src/synthorg/engine/token_estimation.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/compaction/__init__.py
  • src/synthorg/engine/react_loop.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow. Maintain 80% minimum code coverage.
Use asyncio_mode = 'auto' for async tests — no manual @pytest.mark.asyncio needed. Set test timeout to 30 seconds per test. Always include -n auto for parallelism via pytest-xdist.
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned test code, docstrings, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, test-provider, test-small-001, etc. Vendor names may only appear in Operations design page, .claude/ files, or third-party import paths.
Use Hypothesis for property-based testing with @given + @settings. Profiles: ci (200 examples, default) and dev (1000 examples) via HYPOTHESIS_PROFILE env var. Never skip, dismiss, or ignore flaky tests — fix them fully. Mock time.monotonic() and asyncio.sleep() for deterministic timing-sensitive tests instead of widening margins.

Files:

  • tests/unit/engine/test_context_budget_properties.py
  • tests/unit/engine/compaction/test_models.py
  • tests/unit/engine/compaction/test_summarizer.py
  • tests/unit/engine/test_context_budget.py
  • tests/unit/engine/test_prompt.py
**/*.{md,py,ts,tsx,vue,go,yml,yaml,json,toml}

📄 CodeRabbit inference engine (CLAUDE.md)

Use conventional commits: <type>: <description> where types are feat, fix, refactor, docs, test, chore, perf, ci. Commits must be GPG/SSH signed on main branch.

Files:

  • tests/unit/engine/test_context_budget_properties.py
  • docs/roadmap/open-questions.md
  • src/synthorg/engine/checkpoint/resume.py
  • tests/unit/engine/compaction/test_models.py
  • tests/unit/engine/compaction/test_summarizer.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/plan_execute_loop.py
  • tests/unit/engine/test_context_budget.py
  • tests/unit/engine/test_prompt.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/prompt.py
  • CLAUDE.md
  • src/synthorg/engine/compaction/models.py
  • src/synthorg/engine/token_estimation.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/compaction/__init__.py
  • src/synthorg/engine/react_loop.py
  • docs/design/engine.md
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Read relevant docs/design/ page before implementing features or planning issues. Design spec is the starting point for architecture, data models, and behavior.

Files:

  • docs/roadmap/open-questions.md
  • CLAUDE.md
  • docs/design/engine.md
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Documentation in Markdown built with Zensical. Design spec: 7 pages under docs/design/ (index, agents, organization, communication, engine, memory, operations). Architecture: docs/architecture/. Roadmap: docs/roadmap/. Security: docs/security.md. Licensing: docs/licensing.md. Reference: docs/reference/. REST API: docs/rest-api.md with generated Scalar UI. Library reference: docs/api/ (auto-generated from docstrings via mkdocstrings + Griffe).

Files:

  • docs/roadmap/open-questions.md
  • docs/design/engine.md
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Create new objects instead of mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping. For dict/list fields in frozen Pydantic models, rely on frozen=True and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Adopt Pydantic v2 conventions: use @computed_field for derived values instead of storing redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Every module with business logic must have: from synthorg.observability import get_logger then logger = get_logger(__name__). Never use import logging, logging.getLogger(), or print() in application code.
Always use logger as the variable name (not _logger, not log). Use event name constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Always use structured kwargs for logging: logger.info(EVENT, key=value) — never use string interpolation like logger.info('msg %s', val).
Log all error paths at WARNING or ERROR with context before raising. Log all state transitions at INFO. Log object creation, internal flow, and entry/exit of key functions at DEBUG.
Do NOT use import logging or logging.getLogger(). Pure data models, ...

Files:

  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/prompt.py
  • src/synthorg/engine/compaction/models.py
  • src/synthorg/engine/token_estimation.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/compaction/__init__.py
  • src/synthorg/engine/react_loop.py
src/synthorg/engine/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Agent orchestration with TaskEngine (centralized single-writer task state), task lifecycle, recovery, shutdown, workspace isolation. Coordination: multi-agent pipeline via TopologyDispatcher protocol (4 dispatchers: SAS/centralized/decentralized/context-dependent), wave execution, workspace lifecycle, CoordinationSectionConfig bridge, build_coordinator factory. Approval gate: escalation detection, context parking/resume, EscalationInfo/ResumePayload. Stagnation detection: StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection. Checkpoint recovery: per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy. AgentRuntimeState: lightweight per-agent execution status for dashboard + recovery.

Files:

  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/prompt.py
  • src/synthorg/engine/compaction/models.py
  • src/synthorg/engine/token_estimation.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/compaction/__init__.py
  • src/synthorg/engine/react_loop.py
🧠 Learnings (28)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/engine/**/*.py : Agent orchestration with TaskEngine (centralized single-writer task state), task lifecycle, recovery, shutdown, workspace isolation. Coordination: multi-agent pipeline via TopologyDispatcher protocol (4 dispatchers: SAS/centralized/decentralized/context-dependent), wave execution, workspace lifecycle, CoordinationSectionConfig bridge, build_coordinator factory. Approval gate: escalation detection, context parking/resume, EscalationInfo/ResumePayload. Stagnation detection: StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection. Checkpoint recovery: per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy. AgentRuntimeState: lightweight per-agent execution status for dashboard + recovery.
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.

Applied to files:

  • tests/unit/engine/test_context_budget_properties.py
  • tests/unit/engine/compaction/test_models.py
  • tests/unit/engine/test_context_budget.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to tests/**/*.py : Use `asyncio_mode = 'auto'` for async tests — no manual `pytest.mark.asyncio` needed. Set test timeout to 30 seconds per test. Always include `-n auto` for parallelism via pytest-xdist.

Applied to files:

  • tests/unit/engine/test_context_budget_properties.py
  • tests/unit/engine/compaction/test_models.py
  • tests/unit/engine/test_context_budget.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).

Applied to files:

  • tests/unit/engine/test_context_budget_properties.py
  • tests/unit/engine/compaction/test_models.py
  • tests/unit/engine/test_context_budget.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to tests/**/*.py : Mark tests with `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, or `pytest.mark.slow`. Maintain 80% minimum code coverage.

Applied to files:

  • tests/unit/engine/test_context_budget_properties.py
  • tests/unit/engine/compaction/test_models.py
  • tests/unit/engine/test_context_budget.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

Applied to files:

  • tests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing with `given` + `settings`. Profiles: `ci` (200 examples, default) and `dev` (1000 examples) via `HYPOTHESIS_PROFILE` env var. Never skip, dismiss, or ignore flaky tests — fix them fully. Mock `time.monotonic()` and `asyncio.sleep()` for deterministic timing-sensitive tests instead of widening margins.

Applied to files:

  • tests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/engine/**/*.py : Agent orchestration with TaskEngine (centralized single-writer task state), task lifecycle, recovery, shutdown, workspace isolation. Coordination: multi-agent pipeline via TopologyDispatcher protocol (4 dispatchers: SAS/centralized/decentralized/context-dependent), wave execution, workspace lifecycle, CoordinationSectionConfig bridge, build_coordinator factory. Approval gate: escalation detection, context parking/resume, EscalationInfo/ResumePayload. Stagnation detection: StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection. Checkpoint recovery: per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy. AgentRuntimeState: lightweight per-agent execution status for dashboard + recovery.

Applied to files:

  • docs/roadmap/open-questions.md
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/plan_execute_loop.py
  • CLAUDE.md
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/react_loop.py
  • docs/design/engine.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/hr/**/*.py : HR engine: hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration sampling, collaboration overrides, trend detection), promotion/demotion (criteria evaluation, approval strategies, model mapping).

Applied to files:

  • docs/roadmap/open-questions.md
  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/context_budget.py
  • CLAUDE.md
  • src/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Always use `logger` as the variable name (not `_logger`, not `log`). Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/context_budget.py
  • CLAUDE.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/budget/**/*.py : Cost tracking with pre-flight/in-flight budget checks, auto-downgrade, billing periods, cost tiers, quota/subscription tracking. CFO cost optimization: anomaly detection, efficiency analysis, downgrade recommendations, approval decisions. Budget errors: `BudgetExhaustedError`, `DailyLimitExceededError`, `QuotaExhaustedError`.

Applied to files:

  • src/synthorg/observability/events/context_budget.py
  • tests/unit/engine/test_context_budget.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/loop_helpers.py
  • docs/design/engine.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR with context before raising. Log all state transitions at INFO. Log object creation, internal flow, and entry/exit of key functions at DEBUG.

Applied to files:

  • src/synthorg/observability/events/context_budget.py
  • CLAUDE.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/tools/**/*.py : Tool registry with built-in tools: file_system/, git (SSRF prevention via git_url_validator), sandbox/, code_runner. MCP bridge (mcp/). Role-based access. Approval tool (request_human_approval).

Applied to files:

  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/react_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/security/**/*.py : SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).

Applied to files:

  • src/synthorg/engine/plan_execute_loop.py
  • CLAUDE.md
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/react_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to docs/**/*.md : Documentation in Markdown built with Zensical. Design spec: 7 pages under `docs/design/` (index, agents, organization, communication, engine, memory, operations). Architecture: `docs/architecture/`. Roadmap: `docs/roadmap/`. Security: `docs/security.md`. Licensing: `docs/licensing.md`. Reference: `docs/reference/`. REST API: `docs/rest-api.md` with generated Scalar UI. Library reference: `docs/api/` (auto-generated from docstrings via mkdocstrings + Griffe).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking to the 7 design pages (index, agents, organization, communication, engine, memory, operations).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/memory/**/*.py : Pluggable MemoryBackend protocol with backends/ adapters (Mem0 at backends/mem0/). Retrieval pipeline: ranking, RRF fusion, injection, context formatting, non-inferable filtering. Shared org memory (org/). Consolidation/archival (consolidation/).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Always use structured kwargs for logging: `logger.info(EVENT, key=value)` — never use string interpolation like `logger.info('msg %s', val)`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging`, `logging.getLogger()`, or `print()` in application code.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Do NOT use `import logging` or `logging.getLogger()`. Pure data models, enums, and re-exports do NOT need logging setup.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/api/**/*.py : Litestar REST + WebSocket API with JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint. Use RFC 9457 structured errors: `ErrorCategory`, `ErrorCode`, `ErrorDetail`, `ProblemDetail`, `CATEGORY_TITLES`, `category_title`, `category_type_uri`, content negotiation.

Applied to files:

  • src/synthorg/engine/loop_helpers.py
🧬 Code graph analysis (13)
tests/unit/engine/test_context_budget_properties.py (2)
src/synthorg/engine/context_budget.py (3)
  • ContextBudgetIndicator (32-84)
  • format (66-84)
  • fill_percent (60-64)
src/synthorg/engine/token_estimation.py (3)
  • DefaultTokenEstimator (48-90)
  • estimate_conversation_tokens (33-45)
  • estimate_conversation_tokens (69-90)
src/synthorg/engine/checkpoint/resume.py (3)
src/synthorg/engine/plan_execute_loop.py (4)
  • compaction_callback (136-138)
  • config (121-123)
  • approval_gate (126-128)
  • stagnation_detector (131-133)
src/synthorg/engine/react_loop.py (3)
  • compaction_callback (109-111)
  • approval_gate (99-101)
  • stagnation_detector (104-106)
src/synthorg/engine/stagnation/detector.py (1)
  • config (46-48)
tests/unit/engine/compaction/test_models.py (1)
src/synthorg/engine/compaction/models.py (2)
  • CompactionConfig (10-39)
  • CompressionMetadata (42-73)
tests/unit/engine/compaction/test_summarizer.py (3)
src/synthorg/engine/compaction/models.py (1)
  • CompactionConfig (10-39)
src/synthorg/engine/compaction/summarizer.py (1)
  • make_compaction_callback (37-61)
src/synthorg/engine/context.py (2)
  • AgentContext (104-400)
  • from_identity (190-225)
src/synthorg/engine/__init__.py (1)
src/synthorg/engine/token_estimation.py (2)
  • DefaultTokenEstimator (48-90)
  • PromptTokenEstimator (15-45)
src/synthorg/engine/compaction/summarizer.py (3)
src/synthorg/engine/compaction/models.py (2)
  • CompactionConfig (10-39)
  • CompressionMetadata (42-73)
src/synthorg/engine/token_estimation.py (6)
  • DefaultTokenEstimator (48-90)
  • PromptTokenEstimator (15-45)
  • estimate_tokens (22-31)
  • estimate_tokens (58-67)
  • estimate_conversation_tokens (33-45)
  • estimate_conversation_tokens (69-90)
src/synthorg/engine/context.py (3)
  • AgentContext (104-400)
  • context_fill_percent (180-187)
  • with_compression (302-324)
src/synthorg/engine/plan_execute_loop.py (2)
src/synthorg/engine/loop_helpers.py (1)
  • invoke_compaction (654-689)
src/synthorg/engine/react_loop.py (1)
  • compaction_callback (109-111)
tests/unit/engine/test_context_budget.py (3)
src/synthorg/engine/context.py (6)
  • AgentContext (104-400)
  • from_identity (190-225)
  • context_fill_percent (180-187)
  • with_context_fill (289-300)
  • with_compression (302-324)
  • to_snapshot (371-395)
src/synthorg/engine/context_budget.py (6)
  • ContextBudgetIndicator (32-84)
  • estimate_context_fill (121-147)
  • make_context_indicator (87-118)
  • update_context_fill (150-186)
  • format (66-84)
  • fill_percent (60-64)
src/synthorg/engine/token_estimation.py (3)
  • DefaultTokenEstimator (48-90)
  • estimate_conversation_tokens (33-45)
  • estimate_conversation_tokens (69-90)
tests/unit/engine/test_prompt.py (1)
src/synthorg/engine/token_estimation.py (1)
  • DefaultTokenEstimator (48-90)
src/synthorg/engine/context_budget.py (2)
src/synthorg/engine/token_estimation.py (3)
  • DefaultTokenEstimator (48-90)
  • estimate_conversation_tokens (33-45)
  • estimate_conversation_tokens (69-90)
src/synthorg/engine/context.py (2)
  • AgentContext (104-400)
  • with_context_fill (289-300)
src/synthorg/engine/prompt.py (1)
src/synthorg/engine/token_estimation.py (2)
  • DefaultTokenEstimator (48-90)
  • PromptTokenEstimator (15-45)
src/synthorg/engine/compaction/__init__.py (1)
src/synthorg/engine/compaction/models.py (2)
  • CompactionConfig (10-39)
  • CompressionMetadata (42-73)
src/synthorg/engine/react_loop.py (2)
src/synthorg/engine/loop_helpers.py (1)
  • invoke_compaction (654-689)
src/synthorg/engine/plan_execute_loop.py (3)
  • compaction_callback (136-138)
  • approval_gate (126-128)
  • stagnation_detector (131-133)
🔇 Additional comments (20)
src/synthorg/observability/events/context_budget.py (1)

1-15: LGTM!

Clean, minimal event constants module following the established patterns. Correctly organized by lifecycle phase (fill tracking, compaction lifecycle, indicator injection) with appropriate Final[str] typing. Pure data module appropriately omits logging setup per coding guidelines.

src/synthorg/engine/__init__.py (1)

186-189: LGTM!

Clean internal refactoring relocating token estimation exports to the dedicated token_estimation module. Public API surface remains unchanged.

tests/unit/engine/compaction/test_models.py (1)

12-74: LGTM!

Comprehensive test coverage for CompactionConfig and CompressionMetadata validation boundaries, defaults, and immutability. Tests are well-structured and cover the critical edge cases (threshold bounds, minimum constraints, frozen model behavior).

tests/unit/engine/test_prompt.py (2)

31-31: LGTM!

Import correctly updated to reflect the token estimation module relocation.


475-479: LGTM!

CountingEstimator now correctly implements the full PromptTokenEstimator protocol by adding estimate_conversation_tokens. Returning 0 is appropriate here since this test only verifies estimate_tokens is called during prompt construction.

docs/roadmap/open-questions.md (1)

12-12: LGTM!

Clear documentation updates reflecting the partial resolution of context window management. Good practice to track remaining work (LLM-based summarization, tiktoken estimator, AgentEngine wiring) for future iterations.

Also applies to: 24-24

tests/unit/engine/compaction/test_summarizer.py (1)

39-232: LGTM!

Excellent test coverage for the compaction callback factory, including:

  • Threshold boundary behavior (below/above)
  • Edge cases (too few messages, nothing to archive, unknown capacity)
  • Multiple compaction increment tracking
  • System message preservation and summary injection

Tests are well-structured with clear setup helpers and focused assertions.

src/synthorg/engine/checkpoint/resume.py (3)

72-93: LGTM!

Well-designed compression-aware reconciliation messaging. The conditional content construction provides helpful context to the agent about prior compaction, including the number of archived turns, which aids in understanding the conversation state upon resume.


128-142: LGTM!

Clean propagation of compaction_callback when reconstructing loop instances with checkpoint callbacks. Follows the established pattern for other loop properties (approval_gate, stagnation_detector).


176-177: LGTM!

Correct use of PEP 758 except A, B: syntax for re-raising MemoryError and RecursionError. These catastrophic errors should not be masked by the best-effort cleanup logic—good defensive coding.

Also applies to: 193-194

CLAUDE.md (2)

125-125: LGTM!

Accurately documents the new context budget management and compaction capabilities added to the engine package.


199-199: LGTM!

Context budget event constants properly added to the comprehensive event naming reference list, maintaining consistency with existing domain event patterns.

src/synthorg/engine/react_loop.py (1)

36-36: LGTM! Clean integration of compaction callback following established patterns.

The compaction hook integration mirrors the existing checkpoint_callback, stagnation_detector, and approval_gate patterns:

  • TYPE_CHECKING import for the protocol type
  • Optional constructor parameter with underscore-prefixed storage
  • Property accessor for external access
  • Invocation at turn boundaries with result replacement

The placement after stagnation detection is appropriate — compaction should run after determining the loop will continue.

Also applies to: 51-51, 80-82, 91-96, 108-111, 206-213

src/synthorg/engine/compaction/__init__.py (1)

1-18: LGTM! Clean package initialization with explicit public API.

The __all__ export list correctly exposes the three primary types consumers need: CompactionCallback, CompactionConfig, and CompressionMetadata.

docs/design/engine.md (1)

587-663: LGTM! Comprehensive documentation for the context budget management feature.

The new section accurately documents:

  • Context fill tracking fields on AgentContext
  • Soft budget indicator format and injection point
  • CompactionCallback type alias and wiring pattern
  • CompactionConfig parameters with defaults
  • Compressed checkpoint recovery behavior
  • Loop-specific integration points (ReactLoop vs PlanExecuteLoop)

The documentation aligns with the implementation in the reviewed code files.

tests/unit/engine/test_context_budget_properties.py (1)

17-39: LGTM! Solid property test for format robustness.

The strategy correctly handles the capacity_tokens constraint (gt=0 in the model) by using min_value=1 in the st.integers strategy.

src/synthorg/engine/token_estimation.py (2)

14-45: LGTM! Clean protocol definition and heuristic implementation.

The PromptTokenEstimator protocol is appropriately @runtime_checkable for duck typing, and DefaultTokenEstimator provides a reasonable len(text) // 4 baseline. The docstring clearly notes this is a rough estimate suitable for swapping with a tiktoken-based estimator for precision.

Also applies to: 48-90


84-89: No action needed. The code correctly prioritizes tool_result.content over msg.content when both could exist. This is intentional and explicitly documented in the method docstring: "Tool results use their content field for estimation." Since tool_result is marked for tool messages and content is optional, this behavior aligns with the ChatMessage schema design.

src/synthorg/engine/loop_helpers.py (1)

653-689: LGTM! invoke_compaction follows established helper patterns.

The function correctly:

  • Returns early with None when no callback is configured
  • Awaits the callback and returns the compacted context on success
  • Re-raises MemoryError and RecursionError unconditionally
  • Logs other exceptions with structured kwargs and returns None (non-fatal)

This matches the error isolation pattern used by check_stagnation, check_shutdown, and check_budget.

src/synthorg/engine/compaction/models.py (1)

10-39: LGTM! Well-structured frozen Pydantic models with appropriate constraints.

Both models follow project conventions:

  • ConfigDict(frozen=True) for immutability
  • Google-style docstrings with Attributes sections
  • Field constraints that prevent invalid states (gt, ge, le)
  • Sensible defaults for CompactionConfig (80% threshold, 4 min messages, 3 preserved turns)

The CompressionMetadata correctly requires explicit values for compression_point, archived_turns, and summary_tokens (no defaults), while compactions_performed defaults to 1 for the first compaction.

Also applies to: 42-73

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
tests/unit/engine/test_context_budget.py (1)

23-24: 🧹 Nitpick | 🔵 Trivial

Consider adding module-level timeout marker.

Per coding guidelines, tests should have a 30-second timeout. A module-level pytestmark could enforce this:

pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]

However, if timeout enforcement is configured globally in pytest.ini or pyproject.toml, this may be unnecessary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_context_budget.py` around lines 23 - 24, Add a
module-level pytestmark to enforce the 30s timeout by defining pytestmark =
[pytest.mark.unit, pytest.mark.timeout(30)] at the top of the test module (where
the TestContextBudgetIndicator class is declared); alternatively, if timeouts
are already enforced globally via pytest.ini/pyproject.toml, confirm that and
skip adding the marker.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 137-143: The archived_turn_count calculation currently uses
len(archivable) // 2 which assumes perfect user+assistant pairs; instead compute
archived_turn_count by counting actual assistant-turn messages in the archivable
list to reflect real turns (e.g., count items where role/type indicates
assistant or equivalent), then construct CompressionMetadata using that count
(keep references to archived_turn_count, archivable, CompressionMetadata, and
ctx.turn_count in summarizer.py); ensure prior_archived is still added and
summary_tokens/compactions_performed remain unchanged so the compressed metadata
accurately reflects archived assistant turns.

In `@src/synthorg/engine/token_estimation.py`:
- Around line 94-99: The token estimate for tool calls currently only accounts
for tc.name and tc.arguments; include the tool call identifier as well (the
serialized id field) when computing tc_tokens so the estimate matches the
comment ("id, name, serialized arguments"). Update the calculation in the
token_estimation logic that computes tc_tokens (where tc.name, tc.arguments and
self._PER_MESSAGE_OVERHEAD are used) to also add the serialized length
contribution of tc.id (e.g., include len(str(tc.id)) // 4 or equivalent
serialization length) before adding to total.

---

Duplicate comments:
In `@tests/unit/engine/test_context_budget.py`:
- Around line 23-24: Add a module-level pytestmark to enforce the 30s timeout by
defining pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)] at the top of
the test module (where the TestContextBudgetIndicator class is declared);
alternatively, if timeouts are already enforced globally via
pytest.ini/pyproject.toml, confirm that and skip adding the marker.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f740a26f-eb5d-4a84-aed9-dfdbb2509067

📥 Commits

Reviewing files that changed from the base of the PR and between bf10489 and 86511ea.

📒 Files selected for processing (7)
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/context.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/token_estimation.py
  • tests/unit/engine/test_context_budget.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14.
Line length: 88 characters (ruff).

Files:

  • src/synthorg/engine/plan_execute_loop.py
  • tests/unit/engine/test_context_budget.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/context.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/token_estimation.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.
Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Functions: < 50 lines, files < 800 lines.
Errors: handle explicitly, never silently swallow.
Validate: at system boundaries (user input, external APIs, config files).

Files:

  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/context.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/token_estimation.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__). Never use import logging / logging.getLogger() / print() in application code.
Logging variable name: always logger (not _logger, not log).
Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Structured kwargs in logging: always logger.info(EVENT, key=value) — never logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG logging for object creation, internal flow, entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.
Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Tests must use test-provider, test-small-001, etc.

Files:

  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/context.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/token_estimation.py
src/synthorg/{providers,engine}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains. Rate limiter respects RateLimitError.retry_after from providers — automatically pauses future requests.

Files:

  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/context.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/token_estimation.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow. Coverage: 80% minimum. Async: asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed. Timeout: 30 seconds per test. Parallelism: always include -n auto when running pytest, never run tests sequentially.
Parametrize: Prefer @pytest.mark.parametrize for testing similar cases.
Property-based testing: Python uses Hypothesis (@given + @settings). Run dev profile with 1000 examples: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
Flaky tests: NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins.

Files:

  • tests/unit/engine/test_context_budget.py
🧠 Learnings (4)
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.

Applied to files:

  • tests/unit/engine/test_context_budget.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).

Applied to files:

  • tests/unit/engine/test_context_budget.py
📚 Learning: 2026-03-17T11:37:41.549Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:37:41.549Z
Learning: Applies to src/**/*.py : Models: Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `computed_field` for derived values instead of storing + validating redundant fields. Use `NotBlankStr` for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/engine/context.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/engine/context.py
🧬 Code graph analysis (5)
src/synthorg/engine/plan_execute_loop.py (2)
src/synthorg/engine/loop_helpers.py (1)
  • invoke_compaction (654-689)
src/synthorg/engine/react_loop.py (1)
  • compaction_callback (109-111)
src/synthorg/engine/compaction/summarizer.py (3)
src/synthorg/engine/compaction/models.py (2)
  • CompactionConfig (10-39)
  • CompressionMetadata (42-73)
src/synthorg/engine/token_estimation.py (6)
  • DefaultTokenEstimator (48-100)
  • PromptTokenEstimator (15-45)
  • estimate_tokens (22-31)
  • estimate_tokens (58-67)
  • estimate_conversation_tokens (33-45)
  • estimate_conversation_tokens (69-100)
src/synthorg/engine/context.py (3)
  • AgentContext (104-412)
  • context_fill_percent (180-187)
  • with_compression (308-336)
src/synthorg/engine/context.py (1)
src/synthorg/engine/compaction/models.py (1)
  • CompressionMetadata (42-73)
src/synthorg/engine/checkpoint/resume.py (2)
src/synthorg/engine/plan_execute_loop.py (1)
  • compaction_callback (136-138)
src/synthorg/engine/react_loop.py (1)
  • compaction_callback (109-111)
src/synthorg/engine/context_budget.py (2)
src/synthorg/engine/token_estimation.py (4)
  • DefaultTokenEstimator (48-100)
  • PromptTokenEstimator (15-45)
  • estimate_conversation_tokens (33-45)
  • estimate_conversation_tokens (69-100)
src/synthorg/engine/context.py (2)
  • AgentContext (104-412)
  • with_context_fill (289-306)
🔇 Additional comments (11)
src/synthorg/engine/context.py (1)

289-336: Validation for fill_tokens has been added — past concern addressed.

Both with_context_fill() and with_compression() now validate that fill_tokens >= 0 and raise ValueError if negative. The CompressionMetadata model (in compaction/models.py) uses Pydantic Field(ge=0) constraints on compression_point, archived_turns, and summary_tokens, so invalid metadata is rejected at construction time.

src/synthorg/engine/compaction/summarizer.py (2)

191-196: Misleading event name for fallback summary path.

When no assistant content is found, the code logs CONTEXT_BUDGET_COMPACTION_COMPLETED with reason="no_assistant_content_for_summary", but compaction hasn't completed yet at this point — this is called from within _build_summary() before ctx.with_compression(). Consider using a distinct event like CONTEXT_BUDGET_SUMMARY_FALLBACK or moving this log after the actual compaction completes.

This was flagged in a previous review. The event name changed from COMPACTION_SKIPPED to COMPACTION_COMPLETED, but the fundamental issue remains — logging a "completed" event during summary construction is misleading.


101-118: System message detection and preservation logic looks correct.

The code now properly checks conversation[0].role == MessageRole.SYSTEM before treating it as a system message to preserve, addressing the previous review concern. The slicing logic for archivable and recent is correct.

src/synthorg/engine/context_budget.py (2)

148-157: Input validation for non-negative token counts has been added.

estimate_context_fill() now validates that system_prompt_tokens and tool_definitions_count are non-negative, raising ValueError with descriptive messages. This addresses the previous review concern about invalid inputs.


187-196: Now logs the newly computed fill percentage.

The function correctly computes new_pct from the fresh fill value and logs it, rather than logging the stale ctx.context_fill_percent. This addresses the previous review concern.

tests/unit/engine/test_context_budget.py (1)

9-18: Past review concerns addressed: constant imported and test relocated.

The _TOOL_DEFINITION_TOKEN_OVERHEAD constant is now imported (line 10) and used in the assertion (line 118), avoiding hardcoded magic numbers. The test_capacity_tokens_zero_rejected test has been moved into TestContextBudgetIndicator (lines 72-74).

src/synthorg/engine/plan_execute_loop.py (1)

687-706: Compaction callback integration follows established patterns.

Compaction is invoked at both step-completion turns and general turn boundaries using the shared invoke_compaction() helper from loop_helpers.py. Error isolation is handled by the helper (logs but doesn't propagate non-fatal exceptions). This mirrors the ReactLoop implementation and addresses the previous review comment about extracting shared logic.

src/synthorg/engine/checkpoint/resume.py (3)

72-84: Compression-aware reconciliation message is well-constructed.

When resuming from a checkpoint with compression metadata, the reconciliation message now informs the agent about archived turns. The conditional construction with an empty string fallback keeps the message clean when no compaction occurred.


124-138: Compaction callback correctly propagated to new loop instances.

Both ReactLoop and PlanExecuteLoop instantiations now include compaction_callback=loop.compaction_callback, ensuring the compaction behavior is preserved when injecting checkpoint callbacks.


172-173: PEP 758 except syntax used for critical exceptions.

The except MemoryError, RecursionError: syntax follows PEP 758 (Python 3.14) as specified in coding guidelines. These critical errors are correctly re-raised while other exceptions are logged as warnings.

Also applies to: 189-190

src/synthorg/engine/token_estimation.py (1)

86-89: No issues found. The message role constraints (ChatMessage validation in src/synthorg/providers/models.py) guarantee that tool_result is exclusive to TOOL messages and cannot coexist with meaningful msg.content. The logic at lines 87–88 correctly prioritizes tool_result.content for TOOL messages and falls back to msg.content for other message types, preventing any token underestimation.

- Add context_fill_tokens, context_capacity_tokens, compression_metadata
  fields and context_fill_percent computed property to AgentContext
- Add with_context_fill() and with_compression() evolution methods
- Extend PromptTokenEstimator protocol with estimate_conversation_tokens()
- Create ContextBudgetIndicator model with format() for soft budget display
- Add _SECTION_CONTEXT_BUDGET to system prompt template and builder
- Create compaction subpackage: CompactionCallback type alias,
  CompactionConfig, CompactionResult, CompressionMetadata models
- Implement OldestTurnsSummarizer via make_compaction_callback() factory
- Wire compaction_callback into ReactLoop and PlanExecuteLoop constructors
  with turn-boundary invocation and error isolation
- Update checkpoint/resume.py: pass through compaction_callback in
  make_loop_with_callback(), compression-aware reconciliation messages
- Add observability events for context budget operations
- 35 new tests: unit, property-based (Hypothesis), compaction models
  and summarizer

Closes #416
- Add gt=0 constraint to ContextBudgetIndicator.capacity_tokens (ZeroDivisionError)
- Document post-compaction fill estimate approximation in summarizer
- Fix stale fill_percent in update_context_fill log entry
- Rename min_turns_to_compact to min_messages_to_compact (semantic accuracy)
- Extract _invoke_compaction to shared invoke_compaction() in loop_helpers
- Extract PromptTokenEstimator/DefaultTokenEstimator to token_estimation.py
- Remove unused CompactionResult model and CONTEXT_BUDGET_THRESHOLD_REACHED event
- Auto-derive archived_blocks from compression_metadata in make_context_indicator
- Add MemoryError/RecursionError guard to cleanup_checkpoint_artifacts
- Log fallback path in _build_summary for non-assistant messages
- Update CLAUDE.md: package structure + 6 context_budget event constants
- Add context budget management section to docs/design/engine.md
- Update docs/roadmap/open-questions.md (partially resolved)
- Add 8 missing tests: AgentContext evolution methods, fill percent,
  from_identity capacity, snapshot fields, compression indicator
…mini

- Validate fill_tokens >= 0 in with_context_fill/with_compression
- Include tool_calls in token estimation for ASSISTANT messages
- Fix archived_turns to count turns (msg_count // 2) not messages
- Guard system message role in compaction (don't assume index 0)
- Simplify reconciliation content (reduce duplication per Gemini)
- Fix COMPACTION_SKIPPED event used on non-skipped path
- Validate non-negative system_prompt_tokens/tool_definitions_count
- Fix compaction skipped on step-completion turns in PlanExecuteLoop
- Move test_capacity_tokens_zero_rejected to correct test class
- Import _TOOL_DEFINITION_TOKEN_OVERHEAD constant in tests
@Aureliolo Aureliolo force-pushed the feat/context-budget-management branch from 86511ea to eb6cda4 Compare March 17, 2026 12:43
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 17, 2026 12:44 — with GitHub Actions Inactive
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 190-196: Replace the use of CONTEXT_BUDGET_COMPACTION_COMPLETED in
the fallback branch where snippets is empty with a distinct event constant (e.g.
CONTEXT_BUDGET_COMPACTION_FALLBACK) so the logger.debug call (currently passing
reason="no_assistant_content_for_summary" and archived_count=len(messages))
emits a dedicated observable event; add the new constant (or import it if
defined elsewhere) and update the logger.debug invocation in the snippets-empty
branch to use CONTEXT_BUDGET_COMPACTION_FALLBACK while keeping the same
structured fields and the existing return value.

In `@src/synthorg/engine/prompt_template.py`:
- Around line 186-191: The DEFAULT_TEMPLATE/system prompt rendering was modified
(the context_budget block added) but PROMPT_TEMPLATE_VERSION was not updated;
update the PROMPT_TEMPLATE_VERSION constant in prompt_template.py to a new
semantically-incremented version string (e.g., "1.4.1" or following your
project's versioning policy) so that any metadata/caching/snapshot flows keyed
on PROMPT_TEMPLATE_VERSION reflect the template change; ensure the version
string at the top (PROMPT_TEMPLATE_VERSION) is incremented and run any tests
that validate template-version consistency.

In `@tests/unit/engine/compaction/test_summarizer.py`:
- Around line 39-41: Add a module-level pytestmark to enforce the 30s timeout by
declaring pytestmark = [pytest.mark.timeout(30)] after the imports (and before
helper functions), and remove the `@pytest.mark.unit` decorator from the
TestMakeCompactionCallback class so the class remains a unit test but uses the
module-level timeout; ensure the symbol TestMakeCompactionCallback keeps its
docstring and test methods unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1b3a1877-df4a-462e-acb4-04dc7b885e50

📥 Commits

Reviewing files that changed from the base of the PR and between 86511ea and eb6cda4.

📒 Files selected for processing (25)
  • CLAUDE.md
  • docs/design/engine.md
  • docs/roadmap/open-questions.md
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/compaction/__init__.py
  • src/synthorg/engine/compaction/models.py
  • src/synthorg/engine/compaction/protocol.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/context.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/prompt.py
  • src/synthorg/engine/prompt_template.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/token_estimation.py
  • src/synthorg/observability/events/context_budget.py
  • tests/unit/engine/compaction/__init__.py
  • tests/unit/engine/compaction/test_models.py
  • tests/unit/engine/compaction/test_summarizer.py
  • tests/unit/engine/test_context_budget.py
  • tests/unit/engine/test_context_budget_properties.py
  • tests/unit/engine/test_prompt.py
  • tests/unit/observability/test_events.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build Web
  • GitHub Check: Build Backend
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax without parentheses for exception handling on Python 3.14 — ruff enforces PEP 758 except syntax
Type hints required on all public functions and classes — enforce with mypy strict mode
Docstrings must use Google style on all public classes and functions — enforced by ruff D rules
Line length must be 88 characters — enforced by ruff

Files:

  • tests/unit/engine/test_context_budget.py
  • tests/unit/engine/test_prompt.py
  • src/synthorg/engine/prompt_template.py
  • tests/unit/engine/compaction/test_models.py
  • src/synthorg/engine/compaction/protocol.py
  • src/synthorg/engine/compaction/models.py
  • tests/unit/engine/test_context_budget_properties.py
  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/token_estimation.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/context_budget.py
  • tests/unit/engine/compaction/test_summarizer.py
  • src/synthorg/engine/__init__.py
  • tests/unit/observability/test_events.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/compaction/__init__.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/prompt.py
  • src/synthorg/engine/context.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow
Maintain 80% minimum code coverage — enforced in CI
Test timeout: 30 seconds per test
Prefer @pytest.mark.parametrize for testing similar cases
Tests must use test-provider, test-small-001, and similar generic vendor-agnostic names instead of real vendor names
Use Python Hypothesis for property-based testing: @given + @settings with HYPOTHESIS_PROFILE env var (ci: 200 examples default, dev: 1000 examples)
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic.

Files:

  • tests/unit/engine/test_context_budget.py
  • tests/unit/engine/test_prompt.py
  • tests/unit/engine/compaction/test_models.py
  • tests/unit/engine/test_context_budget_properties.py
  • tests/unit/engine/compaction/test_summarizer.py
  • tests/unit/observability/test_events.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections use copy.deepcopy() at construction and MappingProxyType wrapping.
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with model_copy(update=...) for runtime state
Use @computed_field for derived values instead of storing and validating redundant fields in Pydantic models
Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Every module with business logic MUST import logger: from synthorg.observability import get_logger then assign logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code — only use the centralized logger from synthorg.observability
Logger variable must always be named logger (not _logger, not log)
Event names must always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured kwargs in logging: always logger.info(EVENT, key=value) — never logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging for object creation, internal flow, entry/exit of key functions
NEVER use real vendor names (Anthropic, OpenAI,...

Files:

  • src/synthorg/engine/prompt_template.py
  • src/synthorg/engine/compaction/protocol.py
  • src/synthorg/engine/compaction/models.py
  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/token_estimation.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/compaction/__init__.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/prompt.py
  • src/synthorg/engine/context.py
src/synthorg/engine/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Task engine must use centralized single-writer TaskEngine with task lifecycle, recovery, and shutdown

Files:

  • src/synthorg/engine/prompt_template.py
  • src/synthorg/engine/compaction/protocol.py
  • src/synthorg/engine/compaction/models.py
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/token_estimation.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/context_budget.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/compaction/__init__.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/prompt.py
  • src/synthorg/engine/context.py
src/synthorg/engine/checkpoint/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Checkpoint recovery strategy requires per-turn persistence and heartbeat detection with CheckpointRecoveryStrategy config

Files:

  • src/synthorg/engine/checkpoint/resume.py
🧠 Learnings (31)
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/budget/**/*.py : Cost tracking system must track budget, enforce pre-flight/in-flight checks, auto-downgrade, track billing periods, cost tiers, quotas, and detect anomalies

Applied to files:

  • tests/unit/engine/test_context_budget.py
  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/context_budget.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.

Applied to files:

  • tests/unit/engine/test_context_budget.py
  • tests/unit/engine/compaction/test_models.py
  • tests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test

Applied to files:

  • tests/unit/engine/test_context_budget.py
  • tests/unit/engine/compaction/test_models.py
  • tests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).

Applied to files:

  • tests/unit/engine/test_context_budget.py
  • tests/unit/engine/compaction/test_models.py
  • tests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic.

Applied to files:

  • tests/unit/engine/test_context_budget.py
  • tests/unit/engine/compaction/test_models.py
  • tests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to tests/**/*.py : Mark tests with `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, or `pytest.mark.slow`

Applied to files:

  • tests/unit/engine/compaction/test_models.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with `model_copy(update=...)` for runtime state

Applied to files:

  • src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Pydantic models must use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`)

Applied to files:

  • src/synthorg/engine/compaction/models.py
  • src/synthorg/engine/context.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Use `computed_field` for derived values instead of storing and validating redundant fields in Pydantic models

Applied to files:

  • src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

Applied to files:

  • tests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to tests/**/*.py : Use Python Hypothesis for property-based testing: `given` + `settings` with HYPOTHESIS_PROFILE env var (ci: 200 examples default, dev: 1000 examples)

Applied to files:

  • tests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/loop_helpers.py
  • tests/unit/observability/test_events.py
  • CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Event names must always use constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/loop_helpers.py
  • CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/engine/**/*.py : Task engine must use centralized single-writer TaskEngine with task lifecycle, recovery, and shutdown

Applied to files:

  • src/synthorg/engine/__init__.py
  • CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/tools/**/*.py : Tools must support role-based access control and require approval via `request_human_approval` tool

Applied to files:

  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/engine/checkpoint/**/*.py : Checkpoint recovery strategy requires per-turn persistence and heartbeat detection with CheckpointRecoveryStrategy config

Applied to files:

  • src/synthorg/engine/checkpoint/resume.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/hr/**/*.py : HR engine includes hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration sampling)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/engine/coordination/**/*.py : Engine coordination supports 4 dispatchers (SAS/centralized/decentralized/context-dependent) with wave execution and workspace lifecycle integration

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/communication/meeting/**/*.py : Meeting protocol uses round-robin, position papers, structured phases with scheduler (frequency, participant resolver) and orchestrator

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking to the 7 design pages (index, agents, organization, communication, engine, memory, operations).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST import logger: `from synthorg.observability import get_logger` then assign `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Observability system requires structured logging, correlation tracking, and log sinks — all business logic must use `get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Never use `import logging` / `logging.getLogger()` / `print()` in application code — only use the centralized logger from `synthorg.observability`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Logger variable must always be named `logger` (not `_logger`, not `log`)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Use structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Validate at system boundaries (user input, external APIs, config files)

Applied to files:

  • src/synthorg/engine/context.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/engine/context.py
🧬 Code graph analysis (15)
tests/unit/engine/test_context_budget.py (4)
src/synthorg/engine/compaction/models.py (1)
  • CompressionMetadata (42-73)
src/synthorg/engine/context.py (6)
  • AgentContext (104-412)
  • from_identity (190-225)
  • context_fill_percent (180-187)
  • with_context_fill (289-306)
  • with_compression (308-336)
  • to_snapshot (383-407)
src/synthorg/engine/context_budget.py (6)
  • ContextBudgetIndicator (32-84)
  • estimate_context_fill (121-157)
  • make_context_indicator (87-118)
  • update_context_fill (160-196)
  • format (66-84)
  • fill_percent (60-64)
src/synthorg/engine/token_estimation.py (3)
  • DefaultTokenEstimator (48-101)
  • estimate_conversation_tokens (33-45)
  • estimate_conversation_tokens (69-101)
tests/unit/engine/test_prompt.py (1)
src/synthorg/engine/token_estimation.py (1)
  • DefaultTokenEstimator (48-101)
tests/unit/engine/compaction/test_models.py (1)
src/synthorg/engine/compaction/models.py (2)
  • CompactionConfig (10-39)
  • CompressionMetadata (42-73)
src/synthorg/engine/compaction/protocol.py (1)
src/synthorg/engine/context.py (1)
  • AgentContext (104-412)
tests/unit/engine/test_context_budget_properties.py (2)
src/synthorg/engine/context_budget.py (3)
  • ContextBudgetIndicator (32-84)
  • format (66-84)
  • fill_percent (60-64)
src/synthorg/engine/token_estimation.py (3)
  • DefaultTokenEstimator (48-101)
  • estimate_conversation_tokens (33-45)
  • estimate_conversation_tokens (69-101)
src/synthorg/engine/compaction/summarizer.py (3)
src/synthorg/engine/compaction/models.py (2)
  • CompactionConfig (10-39)
  • CompressionMetadata (42-73)
src/synthorg/engine/token_estimation.py (6)
  • DefaultTokenEstimator (48-101)
  • PromptTokenEstimator (15-45)
  • estimate_tokens (22-31)
  • estimate_tokens (58-67)
  • estimate_conversation_tokens (33-45)
  • estimate_conversation_tokens (69-101)
src/synthorg/engine/context.py (3)
  • AgentContext (104-412)
  • context_fill_percent (180-187)
  • with_compression (308-336)
src/synthorg/engine/loop_helpers.py (2)
src/synthorg/engine/plan_execute_loop.py (1)
  • compaction_callback (136-138)
src/synthorg/engine/react_loop.py (1)
  • compaction_callback (109-111)
src/synthorg/engine/context_budget.py (2)
src/synthorg/engine/token_estimation.py (4)
  • DefaultTokenEstimator (48-101)
  • PromptTokenEstimator (15-45)
  • estimate_conversation_tokens (33-45)
  • estimate_conversation_tokens (69-101)
src/synthorg/engine/context.py (2)
  • AgentContext (104-412)
  • with_context_fill (289-306)
tests/unit/engine/compaction/test_summarizer.py (3)
src/synthorg/engine/compaction/models.py (1)
  • CompactionConfig (10-39)
src/synthorg/engine/compaction/summarizer.py (1)
  • make_compaction_callback (37-61)
src/synthorg/engine/context.py (2)
  • AgentContext (104-412)
  • from_identity (190-225)
src/synthorg/engine/__init__.py (1)
src/synthorg/engine/token_estimation.py (2)
  • DefaultTokenEstimator (48-101)
  • PromptTokenEstimator (15-45)
src/synthorg/engine/react_loop.py (2)
src/synthorg/engine/loop_helpers.py (1)
  • invoke_compaction (654-689)
src/synthorg/engine/plan_execute_loop.py (3)
  • compaction_callback (136-138)
  • approval_gate (126-128)
  • stagnation_detector (131-133)
src/synthorg/engine/compaction/__init__.py (1)
src/synthorg/engine/compaction/models.py (2)
  • CompactionConfig (10-39)
  • CompressionMetadata (42-73)
src/synthorg/engine/plan_execute_loop.py (2)
src/synthorg/engine/loop_helpers.py (1)
  • invoke_compaction (654-689)
src/synthorg/engine/react_loop.py (1)
  • compaction_callback (109-111)
src/synthorg/engine/checkpoint/resume.py (2)
src/synthorg/engine/plan_execute_loop.py (5)
  • compaction_callback (136-138)
  • PlanExecuteLoop (84-930)
  • config (121-123)
  • approval_gate (126-128)
  • stagnation_detector (131-133)
src/synthorg/engine/react_loop.py (3)
  • compaction_callback (109-111)
  • approval_gate (99-101)
  • stagnation_detector (104-106)
src/synthorg/engine/context.py (1)
src/synthorg/engine/compaction/models.py (1)
  • CompressionMetadata (42-73)
🪛 LanguageTool
CLAUDE.md

[style] ~199-~199: Since ownership is already implied, this phrasing may be redundant.
Context: ...vents.context_budget). Each domain has its own module — see src/synthorg/observabilit...

(PRP_OWN)

🔇 Additional comments (41)
docs/roadmap/open-questions.md (1)

12-12: Clear and accurate partial-resolution update.

The wording at Line 12 and Line 24 cleanly reflects what is implemented now vs what is still pending, which keeps roadmap risk tracking actionable.

Also applies to: 24-24

src/synthorg/engine/compaction/protocol.py (1)

14-24: Compaction callback contract is well defined.

The alias and attached docstring clearly define the async callback behavior and return semantics for loop integration.

src/synthorg/engine/compaction/models.py (1)

10-39: Model design and validation look solid.

Both models are frozen, typed, and constrained appropriately for config/metadata boundaries.

As per coding guidelines, "Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with model_copy(update=...) for runtime state" and "Pydantic models must use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict)."

Also applies to: 42-73

src/synthorg/engine/compaction/__init__.py (1)

8-18: Good explicit compaction package API surface.

The re-exports and __all__ keep the public interface discoverable and stable.

CLAUDE.md (1)

125-125: Documentation updates are useful and aligned with the new feature set.

The engine scope summary and observability event guidance now reflect context-budget compaction work clearly.

Also applies to: 199-199

src/synthorg/observability/events/context_budget.py (1)

6-15: Event constants are consistent and well-scoped.

The new context-budget lifecycle constants follow the established domain event naming pattern.

Based on learnings, "Event names must always use constants from domain-specific modules under synthorg.observability.events ... Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT."

tests/unit/observability/test_events.py (1)

229-229: Good test expectation update for event-module discovery.

Adding context_budget in the expected set keeps discovery assertions aligned with the new observability domain.

tests/unit/engine/compaction/test_models.py (1)

12-74: LGTM — tests correctly validate model constraints and immutability.

The test coverage for CompactionConfig and CompressionMetadata is comprehensive, covering defaults, boundary conditions, and frozen behavior. The assertions align with the model definitions in src/synthorg/engine/compaction/models.py.

src/synthorg/engine/__init__.py (1)

186-189: LGTM — import source relocation maintains public API compatibility.

The DefaultTokenEstimator and PromptTokenEstimator imports are correctly relocated from synthorg.engine.prompt to synthorg.engine.token_estimation. The __all__ exports remain unchanged, preserving backward compatibility for consumers importing from synthorg.engine.

tests/unit/engine/test_prompt.py (2)

31-31: LGTM — import updated to match new module location.

The import correctly references synthorg.engine.token_estimation where DefaultTokenEstimator now lives.


475-479: LGTM — test double updated to satisfy extended protocol.

The CountingEstimator now implements estimate_conversation_tokens to satisfy the PromptTokenEstimator protocol. Returning 0 is appropriate since these tests focus on estimate_tokens call counting.

src/synthorg/engine/checkpoint/resume.py (3)

72-84: LGTM — compression-aware reconciliation message.

The reconciliation content now informs the agent about prior compaction when resuming from a compressed checkpoint. The conditional note construction is clean and provides useful context for the agent to understand the conversation state.


129-129: LGTM — compaction_callback correctly propagated through loop reconstruction.

Both ReactLoop and PlanExecuteLoop now receive the compaction_callback from the original loop instance when rebuilding with a checkpoint callback. This maintains compaction behavior across checkpoint/resume cycles.

Also applies to: 137-137


172-173: LGTM — critical exceptions re-raised in cleanup paths.

MemoryError and RecursionError are correctly re-raised before the generic exception handler, following the established pattern in react_loop.py and loop_helpers.py. This ensures non-recoverable errors propagate rather than being silently logged.

Also applies to: 189-190

tests/unit/engine/compaction/test_summarizer.py (1)

43-232: LGTM — comprehensive compaction callback tests.

The test suite thoroughly covers the compaction callback behavior:

  • Threshold-based triggering (below/above)
  • Edge cases (too few messages, nothing to archive, unknown capacity)
  • Multiple compactions with counter increment
  • Proper verification of system message preservation and summary injection
src/synthorg/engine/react_loop.py (3)

80-96: LGTM — compaction_callback follows established hook patterns.

The compaction_callback parameter is integrated consistently with existing hooks (checkpoint_callback, stagnation_detector, approval_gate), including proper docstring documentation and private storage.


108-111: LGTM — property accessor matches other hook accessors.

The compaction_callback property follows the same pattern as approval_gate and stagnation_detector properties.


206-213: LGTM — compaction invoked at turn boundaries with error isolation.

Compaction is correctly invoked after stagnation detection at turn boundaries, using the shared invoke_compaction() helper which handles error isolation. The context is updated only when compaction returns a non-None result.

docs/design/engine.md (1)

587-663: LGTM — comprehensive design documentation for context budget management.

The new section thoroughly documents:

  • Context fill tracking fields and estimation
  • Soft budget indicator format and injection
  • Compaction hook configuration and behavior
  • Compressed checkpoint recovery messaging
  • Loop integration points

The documentation aligns well with the implementation across context_budget.py, compaction/ modules, and execution loops.

tests/unit/engine/test_context_budget.py (1)

23-304: LGTM — comprehensive context budget test coverage.

The test suite thoroughly covers:

  • ContextBudgetIndicator formatting, computed properties, and validation
  • estimate_context_fill with various input combinations
  • make_context_indicator factory behavior
  • update_context_fill helper
  • DefaultTokenEstimator.estimate_conversation_tokens edge cases
  • AgentContext budget field integration including compression metadata

The import of _TOOL_DEFINITION_TOKEN_OVERHEAD at line 10 and its use at line 118 addresses the previous review concern about hardcoded values. The test_capacity_tokens_zero_rejected test is correctly placed in TestContextBudgetIndicator.

tests/unit/engine/test_context_budget_properties.py (3)

13-14: Add pytest.mark.timeout(30) for test timeout compliance.

Per coding guidelines, tests should have a 30-second timeout. This was flagged in a previous review but appears unaddressed.

💡 Suggested fix
+pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]
+
+
-@pytest.mark.unit
 class TestContextBudgetIndicatorProperties:

17-66: LGTM!

Well-designed property-based tests covering edge cases for ContextBudgetIndicator. The Hypothesis strategies correctly match the model's field constraints.


69-88: LGTM!

The property test correctly verifies non-negative token estimation. Consider adding a follow-up test covering ASSISTANT messages with tool_calls to exercise that estimation path.

src/synthorg/engine/loop_helpers.py (2)

16-18: LGTM!

Event constant properly imported from the domain-specific observability module.


654-689: LGTM!

The invoke_compaction helper follows the established patterns in this module (matching check_shutdown, check_budget, etc.) with proper error isolation, logging, and PEP 758 exception syntax.

src/synthorg/engine/plan_execute_loop.py (3)

48-48: LGTM!

Correctly imports the shared invoke_compaction helper, addressing the previous review comment about extracting duplicated compaction logic.


100-118: LGTM!

The compaction_callback parameter and property follow the established patterns for other callbacks in this class (checkpoint_callback, stagnation_detector).


686-706: Verify compaction isn't invoked twice for the same context state.

When result is a tuple (step completion), compaction runs at lines 689-695, then returns immediately at line 696. When result is AgentContext (tool calls completed, loop continues), compaction runs at lines 700-706. This looks correct — only one path is taken per loop iteration.

src/synthorg/engine/token_estimation.py (2)

14-45: LGTM!

Well-designed protocol with runtime_checkable decorator enabling isinstance checks. Clear docstrings following Google style.


48-101: LGTM!

The implementation correctly handles all token-bearing fields including tool call IDs (addressing past review feedback). The per-message overhead applied to both messages and tool calls accounts for structural tokens.

src/synthorg/engine/compaction/summarizer.py (2)

37-61: LGTM!

Well-structured factory function that creates a compaction callback closure. The default to DefaultTokenEstimator() is appropriate.


137-143: archived_turn_count assumes perfectly paired user+assistant messages.

len(archivable) // 2 underestimates archived turns when the conversation contains multiple tool result messages or sequential assistant messages. This was noted in a prior review.

♻️ Count actual assistant messages for accuracy
-    archived_turn_count = len(archivable) // 2
+    archived_turn_count = sum(
+        1 for m in archivable if m.role == MessageRole.ASSISTANT
+    )
src/synthorg/engine/context_budget.py (3)

32-84: LGTM!

Well-designed frozen Pydantic model with proper field constraints (ge=0, gt=0), computed property, and human-readable formatting. The format() method handles both known and unknown capacity cases.


121-157: LGTM!

Input validation at lines 148-153 addresses the previous review feedback. The function correctly validates system boundaries per coding guidelines.


160-196: LGTM!

The function now correctly logs the newly computed fill percentage (lines 187-194), addressing the previous review feedback.

src/synthorg/engine/context.py (3)

85-93: LGTM!

New snapshot fields properly typed with defaults and constraints matching the source AgentContext fields.


161-187: LGTM!

The new context budget fields are well-designed with appropriate constraints. The computed_field for context_fill_percent correctly handles the None capacity case, and division by zero is prevented by the gt=0 constraint on context_capacity_tokens.


308-336: LGTM!

The with_compression method properly validates fill_tokens. The CompressionMetadata model already enforces ge=0 constraints on compression_point and archived_turns, so additional validation here would be redundant.

src/synthorg/engine/prompt.py (3)

36-39: LGTM!

Token estimation imports correctly moved from local definitions to the dedicated token_estimation module.


138-139: LGTM!

The context_budget_indicator parameter is properly threaded through the rendering pipeline, and the new _SECTION_CONTEXT_BUDGET is correctly added to sections when the indicator is provided.

Also applies to: 484-486


389-409: LGTM!

The context_budget is correctly added to the template context, making it available for rendering in both default and custom templates.

Comment on lines +186 to +191
{% if context_budget %}

## Context Budget

{{ context_budget }}
{% endif %}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Bump prompt template version when changing DEFAULT_TEMPLATE.

Line 186-191 changes the rendered system prompt shape, but Line 20 still reports PROMPT_TEMPLATE_VERSION = "1.4.0". This can cause version/behavior drift in any metadata, caching, or snapshot flows keyed by template version.

Suggested fix
-PROMPT_TEMPLATE_VERSION: Final[str] = "1.4.0"
+PROMPT_TEMPLATE_VERSION: Final[str] = "1.5.0"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/prompt_template.py` around lines 186 - 191, The
DEFAULT_TEMPLATE/system prompt rendering was modified (the context_budget block
added) but PROMPT_TEMPLATE_VERSION was not updated; update the
PROMPT_TEMPLATE_VERSION constant in prompt_template.py to a new
semantically-incremented version string (e.g., "1.4.1" or following your
project's versioning policy) so that any metadata/caching/snapshot flows keyed
on PROMPT_TEMPLATE_VERSION reflect the template change; ensure the version
string at the top (PROMPT_TEMPLATE_VERSION) is incremented and run any tests
that validate template-version consistency.

Comment on lines +39 to +41
@pytest.mark.unit
class TestMakeCompactionCallback:
"""make_compaction_callback factory and compaction logic."""
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Add pytest.mark.timeout(30) for test timeout compliance.

Per coding guidelines, tests should have a 30-second timeout. Consider adding a module-level pytestmark declaration.

💡 Suggested fix

Add at the module level (after imports, before helper functions):

+pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]
+
+
 def _msg(role: MessageRole, content: str) -> ChatMessage:

Then remove the @pytest.mark.unit decorator from the class.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/compaction/test_summarizer.py` around lines 39 - 41, Add a
module-level pytestmark to enforce the 30s timeout by declaring pytestmark =
[pytest.mark.timeout(30)] after the imports (and before helper functions), and
remove the `@pytest.mark.unit` decorator from the TestMakeCompactionCallback class
so the class remains a unit test but uses the module-level timeout; ensure the
symbol TestMakeCompactionCallback keeps its docstring and test methods
unchanged.

- Add distinct CONTEXT_BUDGET_COMPACTION_FALLBACK event for no-assistant
  summary fallback path (per CodeRabbit)
- Reset PROMPT_TEMPLATE_VERSION to "1.0.0" — no users, no migrations,
  no caching depends on it yet. Will bump on first production deploy.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 65-169: The function _do_compaction is too long; extract the
skip-check and metadata construction into helpers to reduce its size: implement
_should_skip_compaction(ctx, config) that returns a skip-reason string or None
(handling fill_pct threshold and min_messages_to_compact checks and logging
decisions), and _build_compression_metadata(ctx, archived_turn_count,
summary_tokens) that returns a CompressionMetadata object (accumulating
prior.compactions_performed and prior.archived_turns). Then refactor
_do_compaction to call _should_skip_compaction and _build_compression_metadata,
keep summary creation via _build_summary and token estimation with estimator
unchanged, and ensure _do_compaction only coordinates these steps so its body
falls under the 50-line limit.

In `@tests/unit/engine/test_prompt.py`:
- Around line 475-480: The test double method estimate_conversation_tokens
currently accepts messages: object which is too permissive; change its parameter
type to match the production estimator signature (e.g., use the same concrete
type/import used in production such as List[Message], Sequence[PromptMessage],
or whatever type the real estimator uses) and update any necessary test imports
so the test will fail if the estimator interface drifts; keep the method name
estimate_conversation_tokens and only change the parameter type annotation to
the exact production type.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b59709a4-64a8-4645-af5f-b34aa4b5483e

📥 Commits

Reviewing files that changed from the base of the PR and between eb6cda4 and 440e66a.

📒 Files selected for processing (5)
  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/prompt_template.py
  • src/synthorg/observability/events/context_budget.py
  • tests/unit/engine/test_prompt.py
  • tests/unit/memory/org/test_prompt_integration.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Web
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Backend
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax without parentheses for exception handling on Python 3.14 — ruff enforces PEP 758 except syntax
Type hints required on all public functions and classes — enforce with mypy strict mode
Docstrings must use Google style on all public classes and functions — enforced by ruff D rules
Line length must be 88 characters — enforced by ruff

Files:

  • src/synthorg/engine/compaction/summarizer.py
  • tests/unit/memory/org/test_prompt_integration.py
  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/prompt_template.py
  • tests/unit/engine/test_prompt.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections use copy.deepcopy() at construction and MappingProxyType wrapping.
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with model_copy(update=...) for runtime state
Use @computed_field for derived values instead of storing and validating redundant fields in Pydantic models
Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Every module with business logic MUST import logger: from synthorg.observability import get_logger then assign logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code — only use the centralized logger from synthorg.observability
Logger variable must always be named logger (not _logger, not log)
Event names must always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured kwargs in logging: always logger.info(EVENT, key=value) — never logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging for object creation, internal flow, entry/exit of key functions
NEVER use real vendor names (Anthropic, OpenAI,...

Files:

  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/observability/events/context_budget.py
  • src/synthorg/engine/prompt_template.py
src/synthorg/engine/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Task engine must use centralized single-writer TaskEngine with task lifecycle, recovery, and shutdown

Files:

  • src/synthorg/engine/compaction/summarizer.py
  • src/synthorg/engine/prompt_template.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow
Maintain 80% minimum code coverage — enforced in CI
Test timeout: 30 seconds per test
Prefer @pytest.mark.parametrize for testing similar cases
Tests must use test-provider, test-small-001, and similar generic vendor-agnostic names instead of real vendor names
Use Python Hypothesis for property-based testing: @given + @settings with HYPOTHESIS_PROFILE env var (ci: 200 examples default, dev: 1000 examples)
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic.

Files:

  • tests/unit/memory/org/test_prompt_integration.py
  • tests/unit/engine/test_prompt.py
🧠 Learnings (4)
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/context_budget.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/budget/**/*.py : Cost tracking system must track budget, enforce pre-flight/in-flight checks, auto-downgrade, track billing periods, cost tiers, quotas, and detect anomalies

Applied to files:

  • src/synthorg/observability/events/context_budget.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Event names must always use constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/synthorg/observability/events/context_budget.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/security/**/*.py : Progressive trust strategy supports 4 modes (disabled/weighted/per-category/milestone) with autonomy level presets and change strategy

Applied to files:

  • src/synthorg/engine/prompt_template.py
🧬 Code graph analysis (3)
src/synthorg/engine/compaction/summarizer.py (3)
src/synthorg/engine/compaction/models.py (2)
  • CompactionConfig (10-39)
  • CompressionMetadata (42-73)
src/synthorg/engine/token_estimation.py (3)
  • DefaultTokenEstimator (48-101)
  • estimate_tokens (22-31)
  • estimate_tokens (58-67)
src/synthorg/engine/context.py (3)
  • AgentContext (104-412)
  • context_fill_percent (180-187)
  • with_compression (308-336)
tests/unit/memory/org/test_prompt_integration.py (1)
tests/unit/engine/test_prompt.py (1)
  • test_template_version_frozen (547-549)
tests/unit/engine/test_prompt.py (2)
src/synthorg/engine/token_estimation.py (1)
  • DefaultTokenEstimator (48-101)
tests/unit/memory/org/test_prompt_integration.py (1)
  • test_template_version_frozen (51-52)
🔇 Additional comments (11)
src/synthorg/engine/prompt_template.py (2)

20-23: Version freeze rationale is clear and consistent.

The inline note and constant update make the versioning intent explicit and maintainable.


189-194: Conditional Context Budget rendering looks correct.

Good use of a guard to include the section only when context budget data is present.

tests/unit/memory/org/test_prompt_integration.py (1)

51-52: Version-freeze assertion update is aligned with production code.

This keeps integration coverage in sync with PROMPT_TEMPLATE_VERSION.

tests/unit/engine/test_prompt.py (2)

31-31: Import relocation is correctly reflected in tests.

Good alignment with the estimator module split.


547-549: Prompt version test rename and expectation update are good.

This makes the test intent explicit and synchronized with the current versioning policy.

src/synthorg/observability/events/context_budget.py (1)

1-18: Well-structured event constants module.

The constants follow the established domain.category.action naming pattern and correctly use Final[str] typing. The logical grouping (fill tracking, compaction lifecycle, indicator injection) aids discoverability.

src/synthorg/engine/compaction/summarizer.py (5)

1-32: Imports and logger setup are well-organized.

Proper use of TYPE_CHECKING block for type-only imports, correct logger instantiation via get_logger(__name__), and event constants imported from the domain-specific module.


38-62: Clean factory pattern implementation.

The factory correctly creates a closure capturing config and estimator, with sensible defaulting to DefaultTokenEstimator. The async wrapper appropriately matches the CompactionCallback protocol.


138-144: archived_turn_count still assumes perfectly paired messages.

This was flagged in a previous review: the len(archivable) // 2 calculation underestimates turn count when tool result messages or sequential assistant messages exist.

♻️ Previously suggested fix
-    archived_turn_count = len(archivable) // 2
+    archived_turn_count = sum(
+        1 for m in archivable if m.role == MessageRole.ASSISTANT
+    )

172-203: Summary building logic is well-implemented.

Good use of the dedicated CONTEXT_BUDGET_COMPACTION_FALLBACK event for the no-assistant-content edge case. The character capping and snippet extraction are sensible defaults.


151-153: No changes needed — estimate_conversation_tokens is properly defined and tested.

The method exists on the PromptTokenEstimator protocol (line 33 of token_estimation.py) and is fully implemented in DefaultTokenEstimator (line 69). It is already in use elsewhere in the codebase and has comprehensive test coverage.

			> Likely an incorrect or invalid review comment.

Comment on lines +65 to +169
def _do_compaction(
ctx: AgentContext,
config: CompactionConfig,
estimator: PromptTokenEstimator,
) -> AgentContext | None:
"""Core compaction logic.

Args:
ctx: Current agent context.
config: Compaction configuration.
estimator: Token estimator.

Returns:
New compacted ``AgentContext`` or ``None`` if no compaction needed.
"""
fill_pct = ctx.context_fill_percent
if fill_pct is None or fill_pct < config.fill_threshold_percent:
return None

conversation = ctx.conversation
if len(conversation) < config.min_messages_to_compact:
logger.debug(
CONTEXT_BUDGET_COMPACTION_SKIPPED,
execution_id=ctx.execution_id,
reason="too_few_messages",
message_count=len(conversation),
min_required=config.min_messages_to_compact,
)
return None

logger.info(
CONTEXT_BUDGET_COMPACTION_STARTED,
execution_id=ctx.execution_id,
fill_percent=fill_pct,
message_count=len(conversation),
)

# Preserve a leading SYSTEM message only when present.
# preserve_recent_turns * 2 for user+assistant pairs.
preserve_count = config.preserve_recent_turns * 2
head: tuple[ChatMessage, ...] = ()
start_idx = 0
if conversation and conversation[0].role == MessageRole.SYSTEM:
head = (conversation[0],)
start_idx = 1

if preserve_count >= len(conversation) - start_idx:
logger.debug(
CONTEXT_BUDGET_COMPACTION_SKIPPED,
execution_id=ctx.execution_id,
reason="nothing_to_archive",
preserve_count=preserve_count,
message_count=len(conversation),
)
return None

archivable = conversation[start_idx:-preserve_count]
recent = conversation[-preserve_count:]

summary_text = _build_summary(archivable)
summary_msg = ChatMessage(
role=MessageRole.SYSTEM,
content=summary_text,
)
summary_tokens = estimator.estimate_tokens(summary_text)

compressed_conversation = (*head, summary_msg, *recent)

# Build compression metadata.
prior = ctx.compression_metadata
compactions_count = prior.compactions_performed + 1 if prior is not None else 1
prior_archived = prior.archived_turns if prior is not None else 0

archived_turn_count = len(archivable) // 2
metadata = CompressionMetadata(
compression_point=ctx.turn_count,
archived_turns=prior_archived + archived_turn_count,
summary_tokens=summary_tokens,
compactions_performed=compactions_count,
)

# Re-estimate fill with compressed conversation. This counts
# conversation tokens only — system prompt and tool overhead are
# excluded because the compaction callback does not have access to
# those values. The execution loop's next call to
# ``update_context_fill`` will restore the full estimate.
new_fill = estimator.estimate_conversation_tokens(
compressed_conversation,
)

logger.info(
CONTEXT_BUDGET_COMPACTION_COMPLETED,
execution_id=ctx.execution_id,
original_messages=len(conversation),
compacted_messages=len(compressed_conversation),
archived_turns=archived_turn_count,
summary_tokens=summary_tokens,
compactions_total=compactions_count,
)

return ctx.with_compression(
metadata,
compressed_conversation,
new_fill,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Function exceeds 50-line limit.

_do_compaction spans ~90 executable lines, exceeding the project's 50-line function limit. Consider extracting helpers for distinct responsibilities.

♻️ Suggested extraction points
def _should_skip_compaction(
    ctx: AgentContext,
    config: CompactionConfig,
) -> str | None:
    """Return skip reason or None if compaction should proceed."""
    fill_pct = ctx.context_fill_percent
    if fill_pct is None or fill_pct < config.fill_threshold_percent:
        return "below_threshold"
    if len(ctx.conversation) < config.min_messages_to_compact:
        return "too_few_messages"
    return None


def _build_compression_metadata(
    ctx: AgentContext,
    archived_turn_count: int,
    summary_tokens: int,
) -> CompressionMetadata:
    """Construct compression metadata accumulating prior compactions."""
    prior = ctx.compression_metadata
    return CompressionMetadata(
        compression_point=ctx.turn_count,
        archived_turns=(prior.archived_turns if prior else 0) + archived_turn_count,
        summary_tokens=summary_tokens,
        compactions_performed=(prior.compactions_performed + 1) if prior else 1,
    )

Then _do_compaction becomes a coordinating function calling these helpers.

As per coding guidelines: "Functions must be less than 50 lines".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/compaction/summarizer.py` around lines 65 - 169, The
function _do_compaction is too long; extract the skip-check and metadata
construction into helpers to reduce its size: implement
_should_skip_compaction(ctx, config) that returns a skip-reason string or None
(handling fill_pct threshold and min_messages_to_compact checks and logging
decisions), and _build_compression_metadata(ctx, archived_turn_count,
summary_tokens) that returns a CompressionMetadata object (accumulating
prior.compactions_performed and prior.archived_turns). Then refactor
_do_compaction to call _should_skip_compaction and _build_compression_metadata,
keep summary creation via _build_summary and token estimation with estimator
unchanged, and ensure _do_compaction only coordinates these steps so its body
falls under the 50-line limit.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
src/synthorg/engine/compaction/summarizer.py (1)

194-199: 🧹 Nitpick | 🔵 Trivial

archived_turns calculation assumes paired messages.

Line 196 uses len(archivable) // 2 which assumes archivable messages are perfectly paired user+assistant turns. If the conversation contains tool result messages, multiple assistant messages in sequence, or other patterns, this underestimates the actual archived turn count.

Consider counting actual assistant messages for a more accurate turn count:

♻️ Proposed fix
+    archived_turn_count = sum(
+        1 for m in archivable if m.role == MessageRole.ASSISTANT
+    )
+
     metadata = CompressionMetadata(
         compression_point=ctx.turn_count,
-        archived_turns=prior_archived + len(archivable) // 2,
+        archived_turns=prior_archived + archived_turn_count,
         summary_tokens=summary_tokens,
         compactions_performed=compactions_count,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/compaction/summarizer.py` around lines 194 - 199, The
archived_turns field incorrectly assumes paired user+assistant messages by using
len(archivable) // 2; instead compute the archived turn count by counting actual
assistant turns in the archivable list to handle tool results or consecutive
assistant messages. Update the code that constructs CompressionMetadata (where
CompressionMetadata is instantiated and variables ctx.turn_count,
prior_archived, archivable are in scope) to replace len(archivable) // 2 with a
computed count of assistant messages (e.g., sum over archivable checking the
message role or is_assistant property) and add that count to prior_archived when
setting archived_turns.
tests/unit/engine/test_prompt.py (1)

475-480: ⚠️ Potential issue | 🟡 Minor

Use the concrete conversation message type in the test double signature.

Line 477 uses tuple[object, ...], which is too permissive and can hide interface drift versus the production estimator contract (tuple[ChatMessage, ...]).

#!/bin/bash
set -euo pipefail

echo "Production signature:"
rg -nP --type=py -C2 'def estimate_conversation_tokens\(|messages:\s*tuple\[ChatMessage,\s*\.\.\.\]' src/synthorg/engine/token_estimation.py

echo
echo "Test double signature:"
rg -nP --type=py -C3 'class CountingEstimator|def estimate_conversation_tokens|messages:\s*tuple\[.*\]' tests/unit/engine/test_prompt.py

Expected result: production uses tuple[ChatMessage, ...]; test double should match it exactly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_prompt.py` around lines 475 - 480, The test double's
method signature for estimate_conversation_tokens uses the permissive
tuple[object, ...]; update the CountingEstimator (or whichever test stub defines
estimate_conversation_tokens) to use the concrete conversation message type
tuple[ChatMessage, ...] to mirror the production contract in token_estimation.py
so the test will catch interface drift; import or reference the same ChatMessage
type used by production and adjust the method signature accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 194-199: The archived_turns field incorrectly assumes paired
user+assistant messages by using len(archivable) // 2; instead compute the
archived turn count by counting actual assistant turns in the archivable list to
handle tool results or consecutive assistant messages. Update the code that
constructs CompressionMetadata (where CompressionMetadata is instantiated and
variables ctx.turn_count, prior_archived, archivable are in scope) to replace
len(archivable) // 2 with a computed count of assistant messages (e.g., sum over
archivable checking the message role or is_assistant property) and add that
count to prior_archived when setting archived_turns.

In `@tests/unit/engine/test_prompt.py`:
- Around line 475-480: The test double's method signature for
estimate_conversation_tokens uses the permissive tuple[object, ...]; update the
CountingEstimator (or whichever test stub defines estimate_conversation_tokens)
to use the concrete conversation message type tuple[ChatMessage, ...] to mirror
the production contract in token_estimation.py so the test will catch interface
drift; import or reference the same ChatMessage type used by production and
adjust the method signature accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 313fd83e-8897-4072-93da-b95836dd112b

📥 Commits

Reviewing files that changed from the base of the PR and between 440e66a and 9a285fa.

📒 Files selected for processing (2)
  • src/synthorg/engine/compaction/summarizer.py
  • tests/unit/engine/test_prompt.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Web
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax without parentheses for exception handling on Python 3.14 — ruff enforces PEP 758 except syntax
Type hints required on all public functions and classes — enforce with mypy strict mode
Docstrings must use Google style on all public classes and functions — enforced by ruff D rules
Line length must be 88 characters — enforced by ruff

Files:

  • src/synthorg/engine/compaction/summarizer.py
  • tests/unit/engine/test_prompt.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections use copy.deepcopy() at construction and MappingProxyType wrapping.
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with model_copy(update=...) for runtime state
Use @computed_field for derived values instead of storing and validating redundant fields in Pydantic models
Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Every module with business logic MUST import logger: from synthorg.observability import get_logger then assign logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code — only use the centralized logger from synthorg.observability
Logger variable must always be named logger (not _logger, not log)
Event names must always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured kwargs in logging: always logger.info(EVENT, key=value) — never logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging for object creation, internal flow, entry/exit of key functions
NEVER use real vendor names (Anthropic, OpenAI,...

Files:

  • src/synthorg/engine/compaction/summarizer.py
src/synthorg/engine/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Task engine must use centralized single-writer TaskEngine with task lifecycle, recovery, and shutdown

Files:

  • src/synthorg/engine/compaction/summarizer.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow
Maintain 80% minimum code coverage — enforced in CI
Test timeout: 30 seconds per test
Prefer @pytest.mark.parametrize for testing similar cases
Tests must use test-provider, test-small-001, and similar generic vendor-agnostic names instead of real vendor names
Use Python Hypothesis for property-based testing: @given + @settings with HYPOTHESIS_PROFILE env var (ci: 200 examples default, dev: 1000 examples)
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic.

Files:

  • tests/unit/engine/test_prompt.py
🧠 Learnings (2)
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Functions must be less than 50 lines, files must be less than 800 lines

Applied to files:

  • src/synthorg/engine/compaction/summarizer.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Functions: < 50 lines, files < 800 lines.

Applied to files:

  • src/synthorg/engine/compaction/summarizer.py
🧬 Code graph analysis (1)
tests/unit/engine/test_prompt.py (2)
src/synthorg/engine/token_estimation.py (1)
  • DefaultTokenEstimator (48-101)
tests/unit/memory/org/test_prompt_integration.py (1)
  • test_template_version_frozen (51-52)
🔇 Additional comments (6)
src/synthorg/engine/compaction/summarizer.py (4)

1-36: LGTM!

Clean module setup: proper TYPE_CHECKING guard, centralized logger, and domain-specific event constants.


38-62: LGTM!

Well-designed factory with proper type hints, keyword-only parameters, and sensible default for the estimator. The async wrapper around sync logic is appropriate given the CPU-bound operations.


65-131: LGTM!

Well-structured orchestration function with proper early returns, state transition logging at INFO level, and immutable context updates via with_compression(). The refactoring into _split_conversation and _compress helpers keeps this under the 50-line limit.


203-234: LGTM!

Clean summary construction with appropriate snippet extraction, length capping, and proper use of the distinct CONTEXT_BUDGET_COMPACTION_FALLBACK event for the no-content edge case.

tests/unit/engine/test_prompt.py (2)

31-31: Import path update looks correct.

This aligns the test with the estimator’s new module location.


547-549: Version-freeze test update is clear and consistent.

The renamed test and explicit 1.0.0 assertion are consistent with related prompt-version tests.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 204-235: The fallback log in _build_summary uses
CONTEXT_BUDGET_COMPACTION_FALLBACK without an execution_id, breaking event
correlation; change _build_summary to accept an execution_id parameter (e.g.,
execution_id: str | None), include execution_id in the logger.debug call
alongside reason and archived_count, and ensure the caller (_compress) passes
the current execution_id into _build_summary when it invokes it so all
compaction events include execution_id for observability.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a7ee13b8-e2c0-4f6b-bbc4-e043f0355a49

📥 Commits

Reviewing files that changed from the base of the PR and between 9a285fa and cb5bc80.

📒 Files selected for processing (2)
  • src/synthorg/engine/compaction/summarizer.py
  • tests/unit/engine/test_prompt.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Web
  • GitHub Check: CLA Signature Check
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax without parentheses for exception handling on Python 3.14 — ruff enforces PEP 758 except syntax
Type hints required on all public functions and classes — enforce with mypy strict mode
Docstrings must use Google style on all public classes and functions — enforced by ruff D rules
Line length must be 88 characters — enforced by ruff

Files:

  • src/synthorg/engine/compaction/summarizer.py
  • tests/unit/engine/test_prompt.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections use copy.deepcopy() at construction and MappingProxyType wrapping.
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with model_copy(update=...) for runtime state
Use @computed_field for derived values instead of storing and validating redundant fields in Pydantic models
Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Every module with business logic MUST import logger: from synthorg.observability import get_logger then assign logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code — only use the centralized logger from synthorg.observability
Logger variable must always be named logger (not _logger, not log)
Event names must always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured kwargs in logging: always logger.info(EVENT, key=value) — never logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging for object creation, internal flow, entry/exit of key functions
NEVER use real vendor names (Anthropic, OpenAI,...

Files:

  • src/synthorg/engine/compaction/summarizer.py
src/synthorg/engine/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Task engine must use centralized single-writer TaskEngine with task lifecycle, recovery, and shutdown

Files:

  • src/synthorg/engine/compaction/summarizer.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow
Maintain 80% minimum code coverage — enforced in CI
Test timeout: 30 seconds per test
Prefer @pytest.mark.parametrize for testing similar cases
Tests must use test-provider, test-small-001, and similar generic vendor-agnostic names instead of real vendor names
Use Python Hypothesis for property-based testing: @given + @settings with HYPOTHESIS_PROFILE env var (ci: 200 examples default, dev: 1000 examples)
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic.

Files:

  • tests/unit/engine/test_prompt.py
🧠 Learnings (2)
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Functions must be less than 50 lines, files must be less than 800 lines

Applied to files:

  • src/synthorg/engine/compaction/summarizer.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Functions: < 50 lines, files < 800 lines.

Applied to files:

  • src/synthorg/engine/compaction/summarizer.py
🧬 Code graph analysis (1)
tests/unit/engine/test_prompt.py (2)
src/synthorg/engine/token_estimation.py (1)
  • DefaultTokenEstimator (48-101)
tests/unit/memory/org/test_prompt_integration.py (1)
  • test_template_version_frozen (51-52)
🔇 Additional comments (7)
tests/unit/engine/test_prompt.py (3)

31-37: Import alignment looks correct.

The updated DefaultTokenEstimator import path and added ChatMessage import match the new token-estimation surface used by the tests.


476-480: Test double now correctly mirrors estimator contract.

Using messages: tuple[ChatMessage, ...] in estimate_conversation_tokens makes this double interface-accurate and protects against signature drift.


548-550: Template version freeze test is clear and consistent.

The renamed test plus explicit PROMPT_TEMPLATE_VERSION == "1.0.0" assertion clearly enforces the frozen-version expectation.

src/synthorg/engine/compaction/summarizer.py (4)

1-35: LGTM!

Module setup follows all guidelines: proper logger initialization via get_logger(__name__), TYPE_CHECKING guard for forward references, and event constants imported from the observability domain module.


38-62: LGTM!

Clean factory pattern with keyword-only arguments, proper type annotations, and well-documented behavior. The async wrapper appropriately delegates to the synchronous core logic.


65-130: Function still exceeds 50-line guideline.

While the function has been improved by extracting _split_conversation and _compress, it spans 66 lines. Consider extracting the skip-check logic (lines 80-93) into a helper like _should_skip_compaction() -> str | None to bring it under the limit.

As per coding guidelines: "Functions must be less than 50 lines".
[duplicate_comment, suggest_optional_refactor]

♻️ Suggested extraction
def _should_skip_compaction(
    ctx: AgentContext,
    config: CompactionConfig,
) -> str | None:
    """Return skip reason or None if compaction should proceed."""
    fill_pct = ctx.context_fill_percent
    if fill_pct is None or fill_pct < config.fill_threshold_percent:
        return "below_threshold"
    if len(ctx.conversation) < config.min_messages_to_compact:
        return "too_few_messages"
    return None

Then simplify _do_compaction:

 def _do_compaction(...) -> AgentContext | None:
-    fill_pct = ctx.context_fill_percent
-    if fill_pct is None or fill_pct < config.fill_threshold_percent:
-        return None
-
-    conversation = ctx.conversation
-    if len(conversation) < config.min_messages_to_compact:
-        logger.debug(
-            CONTEXT_BUDGET_COMPACTION_SKIPPED,
-            execution_id=ctx.execution_id,
-            reason="too_few_messages",
-            message_count=len(conversation),
-            min_required=config.min_messages_to_compact,
-        )
-        return None
+    skip_reason = _should_skip_compaction(ctx, config)
+    if skip_reason:
+        if skip_reason == "too_few_messages":
+            logger.debug(
+                CONTEXT_BUDGET_COMPACTION_SKIPPED,
+                execution_id=ctx.execution_id,
+                reason=skip_reason,
+                message_count=len(ctx.conversation),
+                min_required=config.min_messages_to_compact,
+            )
+        return None
+
+    conversation = ctx.conversation
+    fill_pct = ctx.context_fill_percent

133-201: LGTM!

Well-structured helper functions with clear responsibilities. The _split_conversation correctly handles the presence or absence of a leading SYSTEM message, and _compress accurately counts archived assistant turns for metadata.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/synthorg/engine/compaction/summarizer.py (1)

65-130: 🛠️ Refactor suggestion | 🟠 Major

Refactor _do_compaction to satisfy the 50-line function limit.

This function currently exceeds the repository maximum and should be split into small helpers (e.g., skip checks, logging+metrics payload build, and finalize/apply compression).

As per coding guidelines: "Functions must be less than 50 lines, files must be less than 800 lines".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/compaction/summarizer.py` around lines 65 - 130,
_refactor _do_compaction to be under 50 lines by extracting its logical sections
into small helper functions: pull the early skip logic (fill threshold and min
message check + the CONTEXT_BUDGET_COMPACTION_SKIPPED logging) into a helper
like _should_skip_compaction(ctx, config) that returns None or proceeds; extract
the start/complete logging and metrics payload construction into helpers (e.g.,
_log_compaction_started(ctx, fill_pct) and _log_compaction_completed(ctx,
conversation, compressed, metadata, summary_tokens)); and extract the
finalization step that re-estimates tokens and returns
ctx.with_compression(metadata, compressed, new_fill) into a helper like
_finalize_compaction(ctx, estimator, compressed, metadata, summary_tokens). Keep
calls to _split_conversation and _compress in the main _do_compaction flow, and
ensure all moved logic still uses the same symbols (_do_compaction,
_split_conversation, _compress, estimator, ctx.with_compression) so behavior is
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 150-167: The logic only preserves the first leading SYSTEM message
(head = (conversation[0],)) causing earlier SYSTEM summaries to be included in
archivable and dropped; change the preservation to capture all leading SYSTEM
messages by advancing start_idx while conversation[start_idx].role ==
MessageRole.SYSTEM and set head = tuple(conversation[:start_idx]); then compute
archivable = conversation[start_idx:-preserve_count] and recent as before so
_build_summary() won’t discard previously created SYSTEM summaries.

---

Duplicate comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 65-130: _refactor _do_compaction to be under 50 lines by
extracting its logical sections into small helper functions: pull the early skip
logic (fill threshold and min message check + the
CONTEXT_BUDGET_COMPACTION_SKIPPED logging) into a helper like
_should_skip_compaction(ctx, config) that returns None or proceeds; extract the
start/complete logging and metrics payload construction into helpers (e.g.,
_log_compaction_started(ctx, fill_pct) and _log_compaction_completed(ctx,
conversation, compressed, metadata, summary_tokens)); and extract the
finalization step that re-estimates tokens and returns
ctx.with_compression(metadata, compressed, new_fill) into a helper like
_finalize_compaction(ctx, estimator, compressed, metadata, summary_tokens). Keep
calls to _split_conversation and _compress in the main _do_compaction flow, and
ensure all moved logic still uses the same symbols (_do_compaction,
_split_conversation, _compress, estimator, ctx.with_compression) so behavior is
unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 287c2f19-c42f-493a-b984-b6c5b35fa46c

📥 Commits

Reviewing files that changed from the base of the PR and between cb5bc80 and 8063e25.

📒 Files selected for processing (1)
  • src/synthorg/engine/compaction/summarizer.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Build Sandbox
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax without parentheses for exception handling on Python 3.14 — ruff enforces PEP 758 except syntax
Type hints required on all public functions and classes — enforce with mypy strict mode
Docstrings must use Google style on all public classes and functions — enforced by ruff D rules
Line length must be 88 characters — enforced by ruff

Files:

  • src/synthorg/engine/compaction/summarizer.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections use copy.deepcopy() at construction and MappingProxyType wrapping.
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with model_copy(update=...) for runtime state
Use @computed_field for derived values instead of storing and validating redundant fields in Pydantic models
Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Every module with business logic MUST import logger: from synthorg.observability import get_logger then assign logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code — only use the centralized logger from synthorg.observability
Logger variable must always be named logger (not _logger, not log)
Event names must always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured kwargs in logging: always logger.info(EVENT, key=value) — never logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging for object creation, internal flow, entry/exit of key functions
NEVER use real vendor names (Anthropic, OpenAI,...

Files:

  • src/synthorg/engine/compaction/summarizer.py
src/synthorg/engine/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Task engine must use centralized single-writer TaskEngine with task lifecycle, recovery, and shutdown

Files:

  • src/synthorg/engine/compaction/summarizer.py
🧠 Learnings (2)
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Functions must be less than 50 lines, files must be less than 800 lines

Applied to files:

  • src/synthorg/engine/compaction/summarizer.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Functions: < 50 lines, files < 800 lines.

Applied to files:

  • src/synthorg/engine/compaction/summarizer.py
🧬 Code graph analysis (1)
src/synthorg/engine/compaction/summarizer.py (4)
src/synthorg/engine/compaction/models.py (2)
  • CompactionConfig (10-39)
  • CompressionMetadata (42-73)
src/synthorg/engine/token_estimation.py (6)
  • DefaultTokenEstimator (48-101)
  • PromptTokenEstimator (15-45)
  • estimate_conversation_tokens (33-45)
  • estimate_conversation_tokens (69-101)
  • estimate_tokens (22-31)
  • estimate_tokens (58-67)
src/synthorg/engine/context.py (3)
  • AgentContext (104-412)
  • context_fill_percent (180-187)
  • with_compression (308-336)
src/synthorg/engine/context_budget.py (1)
  • fill_percent (60-64)

Previously only conversation[0] was preserved, dropping prior
compaction summaries (also SYSTEM role) on subsequent compactions.
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 17, 2026 16:37 — with GitHub Actions Inactive
@Aureliolo Aureliolo merged commit 181eb8a into main Mar 17, 2026
31 of 33 checks passed
@Aureliolo Aureliolo deleted the feat/context-budget-management branch March 17, 2026 17:23
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 17, 2026 17:23 — with GitHub Actions Inactive
Aureliolo added a commit that referenced this pull request Mar 17, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.3.1](v0.3.0...v0.3.1)
(2026-03-17)


### Features

* **api:** RFC 9457 Phase 2 — ProblemDetail and content negotiation
([#496](#496))
([30f7c49](30f7c49))
* **cli:** verify container image signatures and SLSA provenance on pull
([#492](#492))
([bef272d](bef272d)),
closes [#491](#491)
* **engine:** implement context budget management in execution loops
([#520](#520))
([181eb8a](181eb8a)),
closes [#416](#416)
* implement settings persistence layer (DB-backed config)
([#495](#495))
([4bd99f7](4bd99f7)),
closes [#450](#450)
* **memory:** implement dual-mode archival in memory consolidation
([#524](#524))
([4603c9e](4603c9e)),
closes [#418](#418)
* migrate config consumers to read through SettingsService
([#510](#510))
([32f553d](32f553d))
* **settings:** implement settings change subscriptions for service
hot-reload ([#526](#526))
([53f908e](53f908e)),
closes [#503](#503)
* **settings:** register API config in SettingsService with 2-phase init
([#518](#518))
([29f7481](29f7481))
* **tools:** add SSRF prevention for git clone URLs
([#505](#505))
([492dd0d](492dd0d))
* **tools:** wire RootConfig.git_clone to GitCloneTool instantiation
([#519](#519))
([b7d8172](b7d8172))


### Bug Fixes

* **api:** replace JWT query parameter with one-time ticket for
WebSocket auth
([#493](#493))
([22a25f6](22a25f6)),
closes [#343](#343)


### Documentation

* add uv cache lock contention handling to worktree skill
([#500](#500))
([bd85a8d](bd85a8d))
* document RFC 9457 dual response formats in OpenAPI schema
([#506](#506))
([8dd2524](8dd2524))


### Maintenance

* upgrade jsdom from 28 to 29
([#499](#499))
([1ea2249](1ea2249))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: implement context budget management in execution loops

1 participant