feat(engine): implement context budget management in execution loops#520
feat(engine): implement context budget management in execution loops#520
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🧰 Additional context used📓 Path-based instructions (3)**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/synthorg/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/synthorg/engine/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (2)📚 Learning: 2026-03-17T11:39:10.447ZApplied to files:
📚 Learning: 2026-03-15T11:48:14.867ZApplied to files:
🧬 Code graph analysis (1)src/synthorg/engine/compaction/summarizer.py (3)
🔇 Additional comments (6)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds pluggable conversation compaction and context-budget tracking: new compaction package (models, protocol, summarizer), token-estimation utilities, context budget indicator/estimation/update helpers, wiring of compaction into ReactLoop/PlanExecuteLoop and checkpoint resume, observability events, tests, and docs updates. Changes
Sequence Diagram(s)mermaid mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
✨ Simplify code
📝 Coding Plan
Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a comprehensive system for managing the context budget within the engine's execution loops. It addresses the challenge of large language model context window limitations by implementing token usage tracking, dynamic conversation compaction, and clear visibility into the current context fill level. This ensures that agents can operate more efficiently within their token limits, preventing overflows and maintaining conversational coherence over longer interactions through intelligent summarization of older turns. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #520 +/- ##
==========================================
- Coverage 93.16% 93.12% -0.04%
==========================================
Files 506 513 +7
Lines 24490 24644 +154
Branches 2333 2344 +11
==========================================
+ Hits 22815 22950 +135
- Misses 1328 1342 +14
- Partials 347 352 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Code Review
This is a great feature that adds much-needed context budget management to the engine. The implementation is thorough, with new models, protocols, and extensive tests. The use of a pluggable callback for compaction is a clean design. I have a few suggestions to improve maintainability and correctness.
| compression = checkpoint_ctx.compression_metadata | ||
| if compression is not None: | ||
| reconciliation_content = ( | ||
| f"Execution resumed from checkpoint at turn " | ||
| f"{checkpoint_ctx.turn_count}. " | ||
| f"Note: conversation was previously compacted " | ||
| f"(archived {compression.archived_turns} turns). " | ||
| f"Previous error: {error_message}. " | ||
| "Review progress and continue." | ||
| ) | ||
| else: | ||
| reconciliation_content = ( | ||
| f"Execution resumed from checkpoint at turn " | ||
| f"{checkpoint_ctx.turn_count}. Previous error: " | ||
| f"{error_message}. " | ||
| "Review progress and continue." | ||
| ), | ||
| ) |
There was a problem hiding this comment.
There's some duplicated code for constructing the reconciliation_content. The base message is the same in both branches of the if/else statement. This can be simplified to reduce duplication and improve maintainability.
compression = checkpoint_ctx.compression_metadata
compaction_note = (
f"Note: conversation was previously compacted "
f"(archived {compression.archived_turns} turns). "
if compression is not None
else ""
)
reconciliation_content = (
f"Execution resumed from checkpoint at turn "
f"{checkpoint_ctx.turn_count}. {compaction_note}"
f"Previous error: {error_message}. "
"Review progress and continue."
)| metadata = CompressionMetadata( | ||
| compression_point=ctx.turn_count, | ||
| archived_turns=prior_archived + len(archivable), | ||
| summary_tokens=summary_tokens, | ||
| compactions_performed=compactions_count, | ||
| ) |
There was a problem hiding this comment.
There's an inconsistency in the unit for archived_turns. The field name and its usage in other parts of the system (like the checkpoint reconciliation message) imply the unit is "turns" (i.e., user/assistant pairs). However, you're assigning len(archivable), which is the number of messages. This will lead to incorrect reporting (e.g., "archived 6 turns" instead of "archived 3 turns").
Assuming a turn is a pair of messages, you should divide the message count by two. This logic should be applied consistently where archived_turns is calculated and logged. You'll also need to update the archived_turns value in the log event on line 146, and consider updating the summary string in _build_summary (lines 178 and 184) to report the number of turns instead of messages for consistency.
| metadata = CompressionMetadata( | |
| compression_point=ctx.turn_count, | |
| archived_turns=prior_archived + len(archivable), | |
| summary_tokens=summary_tokens, | |
| compactions_performed=compactions_count, | |
| ) | |
| archived_turn_count = len(archivable) // 2 | |
| metadata = CompressionMetadata( | |
| compression_point=ctx.turn_count, | |
| archived_turns=prior_archived + archived_turn_count, | |
| summary_tokens=summary_tokens, | |
| compactions_performed=compactions_count, | |
| ) |
| async def _invoke_compaction( | ||
| self, | ||
| ctx: AgentContext, | ||
| turn_number: int, | ||
| ) -> AgentContext | None: | ||
| """Invoke compaction callback if configured. | ||
|
|
||
| Errors are logged but never propagated — compaction must | ||
| not interrupt execution. | ||
|
|
||
| Args: | ||
| ctx: Current agent context. | ||
| turn_number: Current turn number for logging. | ||
|
|
||
| Returns: | ||
| Compacted context or ``None``. | ||
| """ | ||
| try: | ||
| return await self._compaction_callback(ctx) # type: ignore[misc] | ||
| except MemoryError, RecursionError: | ||
| raise | ||
| except Exception as exc: | ||
| logger.exception( | ||
| CONTEXT_BUDGET_COMPACTION_FAILED, | ||
| execution_id=ctx.execution_id, | ||
| turn=turn_number, | ||
| error=f"{type(exc).__name__}: {exc}", | ||
| ) | ||
| return None |
There was a problem hiding this comment.
This _invoke_compaction method is identical to the one in ReactLoop. To avoid code duplication and improve maintainability, this logic should be extracted into a shared helper function. A good place for it would be in src/synthorg/engine/loop_helpers.py, alongside other shared loop utilities like check_stagnation and execute_tool_calls. You could then call this helper from both loops.
src/synthorg/engine/react_loop.py
Outdated
| async def _invoke_compaction( | ||
| self, | ||
| ctx: AgentContext, | ||
| turn_number: int, | ||
| ) -> AgentContext | None: | ||
| """Invoke compaction callback if configured. | ||
|
|
||
| Errors are logged but never propagated — compaction must | ||
| not interrupt execution. | ||
|
|
||
| Args: | ||
| ctx: Current agent context. | ||
| turn_number: Current turn number for logging. | ||
|
|
||
| Returns: | ||
| Compacted context or ``None``. | ||
| """ | ||
| try: | ||
| return await self._compaction_callback(ctx) # type: ignore[misc] | ||
| except MemoryError, RecursionError: | ||
| raise | ||
| except Exception as exc: | ||
| logger.exception( | ||
| CONTEXT_BUDGET_COMPACTION_FAILED, | ||
| execution_id=ctx.execution_id, | ||
| turn=turn_number, | ||
| error=f"{type(exc).__name__}: {exc}", | ||
| ) | ||
| return None |
There was a problem hiding this comment.
This _invoke_compaction method is identical to the one in PlanExecuteLoop. To avoid code duplication and improve maintainability, this logic should be extracted into a shared helper function. A good place for it would be in src/synthorg/engine/loop_helpers.py, alongside other shared loop utilities like check_stagnation and execute_tool_calls. You could then call this helper from both loops.
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/engine/context_budget.py`:
- Around line 141-175: The logger in update_context_fill logs
ctx.context_fill_percent which reflects the old state; instead compute or obtain
the new percentage after computing fill and log that instead. Specifically,
after calculating fill (the local variable) either compute new_fill_percent =
fill / ctx.context_capacity_tokens (or call ctx.with_context_fill(fill) and read
its context_fill_percent) and use that value in the logger.debug call (replace
ctx.context_fill_percent with the new_fill_percent), keeping the same message
keys (execution_id, fill_tokens, capacity_tokens, fill_percent) so logs reflect
the updated context fill.
In `@src/synthorg/engine/context.py`:
- Around line 289-324: Reject impossible values before creating the new
AgentContext: in with_context_fill(validate that fill_tokens is >= 0, log an
error and raise ValueError if not) and in with_compression(validate that
fill_tokens is >= 0, metadata.archived_turns is >= 0, and
metadata.compression_point is not greater than the total turns implied by
archived_turns + len(compressed_conversation); if any check fails log a clear
error mentioning compression_metadata/archived_turns/compression_point and raise
ValueError), otherwise proceed to return the model_copy updating conversation,
compression_metadata, and context_fill_tokens as before.
In `@src/synthorg/engine/plan_execute_loop.py`:
- Around line 841-871: Extract the duplicated compaction logic into a shared
helper function (e.g., invoke_compaction) and have both _invoke_compaction
methods delegate to it: create loop_helpers.invoke_compaction(callback, ctx,
turn_number) that awaits callback(ctx), re-raises MemoryError and
RecursionError, logs other exceptions with the same logger.exception
call/arguments (use CONTEXT_BUDGET_COMPACTION_FAILED,
execution_id=ctx.execution_id, turn=turn_number, error=f"{type(exc).__name__}:
{exc}") and returns None on error; then replace
SynthOrgEngine._invoke_compaction and ReactLoop._invoke_compaction bodies to
call that helper with the existing compaction callback (preserving signature and
behavior).
In `@src/synthorg/engine/prompt.py`:
- Around line 155-176: The estimator in estimate_conversation_tokens currently
omits msg.tool_calls, undercounting tokens for ASSISTANT messages with tool
invocations; update estimate_conversation_tokens to, for each message that has a
non-empty msg.tool_calls tuple, iterate its ToolCall entries and add an
estimated token count for each (include the ToolCall.id, ToolCall.name and a
serialized form of ToolCall.arguments), e.g. serialize arguments (json.dumps or
similar), measure lengths and convert to token estimates (length // 4) and add
those to total (you may also include per-call overhead similar to
self._PER_MESSAGE_OVERHEAD); use the existing ChatMessage, ToolCall attributes
(msg.tool_calls, msg.tool_result, msg.content) to locate where to add this
logic.
In `@tests/unit/engine/compaction/test_models.py`:
- Around line 13-14: The test module currently uses per-class decorators like
`@pytest.mark.unit` (e.g., TestCompactionConfig); add a module-level pytestmark =
[pytest.mark.unit, pytest.mark.timeout(30)] to enforce the 30s timeout per
guidelines and then remove individual `@pytest.mark.unit` decorators from test
classes such as TestCompactionConfig so the module-level marks apply
consistently.
In `@tests/unit/engine/test_context_budget_properties.py`:
- Around line 13-14: Add a 30-second timeout decorator to the test class by
applying pytest.mark.timeout(30) to TestContextBudgetIndicatorProperties; ensure
pytest is imported and annotate the class with `@pytest.mark.timeout`(30) so all
tests in the class inherit the timeout requirement.
In `@tests/unit/engine/test_context_budget.py`:
- Around line 105-112: Update the test_with_tools test to import the
_TOOL_DEFINITION_TOKEN_OVERHEAD constant from synthorg.engine.context_budget and
use it in the assertion instead of the hardcoded 50; locate the test function
test_with_tools and the call to estimate_context_fill, add the import for
_TOOL_DEFINITION_TOKEN_OVERHEAD, and change the assertion to assert result ==
100 + 3 * _TOOL_DEFINITION_TOKEN_OVERHEAD so the test stays correct if the
constant changes.
- Around line 20-21: Add a 30-second pytest timeout to the test module by
declaring a module-level pytest mark (e.g., pytestmark =
pytest.mark.timeout(30)) so all tests in this file, including the
TestContextBudgetIndicator class, inherit the timeout; place the pytestmark
declaration near the top of the file alongside existing pytest imports to
satisfy the coding guideline.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ebca776b-3cfb-4d4b-b02c-7a84fe639e8c
📒 Files selected for processing (19)
src/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/compaction/__init__.pysrc/synthorg/engine/compaction/models.pysrc/synthorg/engine/compaction/protocol.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/context.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/prompt.pysrc/synthorg/engine/prompt_template.pysrc/synthorg/engine/react_loop.pysrc/synthorg/observability/events/context_budget.pytests/unit/engine/compaction/__init__.pytests/unit/engine/compaction/test_models.pytests/unit/engine/compaction/test_summarizer.pytests/unit/engine/test_context_budget.pytests/unit/engine/test_context_budget_properties.pytests/unit/engine/test_prompt.pytests/unit/observability/test_events.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Build Sandbox
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations.
Useexcept A, B:syntax (no parentheses) — PEP 758 except syntax enforced by ruff on Python 3.14.
Add type hints to all public functions; enforce mypy strict mode compliance.
Use Google-style docstrings on all public classes and functions; enforce via ruff D rules.
Keep functions under 50 lines and files under 800 lines.
Handle errors explicitly and never silently swallow exceptions.
Validate at system boundaries (user input, external APIs, config files).
Set line length to 88 characters (ruff enforces).
Useuvfor Python dependency management. Install withuv sync, run tests with-n autofor parallelism, lint withruff check, format withruff format, type-check withmypy.
Scripts inscripts/have relaxed ruff rules:
Files:
tests/unit/engine/compaction/test_models.pysrc/synthorg/engine/prompt_template.pytests/unit/observability/test_events.pytests/unit/engine/compaction/test_summarizer.pysrc/synthorg/observability/events/context_budget.pysrc/synthorg/engine/compaction/protocol.pytests/unit/engine/test_prompt.pysrc/synthorg/engine/context.pysrc/synthorg/engine/react_loop.pysrc/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/compaction/models.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/prompt.pysrc/synthorg/engine/compaction/__init__.pytests/unit/engine/test_context_budget_properties.pytests/unit/engine/test_context_budget.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark tests with@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, or@pytest.mark.slow. Maintain 80% minimum code coverage.
Useasyncio_mode = 'auto'for async tests — no manual@pytest.mark.asyncioneeded. Set test timeout to 30 seconds per test. Always include-n autofor parallelism via pytest-xdist.
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned test code, docstrings, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001,test-provider,test-small-001, etc. Vendor names may only appear in Operations design page, .claude/ files, or third-party import paths.
Use Hypothesis for property-based testing with@given+@settings. Profiles:ci(200 examples, default) anddev(1000 examples) viaHYPOTHESIS_PROFILEenv var. Never skip, dismiss, or ignore flaky tests — fix them fully. Mocktime.monotonic()andasyncio.sleep()for deterministic timing-sensitive tests instead of widening margins.
Files:
tests/unit/engine/compaction/test_models.pytests/unit/observability/test_events.pytests/unit/engine/compaction/test_summarizer.pytests/unit/engine/test_prompt.pytests/unit/engine/test_context_budget_properties.pytests/unit/engine/test_context_budget.py
**/*.{md,py,ts,tsx,vue,go,yml,yaml,json,toml}
📄 CodeRabbit inference engine (CLAUDE.md)
Use conventional commits:
<type>: <description>where types are feat, fix, refactor, docs, test, chore, perf, ci. Commits must be GPG/SSH signed on main branch.
Files:
tests/unit/engine/compaction/test_models.pysrc/synthorg/engine/prompt_template.pytests/unit/observability/test_events.pytests/unit/engine/compaction/test_summarizer.pysrc/synthorg/observability/events/context_budget.pysrc/synthorg/engine/compaction/protocol.pytests/unit/engine/test_prompt.pysrc/synthorg/engine/context.pysrc/synthorg/engine/react_loop.pysrc/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/compaction/models.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/prompt.pysrc/synthorg/engine/compaction/__init__.pytests/unit/engine/test_context_budget_properties.pytests/unit/engine/test_context_budget.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Create new objects instead of mutating existing ones. For non-Pydantic internal collections (registries,BaseTool), usecopy.deepcopy()at construction +MappingProxyTypewrapping. Fordict/listfields in frozen Pydantic models, rely onfrozen=Trueandcopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (withmodel_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Adopt Pydantic v2 conventions: use@computed_fieldfor derived values instead of storing redundant fields; useNotBlankStrfromcore.typesfor all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Every module with business logic must have:from synthorg.observability import get_loggerthenlogger = get_logger(__name__). Never useimport logging,logging.getLogger(), orprint()in application code.
Always useloggeras the variable name (not_logger, notlog). Use event name constants from domain-specific modules undersynthorg.observability.events(e.g.,PROVIDER_CALL_STARTfromevents.provider). Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Always use structured kwargs for logging:logger.info(EVENT, key=value)— never use string interpolation likelogger.info('msg %s', val).
Log all error paths at WARNING or ERROR with context before raising. Log all state transitions at INFO. Log object creation, internal flow, and entry/exit of key functions at DEBUG.
Do NOT useimport loggingorlogging.getLogger(). Pure data models, ...
Files:
src/synthorg/engine/prompt_template.pysrc/synthorg/observability/events/context_budget.pysrc/synthorg/engine/compaction/protocol.pysrc/synthorg/engine/context.pysrc/synthorg/engine/react_loop.pysrc/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/compaction/models.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/prompt.pysrc/synthorg/engine/compaction/__init__.py
src/synthorg/engine/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Agent orchestration with TaskEngine (centralized single-writer task state), task lifecycle, recovery, shutdown, workspace isolation. Coordination: multi-agent pipeline via TopologyDispatcher protocol (4 dispatchers: SAS/centralized/decentralized/context-dependent), wave execution, workspace lifecycle, CoordinationSectionConfig bridge, build_coordinator factory. Approval gate: escalation detection, context parking/resume, EscalationInfo/ResumePayload. Stagnation detection: StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection. Checkpoint recovery: per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy. AgentRuntimeState: lightweight per-agent execution status for dashboard + recovery.
Files:
src/synthorg/engine/prompt_template.pysrc/synthorg/engine/compaction/protocol.pysrc/synthorg/engine/context.pysrc/synthorg/engine/react_loop.pysrc/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/compaction/models.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/prompt.pysrc/synthorg/engine/compaction/__init__.py
🧠 Learnings (14)
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
tests/unit/observability/test_events.pysrc/synthorg/observability/events/context_budget.pysrc/synthorg/engine/react_loop.pysrc/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Always use `logger` as the variable name (not `_logger`, not `log`). Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/context_budget.pysrc/synthorg/engine/react_loop.pysrc/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/budget/**/*.py : Cost tracking with pre-flight/in-flight budget checks, auto-downgrade, billing periods, cost tiers, quota/subscription tracking. CFO cost optimization: anomaly detection, efficiency analysis, downgrade recommendations, approval decisions. Budget errors: `BudgetExhaustedError`, `DailyLimitExceededError`, `QuotaExhaustedError`.
Applied to files:
src/synthorg/observability/events/context_budget.pysrc/synthorg/engine/react_loop.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/plan_execute_loop.pytests/unit/engine/test_context_budget.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.
Applied to files:
src/synthorg/engine/context.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR with context before raising. Log all state transitions at INFO. Log object creation, internal flow, and entry/exit of key functions at DEBUG.
Applied to files:
src/synthorg/engine/react_loop.pysrc/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/engine/react_loop.pysrc/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/tools/**/*.py : Tool registry with built-in tools: file_system/, git (SSRF prevention via git_url_validator), sandbox/, code_runner. MCP bridge (mcp/). Role-based access. Approval tool (request_human_approval).
Applied to files:
src/synthorg/engine/react_loop.pysrc/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/engine/**/*.py : Agent orchestration with TaskEngine (centralized single-writer task state), task lifecycle, recovery, shutdown, workspace isolation. Coordination: multi-agent pipeline via TopologyDispatcher protocol (4 dispatchers: SAS/centralized/decentralized/context-dependent), wave execution, workspace lifecycle, CoordinationSectionConfig bridge, build_coordinator factory. Approval gate: escalation detection, context parking/resume, EscalationInfo/ResumePayload. Stagnation detection: StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection. Checkpoint recovery: per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy. AgentRuntimeState: lightweight per-agent execution status for dashboard + recovery.
Applied to files:
src/synthorg/engine/react_loop.pysrc/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/security/**/*.py : SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).
Applied to files:
src/synthorg/engine/react_loop.pysrc/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Create new objects instead of mutating existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping. For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Applied to files:
src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
Applied to files:
tests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing with `given` + `settings`. Profiles: `ci` (200 examples, default) and `dev` (1000 examples) via `HYPOTHESIS_PROFILE` env var. Never skip, dismiss, or ignore flaky tests — fix them fully. Mock `time.monotonic()` and `asyncio.sleep()` for deterministic timing-sensitive tests instead of widening margins.
Applied to files:
tests/unit/engine/test_context_budget_properties.py
🧬 Code graph analysis (11)
tests/unit/engine/compaction/test_models.py (1)
src/synthorg/engine/compaction/models.py (3)
CompactionConfig(10-39)CompactionResult(42-69)CompressionMetadata(72-103)
tests/unit/engine/compaction/test_summarizer.py (3)
src/synthorg/engine/compaction/models.py (1)
CompactionConfig(10-39)src/synthorg/engine/compaction/summarizer.py (1)
make_compaction_callback(34-58)src/synthorg/engine/context.py (2)
AgentContext(104-400)from_identity(190-225)
src/synthorg/engine/compaction/protocol.py (1)
src/synthorg/engine/context.py (1)
AgentContext(104-400)
src/synthorg/engine/context.py (1)
src/synthorg/engine/compaction/models.py (1)
CompressionMetadata(72-103)
src/synthorg/engine/react_loop.py (1)
src/synthorg/engine/plan_execute_loop.py (2)
compaction_callback(138-140)_invoke_compaction(843-871)
src/synthorg/engine/checkpoint/resume.py (2)
src/synthorg/engine/plan_execute_loop.py (4)
compaction_callback(138-140)config(123-125)approval_gate(128-130)stagnation_detector(133-135)src/synthorg/engine/react_loop.py (3)
compaction_callback(111-113)approval_gate(101-103)stagnation_detector(106-108)
src/synthorg/engine/compaction/summarizer.py (5)
tests/unit/engine/conftest.py (1)
engine(449-460)src/synthorg/engine/compaction/models.py (2)
CompactionConfig(10-39)CompressionMetadata(72-103)src/synthorg/engine/prompt.py (6)
DefaultTokenEstimator(134-176)PromptTokenEstimator(101-131)estimate_tokens(108-117)estimate_tokens(144-153)estimate_conversation_tokens(119-131)estimate_conversation_tokens(155-176)src/synthorg/engine/context.py (3)
AgentContext(104-400)context_fill_percent(180-187)with_compression(302-324)src/synthorg/engine/context_budget.py (1)
fill_percent(56-60)
src/synthorg/engine/plan_execute_loop.py (3)
src/synthorg/engine/react_loop.py (3)
compaction_callback(111-113)approval_gate(101-103)stagnation_detector(106-108)src/synthorg/engine/stagnation/detector.py (1)
config(46-48)src/synthorg/engine/context.py (1)
AgentContext(104-400)
src/synthorg/engine/compaction/__init__.py (1)
src/synthorg/engine/compaction/models.py (3)
CompactionConfig(10-39)CompactionResult(42-69)CompressionMetadata(72-103)
tests/unit/engine/test_context_budget_properties.py (2)
src/synthorg/engine/context_budget.py (3)
ContextBudgetIndicator(29-80)format(62-80)fill_percent(56-60)src/synthorg/engine/prompt.py (3)
DefaultTokenEstimator(134-176)estimate_conversation_tokens(119-131)estimate_conversation_tokens(155-176)
tests/unit/engine/test_context_budget.py (3)
tests/unit/engine/conftest.py (2)
engine(449-460)sample_agent_context(183-191)src/synthorg/engine/context_budget.py (5)
ContextBudgetIndicator(29-80)estimate_context_fill(112-138)make_context_indicator(83-109)update_context_fill(141-175)fill_percent(56-60)src/synthorg/engine/prompt.py (3)
DefaultTokenEstimator(134-176)estimate_conversation_tokens(119-131)estimate_conversation_tokens(155-176)
🔇 Additional comments (32)
tests/unit/observability/test_events.py (1)
229-229: LGTM!The addition of
"context_budget"to the expected domain modules set correctly aligns with the newsrc/synthorg/observability/events/context_budget.pymodule introduced in this PR for context budget observability events.tests/unit/engine/compaction/test_models.py (1)
17-102: LGTM!Comprehensive test coverage for compaction models:
CompactionConfig: Validates defaults, threshold bounds (gt=0.0, le=100.0), minimum turns (ge=2), preserve recent (ge=1), and frozen immutability.CompactionResult: Validates construction and frozen behavior.CompressionMetadata: Validates defaultcompactions_performed=1, custom values, and frozen behavior.Tests align well with the model field constraints defined in
src/synthorg/engine/compaction/models.py.tests/unit/engine/test_prompt.py (1)
475-479: LGTM!The
estimate_conversation_tokensstub correctly satisfies the expandedPromptTokenEstimatorprotocol interface. Returning0is appropriate for this test double since the test (test_custom_estimator_used) only validates thatestimate_tokensis called during prompt construction.src/synthorg/engine/prompt_template.py (1)
186-191: LGTM!The Context Budget section follows the established template pattern for optional sections (conditional rendering with
{% if context_budget %}). Placing it at the end of the template is appropriate for auxiliary runtime information that shouldn't interrupt core agent instructions.tests/unit/engine/test_context_budget_properties.py (1)
17-88: LGTM!Well-designed property-based tests using Hypothesis:
- Strategies appropriately model the domain (non-negative fill tokens, optional capacity with minimum 1).
@settings(max_examples=200)aligns with theciprofile per coding guidelines.- Tests cover key invariants: formatting never crashes, fill percent is non-negative when capacity exists, and token estimation is non-negative.
Based on learnings: "Use Hypothesis for property-based testing with
given+settings. Profiles:ci(200 examples, default)..."src/synthorg/engine/compaction/protocol.py (1)
1-24: LGTM!Clean type alias definition following the established
CheckpointCallbackpattern. The signature correctly models the compaction callback semantics:
- Input: current
AgentContext- Output: new
AgentContextwith compressed conversation (compaction ran) orNone(no compaction needed)The
TYPE_CHECKINGguard appropriately avoids runtime circular imports.src/synthorg/engine/checkpoint/resume.py (2)
72-93: LGTM!The compression-aware reconciliation message enhancement correctly informs the agent about prior compaction when resuming from a checkpoint. The conditional logic appropriately:
- Includes archived turn count when
compression_metadataexists- Falls back to the simpler message format when no compression occurred
This aligns with the PR objective for "compressed checkpoint recovery."
128-142: LGTM!The
compaction_callbackis correctly propagated when creating new loop instances during checkpoint recovery. This ensures that:
ReactLoop(line 133) andPlanExecuteLoop(line 141) preserve compaction behavior- The pattern matches existing callback propagation for
approval_gateandstagnation_detectortests/unit/engine/test_context_budget.py (1)
24-200: LGTM!Comprehensive test coverage for context budget functionality:
- ContextBudgetIndicator: Validates formatting with known/unknown capacity, fill percent calculation edge cases (including zero fill), and frozen immutability.
- estimate_context_fill: Correctly tests empty conversation, messages with overhead, and tool definitions.
- make_context_indicator: Tests factory with both capacity scenarios.
- update_context_fill: Validates fill token updates.
- estimate_conversation_tokens: Edge cases including empty tuple, single message arithmetic, and empty content overhead.
Good use of
pytest.approxfor floating-point comparisons.src/synthorg/engine/compaction/__init__.py (1)
1-20: LGTM!Clean package initialization with well-structured public API exports. The docstring clearly describes the compaction hook's purpose, and the
__all__correctly exposes the four public symbols from the subpackage.tests/unit/engine/compaction/test_summarizer.py (2)
1-36: LGTM!Well-structured test helpers.
_msgprovides a clean factory for chat messages, and_build_contextproperly usesfrom_identityfollowed bymodel_copy(update=...)to set up test contexts with specific fill levels and conversation states.
39-232: LGTM!Comprehensive test coverage for the compaction callback factory. Tests exercise:
- Threshold logic (below/above)
- Edge cases (too few messages, nothing to archive, unknown capacity)
- Multi-step compaction with metadata increment verification
The assertions correctly verify system message preservation, summary content, and compression metadata state.
src/synthorg/engine/compaction/models.py (1)
1-103: LGTM!Well-designed frozen Pydantic models following project conventions:
- All models use
ConfigDict(frozen=True)for immutability- Field constraints are appropriate (e.g.,
gt=0.0, le=100.0for percentages,ge=2for minimum turns)- Google-style docstrings with
Attributessections- Sensible defaults aligned with conservative compaction behavior
Based on learnings: "Use frozen Pydantic models for config/identity; use separate mutable-via-copy models for runtime state."
src/synthorg/engine/react_loop.py (4)
13-15: LGTM!Clean integration of compaction callback into ReactLoop:
- Event constant imported from the correct domain-specific module
- Docstring properly documents the new parameter
- Constructor stores the callback following the same pattern as other optional callbacks
Also applies to: 53-53, 82-98
110-113: LGTM!Property accessor follows the established pattern for
approval_gateandstagnation_detector.
208-215: LGTM!Compaction is correctly invoked at turn boundaries after stagnation detection. The placement ensures the turn is fully processed before attempting compression, and the context is updated only if compaction succeeds.
311-339: LGTM!Error isolation pattern correctly mirrors the checkpoint callback handling:
MemoryErrorandRecursionErrorare re-raised (non-recoverable)- Other exceptions are logged with structured kwargs and swallowed to avoid interrupting execution
- Uses
logger.exceptionfor full traceback captureThe
# type: ignore[misc]comment is appropriate since the None-check happens at the call site (line 209).src/synthorg/engine/compaction/summarizer.py (5)
1-31: LGTM!Clean module structure with proper observability setup. The
_MAX_SUMMARY_CHARSconstant appropriately limits summary length to prevent unbounded growth.
34-58: LGTM!Factory follows a clean closure pattern, capturing config and estimator. The inner
_compactfunction is markedasyncto satisfy theCompactionCallbackprotocol even though_do_compactionis synchronous—this allows future async operations if needed.
61-109: LGTM!Guard conditions are well-ordered:
- Check fill percentage exists and exceeds threshold
- Check minimum message count
- Check that there are messages to archive after preservation
Logging at appropriate levels (DEBUG for skips, INFO for start).
111-155: LGTM!Core compression logic is sound:
- System message preserved at index 0
- Archivable messages extracted correctly
- Summary injected as a SYSTEM message (appropriate for context)
- CompressionMetadata properly accumulates archived_turns across multiple compactions
- Fill re-estimated after compression
158-184: LGTM!
_build_summaryhandles edge cases well:
- Only extracts from ASSISTANT messages (user queries don't need summarization)
- Truncates individual snippets to 100 chars
- Falls back to a generic archival message when no suitable content exists
- Final summary capped at
_MAX_SUMMARY_CHARSsrc/synthorg/observability/events/context_budget.py (1)
1-16: LGTM!Well-structured event constants following the project's observability conventions:
Final[str]typing for immutability- Consistent dot-separated naming (
context_budget.<category>.<action>)- Comprehensive coverage of fill tracking, compaction lifecycle, and indicator injection
src/synthorg/engine/plan_execute_loop.py (3)
17-19: LGTM!Clean integration following the same pattern as ReactLoop. The compaction_callback parameter is properly documented and stored.
Also applies to: 76-76, 102-120
137-140: LGTM!Property accessor follows the established pattern.
692-699: LGTM!Compaction correctly invoked at turn boundaries within step execution. The placement before stagnation detection differs slightly from ReactLoop (which does it after), but this ordering is acceptable—both approaches process the complete turn before compacting.
src/synthorg/engine/context_budget.py (4)
1-27: LGTM!Clean module setup with appropriate observability infrastructure. The
_TOOL_DEFINITION_TOKEN_OVERHEADconstant (50 tokens per tool) is a reasonable heuristic for tool definition serialization overhead.
29-80: LGTM!Well-designed frozen model with:
- Proper
@computed_fieldforfill_percent(avoids storing redundant data)- Null-safe percentage calculation
- Human-readable
format()output with comma-separated numbers and percentage display- Handles unknown capacity gracefully in formatting
83-109: LGTM!Factory correctly constructs the indicator from context fields and logs at DEBUG level with structured kwargs.
112-138: LGTM!Clean estimation function with keyword-only arguments for clarity. The formula correctly sums system prompt tokens, conversation tokens, and tool definition overhead.
src/synthorg/engine/context.py (1)
176-188: Nice use of@computed_fieldforcontext_fill_percent.Keeping the percentage derived from the authoritative token counts/capacity avoids stale state as the context evolves. Based on learnings "Use computed_field for derived values instead of storing + validating redundant fields."
src/synthorg/engine/prompt.py (1)
705-758: Nice: the budget indicator is threaded through both render and trim paths.That keeps
estimated_tokensaligned with the final prompt after trimming.
src/synthorg/engine/prompt.py
Outdated
| def estimate_conversation_tokens( | ||
| self, | ||
| messages: tuple[ChatMessage, ...], | ||
| ) -> int: | ||
| """Estimate total tokens across all messages. | ||
|
|
||
| Sums ``len(content) // 4 + overhead`` per message. | ||
| Tool results use their content field for estimation. | ||
|
|
||
| Args: | ||
| messages: The conversation messages to estimate. | ||
|
|
||
| Returns: | ||
| Estimated total token count (minimum 0). | ||
| """ | ||
| total = 0 | ||
| for msg in messages: | ||
| content = msg.content or "" | ||
| if msg.tool_result is not None: | ||
| content = msg.tool_result.content or "" | ||
| total += len(content) // 4 + self._PER_MESSAGE_OVERHEAD | ||
| return total |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the ChatMessage model definition. If it exposes token-bearing fields
# beyond `content` / `tool_result`, fold them into estimate_conversation_tokens().
rg -n -A120 -B5 --type=py '^class ChatMessage\b' .Repository: Aureliolo/synthorg
Length of output: 9326
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find ToolCall definition to see what fields it contains
rg -n -A30 -B2 '^class ToolCall\b' src/synthorg/providers/models.pyRepository: Aureliolo/synthorg
Length of output: 1242
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for tests of estimate_conversation_tokens
rg -n "estimate_conversation_tokens" --type=py .
# Also check if there are any comments in the prompt.py file about tool_calls estimation
echo "---"
rg -n -B5 -A5 "tool_call" src/synthorg/engine/prompt.pyRepository: Aureliolo/synthorg
Length of output: 1364
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look at the main test for estimate_conversation_tokens
sed -n '172,220p' tests/unit/engine/test_context_budget.py
echo "---"
# Check if any test creates ChatMessage with tool_calls
rg -n "tool_calls=" tests/unit/engine/test_context_budget.py -A3 -B3Repository: Aureliolo/synthorg
Length of output: 1018
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check all tests for messages with tool_calls
rg -n "tool_calls" tests/unit/engine/test_context_budget.py -A3 -B3
echo "---"
# Check if there's a test for tool_calls in the test_prompt.py mock
rg -n "tool_calls" tests/unit/engine/test_prompt.py -A3 -B3Repository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# View the full estimate_conversation_tokens implementation and surrounding context
sed -n '150,180p' src/synthorg/engine/prompt.py
echo "---"
# Check if there are any comments in the class about limitations
sed -n '85,160p' src/synthorg/engine/prompt.py | grep -A10 -B10 "DefaultTokenEstimator"Repository: Aureliolo/synthorg
Length of output: 1518
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look for any other token estimation for tool_calls elsewhere in the codebase
rg -n "tool_calls" src/synthorg/engine/ --type=py
echo "---"
# Check if there's any serialization/JSON encoding of tool_calls to understand token impact
rg -n "json" src/synthorg/engine/prompt.py -i -A2 -B2Repository: Aureliolo/synthorg
Length of output: 3404
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check if there are any assistant messages with tool_calls created in tests
rg -n "MessageRole.ASSISTANT" tests/unit/engine/test_context_budget.py -A5 -B2 | head -60
echo "---"
# Check the broader test structure to see if tool_calls scenarios are tested anywhere
rg -n "ToolCall" tests/unit/engine/ -lRepository: Aureliolo/synthorg
Length of output: 1004
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check if tool_calls are intentionally omitted with any comments
rg -n "tool_call" src/synthorg/engine/prompt.py -B3 -A3
# Verify tool_calls do get sent to LLM providers (proof they consume tokens)
rg -n "tool_calls" src/synthorg/providers/ --type=py | head -20Repository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
#!/bin/bash
set -euo pipefochk
# Check how tool_calls are serialized/sent to providers
rg -n "to_dict\|model_dump\|serialize" src/synthorg/providers/ --type=py | grep -i tool
echo "---"
# Check if tool_calls on assistant messages actually get sent to LLM providers
rg -n "tool_calls" src/synthorg/providers/anthropic_provider.py -B2 -A2
echo "---"
# Verify completion request structure includes tool_calls
rg -n "class.*Request" src/synthorg/providers/models.py | head -5Repository: Aureliolo/synthorg
Length of output: 116
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check how tool_calls are serialized/sent to providers
rg -n "to_dict\|model_dump\|serialize" src/synthorg/providers/ --type=py | grep -i tool
echo "---"
# Check if tool_calls on assistant messages actually get sent to LLM providers
rg -n "tool_calls" src/synthorg/providers/anthropic_provider.py -B2 -A2
echo "---"
# Verify completion request structure includes tool_calls
rg -n "class.*Request" src/synthorg/providers/models.py | head -5Repository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find CompletionRequest definition to see if it includes messages with tool_calls
rg -n "class CompletionRequest" src/synthorg/providers/models.py -A30
echo "---"
# Check an actual provider implementation to see how messages are used
fd "openai_provider.py" --type f src/ 2>/dev/null | head -1 | xargs -I {} sed -n '1,100p' {}Repository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Simple check: look for CompletionRequest definition
rg -n "^class CompletionRequest" src/synthorg/ --type=py -A20Repository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for any TODOs, FIXMEs, or limitations mentioning tool_calls in token estimation
rg -n "TODO\|FIXME\|limitation\|XXX" src/synthorg/engine/prompt.py -i -A2 -B2
echo "---"
# Check the entire DefaultTokenEstimator class definition
sed -n '85,210p' src/synthorg/engine/prompt.pyRepository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Read the entire prompt.py file to see the full context
wc -l src/synthorg/engine/prompt.py
echo "---"
# Read the DefaultTokenEstimator class
head -250 src/synthorg/engine/prompt.py | tail -150Repository: Aureliolo/synthorg
Length of output: 5094
Include tool_calls in token estimation for ASSISTANT messages.
The estimator currently ignores msg.tool_calls, which are token-bearing on ASSISTANT messages (id, name, arguments dict). When ASSISTANT messages include tool calls, the conversation token count is undercounted, causing context compaction to trigger later than planned. Update estimate_conversation_tokens() to sum token estimates for each ToolCall in the tuple alongside content and tool_result.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/engine/prompt.py` around lines 155 - 176, The estimator in
estimate_conversation_tokens currently omits msg.tool_calls, undercounting
tokens for ASSISTANT messages with tool invocations; update
estimate_conversation_tokens to, for each message that has a non-empty
msg.tool_calls tuple, iterate its ToolCall entries and add an estimated token
count for each (include the ToolCall.id, ToolCall.name and a serialized form of
ToolCall.arguments), e.g. serialize arguments (json.dumps or similar), measure
lengths and convert to token estimates (length // 4) and add those to total (you
may also include per-call overhead similar to self._PER_MESSAGE_OVERHEAD); use
the existing ChatMessage, ToolCall attributes (msg.tool_calls, msg.tool_result,
msg.content) to locate where to add this logic.
| @pytest.mark.unit | ||
| class TestCompactionConfig: |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Add pytest.mark.timeout(30) to comply with test guidelines.
Per coding guidelines, tests should have a 30-second timeout. Consider adding pytestmark at the module level for consistency with other test files.
💡 Suggested fix
+pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]
+
+
-@pytest.mark.unit
class TestCompactionConfig:Then remove individual @pytest.mark.unit decorators from each class.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @pytest.mark.unit | |
| class TestCompactionConfig: | |
| pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)] | |
| class TestCompactionConfig: |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/engine/compaction/test_models.py` around lines 13 - 14, The test
module currently uses per-class decorators like `@pytest.mark.unit` (e.g.,
TestCompactionConfig); add a module-level pytestmark = [pytest.mark.unit,
pytest.mark.timeout(30)] to enforce the 30s timeout per guidelines and then
remove individual `@pytest.mark.unit` decorators from test classes such as
TestCompactionConfig so the module-level marks apply consistently.
| @pytest.mark.unit | ||
| class TestContextBudgetIndicatorProperties: |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Add pytest.mark.timeout(30) for test timeout compliance.
Per coding guidelines, tests should have a 30-second timeout.
💡 Suggested fix
+pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]
+
+
-@pytest.mark.unit
class TestContextBudgetIndicatorProperties:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/engine/test_context_budget_properties.py` around lines 13 - 14,
Add a 30-second timeout decorator to the test class by applying
pytest.mark.timeout(30) to TestContextBudgetIndicatorProperties; ensure pytest
is imported and annotate the class with `@pytest.mark.timeout`(30) so all tests in
the class inherit the timeout requirement.
| @pytest.mark.unit | ||
| class TestContextBudgetIndicator: |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Add pytest.mark.timeout(30) for test timeout compliance.
Per coding guidelines, tests should have a 30-second timeout. Consider using pytestmark at module level.
💡 Suggested fix
+pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]
+
+
-@pytest.mark.unit
class TestContextBudgetIndicator:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/engine/test_context_budget.py` around lines 20 - 21, Add a
30-second pytest timeout to the test module by declaring a module-level pytest
mark (e.g., pytestmark = pytest.mark.timeout(30)) so all tests in this file,
including the TestContextBudgetIndicator class, inherit the timeout; place the
pytestmark declaration near the top of the file alongside existing pytest
imports to satisfy the coding guideline.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synthorg/engine/plan_execute_loop.py (1)
684-698:⚠️ Potential issue | 🟠 MajorCompaction is skipped on step-completion turns.
When
_run_step_turn()returns(ctx, bool)(Line 686),_execute_step()returns immediately and never reaches the compaction block (Lines 690-697). This means compaction only runs on continuing turns, not on terminal turns of a step.♻️ Proposed fix
while ctx.has_turns_remaining: result = await self._run_step_turn( ctx, provider, executor_model, config, tool_defs, tool_invoker, turns, budget_checker, shutdown_checker, ) if isinstance(result, ExecutionResult): return result if isinstance(result, tuple): - return result - ctx = result + ctx, step_ok = result + compacted = await invoke_compaction( + ctx, + self._compaction_callback, + ctx.turn_count, + ) + if compacted is not None: + ctx = compacted + return ctx, step_ok + ctx = result # Context compaction at turn boundaries compacted = await invoke_compaction( ctx, self._compaction_callback, ctx.turn_count, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/plan_execute_loop.py` around lines 684 - 698, When _execute_step() receives a tuple result from _run_step_turn() (i.e., (ctx, bool)), it currently returns immediately and skips context compaction; modify _execute_step() so it extracts ctx from the tuple result, calls invoke_compaction(ctx, self._compaction_callback, ctx.turn_count), updates ctx if compaction returns a new context, and then returns the original tuple with the potentially compacted ctx and the original boolean flag. Make this change in the branch that handles tuple results so invoke_compaction is applied to the ctx before returning, using the same invoke_compaction, _compaction_callback, and ctx.turn_count symbols already present.
♻️ Duplicate comments (4)
tests/unit/engine/test_context_budget_properties.py (1)
13-14: 🧹 Nitpick | 🔵 TrivialAdd
pytest.mark.timeout(30)for test timeout compliance.Per coding guidelines, tests should have a 30-second timeout. Consider using
pytestmarkat module level to apply both markers.💡 Suggested fix
+pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)] + + -@pytest.mark.unit class TestContextBudgetIndicatorProperties: """Hypothesis property tests for ContextBudgetIndicator."""Then remove the individual
@pytest.mark.unitdecorators from both classes.Also applies to: 69-70
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/engine/test_context_budget_properties.py` around lines 13 - 14, Add a module-level pytestmark to apply both the unit marker and a 30-second timeout by defining pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)] at the top of the test module, then remove the per-class decorator(s) such as `@pytest.mark.unit` from TestContextBudgetIndicatorProperties (and the other test class referenced) so the markers are applied consistently via pytestmark.tests/unit/engine/test_context_budget.py (2)
107-114: 🧹 Nitpick | 🔵 TrivialImport
_TOOL_DEFINITION_TOKEN_OVERHEADinstead of hardcoding 150.The assertion hardcodes
100 + 150where 150 = 3 × 50 (tool overhead). If_TOOL_DEFINITION_TOKEN_OVERHEADchanges, this test will fail with a confusing message. Import the constant to make the relationship explicit.♻️ Proposed fix
from synthorg.engine.context_budget import ( ContextBudgetIndicator, + _TOOL_DEFINITION_TOKEN_OVERHEAD, estimate_context_fill, make_context_indicator, update_context_fill, )def test_with_tools(self) -> None: result = estimate_context_fill( system_prompt_tokens=100, conversation=(), tool_definitions_count=3, ) - # 3 tools * 50 overhead = 150 - assert result == 100 + 150 + # 3 tools * overhead + assert result == 100 + 3 * _TOOL_DEFINITION_TOKEN_OVERHEAD🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/engine/test_context_budget.py` around lines 107 - 114, The test test_with_tools currently hardcodes the tool overhead (100 + 150) instead of using the constant; update the test to import and use _TOOL_DEFINITION_TOKEN_OVERHEAD and compute expected as system_prompt_tokens + tool_definitions_count * _TOOL_DEFINITION_TOKEN_OVERHEAD (referencing the test function test_with_tools and the constant _TOOL_DEFINITION_TOKEN_OVERHEAD) so the assertion becomes dynamic and will remain correct if the overhead value changes.
22-23: 🧹 Nitpick | 🔵 TrivialAdd
pytest.mark.timeout(30)for test timeout compliance.Per coding guidelines, tests should have a 30-second timeout.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/engine/test_context_budget.py` around lines 22 - 23, The test class TestContextBudgetIndicator is missing a timeout decorator; add pytest.mark.timeout(30) as a decorator on the TestContextBudgetIndicator class (above its declaration) so all tests in that class inherit the 30-second timeout per guidelines; ensure you import pytest at top if not already present and apply the decorator directly to the class definition.src/synthorg/engine/prompt.py (1)
36-39:⚠️ Potential issue | 🟠 MajorImported estimator still undercounts assistant
tool_callspayloads.
build_system_prompt()now relies onDefaultTokenEstimatorfromsynthorg.engine.token_estimation, whose conversation estimator currently countscontent/tool_resultbut nottool_calls. Prompt size is therefore under-estimated when assistant tool calls are present, which delays trimming.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/prompt.py` around lines 36 - 39, build_system_prompt() switched to DefaultTokenEstimator which currently omits assistant tool_calls payloads when estimating conversation tokens; update the estimator used or its implementation so tool_calls are counted: either revert to PromptTokenEstimator in build_system_prompt() or modify DefaultTokenEstimator (and its conversation estimator logic) to include assistant.tool_calls (and nested tool_result contents) in the token count. Locate the estimator classes in synthorg.engine.token_estimation (DefaultTokenEstimator, PromptTokenEstimator) and the build_system_prompt function, and ensure the conversation tokenization logic accounts for the assistant message field "tool_calls" the same way it handles "content" and "tool_result".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 185-190: The code in _build_summary currently emits
CONTEXT_BUDGET_COMPACTION_SKIPPED when no assistant snippets are found but still
returns a fallback summary and allows compaction to proceed, which mislabels the
event; update the logging so that CONTEXT_BUDGET_COMPACTION_SKIPPED is only used
when compaction truly is skipped, and for the "no assistant content" path either
remove that skipped log or replace it with a distinct event (e.g.,
CONTEXT_BUDGET_COMPACTION_FALLBACK or CONTEXT_BUDGET_NO_ASSISTANT_CONTENT) and
keep the fallback return behavior; modify the logger.debug call in
_build_summary (and related constants if needed) so metrics/alerts reflect the
actual outcome.
- Around line 101-126: The code incorrectly assumes conversation[0] is a system
message; instead, scan conversation to find the first message with role
MessageRole.SYSTEM (e.g., index_system = next((i for i,m in
enumerate(conversation) if m.role == MessageRole.SYSTEM), None)). If a system
message exists, keep it (use that index to build archivable =
conversation[index_system+1:-preserve_count] and recent =
conversation[-preserve_count:]) and place the summary after the existing system
message; if no system message is found, treat all but the preserved recent turns
as archivable (archivable = conversation[:-preserve_count]) and prepend the
generated summary_msg (role=MessageRole.SYSTEM) before recent turns. Update
construction of compressed_conversation accordingly and ensure _build_summary,
summary_msg, summary_tokens and estimator usage remain the same.
In `@src/synthorg/engine/context_budget.py`:
- Around line 121-147: The estimate_context_fill function should validate
numeric inputs at its boundary: check that system_prompt_tokens and
tool_definitions_count are non-negative before computing the sum (and optionally
validate estimator is not an invalid type); if either is negative, raise a
ValueError with a clear message; keep the existing logic (use estimator or
DefaultTokenEstimator, call estimate_conversation_tokens, and apply
_TOOL_DEFINITION_TOKEN_OVERHEAD) and reference the same symbols
(estimate_context_fill, DefaultTokenEstimator, estimator,
_TOOL_DEFINITION_TOKEN_OVERHEAD, conversation) so callers receive immediate,
deterministic errors for invalid inputs.
In `@tests/unit/engine/test_context_budget.py`:
- Around line 204-206: The test method test_capacity_tokens_zero_rejected is
testing ContextBudgetIndicator validation but is currently placed with
DefaultTokenEstimator tests; move the test into the TestContextBudgetIndicator
test class so it lives alongside other ContextBudgetIndicator tests, keeping the
same body (with pytest.raises(ValidationError):
ContextBudgetIndicator(fill_tokens=100, capacity_tokens=0)) and adjusting
indentation/placement so it becomes a method of TestContextBudgetIndicator and
is removed from the DefaultTokenEstimator class.
---
Outside diff comments:
In `@src/synthorg/engine/plan_execute_loop.py`:
- Around line 684-698: When _execute_step() receives a tuple result from
_run_step_turn() (i.e., (ctx, bool)), it currently returns immediately and skips
context compaction; modify _execute_step() so it extracts ctx from the tuple
result, calls invoke_compaction(ctx, self._compaction_callback, ctx.turn_count),
updates ctx if compaction returns a new context, and then returns the original
tuple with the potentially compacted ctx and the original boolean flag. Make
this change in the branch that handles tuple results so invoke_compaction is
applied to the ctx before returning, using the same invoke_compaction,
_compaction_callback, and ctx.turn_count symbols already present.
---
Duplicate comments:
In `@src/synthorg/engine/prompt.py`:
- Around line 36-39: build_system_prompt() switched to DefaultTokenEstimator
which currently omits assistant tool_calls payloads when estimating conversation
tokens; update the estimator used or its implementation so tool_calls are
counted: either revert to PromptTokenEstimator in build_system_prompt() or
modify DefaultTokenEstimator (and its conversation estimator logic) to include
assistant.tool_calls (and nested tool_result contents) in the token count.
Locate the estimator classes in synthorg.engine.token_estimation
(DefaultTokenEstimator, PromptTokenEstimator) and the build_system_prompt
function, and ensure the conversation tokenization logic accounts for the
assistant message field "tool_calls" the same way it handles "content" and
"tool_result".
In `@tests/unit/engine/test_context_budget_properties.py`:
- Around line 13-14: Add a module-level pytestmark to apply both the unit marker
and a 30-second timeout by defining pytestmark = [pytest.mark.unit,
pytest.mark.timeout(30)] at the top of the test module, then remove the
per-class decorator(s) such as `@pytest.mark.unit` from
TestContextBudgetIndicatorProperties (and the other test class referenced) so
the markers are applied consistently via pytestmark.
In `@tests/unit/engine/test_context_budget.py`:
- Around line 107-114: The test test_with_tools currently hardcodes the tool
overhead (100 + 150) instead of using the constant; update the test to import
and use _TOOL_DEFINITION_TOKEN_OVERHEAD and compute expected as
system_prompt_tokens + tool_definitions_count * _TOOL_DEFINITION_TOKEN_OVERHEAD
(referencing the test function test_with_tools and the constant
_TOOL_DEFINITION_TOKEN_OVERHEAD) so the assertion becomes dynamic and will
remain correct if the overhead value changes.
- Around line 22-23: The test class TestContextBudgetIndicator is missing a
timeout decorator; add pytest.mark.timeout(30) as a decorator on the
TestContextBudgetIndicator class (above its declaration) so all tests in that
class inherit the 30-second timeout per guidelines; ensure you import pytest at
top if not already present and apply the decorator directly to the class
definition.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a8ac89dc-7fcb-4e10-ba46-bd9f8dc30786
📒 Files selected for processing (20)
CLAUDE.mddocs/design/engine.mddocs/roadmap/open-questions.mdsrc/synthorg/engine/__init__.pysrc/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/compaction/__init__.pysrc/synthorg/engine/compaction/models.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/prompt.pysrc/synthorg/engine/react_loop.pysrc/synthorg/engine/token_estimation.pysrc/synthorg/observability/events/context_budget.pytests/unit/engine/compaction/test_models.pytests/unit/engine/compaction/test_summarizer.pytests/unit/engine/test_context_budget.pytests/unit/engine/test_context_budget_properties.pytests/unit/engine/test_prompt.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Deploy Preview
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Web
- GitHub Check: Build Backend
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations.
Useexcept A, B:syntax (no parentheses) — PEP 758 except syntax enforced by ruff on Python 3.14.
Add type hints to all public functions; enforce mypy strict mode compliance.
Use Google-style docstrings on all public classes and functions; enforce via ruff D rules.
Keep functions under 50 lines and files under 800 lines.
Handle errors explicitly and never silently swallow exceptions.
Validate at system boundaries (user input, external APIs, config files).
Set line length to 88 characters (ruff enforces).
Useuvfor Python dependency management. Install withuv sync, run tests with-n autofor parallelism, lint withruff check, format withruff format, type-check withmypy.
Scripts inscripts/have relaxed ruff rules:
Files:
tests/unit/engine/test_context_budget_properties.pysrc/synthorg/engine/checkpoint/resume.pytests/unit/engine/compaction/test_models.pytests/unit/engine/compaction/test_summarizer.pysrc/synthorg/engine/__init__.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/observability/events/context_budget.pysrc/synthorg/engine/plan_execute_loop.pytests/unit/engine/test_context_budget.pytests/unit/engine/test_prompt.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/prompt.pysrc/synthorg/engine/compaction/models.pysrc/synthorg/engine/token_estimation.pysrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/compaction/__init__.pysrc/synthorg/engine/react_loop.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark tests with@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, or@pytest.mark.slow. Maintain 80% minimum code coverage.
Useasyncio_mode = 'auto'for async tests — no manual@pytest.mark.asyncioneeded. Set test timeout to 30 seconds per test. Always include-n autofor parallelism via pytest-xdist.
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned test code, docstrings, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001,test-provider,test-small-001, etc. Vendor names may only appear in Operations design page, .claude/ files, or third-party import paths.
Use Hypothesis for property-based testing with@given+@settings. Profiles:ci(200 examples, default) anddev(1000 examples) viaHYPOTHESIS_PROFILEenv var. Never skip, dismiss, or ignore flaky tests — fix them fully. Mocktime.monotonic()andasyncio.sleep()for deterministic timing-sensitive tests instead of widening margins.
Files:
tests/unit/engine/test_context_budget_properties.pytests/unit/engine/compaction/test_models.pytests/unit/engine/compaction/test_summarizer.pytests/unit/engine/test_context_budget.pytests/unit/engine/test_prompt.py
**/*.{md,py,ts,tsx,vue,go,yml,yaml,json,toml}
📄 CodeRabbit inference engine (CLAUDE.md)
Use conventional commits:
<type>: <description>where types are feat, fix, refactor, docs, test, chore, perf, ci. Commits must be GPG/SSH signed on main branch.
Files:
tests/unit/engine/test_context_budget_properties.pydocs/roadmap/open-questions.mdsrc/synthorg/engine/checkpoint/resume.pytests/unit/engine/compaction/test_models.pytests/unit/engine/compaction/test_summarizer.pysrc/synthorg/engine/__init__.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/observability/events/context_budget.pysrc/synthorg/engine/plan_execute_loop.pytests/unit/engine/test_context_budget.pytests/unit/engine/test_prompt.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/prompt.pyCLAUDE.mdsrc/synthorg/engine/compaction/models.pysrc/synthorg/engine/token_estimation.pysrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/compaction/__init__.pysrc/synthorg/engine/react_loop.pydocs/design/engine.md
**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Read relevant
docs/design/page before implementing features or planning issues. Design spec is the starting point for architecture, data models, and behavior.
Files:
docs/roadmap/open-questions.mdCLAUDE.mddocs/design/engine.md
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Documentation in Markdown built with Zensical. Design spec: 7 pages under
docs/design/(index, agents, organization, communication, engine, memory, operations). Architecture:docs/architecture/. Roadmap:docs/roadmap/. Security:docs/security.md. Licensing:docs/licensing.md. Reference:docs/reference/. REST API:docs/rest-api.mdwith generated Scalar UI. Library reference:docs/api/(auto-generated from docstrings via mkdocstrings + Griffe).
Files:
docs/roadmap/open-questions.mddocs/design/engine.md
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Create new objects instead of mutating existing ones. For non-Pydantic internal collections (registries,BaseTool), usecopy.deepcopy()at construction +MappingProxyTypewrapping. Fordict/listfields in frozen Pydantic models, rely onfrozen=Trueandcopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (withmodel_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Adopt Pydantic v2 conventions: use@computed_fieldfor derived values instead of storing redundant fields; useNotBlankStrfromcore.typesfor all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Every module with business logic must have:from synthorg.observability import get_loggerthenlogger = get_logger(__name__). Never useimport logging,logging.getLogger(), orprint()in application code.
Always useloggeras the variable name (not_logger, notlog). Use event name constants from domain-specific modules undersynthorg.observability.events(e.g.,PROVIDER_CALL_STARTfromevents.provider). Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Always use structured kwargs for logging:logger.info(EVENT, key=value)— never use string interpolation likelogger.info('msg %s', val).
Log all error paths at WARNING or ERROR with context before raising. Log all state transitions at INFO. Log object creation, internal flow, and entry/exit of key functions at DEBUG.
Do NOT useimport loggingorlogging.getLogger(). Pure data models, ...
Files:
src/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/__init__.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/observability/events/context_budget.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/prompt.pysrc/synthorg/engine/compaction/models.pysrc/synthorg/engine/token_estimation.pysrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/compaction/__init__.pysrc/synthorg/engine/react_loop.py
src/synthorg/engine/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Agent orchestration with TaskEngine (centralized single-writer task state), task lifecycle, recovery, shutdown, workspace isolation. Coordination: multi-agent pipeline via TopologyDispatcher protocol (4 dispatchers: SAS/centralized/decentralized/context-dependent), wave execution, workspace lifecycle, CoordinationSectionConfig bridge, build_coordinator factory. Approval gate: escalation detection, context parking/resume, EscalationInfo/ResumePayload. Stagnation detection: StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection. Checkpoint recovery: per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy. AgentRuntimeState: lightweight per-agent execution status for dashboard + recovery.
Files:
src/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/__init__.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/prompt.pysrc/synthorg/engine/compaction/models.pysrc/synthorg/engine/token_estimation.pysrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/compaction/__init__.pysrc/synthorg/engine/react_loop.py
🧠 Learnings (28)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/engine/**/*.py : Agent orchestration with TaskEngine (centralized single-writer task state), task lifecycle, recovery, shutdown, workspace isolation. Coordination: multi-agent pipeline via TopologyDispatcher protocol (4 dispatchers: SAS/centralized/decentralized/context-dependent), wave execution, workspace lifecycle, CoordinationSectionConfig bridge, build_coordinator factory. Approval gate: escalation detection, context parking/resume, EscalationInfo/ResumePayload. Stagnation detection: StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection. Checkpoint recovery: per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy. AgentRuntimeState: lightweight per-agent execution status for dashboard + recovery.
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.
Applied to files:
tests/unit/engine/test_context_budget_properties.pytests/unit/engine/compaction/test_models.pytests/unit/engine/test_context_budget.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to tests/**/*.py : Use `asyncio_mode = 'auto'` for async tests — no manual `pytest.mark.asyncio` needed. Set test timeout to 30 seconds per test. Always include `-n auto` for parallelism via pytest-xdist.
Applied to files:
tests/unit/engine/test_context_budget_properties.pytests/unit/engine/compaction/test_models.pytests/unit/engine/test_context_budget.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).
Applied to files:
tests/unit/engine/test_context_budget_properties.pytests/unit/engine/compaction/test_models.pytests/unit/engine/test_context_budget.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to tests/**/*.py : Mark tests with `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, or `pytest.mark.slow`. Maintain 80% minimum code coverage.
Applied to files:
tests/unit/engine/test_context_budget_properties.pytests/unit/engine/compaction/test_models.pytests/unit/engine/test_context_budget.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
Applied to files:
tests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing with `given` + `settings`. Profiles: `ci` (200 examples, default) and `dev` (1000 examples) via `HYPOTHESIS_PROFILE` env var. Never skip, dismiss, or ignore flaky tests — fix them fully. Mock `time.monotonic()` and `asyncio.sleep()` for deterministic timing-sensitive tests instead of widening margins.
Applied to files:
tests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/engine/**/*.py : Agent orchestration with TaskEngine (centralized single-writer task state), task lifecycle, recovery, shutdown, workspace isolation. Coordination: multi-agent pipeline via TopologyDispatcher protocol (4 dispatchers: SAS/centralized/decentralized/context-dependent), wave execution, workspace lifecycle, CoordinationSectionConfig bridge, build_coordinator factory. Approval gate: escalation detection, context parking/resume, EscalationInfo/ResumePayload. Stagnation detection: StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection. Checkpoint recovery: per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy. AgentRuntimeState: lightweight per-agent execution status for dashboard + recovery.
Applied to files:
docs/roadmap/open-questions.mdsrc/synthorg/engine/__init__.pysrc/synthorg/engine/plan_execute_loop.pyCLAUDE.mdsrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/react_loop.pydocs/design/engine.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/hr/**/*.py : HR engine: hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration sampling, collaboration overrides, trend detection), promotion/demotion (criteria evaluation, approval strategies, model mapping).
Applied to files:
docs/roadmap/open-questions.mdCLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/context_budget.pyCLAUDE.mdsrc/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Always use `logger` as the variable name (not `_logger`, not `log`). Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/context_budget.pyCLAUDE.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/budget/**/*.py : Cost tracking with pre-flight/in-flight budget checks, auto-downgrade, billing periods, cost tiers, quota/subscription tracking. CFO cost optimization: anomaly detection, efficiency analysis, downgrade recommendations, approval decisions. Budget errors: `BudgetExhaustedError`, `DailyLimitExceededError`, `QuotaExhaustedError`.
Applied to files:
src/synthorg/observability/events/context_budget.pytests/unit/engine/test_context_budget.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/loop_helpers.pydocs/design/engine.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR with context before raising. Log all state transitions at INFO. Log object creation, internal flow, and entry/exit of key functions at DEBUG.
Applied to files:
src/synthorg/observability/events/context_budget.pyCLAUDE.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/tools/**/*.py : Tool registry with built-in tools: file_system/, git (SSRF prevention via git_url_validator), sandbox/, code_runner. MCP bridge (mcp/). Role-based access. Approval tool (request_human_approval).
Applied to files:
src/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/react_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/security/**/*.py : SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).
Applied to files:
src/synthorg/engine/plan_execute_loop.pyCLAUDE.mdsrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/react_loop.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to docs/**/*.md : Documentation in Markdown built with Zensical. Design spec: 7 pages under `docs/design/` (index, agents, organization, communication, engine, memory, operations). Architecture: `docs/architecture/`. Roadmap: `docs/roadmap/`. Security: `docs/security.md`. Licensing: `docs/licensing.md`. Reference: `docs/reference/`. REST API: `docs/rest-api.md` with generated Scalar UI. Library reference: `docs/api/` (auto-generated from docstrings via mkdocstrings + Griffe).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking to the 7 design pages (index, agents, organization, communication, engine, memory, operations).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/memory/**/*.py : Pluggable MemoryBackend protocol with backends/ adapters (Mem0 at backends/mem0/). Retrieval pipeline: ranking, RRF fusion, injection, context formatting, non-inferable filtering. Shared org memory (org/). Consolidation/archival (consolidation/).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Always use structured kwargs for logging: `logger.info(EVENT, key=value)` — never use string interpolation like `logger.info('msg %s', val)`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging`, `logging.getLogger()`, or `print()` in application code.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Do NOT use `import logging` or `logging.getLogger()`. Pure data models, enums, and re-exports do NOT need logging setup.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-17T07:15:40.520Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T07:15:40.520Z
Learning: Applies to src/synthorg/api/**/*.py : Litestar REST + WebSocket API with JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint. Use RFC 9457 structured errors: `ErrorCategory`, `ErrorCode`, `ErrorDetail`, `ProblemDetail`, `CATEGORY_TITLES`, `category_title`, `category_type_uri`, content negotiation.
Applied to files:
src/synthorg/engine/loop_helpers.py
🧬 Code graph analysis (13)
tests/unit/engine/test_context_budget_properties.py (2)
src/synthorg/engine/context_budget.py (3)
ContextBudgetIndicator(32-84)format(66-84)fill_percent(60-64)src/synthorg/engine/token_estimation.py (3)
DefaultTokenEstimator(48-90)estimate_conversation_tokens(33-45)estimate_conversation_tokens(69-90)
src/synthorg/engine/checkpoint/resume.py (3)
src/synthorg/engine/plan_execute_loop.py (4)
compaction_callback(136-138)config(121-123)approval_gate(126-128)stagnation_detector(131-133)src/synthorg/engine/react_loop.py (3)
compaction_callback(109-111)approval_gate(99-101)stagnation_detector(104-106)src/synthorg/engine/stagnation/detector.py (1)
config(46-48)
tests/unit/engine/compaction/test_models.py (1)
src/synthorg/engine/compaction/models.py (2)
CompactionConfig(10-39)CompressionMetadata(42-73)
tests/unit/engine/compaction/test_summarizer.py (3)
src/synthorg/engine/compaction/models.py (1)
CompactionConfig(10-39)src/synthorg/engine/compaction/summarizer.py (1)
make_compaction_callback(37-61)src/synthorg/engine/context.py (2)
AgentContext(104-400)from_identity(190-225)
src/synthorg/engine/__init__.py (1)
src/synthorg/engine/token_estimation.py (2)
DefaultTokenEstimator(48-90)PromptTokenEstimator(15-45)
src/synthorg/engine/compaction/summarizer.py (3)
src/synthorg/engine/compaction/models.py (2)
CompactionConfig(10-39)CompressionMetadata(42-73)src/synthorg/engine/token_estimation.py (6)
DefaultTokenEstimator(48-90)PromptTokenEstimator(15-45)estimate_tokens(22-31)estimate_tokens(58-67)estimate_conversation_tokens(33-45)estimate_conversation_tokens(69-90)src/synthorg/engine/context.py (3)
AgentContext(104-400)context_fill_percent(180-187)with_compression(302-324)
src/synthorg/engine/plan_execute_loop.py (2)
src/synthorg/engine/loop_helpers.py (1)
invoke_compaction(654-689)src/synthorg/engine/react_loop.py (1)
compaction_callback(109-111)
tests/unit/engine/test_context_budget.py (3)
src/synthorg/engine/context.py (6)
AgentContext(104-400)from_identity(190-225)context_fill_percent(180-187)with_context_fill(289-300)with_compression(302-324)to_snapshot(371-395)src/synthorg/engine/context_budget.py (6)
ContextBudgetIndicator(32-84)estimate_context_fill(121-147)make_context_indicator(87-118)update_context_fill(150-186)format(66-84)fill_percent(60-64)src/synthorg/engine/token_estimation.py (3)
DefaultTokenEstimator(48-90)estimate_conversation_tokens(33-45)estimate_conversation_tokens(69-90)
tests/unit/engine/test_prompt.py (1)
src/synthorg/engine/token_estimation.py (1)
DefaultTokenEstimator(48-90)
src/synthorg/engine/context_budget.py (2)
src/synthorg/engine/token_estimation.py (3)
DefaultTokenEstimator(48-90)estimate_conversation_tokens(33-45)estimate_conversation_tokens(69-90)src/synthorg/engine/context.py (2)
AgentContext(104-400)with_context_fill(289-300)
src/synthorg/engine/prompt.py (1)
src/synthorg/engine/token_estimation.py (2)
DefaultTokenEstimator(48-90)PromptTokenEstimator(15-45)
src/synthorg/engine/compaction/__init__.py (1)
src/synthorg/engine/compaction/models.py (2)
CompactionConfig(10-39)CompressionMetadata(42-73)
src/synthorg/engine/react_loop.py (2)
src/synthorg/engine/loop_helpers.py (1)
invoke_compaction(654-689)src/synthorg/engine/plan_execute_loop.py (3)
compaction_callback(136-138)approval_gate(126-128)stagnation_detector(131-133)
🔇 Additional comments (20)
src/synthorg/observability/events/context_budget.py (1)
1-15: LGTM!Clean, minimal event constants module following the established patterns. Correctly organized by lifecycle phase (fill tracking, compaction lifecycle, indicator injection) with appropriate
Final[str]typing. Pure data module appropriately omits logging setup per coding guidelines.src/synthorg/engine/__init__.py (1)
186-189: LGTM!Clean internal refactoring relocating token estimation exports to the dedicated
token_estimationmodule. Public API surface remains unchanged.tests/unit/engine/compaction/test_models.py (1)
12-74: LGTM!Comprehensive test coverage for
CompactionConfigandCompressionMetadatavalidation boundaries, defaults, and immutability. Tests are well-structured and cover the critical edge cases (threshold bounds, minimum constraints, frozen model behavior).tests/unit/engine/test_prompt.py (2)
31-31: LGTM!Import correctly updated to reflect the token estimation module relocation.
475-479: LGTM!
CountingEstimatornow correctly implements the fullPromptTokenEstimatorprotocol by addingestimate_conversation_tokens. Returning 0 is appropriate here since this test only verifiesestimate_tokensis called during prompt construction.docs/roadmap/open-questions.md (1)
12-12: LGTM!Clear documentation updates reflecting the partial resolution of context window management. Good practice to track remaining work (LLM-based summarization, tiktoken estimator, AgentEngine wiring) for future iterations.
Also applies to: 24-24
tests/unit/engine/compaction/test_summarizer.py (1)
39-232: LGTM!Excellent test coverage for the compaction callback factory, including:
- Threshold boundary behavior (below/above)
- Edge cases (too few messages, nothing to archive, unknown capacity)
- Multiple compaction increment tracking
- System message preservation and summary injection
Tests are well-structured with clear setup helpers and focused assertions.
src/synthorg/engine/checkpoint/resume.py (3)
72-93: LGTM!Well-designed compression-aware reconciliation messaging. The conditional content construction provides helpful context to the agent about prior compaction, including the number of archived turns, which aids in understanding the conversation state upon resume.
128-142: LGTM!Clean propagation of
compaction_callbackwhen reconstructing loop instances with checkpoint callbacks. Follows the established pattern for other loop properties (approval_gate,stagnation_detector).
176-177: LGTM!Correct use of PEP 758
except A, B:syntax for re-raisingMemoryErrorandRecursionError. These catastrophic errors should not be masked by the best-effort cleanup logic—good defensive coding.Also applies to: 193-194
CLAUDE.md (2)
125-125: LGTM!Accurately documents the new context budget management and compaction capabilities added to the engine package.
199-199: LGTM!Context budget event constants properly added to the comprehensive event naming reference list, maintaining consistency with existing domain event patterns.
src/synthorg/engine/react_loop.py (1)
36-36: LGTM! Clean integration of compaction callback following established patterns.The compaction hook integration mirrors the existing
checkpoint_callback,stagnation_detector, andapproval_gatepatterns:
- TYPE_CHECKING import for the protocol type
- Optional constructor parameter with underscore-prefixed storage
- Property accessor for external access
- Invocation at turn boundaries with result replacement
The placement after stagnation detection is appropriate — compaction should run after determining the loop will continue.
Also applies to: 51-51, 80-82, 91-96, 108-111, 206-213
src/synthorg/engine/compaction/__init__.py (1)
1-18: LGTM! Clean package initialization with explicit public API.The
__all__export list correctly exposes the three primary types consumers need:CompactionCallback,CompactionConfig, andCompressionMetadata.docs/design/engine.md (1)
587-663: LGTM! Comprehensive documentation for the context budget management feature.The new section accurately documents:
- Context fill tracking fields on
AgentContext- Soft budget indicator format and injection point
CompactionCallbacktype alias and wiring patternCompactionConfigparameters with defaults- Compressed checkpoint recovery behavior
- Loop-specific integration points (ReactLoop vs PlanExecuteLoop)
The documentation aligns with the implementation in the reviewed code files.
tests/unit/engine/test_context_budget_properties.py (1)
17-39: LGTM! Solid property test for format robustness.The strategy correctly handles the
capacity_tokensconstraint (gt=0in the model) by usingmin_value=1in thest.integersstrategy.src/synthorg/engine/token_estimation.py (2)
14-45: LGTM! Clean protocol definition and heuristic implementation.The
PromptTokenEstimatorprotocol is appropriately@runtime_checkablefor duck typing, andDefaultTokenEstimatorprovides a reasonablelen(text) // 4baseline. The docstring clearly notes this is a rough estimate suitable for swapping with a tiktoken-based estimator for precision.Also applies to: 48-90
84-89: No action needed. The code correctly prioritizestool_result.contentovermsg.contentwhen both could exist. This is intentional and explicitly documented in the method docstring: "Tool results use their content field for estimation." Sincetool_resultis marked for tool messages andcontentis optional, this behavior aligns with theChatMessageschema design.src/synthorg/engine/loop_helpers.py (1)
653-689: LGTM!invoke_compactionfollows established helper patterns.The function correctly:
- Returns early with
Nonewhen no callback is configured- Awaits the callback and returns the compacted context on success
- Re-raises
MemoryErrorandRecursionErrorunconditionally- Logs other exceptions with structured kwargs and returns
None(non-fatal)This matches the error isolation pattern used by
check_stagnation,check_shutdown, andcheck_budget.src/synthorg/engine/compaction/models.py (1)
10-39: LGTM! Well-structured frozen Pydantic models with appropriate constraints.Both models follow project conventions:
ConfigDict(frozen=True)for immutability- Google-style docstrings with
Attributessections- Field constraints that prevent invalid states (
gt,ge,le)- Sensible defaults for
CompactionConfig(80% threshold, 4 min messages, 3 preserved turns)The
CompressionMetadatacorrectly requires explicit values forcompression_point,archived_turns, andsummary_tokens(no defaults), whilecompactions_performeddefaults to 1 for the first compaction.Also applies to: 42-73
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
tests/unit/engine/test_context_budget.py (1)
23-24: 🧹 Nitpick | 🔵 TrivialConsider adding module-level timeout marker.
Per coding guidelines, tests should have a 30-second timeout. A module-level
pytestmarkcould enforce this:pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]However, if timeout enforcement is configured globally in
pytest.iniorpyproject.toml, this may be unnecessary.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/engine/test_context_budget.py` around lines 23 - 24, Add a module-level pytestmark to enforce the 30s timeout by defining pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)] at the top of the test module (where the TestContextBudgetIndicator class is declared); alternatively, if timeouts are already enforced globally via pytest.ini/pyproject.toml, confirm that and skip adding the marker.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 137-143: The archived_turn_count calculation currently uses
len(archivable) // 2 which assumes perfect user+assistant pairs; instead compute
archived_turn_count by counting actual assistant-turn messages in the archivable
list to reflect real turns (e.g., count items where role/type indicates
assistant or equivalent), then construct CompressionMetadata using that count
(keep references to archived_turn_count, archivable, CompressionMetadata, and
ctx.turn_count in summarizer.py); ensure prior_archived is still added and
summary_tokens/compactions_performed remain unchanged so the compressed metadata
accurately reflects archived assistant turns.
In `@src/synthorg/engine/token_estimation.py`:
- Around line 94-99: The token estimate for tool calls currently only accounts
for tc.name and tc.arguments; include the tool call identifier as well (the
serialized id field) when computing tc_tokens so the estimate matches the
comment ("id, name, serialized arguments"). Update the calculation in the
token_estimation logic that computes tc_tokens (where tc.name, tc.arguments and
self._PER_MESSAGE_OVERHEAD are used) to also add the serialized length
contribution of tc.id (e.g., include len(str(tc.id)) // 4 or equivalent
serialization length) before adding to total.
---
Duplicate comments:
In `@tests/unit/engine/test_context_budget.py`:
- Around line 23-24: Add a module-level pytestmark to enforce the 30s timeout by
defining pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)] at the top of
the test module (where the TestContextBudgetIndicator class is declared);
alternatively, if timeouts are already enforced globally via
pytest.ini/pyproject.toml, confirm that and skip adding the marker.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f740a26f-eb5d-4a84-aed9-dfdbb2509067
📒 Files selected for processing (7)
src/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/context.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/token_estimation.pytests/unit/engine/test_context_budget.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Nofrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax:except A, B:(no parentheses) — ruff enforces this on Python 3.14.
Line length: 88 characters (ruff).
Files:
src/synthorg/engine/plan_execute_loop.pytests/unit/engine/test_context_budget.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/context.pysrc/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/token_estimation.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries,BaseTool), usecopy.deepcopy()at construction +MappingProxyTypewrapping for read-only enforcement.
Fordict/listfields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention andcopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (usingmodel_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Models: Pydantic v2 (BaseModel,model_validator,computed_field,ConfigDict). Use@computed_fieldfor derived values instead of storing + validating redundant fields. UseNotBlankStrfor all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.
Async concurrency: preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Functions: < 50 lines, files < 800 lines.
Errors: handle explicitly, never silently swallow.
Validate: at system boundaries (user input, external APIs, config files).
Files:
src/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/context.pysrc/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/token_estimation.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST have:from synthorg.observability import get_loggerthenlogger = get_logger(__name__). Never useimport logging/logging.getLogger()/print()in application code.
Logging variable name: alwayslogger(not_logger, notlog).
Event names: always use constants from the domain-specific module undersynthorg.observability.events(e.g.,PROVIDER_CALL_STARTfromevents.provider,BUDGET_RECORD_ADDEDfromevents.budget, etc.). Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Structured kwargs in logging: alwayslogger.info(EVENT, key=value)— neverlogger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG logging for object creation, internal flow, entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.
Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Tests must usetest-provider,test-small-001, etc.
Files:
src/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/context.pysrc/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/token_estimation.py
src/synthorg/{providers,engine}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
RetryExhaustedErrorsignals that all retries failed — the engine layer catches this to trigger fallback chains. Rate limiter respectsRateLimitError.retry_afterfrom providers — automatically pauses future requests.
Files:
src/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/context.pysrc/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/token_estimation.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow. Coverage: 80% minimum. Async:asyncio_mode = "auto"— no manual@pytest.mark.asyncioneeded. Timeout: 30 seconds per test. Parallelism: always include-n autowhen running pytest, never run tests sequentially.
Parametrize: Prefer@pytest.mark.parametrizefor testing similar cases.
Property-based testing: Python uses Hypothesis (@given+@settings). Run dev profile with 1000 examples:HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
Flaky tests: NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic instead of widening timing margins.
Files:
tests/unit/engine/test_context_budget.py
🧠 Learnings (4)
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.
Applied to files:
tests/unit/engine/test_context_budget.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).
Applied to files:
tests/unit/engine/test_context_budget.py
📚 Learning: 2026-03-17T11:37:41.549Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:37:41.549Z
Learning: Applies to src/**/*.py : Models: Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `computed_field` for derived values instead of storing + validating redundant fields. Use `NotBlankStr` for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.
Applied to files:
src/synthorg/engine/context.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.
Applied to files:
src/synthorg/engine/context.py
🧬 Code graph analysis (5)
src/synthorg/engine/plan_execute_loop.py (2)
src/synthorg/engine/loop_helpers.py (1)
invoke_compaction(654-689)src/synthorg/engine/react_loop.py (1)
compaction_callback(109-111)
src/synthorg/engine/compaction/summarizer.py (3)
src/synthorg/engine/compaction/models.py (2)
CompactionConfig(10-39)CompressionMetadata(42-73)src/synthorg/engine/token_estimation.py (6)
DefaultTokenEstimator(48-100)PromptTokenEstimator(15-45)estimate_tokens(22-31)estimate_tokens(58-67)estimate_conversation_tokens(33-45)estimate_conversation_tokens(69-100)src/synthorg/engine/context.py (3)
AgentContext(104-412)context_fill_percent(180-187)with_compression(308-336)
src/synthorg/engine/context.py (1)
src/synthorg/engine/compaction/models.py (1)
CompressionMetadata(42-73)
src/synthorg/engine/checkpoint/resume.py (2)
src/synthorg/engine/plan_execute_loop.py (1)
compaction_callback(136-138)src/synthorg/engine/react_loop.py (1)
compaction_callback(109-111)
src/synthorg/engine/context_budget.py (2)
src/synthorg/engine/token_estimation.py (4)
DefaultTokenEstimator(48-100)PromptTokenEstimator(15-45)estimate_conversation_tokens(33-45)estimate_conversation_tokens(69-100)src/synthorg/engine/context.py (2)
AgentContext(104-412)with_context_fill(289-306)
🔇 Additional comments (11)
src/synthorg/engine/context.py (1)
289-336: Validation forfill_tokenshas been added — past concern addressed.Both
with_context_fill()andwith_compression()now validate thatfill_tokens >= 0and raiseValueErrorif negative. TheCompressionMetadatamodel (incompaction/models.py) uses PydanticField(ge=0)constraints oncompression_point,archived_turns, andsummary_tokens, so invalid metadata is rejected at construction time.src/synthorg/engine/compaction/summarizer.py (2)
191-196: Misleading event name for fallback summary path.When no assistant content is found, the code logs
CONTEXT_BUDGET_COMPACTION_COMPLETEDwithreason="no_assistant_content_for_summary", but compaction hasn't completed yet at this point — this is called from within_build_summary()beforectx.with_compression(). Consider using a distinct event likeCONTEXT_BUDGET_SUMMARY_FALLBACKor moving this log after the actual compaction completes.This was flagged in a previous review. The event name changed from
COMPACTION_SKIPPEDtoCOMPACTION_COMPLETED, but the fundamental issue remains — logging a "completed" event during summary construction is misleading.
101-118: System message detection and preservation logic looks correct.The code now properly checks
conversation[0].role == MessageRole.SYSTEMbefore treating it as a system message to preserve, addressing the previous review concern. The slicing logic forarchivableandrecentis correct.src/synthorg/engine/context_budget.py (2)
148-157: Input validation for non-negative token counts has been added.
estimate_context_fill()now validates thatsystem_prompt_tokensandtool_definitions_countare non-negative, raisingValueErrorwith descriptive messages. This addresses the previous review concern about invalid inputs.
187-196: Now logs the newly computed fill percentage.The function correctly computes
new_pctfrom the freshfillvalue and logs it, rather than logging the stalectx.context_fill_percent. This addresses the previous review concern.tests/unit/engine/test_context_budget.py (1)
9-18: Past review concerns addressed: constant imported and test relocated.The
_TOOL_DEFINITION_TOKEN_OVERHEADconstant is now imported (line 10) and used in the assertion (line 118), avoiding hardcoded magic numbers. Thetest_capacity_tokens_zero_rejectedtest has been moved intoTestContextBudgetIndicator(lines 72-74).src/synthorg/engine/plan_execute_loop.py (1)
687-706: Compaction callback integration follows established patterns.Compaction is invoked at both step-completion turns and general turn boundaries using the shared
invoke_compaction()helper fromloop_helpers.py. Error isolation is handled by the helper (logs but doesn't propagate non-fatal exceptions). This mirrors theReactLoopimplementation and addresses the previous review comment about extracting shared logic.src/synthorg/engine/checkpoint/resume.py (3)
72-84: Compression-aware reconciliation message is well-constructed.When resuming from a checkpoint with compression metadata, the reconciliation message now informs the agent about archived turns. The conditional construction with an empty string fallback keeps the message clean when no compaction occurred.
124-138: Compaction callback correctly propagated to new loop instances.Both
ReactLoopandPlanExecuteLoopinstantiations now includecompaction_callback=loop.compaction_callback, ensuring the compaction behavior is preserved when injecting checkpoint callbacks.
172-173: PEP 758 except syntax used for critical exceptions.The
except MemoryError, RecursionError:syntax follows PEP 758 (Python 3.14) as specified in coding guidelines. These critical errors are correctly re-raised while other exceptions are logged as warnings.Also applies to: 189-190
src/synthorg/engine/token_estimation.py (1)
86-89: No issues found. The message role constraints (ChatMessagevalidation insrc/synthorg/providers/models.py) guarantee thattool_resultis exclusive to TOOL messages and cannot coexist with meaningfulmsg.content. The logic at lines 87–88 correctly prioritizestool_result.contentfor TOOL messages and falls back tomsg.contentfor other message types, preventing any token underestimation.
- Add context_fill_tokens, context_capacity_tokens, compression_metadata fields and context_fill_percent computed property to AgentContext - Add with_context_fill() and with_compression() evolution methods - Extend PromptTokenEstimator protocol with estimate_conversation_tokens() - Create ContextBudgetIndicator model with format() for soft budget display - Add _SECTION_CONTEXT_BUDGET to system prompt template and builder - Create compaction subpackage: CompactionCallback type alias, CompactionConfig, CompactionResult, CompressionMetadata models - Implement OldestTurnsSummarizer via make_compaction_callback() factory - Wire compaction_callback into ReactLoop and PlanExecuteLoop constructors with turn-boundary invocation and error isolation - Update checkpoint/resume.py: pass through compaction_callback in make_loop_with_callback(), compression-aware reconciliation messages - Add observability events for context budget operations - 35 new tests: unit, property-based (Hypothesis), compaction models and summarizer Closes #416
- Add gt=0 constraint to ContextBudgetIndicator.capacity_tokens (ZeroDivisionError) - Document post-compaction fill estimate approximation in summarizer - Fix stale fill_percent in update_context_fill log entry - Rename min_turns_to_compact to min_messages_to_compact (semantic accuracy) - Extract _invoke_compaction to shared invoke_compaction() in loop_helpers - Extract PromptTokenEstimator/DefaultTokenEstimator to token_estimation.py - Remove unused CompactionResult model and CONTEXT_BUDGET_THRESHOLD_REACHED event - Auto-derive archived_blocks from compression_metadata in make_context_indicator - Add MemoryError/RecursionError guard to cleanup_checkpoint_artifacts - Log fallback path in _build_summary for non-assistant messages - Update CLAUDE.md: package structure + 6 context_budget event constants - Add context budget management section to docs/design/engine.md - Update docs/roadmap/open-questions.md (partially resolved) - Add 8 missing tests: AgentContext evolution methods, fill percent, from_identity capacity, snapshot fields, compression indicator
…mini - Validate fill_tokens >= 0 in with_context_fill/with_compression - Include tool_calls in token estimation for ASSISTANT messages - Fix archived_turns to count turns (msg_count // 2) not messages - Guard system message role in compaction (don't assume index 0) - Simplify reconciliation content (reduce duplication per Gemini) - Fix COMPACTION_SKIPPED event used on non-skipped path - Validate non-negative system_prompt_tokens/tool_definitions_count - Fix compaction skipped on step-completion turns in PlanExecuteLoop - Move test_capacity_tokens_zero_rejected to correct test class - Import _TOOL_DEFINITION_TOKEN_OVERHEAD constant in tests
86511ea to
eb6cda4
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 190-196: Replace the use of CONTEXT_BUDGET_COMPACTION_COMPLETED in
the fallback branch where snippets is empty with a distinct event constant (e.g.
CONTEXT_BUDGET_COMPACTION_FALLBACK) so the logger.debug call (currently passing
reason="no_assistant_content_for_summary" and archived_count=len(messages))
emits a dedicated observable event; add the new constant (or import it if
defined elsewhere) and update the logger.debug invocation in the snippets-empty
branch to use CONTEXT_BUDGET_COMPACTION_FALLBACK while keeping the same
structured fields and the existing return value.
In `@src/synthorg/engine/prompt_template.py`:
- Around line 186-191: The DEFAULT_TEMPLATE/system prompt rendering was modified
(the context_budget block added) but PROMPT_TEMPLATE_VERSION was not updated;
update the PROMPT_TEMPLATE_VERSION constant in prompt_template.py to a new
semantically-incremented version string (e.g., "1.4.1" or following your
project's versioning policy) so that any metadata/caching/snapshot flows keyed
on PROMPT_TEMPLATE_VERSION reflect the template change; ensure the version
string at the top (PROMPT_TEMPLATE_VERSION) is incremented and run any tests
that validate template-version consistency.
In `@tests/unit/engine/compaction/test_summarizer.py`:
- Around line 39-41: Add a module-level pytestmark to enforce the 30s timeout by
declaring pytestmark = [pytest.mark.timeout(30)] after the imports (and before
helper functions), and remove the `@pytest.mark.unit` decorator from the
TestMakeCompactionCallback class so the class remains a unit test but uses the
module-level timeout; ensure the symbol TestMakeCompactionCallback keeps its
docstring and test methods unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1b3a1877-df4a-462e-acb4-04dc7b885e50
📒 Files selected for processing (25)
CLAUDE.mddocs/design/engine.mddocs/roadmap/open-questions.mdsrc/synthorg/engine/__init__.pysrc/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/compaction/__init__.pysrc/synthorg/engine/compaction/models.pysrc/synthorg/engine/compaction/protocol.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/context.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/prompt.pysrc/synthorg/engine/prompt_template.pysrc/synthorg/engine/react_loop.pysrc/synthorg/engine/token_estimation.pysrc/synthorg/observability/events/context_budget.pytests/unit/engine/compaction/__init__.pytests/unit/engine/compaction/test_models.pytests/unit/engine/compaction/test_summarizer.pytests/unit/engine/test_context_budget.pytests/unit/engine/test_context_budget_properties.pytests/unit/engine/test_prompt.pytests/unit/observability/test_events.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build Web
- GitHub Check: Build Backend
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax without parentheses for exception handling on Python 3.14 — ruff enforces PEP 758 except syntax
Type hints required on all public functions and classes — enforce with mypy strict mode
Docstrings must use Google style on all public classes and functions — enforced by ruff D rules
Line length must be 88 characters — enforced by ruff
Files:
tests/unit/engine/test_context_budget.pytests/unit/engine/test_prompt.pysrc/synthorg/engine/prompt_template.pytests/unit/engine/compaction/test_models.pysrc/synthorg/engine/compaction/protocol.pysrc/synthorg/engine/compaction/models.pytests/unit/engine/test_context_budget_properties.pysrc/synthorg/observability/events/context_budget.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/token_estimation.pysrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/context_budget.pytests/unit/engine/compaction/test_summarizer.pysrc/synthorg/engine/__init__.pytests/unit/observability/test_events.pysrc/synthorg/engine/react_loop.pysrc/synthorg/engine/compaction/__init__.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/prompt.pysrc/synthorg/engine/context.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark tests with@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, or@pytest.mark.slow
Maintain 80% minimum code coverage — enforced in CI
Test timeout: 30 seconds per test
Prefer@pytest.mark.parametrizefor testing similar cases
Tests must usetest-provider,test-small-001, and similar generic vendor-agnostic names instead of real vendor names
Use Python Hypothesis for property-based testing:@given+@settingswith HYPOTHESIS_PROFILE env var (ci: 200 examples default, dev: 1000 examples)
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic.
Files:
tests/unit/engine/test_context_budget.pytests/unit/engine/test_prompt.pytests/unit/engine/compaction/test_models.pytests/unit/engine/test_context_budget_properties.pytests/unit/engine/compaction/test_summarizer.pytests/unit/observability/test_events.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections usecopy.deepcopy()at construction andMappingProxyTypewrapping.
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models withmodel_copy(update=...)for runtime state
Use@computed_fieldfor derived values instead of storing and validating redundant fields in Pydantic models
UseNotBlankStr(fromcore.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Every module with business logic MUST import logger:from synthorg.observability import get_loggerthen assignlogger = get_logger(__name__)
Never useimport logging/logging.getLogger()/print()in application code — only use the centralized logger fromsynthorg.observability
Logger variable must always be namedlogger(not_logger, notlog)
Event names must always use constants from domain-specific modules undersynthorg.observability.events(e.g.,API_REQUEST_STARTEDfromevents.api). Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured kwargs in logging: alwayslogger.info(EVENT, key=value)— neverlogger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging for object creation, internal flow, entry/exit of key functions
NEVER use real vendor names (Anthropic, OpenAI,...
Files:
src/synthorg/engine/prompt_template.pysrc/synthorg/engine/compaction/protocol.pysrc/synthorg/engine/compaction/models.pysrc/synthorg/observability/events/context_budget.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/token_estimation.pysrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/__init__.pysrc/synthorg/engine/react_loop.pysrc/synthorg/engine/compaction/__init__.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/prompt.pysrc/synthorg/engine/context.py
src/synthorg/engine/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Task engine must use centralized single-writer TaskEngine with task lifecycle, recovery, and shutdown
Files:
src/synthorg/engine/prompt_template.pysrc/synthorg/engine/compaction/protocol.pysrc/synthorg/engine/compaction/models.pysrc/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/token_estimation.pysrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/context_budget.pysrc/synthorg/engine/__init__.pysrc/synthorg/engine/react_loop.pysrc/synthorg/engine/compaction/__init__.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/prompt.pysrc/synthorg/engine/context.py
src/synthorg/engine/checkpoint/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Checkpoint recovery strategy requires per-turn persistence and heartbeat detection with CheckpointRecoveryStrategy config
Files:
src/synthorg/engine/checkpoint/resume.py
🧠 Learnings (31)
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/budget/**/*.py : Cost tracking system must track budget, enforce pre-flight/in-flight checks, auto-downgrade, track billing periods, cost tiers, quotas, and detect anomalies
Applied to files:
tests/unit/engine/test_context_budget.pysrc/synthorg/observability/events/context_budget.pysrc/synthorg/engine/context_budget.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.
Applied to files:
tests/unit/engine/test_context_budget.pytests/unit/engine/compaction/test_models.pytests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test
Applied to files:
tests/unit/engine/test_context_budget.pytests/unit/engine/compaction/test_models.pytests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).
Applied to files:
tests/unit/engine/test_context_budget.pytests/unit/engine/compaction/test_models.pytests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic.
Applied to files:
tests/unit/engine/test_context_budget.pytests/unit/engine/compaction/test_models.pytests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to tests/**/*.py : Mark tests with `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, or `pytest.mark.slow`
Applied to files:
tests/unit/engine/compaction/test_models.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with `model_copy(update=...)` for runtime state
Applied to files:
src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Pydantic models must use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`)
Applied to files:
src/synthorg/engine/compaction/models.pysrc/synthorg/engine/context.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Use `computed_field` for derived values instead of storing and validating redundant fields in Pydantic models
Applied to files:
src/synthorg/engine/compaction/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
Applied to files:
tests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to tests/**/*.py : Use Python Hypothesis for property-based testing: `given` + `settings` with HYPOTHESIS_PROFILE env var (ci: 200 examples default, dev: 1000 examples)
Applied to files:
tests/unit/engine/test_context_budget_properties.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/context_budget.pysrc/synthorg/engine/loop_helpers.pytests/unit/observability/test_events.pyCLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Event names must always use constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
src/synthorg/observability/events/context_budget.pysrc/synthorg/engine/loop_helpers.pyCLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/engine/**/*.py : Task engine must use centralized single-writer TaskEngine with task lifecycle, recovery, and shutdown
Applied to files:
src/synthorg/engine/__init__.pyCLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/tools/**/*.py : Tools must support role-based access control and require approval via `request_human_approval` tool
Applied to files:
src/synthorg/engine/react_loop.pysrc/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/engine/checkpoint/**/*.py : Checkpoint recovery strategy requires per-turn persistence and heartbeat detection with CheckpointRecoveryStrategy config
Applied to files:
src/synthorg/engine/checkpoint/resume.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/hr/**/*.py : HR engine includes hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration sampling)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/engine/coordination/**/*.py : Engine coordination supports 4 dispatchers (SAS/centralized/decentralized/context-dependent) with wave execution and workspace lifecycle integration
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/communication/meeting/**/*.py : Meeting protocol uses round-robin, position papers, structured phases with scheduler (frequency, participant resolver) and orchestrator
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking to the 7 design pages (index, agents, organization, communication, engine, memory, operations).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST import logger: `from synthorg.observability import get_logger` then assign `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Observability system requires structured logging, correlation tracking, and log sinks — all business logic must use `get_logger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Never use `import logging` / `logging.getLogger()` / `print()` in application code — only use the centralized logger from `synthorg.observability`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Logger variable must always be named `logger` (not `_logger`, not `log`)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Use structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Validate at system boundaries (user input, external APIs, config files)
Applied to files:
src/synthorg/engine/context.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.
Applied to files:
src/synthorg/engine/context.py
🧬 Code graph analysis (15)
tests/unit/engine/test_context_budget.py (4)
src/synthorg/engine/compaction/models.py (1)
CompressionMetadata(42-73)src/synthorg/engine/context.py (6)
AgentContext(104-412)from_identity(190-225)context_fill_percent(180-187)with_context_fill(289-306)with_compression(308-336)to_snapshot(383-407)src/synthorg/engine/context_budget.py (6)
ContextBudgetIndicator(32-84)estimate_context_fill(121-157)make_context_indicator(87-118)update_context_fill(160-196)format(66-84)fill_percent(60-64)src/synthorg/engine/token_estimation.py (3)
DefaultTokenEstimator(48-101)estimate_conversation_tokens(33-45)estimate_conversation_tokens(69-101)
tests/unit/engine/test_prompt.py (1)
src/synthorg/engine/token_estimation.py (1)
DefaultTokenEstimator(48-101)
tests/unit/engine/compaction/test_models.py (1)
src/synthorg/engine/compaction/models.py (2)
CompactionConfig(10-39)CompressionMetadata(42-73)
src/synthorg/engine/compaction/protocol.py (1)
src/synthorg/engine/context.py (1)
AgentContext(104-412)
tests/unit/engine/test_context_budget_properties.py (2)
src/synthorg/engine/context_budget.py (3)
ContextBudgetIndicator(32-84)format(66-84)fill_percent(60-64)src/synthorg/engine/token_estimation.py (3)
DefaultTokenEstimator(48-101)estimate_conversation_tokens(33-45)estimate_conversation_tokens(69-101)
src/synthorg/engine/compaction/summarizer.py (3)
src/synthorg/engine/compaction/models.py (2)
CompactionConfig(10-39)CompressionMetadata(42-73)src/synthorg/engine/token_estimation.py (6)
DefaultTokenEstimator(48-101)PromptTokenEstimator(15-45)estimate_tokens(22-31)estimate_tokens(58-67)estimate_conversation_tokens(33-45)estimate_conversation_tokens(69-101)src/synthorg/engine/context.py (3)
AgentContext(104-412)context_fill_percent(180-187)with_compression(308-336)
src/synthorg/engine/loop_helpers.py (2)
src/synthorg/engine/plan_execute_loop.py (1)
compaction_callback(136-138)src/synthorg/engine/react_loop.py (1)
compaction_callback(109-111)
src/synthorg/engine/context_budget.py (2)
src/synthorg/engine/token_estimation.py (4)
DefaultTokenEstimator(48-101)PromptTokenEstimator(15-45)estimate_conversation_tokens(33-45)estimate_conversation_tokens(69-101)src/synthorg/engine/context.py (2)
AgentContext(104-412)with_context_fill(289-306)
tests/unit/engine/compaction/test_summarizer.py (3)
src/synthorg/engine/compaction/models.py (1)
CompactionConfig(10-39)src/synthorg/engine/compaction/summarizer.py (1)
make_compaction_callback(37-61)src/synthorg/engine/context.py (2)
AgentContext(104-412)from_identity(190-225)
src/synthorg/engine/__init__.py (1)
src/synthorg/engine/token_estimation.py (2)
DefaultTokenEstimator(48-101)PromptTokenEstimator(15-45)
src/synthorg/engine/react_loop.py (2)
src/synthorg/engine/loop_helpers.py (1)
invoke_compaction(654-689)src/synthorg/engine/plan_execute_loop.py (3)
compaction_callback(136-138)approval_gate(126-128)stagnation_detector(131-133)
src/synthorg/engine/compaction/__init__.py (1)
src/synthorg/engine/compaction/models.py (2)
CompactionConfig(10-39)CompressionMetadata(42-73)
src/synthorg/engine/plan_execute_loop.py (2)
src/synthorg/engine/loop_helpers.py (1)
invoke_compaction(654-689)src/synthorg/engine/react_loop.py (1)
compaction_callback(109-111)
src/synthorg/engine/checkpoint/resume.py (2)
src/synthorg/engine/plan_execute_loop.py (5)
compaction_callback(136-138)PlanExecuteLoop(84-930)config(121-123)approval_gate(126-128)stagnation_detector(131-133)src/synthorg/engine/react_loop.py (3)
compaction_callback(109-111)approval_gate(99-101)stagnation_detector(104-106)
src/synthorg/engine/context.py (1)
src/synthorg/engine/compaction/models.py (1)
CompressionMetadata(42-73)
🪛 LanguageTool
CLAUDE.md
[style] ~199-~199: Since ownership is already implied, this phrasing may be redundant.
Context: ...vents.context_budget). Each domain has its own module — see src/synthorg/observabilit...
(PRP_OWN)
🔇 Additional comments (41)
docs/roadmap/open-questions.md (1)
12-12: Clear and accurate partial-resolution update.The wording at Line 12 and Line 24 cleanly reflects what is implemented now vs what is still pending, which keeps roadmap risk tracking actionable.
Also applies to: 24-24
src/synthorg/engine/compaction/protocol.py (1)
14-24: Compaction callback contract is well defined.The alias and attached docstring clearly define the async callback behavior and return semantics for loop integration.
src/synthorg/engine/compaction/models.py (1)
10-39: Model design and validation look solid.Both models are frozen, typed, and constrained appropriately for config/metadata boundaries.
As per coding guidelines, "Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with
model_copy(update=...)for runtime state" and "Pydantic models must use Pydantic v2 (BaseModel,model_validator,computed_field,ConfigDict)."Also applies to: 42-73
src/synthorg/engine/compaction/__init__.py (1)
8-18: Good explicit compaction package API surface.The re-exports and
__all__keep the public interface discoverable and stable.CLAUDE.md (1)
125-125: Documentation updates are useful and aligned with the new feature set.The engine scope summary and observability event guidance now reflect context-budget compaction work clearly.
Also applies to: 199-199
src/synthorg/observability/events/context_budget.py (1)
6-15: Event constants are consistent and well-scoped.The new context-budget lifecycle constants follow the established domain event naming pattern.
Based on learnings, "Event names must always use constants from domain-specific modules under
synthorg.observability.events... Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT."tests/unit/observability/test_events.py (1)
229-229: Good test expectation update for event-module discovery.Adding
context_budgetin the expected set keeps discovery assertions aligned with the new observability domain.tests/unit/engine/compaction/test_models.py (1)
12-74: LGTM — tests correctly validate model constraints and immutability.The test coverage for
CompactionConfigandCompressionMetadatais comprehensive, covering defaults, boundary conditions, and frozen behavior. The assertions align with the model definitions insrc/synthorg/engine/compaction/models.py.src/synthorg/engine/__init__.py (1)
186-189: LGTM — import source relocation maintains public API compatibility.The
DefaultTokenEstimatorandPromptTokenEstimatorimports are correctly relocated fromsynthorg.engine.prompttosynthorg.engine.token_estimation. The__all__exports remain unchanged, preserving backward compatibility for consumers importing fromsynthorg.engine.tests/unit/engine/test_prompt.py (2)
31-31: LGTM — import updated to match new module location.The import correctly references
synthorg.engine.token_estimationwhereDefaultTokenEstimatornow lives.
475-479: LGTM — test double updated to satisfy extended protocol.The
CountingEstimatornow implementsestimate_conversation_tokensto satisfy thePromptTokenEstimatorprotocol. Returning0is appropriate since these tests focus onestimate_tokenscall counting.src/synthorg/engine/checkpoint/resume.py (3)
72-84: LGTM — compression-aware reconciliation message.The reconciliation content now informs the agent about prior compaction when resuming from a compressed checkpoint. The conditional note construction is clean and provides useful context for the agent to understand the conversation state.
129-129: LGTM — compaction_callback correctly propagated through loop reconstruction.Both
ReactLoopandPlanExecuteLoopnow receive thecompaction_callbackfrom the original loop instance when rebuilding with a checkpoint callback. This maintains compaction behavior across checkpoint/resume cycles.Also applies to: 137-137
172-173: LGTM — critical exceptions re-raised in cleanup paths.
MemoryErrorandRecursionErrorare correctly re-raised before the generic exception handler, following the established pattern inreact_loop.pyandloop_helpers.py. This ensures non-recoverable errors propagate rather than being silently logged.Also applies to: 189-190
tests/unit/engine/compaction/test_summarizer.py (1)
43-232: LGTM — comprehensive compaction callback tests.The test suite thoroughly covers the compaction callback behavior:
- Threshold-based triggering (below/above)
- Edge cases (too few messages, nothing to archive, unknown capacity)
- Multiple compactions with counter increment
- Proper verification of system message preservation and summary injection
src/synthorg/engine/react_loop.py (3)
80-96: LGTM — compaction_callback follows established hook patterns.The
compaction_callbackparameter is integrated consistently with existing hooks (checkpoint_callback,stagnation_detector,approval_gate), including proper docstring documentation and private storage.
108-111: LGTM — property accessor matches other hook accessors.The
compaction_callbackproperty follows the same pattern asapproval_gateandstagnation_detectorproperties.
206-213: LGTM — compaction invoked at turn boundaries with error isolation.Compaction is correctly invoked after stagnation detection at turn boundaries, using the shared
invoke_compaction()helper which handles error isolation. The context is updated only when compaction returns a non-None result.docs/design/engine.md (1)
587-663: LGTM — comprehensive design documentation for context budget management.The new section thoroughly documents:
- Context fill tracking fields and estimation
- Soft budget indicator format and injection
- Compaction hook configuration and behavior
- Compressed checkpoint recovery messaging
- Loop integration points
The documentation aligns well with the implementation across
context_budget.py,compaction/modules, and execution loops.tests/unit/engine/test_context_budget.py (1)
23-304: LGTM — comprehensive context budget test coverage.The test suite thoroughly covers:
ContextBudgetIndicatorformatting, computed properties, and validationestimate_context_fillwith various input combinationsmake_context_indicatorfactory behaviorupdate_context_fillhelperDefaultTokenEstimator.estimate_conversation_tokensedge casesAgentContextbudget field integration including compression metadataThe import of
_TOOL_DEFINITION_TOKEN_OVERHEADat line 10 and its use at line 118 addresses the previous review concern about hardcoded values. Thetest_capacity_tokens_zero_rejectedtest is correctly placed inTestContextBudgetIndicator.tests/unit/engine/test_context_budget_properties.py (3)
13-14: Addpytest.mark.timeout(30)for test timeout compliance.Per coding guidelines, tests should have a 30-second timeout. This was flagged in a previous review but appears unaddressed.
💡 Suggested fix
+pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)] + + -@pytest.mark.unit class TestContextBudgetIndicatorProperties:
17-66: LGTM!Well-designed property-based tests covering edge cases for
ContextBudgetIndicator. The Hypothesis strategies correctly match the model's field constraints.
69-88: LGTM!The property test correctly verifies non-negative token estimation. Consider adding a follow-up test covering
ASSISTANTmessages withtool_callsto exercise that estimation path.src/synthorg/engine/loop_helpers.py (2)
16-18: LGTM!Event constant properly imported from the domain-specific observability module.
654-689: LGTM!The
invoke_compactionhelper follows the established patterns in this module (matchingcheck_shutdown,check_budget, etc.) with proper error isolation, logging, and PEP 758 exception syntax.src/synthorg/engine/plan_execute_loop.py (3)
48-48: LGTM!Correctly imports the shared
invoke_compactionhelper, addressing the previous review comment about extracting duplicated compaction logic.
100-118: LGTM!The
compaction_callbackparameter and property follow the established patterns for other callbacks in this class (checkpoint_callback,stagnation_detector).
686-706: Verify compaction isn't invoked twice for the same context state.When
resultis a tuple (step completion), compaction runs at lines 689-695, then returns immediately at line 696. WhenresultisAgentContext(tool calls completed, loop continues), compaction runs at lines 700-706. This looks correct — only one path is taken per loop iteration.src/synthorg/engine/token_estimation.py (2)
14-45: LGTM!Well-designed protocol with
runtime_checkabledecorator enabling isinstance checks. Clear docstrings following Google style.
48-101: LGTM!The implementation correctly handles all token-bearing fields including tool call IDs (addressing past review feedback). The per-message overhead applied to both messages and tool calls accounts for structural tokens.
src/synthorg/engine/compaction/summarizer.py (2)
37-61: LGTM!Well-structured factory function that creates a compaction callback closure. The default to
DefaultTokenEstimator()is appropriate.
137-143:archived_turn_countassumes perfectly paired user+assistant messages.
len(archivable) // 2underestimates archived turns when the conversation contains multiple tool result messages or sequential assistant messages. This was noted in a prior review.♻️ Count actual assistant messages for accuracy
- archived_turn_count = len(archivable) // 2 + archived_turn_count = sum( + 1 for m in archivable if m.role == MessageRole.ASSISTANT + )src/synthorg/engine/context_budget.py (3)
32-84: LGTM!Well-designed frozen Pydantic model with proper field constraints (
ge=0,gt=0), computed property, and human-readable formatting. Theformat()method handles both known and unknown capacity cases.
121-157: LGTM!Input validation at lines 148-153 addresses the previous review feedback. The function correctly validates system boundaries per coding guidelines.
160-196: LGTM!The function now correctly logs the newly computed fill percentage (lines 187-194), addressing the previous review feedback.
src/synthorg/engine/context.py (3)
85-93: LGTM!New snapshot fields properly typed with defaults and constraints matching the source
AgentContextfields.
161-187: LGTM!The new context budget fields are well-designed with appropriate constraints. The
computed_fieldforcontext_fill_percentcorrectly handles theNonecapacity case, and division by zero is prevented by thegt=0constraint oncontext_capacity_tokens.
308-336: LGTM!The
with_compressionmethod properly validatesfill_tokens. TheCompressionMetadatamodel already enforcesge=0constraints oncompression_pointandarchived_turns, so additional validation here would be redundant.src/synthorg/engine/prompt.py (3)
36-39: LGTM!Token estimation imports correctly moved from local definitions to the dedicated
token_estimationmodule.
138-139: LGTM!The
context_budget_indicatorparameter is properly threaded through the rendering pipeline, and the new_SECTION_CONTEXT_BUDGETis correctly added to sections when the indicator is provided.Also applies to: 484-486
389-409: LGTM!The
context_budgetis correctly added to the template context, making it available for rendering in both default and custom templates.
| {% if context_budget %} | ||
|
|
||
| ## Context Budget | ||
|
|
||
| {{ context_budget }} | ||
| {% endif %} |
There was a problem hiding this comment.
Bump prompt template version when changing DEFAULT_TEMPLATE.
Line 186-191 changes the rendered system prompt shape, but Line 20 still reports PROMPT_TEMPLATE_VERSION = "1.4.0". This can cause version/behavior drift in any metadata, caching, or snapshot flows keyed by template version.
Suggested fix
-PROMPT_TEMPLATE_VERSION: Final[str] = "1.4.0"
+PROMPT_TEMPLATE_VERSION: Final[str] = "1.5.0"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/engine/prompt_template.py` around lines 186 - 191, The
DEFAULT_TEMPLATE/system prompt rendering was modified (the context_budget block
added) but PROMPT_TEMPLATE_VERSION was not updated; update the
PROMPT_TEMPLATE_VERSION constant in prompt_template.py to a new
semantically-incremented version string (e.g., "1.4.1" or following your
project's versioning policy) so that any metadata/caching/snapshot flows keyed
on PROMPT_TEMPLATE_VERSION reflect the template change; ensure the version
string at the top (PROMPT_TEMPLATE_VERSION) is incremented and run any tests
that validate template-version consistency.
| @pytest.mark.unit | ||
| class TestMakeCompactionCallback: | ||
| """make_compaction_callback factory and compaction logic.""" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Add pytest.mark.timeout(30) for test timeout compliance.
Per coding guidelines, tests should have a 30-second timeout. Consider adding a module-level pytestmark declaration.
💡 Suggested fix
Add at the module level (after imports, before helper functions):
+pytestmark = [pytest.mark.unit, pytest.mark.timeout(30)]
+
+
def _msg(role: MessageRole, content: str) -> ChatMessage:Then remove the @pytest.mark.unit decorator from the class.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/engine/compaction/test_summarizer.py` around lines 39 - 41, Add a
module-level pytestmark to enforce the 30s timeout by declaring pytestmark =
[pytest.mark.timeout(30)] after the imports (and before helper functions), and
remove the `@pytest.mark.unit` decorator from the TestMakeCompactionCallback class
so the class remains a unit test but uses the module-level timeout; ensure the
symbol TestMakeCompactionCallback keeps its docstring and test methods
unchanged.
- Add distinct CONTEXT_BUDGET_COMPACTION_FALLBACK event for no-assistant summary fallback path (per CodeRabbit) - Reset PROMPT_TEMPLATE_VERSION to "1.0.0" — no users, no migrations, no caching depends on it yet. Will bump on first production deploy.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 65-169: The function _do_compaction is too long; extract the
skip-check and metadata construction into helpers to reduce its size: implement
_should_skip_compaction(ctx, config) that returns a skip-reason string or None
(handling fill_pct threshold and min_messages_to_compact checks and logging
decisions), and _build_compression_metadata(ctx, archived_turn_count,
summary_tokens) that returns a CompressionMetadata object (accumulating
prior.compactions_performed and prior.archived_turns). Then refactor
_do_compaction to call _should_skip_compaction and _build_compression_metadata,
keep summary creation via _build_summary and token estimation with estimator
unchanged, and ensure _do_compaction only coordinates these steps so its body
falls under the 50-line limit.
In `@tests/unit/engine/test_prompt.py`:
- Around line 475-480: The test double method estimate_conversation_tokens
currently accepts messages: object which is too permissive; change its parameter
type to match the production estimator signature (e.g., use the same concrete
type/import used in production such as List[Message], Sequence[PromptMessage],
or whatever type the real estimator uses) and update any necessary test imports
so the test will fail if the estimator interface drifts; keep the method name
estimate_conversation_tokens and only change the parameter type annotation to
the exact production type.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b59709a4-64a8-4645-af5f-b34aa4b5483e
📒 Files selected for processing (5)
src/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/prompt_template.pysrc/synthorg/observability/events/context_budget.pytests/unit/engine/test_prompt.pytests/unit/memory/org/test_prompt_integration.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Web
- GitHub Check: Build Sandbox
- GitHub Check: Build Backend
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax without parentheses for exception handling on Python 3.14 — ruff enforces PEP 758 except syntax
Type hints required on all public functions and classes — enforce with mypy strict mode
Docstrings must use Google style on all public classes and functions — enforced by ruff D rules
Line length must be 88 characters — enforced by ruff
Files:
src/synthorg/engine/compaction/summarizer.pytests/unit/memory/org/test_prompt_integration.pysrc/synthorg/observability/events/context_budget.pysrc/synthorg/engine/prompt_template.pytests/unit/engine/test_prompt.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections usecopy.deepcopy()at construction andMappingProxyTypewrapping.
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models withmodel_copy(update=...)for runtime state
Use@computed_fieldfor derived values instead of storing and validating redundant fields in Pydantic models
UseNotBlankStr(fromcore.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Every module with business logic MUST import logger:from synthorg.observability import get_loggerthen assignlogger = get_logger(__name__)
Never useimport logging/logging.getLogger()/print()in application code — only use the centralized logger fromsynthorg.observability
Logger variable must always be namedlogger(not_logger, notlog)
Event names must always use constants from domain-specific modules undersynthorg.observability.events(e.g.,API_REQUEST_STARTEDfromevents.api). Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured kwargs in logging: alwayslogger.info(EVENT, key=value)— neverlogger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging for object creation, internal flow, entry/exit of key functions
NEVER use real vendor names (Anthropic, OpenAI,...
Files:
src/synthorg/engine/compaction/summarizer.pysrc/synthorg/observability/events/context_budget.pysrc/synthorg/engine/prompt_template.py
src/synthorg/engine/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Task engine must use centralized single-writer TaskEngine with task lifecycle, recovery, and shutdown
Files:
src/synthorg/engine/compaction/summarizer.pysrc/synthorg/engine/prompt_template.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark tests with@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, or@pytest.mark.slow
Maintain 80% minimum code coverage — enforced in CI
Test timeout: 30 seconds per test
Prefer@pytest.mark.parametrizefor testing similar cases
Tests must usetest-provider,test-small-001, and similar generic vendor-agnostic names instead of real vendor names
Use Python Hypothesis for property-based testing:@given+@settingswith HYPOTHESIS_PROFILE env var (ci: 200 examples default, dev: 1000 examples)
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic.
Files:
tests/unit/memory/org/test_prompt_integration.pytests/unit/engine/test_prompt.py
🧠 Learnings (4)
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/context_budget.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/budget/**/*.py : Cost tracking system must track budget, enforce pre-flight/in-flight checks, auto-downgrade, track billing periods, cost tiers, quotas, and detect anomalies
Applied to files:
src/synthorg/observability/events/context_budget.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Event names must always use constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
src/synthorg/observability/events/context_budget.py
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/security/**/*.py : Progressive trust strategy supports 4 modes (disabled/weighted/per-category/milestone) with autonomy level presets and change strategy
Applied to files:
src/synthorg/engine/prompt_template.py
🧬 Code graph analysis (3)
src/synthorg/engine/compaction/summarizer.py (3)
src/synthorg/engine/compaction/models.py (2)
CompactionConfig(10-39)CompressionMetadata(42-73)src/synthorg/engine/token_estimation.py (3)
DefaultTokenEstimator(48-101)estimate_tokens(22-31)estimate_tokens(58-67)src/synthorg/engine/context.py (3)
AgentContext(104-412)context_fill_percent(180-187)with_compression(308-336)
tests/unit/memory/org/test_prompt_integration.py (1)
tests/unit/engine/test_prompt.py (1)
test_template_version_frozen(547-549)
tests/unit/engine/test_prompt.py (2)
src/synthorg/engine/token_estimation.py (1)
DefaultTokenEstimator(48-101)tests/unit/memory/org/test_prompt_integration.py (1)
test_template_version_frozen(51-52)
🔇 Additional comments (11)
src/synthorg/engine/prompt_template.py (2)
20-23: Version freeze rationale is clear and consistent.The inline note and constant update make the versioning intent explicit and maintainable.
189-194: Conditional Context Budget rendering looks correct.Good use of a guard to include the section only when context budget data is present.
tests/unit/memory/org/test_prompt_integration.py (1)
51-52: Version-freeze assertion update is aligned with production code.This keeps integration coverage in sync with
PROMPT_TEMPLATE_VERSION.tests/unit/engine/test_prompt.py (2)
31-31: Import relocation is correctly reflected in tests.Good alignment with the estimator module split.
547-549: Prompt version test rename and expectation update are good.This makes the test intent explicit and synchronized with the current versioning policy.
src/synthorg/observability/events/context_budget.py (1)
1-18: Well-structured event constants module.The constants follow the established
domain.category.actionnaming pattern and correctly useFinal[str]typing. The logical grouping (fill tracking, compaction lifecycle, indicator injection) aids discoverability.src/synthorg/engine/compaction/summarizer.py (5)
1-32: Imports and logger setup are well-organized.Proper use of
TYPE_CHECKINGblock for type-only imports, correct logger instantiation viaget_logger(__name__), and event constants imported from the domain-specific module.
38-62: Clean factory pattern implementation.The factory correctly creates a closure capturing config and estimator, with sensible defaulting to
DefaultTokenEstimator. The async wrapper appropriately matches theCompactionCallbackprotocol.
138-144:archived_turn_countstill assumes perfectly paired messages.This was flagged in a previous review: the
len(archivable) // 2calculation underestimates turn count when tool result messages or sequential assistant messages exist.♻️ Previously suggested fix
- archived_turn_count = len(archivable) // 2 + archived_turn_count = sum( + 1 for m in archivable if m.role == MessageRole.ASSISTANT + )
172-203: Summary building logic is well-implemented.Good use of the dedicated
CONTEXT_BUDGET_COMPACTION_FALLBACKevent for the no-assistant-content edge case. The character capping and snippet extraction are sensible defaults.
151-153: No changes needed —estimate_conversation_tokensis properly defined and tested.The method exists on the
PromptTokenEstimatorprotocol (line 33 oftoken_estimation.py) and is fully implemented inDefaultTokenEstimator(line 69). It is already in use elsewhere in the codebase and has comprehensive test coverage.> Likely an incorrect or invalid review comment.
| def _do_compaction( | ||
| ctx: AgentContext, | ||
| config: CompactionConfig, | ||
| estimator: PromptTokenEstimator, | ||
| ) -> AgentContext | None: | ||
| """Core compaction logic. | ||
|
|
||
| Args: | ||
| ctx: Current agent context. | ||
| config: Compaction configuration. | ||
| estimator: Token estimator. | ||
|
|
||
| Returns: | ||
| New compacted ``AgentContext`` or ``None`` if no compaction needed. | ||
| """ | ||
| fill_pct = ctx.context_fill_percent | ||
| if fill_pct is None or fill_pct < config.fill_threshold_percent: | ||
| return None | ||
|
|
||
| conversation = ctx.conversation | ||
| if len(conversation) < config.min_messages_to_compact: | ||
| logger.debug( | ||
| CONTEXT_BUDGET_COMPACTION_SKIPPED, | ||
| execution_id=ctx.execution_id, | ||
| reason="too_few_messages", | ||
| message_count=len(conversation), | ||
| min_required=config.min_messages_to_compact, | ||
| ) | ||
| return None | ||
|
|
||
| logger.info( | ||
| CONTEXT_BUDGET_COMPACTION_STARTED, | ||
| execution_id=ctx.execution_id, | ||
| fill_percent=fill_pct, | ||
| message_count=len(conversation), | ||
| ) | ||
|
|
||
| # Preserve a leading SYSTEM message only when present. | ||
| # preserve_recent_turns * 2 for user+assistant pairs. | ||
| preserve_count = config.preserve_recent_turns * 2 | ||
| head: tuple[ChatMessage, ...] = () | ||
| start_idx = 0 | ||
| if conversation and conversation[0].role == MessageRole.SYSTEM: | ||
| head = (conversation[0],) | ||
| start_idx = 1 | ||
|
|
||
| if preserve_count >= len(conversation) - start_idx: | ||
| logger.debug( | ||
| CONTEXT_BUDGET_COMPACTION_SKIPPED, | ||
| execution_id=ctx.execution_id, | ||
| reason="nothing_to_archive", | ||
| preserve_count=preserve_count, | ||
| message_count=len(conversation), | ||
| ) | ||
| return None | ||
|
|
||
| archivable = conversation[start_idx:-preserve_count] | ||
| recent = conversation[-preserve_count:] | ||
|
|
||
| summary_text = _build_summary(archivable) | ||
| summary_msg = ChatMessage( | ||
| role=MessageRole.SYSTEM, | ||
| content=summary_text, | ||
| ) | ||
| summary_tokens = estimator.estimate_tokens(summary_text) | ||
|
|
||
| compressed_conversation = (*head, summary_msg, *recent) | ||
|
|
||
| # Build compression metadata. | ||
| prior = ctx.compression_metadata | ||
| compactions_count = prior.compactions_performed + 1 if prior is not None else 1 | ||
| prior_archived = prior.archived_turns if prior is not None else 0 | ||
|
|
||
| archived_turn_count = len(archivable) // 2 | ||
| metadata = CompressionMetadata( | ||
| compression_point=ctx.turn_count, | ||
| archived_turns=prior_archived + archived_turn_count, | ||
| summary_tokens=summary_tokens, | ||
| compactions_performed=compactions_count, | ||
| ) | ||
|
|
||
| # Re-estimate fill with compressed conversation. This counts | ||
| # conversation tokens only — system prompt and tool overhead are | ||
| # excluded because the compaction callback does not have access to | ||
| # those values. The execution loop's next call to | ||
| # ``update_context_fill`` will restore the full estimate. | ||
| new_fill = estimator.estimate_conversation_tokens( | ||
| compressed_conversation, | ||
| ) | ||
|
|
||
| logger.info( | ||
| CONTEXT_BUDGET_COMPACTION_COMPLETED, | ||
| execution_id=ctx.execution_id, | ||
| original_messages=len(conversation), | ||
| compacted_messages=len(compressed_conversation), | ||
| archived_turns=archived_turn_count, | ||
| summary_tokens=summary_tokens, | ||
| compactions_total=compactions_count, | ||
| ) | ||
|
|
||
| return ctx.with_compression( | ||
| metadata, | ||
| compressed_conversation, | ||
| new_fill, | ||
| ) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Function exceeds 50-line limit.
_do_compaction spans ~90 executable lines, exceeding the project's 50-line function limit. Consider extracting helpers for distinct responsibilities.
♻️ Suggested extraction points
def _should_skip_compaction(
ctx: AgentContext,
config: CompactionConfig,
) -> str | None:
"""Return skip reason or None if compaction should proceed."""
fill_pct = ctx.context_fill_percent
if fill_pct is None or fill_pct < config.fill_threshold_percent:
return "below_threshold"
if len(ctx.conversation) < config.min_messages_to_compact:
return "too_few_messages"
return None
def _build_compression_metadata(
ctx: AgentContext,
archived_turn_count: int,
summary_tokens: int,
) -> CompressionMetadata:
"""Construct compression metadata accumulating prior compactions."""
prior = ctx.compression_metadata
return CompressionMetadata(
compression_point=ctx.turn_count,
archived_turns=(prior.archived_turns if prior else 0) + archived_turn_count,
summary_tokens=summary_tokens,
compactions_performed=(prior.compactions_performed + 1) if prior else 1,
)Then _do_compaction becomes a coordinating function calling these helpers.
As per coding guidelines: "Functions must be less than 50 lines".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/engine/compaction/summarizer.py` around lines 65 - 169, The
function _do_compaction is too long; extract the skip-check and metadata
construction into helpers to reduce its size: implement
_should_skip_compaction(ctx, config) that returns a skip-reason string or None
(handling fill_pct threshold and min_messages_to_compact checks and logging
decisions), and _build_compression_metadata(ctx, archived_turn_count,
summary_tokens) that returns a CompressionMetadata object (accumulating
prior.compactions_performed and prior.archived_turns). Then refactor
_do_compaction to call _should_skip_compaction and _build_compression_metadata,
keep summary creation via _build_summary and token estimation with estimator
unchanged, and ensure _do_compaction only coordinates these steps so its body
falls under the 50-line limit.
There was a problem hiding this comment.
♻️ Duplicate comments (2)
src/synthorg/engine/compaction/summarizer.py (1)
194-199: 🧹 Nitpick | 🔵 Trivial
archived_turnscalculation assumes paired messages.Line 196 uses
len(archivable) // 2which assumes archivable messages are perfectly paired user+assistant turns. If the conversation contains tool result messages, multiple assistant messages in sequence, or other patterns, this underestimates the actual archived turn count.Consider counting actual assistant messages for a more accurate turn count:
♻️ Proposed fix
+ archived_turn_count = sum( + 1 for m in archivable if m.role == MessageRole.ASSISTANT + ) + metadata = CompressionMetadata( compression_point=ctx.turn_count, - archived_turns=prior_archived + len(archivable) // 2, + archived_turns=prior_archived + archived_turn_count, summary_tokens=summary_tokens, compactions_performed=compactions_count, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/compaction/summarizer.py` around lines 194 - 199, The archived_turns field incorrectly assumes paired user+assistant messages by using len(archivable) // 2; instead compute the archived turn count by counting actual assistant turns in the archivable list to handle tool results or consecutive assistant messages. Update the code that constructs CompressionMetadata (where CompressionMetadata is instantiated and variables ctx.turn_count, prior_archived, archivable are in scope) to replace len(archivable) // 2 with a computed count of assistant messages (e.g., sum over archivable checking the message role or is_assistant property) and add that count to prior_archived when setting archived_turns.tests/unit/engine/test_prompt.py (1)
475-480:⚠️ Potential issue | 🟡 MinorUse the concrete conversation message type in the test double signature.
Line 477 uses
tuple[object, ...], which is too permissive and can hide interface drift versus the production estimator contract (tuple[ChatMessage, ...]).#!/bin/bash set -euo pipefail echo "Production signature:" rg -nP --type=py -C2 'def estimate_conversation_tokens\(|messages:\s*tuple\[ChatMessage,\s*\.\.\.\]' src/synthorg/engine/token_estimation.py echo echo "Test double signature:" rg -nP --type=py -C3 'class CountingEstimator|def estimate_conversation_tokens|messages:\s*tuple\[.*\]' tests/unit/engine/test_prompt.pyExpected result: production uses
tuple[ChatMessage, ...]; test double should match it exactly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/engine/test_prompt.py` around lines 475 - 480, The test double's method signature for estimate_conversation_tokens uses the permissive tuple[object, ...]; update the CountingEstimator (or whichever test stub defines estimate_conversation_tokens) to use the concrete conversation message type tuple[ChatMessage, ...] to mirror the production contract in token_estimation.py so the test will catch interface drift; import or reference the same ChatMessage type used by production and adjust the method signature accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 194-199: The archived_turns field incorrectly assumes paired
user+assistant messages by using len(archivable) // 2; instead compute the
archived turn count by counting actual assistant turns in the archivable list to
handle tool results or consecutive assistant messages. Update the code that
constructs CompressionMetadata (where CompressionMetadata is instantiated and
variables ctx.turn_count, prior_archived, archivable are in scope) to replace
len(archivable) // 2 with a computed count of assistant messages (e.g., sum over
archivable checking the message role or is_assistant property) and add that
count to prior_archived when setting archived_turns.
In `@tests/unit/engine/test_prompt.py`:
- Around line 475-480: The test double's method signature for
estimate_conversation_tokens uses the permissive tuple[object, ...]; update the
CountingEstimator (or whichever test stub defines estimate_conversation_tokens)
to use the concrete conversation message type tuple[ChatMessage, ...] to mirror
the production contract in token_estimation.py so the test will catch interface
drift; import or reference the same ChatMessage type used by production and
adjust the method signature accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 313fd83e-8897-4072-93da-b95836dd112b
📒 Files selected for processing (2)
src/synthorg/engine/compaction/summarizer.pytests/unit/engine/test_prompt.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
- GitHub Check: Build Sandbox
- GitHub Check: Build Web
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax without parentheses for exception handling on Python 3.14 — ruff enforces PEP 758 except syntax
Type hints required on all public functions and classes — enforce with mypy strict mode
Docstrings must use Google style on all public classes and functions — enforced by ruff D rules
Line length must be 88 characters — enforced by ruff
Files:
src/synthorg/engine/compaction/summarizer.pytests/unit/engine/test_prompt.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections usecopy.deepcopy()at construction andMappingProxyTypewrapping.
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models withmodel_copy(update=...)for runtime state
Use@computed_fieldfor derived values instead of storing and validating redundant fields in Pydantic models
UseNotBlankStr(fromcore.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Every module with business logic MUST import logger:from synthorg.observability import get_loggerthen assignlogger = get_logger(__name__)
Never useimport logging/logging.getLogger()/print()in application code — only use the centralized logger fromsynthorg.observability
Logger variable must always be namedlogger(not_logger, notlog)
Event names must always use constants from domain-specific modules undersynthorg.observability.events(e.g.,API_REQUEST_STARTEDfromevents.api). Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured kwargs in logging: alwayslogger.info(EVENT, key=value)— neverlogger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging for object creation, internal flow, entry/exit of key functions
NEVER use real vendor names (Anthropic, OpenAI,...
Files:
src/synthorg/engine/compaction/summarizer.py
src/synthorg/engine/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Task engine must use centralized single-writer TaskEngine with task lifecycle, recovery, and shutdown
Files:
src/synthorg/engine/compaction/summarizer.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark tests with@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, or@pytest.mark.slow
Maintain 80% minimum code coverage — enforced in CI
Test timeout: 30 seconds per test
Prefer@pytest.mark.parametrizefor testing similar cases
Tests must usetest-provider,test-small-001, and similar generic vendor-agnostic names instead of real vendor names
Use Python Hypothesis for property-based testing:@given+@settingswith HYPOTHESIS_PROFILE env var (ci: 200 examples default, dev: 1000 examples)
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic.
Files:
tests/unit/engine/test_prompt.py
🧠 Learnings (2)
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Functions must be less than 50 lines, files must be less than 800 lines
Applied to files:
src/synthorg/engine/compaction/summarizer.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Functions: < 50 lines, files < 800 lines.
Applied to files:
src/synthorg/engine/compaction/summarizer.py
🧬 Code graph analysis (1)
tests/unit/engine/test_prompt.py (2)
src/synthorg/engine/token_estimation.py (1)
DefaultTokenEstimator(48-101)tests/unit/memory/org/test_prompt_integration.py (1)
test_template_version_frozen(51-52)
🔇 Additional comments (6)
src/synthorg/engine/compaction/summarizer.py (4)
1-36: LGTM!Clean module setup: proper TYPE_CHECKING guard, centralized logger, and domain-specific event constants.
38-62: LGTM!Well-designed factory with proper type hints, keyword-only parameters, and sensible default for the estimator. The async wrapper around sync logic is appropriate given the CPU-bound operations.
65-131: LGTM!Well-structured orchestration function with proper early returns, state transition logging at INFO level, and immutable context updates via
with_compression(). The refactoring into_split_conversationand_compresshelpers keeps this under the 50-line limit.
203-234: LGTM!Clean summary construction with appropriate snippet extraction, length capping, and proper use of the distinct
CONTEXT_BUDGET_COMPACTION_FALLBACKevent for the no-content edge case.tests/unit/engine/test_prompt.py (2)
31-31: Import path update looks correct.This aligns the test with the estimator’s new module location.
547-549: Version-freeze test update is clear and consistent.The renamed test and explicit
1.0.0assertion are consistent with related prompt-version tests.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 204-235: The fallback log in _build_summary uses
CONTEXT_BUDGET_COMPACTION_FALLBACK without an execution_id, breaking event
correlation; change _build_summary to accept an execution_id parameter (e.g.,
execution_id: str | None), include execution_id in the logger.debug call
alongside reason and archived_count, and ensure the caller (_compress) passes
the current execution_id into _build_summary when it invokes it so all
compaction events include execution_id for observability.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a7ee13b8-e2c0-4f6b-bbc4-e043f0355a49
📒 Files selected for processing (2)
src/synthorg/engine/compaction/summarizer.pytests/unit/engine/test_prompt.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
- GitHub Check: Build Sandbox
- GitHub Check: Build Web
- GitHub Check: CLA Signature Check
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax without parentheses for exception handling on Python 3.14 — ruff enforces PEP 758 except syntax
Type hints required on all public functions and classes — enforce with mypy strict mode
Docstrings must use Google style on all public classes and functions — enforced by ruff D rules
Line length must be 88 characters — enforced by ruff
Files:
src/synthorg/engine/compaction/summarizer.pytests/unit/engine/test_prompt.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections usecopy.deepcopy()at construction andMappingProxyTypewrapping.
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models withmodel_copy(update=...)for runtime state
Use@computed_fieldfor derived values instead of storing and validating redundant fields in Pydantic models
UseNotBlankStr(fromcore.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Every module with business logic MUST import logger:from synthorg.observability import get_loggerthen assignlogger = get_logger(__name__)
Never useimport logging/logging.getLogger()/print()in application code — only use the centralized logger fromsynthorg.observability
Logger variable must always be namedlogger(not_logger, notlog)
Event names must always use constants from domain-specific modules undersynthorg.observability.events(e.g.,API_REQUEST_STARTEDfromevents.api). Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured kwargs in logging: alwayslogger.info(EVENT, key=value)— neverlogger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging for object creation, internal flow, entry/exit of key functions
NEVER use real vendor names (Anthropic, OpenAI,...
Files:
src/synthorg/engine/compaction/summarizer.py
src/synthorg/engine/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Task engine must use centralized single-writer TaskEngine with task lifecycle, recovery, and shutdown
Files:
src/synthorg/engine/compaction/summarizer.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark tests with@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, or@pytest.mark.slow
Maintain 80% minimum code coverage — enforced in CI
Test timeout: 30 seconds per test
Prefer@pytest.mark.parametrizefor testing similar cases
Tests must usetest-provider,test-small-001, and similar generic vendor-agnostic names instead of real vendor names
Use Python Hypothesis for property-based testing:@given+@settingswith HYPOTHESIS_PROFILE env var (ci: 200 examples default, dev: 1000 examples)
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic.
Files:
tests/unit/engine/test_prompt.py
🧠 Learnings (2)
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Functions must be less than 50 lines, files must be less than 800 lines
Applied to files:
src/synthorg/engine/compaction/summarizer.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Functions: < 50 lines, files < 800 lines.
Applied to files:
src/synthorg/engine/compaction/summarizer.py
🧬 Code graph analysis (1)
tests/unit/engine/test_prompt.py (2)
src/synthorg/engine/token_estimation.py (1)
DefaultTokenEstimator(48-101)tests/unit/memory/org/test_prompt_integration.py (1)
test_template_version_frozen(51-52)
🔇 Additional comments (7)
tests/unit/engine/test_prompt.py (3)
31-37: Import alignment looks correct.The updated
DefaultTokenEstimatorimport path and addedChatMessageimport match the new token-estimation surface used by the tests.
476-480: Test double now correctly mirrors estimator contract.Using
messages: tuple[ChatMessage, ...]inestimate_conversation_tokensmakes this double interface-accurate and protects against signature drift.
548-550: Template version freeze test is clear and consistent.The renamed test plus explicit
PROMPT_TEMPLATE_VERSION == "1.0.0"assertion clearly enforces the frozen-version expectation.src/synthorg/engine/compaction/summarizer.py (4)
1-35: LGTM!Module setup follows all guidelines: proper logger initialization via
get_logger(__name__), TYPE_CHECKING guard for forward references, and event constants imported from the observability domain module.
38-62: LGTM!Clean factory pattern with keyword-only arguments, proper type annotations, and well-documented behavior. The async wrapper appropriately delegates to the synchronous core logic.
65-130: Function still exceeds 50-line guideline.While the function has been improved by extracting
_split_conversationand_compress, it spans 66 lines. Consider extracting the skip-check logic (lines 80-93) into a helper like_should_skip_compaction() -> str | Noneto bring it under the limit.As per coding guidelines: "Functions must be less than 50 lines".
[duplicate_comment, suggest_optional_refactor]♻️ Suggested extraction
def _should_skip_compaction( ctx: AgentContext, config: CompactionConfig, ) -> str | None: """Return skip reason or None if compaction should proceed.""" fill_pct = ctx.context_fill_percent if fill_pct is None or fill_pct < config.fill_threshold_percent: return "below_threshold" if len(ctx.conversation) < config.min_messages_to_compact: return "too_few_messages" return NoneThen simplify
_do_compaction:def _do_compaction(...) -> AgentContext | None: - fill_pct = ctx.context_fill_percent - if fill_pct is None or fill_pct < config.fill_threshold_percent: - return None - - conversation = ctx.conversation - if len(conversation) < config.min_messages_to_compact: - logger.debug( - CONTEXT_BUDGET_COMPACTION_SKIPPED, - execution_id=ctx.execution_id, - reason="too_few_messages", - message_count=len(conversation), - min_required=config.min_messages_to_compact, - ) - return None + skip_reason = _should_skip_compaction(ctx, config) + if skip_reason: + if skip_reason == "too_few_messages": + logger.debug( + CONTEXT_BUDGET_COMPACTION_SKIPPED, + execution_id=ctx.execution_id, + reason=skip_reason, + message_count=len(ctx.conversation), + min_required=config.min_messages_to_compact, + ) + return None + + conversation = ctx.conversation + fill_pct = ctx.context_fill_percent
133-201: LGTM!Well-structured helper functions with clear responsibilities. The
_split_conversationcorrectly handles the presence or absence of a leading SYSTEM message, and_compressaccurately counts archived assistant turns for metadata.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/synthorg/engine/compaction/summarizer.py (1)
65-130: 🛠️ Refactor suggestion | 🟠 MajorRefactor
_do_compactionto satisfy the 50-line function limit.This function currently exceeds the repository maximum and should be split into small helpers (e.g., skip checks, logging+metrics payload build, and finalize/apply compression).
As per coding guidelines: "Functions must be less than 50 lines, files must be less than 800 lines".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/compaction/summarizer.py` around lines 65 - 130, _refactor _do_compaction to be under 50 lines by extracting its logical sections into small helper functions: pull the early skip logic (fill threshold and min message check + the CONTEXT_BUDGET_COMPACTION_SKIPPED logging) into a helper like _should_skip_compaction(ctx, config) that returns None or proceeds; extract the start/complete logging and metrics payload construction into helpers (e.g., _log_compaction_started(ctx, fill_pct) and _log_compaction_completed(ctx, conversation, compressed, metadata, summary_tokens)); and extract the finalization step that re-estimates tokens and returns ctx.with_compression(metadata, compressed, new_fill) into a helper like _finalize_compaction(ctx, estimator, compressed, metadata, summary_tokens). Keep calls to _split_conversation and _compress in the main _do_compaction flow, and ensure all moved logic still uses the same symbols (_do_compaction, _split_conversation, _compress, estimator, ctx.with_compression) so behavior is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 150-167: The logic only preserves the first leading SYSTEM message
(head = (conversation[0],)) causing earlier SYSTEM summaries to be included in
archivable and dropped; change the preservation to capture all leading SYSTEM
messages by advancing start_idx while conversation[start_idx].role ==
MessageRole.SYSTEM and set head = tuple(conversation[:start_idx]); then compute
archivable = conversation[start_idx:-preserve_count] and recent as before so
_build_summary() won’t discard previously created SYSTEM summaries.
---
Duplicate comments:
In `@src/synthorg/engine/compaction/summarizer.py`:
- Around line 65-130: _refactor _do_compaction to be under 50 lines by
extracting its logical sections into small helper functions: pull the early skip
logic (fill threshold and min message check + the
CONTEXT_BUDGET_COMPACTION_SKIPPED logging) into a helper like
_should_skip_compaction(ctx, config) that returns None or proceeds; extract the
start/complete logging and metrics payload construction into helpers (e.g.,
_log_compaction_started(ctx, fill_pct) and _log_compaction_completed(ctx,
conversation, compressed, metadata, summary_tokens)); and extract the
finalization step that re-estimates tokens and returns
ctx.with_compression(metadata, compressed, new_fill) into a helper like
_finalize_compaction(ctx, estimator, compressed, metadata, summary_tokens). Keep
calls to _split_conversation and _compress in the main _do_compaction flow, and
ensure all moved logic still uses the same symbols (_do_compaction,
_split_conversation, _compress, estimator, ctx.with_compression) so behavior is
unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 287c2f19-c42f-493a-b984-b6c5b35fa46c
📒 Files selected for processing (1)
src/synthorg/engine/compaction/summarizer.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Build Sandbox
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax without parentheses for exception handling on Python 3.14 — ruff enforces PEP 758 except syntax
Type hints required on all public functions and classes — enforce with mypy strict mode
Docstrings must use Google style on all public classes and functions — enforced by ruff D rules
Line length must be 88 characters — enforced by ruff
Files:
src/synthorg/engine/compaction/summarizer.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections usecopy.deepcopy()at construction andMappingProxyTypewrapping.
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models withmodel_copy(update=...)for runtime state
Use@computed_fieldfor derived values instead of storing and validating redundant fields in Pydantic models
UseNotBlankStr(fromcore.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Every module with business logic MUST import logger:from synthorg.observability import get_loggerthen assignlogger = get_logger(__name__)
Never useimport logging/logging.getLogger()/print()in application code — only use the centralized logger fromsynthorg.observability
Logger variable must always be namedlogger(not_logger, notlog)
Event names must always use constants from domain-specific modules undersynthorg.observability.events(e.g.,API_REQUEST_STARTEDfromevents.api). Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured kwargs in logging: alwayslogger.info(EVENT, key=value)— neverlogger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging for object creation, internal flow, entry/exit of key functions
NEVER use real vendor names (Anthropic, OpenAI,...
Files:
src/synthorg/engine/compaction/summarizer.py
src/synthorg/engine/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Task engine must use centralized single-writer TaskEngine with task lifecycle, recovery, and shutdown
Files:
src/synthorg/engine/compaction/summarizer.py
🧠 Learnings (2)
📚 Learning: 2026-03-17T11:39:10.447Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:39:10.447Z
Learning: Applies to src/synthorg/**/*.py : Functions must be less than 50 lines, files must be less than 800 lines
Applied to files:
src/synthorg/engine/compaction/summarizer.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Functions: < 50 lines, files < 800 lines.
Applied to files:
src/synthorg/engine/compaction/summarizer.py
🧬 Code graph analysis (1)
src/synthorg/engine/compaction/summarizer.py (4)
src/synthorg/engine/compaction/models.py (2)
CompactionConfig(10-39)CompressionMetadata(42-73)src/synthorg/engine/token_estimation.py (6)
DefaultTokenEstimator(48-101)PromptTokenEstimator(15-45)estimate_conversation_tokens(33-45)estimate_conversation_tokens(69-101)estimate_tokens(22-31)estimate_tokens(58-67)src/synthorg/engine/context.py (3)
AgentContext(104-412)context_fill_percent(180-187)with_compression(308-336)src/synthorg/engine/context_budget.py (1)
fill_percent(60-64)
Previously only conversation[0] was preserved, dropping prior compaction summaries (also SYSTEM role) on subsequent compactions.
🤖 I have created a release *beep* *boop* --- ## [0.3.1](v0.3.0...v0.3.1) (2026-03-17) ### Features * **api:** RFC 9457 Phase 2 — ProblemDetail and content negotiation ([#496](#496)) ([30f7c49](30f7c49)) * **cli:** verify container image signatures and SLSA provenance on pull ([#492](#492)) ([bef272d](bef272d)), closes [#491](#491) * **engine:** implement context budget management in execution loops ([#520](#520)) ([181eb8a](181eb8a)), closes [#416](#416) * implement settings persistence layer (DB-backed config) ([#495](#495)) ([4bd99f7](4bd99f7)), closes [#450](#450) * **memory:** implement dual-mode archival in memory consolidation ([#524](#524)) ([4603c9e](4603c9e)), closes [#418](#418) * migrate config consumers to read through SettingsService ([#510](#510)) ([32f553d](32f553d)) * **settings:** implement settings change subscriptions for service hot-reload ([#526](#526)) ([53f908e](53f908e)), closes [#503](#503) * **settings:** register API config in SettingsService with 2-phase init ([#518](#518)) ([29f7481](29f7481)) * **tools:** add SSRF prevention for git clone URLs ([#505](#505)) ([492dd0d](492dd0d)) * **tools:** wire RootConfig.git_clone to GitCloneTool instantiation ([#519](#519)) ([b7d8172](b7d8172)) ### Bug Fixes * **api:** replace JWT query parameter with one-time ticket for WebSocket auth ([#493](#493)) ([22a25f6](22a25f6)), closes [#343](#343) ### Documentation * add uv cache lock contention handling to worktree skill ([#500](#500)) ([bd85a8d](bd85a8d)) * document RFC 9457 dual response formats in OpenAPI schema ([#506](#506)) ([8dd2524](8dd2524)) ### Maintenance * upgrade jsdom from 28 to 29 ([#499](#499)) ([1ea2249](1ea2249)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
AgentContext—context_fill_tokens,context_capacity_tokens,compression_metadatafields withcontext_fill_percentcomputed propertyContextBudgetIndicatormodel with formatted soft budget display ([Context: 12,450/16,000 tokens (78%) | 0 archived blocks]) injected into system prompts via new_SECTION_CONTEXT_BUDGETPromptTokenEstimatorprotocol withestimate_conversation_tokens()for per-message conversation size estimationcompaction/subpackage withCompactionCallbacktype alias,CompactionConfig,CompactionResult,CompressionMetadatamodels, andmake_compaction_callback()factory (oldest-turns summarizer)compaction_callbackintoReactLoopandPlanExecuteLoopconstructors with turn-boundary invocation and error isolation (matchingcheckpoint_callback/stagnation_detectorpatterns)checkpoint/resume.py— pass throughcompaction_callbackinmake_loop_with_callback(), compression-aware reconciliation messages for compressed checkpoint recoverycontext_budgetobservability event constantsTest plan
Review notes
ModelCapabilities.max_context_tokensalready existed — no provider layer changes neededCloses #416