feat: fine-tuning pipeline + CompositeBackend + workflow lifecycle #1065
Conversation
Note: the automated review run failed because the pull request is closed (review profile: ASSERTIVE, plan: Pro). Files selected for processing: 1.
Walkthrough

Implements a complete embedding fine-tuning subsystem and UI plus memory backend extensions. Adds a five-stage fine-tune pipeline, a background async FineTuneOrchestrator (start/resume/cancel/recover) with CancellationToken support, SQLite persistence and schema for runs and checkpoints, preflight checks, checkpoint CRUD and run-list admin endpoints, and WebSocket progress events. Expands memory models and observability events, adds CompositeBackend and InMemoryBackend with namespace routing, updates memory factory/mappers/config, refactors imports and logging, and adds extensive backend unit tests and frontend API, store, pages, and components for fine-tuning.
Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Snapshot warning: ensure that dependencies are being submitted on PR branches; re-running this action after a short time may resolve the issue. Scanned files: none.
Code Review
This pull request introduces a complete embedding fine-tuning pipeline and a composite memory backend supporting namespace-based routing. The implementation includes a background orchestrator for synthetic data generation, negative mining, contrastive training, and evaluation, along with SQLite repositories for persistence and a new web dashboard. Critical feedback identifies several Python 3 syntax errors in exception handling and performance issues related to blocking I/O in async methods and inefficient non-batched model encoding. Additionally, a potential stability issue regarding GPU memory exhaustion during model evaluation was noted.
The same Python 2 `except` syntax is flagged at six separate locations in the diff; in Python 3 it is a syntax error:

```python
except MemoryError, RecursionError:
    raise
```
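In Python 3 the exception types must be parenthesized as a tuple. A minimal runnable illustration of the corrected pattern (the function name here is hypothetical):

```python
def reraise_fatal(exc: BaseException) -> str:
    """Re-raise unrecoverable errors, swallow the rest; `except (A, B):`
    is the Python 3 form of the flagged `except A, B:` syntax."""
    try:
        raise exc
    except (MemoryError, RecursionError):
        raise  # propagate fatal errors unchanged
    except Exception:
        return "handled"
```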
```python
    data: FineTuneRequest,
) -> ApiResponse[PreflightResult]:
    """Run pre-flight validation checks."""
    checks = list(_run_preflight_checks(data))
```
The _run_preflight_checks function performs blocking I/O (specifically rglob and file scanning in _check_documents). Calling it directly in an async controller method will block the event loop, degrading performance for all concurrent requests. This should be offloaded to a worker thread using asyncio.to_thread.
Suggested fix:

```python
checks = await asyncio.to_thread(_run_preflight_checks, data)
```

```python
triples: list[dict[str, object]] = []
for i, query in enumerate(queries):
    if cancellation is not None and i % 50 == 0:
        cancellation.check()
    query_emb = await asyncio.to_thread(
        model.encode,
        [query],
        show_progress_bar=False,
    )
    sims = _cosine_similarities(query_emb[0], passage_embeddings)
    positive_sim = sims[i]
    margin = 0.95 * positive_sim
    candidates = sorted(
        ((j, s) for j, s in enumerate(sims) if j != i and s < margin),
        key=lambda x: x[1],
        reverse=True,
    )[:top_k]
    negatives = [passages[j] for j, _ in candidates]
    triples.append(
        {
            "query": query,
            "positive": passages[i],
            "negatives": negatives,
        },
    )
    if progress_callback:
        progress_callback((i + 1) / len(queries))
```
Encoding queries one-by-one in a loop is highly inefficient. SentenceTransformer.encode is optimized for batch processing, which leverages vectorization and GPU parallelism. Moving the encoding step outside the loop will significantly reduce the overhead of thread spawning and model invocation.
```python
query_embeddings = await asyncio.to_thread(
    model.encode,
    queries,
    show_progress_bar=False,
)
triples: list[dict[str, object]] = []
for i, query_emb in enumerate(query_embeddings):
    if cancellation is not None and i % 50 == 0:
        cancellation.check()
    sims = _cosine_similarities(query_emb, passage_embeddings)
    positive_sim = sims[i]
    margin = 0.95 * positive_sim
    candidates = sorted(
        ((j, s) for j, s in enumerate(sims) if j != i and s < margin),
        key=lambda x: x[1],
        reverse=True,
    )[:top_k]
    negatives = [passages[j] for j, _ in candidates]
    triples.append(
        {
            "query": queries[i],
            "positive": passages[i],
            "negatives": negatives,
        },
    )
    if progress_callback:
        progress_callback((i + 1) / len(queries))
```

```python
finetuned = await asyncio.to_thread(
    st.SentenceTransformer,
    checkpoint_path,
)
base = await asyncio.to_thread(st.SentenceTransformer, base_model)
```
Loading both the fine-tuned model and the base model into memory simultaneously can lead to Out-Of-Memory (OOM) errors, especially on GPUs with limited VRAM. Consider loading the fine-tuned model, encoding the validation data, deleting the model instance (and clearing the GPU cache), and then repeating the process for the base model.
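One way to sketch that sequential pattern (hedged: `load_model` is a stand-in for `st.SentenceTransformer`, and the CUDA cache call is left as a comment since it only applies on GPU hosts):

```python
import gc

def sequential_encode(load_model, model_names, texts):
    """Encode `texts` with each model one at a time, freeing each model
    before loading the next so only one model is resident at once."""
    results = {}
    for name in model_names:
        model = load_model(name)
        results[name] = model.encode(texts)
        del model        # drop the only reference to the model...
        gc.collect()     # ...and reclaim it before the next load
        # On CUDA hosts, also call torch.cuda.empty_cache() here to
        # release cached VRAM back to the allocator.
    return results
```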
Codecov Report

❌ Patch coverage is … Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1065      +/-   ##
==========================================
- Coverage   90.77%   89.69%   -1.08%
==========================================
  Files         722      730       +8
  Lines       41099    42539    +1440
  Branches     4100     4267     +167
==========================================
+ Hits        37306    38157     +851
- Misses       3085     3636     +551
- Partials      708      746      +38
```
Pull request overview
This PR adds an end-to-end embedding fine-tuning feature (backend pipeline + persistence + admin APIs + dashboard UI), introduces a namespace-routed Composite memory backend (with a new session-scoped InMemoryBackend), and extends workflow-related UI/layout and backend lifecycle wiring.
Changes:
- Add fine-tuning pipeline stages, orchestrator, persistence tables/repos, admin endpoints, and dashboard page with WebSocket-driven progress.
- Add CompositeBackend + InMemoryBackend and introduce `namespace` support across memory models and Mem0 mapping/filtering.
- Update assorted workflow UI spacing and add/extend tests for new persistence/memory/fine-tuning behaviors.
Reviewed changes
Copilot reviewed 52 out of 54 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| web/src/stores/fine-tuning.ts | New Zustand store for fine-tuning status, CRUD actions, and WS updates. |
| web/src/router/routes.ts | Adds a Fine-Tuning route constant. |
| web/src/router/index.tsx | Registers FineTuningPage in the router. |
| web/src/pages/workflows/WorkflowCreateDrawer.tsx | Updates spacing class usage. |
| web/src/pages/workflow-editor/WorkflowNodeDrawer.tsx | Updates spacing class usage. |
| web/src/pages/FineTuningPage.tsx | New fine-tuning settings page with WS subscription and sections. |
| web/src/pages/fine-tuning/RunHistoryTable.tsx | New run history table UI. |
| web/src/pages/fine-tuning/PreflightResultPanel.tsx | New preflight results UI panel. |
| web/src/pages/fine-tuning/PipelineStepper.tsx | New 5-stage pipeline stepper UI. |
| web/src/pages/fine-tuning/PipelineProgressBar.tsx | New progress bar UI with ARIA attributes. |
| web/src/pages/fine-tuning/PipelineControlPanel.tsx | New run start/cancel + preflight controls (incl. “Advanced” section). |
| web/src/pages/fine-tuning/DependencyMissingBanner.tsx | New dependency-missing banner with install instructions. |
| web/src/pages/fine-tuning/CheckpointTable.tsx | New checkpoint table UI + deploy/rollback/delete actions. |
| web/src/components/layout/Sidebar.tsx | Adds Fine-Tuning nav entry in the sidebar. |
| web/src/api/types.ts | Extends WsEventType union with fine-tune event types. |
| web/src/api/endpoints/fine-tuning.ts | New API client module/types for fine-tuning endpoints. |
| web/CLAUDE.md | Updates web/ directory documentation to include fine-tuning domain. |
| tests/unit/persistence/sqlite/test_migrations.py | Updates expected SQLite tables/indexes for new fine-tune schema. |
| tests/unit/persistence/sqlite/test_fine_tune_repo.py | Adds unit tests for fine-tune run/checkpoint SQLite repositories. |
| tests/unit/memory/test_models.py | Adds tests for namespace defaults/validation + MemoryQuery.namespaces. |
| tests/unit/memory/embedding/test_fine_tune.py | Expands fine-tune stage tests (helpers, cancellation, metrics). |
| tests/unit/memory/embedding/test_fine_tune_orchestrator.py | Adds orchestrator lifecycle/cancel/recovery/status tests. |
| tests/unit/memory/embedding/test_fine_tune_models.py | Adds fine-tune model tests (request/status/config/metrics/preflight). |
| tests/unit/memory/backends/mem0/test_mappers.py | Adds namespace mapping/filtering tests for Mem0 backend helpers. |
| tests/unit/memory/backends/inmemory/test_adapter.py | Adds comprehensive tests for the new InMemoryBackend. |
| tests/unit/memory/backends/inmemory/__init__.py | Test package marker for inmemory backend tests. |
| tests/unit/memory/backends/composite/test_config.py | Adds tests for CompositeBackendConfig validation/defaults. |
| tests/unit/memory/backends/composite/test_adapter.py | Adds tests for CompositeBackend routing, fan-out, and prefixed IDs. |
| tests/unit/memory/backends/composite/__init__.py | Test package marker for composite backend tests. |
| tests/unit/api/auth/test_session_store.py | Stabilizes time-dependent session store tests via patched now(). |
| src/synthorg/persistence/sqlite/schema.sql | Adds fine_tune_runs + fine_tune_checkpoints tables and indexes. |
| src/synthorg/persistence/sqlite/fine_tune_repo.py | Adds SQLite repositories for fine-tune runs and checkpoints. |
| src/synthorg/observability/events/memory.py | Adds new fine-tune + composite structured logging event constants. |
| src/synthorg/memory/models.py | Adds namespace fields to request/entry + namespaces filter to queries. |
| src/synthorg/memory/factory.py | Adds backend factory wiring for inmemory and composite backends. |
| src/synthorg/memory/errors.py | Adds FineTuneDependencyError and FineTuneCancelledError. |
| src/synthorg/memory/embedding/fine_tune.py | Implements pipeline stages + dependency gating + evaluation helpers. |
| src/synthorg/memory/embedding/fine_tune_orchestrator.py | Adds background orchestrator with cancellation + WS emits. |
| src/synthorg/memory/embedding/fine_tune_models.py | Adds fine-tune request/status/run/checkpoint/config/metrics/preflight models. |
| src/synthorg/memory/embedding/cancellation.py | Adds cooperative CancellationToken implementation. |
| src/synthorg/memory/config.py | Extends memory config for composite/inmemory backend selection + validation. |
| src/synthorg/memory/backends/mem0/mappers.py | Adds namespace metadata + post-filter support for namespaces. |
| src/synthorg/memory/backends/inmemory/adapter.py | Adds a session-scoped dict-based MemoryBackend implementation. |
| src/synthorg/memory/backends/inmemory/__init__.py | Exports InMemoryBackend. |
| src/synthorg/memory/backends/composite/config.py | Adds CompositeBackendConfig model. |
| src/synthorg/memory/backends/composite/adapter.py | Adds CompositeBackend routing/fan-out + prefixed IDs. |
| src/synthorg/memory/backends/composite/__init__.py | Exports CompositeBackend and CompositeBackendConfig. |
| src/synthorg/memory/backends/__init__.py | Exports composite/inmemory backends from backends package. |
| src/synthorg/engine/workflow/execution_observer.py | Adds logger wiring (minor). |
| src/synthorg/engine/agent_engine.py | Adjusts imports to be lazy for procedural proposer/pipeline. |
| src/synthorg/api/ws_models.py | Adds fine-tune WS event types to the backend enum. |
| src/synthorg/api/state.py | Adds FineTuneOrchestrator to AppState wiring. |
| src/synthorg/api/controllers/memory.py | Implements fine-tune orchestration endpoints + preflight + checkpoint APIs. |
| CLAUDE.md | Updates repository-level docs to reflect new memory backends + fine-tune repos. |
```typescript
const wsHandler = useCallback(
  (event: { payload: Record<string, unknown> }) => {
    handleWsEvent(event.payload)
  },
```
WebSocket handlers registered via useWebSocketStore receive a full WsEvent (with event_type/channel/payload). This handler forwards only event.payload to the store, so fine-tuning events will be ignored because event_type is missing. Pass the entire event into the store (or change the store to accept WsEvent and read event.event_type + event.payload).
web/src/stores/fine-tuning.ts (outdated):
```typescript
handleWsEvent: (payload) => {
  const eventType = payload.event_type as string | undefined
  if (!eventType?.startsWith('memory.fine_tune.')) return

  const data = (payload.payload ?? payload) as Record<string, unknown>
  const currentStatus = get().status

  if (eventType === 'memory.fine_tune.progress') {
    set({
```
handleWsEvent currently expects a plain payload object and reads payload.event_type, but WebSocket dispatch passes a WsEvent object (event_type/channel/timestamp/payload). Align this store with other stores (e.g. tasks/meetings) by accepting WsEvent and extracting event.event_type and event.payload; otherwise fine-tuning WS updates will not apply.
```typescript
const handlePreflight = () => {
  const request: StartFineTuneRequest = { source_dir: sourceDir }
  void runPreflightCheck(request)
}

const handleStart = () => {
  const request: StartFineTuneRequest = { source_dir: sourceDir }
  void startRun(request)
}
```
The “Advanced Options” inputs are currently non-functional: their values are hard-coded and their onChange handlers are no-ops, and start/preflight requests only include source_dir. Either wire these fields into component state and include the overrides in StartFineTuneRequest, or remove/disable the UI until supported to avoid misleading users.
```python
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_mem / (1024**3)
    return PreflightCheck(
        name="gpu",
        status="pass",
        message=f"GPU available: {props.name}",
        detail=f"VRAM: {vram_gb:.1f} GB",
    )
```
GPU VRAM detection uses props.total_mem, but torch.cuda.get_device_properties() exposes total_memory. As written this will raise AttributeError on GPU hosts and incorrectly report GPU status. Use the correct attribute when computing vram_gb.
```python
import torch  # noqa: PLC0415

if not torch.cuda.is_available():
    return 16
props = torch.cuda.get_device_properties(0)
vram_gb = props.total_mem / (1024**3)
if vram_gb >= 40:  # noqa: PLR2004
    return 128
if vram_gb >= 16:  # noqa: PLR2004
    return 64
if vram_gb >= 8:  # noqa: PLR2004
    return 32
```
Batch size recommendation also uses props.total_mem (nonexistent) instead of torch’s total_memory, which will make recommendations fail on GPU hosts. Use total_memory (and keep the same GB thresholds).
```python
checks = list(_run_preflight_checks(data))
batch_size = _recommend_batch_size()
result = PreflightResult(
    checks=tuple(checks),
    recommended_batch_size=batch_size,
)
logger.info(
    MEMORY_FINE_TUNE_PREFLIGHT_COMPLETED,
    can_proceed=result.can_proceed,
    check_count=len(checks),
)
return ApiResponse(data=result)
```
Preflight currently only runs dependencies/GPU/doc-count checks. The PR description and issue requirements also call for disk space estimation (warn/fail with actionable detail). Consider adding a disk-space check (e.g. using shutil.disk_usage on output_dir/source_dir filesystem) and including it in the returned checks list.
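A sketch of such a check using `shutil.disk_usage`, returning a plain dict shaped like the PR's PreflightCheck fields (the `required_gb` estimate and the pass/warn/fail thresholds are assumptions):

```python
import shutil

def check_disk_space(path: str, required_gb: float) -> dict:
    """Best-effort disk-space preflight check (sketch; thresholds assumed)."""
    free_gb = shutil.disk_usage(path).free / (1024 ** 3)
    if free_gb >= required_gb * 2:
        status = "pass"   # comfortable headroom
    elif free_gb >= required_gb:
        status = "warn"   # enough, but tight
    else:
        status = "fail"   # actionable: free space or pick another output_dir
    return {
        "name": "disk_space",
        "status": status,
        "message": f"{free_gb:.1f} GB free",
        "detail": f"~{required_gb:.1f} GB estimated for checkpoints and data",
    }
```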
```python
def _check_gpu() -> PreflightCheck:
    """Best-effort GPU availability check."""
    try:
        import torch  # type: ignore[import-not-found]  # noqa: PLC0415

        if torch.cuda.is_available():
            props = torch.cuda.get_device_properties(0)
            vram_gb = props.total_mem / (1024**3)
            return PreflightCheck(
                name="gpu",
                status="pass",
                message=f"GPU available: {props.name}",
                detail=f"VRAM: {vram_gb:.1f} GB",
            )
        return PreflightCheck(
            name="gpu",
            status="warn",
            message="No GPU detected -- training will be slow",
            detail="CPU-only mode",
        )
    except ImportError:
```
_check_gpu() returns status="warn" when no GPU is detected, but the fine-tuning requirements state stages 2–3 require GPU and should fail early with a clear message. Consider making this a "fail" (or at least gate can_proceed) when torch is present but cuda.is_available() is false, to avoid starting runs that will predictably be impractical/unsupported.
```python
async def cancel(self) -> None:
    """Cancel the active pipeline run."""
    if self._cancellation is not None:
        self._cancellation.cancel()
        logger.info(MEMORY_FINE_TUNE_CANCELLED)

async def recover_interrupted(self) -> int:
    """Mark interrupted runs as FAILED on startup."""
    return await self._run_repo.mark_interrupted()
```
Startup recovery isn’t actually enforced yet: recover_interrupted() exists but there are no call sites in the codebase, so RUNNING stages may remain stuck after a restart. Wire this into API startup (after repositories are available) so active runs are marked FAILED with "interrupted by restart" as intended.
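A minimal sketch of that wiring, assuming a generic async startup hook (the app's actual startup mechanism is not shown in this review, so the function name here is hypothetical):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def recover_on_startup(orchestrator) -> int:
    """Call once at API startup, after repositories are wired: marks runs
    left in a RUNNING stage as FAILED ("interrupted by restart")."""
    recovered = await orchestrator.recover_interrupted()
    if recovered:
        logger.warning("marked %d interrupted fine-tune run(s) as FAILED", recovered)
    return recovered
```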
```python
split_idx = max(1, int(len(all_pairs) * (1 - validation_split)))
training = all_pairs[:split_idx]
validation = all_pairs[split_idx:]
```
generate_training_data() can produce an empty validation set (e.g. when len(all_pairs) is small, split_idx becomes len(all_pairs)), which will later make evaluate_checkpoint() fail with “Validation data is empty”. Consider clamping split_idx to ensure both splits are non-empty (e.g. 1 <= split_idx <= len(all_pairs)-1) or explicitly validate a minimum number of pairs up front with a clear error.
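One possible clamp, sketched as a standalone helper (names mirror the snippet above; this is not the PR's actual function):

```python
def split_pairs(all_pairs: list, validation_split: float = 0.1) -> tuple[list, list]:
    """Split pairs into training/validation, clamping the split index so
    both sides are non-empty; requires at least two pairs."""
    if len(all_pairs) < 2:
        raise ValueError("need at least 2 pairs for a train/validation split")
    split_idx = int(len(all_pairs) * (1 - validation_split))
    split_idx = min(max(split_idx, 1), len(all_pairs) - 1)  # keep both sides non-empty
    return all_pairs[:split_idx], all_pairs[split_idx:]
```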
```python
for i, query in enumerate(queries):
    if cancellation is not None and i % 50 == 0:
        cancellation.check()
    query_emb = await asyncio.to_thread(
        model.encode,
        [query],
        show_progress_bar=False,
    )
    sims = _cosine_similarities(query_emb[0], passage_embeddings)
```
mine_hard_negatives() encodes each query individually inside the loop (model.encode([query]) per item), which can be extremely slow for large datasets. Consider batching query encodes (encode all queries once) and then computing similarities, or at least encoding queries in chunks, to keep runtime feasible.
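Batching can also vectorize the similarity step itself; a NumPy sketch (the PR's `_cosine_similarities` helper is per-query, so this matrix variant is a hypothetical alternative, not its actual signature):

```python
import numpy as np

def cosine_similarity_matrix(query_embs: np.ndarray, passage_embs: np.ndarray) -> np.ndarray:
    """Normalize once, then a single matmul yields every query-passage
    cosine similarity: result[i, j] = cos(query i, passage j)."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    p = passage_embs / np.linalg.norm(passage_embs, axis=1, keepdims=True)
    return q @ p.T
```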
Actionable comments posted: 33
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CLAUDE.md`:
- Around line 101-103: The documentation contradicts itself about available
memory backends: update the sentence that currently states "mem0 is the
currently only backend" to reflect the current reality described under the
memory/ and backends/ listings—either state that multiple backends are supported
(e.g., mem0, Qdrant+SQLite, inmemory, composite routing) or explicitly mark mem0
as the default/legacy backend; edit the CLAUDE.md section that mentions "mem0 is
the 'currently only backend'" to reference the backends/ subsection and align
wording (e.g., "mem0 (default/legacy) and other backends such as Qdrant+SQLite,
inmemory and composite routing are supported") so the memory/ and backends/
descriptions are consistent.
In `@src/synthorg/api/controllers/memory.py`:
- Around line 160-164: The preflight response omits the disk-space gate: ensure
you include the disk-space check result in the checks list before constructing
PreflightResult so can_proceed reflects storage limits. Call the existing
disk-space check helper (e.g. _run_disk_space_check or _disk_space_check) and
append its result to checks (the list produced by _run_preflight_checks()) prior
to building the tuple passed to PreflightResult (the same change must be applied
in both construction sites around the PreflightResult usage at lines shown).
Verify the aggregated can_proceed logic consumes that disk check result.
- Around line 226-244: Rollback currently only deactivates checkpoints and never
reapplies the saved backup config; after fetching cp with
SQLiteFineTuneCheckpointRepository.get_checkpoint and verifying
cp.backup_config_json you must deserialize that JSON and apply it to the live
deployment/config store (e.g., via a new or existing repository method such as
SQLiteFineTuneCheckpointRepository.apply_backup_config or
activate_checkpoint_with_config), ensure checkpoint active flags are updated
(instead of only calling deactivate_all) and persist the change, then re-query
with get_checkpoint and return the updated record; if no helper exists add a
repository method that accepts checkpoint_id and backup_config_json to perform
the config restore and activation.
- Around line 78-86: The handlers dereference app_state.fine_tune_orchestrator
unconditionally (e.g., the block that sets orchestrator =
app_state.fine_tune_orchestrator and calls await orchestrator.start(data));
guard this optional field before use and return the same controlled API error
used by get_fine_tune_status when fine-tuning is unavailable. Update each
affected handler (the start/trigger block around orchestrator.start, and the
similar uses at the other two locations referenced) to check if
app_state.fine_tune_orchestrator is None and if so return/raise the standardized
API error response; otherwise proceed to assign to orchestrator and call its
methods.
- Around line 160-161: The preflight probes run heavy synchronous work inside an
async handler: move both _run_preflight_checks() and _recommend_batch_size() off
the event loop by running them in threads and coordinating them with structured
concurrency; in the async controller handler, use asyncio.TaskGroup to launch
asyncio.to_thread(_run_preflight_checks, data) and
asyncio.to_thread(_recommend_batch_size) in parallel, await the TaskGroup,
collect the results into checks and batch_size, and propagate any exceptions so
the handler remains non-blocking.
- Around line 421-428: Update both _check_gpu and _recommend_batch_size to
access props.total_memory instead of props.total_mem when reading VRAM; in
addition, change _recommend_batch_size's exception handling to mirror
_check_gpu's pattern by re-raising system exceptions (MemoryError,
RecursionError) and only swallowing other exceptions (returning None) so system
errors propagate while other failures produce a None recommendation. Ensure you
modify the references to props and the except block around the computation in
_recommend_batch_size accordingly.
In `@src/synthorg/api/state.py`:
- Around line 389-395: set_fine_tune_orchestrator currently overwrites an
existing orchestrator silently; change it to first check
self._fine_tune_orchestrator and if it's already set log a WARNING/ERROR with
context and raise an exception to prevent silent reconfiguration, otherwise set
it and emit an INFO-level lifecycle transition log indicating the old->new
wiring; update the method (set_fine_tune_orchestrator) to follow the same
logging/guard pattern used by the other set_* lifecycle methods.
In `@src/synthorg/memory/backends/composite/adapter.py`:
- Around line 176-181: Add a brief clarifying comment above the nested async
function _check (and/or where backend: MemoryBackend = b is defined) explaining
that using a default parameter captures the current value of b to avoid Python's
late-binding closure issue; reference the async helper _check, the default arg
backend: MemoryBackend = b, results.append(await backend.health_check()), and
tg.create_task(_check()) so future maintainers understand this pattern and why
the default parameter is required.
In `@src/synthorg/memory/backends/inmemory/adapter.py`:
- Around line 336-346: The clear() method is synchronous while other CRUD
methods (store, retrieve, get, delete, count) are async; change clear to an
async method (async def clear(self, agent_id: NotBlankStr) -> int) and call
await self._require_connected() at the start for consistency with other
operations, then pop the agent store (self._store.pop(str(agent_id), {})) and
return its length as before so behavior is unchanged but the API and connection
checks match the other methods.
- Around line 81-87: The disconnect method lacks the same _connect_lock
protection used by connect, creating a race between connect() and disconnect();
wrap the body of disconnect (the checks and setting of self._connected and
logger calls—MEMORY_BACKEND_DISCONNECTING and MEMORY_BACKEND_DISCONNECTED) in
the same async lock acquisition used by connect (i.e., acquire the instance's
_connect_lock via async context or explicit acquire/release), perform the
idempotent check and state change while holding the lock, and then release the
lock so connect() and disconnect() are symmetric and thread-safe.
In `@src/synthorg/memory/embedding/cancellation.py`:
- Around line 7-45: The module lacks structured logging: import get_logger from
synthorg.observability and create logger = get_logger(__name__), then add
logger.info in CancellationToken.cancel to record the state transition (include
context like token id or class name), add logger.debug or info in the
is_cancelled property access if desired, and add logger.warning or logger.error
in CancellationToken.check immediately before raising FineTuneCancelledError
(include the same context and the message "Fine-tuning pipeline run was
cancelled"); update references to CancellationToken.cancel, .is_cancelled, and
.check to ensure logs are emitted where state changes or errors occur.
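A sketch of the suggested logging, using assumed names (the PR's token lives in src/synthorg/memory/embedding/cancellation.py and uses its own logger and error types; the threading.Event here is an implementation assumption):

```python
import logging
import threading

logger = logging.getLogger(__name__)

class FineTuneCancelledError(RuntimeError):
    """Stand-in for the PR's error type."""

class CancellationToken:
    """Cooperative cancellation token with logging at each state change."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def cancel(self) -> None:
        logger.info("cancellation requested (token=%s)", id(self))
        self._event.set()

    @property
    def is_cancelled(self) -> bool:
        return self._event.is_set()

    def check(self) -> None:
        """Raise if cancellation was requested; call at loop checkpoints."""
        if self._event.is_set():
            logger.warning("fine-tuning pipeline run was cancelled")
            raise FineTuneCancelledError("Fine-tuning pipeline run was cancelled")
```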
In `@src/synthorg/memory/embedding/fine_tune_models.py`:
- Around line 245-294: The FineTuneRun model allows inconsistent terminal-state
combinations (e.g., stage == FineTuneStage.FAILED with error None, terminal
stages with completed_at None, or non-terminal stages having completed_at set);
add a model_validator(mode="after") on FineTuneRun that enforces these
invariants by checking the stage, error, completed_at, and stages_completed
fields and raising ValueError with clear messages when invariants are violated
(examples: when stage.is_terminal() then ensure completed_at is not None; when
stage == FineTuneStage.FAILED ensure error is not None; when not
stage.is_terminal() ensure completed_at is None; validate that stages_completed
only contains stages prior to current stage and that progress is None for
terminal stages or within 0..1 otherwise). Ensure the validator runs after
initialization and references FineTuneRun, stage, error, completed_at,
stages_completed, and progress to locate the code to change.
In `@src/synthorg/memory/embedding/fine_tune_orchestrator.py`:
- Around line 197-201: The cancel() method currently signals cancellation via
self._cancellation.cancel() but returns before the background task finishes;
update cancel() to await the task stored in self._cancellation (e.g., await
self._cancellation) so callers know the run has stopped, and handle/ignore
asyncio.CancelledError and optional timeout using asyncio.wait_for if needed,
then log MEMORY_FINE_TUNE_CANCELLED after the awaited task completes; reference
the existing cancel(), self._cancellation, and MEMORY_FINE_TUNE_CANCELLED
symbols when making this change.
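A minimal sketch of a cancel() that both signals and awaits the background task (placeholder internals; the real orchestrator's fields differ, and `_run` here simply waits on the cancel signal):

```python
import asyncio

class OrchestratorSketch:
    """Illustrates cancel() waiting for the background task to finish."""

    def __init__(self) -> None:
        self._task: asyncio.Task | None = None
        self._cancelled = asyncio.Event()

    async def _run(self) -> None:
        # Stand-in for the pipeline loop: exit when cancellation is signalled.
        await self._cancelled.wait()

    def start(self) -> None:
        self._task = asyncio.create_task(self._run())

    async def cancel(self, timeout: float = 5.0) -> None:
        self._cancelled.set()  # cooperative signal, as in the PR
        if self._task is not None:
            try:
                # Wait for the task so callers know the run has stopped.
                await asyncio.wait_for(self._task, timeout)
            except (asyncio.TimeoutError, asyncio.CancelledError):
                pass
```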
- Around line 457-483: The progress callback _cb mutates shared state
self._current_run from threads; either document the thread-safety assumption or
make updates thread-safe by scheduling the mutation on the event loop or using a
lock: inside _cb (returned by _make_progress_cb) avoid direct assignment from
worker threads and instead obtain the main asyncio loop and call
loop.call_soon_threadsafe(...) to perform the model_copy + self._current_run
assignment and _emit_ws, or protect the assignment with a threading.Lock; update
the code around references to self._current_run, run_id, run, and _emit_ws
accordingly and add a short comment describing the chosen approach.
In `@src/synthorg/memory/embedding/fine_tune.py`:
- Around line 224-226: The extractive-query fallback in the function that builds
the query uses sentences = chunk.split(".") and first = sentences[0].strip()
which can yield an empty string or an overly long query; update the logic to
trim chunk first, check if the trimmed chunk is empty and return a safe default
(e.g., "Find information about: [no content]") or a short placeholder, and if
not empty use the first sentence but cap its length (e.g., first_sentence[:100])
so the returned string from the function (the "Find information about: {first}"
return) is neither empty nor unbounded; reference variables sentences, first and
chunk when applying these checks.
In `@src/synthorg/persistence/sqlite/schema.sql`:
- Around line 418-426: The fine_tune_checkpoints table allows arbitrary integers
and multiple rows marked active; change the column definition for is_active to
enforce boolean values (INTEGER NOT NULL DEFAULT 0 CHECK(is_active IN (0,1)))
and add a schema-level uniqueness constraint for the single-active invariant by
creating a partial unique index that enforces at most one row with is_active=1
(e.g., CREATE UNIQUE INDEX ... ON fine_tune_checkpoints(is_active) WHERE
is_active=1); update the existing idx_ftc_active to the new CHECK and add the
partial unique index (referencing the table fine_tune_checkpoints, column
is_active and index name idx_ftc_active / new unique index name) so the DB
prevents invalid values and multiple active rows.
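A runnable sketch of those constraints via sqlite3 (the table is trimmed to the relevant column; the PR's real schema has more fields, and the index name here is an assumption):

```python
import sqlite3

# CHECK restricts is_active to 0/1; the partial unique index permits at
# most one row with is_active = 1, enforcing the single-active invariant.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fine_tune_checkpoints (
    id TEXT PRIMARY KEY,
    is_active INTEGER NOT NULL DEFAULT 0 CHECK (is_active IN (0, 1))
);
CREATE UNIQUE INDEX idx_ftc_single_active
    ON fine_tune_checkpoints(is_active) WHERE is_active = 1;
""")
conn.execute("INSERT INTO fine_tune_checkpoints VALUES ('a', 1)")
conn.execute("INSERT INTO fine_tune_checkpoints VALUES ('b', 0)")  # inactive rows unrestricted

def insert_fails(row_id: str, active: int) -> bool:
    """True if the insert violates a schema constraint."""
    try:
        conn.execute("INSERT INTO fine_tune_checkpoints VALUES (?, ?)", (row_id, active))
    except sqlite3.IntegrityError:
        return True
    return False
```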
In `@tests/unit/memory/backends/composite/test_adapter.py`:
- Around line 293-384: The suite lacks a degraded fan-out test: add a new async
test method (e.g., test_retrieve_degraded_fanout) in TestRetrieveFanout that
constructs a CompositeBackend with one child backend whose retrieve() raises and
one healthy child that returns entries, then call connected.retrieve (or
CompositeBackend.retrieve) with an appropriate MemoryQuery and assert the
composite returns the healthy child's entries (ignoring the failing child) and
preserves merge ordering across children; implement the failing child as a small
stub/mock backend whose retrieve raises an exception, register both children
into the CompositeBackend used by the test, and assert results contain only the
healthy contents and are ordered as expected.
In `@tests/unit/memory/backends/inmemory/test_adapter.py`:
- Around line 229-245: Add tests that verify TTL/expiration behavior by mocking
time.monotonic() and asyncio.sleep() around existing tests like
test_get_missing_returns_none and test_namespace_preserved: store an entry with
a short TTL using connected.store(..., ttl=...), assert it is returned by
connected.get(), connected.retrieve(), and counted by connected.count(), then
advance the mocked monotonic time past the TTL (or await a mocked asyncio.sleep)
and assert the entry is no longer returned by connected.get(),
connected.retrieve(), and that connected.count() decrements; reference the
InMemoryBackend instance named connected and its methods get, store, retrieve,
and count when adding these assertions.
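The mocked-clock technique suggested above can be sketched with a toy store. `TinyStore` is illustrative only; the real `InMemoryBackend` exposes `store`/`get`/`retrieve`/`count` with different signatures, but the lazy-expiry-on-read pattern driven by `time.monotonic()` is the same idea.

```python
import time
from unittest import mock

class TinyStore:
    """Toy TTL store: entries carry an absolute monotonic expiry time."""
    def __init__(self):
        self._items = {}  # key -> (value, expires_at or None)

    def store(self, key, value, ttl=None):
        expires = time.monotonic() + ttl if ttl is not None else None
        self._items[key] = (value, expires)

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and time.monotonic() >= expires:
            del self._items[key]  # lazy expiry on read
            return None
        return value

store = TinyStore()
# Three monotonic() calls happen: one in store(), one per get().
with mock.patch("time.monotonic", side_effect=[100.0, 100.5, 102.0]):
    store.store("k", "v", ttl=1.0)       # expires_at = 101.0
    assert store.get("k") == "v"         # t = 100.5, still live
    assert store.get("k") is None        # t = 102.0, past the TTL
print("TTL expiry observed without real sleeps")
```

Patching `time.monotonic` keeps the test instant and deterministic; no real `asyncio.sleep` is needed.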
In `@tests/unit/memory/embedding/test_fine_tune_orchestrator.py`:
- Around line 64-74: The helper _request currently creates real files under
tmp_path (src/doc.txt) but returns a FineTuneRequest with synthetic POSIX paths
(/test/src, /test/out), which is confusing; update _request to either stop
creating the filesystem artifacts if they are unused, or create the files at the
same paths used in the FineTuneRequest (or make the intent explicit) — modify
the function _request so it either (a) removes the tmp_path/src creation and
doc.txt write, or (b) set source_dir and output_dir to the real tmp_path-based
paths (e.g., str(src) and str(tmp_path/"out")), or add a clear docstring above
_request explaining that mocks bypass file access and the created files are
intentionally unused; pick one consistent approach and update tests that rely on
_request accordingly.
- Around line 101-102: The test uses fixed asyncio.sleep() calls to wait for
background work (in tests/unit/memory/embedding/test_fine_tune_orchestrator.py);
replace these timing-based waits by awaiting a deterministic completion signal
from the orchestrator: modify the orchestrator to expose a completion
Future/Event or its running Task (e.g., orchestrator._current_task or
orchestrator.completion_event) and change the test to await that signal (e.g.,
await orchestrator._current_task or await orchestrator.completion_event.wait())
so tests block deterministically instead of using asyncio.sleep(0.05/0.1/0.2);
alternatively, mock the async timing functions (asyncio.sleep/time.monotonic) in
the test if exposing a completion signal is not feasible.
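Awaiting the run's own task instead of sleeping can be sketched as follows. `MiniOrchestrator` and the `_current_task` attribute mirror the suggestion above but are hypothetical; the real `FineTuneOrchestrator` has a different surface.

```python
import asyncio

class MiniOrchestrator:
    """Sketch: keep a handle to the background task so tests can await it."""
    def __init__(self):
        self._current_task = None
        self.result = None

    async def _run_pipeline(self):
        await asyncio.sleep(0)  # stand-in for the real pipeline stages
        self.result = "complete"

    def start(self):
        self._current_task = asyncio.create_task(self._run_pipeline())

async def run_test():
    orch = MiniOrchestrator()
    orch.start()
    # Deterministic: resumes exactly when the background run finishes,
    # with no asyncio.sleep(0.05/0.1/0.2) guesswork.
    await orch._current_task
    return orch.result

print(asyncio.run(run_test()))  # → complete
```

An `asyncio.Event` set at the end of the pipeline works the same way when exposing the task itself is undesirable.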
- Around line 221-224: The test is coupling to CancellationToken internals by
calling token._event.wait; add a public blocking API on CancellationToken (e.g.,
wait_until_cancelled or wait_until_cancelled_async) and update the test to call
that instead of token._event.wait; implement the new method to delegate to the
internal threading.Event.wait (or await via asyncio.to_thread) and preserve
existing behavior (return/raise consistently with token.check()), and keep the
token.check() call in the test unchanged so the test still verifies cancellation
behavior using the public method.
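A minimal sketch of the suggested public API, assuming the token wraps a `threading.Event` as the comment describes (the real class lives in `src/synthorg/memory/embedding/cancellation.py` and may differ):

```python
import asyncio
import threading

class CancellationToken:
    """Sketch: cancellation flag with a public blocking wait."""
    def __init__(self):
        self._event = threading.Event()

    def cancel(self):
        self._event.set()

    def wait_until_cancelled(self, timeout=None):
        """Public blocking wait; delegates to the internal Event.
        Returns True if cancelled, False on timeout."""
        return self._event.wait(timeout)

    async def wait_until_cancelled_async(self):
        """Async variant: blocks a worker thread, not the event loop."""
        return await asyncio.to_thread(self._event.wait)

token = CancellationToken()
token.cancel()
assert token.wait_until_cancelled(timeout=0.1)  # no private _event access needed
print("cancellation observed via public API")
```

Tests can then call `token.wait_until_cancelled(...)` and keep `token.check()` for the raise-on-cancel path, without touching `token._event`.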
In `@tests/unit/memory/test_models.py`:
- Around line 118-133: Combine the two duplicate tests into a single
parameterized test: replace test_namespace_empty_rejected and
test_namespace_whitespace_rejected with a `@pytest.mark.parametrize` decorated
function (e.g. test_invalid_namespace_rejected) that iterates over tuples of
(namespace, expected_match) and asserts
MemoryStoreRequest(category=MemoryCategory.WORKING, namespace=namespace,
content="c") raises ValidationError with the expected_match. Reference
MemoryStoreRequest and MemoryCategory.WORKING in the test and keep the same
ValidationError assertions for the provided namespaces "" and " ".
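The parametrized shape could look like the sketch below. `MemoryStoreRequestStub` is a stand-in raising `ValueError` so the example stays self-contained; the real `MemoryStoreRequest` is a pydantic model raising `ValidationError`, and the test would use `pytest.raises(ValidationError, match=...)` instead.

```python
import pytest

class MemoryStoreRequestStub:
    """Stand-in validator: rejects empty or whitespace-only namespaces."""
    def __init__(self, category, namespace, content):
        if not namespace.strip():
            raise ValueError("namespace must not be empty or whitespace")
        self.category, self.namespace, self.content = category, namespace, content

@pytest.mark.parametrize(
    ("namespace", "expected_match"),
    [("", "must not be empty"), ("   ", "must not be empty")],
)
def test_invalid_namespace_rejected(namespace, expected_match):
    with pytest.raises(ValueError, match=expected_match):
        MemoryStoreRequestStub(category="working", namespace=namespace, content="c")

# The parametrized function also runs standalone, one call per case:
for ns in ("", "   "):
    test_invalid_namespace_rejected(ns, "must not be empty")
print("both invalid namespaces rejected")
```

One parametrized function replaces the two near-identical tests while keeping each case reported separately by pytest.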
In `@web/src/pages/fine-tuning/CheckpointTable.tsx`:
- Around line 39-114: The checkpoint row JSX inside the checkpoints.map is too
large and should be extracted into a new presentational component, e.g.,
CheckpointTableRow; create a new component that accepts a checkpoint prop plus
the actions/handlers (deployCheckpointAction, rollbackCheckpointAction,
setDeletingId) and any helpers used (formatBytes, MetricDelta, StatusBadge,
Button) and move the entire <tr> rendering logic into it, then replace the map
body with checkpoints.map(cp => <CheckpointTableRow key={cp.id} checkpoint={cp}
deploy={deployCheckpointAction} rollback={rollbackCheckpointAction}
setDeletingId={setDeletingId} />) so all formatting, status display and mutation
triggers remain unchanged but live in the new component.
In `@web/src/pages/fine-tuning/DependencyMissingBanner.tsx`:
- Around line 3-14: Replace the hand-rolled div in the DependencyMissingBanner
component with the shared UI primitives from web/src/components/ui/—at minimum
use SectionCard to wrap the content, and compose it with the appropriate
header/content subcomponents (e.g., CardHeader/CardTitle or the SectionCard
title prop and CardContent) and the existing code/mono text primitive from the
ui library instead of raw <code> markup; preserve the message text and
role="alert" semantics and ensure classes for spacing/typography are removed in
favor of the ui components so the banner follows the dashboard design system.
In `@web/src/pages/fine-tuning/PipelineControlPanel.tsx`:
- Around line 67-72: The advanced controls (rendered when showAdvanced is true)
are no-ops because InputField instances for "Epochs", "Learning Rate", and
"Batch Size" use fixed values and empty onChange handlers and the start action
(StartFineTuneRequest) only sends source_dir; fix by adding component state
(e.g., epochs, learningRate, batchSize initialized from
preflight?.recommended_batch_size) and pass them as value props, implement
onChange handlers to update those state variables, and include these fields in
the payload built by StartFineTuneRequest (or alternatively hide the entire
showAdvanced block until these fields are fully supported).
- Around line 14-26: Preflight validation is being reused across edits to
sourceDir so stale results can gate new requests; update the code to invalidate
the preflight state whenever the request changes by clearing the preflight
before running a new check or when sourceDir is modified: call the state setter
that clears the preflight result (e.g., setPreflight(undefined) or equivalent)
inside setSourceDir's updater and immediately before calling runPreflightCheck
in handlePreflight and before calling startRun in handleStart (and similarly in
the other handlers referenced on lines ~34-35 and ~49) so each new request
triggers a fresh validation.
In `@web/src/pages/fine-tuning/PipelineProgressBar.tsx`:
- Around line 9-10: The computed percentage pct in PipelineProgressBar uses
progress directly, which can produce values outside 0–100; clamp progress to the
valid range before converting (e.g., ensure progress is between 0 and 1, then
compute pct) so that aria-valuenow and the width style always receive values
within 0–100; update the logic around the pct calculation and any other usages
(e.g., the aria-valuenow attribute and the inline width style) to use the
clamped value derived from the progress prop or state.
- Around line 19-31: The PipelineProgressBar component is re-implementing a
progress primitive using raw divs (pct variable); replace that markup with the
shared UI Progress component from web/src/components/ui (import Progress and its
child ProgressIndicator or the appropriate API), pass pct as the
value/aria-valuenow equivalent and keep aria-label, min/max props, and the same
classes/transition via existing component props or wrapper className so styling
and behavior match the app-wide progress primitive.
In `@web/src/pages/fine-tuning/PipelineStepper.tsx`:
- Around line 24-59: The connector uses the current step's isComplete so the bar
into the active step stays muted; fix by computing connectorComplete = i <=
currentIdx (while keeping isComplete = i < currentIdx and isCurrent = s.key ===
stage) and use connectorComplete for the connector class; also extract the JSX
inside STAGES.map into a new StepItem (or StepRenderer) component that accepts
props (s, i, currentIdx, stage) and renders the connector, circle and label,
then replace the inline map body with <StepItem ... /> to keep the map simple.
In `@web/src/pages/fine-tuning/RunHistoryTable.tsx`:
- Around line 7-11: STAGE_STATUS_MAP in RunHistoryTable.tsx currently maps only
complete/failed/idle so in-progress stages fall back to 'idle'; update
STAGE_STATUS_MAP to include the in-progress stage keys generating_data,
mining_negatives, training, evaluating, and deploying and map them to 'active'
(so the component that uses STAGE_STATUS_MAP to render badges shows a non-idle
badge for in-progress stages).
- Around line 38-63: Extract the JSX inside runs.map into a new presentational
component (e.g., RunHistoryRow) that accepts a single prop "run" and reuses
existing helpers (formatDuration, formatStage), constants (STAGE_STATUS_MAP),
and components (StatusBadge); replace the inline JSX in RunHistoryTable's
runs.map with <RunHistoryRow key={run.id} run={run} /> and implement
RunHistoryRow to render the same <tr> structure, preserving the key, date
formatting (new Date(run.started_at).toLocaleString()), duration fallback,
stages_completed count, and run.config.source_dir so the table behavior remains
identical but the map body is simplified.
- Around line 70-75: The formatDuration function incorrectly uses Math.round for
minute calculations which can roll over to 60m; update formatDuration so seconds
(<60) still use Math.round, but for minute-only and hour+minute branches compute
minutes with Math.floor((seconds % 3600) / 60) or Math.floor(seconds / 60)
respectively (and compute hours with Math.floor(seconds / 3600)), ensuring no
60-minute value appears and no extra rollover logic is needed.
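The floor-vs-round arithmetic above is easy to get wrong; here it is mirrored in Python for clarity (the actual component is TypeScript, so this is a logic sketch, not the shipped code):

```python
import math

def format_duration(seconds: float) -> str:
    """Round only the seconds branch; floor minutes and hours so a value
    like 3570 s renders as 59m rather than rolling over to 60m."""
    if seconds < 60:
        return f"{round(seconds)}s"
    if seconds < 3600:
        return f"{math.floor(seconds / 60)}m"
    hours = math.floor(seconds / 3600)
    minutes = math.floor((seconds % 3600) / 60)
    return f"{hours}h {minutes}m"

print(format_duration(59.4))   # → 59s
print(format_duration(3570))   # → 59m  (Math.round would have shown 60m)
print(format_duration(7380))   # → 2h 3m
```

Because `floor` never reaches the next unit boundary, no extra rollover logic is needed.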
In `@web/src/stores/fine-tuning.ts`:
- Around line 107-114: runPreflightCheck never clears prior errors, so add
clearing of the error state when starting and on successful completion: in
runPreflightCheck (the async function that calls runPreflight) update the
initial set call to include error: null (e.g., set({ loading: true, preflight:
null, error: null })) and ensure the success branch also clears error (e.g.,
set({ preflight: result, loading: false, error: null })); leave the catch to set
the failure message as it does.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 71295937-4984-484b-907f-a1cd8cd06d5e
📒 Files selected for processing (54)
CLAUDE.md
src/synthorg/api/controllers/memory.py
src/synthorg/api/state.py
src/synthorg/api/ws_models.py
src/synthorg/engine/agent_engine.py
src/synthorg/engine/workflow/execution_observer.py
src/synthorg/memory/backends/__init__.py
src/synthorg/memory/backends/composite/__init__.py
src/synthorg/memory/backends/composite/adapter.py
src/synthorg/memory/backends/composite/config.py
src/synthorg/memory/backends/inmemory/__init__.py
src/synthorg/memory/backends/inmemory/adapter.py
src/synthorg/memory/backends/mem0/mappers.py
src/synthorg/memory/config.py
src/synthorg/memory/embedding/cancellation.py
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/memory/embedding/fine_tune_models.py
src/synthorg/memory/embedding/fine_tune_orchestrator.py
src/synthorg/memory/errors.py
src/synthorg/memory/factory.py
src/synthorg/memory/models.py
src/synthorg/observability/events/memory.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/persistence/sqlite/schema.sql
tests/unit/api/auth/test_session_store.py
tests/unit/memory/backends/composite/__init__.py
tests/unit/memory/backends/composite/test_adapter.py
tests/unit/memory/backends/composite/test_config.py
tests/unit/memory/backends/inmemory/__init__.py
tests/unit/memory/backends/inmemory/test_adapter.py
tests/unit/memory/backends/mem0/test_mappers.py
tests/unit/memory/embedding/test_fine_tune.py
tests/unit/memory/embedding/test_fine_tune_models.py
tests/unit/memory/embedding/test_fine_tune_orchestrator.py
tests/unit/memory/test_models.py
tests/unit/persistence/sqlite/test_fine_tune_repo.py
tests/unit/persistence/sqlite/test_migrations.py
web/CLAUDE.md
web/src/api/endpoints/fine-tuning.ts
web/src/api/types.ts
web/src/components/layout/Sidebar.tsx
web/src/pages/FineTuningPage.tsx
web/src/pages/fine-tuning/CheckpointTable.tsx
web/src/pages/fine-tuning/DependencyMissingBanner.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
web/src/pages/fine-tuning/PipelineProgressBar.tsx
web/src/pages/fine-tuning/PipelineStepper.tsx
web/src/pages/fine-tuning/PreflightResultPanel.tsx
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/workflow-editor/WorkflowNodeDrawer.tsx
web/src/pages/workflows/WorkflowCreateDrawer.tsx
web/src/router/index.tsx
web/src/router/routes.ts
web/src/stores/fine-tuning.ts
```tsx
<div
  className="h-2 w-full overflow-hidden rounded-full bg-muted"
  role="progressbar"
  aria-valuenow={pct}
  aria-valuemin={0}
  aria-valuemax={100}
  aria-label="Fine-tuning progress"
>
  <div
    className="h-full rounded-full bg-accent transition-all duration-300"
    style={{ width: `${pct}%` }}
  />
</div>
```
🛠️ Refactor suggestion | 🟠 Major
Use the shared UI progress component instead of custom progress markup.
This page-level component recreates a progress primitive; please reuse the existing UI component to keep behavior and styling consistent across pages.
As per coding guidelines, "ALWAYS reuse existing components from web/src/components/ui/ before creating new ones."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@web/src/pages/fine-tuning/PipelineProgressBar.tsx` around lines 19 - 31, The
PipelineProgressBar component is re-implementing a progress primitive using raw
divs (pct variable); replace that markup with the shared UI Progress component
from web/src/components/ui (import Progress and its child ProgressIndicator or
the appropriate API), pass pct as the value/aria-valuenow equivalent and keep
aria-label, min/max props, and the same classes/transition via existing
component props or wrapper className so styling and behavior match the app-wide
progress primitive.
Actionable comments posted: 10
♻️ Duplicate comments (7)
web/src/pages/fine-tuning/CheckpointTable.tsx (1)
45-120: 🛠️ Refactor suggestion | 🟠 Major

Extract the checkpoint row out of the `.map()` block.

The row is already handling metric formatting, status presentation, and three mutations inline.

As per coding guidelines: "Do not create complex (>8 line) JSX inside `.map()` blocks; extract to a shared component instead".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/src/pages/fine-tuning/CheckpointTable.tsx` around lines 45 - 120, The map callback in CheckpointTable.tsx contains a large chunk of JSX for each row — extract that JSX into a new presentational component (e.g., CheckpointRow) that accepts a checkpoint object (cp) and the action handlers (deployCheckpointAction, rollbackCheckpointAction, setDeletingId) as props; move all logic for date/bytes/metric formatting, StatusBadge/MetricDelta usage, and the conditional Buttons into that component, then replace the inline map body with <CheckpointRow cp={cp} ... /> to keep the .map() concise and reusable while preserving existing symbols like MetricDelta, StatusBadge, formatBytes, and the three action functions.

web/src/pages/fine-tuning/PipelineStepper.tsx (1)
24-60: 🛠️ Refactor suggestion | 🟠 Major

Pull the step item out of `STAGES.map()`.

Connector state, badge state, and label rendering are all mixed into one long inline block here.

As per coding guidelines: "Do not create complex (>8 line) JSX inside `.map()` blocks; extract to a shared component instead".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/src/pages/fine-tuning/PipelineStepper.tsx` around lines 24 - 60, Extract the JSX inside STAGES.map in PipelineStepper into a new presentational component (e.g., StepItem) to keep the map callback simple; move the logic that computes isComplete, isCurrent, connectorComplete and the rendering of the connector div, badge div (checkmark/number), and label span into that component, accept props like stage, index, currentIdx, and key, and replace the inline block in STAGES.map with a single StepItem call so the map callback is under 8 lines and the state handling stays the same.

web/src/pages/fine-tuning/RunHistoryTable.tsx (1)
45-70: 🛠️ Refactor suggestion | 🟠 Major

Extract the run row out of the `.map()` block.

This renderer is still carrying date, duration, status, and source formatting inline, which makes the table harder to change safely.

As per coding guidelines: "Do not create complex (>8 line) JSX inside `.map()` blocks; extract to a shared component instead".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/src/pages/fine-tuning/RunHistoryTable.tsx` around lines 45 - 70, Extract the JSX inside runs.map into a new presentational component (e.g., RunRow) that accepts a single prop like run: RunSummary (or the existing run type), moves all formatting/logic (new Date(run.started_at).toLocaleString(), formatDuration, STAGE_STATUS_MAP lookup, formatStage, run.config.source_dir, run.stages_completed.length) into that component, and returns the <tr> markup currently inline; update runs.map to just render <RunRow key={run.id} run={run} /> and ensure RunRow is typed/exported appropriately and imports StatusBadge, formatDuration, formatStage, and STAGE_STATUS_MAP so the visual output and classNames remain identical.

web/src/pages/fine-tuning/PipelineControlPanel.tsx (1)
24-33: ⚠️ Potential issue | 🟠 Major

Treat the advanced fields as optional overrides, not hidden defaults.

These values are always positive, so `buildRequest()` sends `epochs`, `learning_rate`, and `batch_size` even while Advanced is collapsed. `batchSize` also never updates from a later `preflight.recommended_batch_size`, and `preflight` is only cleared when `sourceDir` changes. That lets stale validation gate a different request and silently pins runs to `batch_size=128`.

Suggested fix
```diff
-  const [epochs, setEpochs] = useState('3')
-  const [learningRate, setLearningRate] = useState('1e-5')
-  const [batchSize, setBatchSize] = useState(
-    String(preflight?.recommended_batch_size ?? 128),
-  )
+  const [epochs, setEpochs] = useState('')
+  const [learningRate, setLearningRate] = useState('')
+  const [batchSize, setBatchSize] = useState('')

   useEffect(() => {
     useFineTuningStore.setState({ preflight: null })
-  }, [sourceDir])
+  }, [sourceDir, showAdvanced, epochs, learningRate, batchSize])
+
+  useEffect(() => {
+    if (preflight?.recommended_batch_size != null && batchSize === '') {
+      setBatchSize(String(preflight.recommended_batch_size))
+    }
+  }, [preflight?.recommended_batch_size, batchSize])

   const buildRequest = (): StartFineTuneRequest => {
     const request: StartFineTuneRequest = { source_dir: sourceDir }
     const parsedEpochs = Number(epochs)
-    if (!Number.isNaN(parsedEpochs) && parsedEpochs > 0) request.epochs = parsedEpochs
+    if (showAdvanced && !Number.isNaN(parsedEpochs) && parsedEpochs > 0) {
+      request.epochs = parsedEpochs
+    }
     const parsedLr = Number(learningRate)
-    if (!Number.isNaN(parsedLr) && parsedLr > 0) request.learning_rate = parsedLr
+    if (showAdvanced && !Number.isNaN(parsedLr) && parsedLr > 0) {
+      request.learning_rate = parsedLr
+    }
     const parsedBatch = Number(batchSize)
-    if (!Number.isNaN(parsedBatch) && parsedBatch > 0) request.batch_size = parsedBatch
+    if (showAdvanced && !Number.isNaN(parsedBatch) && parsedBatch > 0) {
+      request.batch_size = parsedBatch
+    }
     return request
   }
```

Also applies to: 37-54
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/src/pages/fine-tuning/PipelineControlPanel.tsx` around lines 24 - 33, The advanced fields (epochs, learningRate, batchSize) are being sent as hard defaults even when Advanced is collapsed and batchSize never picks up a later preflight value; change their handling so they act as optional overrides: stop initializing batchSize/epochs/learningRate to concrete positive defaults (e.g. '3'/'1e-5'/'128'); instead initialize them to undefined or empty string and when a preflight arrives set batchSize only if the user hasn't entered an override (useEffect watching preflight to call setBatchSize(preflight.recommended_batch_size) only when current batchSize is empty). Also modify buildRequest() to include epochs, learning_rate, and batch_size only if the user has explicitly set them or Advanced is open (check the Advanced toggle state) rather than always sending the fields. Reference: epochs, setEpochs, learningRate, setLearningRate, batchSize, setBatchSize, useFineTuningStore.preflight, buildRequest.tests/unit/memory/embedding/test_fine_tune_orchestrator.py (1)
64-74: 🧹 Nitpick | 🔵 TrivialTest helper creates filesystem artifacts but uses synthetic paths.
The
_request()helper creates real files intmp_pathbut returns aFineTuneRequestwith hardcoded POSIX paths (/test/src,/test/out). The comment explains this is intentional since mocks bypass filesystem access. Consider adding a brief docstring to make this clearer for future maintainers.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/memory/embedding/test_fine_tune_orchestrator.py` around lines 64 - 74, The test helper _request currently creates real files under tmp_path but returns a FineTuneRequest with synthetic POSIX paths ("/test/src", "/test/out") which can be confusing; update the _request function to include a concise docstring that explains it deliberately writes local artifacts for test diagnostics while returning synthetic POSIX paths because the FineTuneRequest is used with an orchestrator mock that does not access the filesystem, and mention FineTuneRequest and the tmp_path behavior so future maintainers understand the intentional mismatch.

src/synthorg/memory/embedding/fine_tune_orchestrator.py (1)
213-222: 🧹 Nitpick | 🔵 Trivial

`cancel()` signals cancellation but doesn't await task completion.

The `cancel()` method sets the cancellation flag and logs, but doesn't await the background task to terminate. Callers may expect the run to be fully stopped when `cancel()` returns. The current behavior is cooperative cancellation that returns immediately.

Consider either awaiting task completion with a timeout, or documenting this as intentional fire-and-forget cancellation in the docstring.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/memory/embedding/fine_tune_orchestrator.py` around lines 213 - 222, The cancel() method currently sets self._cancellation and logs MEMORY_FINE_TUNE_CANCELLED but does not wait for the background pipeline to finish; update cancel() (protected by self._op_lock) to await the background task's completion (e.g., wait on the task/future that is canceled or a stored asyncio.Task representing the run) with a reasonable timeout, handle asyncio.TimeoutError and cancellation exceptions by logging and returning, and ensure the method returns only after the run task has terminated (or document in the docstring if you intentionally want fire-and-forget behavior). Reference symbols: cancel(), self._op_lock, self._cancellation, the background task/future variable that represents the active run, logger, and MEMORY_FINE_TUNE_CANCELLED.

src/synthorg/api/controllers/memory.py (1)
388-396: ⚠️ Potential issue | 🟠 Major

Preflight validation is still missing the disk-space gate.

The PR objectives explicitly require disk-space validation as part of pre-flight checks: "Pre-flight validation endpoint: ... document-count and disk-space checks". However, `_run_preflight_checks()` only performs dependency, GPU, and document checks. This means `can_proceed` can return `true` on low-storage hosts.

🔧 Proposed fix to add disk-space check

```diff
 def _run_preflight_checks(
     request: FineTuneRequest,
 ) -> list[PreflightCheck]:
     """Run all pre-flight validation checks."""
     checks: list[PreflightCheck] = []
     checks.append(_check_dependencies())
     checks.append(_check_gpu())
     checks.append(_check_documents(request.source_dir))
+    checks.append(_check_disk_space(request.source_dir))
     return checks
+
+
+def _check_disk_space(source_dir: str) -> PreflightCheck:
+    """Check available disk space for fine-tuning artifacts."""
+    from pathlib import Path  # noqa: PLC0415
+    import shutil  # noqa: PLC0415
+
+    try:
+        src = Path(source_dir)
+        if not src.exists():
+            return PreflightCheck(
+                name="disk_space",
+                status="warn",
+                message="Cannot check disk space (source dir missing)",
+            )
+        usage = shutil.disk_usage(src)
+        free_gb = usage.free / (1024**3)
+        if free_gb < 5:  # noqa: PLR2004
+            return PreflightCheck(
+                name="disk_space",
+                status="fail",
+                message=f"Insufficient disk space ({free_gb:.1f} GB free)",
+                detail="At least 5 GB required for checkpoints and artifacts",
+            )
+        if free_gb < 10:  # noqa: PLR2004
+            return PreflightCheck(
+                name="disk_space",
+                status="warn",
+                message=f"Low disk space ({free_gb:.1f} GB free)",
+                detail="10+ GB recommended",
+            )
+        return PreflightCheck(
+            name="disk_space",
+            status="pass",
+            message=f"{free_gb:.1f} GB available",
+        )
+    except (MemoryError, RecursionError):
+        raise
+    except Exception as exc:
+        return PreflightCheck(
+            name="disk_space",
+            status="warn",
+            message=f"Disk space check failed: {type(exc).__name__}",
+            detail=str(exc),
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/api/controllers/memory.py` around lines 388 - 396, The preflight flow in _run_preflight_checks currently omits disk-space validation; add a call to the disk-space check (e.g. _check_disk_space(...)) into the checks list so disk usage is validated along with _check_dependencies, _check_gpu, and _check_documents; update _run_preflight_checks to append _check_disk_space(request.source_dir) (or appropriate path argument) so the returned checks include the disk-space PreflightCheck used by can_proceed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/design/page-structure.md`:
- Around line 207-208: The Fine-Tuning sub-page documentation for the memory
namespace (/settings/memory/fine-tuning) omits the checkpoint lifecycle; update
the page-structure text for the Fine-Tuning sub-page to include a concise IA
spec and route note that document checkpoint list, deploy, rollback, and delete
flows (including UI affordances, required API actions, user permissions, and
expected status transitions) alongside the existing pipeline status/history,
preflight, and start/cancel descriptions so operators can discover and manage
checkpoints from the same route.
In `@src/synthorg/api/controllers/memory.py`:
- Around line 248-258: Wrap the activation + settings update in a try/except so
activation is not left half-done: after calling repo.set_active(checkpoint_id)
attempt the settings updates via app_state.settings_service.set(...) inside a
try block; on any exception catch it, roll back the repo activation by restoring
the previous active checkpoint (use repo.get_checkpoint to read prior active id
before changing, or call a repo method to unset/deactivate the new checkpoint)
and re-raise or return a controlled error, and stop relying on
cp.backup_config_json to gate settings updates (use the incoming checkpoint's
model_path or an explicit flag instead) so settings updates reflect the current
deployment intent. Ensure you reference repo.set_active, repo.get_checkpoint,
and app_state.settings_service.set in the fix.
- Around line 284-291: The code assumes cp.backup_config_json is valid JSON and
that repo.get_checkpoint(checkpoint_id) returns a value; update the handler to
catch json.JSONDecodeError around json.loads(cp.backup_config_json) (log the
error with context and skip applying the corrupted backup or raise a controlled
ApiError) and add a null check after updated = await
repo.get_checkpoint(checkpoint_id) to handle a None result (return a 404
ApiError or appropriate ApiResponse error instead of returning None). Use the
existing symbols svc.set, repo.deactivate_all, repo.get_checkpoint,
cp.backup_config_json and ApiResponse/ApiError to locate and implement the fix.
In `@src/synthorg/memory/backends/composite/adapter.py`:
- Around line 382-423: The _fanout_retrieve path currently converts all child
exceptions into warnings and returns an empty results list even when every
backend failed; modify the logic in the async TaskGroup/_fetch closure to track
successful backend completions (e.g., increment a success_count or append
successful backend IDs when entries are retrieved) and after the TaskGroup
finishes, if success_count == 0 raise MemoryRetrievalError (include aggregated
errors for context) instead of returning [] ; keep existing logging for partial
failures (errors list and MEMORY_COMPOSITE_FANOUT_PARTIAL) but ensure callers
see an exception when no backends succeeded.
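The success-count bookkeeping described above can be sketched like this. The classes are stand-ins (`MemoryRetrievalError` here is a local stub for the real type in `src/synthorg/memory/errors.py`), and the real adapter runs children concurrently in a TaskGroup rather than in a loop; only the partial-vs-total failure policy is modeled.

```python
import asyncio

class MemoryRetrievalError(Exception):
    """Stand-in for the real retrieval error type."""

class DownBackend:
    async def retrieve(self, query):
        raise ConnectionError("unreachable")

async def fanout_retrieve(children, query):
    """Partial failure degrades; total failure raises instead of returning []."""
    results, errors, success_count = [], [], 0
    for backend_id, child in children.items():
        try:
            results.extend(await child.retrieve(query))
            success_count += 1
        except Exception as exc:  # the real adapter logs these per backend
            errors.append((backend_id, repr(exc)))
    if errors and success_count == 0:
        raise MemoryRetrievalError(f"all backends failed: {errors}")
    return results

try:
    asyncio.run(fanout_retrieve({"a": DownBackend(), "b": DownBackend()}, None))
except MemoryRetrievalError as exc:
    print(type(exc).__name__)  # → MemoryRetrievalError
```

Callers can then distinguish "nothing matched" (empty list) from "every backend is down" (exception carrying the aggregated errors).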
In `@src/synthorg/memory/backends/inmemory/adapter.py`:
- Around line 171-177: store() currently counts agent_store length before
removing expired entries so expired memories still consume quota; update store()
to prune expired items from self._store[str(agent_id)] before checking len and
enforcing self._max_memories_per_agent, reusing the same expiry logic used in
retrieve() (or extract that logic into a helper like _prune_expired(agent_id)
and call it from store(), retrieve(), and any direct lookup/count methods) so
counts and lookups consistently ignore expired entries and new writes are
allowed once expired items are pruned.
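The prune-before-quota ordering can be sketched with a toy per-agent store. `AgentStore` and `_prune_expired` mirror the names suggested above but are hypothetical; the real `InMemoryBackend` differs in structure.

```python
import time

class AgentStore:
    """Toy per-agent store: expired entries are dropped before the quota
    check, so dead entries never block new writes."""
    def __init__(self, max_memories_per_agent=2):
        self._store = {}  # agent_id -> {key: (value, expires_at or None)}
        self._max = max_memories_per_agent

    def _prune_expired(self, agent_id):
        now = time.monotonic()
        entries = self._store.setdefault(agent_id, {})
        expired = [k for k, (_, exp) in entries.items() if exp is not None and exp <= now]
        for key in expired:
            del entries[key]

    def store(self, agent_id, key, value, ttl=None):
        self._prune_expired(agent_id)  # quota counts live entries only
        entries = self._store[agent_id]
        if len(entries) >= self._max:
            raise RuntimeError("per-agent memory quota exceeded")
        expires = time.monotonic() + ttl if ttl is not None else None
        entries[key] = (value, expires)

store = AgentStore(max_memories_per_agent=1)
store.store("a1", "k1", "v1", ttl=0.0)  # expires immediately
store.store("a1", "k2", "v2")           # succeeds: k1 is pruned, not counted
print(sorted(store._store["a1"]))       # → ['k2']
```

Calling the same helper from `retrieve()` and `count()` keeps reads, counts, and quota enforcement consistent.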
In `@src/synthorg/memory/embedding/fine_tune.py`:
- Around line 155-157: Validate the validation_split input at the start of the
fine-tune routine (the function that accepts validation_split, ProgressCallback,
CancellationToken) and before any dataset slicing: reject values not strictly
between 0 and 1 and compute split indices such that both train and validation
get at least one example (e.g., enforce min_validation = 1 and min_train = 1,
adjust split to floor/max accordingly), throwing a clear
ValueError/ArgumentError if the dataset is too small or the split cannot satisfy
these constraints; apply the same checks where validation_split is used later
(the alternative slicing path referenced around the current use) so you never
attempt to slice into empty train/validation sets.
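The guard described above can be sketched as a small split helper (illustrative names; the real validation lives inside the fine-tune routine):

```python
import math

def split_dataset(examples, validation_split):
    """Reject out-of-range splits and guarantee at least one train and one
    validation example."""
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be strictly between 0 and 1")
    n = len(examples)
    if n < 2:
        raise ValueError("need at least 2 examples to split")
    # floor can yield 0; clamp to [1, n - 1] so neither side is empty
    n_val = min(max(1, math.floor(n * validation_split)), n - 1)
    return examples[: n - n_val], examples[n - n_val:]

train, val = split_dataset(list(range(10)), 0.05)  # floor gives 0 → bumped to 1
print(len(train), len(val))  # → 9 1
```

Applying the same clamp at every slicing site keeps tiny datasets from producing an empty train or validation set.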
- Around line 628-641: The function deploy_checkpoint currently logs config_path
but never applies it; ensure that when settings_service is None and a
config_path is provided you perform the file-based update (backup existing file,
modify the embedder config entry to point to checkpoint_path, and atomically
write the updated config), and when neither settings_service nor config_path can
be used, raise/return an error before logging success; locate deploy_checkpoint
and the references to config_path and settings_service and either implement the
backup+modify+write flow for the config file or explicitly reject that code path
so deployment cannot falsely report success.
In `@src/synthorg/persistence/sqlite/fine_tune_repo.py`:
- Around line 136-143: Keep the parenthesized multi-exception form. "except
(sqlite3.Error, aiosqlite.Error) as exc:" is the only valid Python 3 syntax
when an "as" binding is present; PEP 758 (Python 3.14) permits dropping the
parentheses only in handlers without "as", so rewriting these clauses to
"except sqlite3.Error, aiosqlite.Error as exc:" would be a SyntaxError. Verify
that the handler in the fine-tune save block that logs
MEMORY_FINE_TUNE_PERSIST_FAILED and raises QueryError (the except handling
around run.id and logger.exception) retains its parentheses, and apply the same
check to all other multi-exception excepts in this module (the other
occurrences noted in the review).
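The syntax constraint can be checked directly: the parenthesized form with `as` compiles everywhere, while the comma form with `as` is rejected by the parser.

```python
import textwrap

# Parentheses are required around multiple exception types when `as` is used.
try:
    raise KeyError("missing")
except (KeyError, ValueError) as exc:
    caught = type(exc).__name__
print(caught)  # → KeyError

# The unparenthesized variant with `as` does not compile:
src = textwrap.dedent(
    """
    try:
        pass
    except KeyError, ValueError as exc:
        pass
    """
)
try:
    compile(src, "<check>", "exec")
    valid = True
except SyntaxError:
    valid = False
print(valid)  # → False
```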
In `@web/src/pages/fine-tuning/PipelineProgressBar.tsx`:
- Around line 8-30: PipelineProgressBar currently coerces a null progress to 0%
which misrepresents an indeterminate state; update PipelineProgressBar (props:
stage, progress) to treat progress === null as indeterminate by: 1) not
computing/printing pct or the "{pct}%" label when progress is null, 2) omitting
aria-valuenow from the outer progressbar element in that case (keep
aria-valuemin/max), and 3) rendering an indeterminate visual (e.g., a full-width
animated/striped/gradient inner bar or a CSS class like "indeterminate") instead
of setting style.width; make the conditional logic around pct, aria-valuenow and
the inner bar appearance in the PipelineProgressBar function so assistive tech
and visuals reflect indeterminate states correctly.
In `@web/src/pages/FineTuningPage.tsx`:
- Around line 28-31: The component currently only registers a local handler via
useWebSocketStore's onChannelEvent/offChannelEvent, but doesn't request server
delivery for the 'system' channel so progress can stall; update the effect that
registers the handler (where useWebSocketStore, useShallow, onChannelEvent and
offChannelEvent are used) to also call the store's channel subscription API to
subscribe to 'system' when mounting and unsubscribe from 'system' in the cleanup
— i.e., grab the subscribe/unsubscribe functions from useWebSocketStore, call
subscribe('system') before registering the handler, and call offChannelEvent
followed by unsubscribe('system') in the effect cleanup.
---
Duplicate comments:
In `@src/synthorg/api/controllers/memory.py`:
- Around line 388-396: The preflight flow in _run_preflight_checks currently
omits disk-space validation; add a call to the disk-space check (e.g.
_check_disk_space(...)) into the checks list so disk usage is validated along
with _check_dependencies, _check_gpu, and _check_documents; update
_run_preflight_checks to append _check_disk_space(request.source_dir) (or
appropriate path argument) so the returned checks include the disk-space
PreflightCheck used by can_proceed.
In `@src/synthorg/memory/embedding/fine_tune_orchestrator.py`:
- Around line 213-222: The cancel() method currently sets self._cancellation and
logs MEMORY_FINE_TUNE_CANCELLED but does not wait for the background pipeline to
finish; update cancel() (protected by self._op_lock) to await the background
task's completion (e.g., wait on the task/future that is canceled or a stored
asyncio.Task representing the run) with a reasonable timeout, handle
asyncio.TimeoutError and cancellation exceptions by logging and returning, and
ensure the method returns only after the run task has terminated (or document in
the docstring if you intentionally want fire-and-forget behavior). Reference
symbols: cancel(), self._op_lock, self._cancellation, the background task/future
variable that represents the active run, logger, and MEMORY_FINE_TUNE_CANCELLED.
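The cooperative-cancel pattern this prompt asks for can be sketched as follows. This is a minimal sketch, not the orchestrator's actual implementation: the attribute names mirror those cited in the review, but the loop body, the 5-second timeout, and the use of `asyncio.Event` (the PR later switched to `threading.Event`) are all illustrative assumptions.

```python
import asyncio


class Orchestrator:
    """Toy orchestrator whose cancel() waits for the run task to terminate."""

    def __init__(self) -> None:
        self._op_lock = asyncio.Lock()
        self._cancellation = asyncio.Event()
        self._current_task: asyncio.Task | None = None

    async def _run(self) -> None:
        # Cooperative loop: exits as soon as cancellation is requested.
        while not self._cancellation.is_set():
            await asyncio.sleep(0.01)

    async def start(self) -> None:
        self._current_task = asyncio.create_task(self._run())

    async def cancel(self) -> bool:
        """Request cancellation and wait (bounded) for the run to stop."""
        async with self._op_lock:
            self._cancellation.set()
            if self._current_task is None:
                return True
            try:
                # shield() keeps wait_for's timeout from force-cancelling
                # the run task; the 5 s bound is an assumed policy.
                await asyncio.wait_for(
                    asyncio.shield(self._current_task), timeout=5.0
                )
            except (asyncio.TimeoutError, asyncio.CancelledError):
                return False  # caller knows the run did not confirm shutdown
            return True


async def main() -> bool:
    orch = Orchestrator()
    await orch.start()
    return await orch.cancel()


print(asyncio.run(main()))  # True once the run task has terminated
```

The key property is that `cancel()` returns only after the background task has actually finished (or the timeout fires), which is what the review's "fire-and-forget" concern is about.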
In `@tests/unit/memory/embedding/test_fine_tune_orchestrator.py`:
- Around line 64-74: The test helper _request currently creates real files under
tmp_path but returns a FineTuneRequest with synthetic POSIX paths ("/test/src",
"/test/out") which can be confusing; update the _request function to include a
concise docstring that explains it deliberately writes local artifacts for test
diagnostics while returning synthetic POSIX paths because the FineTuneRequest is
used with an orchestrator mock that does not access the filesystem, and mention
FineTuneRequest and the tmp_path behavior so future maintainers understand the
intentional mismatch.
In `@web/src/pages/fine-tuning/CheckpointTable.tsx`:
- Around line 45-120: The map callback in CheckpointTable.tsx contains a large
chunk of JSX for each row — extract that JSX into a new presentational component
(e.g., CheckpointRow) that accepts a checkpoint object (cp) and the action
handlers (deployCheckpointAction, rollbackCheckpointAction, setDeletingId) as
props; move all logic for date/bytes/metric formatting, StatusBadge/MetricDelta
usage, and the conditional Buttons into that component, then replace the inline
map body with <CheckpointRow cp={cp} ... /> to keep the .map() concise and
reuseable while preserving existing symbols like MetricDelta, StatusBadge,
formatBytes, and the three action functions.
In `@web/src/pages/fine-tuning/PipelineControlPanel.tsx`:
- Around line 24-33: The advanced fields (epochs, learningRate, batchSize) are
being sent as hard defaults even when Advanced is collapsed and batchSize never
picks up a later preflight value; change their handling so they act as optional
overrides: stop initializing batchSize/epochs/learningRate to concrete positive
defaults (e.g. '3'/'1e-5'/'128'); instead initialize them to undefined or empty
string and when a preflight arrives set batchSize only if the user hasn't
entered an override (useEffect watching preflight to call
setBatchSize(preflight.recommended_batch_size) only when current batchSize is
empty). Also modify buildRequest() to include epochs, learning_rate, and
batch_size only if the user has explicitly set them or Advanced is open (check
the Advanced toggle state) rather than always sending the fields. Reference:
epochs, setEpochs, learningRate, setLearningRate, batchSize, setBatchSize,
useFineTuningStore.preflight, buildRequest.
In `@web/src/pages/fine-tuning/PipelineStepper.tsx`:
- Around line 24-60: Extract the JSX inside STAGES.map in PipelineStepper into a
new presentational component (e.g., StepItem) to keep the map callback simple;
move the logic that computes isComplete, isCurrent, connectorComplete and the
rendering of the connector div, badge div (checkmark/number), and label span
into that component, accept props like stage, index, currentIdx, and key, and
replace the inline block in STAGES.map with a single StepItem call so the map
callback is under 8 lines and the state handling stays the same.
In `@web/src/pages/fine-tuning/RunHistoryTable.tsx`:
- Around line 45-70: Extract the JSX inside runs.map into a new presentational
component (e.g., RunRow) that accepts a single prop like run: RunSummary (or the
existing run type), moves all formatting/logic (new
Date(run.started_at).toLocaleString(), formatDuration, STAGE_STATUS_MAP lookup,
formatStage, run.config.source_dir, run.stages_completed.length) into that
component, and returns the <tr> markup currently inline; update runs.map to just
render <RunRow key={run.id} run={run} /> and ensure RunRow is typed/exported
appropriately and imports StatusBadge, formatDuration, formatStage, and
STAGE_STATUS_MAP so the visual output and classNames remain identical.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2672fa48-c842-43d4-8bf2-1f99d1e911fe
📒 Files selected for processing (25)
- CLAUDE.md
- docs/design/memory.md
- docs/design/page-structure.md
- src/synthorg/api/controllers/memory.py
- src/synthorg/api/state.py
- src/synthorg/api/ws_models.py
- src/synthorg/memory/backends/composite/adapter.py
- src/synthorg/memory/backends/inmemory/adapter.py
- src/synthorg/memory/embedding/cancellation.py
- src/synthorg/memory/embedding/fine_tune.py
- src/synthorg/memory/embedding/fine_tune_models.py
- src/synthorg/memory/embedding/fine_tune_orchestrator.py
- src/synthorg/persistence/sqlite/fine_tune_repo.py
- src/synthorg/persistence/sqlite/schema.sql
- tests/unit/memory/embedding/test_fine_tune_models.py
- tests/unit/memory/embedding/test_fine_tune_orchestrator.py
- tests/unit/persistence/sqlite/test_fine_tune_repo.py
- tests/unit/persistence/sqlite/test_migrations.py
- web/src/pages/FineTuningPage.tsx
- web/src/pages/fine-tuning/CheckpointTable.tsx
- web/src/pages/fine-tuning/PipelineControlPanel.tsx
- web/src/pages/fine-tuning/PipelineProgressBar.tsx
- web/src/pages/fine-tuning/PipelineStepper.tsx
- web/src/pages/fine-tuning/RunHistoryTable.tsx
- web/src/stores/fine-tuning.ts
```python
except (sqlite3.Error, aiosqlite.Error) as exc:
    msg = f"Failed to save fine-tune run {run.id}"
    logger.exception(
        MEMORY_FINE_TUNE_PERSIST_FAILED,
        run_id=run.id,
        error=str(exc),
    )
    raise QueryError(msg) from exc
```
Parentheses are still required here: PEP 758 does not apply to handlers with an `as` binding.
Python 3.14 (PEP 758) accepts the comma-separated form `except A, B:` only when the handler does not bind the exception. Because this handler uses `as exc`, the parenthesized form `except (sqlite3.Error, aiosqlite.Error) as exc:` is the only valid syntax; the unparenthesized variant would be a SyntaxError. The same applies to the handlers at lines 153, 176, 213, 244, 279, 340, 360, 398, 434, 452, 490, 508, so no change is needed anywhere in the file.
The coding guideline "Use except A, B: syntax (no parentheses) for exception handling" covers only handlers without an `as` clause.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/persistence/sqlite/fine_tune_repo.py` around lines 136 - 143,
Verify the except syntax before changing anything: PEP 758's parenthesis-free
form "except A, B:" is legal only when there is no "as" binding. Every
multi-exception handler in this module (including the fine-tune save block that
logs MEMORY_FINE_TUNE_PERSIST_FAILED and raises QueryError around run.id and
logger.exception) binds the exception with "as exc", so the existing
parenthesized form "except (sqlite3.Error, aiosqlite.Error) as exc:" is already
correct and must keep its parentheses.
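As a quick sanity check on the PEP 758 scope (assuming CPython semantics, where PEP 758 targets 3.14): the parenthesis-free form applies only to handlers without an `as` binding, so a handler that binds `exc` keeps its parentheses. The snippet below uses `ValueError` in place of `aiosqlite.Error` so it runs without third-party dependencies.

```python
import sqlite3


def save(run_id: str) -> str:
    """Demonstrate the parenthesized multi-exception handler with `as`."""
    try:
        raise sqlite3.Error("disk I/O error")
    except (sqlite3.Error, ValueError) as exc:  # parentheses required with `as`
        return f"Failed to save fine-tune run {run_id}: {exc}"


print(save("run-1"))  # Failed to save fine-tune run run-1: disk I/O error
```

Attempting `except sqlite3.Error, ValueError as exc:` instead is a SyntaxError on every Python version, including 3.14.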
Actionable comments posted: 7
♻️ Duplicate comments (3)
web/src/pages/fine-tuning/PipelineProgressBar.tsx (1)
23-38: 🛠️ Refactor suggestion | 🟠 Major

Use the shared UI progress component instead of rebuilding it here.

This page-local component is still hand-rolling determinate and indeterminate progress states with raw `div`s. Please reuse the existing progress primitive from `web/src/components/ui/` so behavior, styling, and accessibility stay consistent across the dashboard. As per coding guidelines, "ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/src/pages/fine-tuning/PipelineProgressBar.tsx` around lines 23 - 38: Replace the hand-rolled progress markup inside the PipelineProgressBar component with the shared UI progress primitive: import the shared Progress component and use it instead of the raw divs, passing pct as the determinate value (e.g., value={pct}) and indeterminate as the prop to toggle the indeterminate state, and preserve aria-label ("Fine-tuning progress") and aria min/max when determinate; remove the duplicated determinate/indeterminate div branches and any inline style-based width logic so the shared component handles styling, animation and accessibility.

tests/unit/memory/embedding/test_fine_tune_orchestrator.py (1)
64-81: ⚠️ Potential issue | 🟠 Major

Use the real `tmp_path` locations in `_request()` instead of `/test/...`.

`_mock_all_stages()` writes files under `output_dir`, so these placeholder root paths make the tests depend on write access to the filesystem root and still leave `source_dir` out of sync with the files created above. Point the request at the temp directories you create here, or make the stage mocks fully in-memory.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/memory/embedding/test_fine_tune_orchestrator.py` around lines 64 - 81: The test helper _request() currently returns synthetic "/test/..." paths which makes tests write to the filesystem root and leaves source_dir out of sync with the files you created; change it to point at the real tmp_path locations you create (e.g. use the src Path you create and an output dir under tmp_path) and return them as POSIX strings so FineTuneRequest still rejects Windows drive letters; ensure you also create the output directory before returning so _mock_all_stages() can write files there (references: _request, FineTuneRequest, _mock_all_stages).

src/synthorg/api/controllers/memory.py (1)
252-262: ⚠️ Potential issue | 🔴 Critical

Deploy and rollback can still leave runtime settings out of sync with checkpoint state.

These flows persist settings and checkpoint flags as separate commits. If a later `svc.set()` or `repo.deactivate_all()` fails after an earlier step succeeded, the API returns an error while one side has already changed. Move this cross-store transition behind a single service method, or capture and restore the prior settings values in the failure path as well.

Also applies to: 313-317

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/api/controllers/memory.py` around lines 252 - 262: The activation flow updates settings and checkpoint state in separate commits which can leave them inconsistent on partial failure; modify the logic around app_state.settings_service (svc) and repo operations (repo.set_active, repo.deactivate_all) so the cross-store transition is atomic: either (A) add a new service method on settings_service that performs both the settings writes (e.g., embedder_model/embedder_provider) and the checkpoint activation in one operation, or (B) capture the prior settings values before calling svc.set (read current "memory.embedder_model" and "memory.embedder_provider"), perform the two svc.set calls, then call repo.set_active/prior handling, and on any exception restore the captured prior settings values and the prior checkpoint state (using repo.set_active(prior.id) or repo.deactivate_all()); ensure the rollback path updates both the settings and repo consistently and reference cp.model_path, svc.set, prior, repo.set_active, and repo.deactivate_all in your changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/api/controllers/memory.py`:
- Around line 425-428: The preflight is calling _check_disk_space with
request.source_dir but the pipeline writes to request.output_dir; change the
call in the checks list to use request.output_dir (i.e., replace
_check_disk_space(request.source_dir) with
_check_disk_space(request.output_dir)) so disk-space is validated against the
actual write destination (or optionally call _check_disk_space for both
request.source_dir and request.output_dir if you want to validate both mounts).
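A minimal sketch of such a disk-space check, assuming a simplified `PreflightCheck` shape and an illustrative 5 GiB default threshold (neither is taken from the repo's actual models):

```python
import shutil
from dataclasses import dataclass


@dataclass
class PreflightCheck:
    """Simplified stand-in for the repo's PreflightCheck model."""
    name: str
    passed: bool
    detail: str


def check_disk_space(path: str, required_bytes: int = 5 * 1024**3) -> PreflightCheck:
    """Validate free space on the mount that actually backs *path*."""
    usage = shutil.disk_usage(path)  # checks the real write destination
    return PreflightCheck(
        name="disk_space",
        passed=usage.free >= required_bytes,
        detail=f"{usage.free / 1024**3:.1f} GiB free on {path}",
    )


check = check_disk_space(".")
print(check.name, check.passed)
```

The point the review makes is which path is handed in: this check must be called with `request.output_dir` (the write destination), not `request.source_dir`, or it can validate the wrong mount entirely.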
In `@src/synthorg/memory/backends/composite/adapter.py`:
- Around line 357-368: The method _resolve_retrieve_targets currently treats
only query.namespaces is None as "no filter" and returns all backends, while an
empty frozenset returns no targets; update the logic in
_resolve_retrieve_targets to treat falsy namespaces (e.g., empty frozenset) the
same as None by checking "if not query.namespaces" so empty sets return
list(self._unique_backends), leaving the rest of the resolve loop (seen dict +
self._resolve(ns)) unchanged; reference _resolve_retrieve_targets,
MemoryQuery.namespaces, self._unique_backends, and self._resolve when making
this change.
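The falsy check the prompt describes can be illustrated with a toy resolver (the backend names and the `routes` mapping below are hypothetical, not the adapter's real configuration):

```python
from __future__ import annotations


def resolve_targets(
    namespaces: frozenset[str] | None,
    routes: dict[str, str],
    all_backends: list[str],
) -> list[str]:
    """Treat None and an empty frozenset identically: no namespace filter."""
    if not namespaces:  # covers both None and frozenset()
        return list(all_backends)
    seen: dict[str, None] = {}  # insertion-ordered de-duplication
    for ns in namespaces:
        seen.setdefault(routes.get(ns, "default"), None)
    return list(seen)


routes = {"facts": "mem0", "scratch": "inmemory"}
backends = ["mem0", "inmemory"]
print(resolve_targets(None, routes, backends))                  # all backends
print(resolve_targets(frozenset(), routes, backends))           # all backends
print(resolve_targets(frozenset({"facts"}), routes, backends))  # ['mem0']
```

With `if query.namespaces is None` alone, an empty frozenset would silently return zero targets, which is the inconsistency the review flags.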
- Around line 370-426: The except clause inside _fanout_retrieve currently uses
"except MemoryError, RecursionError:", where MemoryError binds to
builtins.MemoryError instead of the synthorg custom MemoryError; update the
imports to include MemoryError from synthorg.memory.errors, and prefer the
parenthesized grouping (e.g., except (MemoryError, RecursionError):), which
also stays valid on pre-3.14 interpreters, so synthorg's
MemoryConnectionError/MemoryRetrievalError are re-raised instead of being
swallowed; reference the import block that imports from synthorg.memory.errors
and the _fanout_retrieve function's except clause to make the fixes.
In `@src/synthorg/memory/backends/inmemory/adapter.py`:
- Around line 244-272: The get() method currently returns entries even if
expired, causing inconsistency with retrieve(); update get(self, agent_id,
memory_id) to check the entry with the existing _is_expired(entry) helper and
return None when expired (and optionally log an expiration event similar to
MEMORY_ENTRY_FETCHED); ensure you reference the same keys used in _store lookup
and keep the existing logger usage (MEMORY_ENTRY_FETCHED) but add a debug or
trace log for expired entries so semantics match retrieve().
- Around line 304-335: count() currently tallies all entries from self._store
regardless of expiry, causing mismatch with retrieve() which filters out expired
items; update count() (in src/synthorg/memory/backends/inmemory/adapter.py) to
apply the same expiration check used by retrieve() when iterating
agent_store.values() (i.e., skip entries where the entry is expired based on its
expires_at/expiration logic or by calling the same helper used by retrieve()),
preserve the category filter logic, and log MEMORY_ENTRY_COUNTED with the
filtered total so counts and retrievals are consistent.
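One way to keep `get()`, `count()`, and `retrieve()` consistent is a single shared expiry predicate. The entry shape and injected clock below are illustrative assumptions, not the adapter's actual types:

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass
class Entry:
    content: str
    expires_at: float | None = None  # epoch seconds; None means never expires


def is_expired(entry: Entry, now: float | None = None) -> bool:
    """Single expiry predicate shared by every read path."""
    if entry.expires_at is None:
        return False
    return (now if now is not None else time.time()) >= entry.expires_at


def get(store: dict[str, Entry], memory_id: str, now: float) -> Entry | None:
    entry = store.get(memory_id)
    if entry is None or is_expired(entry, now):
        return None  # expired entries behave exactly like missing ones
    return entry


def count(store: dict[str, Entry], now: float) -> int:
    return sum(1 for e in store.values() if not is_expired(e, now))


store = {"a": Entry("live"), "b": Entry("stale", expires_at=100.0)}
print(get(store, "b", now=200.0))  # None
print(count(store, now=200.0))     # 1
```

Routing every read path through the same predicate is what prevents the `get()`/`count()` vs `retrieve()` drift the two findings above describe.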
In `@src/synthorg/memory/embedding/fine_tune.py`:
- Around line 706-712: The call to SettingsService.set is passing the timestamp
as a positional 4th argument which violates the keyword-only parameter
expected_updated_at; update the call in fine_tune.py to call
settings_service.set("memory", "embedder_model", checkpoint_path,
expected_updated_at=now) so expected_updated_at is passed by name (refer to
settings_service.set and the local variables checkpoint_path and now) and remove
the positional argument usage.
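The distinction matters because keyword-only parameters reject positional use. The class below is a simplified stand-in for the real `SettingsService`, assumed only for illustration:

```python
from __future__ import annotations


class SettingsService:
    """Toy stand-in: the real service's set() is assumed to have a
    keyword-only expected_updated_at for optimistic concurrency."""

    def set(
        self,
        section: str,
        key: str,
        value: str,
        *,
        expected_updated_at: str | None = None,
    ) -> str:
        return f"{section}.{key}={value} (cas={expected_updated_at})"


svc = SettingsService()
try:
    # Passing the timestamp positionally, as the reviewed code did:
    svc.set("memory", "embedder_model", "/ckpt", "2024-01-01")
except TypeError:
    print("positional expected_updated_at rejected")

# The keyword form the prompt asks for:
print(svc.set("memory", "embedder_model", "/ckpt", expected_updated_at="2024-01-01"))
```

The `*` in the signature is what turns the positional call into a `TypeError` instead of a silent misassignment.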
In `@tests/unit/memory/embedding/test_fine_tune_orchestrator.py`:
- Around line 128-130: The test currently suppresses all exceptions from
awaiting orchestrator._current_task using contextlib.suppress(Exception); change
this to either explicitly assert that the background task was cancelled (await
and expect asyncio.CancelledError) or catch only the concrete cancellation
exception type instead of Exception, e.g., handle asyncio.CancelledError around
awaiting orchestrator._current_task (and similarly for the second occurrence at
lines 150-152) so unrelated errors are not silently swallowed and the test fails
for unexpected failures.
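The tighter pattern the prompt asks for, shown with plain asyncio so the snippet is self-contained (in a real pytest test you would typically wrap the await in `pytest.raises(asyncio.CancelledError)` instead of the try/except):

```python
import asyncio


async def demo() -> str:
    """Cancel a background task and assert on the concrete outcome."""
    task = asyncio.create_task(asyncio.sleep(10))
    await asyncio.sleep(0)  # let the task start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"
    return "not cancelled"  # an unrelated failure would propagate instead


print(asyncio.run(demo()))  # cancelled
```

Unlike `contextlib.suppress(Exception)`, any non-cancellation error here escapes and fails the test, which is exactly the behavior the review wants.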
---
Duplicate comments:
In `@src/synthorg/api/controllers/memory.py`:
- Around line 252-262: The activation flow updates settings and checkpoint state
in separate commits which can leave them inconsistent on partial failure; modify
the logic around app_state.settings_service (svc) and repo operations
(repo.set_active, repo.deactivate_all) so the cross-store transition is atomic:
either (A) add a new service method on settings_service that performs both the
settings writes (e.g., embedder_model/embedder_provider) and the checkpoint
activation in one operation, or (B) capture the prior settings values before
calling svc.set (read current "memory.embedder_model" and
"memory.embedder_provider"), perform the two svc.set calls, then call
repo.set_active/prior handling, and on any exception restore the captured prior
settings values and the prior checkpoint state (using repo.set_active(prior.id)
or repo.deactivate_all()); ensure the rollback path updates both the settings
and repo consistently and reference cp.model_path, svc.set, prior,
repo.set_active, and repo.deactivate_all in your changes.
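Option (B) from this prompt, capture-then-restore, can be sketched with toy stand-ins for the settings service and checkpoint repo (all names here are illustrative, and the repo failure is simulated to exercise the rollback path):

```python
class Svc:
    """Toy settings service with a flat key/value store."""

    def __init__(self) -> None:
        self.values = {
            "memory.embedder_model": "base",
            "memory.embedder_provider": "hf",
        }

    def get(self, key: str) -> str:
        return self.values[key]

    def set(self, key: str, value: str) -> None:
        self.values[key] = value


class Repo:
    """Toy checkpoint repo whose activation always fails."""

    def set_active(self, checkpoint_id: str) -> None:
        raise RuntimeError("db unavailable")  # simulated second-commit failure


def deploy(svc: Svc, repo: Repo, model_path: str, checkpoint_id: str) -> bool:
    # 1. Snapshot prior settings before touching anything.
    keys = ("memory.embedder_model", "memory.embedder_provider")
    prior = {k: svc.get(k) for k in keys}
    # 2. Apply the settings writes.
    svc.set("memory.embedder_model", model_path)
    svc.set("memory.embedder_provider", "local")
    # 3. Activate the checkpoint; restore the snapshot if this fails.
    try:
        repo.set_active(checkpoint_id)
    except Exception:
        for key, value in prior.items():
            svc.set(key, value)  # settings and repo stay consistent
        return False
    return True


svc = Svc()
print(deploy(svc, Repo(), "/ckpt/best", "cp-1"))  # False
print(svc.get("memory.embedder_model"))           # base (restored)
```

A dedicated service method wrapping both stores (option A) is still the cleaner design; the restore path above is the fallback when that refactor is out of scope.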
In `@tests/unit/memory/embedding/test_fine_tune_orchestrator.py`:
- Around line 64-81: The test helper _request() currently returns synthetic
"/test/..." paths which makes tests write to the filesystem root and leaves
source_dir out of sync with the files you created; change it to point at the
real tmp_path locations you create (e.g. use the src Path you create and an
output dir under tmp_path) and return them as POSIX strings so FineTuneRequest
still rejects Windows drive letters; ensure you also create the output directory
before returning so _mock_all_stages() can write files there (references:
_request, FineTuneRequest, _mock_all_stages).
In `@web/src/pages/fine-tuning/PipelineProgressBar.tsx`:
- Around line 23-38: Replace the hand-rolled progress markup inside the
PipelineProgressBar component with the shared UI progress primitive: import the
shared Progress component and use it instead of the raw divs, passing pct as the
determinate value (e.g., value={pct}) and indeterminate as the prop to toggle
the indeterminate state, and preserve aria-label ("Fine-tuning progress") and
aria min/max when determinate; remove the duplicated determinate/indeterminate
div branches and any inline style-based width logic so the shared component
handles styling, animation and accessibility.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5a0627c0-b5bd-4429-b28a-b3760275e3f8
📒 Files selected for processing (14)
- docs/design/page-structure.md
- src/synthorg/api/controllers/memory.py
- src/synthorg/memory/backends/composite/adapter.py
- src/synthorg/memory/backends/inmemory/adapter.py
- src/synthorg/memory/embedding/fine_tune.py
- src/synthorg/memory/embedding/fine_tune_orchestrator.py
- tests/unit/memory/embedding/test_fine_tune.py
- tests/unit/memory/embedding/test_fine_tune_orchestrator.py
- web/src/pages/FineTuningPage.tsx
- web/src/pages/fine-tuning/CheckpointTable.tsx
- web/src/pages/fine-tuning/PipelineControlPanel.tsx
- web/src/pages/fine-tuning/PipelineProgressBar.tsx
- web/src/pages/fine-tuning/PipelineStepper.tsx
- web/src/pages/fine-tuning/RunHistoryTable.tsx
```python
async def _fanout_retrieve(
    self,
    agent_id: NotBlankStr,
    query: MemoryQuery,
    targets: list[MemoryBackend],
) -> list[MemoryEntry]:
    """Fan out retrieve to *targets* with graceful degradation."""
    if len(targets) == 1:
        entries = await targets[0].retrieve(agent_id, query)
        return [self._rewrite_entry(targets[0], e) for e in entries]
    results: list[MemoryEntry] = []
    errors: list[str] = []
    async with asyncio.TaskGroup() as tg:
        for b in targets:

            async def _fetch(
                backend: MemoryBackend = b,
            ) -> None:
                try:
                    entries = await backend.retrieve(
                        agent_id,
                        query,
                    )
                    results.extend(self._rewrite_entry(backend, e) for e in entries)
                except MemoryError, RecursionError:
                    raise
                except Exception as exc:
                    name = self._backend_to_name.get(
                        id(backend),
                        "?",
                    )
                    errors.append(f"{name}: {exc}")
                    logger.warning(
                        MEMORY_COMPOSITE_FANOUT_PARTIAL,
                        operation="retrieve",
                        backend=name,
                        error=str(exc),
                    )

            tg.create_task(_fetch())
    if errors:
        if not results:
            msg = f"All backends failed during retrieve: {'; '.join(errors)}"
            raise MemoryRetrievalError(msg)
        logger.warning(
            MEMORY_COMPOSITE_FANOUT_PARTIAL,
            operation="retrieve",
            failed_backends=errors,
            successful_count=len(results),
        )
    else:
        logger.debug(
            MEMORY_COMPOSITE_FANOUT_COMPLETE,
            operation="retrieve",
            total=len(results),
        )
    return results
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify MemoryError is not imported and check if builtins.MemoryError is being caught
echo "=== Checking imports in composite/adapter.py ==="
rg -n "^from synthorg\.memory\.errors import" src/synthorg/memory/backends/composite/adapter.py
echo ""
echo "=== Checking what MemoryError refers to at line 394 ==="
# If MemoryError were imported from synthorg.memory.errors, it would appear in the import block
rg -n "MemoryError" src/synthorg/memory/backends/composite/adapter.py
echo ""
echo "=== Verify synthorg.memory.errors.MemoryError exists ==="
rg -n "^class MemoryError" src/synthorg/memory/errors.py

Repository: Aureliolo/synthorg
Length of output: 372
🏁 Script executed:
sed -n '18,25p' src/synthorg/memory/backends/composite/adapter.py

Repository: Aureliolo/synthorg
Length of output: 277
Critical: MemoryError at line 394 refers to the builtin, not synthorg.memory.errors.MemoryError.
The except clause except MemoryError, RecursionError: intends to re-raise synthorg's memory exceptions, but MemoryError is not imported from synthorg.memory.errors: the adapter's import block pulls in MemoryConfigError, MemoryConnectionError, and MemoryRetrievalError only. The name therefore resolves to builtins.MemoryError (out-of-memory conditions) instead.
Consequence: MemoryConnectionError, MemoryRetrievalError, and other synthorg memory exceptions will fall through to except Exception as exc: and be logged as warnings rather than re-raised. This breaks the error propagation contract and violates the logging guideline that error paths must log and raise (not silently convert exceptions to warnings).
Fix: Add MemoryError to the import from synthorg.memory.errors:
from synthorg.memory.errors import (
MemoryConfigError,
MemoryConnectionError,
+ MemoryError,
MemoryRetrievalError,
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/memory/backends/composite/adapter.py` around lines 370 - 426,
The except clause inside _fanout_retrieve currently uses "except MemoryError,
RecursionError:", where MemoryError binds to builtins.MemoryError rather than
synthorg's custom MemoryError; update the imports to include MemoryError from
synthorg.memory.errors, and prefer the parenthesized grouping (e.g., except
(MemoryError, RecursionError):), which also stays valid on pre-3.14
interpreters, so synthorg's MemoryConnectionError/MemoryRetrievalError are
re-raised instead of being swallowed; reference the import block that imports
from synthorg.memory.errors and the _fanout_retrieve function's except clause
to make the fixes.
Implement namespace-based memory routing with two new backends:
- InMemoryBackend: dict-based session-scoped backend implementing MemoryBackend + MemoryCapabilities protocols. Supports namespace/category/tag/time filtering, max_memories_per_agent enforcement, and clear() for session cleanup.
- CompositeBackend: dispatches memory operations to child backends based on a new 'namespace' field on MemoryStoreRequest/Query/Entry. Prefixed memory IDs (e.g. 'mem0:abc123') enable O(1) routing on get()/delete(). Fan-out retrieve with graceful degradation.

Data model changes:
- MemoryStoreRequest: add namespace (default='default')
- MemoryQuery: add namespaces filter (frozenset, default None)
- MemoryEntry: add namespace (default='default')

Also fixes pre-existing circular import between engine.agent_engine and memory.procedural.pipeline by deferring the imports.
Add foundation layer for the fine-tuning pipeline:
- FineTuneDependencyError + FineTuneCancelledError in errors.py
- CancellationToken (asyncio.Event wrapper) for cooperative cancellation
- EVALUATING stage added to FineTuneStage enum
- New models: EvalMetrics (computed improvement fields), FineTuneRunConfig, FineTuneRun, CheckpointRecord, PreflightCheck, PreflightResult (computed can_proceed)
- Extended FineTuneRequest with override fields + resume_run_id
- Extended FineTuneStatus with run_id
- 14 new observability event constants for fine-tune lifecycle
Add fine_tune_runs and fine_tune_checkpoints tables to schema.sql.
- SQLiteFineTuneRunRepository: save/get/list/update runs, get_active_run, mark_interrupted (startup recovery)
- SQLiteFineTuneCheckpointRepository: save/get/list checkpoints, set_active, delete (with active-checkpoint guard), get_active_checkpoint

Both repos follow existing patterns: aiosqlite, parameterized queries, QueryError wrapping, structured logging.
Replace NotImplementedError stubs with actual implementations:
- Stage 1 (generate_training_data): scans source documents, chunks text, generates extractive queries, outputs training.jsonl + validation.jsonl with configurable holdout split.
- Stage 2 (mine_hard_negatives): lazy-imports sentence-transformers, encodes passages, computes cosine similarity, selects top-k hard negatives with margin filter. GPU required.
- Stage 3 (contrastive_fine_tune): biencoder training with InfoNCE loss via MultipleNegativesRankingLoss. Configurable epochs, lr, temperature, batch_size.
- Stage 4 (evaluate_checkpoint): NEW stage. Compares fine-tuned vs base model on held-out validation data. Computes NDCG@10 and Recall@10, saves eval_metrics.json.
- Stage 5 (deploy_checkpoint): backs up current embedder config, updates settings to point to fine-tuned model.

All ML deps (torch, sentence-transformers) are lazy-imported with FineTuneDependencyError on failure. All stages accept progress_callback and CancellationToken parameters.
FineTuneOrchestrator manages background pipeline execution:
- start(): single-run guard, creates FineTuneRun, spawns background asyncio.Task for sequential stage execution
- resume(): loads failed/cancelled run, skips completed stages
- cancel(): sets CancellationToken for cooperative stop
- recover_interrupted(): marks active runs as FAILED on startup
- get_status(): returns current or most recent run status

Progress callbacks are throttled to 1/sec. WebSocket events emitted on stage transitions and completion/failure via ChannelsPlugin (best-effort, no crash on missing plugin).
- Add 4 WsEventType values for fine-tune progress/stage/complete/fail
- Add FineTuneOrchestrator slot + property on AppState
- Replace controller stubs with real endpoints:
POST /admin/memory/fine-tune (start with 409 guard)
POST /admin/memory/fine-tune/resume/{run_id}
GET /admin/memory/fine-tune/status
POST /admin/memory/fine-tune/cancel
POST /admin/memory/fine-tune/preflight (deps + GPU checks)
GET /admin/memory/fine-tune/checkpoints (paginated)
POST /admin/memory/fine-tune/checkpoints/{id}/deploy
POST /admin/memory/fine-tune/checkpoints/{id}/rollback
DELETE /admin/memory/fine-tune/checkpoints/{id} (409 if active)
GET /admin/memory/fine-tune/runs (paginated history)
Preflight includes VRAM-based batch size recommendation.
Dashboard page with 7 sub-components:
- PipelineControlPanel: start/cancel, source dir, preflight, advanced overrides (epochs, lr, batch size)
- PreflightResultPanel: pass/warn/fail badges per check, recommended batch size
- PipelineStepper: horizontal 5-stage stepper with completion indicators
- PipelineProgressBar: animated progress bar with stage label
- CheckpointTable: date, model, doc count, NDCG@10, Recall@10, size, deploy/rollback/delete actions, metric delta badges
- RunHistoryTable: date, duration, status, stages completed
- DependencyMissingBanner: install instructions when deps missing

Zustand store with WebSocket event handling for real-time progress updates. API client with 11 endpoint functions.
Pre-reviewed by 18 specialized agents (docs-consistency, code-reviewer,
python-reviewer, pr-test-analyzer, silent-failure-hunter,
type-design-analyzer, logging-audit, resilience-audit,
conventions-enforcer, security-reviewer, frontend-reviewer,
design-token-audit, api-contract-drift, persistence-reviewer,
test-quality-reviewer, async-concurrency-reviewer, comment-analyzer,
issue-resolution-verifier). 42 findings addressed:
Critical fixes:
- Register FineTuningPage route in router (was unreachable)
- CancellationToken: threading.Event instead of asyncio.Event (thread-safety)
- Replace bare string event names with constants from observability/events
- Fix WS progress events sending stale run data
- Add 4 fine-tune WS event types to frontend WsEventType
Backend fixes:
- Fix wrong event constant (STARTED -> PERSIST_FAILED) in save_run
- Add MemoryError/RecursionError guards before broad except
- Atomic set_active with existence check via BEGIN IMMEDIATE
- Eliminate TOCTOU in delete_checkpoint (conditional DELETE)
- mark_interrupted now updates updated_at + completed_at
- _build_config uses explicit None checks instead of falsy-or
- Wrap _read_jsonl in asyncio.to_thread (was blocking event loop)
- Add done-callback on fire-and-forget pipeline tasks
- Fix progress callback to read self._current_run (not stale closure)
- Add deactivate_all() for rollback (replaces set_active(''))
- Narrow preflight except clauses (catch specific types)
- Add document count preflight check
- Clamp pagination limit to [1, 200]
- Move stdlib json/datetime imports to module level
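The done-callback fix above follows a standard asyncio pattern: without a callback, an exception in a fire-and-forget task is silently dropped. A minimal sketch (the function names and `failures` list are illustrative, and real code would log via `logger.error` with an event constant):

```python
import asyncio
from typing import Any

failures: list[BaseException] = []


def _log_task_failure(task: asyncio.Task[Any]) -> None:
    """Surface exceptions from fire-and-forget tasks instead of dropping them."""
    if task.cancelled():
        return
    exc = task.exception()
    if exc is not None:
        failures.append(exc)
        print(f"pipeline task failed: {exc!r}")


async def main() -> None:
    async def failing_pipeline() -> None:
        raise RuntimeError("stage crashed")

    task = asyncio.create_task(failing_pipeline())
    task.add_done_callback(_log_task_failure)  # without this, the error vanishes
    await asyncio.sleep(0.01)  # let the task run and the callback fire


asyncio.run(main())
```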
Frontend fixes:
- Add Fine-Tuning to sidebar navigation
- Add ConfirmDialog for checkpoint deletion
- Extract ACTIVE_STAGES to shared constant
- Add resume_run_id to StartFineTuneRequest
- Fix design token gap-4 -> gap-section-gap/gap-grid-gap (4 files)
- Add role=progressbar + ARIA attrs to PipelineProgressBar
- Add role=alert to DependencyMissingBanner
Schema fixes:
- CHECK constraint on fine_tune_runs.stage
- ON DELETE CASCADE on fine_tune_checkpoints.run_id FK
- Index on fine_tune_runs(updated_at)
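The three schema fixes can be sketched together in miniature DDL. The column set and stage names below are illustrative assumptions; the project's actual migration may differ. Note that SQLite only enforces `ON DELETE CASCADE` when the `foreign_keys` pragma is enabled on the connection.

```python
import sqlite3

# Illustrative DDL showing: CHECK on stage, ON DELETE CASCADE on the
# checkpoint FK, and an index on updated_at.
DDL = """
CREATE TABLE fine_tune_runs (
    run_id     TEXT PRIMARY KEY,
    stage      TEXT NOT NULL CHECK (stage IN (
        'synthetic_data', 'negative_mining', 'training',
        'evaluation', 'deployment')),
    updated_at TEXT NOT NULL
);
CREATE INDEX idx_fine_tune_runs_updated_at ON fine_tune_runs(updated_at);

CREATE TABLE fine_tune_checkpoints (
    checkpoint_id TEXT PRIMARY KEY,
    run_id        TEXT NOT NULL
        REFERENCES fine_tune_runs(run_id) ON DELETE CASCADE
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite FKs are off by default
conn.executescript(DDL)

conn.execute(
    "INSERT INTO fine_tune_runs VALUES ('r1', 'training', '2026-01-01T00:00:00Z')"
)
conn.execute("INSERT INTO fine_tune_checkpoints VALUES ('c1', 'r1')")
conn.execute("DELETE FROM fine_tune_runs WHERE run_id = 'r1'")
remaining = conn.execute("SELECT COUNT(*) FROM fine_tune_checkpoints").fetchone()[0]
print(remaining)  # cascade removed the orphaned checkpoint
```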
Docs:
- CLAUDE.md: update memory/ and persistence/ package descriptions
- web/CLAUDE.md: update domain count, pages, stores
Pre-existing bug fixes:
- Session store: wrap time-sensitive list/revoke calls in _patch_now()
- Migration test: add fine_tune_runs/checkpoints tables + indexes
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/memory/embedding/fine_tune.py`:
- Around line 3-5: The module-level docstring incorrectly claims "LLM-powered"
synthetic data generation while the implementation of _generate_query ignores
llm_provider and always uses an extractive fallback; update the module docstring
to state that synthetic data generation currently uses an extractive fallback
(no LLM calls) and reference the limitation of _generate_query, or alternatively
implement LLM usage inside _generate_query to match the docstring—ensure the
module docstring and _generate_query behavior (and its docstring) are
consistent.
- Around line 720-745: The backup_config.json is written unconditionally via
backup_path.write_text even when deployment may be skipped; move the backup
write so it only happens after you confirm deployment will proceed (i.e., after
checking settings_service or config_path and before calling settings_service.set
or writing the config), or alternatively only write when returning a json backup
for downstream use; update the logic around backup_path/backup and the
conditional using settings_service, config_path, checkpoint_path and
MEMORY_FINE_TUNE_CHECKPOINT_DEPLOYED so orphaned backup files are not created
when the function logs the warning and returns early.
- Around line 708-718: The current broad except block around the
settings_service.get call swallows all Exceptions; change it to only catch
settings-related errors by replacing the generic "except Exception" with "except
SettingsError" (while still re-raising MemoryError and RecursionError) so
SettingNotFoundError, SettingsEncryptionError, QueryError, etc. are logged via
MEMORY_FINE_TUNE_BACKUP_READ_SKIPPED and other unexpected exceptions can
propagate; locate the try/except surrounding settings_service.get(...) in the
fine_tune backup logic (references: settings_service.get, backup,
MEMORY_FINE_TUNE_BACKUP_READ_SKIPPED) and update the except clause accordingly.
In `@src/synthorg/persistence/sqlite/fine_tune_repo.py`:
- Around line 40-55: Map row-parsing failures to the repository's QueryError:
wrap calls that can raise (datetime.fromisoformat in _parse_ts and
_parse_required_ts, any json.loads and Pydantic model_validate_json usages that
map DB rows) in a try/except that catches ValueError, TypeError, and
pydantic.ValidationError and re-raises as QueryError with context (similar to
the wrapper used in _checkpoint_from_row); apply the same pattern for the
row-to-model mapping code paths used by list_runs() and list_checkpoints() so a
single corrupted row produces a QueryError instead of bubbling raw exceptions.
- Around line 129-131: The code persists
run.started_at/created_at/updated_at/completed_at by calling .isoformat() which
preserves timezone offsets and breaks lexical ordering of TEXT columns; update
the persistence logic in fine_tune_repo.py to normalize datetimes to UTC before
formatting (e.g., call .astimezone(datetime.timezone.utc) on run.started_at,
run.created_at, run.updated_at and run.completed_at when building the
VALUES/parameters) and then store the UTC ISO string (optionally with 'Z' for
+00:00) so ORDER BY ... DESC on the TEXT column matches chronological order;
apply this change to all places that serialize these fields (the code paths
referencing run.started_at, run.created_at, run.updated_at, run.completed_at and
any insert/update helpers used by get_active_run and list queries).
- Around line 103-104: All write methods (save_run, update_run,
mark_interrupted, save_checkpoint, set_active, deactivate_all,
delete_checkpoint) must acquire a single shared async lock for the underlying
aiosqlite.Connection before starting execute()/commit() and perform a rollback
on error; implement a module-level mapping (e.g., a WeakKeyDictionary or attach
a lock attribute on the Connection keyed by the Connection object) so the same
lock is used across all repository instances that share the same db connection,
then wrap each write method with: await lock.acquire(); try: await
self._db.execute(...); await self._db.commit(); except Exception: await
self._db.rollback(); raise; finally: lock.release(); ensure imports for asyncio
and weakref (or equivalent) are added and apply this pattern to all seven DML
methods listed.
In `@web/src/pages/fine-tuning/PipelineControlPanel.tsx`:
- Around line 41-44: The component is directly calling
useFineTuningStore.setState({ preflight: null }) which bypasses the store's
coordinated actions and can create races; add a clearPreflight action to the
fine-tuning store (e.g., clearPreflight: () => set({ preflight: null, error:
null })) alongside existing runPreflightCheck, then replace the direct setState
call in PipelineControlPanel with the new action (invoke clearPreflight via
useFineTuningStore) so preflight is cleared through the store's action pattern.
- Around line 116-126: In PipelineControlPanel, remove the redundant aria-hidden
attribute from the advanced options container (the div with id
"advanced-options-panel") because the panel is conditionally rendered via
showAdvanced; keep the conditional rendering and ensure the toggle control's
aria-expanded and aria-controls attributes remain on the toggle button so
accessibility is preserved.
In `@web/src/pages/fine-tuning/RunHistoryTable.tsx`:
- Around line 39-40: The progress display uses run.stages_completed.length which
can include duplicates or non-active stages; instead compute the count of unique
stages that are present in ACTIVE_STAGES and render that. In
RunHistoryTable.tsx, replace usage of run.stages_completed.length with a derived
value (e.g., create a Set from run.stages_completed filtered by ACTIVE_STAGES
membership and take its size) so you count only unique, active stages before
outputting "{completedActiveCount}/{ACTIVE_STAGES.size}". Ensure you reference
run.stages_completed and ACTIVE_STAGES in the new computation.
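The component itself is TypeScript, but the suggested computation is plain set logic, shown here in Python for illustration (stage names are assumptions):

```python
# Count only unique stages that are also active, rather than taking
# len(stages_completed), which over-counts duplicates and legacy stages.
ACTIVE_STAGES = {"synthetic_data", "negative_mining", "training",
                 "evaluation", "deployment"}

stages_completed = ["training", "training", "legacy_stage", "evaluation"]

completed_active = len(set(stages_completed) & ACTIVE_STAGES)
print(f"{completed_active}/{len(ACTIVE_STAGES)}")  # duplicates and inactive ignored
```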
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 952d81bf-b75e-4943-b9ba-007091054e10
📒 Files selected for processing (6)
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
tests/unit/memory/backends/composite/test_adapter.py
tests/unit/memory/backends/inmemory/test_adapter.py
web/src/pages/fine-tuning/PipelineControlPanel.tsx
web/src/pages/fine-tuning/RunHistoryTable.tsx
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Dashboard Test
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
web/src/{components,pages}/**/*.tsx
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/{components,pages}/**/*.tsx: Always reuse existing components from `web/src/components/ui/` before creating new ones; every page must compose from shared building blocks
Do not recreate status dots inline; use the `<StatusBadge>` component from `@/components/ui/status-badge`
Do not build card-with-header layouts from scratch; use the `<SectionCard>` component from `@/components/ui/section-card`
Do not create metric displays with `text-metric font-bold` inline; use the `<MetricCard>` component from `@/components/ui/metric-card`
Do not render initials circles manually; use the `<Avatar>` component from `@/components/ui/avatar`
Do not create complex (>8 line) JSX inside `.map()` blocks; extract to a shared component instead
Files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
web/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.{ts,tsx}: Use Tailwind semantic classes (e.g., `text-foreground`, `bg-card`, `text-accent`, `text-success`, `bg-danger`) or CSS variables (e.g., `var(--so-accent)`) for colors; never hardcode hex values in `.tsx`/`.ts` files
Use `font-sans` or `font-mono` for typography; never set `fontFamily` directly
Use density-aware tokens (e.g., `p-card`, `gap-section-gap`, `gap-grid-gap`) or standard Tailwind spacing; never hardcode pixel values for layout spacing
Use token variables (e.g., `var(--so-shadow-card-hover)`, `border-border`, `border-bright`) for shadows and borders; never hardcode values
Import `cn` from `@/lib/utils` for conditional class merging in component files
Do not use `rgba()` with hardcoded values; use design token variables instead
Do not use hardcoded Framer Motion transition durations; use presets from `@/lib/motion` instead
web/src/**/*.{ts,tsx}: Web dashboard: ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones. NEVER hardcode hex colors, font-family, pixel spacing, or Framer Motion transitions -- use design tokens and `@/lib/motion` presets.
Web dashboard: design token rules enforced by PostToolUse hook (`scripts/check_web_design_system.py`)
TypeScript/JavaScript: Node.js 22+, TypeScript 6.0+, ESLint with zero warnings enforced on `web/src/**/*.{ts,tsx}`
Web dashboard YAML parsing: use `js-yaml` library (in package.json)
Files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
web/src/**/*.{ts,tsx,css}
📄 CodeRabbit inference engine (CLAUDE.md)
Web dashboard styling: Tailwind CSS 4 for utility classes; never hardcode pixel values or colors
Files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python 3.14+ required; no `from __future__ import annotations` (PEP 649 native lazy annotations)
Use PEP 758 except syntax: `except A, B:` (no parentheses) -- ruff enforces this on Python 3.14
Type hints required on all public functions; mypy strict mode enforced
Docstrings in Google style, required on all public classes and functions (enforced by ruff D rules)
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement.
For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` for field reassignment prevention and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state. Never mix static config fields with mutable runtime fields in one model.
Pydantic v2: use `allow_inf_nan=False` in all `ConfigDict` declarations to reject `NaN`/`Inf` in numeric fields at validation time
Use `@computed_field` for derived values in Pydantic models instead of storing + validating redundant fields (e.g. `TokenUsage.total_tokens`)
Use `NotBlankStr` (from `core.types`) for all identifier/name fields in Pydantic models -- including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants -- instead of manual whitespace validators
Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.
Line length: 88 characters (ruff enforced)
Functions must be < 50 lines, files < 800 lines
Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).
Files:
src/synthorg/memory/embedding/fine_tune.py
tests/unit/memory/backends/inmemory/test_adapter.py
tests/unit/memory/backends/composite/test_adapter.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`
Never use `import logging`/`logging.getLogger()`/`print()` in application code. Exception: `observability/setup.py`, `observability/sinks.py`, `observability/syslog_handler.py`, and `observability/http_handler.py` may use stdlib `logging` and `print(..., file=sys.stderr)` for handler construction
Logger variable name must always be `logger` (not `_logger`, not `log`)
Event names: always use constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Structured kwargs in logging: always `logger.info(EVENT, key=value)` -- never `logger.info("msg %s", val)`
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging for object creation, internal flow, entry/exit of key functions
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`. Vendor names allowed only in: Operations design page, `.claude/` files, third-party import paths, and `src/synthorg/providers/presets.py`
Files:
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Vendor names in test files must use `test-provider`, `test-small-001`, etc.
Test markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow`
Coverage minimum 80% enforced in CI
Async testing: `asyncio_mode = "auto"` in conftest -- no manual `@pytest.mark.asyncio` needed
Default timeout: 30 seconds per test (global in `pyproject.toml`). Non-default overrides like `timeout(60)` are allowed but avoid adding per-file markers.
Prefer `@pytest.mark.parametrize` for testing similar cases in Python tests
Property-based testing in Python: use Hypothesis (`@given` + `@settings`). Profiles in `tests/conftest.py`: `ci` (deterministic, 10 examples), `dev` (1000 examples), `fuzz` (10,000 examples, no deadline), `extreme` (500,000 examples). Controlled via `HYPOTHESIS_PROFILE` env var.
When Hypothesis finds a failure, fix the underlying bug and add an explicit `@example(...)` decorator to permanently cover the case in CI
Never skip, dismiss, or ignore flaky tests -- always fix them fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()`. For indefinite blocking, use `asyncio.Event().wait()` instead of `asyncio.sleep(large_number)`.
Files:
tests/unit/memory/backends/inmemory/test_adapter.py
tests/unit/memory/backends/composite/test_adapter.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare `@settings()` decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the `HYPOTHESIS_PROFILE` env var controls example counts via registered profiles, which `@given()` honors automatically.
Files:
tests/unit/memory/backends/inmemory/test_adapter.py
tests/unit/memory/backends/composite/test_adapter.py
🧠 Learnings (51)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/engine/coordination/**/*.py : Task coordination uses multi-agent pipeline with 4 dispatchers (SAS/centralized/decentralized/context-dependent), wave execution, and workspace lifecycle integration.
📚 Learning: 2026-03-27T22:32:26.927Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T22:32:26.927Z
Learning: Applies to web/src/components/ui/*.{tsx,ts} : For new shared React components: place in web/src/components/ui/ with kebab-case filename, create .stories.tsx with all states, export props as TypeScript interface, use design tokens exclusively
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/components/ui/**/*.stories.tsx : Create a `.stories.tsx` file alongside each new shared component in `web/src/components/ui/` with all states (default, hover, loading, error, empty)
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-27T12:44:29.466Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T12:44:29.466Z
Learning: Applies to web/src/stores/**/*.{ts,tsx} : Use Zustand stores in web dashboard for state management (auth, WebSocket, toast, analytics, domain shells)
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-30T10:20:08.544Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Always reuse existing components from web/src/components/ui/ (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast/ToastContainer, Skeleton variants, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup/StaggerItem, Drawer, form fields, TaskStatusIndicator, PriorityBadge, ProviderHealthBadge, TokenUsageBar, CodeMirrorEditor, SegmentedControl, ThemeToggle, LiveRegion, MobileUnsupportedOverlay, LazyCodeMirrorEditor) before creating new components
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not create complex (>8 line) JSX inside `.map()` blocks; extract to a shared component instead
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-04-02T12:21:16.739Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-02T12:21:16.739Z
Learning: Applies to web/src/**/*.{tsx,ts} : Do NOT create complex (>8 line) JSX inside `.map()` -- extract to a shared component
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-30T10:41:40.176Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/**/*.{ts,tsx} : Do NOT create complex (>8 line) JSX inside `.map()`; extract to a shared component
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-30T10:20:08.544Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Do NOT create complex (>8 line) JSX inside .map()—extract to a shared component
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-04-02T12:21:16.739Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-02T12:21:16.739Z
Learning: Applies to web/src/**/*.{tsx,ts} : ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast, Skeleton, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup, Drawer, InputField, SelectField, SliderField, ToggleField, TaskStatusIndicator, PriorityBadge, ProviderHealthBadge, TokenUsageBar, CodeMirrorEditor, SegmentedControl, ThemeToggle, LiveRegion, MobileUnsupportedOverlay, LazyCodeMirrorEditor, TagInput, MetadataGrid, ProjectStatusBadge, ContentTypeBadge)
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not recreate status dots inline; use the `<StatusBadge>` component from `@/components/ui/status-badge`
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-27T12:44:29.466Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T12:44:29.466Z
Learning: Applies to web/src/**/*.{ts,tsx} : Always reuse existing components from `web/src/components/ui/` (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast, Skeleton, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup/StaggerItem) before creating new ones
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-03-31T14:31:11.894Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T14:31:11.894Z
Learning: Applies to web/src/**/*.{ts,tsx} : Use React 19, TypeScript 6.0+, and design system tokens from shadcn/ui + Tailwind CSS 4 + Radix UI in web dashboard
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-03-15T21:49:53.264Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T21:49:53.264Z
Learning: Fix everything valid — never skip when review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes). No deferring, no 'out of scope' skipping.
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-01T05:46:17.064Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T05:46:17.064Z
Learning: Fix all valid issues found by review agents (including pre-existing issues in surrounding code, suggestions, and adjacent findings) — never skip or defer
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-04T15:08:44.009Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: When review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes), fix them all. No deferring, no "out of scope" skipping.
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-04T15:08:44.009Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to web/src/**/*.{ts,tsx} : Web dashboard: ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones. NEVER hardcode hex colors, font-family, pixel spacing, or Framer Motion transitions -- use design tokens and `@/lib/motion` presets.
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-03-30T10:41:40.176Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/**/*.{ts,tsx} : ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones; refer to design system inventory (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, etc.)
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Always reuse existing components from `web/src/components/ui/` before creating new ones; every page must compose from shared building blocks
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-03-30T10:20:08.544Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Web dashboard shadows/borders: use token variables (var(--so-shadow-card-hover), border-border, border-bright); never hardcode shadow or border values
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not create metric displays with `text-metric font-bold` inline; use the `<MetricCard>` component from `@/components/ui/metric-card`
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/components/ui/**/*.tsx : Place new shared components in `web/src/components/ui/` with descriptive kebab-case filenames
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (without parentheses) per PEP 758 for exception handling in Python 3.14
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Handle errors explicitly; never silently swallow exceptions
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to **/*.py : Use `except A, B:` (no parentheses) per PEP 758 exception syntax on Python 3.14
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (no parentheses) for exception handling — PEP 758 exception syntax enforced by ruff on Python 3.14
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to **/*.py : Handle errors explicitly, never silently swallow exceptions
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to **/*.py : Use PEP 758 except syntax with `except A, B:` (no parentheses) for multiple exceptions—ruff enforces this on Python 3.14.
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to **/*.py : Handle errors explicitly—never silently swallow exceptions.
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-03-19T21:11:37.538Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T21:11:37.538Z
Learning: Applies to **/*.py : Errors: handle explicitly, never silently swallow.
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-04-02T21:38:30.127Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T21:38:30.127Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Handle errors explicitly; never silently swallow exceptions
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-26T15:18:16.848Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-26T15:18:16.848Z
Learning: Applies to src/**/*.py : Handle errors explicitly, never silently swallow them. All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-04-04T15:08:44.009Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to **/*.py : Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-16T20:14:00.937Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:14:00.937Z
Learning: Applies to **/*.py : Validate: at system boundaries (user input, external APIs, config files).
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files)
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/providers/**/*.py : Providers: LLM provider abstraction (LiteLLM adapter), auth types (api_key/oauth/custom_header/none), presets (PROVIDER_PRESETS), runtime CRUD (ProviderManagementService with asyncio.Lock serialization), hot-reload via AppState swap.
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to {src/synthorg/**/*.py,tests/**/*.py,web/src/**/*.{ts,tsx}} : NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small`
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/memory/**/*.py : Use MemoryBackend protocol with pluggable backends (Mem0 adapter available at backends/mem0/) for persistent agent memory
Applied to files:
tests/unit/memory/backends/inmemory/test_adapter.py
tests/unit/memory/backends/composite/test_adapter.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
Applied to files:
tests/unit/memory/backends/inmemory/test_adapter.py
tests/unit/memory/backends/composite/test_adapter.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`
Applied to files:
tests/unit/memory/backends/inmemory/test_adapter.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to tests/**/*.py : Never skip, dismiss, or ignore flaky tests; fix them fully and fundamentally by mocking `time.monotonic()` and `asyncio.sleep()` for determinism or using `asyncio.Event().wait()` for indefinite blocking
Applied to files:
tests/unit/memory/backends/inmemory/test_adapter.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Applied to files:
tests/unit/memory/backends/inmemory/test_adapter.py
📚 Learning: 2026-04-04T15:08:44.009Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to tests/**/*.py : Never skip, dismiss, or ignore flaky tests -- always fix them fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()`. For indefinite blocking, use `asyncio.Event().wait()` instead of `asyncio.sleep(large_number)`.
Applied to files:
tests/unit/memory/backends/inmemory/test_adapter.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Applied to files:
tests/unit/memory/backends/inmemory/test_adapter.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to tests/**/*.py : Flaky tests: NEVER skip, dismiss, or ignore flaky tests -- always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Applied to files:
tests/unit/memory/backends/inmemory/test_adapter.py
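The timing-determinism rule above can be sketched concretely. In this sketch, `poll_until_ready` is a hypothetical helper invented for illustration, not project code; the point is that patching `asyncio.sleep` makes the test instant and deterministic instead of wall-clock dependent:

```python
import asyncio
from unittest import mock

async def poll_until_ready(check, interval: float = 5.0, attempts: int = 3) -> bool:
    """Hypothetical helper: retry `check`, sleeping between attempts."""
    for _ in range(attempts):
        if check():
            return True
        await asyncio.sleep(interval)
    return False

def test_poll_is_deterministic() -> None:
    requested: list[float] = []

    async def fake_sleep(delay: float) -> None:
        requested.append(delay)  # record the requested delay instead of waiting

    results = iter([False, False, True])
    with mock.patch("asyncio.sleep", side_effect=fake_sleep):
        ok = asyncio.run(poll_until_ready(lambda: next(results)))

    assert ok is True
    assert requested == [5.0, 5.0]  # two sleeps requested, zero wall-clock time

test_poll_is_deterministic()
```

The same substitution works for `time.monotonic()` (patch it to return values from a scripted sequence), and a test that must block "forever" can await a fresh `asyncio.Event().wait()` instead of sleeping a large number.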
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/persistence/**/*.py : Persistence uses pluggable PersistenceBackend protocol. SQLite is the initial backend. Settings use SettingsRepository (namespaced settings CRUD).
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Persistence backend: pluggable PersistenceBackend protocol in `src/synthorg/persistence/`, SQLite initial, SettingsRepository (namespaced settings CRUD).
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14.
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-04-04T15:08:44.009Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: `except A, B:` (no parentheses) -- ruff enforces this on Python 3.14
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
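Read together, the recurring learnings above describe one error-handling pattern: validate at the system boundary, log with context, never swallow. A generic sketch (the function and logger call are illustrative, not project code; the learnings note that on Python 3.14 ruff rewrites the parenthesized tuple below to PEP 758's unparenthesized `except A, B:` form, so the portable spelling is used here):

```python
import json
import logging

logger = logging.getLogger(__name__)

def load_config(path: str) -> dict:
    """Validate at a system boundary (a config file) and fail loudly."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError) as exc:
        # Never silently swallow: log with context, then re-raise.
        logger.error("config load failed: path=%s error=%s", path, exc)
        raise
    if not isinstance(data, dict):
        raise TypeError(f"config root must be an object, got {type(data).__name__}")
    return data
```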
🔇 Additional comments (15)
web/src/pages/fine-tuning/RunHistoryTable.tsx (1)
74-76: Nice cleanup of table row rendering. Extracting row markup into `RunRow` keeps the `.map()` body minimal and easier to maintain.
web/src/pages/fine-tuning/PipelineControlPanel.tsx (4)
1-11: LGTM! Imports are well-organized with proper separation of React primitives, API types/constants, UI components from the shared library, and local component imports.
12-31: LGTM! State setup uses the correct Zustand 5.0 pattern with `useShallow` for selective subscription. The `batchSizeTouchedRef` pattern correctly tracks user interaction without causing re-renders.
46-65: LGTM! The `isActive` derivation and `buildRequest` function are well-implemented. The numeric validation correctly handles empty strings, NaN, and non-positive values before including optional fields in the request.
75-101: LGTM! The JSX structure properly uses shared `Button` and `InputField` components with appropriate variants, design tokens for spacing (`gap-section-gap`, `gap-4`, `gap-2`), and conditional rendering for the Cancel/Start button based on `isActive` state.
tests/unit/memory/backends/composite/test_adapter.py (1)
1-510: Comprehensive test coverage for CompositeBackend. The test module thoroughly covers:
- Constructor validation (unknown backends in routes/default)
- Protocol conformance
- Lifecycle fan-out (connect/disconnect/health_check)
- Namespace-based routing with fallback to default
- Prefixed ID semantics (format, parsing, error cases)
- Retrieve fan-out with graceful degradation on partial failures
- Count aggregation and capabilities
The past review comment regarding the assertion strength in the degraded fanout test (line 386) has been addressed: it now correctly asserts `len(result) == 1`.
tests/unit/memory/backends/inmemory/test_adapter.py (2)
1-542: Comprehensive test coverage for InMemoryBackend. The test module thoroughly covers:
- Protocol conformance and capability properties
- Lifecycle (connect/disconnect/health_check) with idempotency tests
- Connection guard for all CRUD operations
- Store/Get with namespace preservation and per-agent isolation
- Retrieve filtering (category, namespace, tags, text search, limit, ordering)
- Count with category filtering
- Max memories enforcement (constructor and runtime)
- Expiry behavior (get/retrieve/count exclude expired entries)
- Clear helper with agent isolation
The expiry tests (lines 461-512) correctly use `model_copy(update=...)` to work with frozen `MemoryEntry` models, and the approach of setting a past timestamp is deterministic without requiring time mocking.
463-477: Expiry test approach is sound. Using `model_copy(update={"expires_at": past})` to replace the stored entry is the correct pattern for frozen Pydantic models. Setting an explicit past timestamp (`datetime.now(tz=UTC) - timedelta(hours=1)`) is deterministic and avoids the complexity of time mocking for these straightforward expiration checks.
src/synthorg/memory/embedding/fine_tune.py (7)
1-40: LGTM! Module docstring accurately reflects the 5-stage pipeline, imports are well-organized with TYPE_CHECKING guards for type-only imports, and logger setup follows project conventions.
58-95: LGTM! Lazy import pattern with friendly error messages and proper logging before raising `FineTuneDependencyError` is well-implemented.
137-154: LGTM! The `_scan_documents` function now properly catches `UnicodeDecodeError` explicitly and logs the skipped file with context, rather than silently ignoring invalid bytes.
196-227: LGTM! Proper validation of `validation_split` bounds and minimum pair count, with split index calculation ensuring both training and validation sets are non-empty.
260-342: LGTM! Hard negative mining implementation is correct with proper input validation, lazy imports, cancellation checks at reasonable intervals (every 50 queries), and numerically stable cosine similarity computation.
552-587: LGTM! Cancellation checks are properly placed between each expensive `encode()` call (lines 562-563, 569-570, 579-580), addressing the cooperative-cancellation contract for the evaluation stage.
448-477: No action needed. The `_progress_hook` callback signature `(score: float, epoch: int, steps: int) -> None` matches the sentence-transformers API exactly as documented in the official source code and reference guide.
> Likely an incorrect or invalid review comment.
Five-stage offline pipeline for domain-specific embedding fine-tuning:

1. Synthetic data generation (LLM-powered, no manual annotation)
🧹 Nitpick | 🔵 Trivial
Module docstring claims LLM-powered generation, but implementation uses extractive fallback only.
The module docstring at line 5 states "Synthetic data generation (LLM-powered, no manual annotation)" but _generate_query() explicitly ignores llm_provider and always returns extractive fallback (line 248: _ = llm_provider). The function docstring correctly documents this limitation, but the module-level documentation is misleading.
Consider updating the module docstring to reflect actual behavior:
📝 Suggested docstring update
-1. Synthetic data generation (LLM-powered, no manual annotation)
+1. Synthetic data generation (extractive fallback; LLM support planned)
Also applies to: 237-254
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/memory/embedding/fine_tune.py` around lines 3 - 5, The
module-level docstring incorrectly claims "LLM-powered" synthetic data
generation while the implementation of _generate_query ignores llm_provider and
always uses an extractive fallback; update the module docstring to state that
synthetic data generation currently uses an extractive fallback (no LLM calls)
and reference the limitation of _generate_query, or alternatively implement LLM
usage inside _generate_query to match the docstring—ensure the module docstring
and _generate_query behavior (and its docstring) are consistent.
try:
    sv = await settings_service.get("memory", key)
    if sv and hasattr(sv, "value") and sv.value:
        backup[key] = sv.value
except MemoryError, RecursionError:
    raise
except Exception:
    logger.warning(
        MEMORY_FINE_TUNE_BACKUP_READ_SKIPPED,
        key=key,
    )
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the exception type raised by SettingsService.get() when a setting is not found
ast-grep --pattern $'async def get(
self,
$$$
) -> $_:
$$$
'
# Also check for custom exception types in the settings module
rg -n 'class.*Error.*Exception' src/synthorg/settings/
Repository: Aureliolo/synthorg
Length of output: 50374
🏁 Script executed:
#!/bin/bash
# Find SettingsService interface/protocol
rg -n "class SettingsService\|class.*SettingsRepo\|def get" src/synthorg/persistence/settings* -A 5
# Also look for QueryError definition
rg -n "class QueryError" src/synthorg/
# Check what's imported in settings_repo
head -20 src/synthorg/persistence/sqlite/settings_repo.py
Repository: Aureliolo/synthorg
Length of output: 702
🏁 Script executed:
#!/bin/bash
# Find SettingsService protocol/interface
rg -n "class.*SettingsService\|SettingsRepository" src/synthorg/persistence/ --type py
# Look at fine_tune.py to see settings_service type
sed -n '1,50p' src/synthorg/memory/embedding/fine_tune.py
# Check imports in fine_tune.py
head -30 src/synthorg/memory/embedding/fine_tune.py | grep -E "^import|^from"
Repository: Aureliolo/synthorg
Length of output: 2010
🏁 Script executed:
#!/bin/bash
# Find SettingsRepository protocol
rg -n "protocol.*Settings\|class.*SettingsRepository" src/synthorg/persistence/
# Find deploy_checkpoint function to see the type of settings_service
rg -n "async def deploy_checkpoint" src/synthorg/memory/embedding/fine_tune.py -A 3
# Find SettingsRepository in protocol files
find src/synthorg/persistence -name "*.py" -type f | xargs grep -l "SettingsRepository" | head -5
Repository: Aureliolo/synthorg
Length of output: 345
🏁 Script executed:
#!/bin/bash
# Find SettingsRepository protocol definition
rg -n "class SettingsRepository" src/synthorg/persistence/protocol.py -A 30
# Also check for any method signatures
rg -n "async def get" src/synthorg/persistence/protocol.py -B 2 -A 8
Repository: Aureliolo/synthorg
Length of output: 348
🏁 Script executed:
#!/bin/bash
# Find SettingsRepository protocol more carefully
rg -n "class SettingsRepository" src/synthorg/persistence/protocol.py -B 2 -A 50 | head -70
# Also check what settings_service is passed to deploy_checkpoint
rg -n "settings_service.*:" src/synthorg/memory/embedding/fine_tune.py | head -10
Repository: Aureliolo/synthorg
Length of output: 303
🏁 Script executed:
#!/bin/bash
# Get the exact code snippet being reviewed (lines 708-718)
sed -n '708,718p' src/synthorg/memory/embedding/fine_tune.py
# Also get broader context around it to understand the loop and settings_service usage
sed -n '700,730p' src/synthorg/memory/embedding/fine_tune.py
# Check if there's a SettingsRepository protocol in protocol.py
grep -n "SettingsRepository\|class.*Settings" src/synthorg/persistence/protocol.py | head -20
Repository: Aureliolo/synthorg
Length of output: 1522
🏁 Script executed:
#!/bin/bash
# Check if settings_service.get() actually returns tuple or an object with .value
# Look at the actual SettingsRepository.get() method signature in protocol.py
grep -n "async def get" src/synthorg/persistence/protocol.py -B 5 -A 15 | grep -A 15 "namespace"
# Also check QueryError class hierarchy
grep -n "class QueryError" src/synthorg/persistence/errors.py -A 5
# Check what exceptions are documented as possible from get()
grep -n "Raises:" src/synthorg/persistence/sqlite/settings_repo.py -A 3
Repository: Aureliolo/synthorg
Length of output: 307
🏁 Script executed:
#!/bin/bash
# Get the full SettingsRepository protocol definition
sed -n '1,500p' src/synthorg/persistence/protocol.py | grep -A 30 "class SettingsRepository"
Repository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
#!/bin/bash
# Search more broadly for SettingsRepository
rg "SettingsRepository" src/synthorg/persistence/protocol.py -B 3 -A 30
# Also check if there's a settings service protocol elsewhere
find src/synthorg -name "*.py" -type f | xargs grep -l "SettingsRepository" | head -3
Repository: Aureliolo/synthorg
Length of output: 2553
🏁 Script executed:
#!/bin/bash
# Find the actual SettingsRepository protocol definition
rg "class SettingsRepository" src/synthorg/persistence/ -B 2 -A 30
# Also check the repositories.py file
grep -n "SettingsRepository" src/synthorg/persistence/repositories.py -B 5 -A 20
Repository: Aureliolo/synthorg
Length of output: 3704
🏁 Script executed:
#!/bin/bash
# Check the actual line 708-718 again and see what sv really is
sed -n '708,725p' src/synthorg/memory/embedding/fine_tune.py
# Search for other uses of settings_service.get() to see the pattern
rg "settings_service\.get\(" src/ -B 2 -A 3 | head -40
Repository: Aureliolo/synthorg
Length of output: 4102
🏁 Script executed:
#!/bin/bash
# Check if there's a different SettingsService (not Repository) that has different behavior
rg "class SettingsService" src/ -A 15
# Check how the code actually uses the result - is .value actually valid somewhere?
rg "\.value" src/synthorg/memory/embedding/fine_tune.py -B 3 -A 1
# Look at the git history or find where this pattern with .value comes from
rg "hasattr.*value" src/synthorg/memory/ -B 2 -A 2
Repository: Aureliolo/synthorg
Length of output: 1904
🏁 Script executed:
#!/bin/bash
# Find SettingsService.get() method
rg "class SettingsService" src/synthorg/settings/service.py -A 100 | grep -A 30 "async def get"
# Or just get the method directly
sed -n '1,500p' src/synthorg/settings/service.py | grep -A 30 "async def get"
Repository: Aureliolo/synthorg
Length of output: 5095
🏁 Script executed:
#!/bin/bash
# Find SettingNotFoundError definition and SettingValue class
rg "class SettingNotFoundError\|class SettingValue" src/synthorg/settings/ -A 5
# Check what exceptions SettingsService.get() might raise besides SettingNotFoundError
sed -n '1,300p' src/synthorg/settings/service.py | grep -A 50 "async def get" | head -70
Repository: Aureliolo/synthorg
Length of output: 2670
🏁 Script executed:
#!/bin/bash
# Find SettingValue and SettingNotFoundError
rg "class SettingValue\|class SettingNotFoundError\|class SettingsEncryptionError" src/synthorg/settings/ -A 3
# Understand the full flow - what exceptions can bubble up
sed -n '1,150p' src/synthorg/settings/service.py | head -100
Repository: Aureliolo/synthorg
Length of output: 3519
🏁 Script executed:
#!/bin/bash
# Find SettingValue class definition
rg "class SettingValue" src/synthorg/settings/ -A 10
# Find all possible exceptions that could be raised
rg "class.*Error.*Exception" src/synthorg/settings/errors.py -A 2
Repository: Aureliolo/synthorg
Length of output: 822
Backup-read exception handling is overly broad and may hide real failures.
Lines 712-718 catch all Exception types except MemoryError and RecursionError, including legitimate settings errors that indicate deployment cannot proceed safely. SettingsService.get() can raise SettingNotFoundError (missing key), SettingsEncryptionError (decryption failure), or QueryError (database/connectivity failure). All of these should be caught and logged, but catching the generic Exception class risks hiding unexpected errors.
Catch SettingsError (the base class for all settings-related exceptions) instead, allowing other unexpected exceptions to propagate and abort deployment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/memory/embedding/fine_tune.py` around lines 708 - 718, The
current broad except block around the settings_service.get call swallows all
Exceptions; change it to only catch settings-related errors by replacing the
generic "except Exception" with "except SettingsError" (while still re-raising
MemoryError and RecursionError) so SettingNotFoundError,
SettingsEncryptionError, QueryError, etc. are logged via
MEMORY_FINE_TUNE_BACKUP_READ_SKIPPED and other unexpected exceptions can
propagate; locate the try/except surrounding settings_service.get(...) in the
fine_tune backup logic (references: settings_service.get, backup,
MEMORY_FINE_TUNE_BACKUP_READ_SKIPPED) and update the except clause accordingly.
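A minimal sketch of the narrower catch being proposed. The exception classes here are stand-ins modelled on the hierarchy named in the comment, not verified project definitions, and the tuple form is used for portability (the repo's ruff rules reportedly rewrite it to PEP 758's `except A, B:` on Python 3.14):

```python
import logging

logger = logging.getLogger(__name__)

# Stand-in hierarchy modelled on the settings errors named above.
class SettingsError(Exception): ...
class SettingNotFoundError(SettingsError): ...
class SettingsEncryptionError(SettingsError): ...

def read_backup_value(get_setting, key: str, backup: dict) -> None:
    try:
        backup[key] = get_setting(key)
    except (MemoryError, RecursionError):
        raise
    except SettingsError as exc:
        # Expected and recoverable: log with context, skip this key.
        logger.warning("backup read skipped: key=%s error=%s", key, exc)
    # Any other exception propagates and aborts the deployment.
```

With this shape, a `ValueError` raised by a bug no longer disappears into the backup-skip log; it surfaces immediately.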
def __init__(self, db: aiosqlite.Connection) -> None:
    self._db = db
🧩 Analysis chain
🌐 Web query:
In aiosqlite 0.22.1, if multiple coroutines share one Connection and each does `await execute(...)` followed later by `await commit()` or `await rollback()`, are transactions scoped to the connection so that statements from different coroutines can interleave into the same transaction?
💡 Result:
Yes, in aiosqlite 0.22.1, transactions are scoped to the connection. When multiple coroutines share one Connection and each calls await execute(...) followed later by await commit or await rollback, statements from different coroutines can interleave into the same transaction because all operations are serialized through a single shared request queue executed sequentially in a background thread.
Citations:
- 1: https://aiosqlite.omnilib.dev/en/v0.22.0/
- 2: https://github.com/omnilib/aiosqlite/blob/main/aiosqlite/core.py
- 3: https://aiosqlite.omnilib.dev/en/stable
- 4: commit parity with sqlite3 doesn't exist with `conn.commit()` (omnilib/aiosqlite#286)
- 5: https://stackoverflow.com/questions/53908615/reusing-aiosqlite-connection/53963739
🏁 Script executed:
#!/bin/bash
# First, let's read the file to confirm the patterns described
wc -l src/synthorg/persistence/sqlite/fine_tune_repo.py
Repository: Aureliolo/synthorg
Length of output: 115
🏁 Script executed:
# Read the __init__ method and overall structure
head -120 src/synthorg/persistence/sqlite/fine_tune_repo.py | tail -30
Repository: Aureliolo/synthorg
Length of output: 1113
🏁 Script executed:
# Read a write method to see the execute/commit pattern
sed -n '106,150p' src/synthorg/persistence/sqlite/fine_tune_repo.py
Repository: Aureliolo/synthorg
Length of output: 1946
🏁 Script executed:
# Check error handling in another write method
sed -n '222,260p' src/synthorg/persistence/sqlite/fine_tune_repo.py
Repository: Aureliolo/synthorg
Length of output: 1544
🏁 Script executed:
# Check if there's any existing synchronization or transaction management
rg -A 3 "def set_active|def save_run|async def.*await.*commit" src/synthorg/persistence/sqlite/fine_tune_repo.py
Repository: Aureliolo/synthorg
Length of output: 402
🏁 Script executed:
# Let's look at the set_active method to see how it handles transactions
sed -n '301,350p' src/synthorg/persistence/sqlite/fine_tune_repo.py
Repository: Aureliolo/synthorg
Length of output: 2078
🏁 Script executed:
# Check all DML methods for transaction cleanup patterns
rg -B 2 -A 10 "await self._db.commit\(\)" src/synthorg/persistence/sqlite/fine_tune_repo.py
Repository: Aureliolo/synthorg
Length of output: 3247
🏁 Script executed:
# Check for any rollback calls
rg "rollback" src/synthorg/persistence/sqlite/fine_tune_repo.py
Repository: Aureliolo/synthorg
Length of output: 244
🏁 Script executed:
# Let's see the full mark_interrupted and delete_run methods
sed -n '253,310p' src/synthorg/persistence/sqlite/fine_tune_repo.py
Repository: Aureliolo/synthorg
Length of output: 1864
🏁 Script executed:
# Let's check the set_active method fully to see the rollback pattern
sed -n '407,450p' src/synthorg/persistence/sqlite/fine_tune_repo.py
Repository: Aureliolo/synthorg
Length of output: 1732
🏁 Script executed:
# Check if there's any locking or async context manager used elsewhere
rg -B 3 -A 5 "asyncio.Lock|Semaphore|contextlib.suppress" src/synthorg/persistence/sqlite/fine_tune_repo.py
Repository: Aureliolo/synthorg
Length of output: 471
🏁 Script executed:
# Check how these repositories are instantiated and used
rg -B 2 -A 2 "SQLiteFineTuneRunRepository\|SQLiteFineTuneCheckpointRepository" src/synthorg/persistence/ --type py
Repository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
# Check the imports to see what's available
head -30 src/synthorg/persistence/sqlite/fine_tune_repo.py
Repository: Aureliolo/synthorg
Length of output: 772
🏁 Script executed:
# Check how these repositories are used - are they called concurrently?
rg "SQLiteFineTuneRunRepository\|SQLiteFineTuneCheckpointRepository" src/ --type py -B 2 -A 2
Repository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
# Search for the instantiation point and how many instances exist
rg "SQLiteFineTune.*Repository\(" src/ --type py -B 3 -A 3
Repository: Aureliolo/synthorg
Length of output: 2553
🏁 Script executed:
# Check if there's a shared connection pool or factory
fd -t f "*.py" src/synthorg/persistence/ | xargs grep -l "aiosqlite\|Connection" | head -10
Repository: Aureliolo/synthorg
Length of output: 345
🏁 Script executed:
# Look at the actual startup/factory code
find src/synthorg/persistence -name "*.py" -type f | xargs grep -l "async def.*init\|def.*init" | head -5
Repository: Aureliolo/synthorg
Length of output: 307
🏁 Script executed:
# Check if there's a background worker or other concurrent access pattern
rg -i "background\|worker\|task.*fine.tune\|concurrent" src/synthorg/memory/embedding/ --type py -B 2 -A 2
Repository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
# Look at memory integration to see if run repo is accessed from multiple places concurrently
rg "SQLiteFineTuneRunRepository\|FineTuneRun" src/synthorg/memory/embedding/ --type py -l
Repository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
# Check the database initialization to understand connection handling
rg -B 5 -A 10 "aiosqlite.connect\|get_db_connection" src/synthorg/persistence/ --type py
Repository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
# Verify the actual error handling - check deactivate_all which has no rollback
sed -n '445,460p' src/synthorg/persistence/sqlite/fine_tune_repo.py
Repository: Aureliolo/synthorg
Length of output: 640
🏁 Script executed:
# Search more carefully for concurrent access patterns - look at fine_tune_pipeline or worker
find src/synthorg -name "*.py" -type f | xargs grep -l "FineTuneRun\|fine_tune.*worker\|background" 2>/dev/null | head -10
Repository: Aureliolo/synthorg
Length of output: 485
🏁 Script executed:
# Look at the actual transaction semantics - does aiosqlite require explicit transaction begin?
rg "BEGIN\|autocommit\|isolation" src/synthorg/persistence/sqlite/ --type py
Repository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
# Count the write methods to verify the scope of the issue
rg "async def (save|update|mark|delete|deactivate|set_active)" src/synthorg/persistence/sqlite/fine_tune_repo.py
Repository: Aureliolo/synthorg
Length of output: 388
🏁 Script executed:
# Check if the db connection is actually a shared singleton
rg -B 10 -A 5 "get_db_connection\|Depends.*db\|provide.*db" src/synthorg/api/ --type py | head -50
Repository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
# Look at a few write methods more closely to confirm no rollback in most
sed -n '176,220p' src/synthorg/persistence/sqlite/fine_tune_repo.py
Repository: Aureliolo/synthorg
Length of output: 1597
🏁 Script executed:
# Check save_checkpoint for rollback presence
sed -n '304,347p' src/synthorg/persistence/sqlite/fine_tune_repo.py
Repository: Aureliolo/synthorg
Length of output: 1952
Make transaction boundaries explicit on the shared SQLite connection.
Given this feature set has concurrent control APIs and background workers, multiple coroutines can access the same aiosqlite.Connection simultaneously. All write methods here (save_run, update_run, mark_interrupted, save_checkpoint, set_active, deactivate_all, delete_checkpoint) follow a vulnerable pattern: they await execute() and commit() separately on the same connection, allowing statements from different coroutines to interleave into the same transaction. This can cause one repository call to commit or roll back another call's work. Cleanup is also inconsistent—only set_active and delete_checkpoint attempt rollback on errors, while most other write methods skip cleanup entirely, leaving failed transactions to interfere with concurrent operations.
🛠️ Suggested direction
+import asyncio
+
class SQLiteFineTuneRunRepository:
@@
- def __init__(self, db: aiosqlite.Connection) -> None:
+ def __init__(
+ self,
+ db: aiosqlite.Connection,
+ write_lock: asyncio.Lock,
+ ) -> None:
self._db = db
+ self._write_lock = write_lock
@@
async def save_run(self, run: FineTuneRun) -> None:
"""Persist a run (upsert semantics)."""
- try:
- await self._db.execute(
- ...
- )
- await self._db.commit()
- except (sqlite3.Error, aiosqlite.Error) as exc:
- msg = f"Failed to save fine-tune run {run.id}"
- ...
- raise QueryError(msg) from exc
+ async with self._write_lock:
+ try:
+ await self._db.execute(
+ ...
+ )
+ await self._db.commit()
+ except (sqlite3.Error, aiosqlite.Error) as exc:
+ await self._db.rollback()
+ msg = f"Failed to save fine-tune run {run.id}"
+ ...
+                raise QueryError(msg) from exc
Apply the shared lock/rollback pattern to all seven DML methods in both repositories. The lock must be shared across all repository instances that use the same aiosqlite.Connection; a per-instance lock is insufficient.
Also applies to: 106–143, 176–213, 222–251, 253–291, 304–347, 407–443, 445–458
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/persistence/sqlite/fine_tune_repo.py` around lines 103 - 104,
All write methods (save_run, update_run, mark_interrupted, save_checkpoint,
set_active, deactivate_all, delete_checkpoint) must acquire a single shared
async lock for the underlying aiosqlite.Connection before starting
execute()/commit() and perform a rollback on error; implement a module-level
mapping (e.g., a WeakKeyDictionary or attach a lock attribute on the Connection
keyed by the Connection object) so the same lock is used across all repository
instances that share the same db connection, then wrap each write method with:
await lock.acquire(); try: await self._db.execute(...); await self._db.commit();
except Exception: await self._db.rollback(); raise; finally: lock.release();
ensure imports for asyncio and weakref (or equivalent) are added and apply this
pattern to all seven DML methods listed.
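The connection-keyed lock registry described in the prompt can be sketched as follows. This is a minimal illustration, not the repository code: `get_write_lock`, `guarded_write`, and the generic `conn` parameter are hypothetical names, and real code would catch `sqlite3.Error`/`aiosqlite.Error` rather than bare `Exception`.

```python
import asyncio
from weakref import WeakKeyDictionary

# Module-level registry: one asyncio.Lock per connection object.
# WeakKeyDictionary lets the lock be garbage-collected with its connection.
_write_locks: "WeakKeyDictionary[object, asyncio.Lock]" = WeakKeyDictionary()


def get_write_lock(conn: object) -> asyncio.Lock:
    """Return the shared write lock for a connection, creating it on first use."""
    lock = _write_locks.get(conn)
    if lock is None:
        lock = asyncio.Lock()
        _write_locks[conn] = lock
    return lock


async def guarded_write(conn, sql: str, params: tuple) -> None:
    """Serialize execute+commit on one connection and roll back on failure.

    Sketch only -- production code would narrow the except clause to the
    driver's error types (sqlite3.Error, aiosqlite.Error).
    """
    async with get_write_lock(conn):
        try:
            await conn.execute(sql, params)
            await conn.commit()
        except Exception:
            await conn.rollback()
            raise
```

Because the lock lives in a module-level `WeakKeyDictionary` keyed by the connection object, every repository instance sharing that connection serializes its execute/commit pair, and the registry entry disappears when the connection is garbage-collected.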
run.started_at.isoformat(),
run.updated_at.isoformat(),
run.completed_at.isoformat() if run.completed_at else None,
Normalize persisted datetimes to UTC before storing them as TEXT.
started_at and created_at are later ordered with plain ORDER BY ... DESC, but src/synthorg/persistence/sqlite/schema.sql:377-401 stores them as TEXT. Writing arbitrary timezone offsets via .isoformat() means lexical order stops matching chronological order, so get_active_run() and the list queries can return the wrong “latest” row when offsets differ.
🕒 Proposed fix
+def _serialize_ts(value: datetime | None) -> str | None:
+ if value is None:
+ return None
+ dt = value if value.tzinfo is not None else value.replace(tzinfo=UTC)
+ return dt.astimezone(UTC).isoformat()
+
@@
- run.started_at.isoformat(),
- run.updated_at.isoformat(),
- run.completed_at.isoformat() if run.completed_at else None,
+ _serialize_ts(run.started_at),
+ _serialize_ts(run.updated_at),
+ _serialize_ts(run.completed_at),
@@
- run.updated_at.isoformat(),
- run.completed_at.isoformat() if run.completed_at else None,
+ _serialize_ts(run.updated_at),
+ _serialize_ts(run.completed_at),
@@
- checkpoint.created_at.isoformat(),
+            _serialize_ts(checkpoint.created_at),
Also applies to: 168-171, 207-210, 237-238, 334-334, 392-395
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/persistence/sqlite/fine_tune_repo.py` around lines 129 - 131,
The code persists run.started_at/created_at/updated_at/completed_at by calling
.isoformat() which preserves timezone offsets and breaks lexical ordering of
TEXT columns; update the persistence logic in fine_tune_repo.py to normalize
datetimes to UTC before formatting (e.g., call
.astimezone(datetime.timezone.utc) on run.started_at, run.created_at,
run.updated_at and run.completed_at when building the VALUES/parameters) and
then store the UTC ISO string (optionally with 'Z' for +00:00) so ORDER BY ...
DESC on the TEXT column matches chronological order; apply this change to all
places that serialize these fields (the code paths referencing run.started_at,
run.created_at, run.updated_at, run.completed_at and any insert/update helpers
used by get_active_run and list queries).
// Clear stale preflight when sourceDir changes.
useEffect(() => {
  useFineTuningStore.setState({ preflight: null })
}, [sourceDir])
🛠️ Refactor suggestion | 🟠 Major
Avoid direct setState for coordinated store fields.
Calling useFineTuningStore.setState({ preflight: null }) bypasses the store's action pattern. Per the store implementation, runPreflightCheck coordinates state updates with set({ loading: true, preflight: null, error: null }). Direct setState can race with WebSocket-driven updates and leaves loading/error in inconsistent states.
Consider adding a dedicated clearPreflight action to the store, or invoking an existing action that resets preflight state.
♻️ Proposed fix: Add store action
In web/src/stores/fine-tuning.ts, add a clearPreflight action:
clearPreflight: () => set({ preflight: null, error: null }),
Then in this component:
+ const { status, preflight, loading, startRun, cancelRun, runPreflightCheck, clearPreflight } =
useFineTuningStore(useShallow((s) => ({
status: s.status,
preflight: s.preflight,
loading: s.loading,
startRun: s.startRun,
cancelRun: s.cancelRun,
runPreflightCheck: s.runPreflightCheck,
+ clearPreflight: s.clearPreflight,
})))
// Clear stale preflight when sourceDir changes.
useEffect(() => {
- useFineTuningStore.setState({ preflight: null })
+ clearPreflight()
}, [sourceDir])Based on learnings: "The runPreflightCheck action properly manages preflight state using set({ loading: true, preflight: null, error: null }). Using direct setState() calls in components bypasses this coordinated state management and could overwrite concurrent updates from WebSocket handlers."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@web/src/pages/fine-tuning/PipelineControlPanel.tsx` around lines 41 - 44, The
component is directly calling useFineTuningStore.setState({ preflight: null })
which bypasses the store's coordinated actions and can create races; add a
clearPreflight action to the fine-tuning store (e.g., clearPreflight: () =>
set({ preflight: null, error: null })) alongside existing runPreflightCheck,
then replace the direct setState call in PipelineControlPanel with the new
action (invoke clearPreflight via useFineTuningStore) so preflight is cleared
through the store's action pattern.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/memory/embedding/test_fine_tune.py (1)
213-238: 🧹 Nitpick | 🔵 Trivial
Consider adding tests for other numeric parameter validations.
`contrastive_fine_tune` now validates `batch_size`, `learning_rate`, and `temperature` in addition to `epochs`. Consider adding tests for these validation paths:

async def test_rejects_zero_batch_size(self) -> None:
    with pytest.raises(ValueError, match="batch_size"):
        await contrastive_fine_tune(
            training_data_path="/data",
            base_model="test-model",
            output_dir="/output",
            batch_size=0,
        )

async def test_rejects_negative_learning_rate(self) -> None:
    with pytest.raises(ValueError, match="learning_rate"):
        await contrastive_fine_tune(
            training_data_path="/data",
            base_model="test-model",
            output_dir="/output",
            learning_rate=-0.001,
        )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/memory/embedding/test_fine_tune.py` around lines 213 - 238, Add unit tests to cover numeric validation paths for contrastive_fine_tune: inside TestContrastiveFineTune add async tests that assert ValueError for invalid batch_size, learning_rate, and temperature (e.g., test_rejects_zero_batch_size, test_rejects_negative_learning_rate, test_rejects_invalid_temperature) by calling contrastive_fine_tune with training_data_path, base_model, output_dir and the offending parameter (batch_size=0, learning_rate=-0.001, temperature values outside allowed range) and use pytest.raises(ValueError, match="batch_size"/"learning_rate"/"temperature") to verify the error messages reference the correct parameter names.
♻️ Duplicate comments (3)
src/synthorg/memory/embedding/fine_tune.py (1)
246-248: 🧹 Nitpick | 🔵 Trivial
Reminder: LLM-powered query generation is stubbed.
The `llm_provider` parameter is accepted but explicitly unused (`_ = llm_provider`). Stage 1 always uses the extractive fallback. This is documented but worth noting for future implementation.
Would you like me to help implement LLM-powered query generation when a provider is available?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/memory/embedding/fine_tune.py` around lines 246 - 248, The code currently accepts llm_provider but ignores it ("_ = llm_provider"), forcing Stage 1 to always use the extractive fallback; replace that stub with a conditional path that, when llm_provider is provided, invokes the provider to generate the query and otherwise falls back to the existing extractive logic. Specifically, remove the placeholder assignment and add logic to call the provider's generation method (e.g., llm_provider.generate_query or llm_provider.create_query) with the same inputs used by the extractive path, handle errors/timeouts and validate the returned query, and only use the extractive fallback when llm_provider is None or the provider call fails — keep the current extractive behavior as the fallback.
web/src/pages/fine-tuning/PipelineControlPanel.tsx (1)
41-44: ⚠️ Potential issue | 🟠 Major
Invalidate preflight against the full form state.
`buildRequest()` depends on `showAdvanced`, `epochs`, `learningRate`, and `effectiveBatchSize`, but Lines 41-44 only clear `preflight` when `sourceDir` changes. After an advanced-field edit, Line 95 can still enable or block Start with the previous result, and `runPreflightCheck()` in `web/src/stores/fine-tuning.ts:115-124` can repopulate that stale result after the form has changed. Track preflight against the full request before rendering `PreflightResultPanel` or using `can_proceed` to gate `startRun()`.
Also applies to: 48-72, 95-95, 103-103, 121-123
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/src/pages/fine-tuning/PipelineControlPanel.tsx` around lines 41 - 44, The preflight is only cleared on sourceDir changes (useEffect that calls useFineTuningStore.setState({ preflight: null })), but buildRequest() depends on showAdvanced, epochs, learningRate, and effectiveBatchSize so the preflight becomes stale after edits; update the invalidation logic to clear preflight whenever the full request shape changes (derive the full request via buildRequest() or include showAdvanced/epochs/learningRate/effectiveBatchSize in the useEffect dependency array) so PreflightResultPanel and can_proceed used by startRun() always reflect the current form, and ensure runPreflightCheck() in the fine-tuning store cannot repopulate an outdated preflight result after form edits.
web/src/pages/fine-tuning/RunHistoryTable.tsx (1)
83-89: ⚠️ Potential issue | 🟡 Minor
Normalize `seconds` before choosing the display unit.
`FineTuneRun.duration_seconds` is `number | null` in `web/src/api/endpoints/fine-tuning.ts`, so Line 84 can receive fractional values. Because the `< 60` check happens before rounding, values like `59.6` still render as `60s`. Round/clamp once up front and branch on that normalized value instead.
Proposed fix
 function formatDuration(seconds: number): string {
-  if (seconds < 60) return `${Math.round(seconds)}s`
-  if (seconds < 3600) return `${Math.floor(seconds / 60)}m`
-  const h = Math.floor(seconds / 3600)
-  const m = Math.floor((seconds % 3600) / 60)
+  const totalSeconds = Math.max(0, Math.round(seconds))
+  if (totalSeconds < 60) return `${totalSeconds}s`
+  if (totalSeconds < 3600) return `${Math.floor(totalSeconds / 60)}m`
+  const h = Math.floor(totalSeconds / 3600)
+  const m = Math.floor((totalSeconds % 3600) / 60)
   return `${h}h ${m}m`
 }
Verify each finding against the current code and only fix it if needed. In `@web/src/pages/fine-tuning/RunHistoryTable.tsx` around lines 83 - 89, formatDuration may receive fractional seconds from FineTuneRun.duration_seconds (number | null); normalize the input once at the top of formatDuration (e.g., clamp to >=0 and Math.round) and use that normalized value for all comparisons and outputs so values like 59.6 round to 60 and render as "1m". Update the function referenced as formatDuration to compute a normalized seconds variable and then use it for the <60, <3600 checks and for computing h/m/s.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 28f2d71f-24fe-495f-8ab7-0819bc979655
📒 Files selected for processing (5)
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
tests/unit/memory/embedding/test_fine_tune.py
web/src/pages/fine-tuning/PipelineControlPanel.tsx
web/src/pages/fine-tuning/RunHistoryTable.tsx
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Build Backend
- GitHub Check: Build Sandbox
- GitHub Check: Build Web
- GitHub Check: Test (Python 3.14)
- GitHub Check: Dashboard Test
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
web/src/{components,pages}/**/*.tsx
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/{components,pages}/**/*.tsx: Always reuse existing components from `web/src/components/ui/` before creating new ones; every page must compose from shared building blocks
Do not recreate status dots inline; use the `<StatusBadge>` component from `@/components/ui/status-badge`
Do not build card-with-header layouts from scratch; use the `<SectionCard>` component from `@/components/ui/section-card`
Do not create metric displays with `text-metric font-bold` inline; use the `<MetricCard>` component from `@/components/ui/metric-card`
Do not render initials circles manually; use the `<Avatar>` component from `@/components/ui/avatar`
Do not create complex (>8 line) JSX inside `.map()` blocks; extract to a shared component instead
Files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
web/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.{ts,tsx}: Use Tailwind semantic classes (e.g., `text-foreground`, `bg-card`, `text-accent`, `text-success`, `bg-danger`) or CSS variables (e.g., `var(--so-accent)`) for colors; never hardcode hex values in `.tsx`/`.ts` files
Use `font-sans` or `font-mono` for typography; never set `fontFamily` directly
Use density-aware tokens (e.g., `p-card`, `gap-section-gap`, `gap-grid-gap`) or standard Tailwind spacing; never hardcode pixel values for layout spacing
Use token variables (e.g., `var(--so-shadow-card-hover)`, `border-border`, `border-bright`) for shadows and borders; never hardcode values
Import `cn` from `@/lib/utils` for conditional class merging in component files
Do not use `rgba()` with hardcoded values; use design token variables instead
Do not use hardcoded Framer Motion transition durations; use presets from `@/lib/motion` instead
web/src/**/*.{ts,tsx}: Web dashboard: ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones. NEVER hardcode hex colors, font-family, pixel spacing, or Framer Motion transitions -- use design tokens and `@/lib/motion` presets.
Web dashboard: design token rules enforced by PostToolUse hook (`scripts/check_web_design_system.py`)
TypeScript/JavaScript: Node.js 22+, TypeScript 6.0+, ESLint with zero warnings enforced on `web/src/**/*.{ts,tsx}`
Web dashboard YAML parsing: use the `js-yaml` library (in package.json)
Files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
web/src/**/*.{ts,tsx,css}
📄 CodeRabbit inference engine (CLAUDE.md)
Web dashboard styling: Tailwind CSS 4 for utility classes; never hardcode pixel values or colors
Files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python 3.14+ required; no `from __future__ import annotations` (PEP 649 native lazy annotations)
Use PEP 758 except syntax: `except A, B:` (no parentheses) -- ruff enforces this on Python 3.14
Type hints required on all public functions; mypy strict mode enforced
Docstrings in Google style, required on all public classes and functions (enforced by ruff D rules)
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement.
For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` for field reassignment prevention and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state. Never mix static config fields with mutable runtime fields in one model.
Pydantic v2: use `allow_inf_nan=False` in all `ConfigDict` declarations to reject `NaN`/`Inf` in numeric fields at validation time
Use `@computed_field` for derived values in Pydantic models instead of storing + validating redundant fields (e.g. `TokenUsage.total_tokens`)
Use `NotBlankStr` (from `core.types`) for all identifier/name fields in Pydantic models -- including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants -- instead of manual whitespace validators
Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.
Line length: 88 characters (ruff enforced)
Functions must be < 50 lines, files < 800 lines
Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).
Files:
tests/unit/memory/embedding/test_fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Vendor names in test files must use `test-provider`, `test-small-001`, etc.
Test markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow`
Coverage minimum 80% enforced in CI
Async testing: `asyncio_mode = "auto"` in conftest -- no manual `@pytest.mark.asyncio` needed
Default timeout: 30 seconds per test (global in `pyproject.toml`). Non-default overrides like `timeout(60)` are allowed but avoid adding per-file markers.
Prefer `@pytest.mark.parametrize` for testing similar cases in Python tests
Property-based testing in Python: use Hypothesis (`@given` + `@settings`). Profiles in `tests/conftest.py`: `ci` (deterministic, 10 examples), `dev` (1000 examples), `fuzz` (10,000 examples, no deadline), `extreme` (500,000 examples). Controlled via `HYPOTHESIS_PROFILE` env var.
When Hypothesis finds a failure, fix the underlying bug and add an explicit `@example(...)` decorator to permanently cover the case in CI
Never skip, dismiss, or ignore flaky tests -- always fix them fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()`. For indefinite blocking, use `asyncio.Event().wait()` instead of `asyncio.sleep(large_number)`.
Files:
tests/unit/memory/embedding/test_fine_tune.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare `@settings()` decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the `HYPOTHESIS_PROFILE` env var controls example counts via registered profiles, which `@given()` honors automatically.
Files:
tests/unit/memory/embedding/test_fine_tune.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`
Never use `import logging`/`logging.getLogger()`/`print()` in application code. Exception: `observability/setup.py`, `observability/sinks.py`, `observability/syslog_handler.py`, and `observability/http_handler.py` may use stdlib `logging` and `print(..., file=sys.stderr)` for handler construction
Logger variable name must always be `logger` (not `_logger`, not `log`)
Event names: always use constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Structured kwargs in logging: always `logger.info(EVENT, key=value)` -- never `logger.info("msg %s", val)`
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging for object creation, internal flow, entry/exit of key functions
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`. Vendor names allowed only in: Operations design page, `.claude/` files, third-party import paths, and `src/synthorg/providers/presets.py`
Files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py
🧠 Learnings (48)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-27T22:32:26.927Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T22:32:26.927Z
Learning: Applies to web/src/components/ui/*.{tsx,ts} : For new shared React components: place in web/src/components/ui/ with kebab-case filename, create .stories.tsx with all states, export props as TypeScript interface, use design tokens exclusively
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/components/ui/**/*.stories.tsx : Create a `.stories.tsx` file alongside each new shared component in `web/src/components/ui/` with all states (default, hover, loading, error, empty)
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-30T10:20:08.544Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Always reuse existing components from web/src/components/ui/ (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast/ToastContainer, Skeleton variants, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup/StaggerItem, Drawer, form fields, TaskStatusIndicator, PriorityBadge, ProviderHealthBadge, TokenUsageBar, CodeMirrorEditor, SegmentedControl, ThemeToggle, LiveRegion, MobileUnsupportedOverlay, LazyCodeMirrorEditor) before creating new components
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-03-30T10:41:40.176Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/components/ui/**/*.{ts,tsx} : Create new shared components in `web/src/components/ui/` with `.stories.tsx` Storybook file covering all states (default, hover, loading, error, empty)
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not create complex (>8 line) JSX inside `.map()` blocks; extract to a shared component instead
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-04-02T12:21:16.739Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-02T12:21:16.739Z
Learning: Applies to web/src/**/*.{tsx,ts} : Do NOT create complex (>8 line) JSX inside `.map()` -- extract to a shared component
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-30T10:41:40.176Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/**/*.{ts,tsx} : Do NOT create complex (>8 line) JSX inside `.map()`; extract to a shared component
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-30T10:20:08.544Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Do NOT create complex (>8 line) JSX inside .map()—extract to a shared component
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-04-02T12:21:16.739Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-02T12:21:16.739Z
Learning: Applies to web/src/**/*.{tsx,ts} : ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast, Skeleton, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup, Drawer, InputField, SelectField, SliderField, ToggleField, TaskStatusIndicator, PriorityBadge, ProviderHealthBadge, TokenUsageBar, CodeMirrorEditor, SegmentedControl, ThemeToggle, LiveRegion, MobileUnsupportedOverlay, LazyCodeMirrorEditor, TagInput, MetadataGrid, ProjectStatusBadge, ContentTypeBadge)
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not recreate status dots inline; use the `<StatusBadge>` component from `@/components/ui/status-badge`
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-03-27T12:44:29.466Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T12:44:29.466Z
Learning: Applies to web/src/**/*.{ts,tsx} : Always reuse existing components from `web/src/components/ui/` (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast, Skeleton, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup/StaggerItem) before creating new ones
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-03-15T21:49:53.264Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T21:49:53.264Z
Learning: Fix everything valid — never skip when review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes). No deferring, no 'out of scope' skipping.
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-01T05:46:17.064Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T05:46:17.064Z
Learning: Fix all valid issues found by review agents (including pre-existing issues in surrounding code, suggestions, and adjacent findings) — never skip or defer
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-04T15:08:44.009Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: When review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes), fix them all. No deferring, no "out of scope" skipping.
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-04T15:08:44.009Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to web/src/**/*.{ts,tsx} : Web dashboard: ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones. NEVER hardcode hex colors, font-family, pixel spacing, or Framer Motion transitions -- use design tokens and `@/lib/motion` presets.
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-03-30T10:41:40.176Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/**/*.{ts,tsx} : ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones; refer to design system inventory (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, etc.)
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Always reuse existing components from `web/src/components/ui/` before creating new ones; every page must compose from shared building blocks
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-03-30T10:20:08.544Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Web dashboard shadows/borders: use token variables (var(--so-shadow-card-hover), border-border, border-bright); never hardcode shadow or border values
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not create metric displays with `text-metric font-bold` inline; use the `<MetricCard>` component from `@/components/ui/metric-card`
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/components/ui/**/*.tsx : Place new shared components in `web/src/components/ui/` with descriptive kebab-case filenames
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-03-27T12:44:29.466Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T12:44:29.466Z
Learning: Applies to web/src/stores/**/*.{ts,tsx} : Use Zustand stores in web dashboard for state management (auth, WebSocket, toast, analytics, domain shells)
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-03-30T10:41:40.176Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/**/*.{ts,tsx} : Use Zustand stores for state management in the web dashboard; each domain has its own store module (auth, WebSocket, toast, analytics, setup, company, agents, budget, tasks, settings, providers, theme, per-domain stores)
Applied to files:
web/src/pages/fine-tuning/PipelineControlPanel.tsx
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to tests/**/*.py : Test markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`. Coverage: 80% minimum. Async: `asyncio_mode = 'auto'` — no manual `pytest.mark.asyncio` needed. Timeout: 30 seconds per test. Parallelism: `pytest-xdist` via `-n auto` — ALWAYS include `-n auto` when running pytest, never run tests sequentially.
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/persistence/**/*.py : Persistence uses pluggable PersistenceBackend protocol. SQLite is the initial backend. Settings use SettingsRepository (namespaced settings CRUD).
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Persistence backend: pluggable PersistenceBackend protocol in `src/synthorg/persistence/`, SQLite initial, SettingsRepository (namespaced settings CRUD).
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to **/*.py : Use `except A, B:` (no parentheses) per PEP 758 exception syntax on Python 3.14
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (no parentheses) for exception handling — PEP 758 exception syntax enforced by ruff on Python 3.14
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (without parentheses) per PEP 758 for exception handling in Python 3.14
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to **/*.py : Use PEP 758 except syntax with `except A, B:` (no parentheses) for multiple exceptions—ruff enforces this on Python 3.14.
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14.
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-04-04T15:08:44.009Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: `except A, B:` (no parentheses) -- ruff enforces this on Python 3.14
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to **/*.py : Handle errors explicitly—never silently swallow exceptions.
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to **/*.py : Handle errors explicitly, never silently swallow exceptions
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Handle errors explicitly; never silently swallow exceptions
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`
Applied to files:
src/synthorg/persistence/sqlite/fine_tune_repo.py
📚 Learning: 2026-03-19T21:11:37.538Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T21:11:37.538Z
Learning: Applies to **/*.py : Errors: handle explicitly, never silently swallow.
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-04-02T21:38:30.127Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T21:38:30.127Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Handle errors explicitly; never silently swallow exceptions
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-26T15:18:16.848Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-26T15:18:16.848Z
Learning: Applies to src/**/*.py : Handle errors explicitly, never silently swallow them. All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-04-04T15:08:44.009Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to **/*.py : Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-16T20:14:00.937Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:14:00.937Z
Learning: Applies to **/*.py : Validate: at system boundaries (user input, external APIs, config files).
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files)
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-04-01T14:22:06.315Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T14:22:06.315Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Validate at system boundaries (user input, external APIs, config files)
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/providers/**/*.py : Providers: LLM provider abstraction (LiteLLM adapter), auth types (api_key/oauth/custom_header/none), presets (PROVIDER_PRESETS), runtime CRUD (ProviderManagementService with asyncio.Lock serialization), hot-reload via AppState swap.
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to {src/synthorg/**/*.py,tests/**/*.py,web/src/**/*.{ts,tsx}} : NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small`
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/backup/**/*.py : Backup package (backup/): scheduled/manual/lifecycle backups of persistence DB, agent memory, company config. BackupService orchestrator, BackupScheduler (periodic asyncio task), RetentionManager (count + age pruning), tar.gz compression, SHA-256 checksums, manifest tracking, validated restore with atomic rollback and safety backup. handlers/ subpackage: ComponentHandler protocol + concrete handlers (PersistenceComponentHandler, MemoryComponentHandler, ConfigComponentHandler)
Applied to files:
src/synthorg/memory/embedding/fine_tune.py
🔇 Additional comments (30)
src/synthorg/persistence/sqlite/fine_tune_repo.py (8)
152-159: Use PEP 758 except syntax without parentheses.

All `except (sqlite3.Error, aiosqlite.Error) as exc:` clauses should use the comma-separated form without parentheses per PEP 758 on Python 3.14.

```diff
- except (sqlite3.Error, aiosqlite.Error) as exc:
+ except sqlite3.Error, aiosqlite.Error as exc:
```

Apply this pattern to all occurrences throughout the file.

As per coding guidelines: "Use `except A, B:` syntax (no parentheses) for exception handling — PEP 758 exception syntax enforced by ruff on Python 3.14."

Also applies to: 169-176, 192-198, 229-235, 260-267, 295-301, 356-363, 376-383, 414-420, 450-459, 468-474, 516-523, 534-540
78-80: Use PEP 758 except syntax in row mappers.

The exception handling in `_run_from_row` and `_checkpoint_from_row` uses the parenthesized form. Apply the comma-separated PEP 758 syntax:

```diff
- except (ValueError, TypeError, json.JSONDecodeError) as exc:
+ except ValueError, TypeError, json.JSONDecodeError as exc:
```

As per coding guidelines: "Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14."

Also applies to: 107-109
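For context, both spellings catch the same exception set; only the surface syntax changes. A minimal sketch of the multi-exception wrap-and-reraise pattern (runnable on any Python 3 — the PEP 758 spelling requires 3.14, so it appears only as a comment; `parse_payload` is a hypothetical helper, not code from this PR):

```python
import json


def parse_payload(raw: str) -> dict:
    """Decode a JSON payload, wrapping all parse failures in one error type."""
    try:
        value = json.loads(raw)  # raises json.JSONDecodeError (a ValueError)
        if not isinstance(value, dict):
            raise TypeError(f"expected object, got {type(value).__name__}")
        return value
    # On Python 3.14 (PEP 758) the same clause can drop the parentheses:
    #     except ValueError, TypeError as exc:
    except (ValueError, TypeError) as exc:
        raise RuntimeError(f"bad payload: {exc}") from exc
```

The wrapped exception keeps the original as `__cause__`, so neither failure mode is silently swallowed.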
145-147: Normalize timestamps to UTC before storing.

The `.isoformat()` calls preserve arbitrary timezone offsets. Since timestamps are stored as `TEXT` and sorted lexically via `ORDER BY ... DESC`, different offsets will break chronological ordering.

🕒 Suggested helper:

```diff
+def _serialize_ts(value: datetime | None) -> str | None:
+    """Serialize datetime to UTC ISO-8601 string."""
+    if value is None:
+        return None
+    dt = value if value.tzinfo is not None else value.replace(tzinfo=UTC)
+    return dt.astimezone(UTC).isoformat()
```

Then replace all `.isoformat()` calls with `_serialize_ts()`.
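To see the hazard concretely: ISO-8601 strings with mixed offsets compare lexically in a different order than their real timestamps, so `ORDER BY ... DESC` over `TEXT` can return the wrong "latest" row.

```python
from datetime import datetime

# Two instants stored as ISO-8601 TEXT with different offsets.
a = "2026-01-01T10:00:00+05:00"  # 05:00 UTC, the earlier instant
b = "2026-01-01T06:00:00+00:00"  # 06:00 UTC, the later instant

lexical_latest = max(a, b)                            # what TEXT ORDER BY sees
real_latest = max(a, b, key=datetime.fromisoformat)   # actual chronology
```

The two disagree (`a` sorts last lexically but happened first), which is exactly what normalizing to UTC before `.isoformat()` prevents.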
119-120: Transaction boundaries need explicit synchronization.

Multiple coroutines sharing the same `aiosqlite.Connection` can interleave their `execute()`/`commit()` calls into the same transaction. Additionally, most write methods lack rollback on error (only `set_active` and `delete_checkpoint` roll back), leaving failed transactions to affect concurrent operations.

Consider adding a shared `asyncio.Lock` for all write operations on the same connection, and ensure consistent rollback on all error paths.
26-35: LGTM: Active stages tuple is well-defined.

The `_ACTIVE_STAGES` tuple correctly derives from the `FineTuneStage` enum, excluding terminal states (`IDLE`, `COMPLETE`, `FAILED`). This ensures `get_active_run()` and `mark_interrupted()` target the correct stage values.
40-56: LGTM: Timestamp parsing and row mappers with proper error handling.

The timestamp helpers (`_parse_ts`, `_parse_required_ts`) correctly handle UTC fallback for naive datetimes. The row mappers (`_run_from_row`, `_checkpoint_from_row`) now properly wrap parsing failures as `QueryError`, preventing raw exceptions from bubbling out.

Also applies to: 58-81, 83-109
423-459: LGTM: set_active uses proper atomic deactivate-then-activate with rollback.

The `set_active` method correctly:
- Deactivates all checkpoints
- Activates the target checkpoint
- Checks rowcount to verify the checkpoint exists
- Rolls back on both logical failure (checkpoint not found) and DB errors
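That pattern reduces to a short sketch against stdlib `sqlite3` (a simplified stand-in for the repo's async method, assuming a `checkpoints(id, is_active)` table):

```python
import sqlite3


def set_active(conn: sqlite3.Connection, checkpoint_id: str) -> None:
    """Deactivate-then-activate with a rowcount check and rollback."""
    try:
        conn.execute("UPDATE checkpoints SET is_active = 0")
        cur = conn.execute(
            "UPDATE checkpoints SET is_active = 1 WHERE id = ?", (checkpoint_id,)
        )
        if cur.rowcount == 0:  # logical failure: no such checkpoint
            conn.rollback()    # undo the blanket deactivation too
            raise LookupError(f"checkpoint {checkpoint_id!r} not found")
        conn.commit()
    except sqlite3.Error:
        conn.rollback()
        raise
```

Because both `UPDATE`s live in one transaction, a failed activation rolls back the deactivation as well, so the previously active checkpoint survives intact.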
476-523: LGTM: delete_checkpoint has TOCTOU protection.

The `DELETE ... WHERE id = ? AND is_active = 0` guard combined with the rowcount check properly handles the race condition where a checkpoint becomes active between the SELECT and DELETE. The rollback is correctly called before raising.

src/synthorg/memory/embedding/fine_tune.py (12)
712-718: Backup-read exception handling is overly broad.

The code correctly re-raises `MemoryError` and `RecursionError`, but catching generic `Exception` can hide real settings failures like `SettingsEncryptionError` or `QueryError` that should abort deployment rather than just skip the backup key.

Consider catching `SettingsError` (the base class for settings-related exceptions) instead of `Exception`.
1-14: LGTM: Module docstring accurately reflects implementation.

The docstring correctly documents the 5-stage pipeline and explicitly notes that synthetic data generation uses extractive fallback with the LLM path stubbed. This addresses the previous inconsistency concern.
61-76: LGTM: Lazy dependency imports with friendly errors.

The `_import_sentence_transformers()` and `_import_torch()` helpers properly:
- Attempt lazy import
- Log the missing dependency with structured event
- Raise `FineTuneDependencyError` with install instructions

Also applies to: 79-94
143-151: LGTM: UTF-8 decode errors handled explicitly.

The `_scan_documents` function now catches `UnicodeDecodeError` explicitly, logs with context, and skips the problematic file rather than silently dropping bytes with `errors="ignore"`.
196-227: LGTM: Validation split and minimum pairs properly validated.

The function now:
- Rejects `validation_split` outside the `(0, 1)` exclusive range
- Requires at least 2 query-document pairs
- Clamps `split_idx` to ensure both train and validation sets have at least 1 item
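The validation-and-clamp logic amounts to a few lines; a hypothetical sketch (`split_pairs` is illustrative — the real function also writes JSONL train/validation files):

```python
def split_pairs(pairs: list, validation_split: float) -> tuple[list, list]:
    """Split query-document pairs so neither side ends up empty."""
    if not 0 < validation_split < 1:
        raise ValueError("validation_split must be in (0, 1) exclusive")
    if len(pairs) < 2:
        raise ValueError("need at least 2 query-document pairs")
    split_idx = int(len(pairs) * (1 - validation_split))
    # Clamp so both train and validation keep at least one item,
    # even for tiny datasets or extreme split ratios.
    split_idx = max(1, min(split_idx, len(pairs) - 1))
    return pairs[:split_idx], pairs[split_idx:]
```

Without the clamp, `validation_split=0.99` on two pairs would leave the training set empty, which is exactly the degenerate case the review calls out.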
237-254: LGTM: Extractive query generation with proper bounds.

The `_generate_query` function now:
- Handles empty `first` by falling back to `chunk[:100]`
- Caps length to 200 characters
- Correctly documents that `llm_provider` is unused (forward compatibility)
260-342: LGTM: Hard negative mining implementation.

The function correctly:
- Lazy-imports sentence-transformers
- Encodes all passages and queries
- Selects hard negatives using a margin-based approach (95% of positive similarity)
- Checks cancellation periodically (every 50 queries)
- Reports progress incrementally
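The margin rule itself can be sketched on precomputed similarities (a hypothetical helper; the real code first encodes queries and passages with sentence-transformers, then applies this selection):

```python
def select_hard_negatives(
    pos_sim: float,
    candidate_sims: list[float],
    margin: float = 0.95,
    k: int = 3,
) -> list[int]:
    """Pick the hardest negatives: most similar candidates still below margin.

    Candidates at or above margin * pos_sim are excluded as likely
    false negatives (near-duplicates of the positive passage).
    """
    threshold = pos_sim * margin
    eligible = [
        (sim, idx) for idx, sim in enumerate(candidate_sims) if sim < threshold
    ]
    eligible.sort(reverse=True)  # hardest (highest similarity) first
    return [idx for _, idx in eligible[:k]]
```

For a positive similarity of 0.9, anything above 0.855 is treated as too close to the positive to be a safe negative, while the remaining candidates are ranked hardest-first.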
421-432: LGTM: All numeric training parameters validated.

The function now validates all numeric parameters before use:
- `epochs >= 1`
- `batch_size >= 1`
- `learning_rate > 0`
- `temperature > 0`

This addresses the previous concern about invalid values reaching the optimizer.
448-458: LGTM: Cancellation checked during training.

The `_progress_hook` callback checks cancellation every 10 steps, enabling cooperative cancellation during the training loop.
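The hook pattern looks roughly like this (the token class below is a simplified stand-in for the PR's `CancellationToken`, shown only to illustrate the every-N-steps check):

```python
class FineTuneCancelledError(Exception):
    """Raised when a fine-tune run is cancelled mid-flight."""


class CancellationToken:
    """Minimal cooperative-cancellation token (illustrative)."""

    def __init__(self) -> None:
        self._cancelled = False

    def cancel(self) -> None:
        self._cancelled = True

    def raise_if_cancelled(self) -> None:
        if self._cancelled:
            raise FineTuneCancelledError("run cancelled")


def progress_hook(step: int, token: CancellationToken, every: int = 10) -> None:
    """Check cancellation once every `every` training steps.

    Keeps the per-step overhead negligible while still bounding how long
    a cancel request can go unobserved during training.
    """
    if step % every == 0:
        token.raise_if_cancelled()
```

The training loop calls the hook each step; only every tenth call actually touches the token, so cancellation latency stays bounded at ten steps.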
552-587: LGTM: Cancellation checks between evaluation phases.

The function now checks cancellation between each expensive encoding phase:
- After loading models (line 552)
- After fine-tuned query encoding (line 562)
- After fine-tuned passage encoding (line 569)
- After base query encoding (line 579)
This addresses the previous concern about long-running evaluation not observing cancel requests.
736-742: LGTM: settings_service.set() call is correct.

The call passes only the three required positional arguments (`namespace`, `key`, `value`), which matches the `SettingsService.set()` signature. The optional `expected_updated_at` is not needed here.
720-734: LGTM: Backup file only written when deployment proceeds.

The early return at line 727 correctly avoids writing `backup_config.json` when no `settings_service` is available. The backup file is only written (lines 729-734) when deployment will actually update settings.

tests/unit/memory/embedding/test_fine_tune.py (8)
25-36: LGTM: Stage enum test includes EVALUATING.

The test correctly verifies all `FineTuneStage` values including the new `EVALUATING` stage.
41-78: LGTM: Helper function tests cover edge cases.

Tests for `_chunk_text`, `_generate_query`, and `_scan_documents` appropriately cover:
- Basic functionality
- Empty inputs
- Edge cases (empty files, non-text files)
100-134: LGTM: Training data generation tests are comprehensive.

Tests cover:
- Empty source directory error
- Correct JSONL output format with required fields (`query`, `positive_passage`)
- Validation split producing both training and validation files
154-169: LGTM: Cancellation test uses pre-cancelled token.

The test correctly verifies that a pre-cancelled `CancellationToken` raises `FineTuneCancelledError` immediately.
189-207: LGTM: Dependency error test properly mocks lazy import.

The test patches `_import_sentence_transformers` to raise `FineTuneDependencyError`, verifying the error propagates correctly without requiring the actual ML dependencies.
244-279: LGTM: Evaluation tests cover validation paths.

Tests verify:
- Blank `checkpoint_path` rejection
- Empty validation data raises `ValueError` with an "empty" message
301-311: LGTM: Deploy test verifies no backup without settings_service.

The test correctly asserts that:
- `deploy_checkpoint` returns `None` when no `settings_service` is provided
- No `backup_config.json` file is created in this case
317-337: LGTM: Metrics computation tests verify boundary conditions.

The tests correctly verify:
- Perfect ranking: Identity embeddings yield NDCG@10 = 1.0 and Recall@10 = 1.0 (each query's ground truth passage is at rank 1)
- Random embeddings: Scores are bounded within [0.0, 1.0]
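With binary relevance (one ground-truth passage per query), those boundary values are easy to reproduce from ranks alone. A hypothetical sketch (the real tests compute ranks from embedding similarities; these helpers assume the ranks are already known):

```python
import math


def recall_at_k(ranks: list[int], k: int = 10) -> float:
    """Fraction of queries whose ground-truth passage lands in the top k."""
    return sum(rank <= k for rank in ranks) / len(ranks)


def ndcg_at_k(ranks: list[int], k: int = 10) -> float:
    """Binary-relevance NDCG: the per-query ideal DCG is 1/log2(1+1) = 1,
    so each query contributes 1/log2(rank+1) when its passage is in the top k."""
    return sum(1.0 / math.log2(rank + 1) for rank in ranks if rank <= k) / len(ranks)
```

Identity embeddings put every ground-truth passage at rank 1, which yields exactly 1.0 for both metrics, and any rank assignment keeps both scores within [0.0, 1.0].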
web/src/pages/fine-tuning/RunHistoryTable.tsx (2)

20-76: Good decomposition of the row rendering.

Pulling the row markup out of `runs.map()` keeps the table body easy to scan, and reusing `StatusBadge` matches the existing UI primitives.

As per coding guidelines: "Do not create complex (>8 line) JSX inside `.map()` blocks; extract to a shared component instead" and "Do not recreate status dots inline; use the `<StatusBadge>` component from `@/components/ui/status-badge`".
9-18: Nice use of exhaustive stage maps.

Typing both lookup tables with `FineTuneStage` keeps the badge state and label rendering aligned if the backend adds a new stage.

Also applies to: 91-104
- Tests for batch_size=0, learning_rate<0, temperature=0 rejection
- formatDuration normalizes fractional seconds at the top (59.6 -> 1m)
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/memory/embedding/test_fine_tune.py`:
- Around line 227-277: Consolidate the four nearly identical tests into one
parametrized test using pytest.mark.parametrize: replace the separate async
test_rejects_* functions with a single async test (e.g.,
test_rejects_invalid_hyperparameters) that parametrizes tuples of (param_name,
invalid_value, match_string) for ("epochs", 0, "epochs"), ("batch_size", 0,
"batch_size"), ("learning_rate", -0.001, "learning_rate"), and ("temperature",
0.0, "temperature"), and inside the test call contrastive_fine_tune with the
given param_name set to invalid_value and assert with pytest.raises(ValueError,
match=match_string); keep imports of contrastive_fine_tune and ensure await is
used for the async call.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6e129d0f-a311-480a-9d4a-5b9d72ae0415
📒 Files selected for processing (2)
tests/unit/memory/embedding/test_fine_tune.pyweb/src/pages/fine-tuning/RunHistoryTable.tsx
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Dashboard Test
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
web/src/{components,pages}/**/*.tsx
📄 CodeRabbit inference engine (web/CLAUDE.md)
`web/src/{components,pages}/**/*.tsx`: Always reuse existing components from `web/src/components/ui/` before creating new ones; every page must compose from shared building blocks
Do not recreate status dots inline; use the `<StatusBadge>` component from `@/components/ui/status-badge`
Do not build card-with-header layouts from scratch; use the `<SectionCard>` component from `@/components/ui/section-card`
Do not create metric displays with `text-metric font-bold` inline; use the `<MetricCard>` component from `@/components/ui/metric-card`
Do not render initials circles manually; use the `<Avatar>` component from `@/components/ui/avatar`
Do not create complex (>8 line) JSX inside `.map()` blocks; extract to a shared component instead
Files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (web/CLAUDE.md)
`web/src/**/*.{ts,tsx}`: Use Tailwind semantic classes (e.g., `text-foreground`, `bg-card`, `text-accent`, `text-success`, `bg-danger`) or CSS variables (e.g., `var(--so-accent)`) for colors; never hardcode hex values in `.tsx`/`.ts` files
Use `font-sans` or `font-mono` for typography; never set `fontFamily` directly
Use density-aware tokens (e.g., `p-card`, `gap-section-gap`, `gap-grid-gap`) or standard Tailwind spacing; never hardcode pixel values for layout spacing
Use token variables (e.g., `var(--so-shadow-card-hover)`, `border-border`, `border-bright`) for shadows and borders; never hardcode values
Import `cn` from `@/lib/utils` for conditional class merging in component files
Do not use `rgba()` with hardcoded values; use design token variables instead
Do not use hardcoded Framer Motion transition durations; use presets from `@/lib/motion` instead
`web/src/**/*.{ts,tsx}`: Web dashboard: ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones. NEVER hardcode hex colors, font-family, pixel spacing, or Framer Motion transitions -- use design tokens and `@/lib/motion` presets.
Web dashboard: design token rules enforced by PostToolUse hook (`scripts/check_web_design_system.py`)
TypeScript/JavaScript: Node.js 22+, TypeScript 6.0+, ESLint with zero warnings enforced on `web/src/**/*.{ts,tsx}`
Web dashboard YAML parsing: use the `js-yaml` library (in package.json)
Files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/**/*.{ts,tsx,css}
📄 CodeRabbit inference engine (CLAUDE.md)
Web dashboard styling: Tailwind CSS 4 for utility classes; never hardcode pixel values or colors
Files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`**/*.py`: Python 3.14+ required; no `from __future__ import annotations` (PEP 649 native lazy annotations)
Use PEP 758 except syntax: `except A, B:` (no parentheses) -- ruff enforces this on Python 3.14
Type hints required on all public functions; mypy strict mode enforced
Docstrings in Google style, required on all public classes and functions (enforced by ruff D rules)
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement.
For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` for field reassignment prevention and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state. Never mix static config fields with mutable runtime fields in one model.
Pydantic v2: use `allow_inf_nan=False` in all `ConfigDict` declarations to reject `NaN`/`Inf` in numeric fields at validation time
Use `@computed_field` for derived values in Pydantic models instead of storing + validating redundant fields (e.g. `TokenUsage.total_tokens`)
Use `NotBlankStr` (from `core.types`) for all identifier/name fields in Pydantic models -- including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants -- instead of manual whitespace validators
Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.
Line length: 88 characters (ruff enforced)
Functions must be < 50 lines, files < 800 lines
Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).
Files:
tests/unit/memory/embedding/test_fine_tune.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`tests/**/*.py`: Vendor names in test files must use `test-provider`, `test-small-001`, etc.
Test markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow`
Coverage minimum 80% enforced in CI
Async testing: `asyncio_mode = "auto"` in conftest -- no manual `@pytest.mark.asyncio` needed
Default timeout: 30 seconds per test (global in `pyproject.toml`). Non-default overrides like `timeout(60)` are allowed but avoid adding per-file markers.
Prefer `@pytest.mark.parametrize` for testing similar cases in Python tests
Property-based testing in Python: use Hypothesis (`@given` + `@settings`). Profiles in `tests/conftest.py`: `ci` (deterministic, 10 examples), `dev` (1000 examples), `fuzz` (10,000 examples, no deadline), `extreme` (500,000 examples). Controlled via the `HYPOTHESIS_PROFILE` env var.
When Hypothesis finds a failure, fix the underlying bug and add an explicit `@example(...)` decorator to permanently cover the case in CI
Never skip, dismiss, or ignore flaky tests -- always fix them fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()`. For indefinite blocking, use `asyncio.Event().wait()` instead of `asyncio.sleep(large_number)`.
Files:
tests/unit/memory/embedding/test_fine_tune.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on `src/`. A bare `@settings()` decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the `HYPOTHESIS_PROFILE` env var controls example counts via registered profiles, which `@given()` honors automatically.
Files:
tests/unit/memory/embedding/test_fine_tune.py
🧠 Learnings (16)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-27T22:32:26.927Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T22:32:26.927Z
Learning: Applies to web/src/components/ui/*.{tsx,ts} : For new shared React components: place in web/src/components/ui/ with kebab-case filename, create .stories.tsx with all states, export props as TypeScript interface, use design tokens exclusively
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/components/ui/**/*.stories.tsx : Create a `.stories.tsx` file alongside each new shared component in `web/src/components/ui/` with all states (default, hover, loading, error, empty)
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-27T12:44:29.466Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T12:44:29.466Z
Learning: Applies to web/src/stores/**/*.{ts,tsx} : Use Zustand stores in web dashboard for state management (auth, WebSocket, toast, analytics, domain shells)
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-30T10:41:40.176Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/components/ui/**/*.{ts,tsx} : Create new shared components in `web/src/components/ui/` with `.stories.tsx` Storybook file covering all states (default, hover, loading, error, empty)
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-30T10:20:08.544Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Always reuse existing components from web/src/components/ui/ (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast/ToastContainer, Skeleton variants, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup/StaggerItem, Drawer, form fields, TaskStatusIndicator, PriorityBadge, ProviderHealthBadge, TokenUsageBar, CodeMirrorEditor, SegmentedControl, ThemeToggle, LiveRegion, MobileUnsupportedOverlay, LazyCodeMirrorEditor) before creating new components
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-27T12:44:29.466Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T12:44:29.466Z
Learning: Applies to web/src/**/*.{ts,tsx} : Always reuse existing components from `web/src/components/ui/` (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast, Skeleton, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup/StaggerItem) before creating new ones
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not create metric displays with `text-metric font-bold` inline; use the `<MetricCard>` component from `@/components/ui/metric-card`
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-31T14:31:11.894Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T14:31:11.894Z
Learning: Applies to web/src/**/*.{ts,tsx} : Use React 19, TypeScript 6.0+, and design system tokens from shadcn/ui + Tailwind CSS 4 + Radix UI in web dashboard
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-30T10:41:40.176Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/**/*.{ts,tsx} : Use Zustand stores for state management in the web dashboard; each domain has its own store module (auth, WebSocket, toast, analytics, setup, company, agents, budget, tasks, settings, providers, theme, per-domain stores)
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not recreate status dots inline; use the `<StatusBadge>` component from `@/components/ui/status-badge`
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-04-04T10:51:58.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not create complex (>8 line) JSX inside `.map()` blocks; extract to a shared component instead
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-04-02T12:21:16.739Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-02T12:21:16.739Z
Learning: Applies to web/src/**/*.{tsx,ts} : Do NOT create complex (>8 line) JSX inside `.map()` -- extract to a shared component
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-30T10:41:40.176Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/**/*.{ts,tsx} : Do NOT create complex (>8 line) JSX inside `.map()`; extract to a shared component
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-03-30T10:20:08.544Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Do NOT create complex (>8 line) JSX inside .map()—extract to a shared component
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
📚 Learning: 2026-04-02T12:21:16.739Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-02T12:21:16.739Z
Learning: Applies to web/src/**/*.{tsx,ts} : ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast, Skeleton, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup, Drawer, InputField, SelectField, SliderField, ToggleField, TaskStatusIndicator, PriorityBadge, ProviderHealthBadge, TokenUsageBar, CodeMirrorEditor, SegmentedControl, ThemeToggle, LiveRegion, MobileUnsupportedOverlay, LazyCodeMirrorEditor, TagInput, MetadataGrid, ProjectStatusBadge, ContentTypeBadge)
Applied to files:
web/src/pages/fine-tuning/RunHistoryTable.tsx
🔇 Additional comments (9)
web/src/pages/fine-tuning/RunHistoryTable.tsx (3)
20-47: Good row extraction and shared UI usage.
`RunRow` keeps the table rendering clean, and status rendering correctly reuses the shared `<StatusBadge>` component. As per coding guidelines: "Do not recreate status dots inline; use the `<StatusBadge>` component from `@/components/ui/status-badge`".
49-80: Table/empty-state flow is clean and maintainable.
Empty-state handling is straightforward, and the `.map()` body stays minimal by delegating row JSX to `RunRow`. As per coding guidelines: "Do not create complex (>8 line) JSX inside `.map()` blocks; extract to a shared component instead".
83-105: Formatting helpers look correct and consistent.
`formatDuration` avoids minute rollover edge cases, and stage label mapping remains explicit and easy to reason about.
tests/unit/memory/embedding/test_fine_tune.py (6)
32-35: Stage enum assertion update looks correct.
The added `EVALUATING` assertion matches the current pipeline lifecycle and keeps enum coverage complete.
41-79: Helper-function test coverage is solid.
The chunk/query/scan tests cover empty, nominal, and filtering behavior well without over-mocking.
100-170: Stage 1 tests now validate critical runtime behaviors well.
Great additions for no-document failure, artifact creation, progress-callback completion, and cooperative cancellation.
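The cooperative-cancellation pattern those tests exercise can be sketched with a minimal event-backed token (the class below is illustrative; the PR's actual CancellationToken API is assumed, not reproduced):

```python
import asyncio

# Illustrative sketch of cooperative cancellation: the worker checks the
# token between units of work instead of being forcibly interrupted.
class CancellationToken:
    def __init__(self) -> None:
        self._event = asyncio.Event()

    def cancel(self) -> None:
        self._event.set()

    @property
    def cancelled(self) -> bool:
        return self._event.is_set()


async def generate_pairs(token: CancellationToken, total: int = 100) -> int:
    produced = 0
    for _ in range(total):
        if token.cancelled:      # checked between units of work
            break
        produced += 1
        await asyncio.sleep(0)   # yield so cancel() can be observed promptly
    return produced


async def main() -> int:
    token = CancellationToken()
    task = asyncio.create_task(generate_pairs(token))
    await asyncio.sleep(0)       # let the worker make a little progress
    token.cancel()
    return await task


result = asyncio.run(main())
assert 0 <= result < 100         # stopped early once cancellation was requested
```

A test asserting the worker stops well short of `total` after `cancel()` is the deterministic shape the review praises, since no wall-clock timing is involved.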
298-318: Empty-validation-data guard test is well targeted.
Patching `_read_jsonl` to `[]` directly validates the intended failure path in evaluation.
330-350: Deploy tests correctly cover invalid-path and no-settings-service behavior.
The assertions align with current deploy semantics, including ensuring no backup artifact is created when settings updates are unavailable.
356-376: Metrics tests give good confidence on bounds and ideal ranking.
Both perfect-match and randomized-range assertions provide useful sanity coverage for `_compute_metrics`.
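For context on what the perfect-match and bounds assertions encode, here is a hedged sketch of single-query NDCG under binary relevance (the PR's `_compute_metrics` may differ in weighting and aggregation):

```python
import math

# Hedged sketch of NDCG@k for one query with binary relevance labels.
def ndcg_at_k(ranked_relevance: list[int], k: int) -> float:
    # Discounted cumulative gain of the actual ranking.
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_relevance[:k]))
    # Ideal DCG: same labels sorted best-first.
    ideal = sorted(ranked_relevance, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0


# Relevant document ranked first -> perfect score (the "ideal ranking" case).
assert ndcg_at_k([1, 0, 0], k=3) == 1.0
# Any ranking stays within [0, 1], matching the bounds assertions.
assert 0.0 <= ndcg_at_k([0, 1, 0], k=3) <= 1.0
```

The perfect-match case yields exactly 1.0 because DCG equals ideal DCG, which is why the tests can assert it without tolerance.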
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/memory/embedding/test_fine_tune.py`:
- Around line 248-252: The test uses a nonstandard base model name; update the
kwargs dictionary (variable kwargs used to pass "training_data_path",
"base_model", "output_dir") and replace the "base_model" value "test-model" with
the repository-standard test model name such as "test-small-001" so the test
follows the vendor/model naming convention.
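A minimal sketch of that rename (the surrounding kwargs values are illustrative, not the test's exact contents):

```python
# Illustrative kwargs for the training-stage call; only the base_model
# rename to the repo-standard vendor-agnostic name is the actual suggestion.
kwargs = {
    "training_data_path": "data/train.jsonl",  # assumed path for illustration
    "base_model": "test-small-001",            # was "test-model"
    "output_dir": "out/checkpoints",           # assumed path for illustration
}

assert kwargs["base_model"] == "test-small-001"
```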
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 247090f9-344c-4ce8-aceb-c3676a7a2497
📒 Files selected for processing (1)
tests/unit/memory/embedding/test_fine_tune.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Dashboard Test
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Sandbox
- GitHub Check: Build Web
- GitHub Check: Build Backend
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python 3.14+ required; no `from __future__ import annotations` (PEP 649 native lazy annotations)
Use PEP 758 except syntax: `except A, B:` (no parentheses) -- ruff enforces this on Python 3.14
Type hints required on all public functions; mypy strict mode enforced
Docstrings in Google style, required on all public classes and functions (enforced by ruff D rules)
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement.
For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` for field reassignment prevention and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state. Never mix static config fields with mutable runtime fields in one model.
Pydantic v2: use `allow_inf_nan=False` in all `ConfigDict` declarations to reject `NaN`/`Inf` in numeric fields at validation time
Use `@computed_field` for derived values in Pydantic models instead of storing + validating redundant fields (e.g. `TokenUsage.total_tokens`)
Use `NotBlankStr` (from `core.types`) for all identifier/name fields in Pydantic models -- including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants -- instead of manual whitespace validators
Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.
Line length: 88 characters (ruff enforced)
Functions must be < 50 lines, files < 800 lines
Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).
Files:
tests/unit/memory/embedding/test_fine_tune.py
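The deepcopy-at-construction plus `MappingProxyType` rule from the guidelines above can be sketched as follows (`ToolRegistry` and its fields are illustrative, not repo code):

```python
import copy
from types import MappingProxyType

# Sketch of "copy.deepcopy() at construction + MappingProxyType wrapping"
# for non-Pydantic internal collections.
class ToolRegistry:
    def __init__(self, tools: dict[str, str]) -> None:
        # Snapshot the input so later caller-side mutation cannot leak in.
        self._tools = MappingProxyType(copy.deepcopy(tools))

    @property
    def tools(self) -> "MappingProxyType[str, str]":
        return self._tools


source = {"search": "v1"}
registry = ToolRegistry(source)
source["search"] = "mutated"                 # caller mutation does not leak in
assert registry.tools["search"] == "v1"

try:
    registry.tools["search"] = "v2"          # read-only view rejects writes
except TypeError:
    pass
else:
    raise AssertionError("expected TypeError")
```

The proxy enforces read-only access at runtime while the deep copy guarantees the registry's view is independent of the caller's dict.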
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Vendor names in test files must use `test-provider`, `test-small-001`, etc.
Test markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow`
Coverage minimum 80% enforced in CI
Async testing: `asyncio_mode = "auto"` in conftest -- no manual `@pytest.mark.asyncio` needed
Default timeout: 30 seconds per test (global in `pyproject.toml`). Non-default overrides like `timeout(60)` are allowed but avoid adding per-file markers.
Prefer `@pytest.mark.parametrize` for testing similar cases in Python tests
Property-based testing in Python: use Hypothesis (`@given` + `@settings`). Profiles in `tests/conftest.py`: `ci` (deterministic, 10 examples), `dev` (1000 examples), `fuzz` (10,000 examples, no deadline), `extreme` (500,000 examples). Controlled via `HYPOTHESIS_PROFILE` env var.
When Hypothesis finds a failure, fix the underlying bug and add an explicit `@example(...)` decorator to permanently cover the case in CI
Never skip, dismiss, or ignore flaky tests -- always fix them fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()`. For indefinite blocking, use `asyncio.Event().wait()` instead of `asyncio.sleep(large_number)`.
Files:
tests/unit/memory/embedding/test_fine_tune.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases.
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-03-31T20:29:10.177Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T20:29:10.177Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to tests/**/*.py : Parametrize: Prefer pytest.mark.parametrize for testing similar cases.
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-04-04T15:08:44.009Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases in Python tests
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to tests/**/*.py : Test markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`. Coverage: 80% minimum. Async: `asyncio_mode = 'auto'` — no manual `pytest.mark.asyncio` needed. Timeout: 30 seconds per test. Parallelism: `pytest-xdist` via `-n auto` — ALWAYS include `-n auto` when running pytest, never run tests sequentially.
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
🔇 Additional comments (8)
tests/unit/memory/embedding/test_fine_tune.py (8)
32-33: Nice stage enum coverage update.
Adding the explicit `EVALUATING` assertion closes the gap for the new lifecycle stage.
41-79: Helper tests are concrete and behavior-focused.
Good shift from placeholder expectations to observable behavior for chunking, query fallback, and document scanning rules.
100-169: Stage-1 tests cover key operational paths well.
The no-documents, output generation, progress callback, and cooperative cancellation checks are all present and practical.
189-207: Dependency-failure path is correctly exercised.
Mocking `_import_sentence_transformers` to raise `FineTuneDependencyError` gives a clean, deterministic guard for missing ML deps.
227-241: Great consolidation with parametrized validation cases.
This removes repetitive tests while preserving per-parameter error assertions.
Based on learnings: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases.
261-296: Evaluation-stage negative-path tests look solid.
Blank checkpoint-path validation and empty validation-data handling are both now explicitly covered.
308-328: Deploy-stage edge cases are well covered.
Nonexistent checkpoint rejection and "no settings service" behavior (including no backup file side-effect) are tested clearly.
334-354: Metric tests provide good sanity guarantees.
Perfect-match and bounded-random cases together give useful confidence for NDCG/Recall outputs.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/memory/embedding/test_fine_tune.py`:
- Around line 341-343: Replace exact float equality checks in the test that
calls _compute_metrics by using pytest.approx: change "assert ndcg == 1.0" to
"assert ndcg == pytest.approx(1.0)" and "assert recall == 1.0" to "assert recall
== pytest.approx(1.0)"; if pytest is not already imported in
tests/unit/memory/embedding/test_fine_tune.py, add "import pytest" at the top so
the approx matcher is available.
- Around line 127-130: The current assertions use train_lines =
train_path.read_text().strip().split("\n") and val_lines =
val_path.read_text().strip().split("\n") which yields [""] for an empty file and
makes len(val_lines) > 0 a false positive; change the checks to filter out blank
lines (e.g., use .splitlines() then keep only lines where line.strip() != "")
and assert there is at least one non-blank line for both train_lines and
val_lines, or even better, attempt to json.loads the first non-blank line from
val_lines to ensure it is valid JSONL before asserting; update the variables
train_lines and val_lines accordingly and adjust the assertions to reflect the
filtered/parsed content.
- Around line 337-347: The tests import NumPy directly (see test functions like
test_random_embeddings_positive and the use of _compute_metrics that loads numpy
arrays), but NumPy is missing from the test optional dependencies; update the
test dependency list in pyproject.toml to include "numpy" in the test extras so
the imports in these tests succeed during CI and local runs.
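The first two suggestions above (blank-line-safe JSONL assertions and `pytest.approx` for float metrics) can be sketched together; the file contents and values here are illustrative:

```python
import json
import tempfile
from pathlib import Path

import pytest

# Sketch of the blank-line-safe JSONL assertions: filter before counting.
val_path = Path(tempfile.mkdtemp()) / "val.jsonl"
val_path.write_text('{"query": "q1", "positive": "p1"}\n\n')  # trailing blank line

val_lines = [ln for ln in val_path.read_text().splitlines() if ln.strip()]
assert len(val_lines) > 0                 # the [""] false positive is gone
record = json.loads(val_lines[0])         # first non-blank line is valid JSON
assert "query" in record

# Sketch of the tolerant float comparison for metric results.
ndcg = sum([0.2] * 5)                     # may carry binary floating-point error
assert ndcg == pytest.approx(1.0)
```

`pytest.approx` uses a relative tolerance (1e-6 by default), so metric values that accumulate floating-point error still compare equal to their ideal value.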
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5faee6d9-a1f1-4f38-946c-716d2f910706
📒 Files selected for processing (1)
tests/unit/memory/embedding/test_fine_tune.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Dashboard Test
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Web
- GitHub Check: Build Backend
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧠 Learnings (18)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/engine/coordination/**/*.py : Task coordination uses multi-agent pipeline with 4 dispatchers (SAS/centralized/decentralized/context-dependent), wave execution, and workspace lifecycle integration.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases.
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to tests/**/*.py : Parametrize: Prefer pytest.mark.parametrize for testing similar cases.
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-03-31T20:29:10.177Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T20:29:10.177Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-04-04T15:08:44.009Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases in Python tests
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to tests/**/*.py : Test markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`. Coverage: 80% minimum. Async: `asyncio_mode = 'auto'` — no manual `pytest.mark.asyncio` needed. Timeout: 30 seconds per test. Parallelism: `pytest-xdist` via `-n auto` — ALWAYS include `-n auto` when running pytest, never run tests sequentially.
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to tests/**/*.py : Never skip, dismiss, or ignore flaky tests; fix them fully and fundamentally by mocking `time.monotonic()` and `asyncio.sleep()` for determinism or using `asyncio.Event().wait()` for indefinite blocking
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-04-04T15:08:44.009Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to tests/**/*.py : Vendor names in test files must use `test-provider`, `test-small-001`, etc.
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to tests/**/*.py : Tests must use `test-provider`, `test-small-001`, etc. instead of real vendor names
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to {src,tests}/**/*.py : Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small` as aliases
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to tests/**/*.py : Tests must use test-provider, test-small-001, etc. for vendor-agnostic test data.
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to {src/synthorg/**/*.py,tests/**/*.py,web/src/**/*.{ts,tsx}} : NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small`
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to {src,tests,web,cli,site}/**/*.{py,ts,tsx,go,astro} : Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names.
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-04-04T15:08:44.009Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to src/synthorg/**/*.py : Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`. Vendor names allowed only in: Operations design page, `.claude/` files, third-party import paths, and `src/synthorg/providers/presets.py`
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
📚 Learning: 2026-03-26T15:18:16.848Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-26T15:18:16.848Z
Learning: Applies to src/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. Use `computed_field` for derived values instead of storing redundant fields. Use `NotBlankStr` for all identifier/name fields.
Applied to files:
tests/unit/memory/embedding/test_fine_tune.py
🔇 Additional comments (3)
tests/unit/memory/embedding/test_fine_tune.py (3)
32-33: Good enum contract coverage for the new `EVALUATING` stage. This keeps the test aligned with the expanded pipeline lifecycle.
154-169: Cancellation-path test is solid. Pre-cancelling `CancellationToken` and asserting `FineTuneCancelledError` gives good coverage for cooperative cancellation checks.
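A minimal sketch of the cooperative-cancellation contract that test exercises. The `CancellationToken` internals below are guesses; only the `CancellationToken` / `FineTuneCancelledError` names and the pre-cancel behaviour come from the review comment:

```python
import threading

class FineTuneCancelledError(Exception):
    """Raised when a pipeline stage observes a cancellation request."""

class CancellationToken:
    """Thread-safe flag that worker stages poll between units of work (hypothetical)."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def cancel(self) -> None:
        self._event.set()

    def raise_if_cancelled(self) -> None:
        if self._event.is_set():
            raise FineTuneCancelledError("fine-tune run cancelled")

# Pre-cancel the token, then confirm the next cooperative check aborts immediately,
# mirroring the shape of the test under review.
token = CancellationToken()
token.cancel()
try:
    token.raise_if_cancelled()
    outcome = "ran"
except FineTuneCancelledError:
    outcome = "cancelled"
```

Using `threading.Event` rather than a plain bool keeps the flag safe to set from the event loop while a worker thread polls it.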
227-255: Nice use of parametrization for invalid hyperparameters. This reduces duplication and keeps validation coverage concise and maintainable.
```python
import numpy as np


n = 5
embs = np.eye(n, dtype=np.float32)
ndcg, recall = _compute_metrics(embs, embs)
assert ndcg == 1.0
assert recall == 1.0


def test_random_embeddings_positive(self) -> None:
    import numpy as np
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
# Check dependency manifests for numpy/test extras declarations.
fd -a -t f 'pyproject.toml|setup.py|setup.cfg|requirements.*txt'
rg -n --iglob 'pyproject.toml' --iglob 'setup.py' --iglob 'setup.cfg' --iglob 'requirements*.txt' \
  '\bnumpy\b|optional-dependencies|extras_require|\[project.optional-dependencies\]|test'
# Confirm direct numpy imports in changed areas.
rg -n -C2 --type=py 'import numpy as np|from numpy' \
  tests/unit/memory/embedding/test_fine_tune.py \
  src/synthorg/memory/embedding/fine_tune.py
```

Repository: Aureliolo/synthorg
Length of output: 3987
Add NumPy to test dependencies.
Tests in this file import NumPy directly (lines 337, 346), but NumPy is not listed in the test optional dependency group in pyproject.toml. Tests will fail at import time without it. Add numpy to the test dependencies in pyproject.toml:
```toml
test = [
    "pytest==9.0.2",
    "pytest-asyncio==1.3.0",
    "pytest-cov==7.1.0",
    "pytest-mock==3.15.1",
    "pytest-timeout==2.4.0",
    "pytest-xdist==3.8.0",
    "numpy",  # Add this
]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/memory/embedding/test_fine_tune.py` around lines 337 - 347, The
tests import NumPy directly (see test functions like
test_random_embeddings_positive and the use of _compute_metrics that loads numpy
arrays), but NumPy is missing from the test optional dependencies; update the
test dependency list in pyproject.toml to include "numpy" in the test extras so
the imports in these tests succeed during CI and local runs.
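For orientation, here is a rough sketch of what a helper like `_compute_metrics` could compute. The real implementation is not shown in this diff, so the signature, the cosine-similarity ranking, and the cutoff `k=10` are all assumptions; only the `(ndcg, recall)` return pair and the identity-embedding result of `(1.0, 1.0)` come from the test snippet above:

```python
import numpy as np

def compute_metrics(query_embs: np.ndarray, doc_embs: np.ndarray, k: int = 10):
    """Hypothetical NDCG@k / recall@k where query i's only relevant doc is doc i."""
    # L2-normalise so the dot product is cosine similarity.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = q @ d.T

    # Rank of the relevant document for each query (0 = top of the ranking).
    ranks = np.array(
        [int(np.where(np.argsort(-row) == i)[0][0]) for i, row in enumerate(sims)]
    )
    hits = ranks < k
    # With a single relevant doc, NDCG@k reduces to 1/log2(rank + 2) on a hit.
    ndcg = float(np.mean(np.where(hits, 1.0 / np.log2(ranks + 2), 0.0)))
    recall = float(np.mean(hits))
    return ndcg, recall
```

With identity embeddings each query's relevant document ranks first, so both metrics come out as 1.0, matching the test's expectation.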
```python
ndcg, recall = _compute_metrics(embs, embs)
assert ndcg == 1.0
assert recall == 1.0
```
Use tolerant float assertions for metrics.
Exact equality on floating-point results can be brittle; prefer pytest.approx.
♻️ Proposed fix
```diff
- assert ndcg == 1.0
- assert recall == 1.0
+ assert ndcg == pytest.approx(1.0)
+ assert recall == pytest.approx(1.0)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/memory/embedding/test_fine_tune.py` around lines 341 - 343,
Replace exact float equality checks in the test that calls _compute_metrics by
using pytest.approx: change "assert ndcg == 1.0" to "assert ndcg ==
pytest.approx(1.0)" and "assert recall == 1.0" to "assert recall ==
pytest.approx(1.0)"; if pytest is not already imported in
tests/unit/memory/embedding/test_fine_tune.py, add "import pytest" at the top so
the approx matcher is available.
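As a quick standalone illustration (not project code) of why `pytest.approx` is the safer comparison for accumulated floats:

```python
import pytest

# Float accumulation drifts: ten 0.1 steps do not sum to exactly 1.0
# on IEEE-754 doubles.
ndcg = sum([0.1] * 10)

assert ndcg != 1.0                 # exact equality is brittle...
assert ndcg == pytest.approx(1.0)  # ...approx passes within the default rel tol 1e-6
```

`pytest.approx` defaults to a relative tolerance of 1e-6, which is loose enough for rounding drift but still tight enough to catch genuinely wrong metric values.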
Preview Deployment

URL: https://pr-1065.synthorg-pr-preview.pages.dev
Built from commit
🤖 I have created a release *beep* *boop*

---

## [0.6.2](v0.6.1...v0.6.2) (2026-04-06)

### Features

* add issue analyzer script for priority/scope management ([#1084](#1084)) ([1ccba27](1ccba27))
* config fixes and deferred improvements from PR [#1058](#1058) review ([#1067](#1067)) ([2cac2d3](2cac2d3)), closes [#1061](#1061) [#1060](#1060)
* cumulative risk-unit action budgets ([#806](#806)) and automated reporting ([#245](#245)) ([#1063](#1063)) ([4689816](4689816))
* fine-tuning pipeline + CompositeBackend + workflow lifecycle ([#1065](#1065)) ([85b05bc](85b05bc)), closes [#1001](#1001) [#850](#850) [#1058](#1058)
* memory consolidation upgrades (LLM Merge, Search-and-Ask, diversity penalty, distillation capture) ([#1071](#1071)) ([174e2be](174e2be)), closes [#704](#704)
* migrate web dashboard from Radix UI to Base UI, activate CSP nonce, rebuild org chart page, and fix agent routing ([#1083](#1083)) ([ebc6921](ebc6921))
* v0.7.0 engine foundations -- structured failure diagnosis + auditable decisions ([#1072](#1072)) ([d341d37](d341d37))
* workflow templates and versioning with diff and rollback ([#1069](#1069)) ([7af94de](7af94de)), closes [#1006](#1006) [#1008](#1008)

### Documentation

* unify REST API docs under /docs/openapi/ and patch sitemap ([#1073](#1073)) ([af19382](af19382))

### Maintenance

* bump hypothesis from 6.151.10 to 6.151.11 in the all group ([#1086](#1086)) ([3176318](3176318))
* bump nginxinc/nginx-unprivileged from `f99cc61` to `601c823` in /docker/web in the all group ([#1085](#1085)) ([5eb99ac](5eb99ac))
* bump the all group in /web with 3 updates ([#1087](#1087)) ([8deae44](8deae44))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
Composite PR implementing three features across Python backend, React dashboard, and docs:
Fine-tuning pipeline + dashboard (#1001)
- Background orchestrator via `asyncio.create_task` with SQLite state persistence, cooperative cancellation via `threading.Event`, WebSocket progress events, startup recovery
- `/settings/memory/fine-tuning`: pipeline control panel, live progress stepper/bar, checkpoint table with deploy/rollback/delete (ConfirmDialog), run history, dependency banner

CompositeBackend (#850)

- `CompanyMemoryConfig.composite` with bidirectional validation

Workflow execution lifecycle (#1058)
Test plan
Review coverage
Pre-reviewed by 18 specialized agents: docs-consistency, code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, type-design-analyzer, logging-audit, resilience-audit, conventions-enforcer, security-reviewer, frontend-reviewer, design-token-audit, api-contract-drift, persistence-reviewer, test-quality-reviewer, async-concurrency-reviewer, comment-analyzer, issue-resolution-verifier. 42 findings addressed across 6 Critical, 24 Major, 6 Medium, 5 Docs, 1 Docstring bundle.
Key fixes from review:
- Cancellation via `threading.Event` (thread-safe for `asyncio.to_thread` workers)
- `set_active` with `BEGIN IMMEDIATE` + existence verification
- `delete_checkpoint` via conditional DELETE
- `_read_jsonl` wrapped in `asyncio.to_thread` (was blocking event loop)
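The `set_active` and `delete_checkpoint` fixes above can be sketched as follows. The table shape and function signatures are assumptions; only the techniques — `BEGIN IMMEDIATE` plus an existence check, and a conditional DELETE that refuses to drop the active checkpoint — come from the review notes:

```python
import sqlite3

def set_active(conn: sqlite3.Connection, checkpoint_id: str) -> None:
    # BEGIN IMMEDIATE takes the write lock up front, so the existence check
    # and the flag flip are atomic with respect to concurrent writers.
    # Assumes an autocommit connection (isolation_level=None).
    conn.execute("BEGIN IMMEDIATE")
    try:
        exists = conn.execute(
            "SELECT 1 FROM checkpoints WHERE id = ?", (checkpoint_id,)
        ).fetchone()
        if exists is None:
            raise ValueError(f"unknown checkpoint: {checkpoint_id}")
        conn.execute("UPDATE checkpoints SET active = 0 WHERE active = 1")
        conn.execute(
            "UPDATE checkpoints SET active = 1 WHERE id = ?", (checkpoint_id,)
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

def delete_checkpoint(conn: sqlite3.Connection, checkpoint_id: str) -> bool:
    # Conditional DELETE: never remove the currently active checkpoint.
    cur = conn.execute(
        "DELETE FROM checkpoints WHERE id = ? AND active = 0", (checkpoint_id,)
    )
    return cur.rowcount == 1
```

The conditional DELETE returns False both for an unknown id and for the active checkpoint, so callers can surface a single "not deletable" error without a separate read.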
Closes #850
Closes #1058