feat: fine-tuning pipeline + CompositeBackend + workflow lifecycle by Aureliolo · Pull Request #1065 · Aureliolo/synthorg

Aureliolo · 2026-04-04T13:21:09Z

Summary

Composite PR implementing three features across Python backend, React dashboard, and docs:

Fine-tuning pipeline + dashboard (#1001)

5-stage ML pipeline: synthetic data generation, hard negative mining, contrastive fine-tuning (InfoNCE/MNRL), evaluation (NDCG@10/Recall@10), deploy with backup/rollback
Background orchestration: asyncio.create_task with SQLite state persistence, cooperative cancellation via threading.Event, WebSocket progress events, startup recovery
Checkpoint management API: list, deploy, rollback, delete with atomic set_active and TOCTOU-safe delete
Pre-flight validation: dependency check, GPU detection, document count validation
Dashboard page at /settings/memory/fine-tuning: pipeline control panel, live progress stepper/bar, checkpoint table with deploy/rollback/delete (ConfirmDialog), run history, dependency banner

CompositeBackend (#850)

Namespace-based routing: routes memory operations to different backends by namespace prefix
InMemoryBackend: session-scoped, dict-based, with per-agent limits and expiration
Configuration: YAML-driven via CompanyMemoryConfig.composite with bidirectional validation
Graceful degradation: fan-out retrieve continues on partial backend failures

Workflow execution lifecycle (#1058)

COMPLETED/FAILED transitions: TaskEngine observer mechanism bridges task events to workflow execution status
Workflow list page: card grid with search, type filter, create/duplicate/delete
Editor improvements: bidirectional YAML editing, structured condition builder (AND/OR), minimap, copy/paste

Test plan

13,862 Python tests pass (including 48 new: pipeline stages, orchestrator lifecycle, cancellation, persistence CRUD, migration schema, session store time-sensitivity fixes)
2,396 web dashboard tests pass
All linters clean: ruff, mypy (strict), ESLint (zero warnings), TypeScript

Review coverage

Pre-reviewed by 18 specialized agents: docs-consistency, code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, type-design-analyzer, logging-audit, resilience-audit, conventions-enforcer, security-reviewer, frontend-reviewer, design-token-audit, api-contract-drift, persistence-reviewer, test-quality-reviewer, async-concurrency-reviewer, comment-analyzer, issue-resolution-verifier. 42 findings addressed across 6 Critical, 24 Major, 6 Medium, 5 Docs, 1 Docstring bundle.

Key fixes from review:

CancellationToken uses threading.Event (thread-safe for asyncio.to_thread workers)
Atomic set_active with BEGIN IMMEDIATE + existence verification
TOCTOU-safe delete_checkpoint via conditional DELETE
_read_jsonl wrapped in asyncio.to_thread (was blocking event loop)
Pagination limits clamped to [1, 200]
ConfirmDialog on destructive checkpoint delete
ARIA attributes on progress bar and alert banner

Closes #1001
Closes #850
Closes #1058

coderabbitai · 2026-04-04T13:21:25Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: eea92bae-94cd-41c6-b378-e08787923ad2

📥 Commits

Reviewing files that changed from the base of the PR and between f7b9c40 and a514767.

📒 Files selected for processing (1)

tests/unit/memory/embedding/test_fine_tune.py

Walkthrough

Implements a complete embedding fine‑tuning subsystem and UI plus memory backend extensions. Adds a five‑stage fine‑tune pipeline, a background async FineTuneOrchestrator (start/resume/cancel/recover) with CancellationToken support, SQLite persistence and schema for runs and checkpoints, preflight checks, checkpoint CRUD and run-list admin endpoints, and WebSocket progress events. Expands memory models and observability events, adds CompositeBackend and InMemoryBackend with namespace routing, updates memory factory/mappers/config, refactors imports and logging, and adds extensive backend unit tests and frontend API, store, pages, and components for fine‑tuning.

Suggested labels

autorelease: tagged

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-04-04T13:23:26Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA a514767.

Ensure that dependencies are being submitted on PR branches. Re-running this action after a short time may resolve the issue. See the documentation for more information and troubleshooting advice.

Scanned Files

None

gemini-code-assist

Code Review

This pull request introduces a complete embedding fine-tuning pipeline and a composite memory backend supporting namespace-based routing. The implementation includes a background orchestrator for synthetic data generation, negative mining, contrastive training, and evaluation, along with SQLite repositories for persistence and a new web dashboard. Critical feedback identifies several Python 3 syntax errors in exception handling and performance issues related to blocking I/O in async methods and inefficient non-batched model encoding. Additionally, a potential stability issue regarding GPU memory exhaustion during model evaluation was noted.

gemini-code-assist · 2026-04-04T13:23:59Z

src/synthorg/api/controllers/memory.py

+    except MemoryError, RecursionError:
+        raise


Invalid exception handling syntax. In Python 3, except E, V: is a SyntaxError (it was Python 2 syntax for aliasing). To catch multiple exceptions, you must use a tuple: except (E1, E2):. Note that this same issue occurs at line 442 in this file.

except (MemoryError, RecursionError): raise

gemini-code-assist · 2026-04-04T13:23:59Z

src/synthorg/api/controllers/memory.py

+    except MemoryError, RecursionError:
+        raise


Invalid exception handling syntax. Catching multiple exceptions requires a tuple in Python 3.

except (MemoryError, RecursionError): raise

gemini-code-assist · 2026-04-04T13:23:59Z

src/synthorg/memory/backends/composite/adapter.py

+                    except MemoryError, RecursionError:
+                        raise


Invalid exception handling syntax. Catching multiple exceptions requires a tuple in Python 3. The current syntax will result in a SyntaxError.

Suggested change

except MemoryError, RecursionError:

raise

except (MemoryError, RecursionError):

raise

gemini-code-assist · 2026-04-04T13:23:59Z

src/synthorg/memory/embedding/fine_tune.py

+            except MemoryError, RecursionError:
+                raise


Invalid exception handling syntax. Catching multiple exceptions requires a tuple in Python 3.

Suggested change

except MemoryError, RecursionError:

raise

except (MemoryError, RecursionError):

raise

gemini-code-assist · 2026-04-04T13:23:59Z

src/synthorg/memory/embedding/fine_tune_orchestrator.py

+        except MemoryError, RecursionError:
+            raise


Invalid exception handling syntax. Catching multiple exceptions requires a tuple in Python 3.

except (MemoryError, RecursionError): raise

gemini-code-assist · 2026-04-04T13:23:59Z

src/synthorg/memory/embedding/fine_tune_orchestrator.py

+        except MemoryError, RecursionError:
+            raise


Invalid exception handling syntax. Catching multiple exceptions requires a tuple in Python 3.

except (MemoryError, RecursionError): raise

gemini-code-assist · 2026-04-04T13:23:59Z

src/synthorg/api/controllers/memory.py

+        data: FineTuneRequest,
+    ) -> ApiResponse[PreflightResult]:
+        """Run pre-flight validation checks."""
+        checks = list(_run_preflight_checks(data))


The _run_preflight_checks function performs blocking I/O (specifically rglob and file scanning in _check_documents). Calling it directly in an async controller method will block the event loop, degrading performance for all concurrent requests. This should be offloaded to a worker thread using asyncio.to_thread.

checks = await asyncio.to_thread(_run_preflight_checks, data)

gemini-code-assist · 2026-04-04T13:23:59Z

src/synthorg/memory/embedding/fine_tune.py

+    triples: list[dict[str, object]] = []
+    for i, query in enumerate(queries):
+        if cancellation is not None and i % 50 == 0:
+            cancellation.check()
+        query_emb = await asyncio.to_thread(
+            model.encode,
+            [query],
+            show_progress_bar=False,
+        )
+        sims = _cosine_similarities(query_emb[0], passage_embeddings)
+        positive_sim = sims[i]
+        margin = 0.95 * positive_sim
+        candidates = sorted(
+            ((j, s) for j, s in enumerate(sims) if j != i and s < margin),
+            key=lambda x: x[1],
+            reverse=True,
+        )[:top_k]
+        negatives = [passages[j] for j, _ in candidates]
+        triples.append(
+            {
+                "query": query,
+                "positive": passages[i],
+                "negatives": negatives,
+            },
+        )
+        if progress_callback:
+            progress_callback((i + 1) / len(queries))
+


Encoding queries one-by-one in a loop is highly inefficient. SentenceTransformer.encode is optimized for batch processing, which leverages vectorization and GPU parallelism. Moving the encoding step outside the loop will significantly reduce the overhead of thread spawning and model invocation.

query_embeddings = await asyncio.to_thread( model.encode, queries, show_progress_bar=False, ) triples: list[dict[str, object]] = [] for i, query_emb in enumerate(query_embeddings): if cancellation is not None and i % 50 == 0: cancellation.check() sims = _cosine_similarities(query_emb, passage_embeddings) positive_sim = sims[i] margin = 0.95 * positive_sim candidates = sorted( ((j, s) for j, s in enumerate(sims) if j != i and s < margin), key=lambda x: x[1], reverse=True, )[:top_k] negatives = [passages[j] for j, _ in candidates] triples.append( { "query": queries[i], "positive": passages[i], "negatives": negatives, }, ) if progress_callback: progress_callback((i + 1) / len(queries))

gemini-code-assist · 2026-04-04T13:23:59Z

src/synthorg/memory/embedding/fine_tune.py

+    finetuned = await asyncio.to_thread(
+        st.SentenceTransformer,
+        checkpoint_path,
+    )
+    base = await asyncio.to_thread(st.SentenceTransformer, base_model)
+


Loading both the fine-tuned model and the base model into memory simultaneously can lead to Out-Of-Memory (OOM) errors, especially on GPUs with limited VRAM. Consider loading the fine-tuned model, encoding the validation data, deleting the model instance (and clearing the GPU cache), and then repeating the process for the base model.

codecov · 2026-04-04T13:26:43Z

Codecov Report

❌ Patch coverage is 59.39643% with 592 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.69%. Comparing base (4689816) to head (f7b9c40).
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/synthorg/api/controllers/memory.py	11.36%	195 Missing ⚠️
src/synthorg/memory/embedding/fine_tune.py	44.69%	137 Missing and 9 partials ⚠️
...ynthorg/memory/embedding/fine_tune_orchestrator.py	53.07%	99 Missing and 8 partials ⚠️
src/synthorg/persistence/sqlite/fine_tune_repo.py	60.89%	65 Missing and 5 partials ⚠️
src/synthorg/memory/factory.py	7.69%	22 Missing and 2 partials ⚠️
src/synthorg/memory/embedding/fine_tune_models.py	82.85%	12 Missing and 6 partials ⚠️
src/synthorg/api/state.py	43.75%	9 Missing ⚠️
src/synthorg/memory/backends/composite/adapter.py	94.73%	6 Missing and 3 partials ⚠️
src/synthorg/memory/config.py	42.85%	6 Missing and 2 partials ⚠️
src/synthorg/memory/backends/inmemory/adapter.py	95.48%	3 Missing and 3 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1065      +/-   ##
==========================================
- Coverage   90.77%   89.69%   -1.08%     
==========================================
  Files         722      730       +8     
  Lines       41099    42539    +1440     
  Branches     4100     4267     +167     
==========================================
+ Hits        37306    38157     +851     
- Misses       3085     3636     +551     
- Partials      708      746      +38

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot

Pull request overview

This PR adds an end-to-end embedding fine-tuning feature (backend pipeline + persistence + admin APIs + dashboard UI), introduces a namespace-routed Composite memory backend (with a new session-scoped InMemoryBackend), and extends workflow-related UI/layout and backend lifecycle wiring.

Changes:

Add fine-tuning pipeline stages, orchestrator, persistence tables/repos, admin endpoints, and dashboard page with WebSocket-driven progress.
Add CompositeBackend + InMemoryBackend and introduce namespace support across memory models and Mem0 mapping/filtering.
Update assorted workflow UI spacing and add/extend tests for new persistence/memory/fine-tuning behaviors.

Reviewed changes

Copilot reviewed 52 out of 54 changed files in this pull request and generated 15 comments.

Show a summary per file

File	Description
web/src/stores/fine-tuning.ts	New Zustand store for fine-tuning status, CRUD actions, and WS updates.
web/src/router/routes.ts	Adds a Fine-Tuning route constant.
web/src/router/index.tsx	Registers FineTuningPage in the router.
web/src/pages/workflows/WorkflowCreateDrawer.tsx	Updates spacing class usage.
web/src/pages/workflow-editor/WorkflowNodeDrawer.tsx	Updates spacing class usage.
web/src/pages/FineTuningPage.tsx	New fine-tuning settings page with WS subscription and sections.
web/src/pages/fine-tuning/RunHistoryTable.tsx	New run history table UI.
web/src/pages/fine-tuning/PreflightResultPanel.tsx	New preflight results UI panel.
web/src/pages/fine-tuning/PipelineStepper.tsx	New 5-stage pipeline stepper UI.
web/src/pages/fine-tuning/PipelineProgressBar.tsx	New progress bar UI with ARIA attributes.
web/src/pages/fine-tuning/PipelineControlPanel.tsx	New run start/cancel + preflight controls (incl. “Advanced” section).
web/src/pages/fine-tuning/DependencyMissingBanner.tsx	New dependency-missing banner with install instructions.
web/src/pages/fine-tuning/CheckpointTable.tsx	New checkpoint table UI + deploy/rollback/delete actions.
web/src/components/layout/Sidebar.tsx	Adds Fine-Tuning nav entry in the sidebar.
web/src/api/types.ts	Extends WsEventType union with fine-tune event types.
web/src/api/endpoints/fine-tuning.ts	New API client module/types for fine-tuning endpoints.
web/CLAUDE.md	Updates web/ directory documentation to include fine-tuning domain.
tests/unit/persistence/sqlite/test_migrations.py	Updates expected SQLite tables/indexes for new fine-tune schema.
tests/unit/persistence/sqlite/test_fine_tune_repo.py	Adds unit tests for fine-tune run/checkpoint SQLite repositories.
tests/unit/memory/test_models.py	Adds tests for namespace defaults/validation + MemoryQuery.namespaces.
tests/unit/memory/embedding/test_fine_tune.py	Expands fine-tune stage tests (helpers, cancellation, metrics).
tests/unit/memory/embedding/test_fine_tune_orchestrator.py	Adds orchestrator lifecycle/cancel/recovery/status tests.
tests/unit/memory/embedding/test_fine_tune_models.py	Adds fine-tune model tests (request/status/config/metrics/preflight).
tests/unit/memory/backends/mem0/test_mappers.py	Adds namespace mapping/filtering tests for Mem0 backend helpers.
tests/unit/memory/backends/inmemory/test_adapter.py	Adds comprehensive tests for the new InMemoryBackend.
tests/unit/memory/backends/inmemory/init.py	Test package marker for inmemory backend tests.
tests/unit/memory/backends/composite/test_config.py	Adds tests for CompositeBackendConfig validation/defaults.
tests/unit/memory/backends/composite/test_adapter.py	Adds tests for CompositeBackend routing, fan-out, and prefixed IDs.
tests/unit/memory/backends/composite/init.py	Test package marker for composite backend tests.
tests/unit/api/auth/test_session_store.py	Stabilizes time-dependent session store tests via patched now().
src/synthorg/persistence/sqlite/schema.sql	Adds fine_tune_runs + fine_tune_checkpoints tables and indexes.
src/synthorg/persistence/sqlite/fine_tune_repo.py	Adds SQLite repositories for fine-tune runs and checkpoints.
src/synthorg/observability/events/memory.py	Adds new fine-tune + composite structured logging event constants.
src/synthorg/memory/models.py	Adds namespace fields to request/entry + namespaces filter to queries.
src/synthorg/memory/factory.py	Adds backend factory wiring for inmemory and composite backends.
src/synthorg/memory/errors.py	Adds FineTuneDependencyError and FineTuneCancelledError.
src/synthorg/memory/embedding/fine_tune.py	Implements pipeline stages + dependency gating + evaluation helpers.
src/synthorg/memory/embedding/fine_tune_orchestrator.py	Adds background orchestrator with cancellation + WS emits.
src/synthorg/memory/embedding/fine_tune_models.py	Adds fine-tune request/status/run/checkpoint/config/metrics/preflight models.
src/synthorg/memory/embedding/cancellation.py	Adds cooperative CancellationToken implementation.
src/synthorg/memory/config.py	Extends memory config for composite/inmemory backend selection + validation.
src/synthorg/memory/backends/mem0/mappers.py	Adds namespace metadata + post-filter support for namespaces.
src/synthorg/memory/backends/inmemory/adapter.py	Adds a session-scoped dict-based MemoryBackend implementation.
src/synthorg/memory/backends/inmemory/init.py	Exports InMemoryBackend.
src/synthorg/memory/backends/composite/config.py	Adds CompositeBackendConfig model.
src/synthorg/memory/backends/composite/adapter.py	Adds CompositeBackend routing/fan-out + prefixed IDs.
src/synthorg/memory/backends/composite/init.py	Exports CompositeBackend and CompositeBackendConfig.
src/synthorg/memory/backends/init.py	Exports composite/inmemory backends from backends package.
src/synthorg/engine/workflow/execution_observer.py	Adds logger wiring (minor).
src/synthorg/engine/agent_engine.py	Adjusts imports to be lazy for procedural proposer/pipeline.
src/synthorg/api/ws_models.py	Adds fine-tune WS event types to the backend enum.
src/synthorg/api/state.py	Adds FineTuneOrchestrator to AppState wiring.
src/synthorg/api/controllers/memory.py	Implements fine-tune orchestration endpoints + preflight + checkpoint APIs.
CLAUDE.md	Updates repository-level docs to reflect new memory backends + fine-tune repos.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-04T13:28:13Z

web/src/pages/FineTuningPage.tsx

+  const wsHandler = useCallback(
+    (event: { payload: Record<string, unknown> }) => {
+      handleWsEvent(event.payload)
+    },


WebSocket handlers registered via useWebSocketStore receive a full WsEvent (with event_type/channel/payload). This handler forwards only event.payload to the store, so fine-tuning events will be ignored because event_type is missing. Pass the entire event into the store (or change the store to accept WsEvent and read event.event_type + event.payload).

Copilot · 2026-04-04T13:28:13Z

web/src/stores/fine-tuning.ts

+  handleWsEvent: (payload) => {
+    const eventType = payload.event_type as string | undefined
+    if (!eventType?.startsWith('memory.fine_tune.')) return
+
+    const data = (payload.payload ?? payload) as Record<string, unknown>
+    const currentStatus = get().status
+
+    if (eventType === 'memory.fine_tune.progress') {
+      set({


handleWsEvent currently expects a plain payload object and reads payload.event_type, but WebSocket dispatch passes a WsEvent object (event_type/channel/timestamp/payload). Align this store with other stores (e.g. tasks/meetings) by accepting WsEvent and extracting event.event_type and event.payload; otherwise fine-tuning WS updates will not apply.

Copilot · 2026-04-04T13:28:13Z

web/src/pages/fine-tuning/PipelineControlPanel.tsx

+  const handlePreflight = () => {
+    const request: StartFineTuneRequest = { source_dir: sourceDir }
+    void runPreflightCheck(request)
+  }
+
+  const handleStart = () => {
+    const request: StartFineTuneRequest = { source_dir: sourceDir }
+    void startRun(request)
+  }


The “Advanced Options” inputs are currently non-functional: their values are hard-coded and their onChange handlers are no-ops, and start/preflight requests only include source_dir. Either wire these fields into component state and include the overrides in StartFineTuneRequest, or remove/disable the UI until supported to avoid misleading users.

Copilot · 2026-04-04T13:28:14Z

src/synthorg/api/controllers/memory.py

+        if torch.cuda.is_available():
+            props = torch.cuda.get_device_properties(0)
+            vram_gb = props.total_mem / (1024**3)
+            return PreflightCheck(
+                name="gpu",
+                status="pass",
+                message=f"GPU available: {props.name}",
+                detail=f"VRAM: {vram_gb:.1f} GB",
+            )


GPU VRAM detection uses props.total_mem, but torch.cuda.get_device_properties() exposes total_memory. As written this will raise AttributeError on GPU hosts and incorrectly report GPU status. Use the correct attribute when computing vram_gb.

Copilot · 2026-04-04T13:28:14Z

src/synthorg/api/controllers/memory.py

+        import torch  # noqa: PLC0415
+
+        if not torch.cuda.is_available():
+            return 16
+        props = torch.cuda.get_device_properties(0)
+        vram_gb = props.total_mem / (1024**3)
+        if vram_gb >= 40:  # noqa: PLR2004
+            return 128
+        if vram_gb >= 16:  # noqa: PLR2004
+            return 64
+        if vram_gb >= 8:  # noqa: PLR2004
+            return 32


Batch size recommendation also uses props.total_mem (nonexistent) instead of torch’s total_memory, which will make recommendations fail on GPU hosts. Use total_memory (and keep the same GB thresholds).

Copilot · 2026-04-04T13:28:15Z

src/synthorg/api/controllers/memory.py

+        checks = list(_run_preflight_checks(data))
+        batch_size = _recommend_batch_size()
+        result = PreflightResult(
+            checks=tuple(checks),
+            recommended_batch_size=batch_size,
+        )
+        logger.info(
+            MEMORY_FINE_TUNE_PREFLIGHT_COMPLETED,
+            can_proceed=result.can_proceed,
+            check_count=len(checks),
+        )
+        return ApiResponse(data=result)


Preflight currently only runs dependencies/GPU/doc-count checks. The PR description and issue requirements also call for disk space estimation (warn/fail with actionable detail). Consider adding a disk-space check (e.g. using shutil.disk_usage on output_dir/source_dir filesystem) and including it in the returned checks list.

Copilot · 2026-04-04T13:28:15Z

src/synthorg/api/controllers/memory.py

+def _check_gpu() -> PreflightCheck:
+    """Best-effort GPU availability check."""
+    try:
+        import torch  # type: ignore[import-not-found]  # noqa: PLC0415
+
+        if torch.cuda.is_available():
+            props = torch.cuda.get_device_properties(0)
+            vram_gb = props.total_mem / (1024**3)
+            return PreflightCheck(
+                name="gpu",
+                status="pass",
+                message=f"GPU available: {props.name}",
+                detail=f"VRAM: {vram_gb:.1f} GB",
+            )
+        return PreflightCheck(
+            name="gpu",
+            status="warn",
+            message="No GPU detected -- training will be slow",
+            detail="CPU-only mode",
+        )
+    except ImportError:


_check_gpu() returns status="warn" when no GPU is detected, but the fine-tuning requirements state stages 2–3 require GPU and should fail early with a clear message. Consider making this a "fail" (or at least gate can_proceed) when torch is present but cuda.is_available() is false, to avoid starting runs that will predictably be impractical/unsupported.

Copilot · 2026-04-04T13:28:15Z

src/synthorg/memory/embedding/fine_tune_orchestrator.py

+    async def cancel(self) -> None:
+        """Cancel the active pipeline run."""
+        if self._cancellation is not None:
+            self._cancellation.cancel()
+            logger.info(MEMORY_FINE_TUNE_CANCELLED)
+
+    async def recover_interrupted(self) -> int:
+        """Mark interrupted runs as FAILED on startup."""
+        return await self._run_repo.mark_interrupted()
+


Startup recovery isn’t actually enforced yet: recover_interrupted() exists but there are no call sites in the codebase, so RUNNING stages may remain stuck after a restart. Wire this into API startup (after repositories are available) so active runs are marked FAILED with "interrupted by restart" as intended.

Copilot · 2026-04-04T13:28:16Z

src/synthorg/memory/embedding/fine_tune.py

+    split_idx = max(1, int(len(all_pairs) * (1 - validation_split)))
+    training = all_pairs[:split_idx]
+    validation = all_pairs[split_idx:]
+


generate_training_data() can produce an empty validation set (e.g. when len(all_pairs) is small, split_idx becomes len(all_pairs)), which will later make evaluate_checkpoint() fail with “Validation data is empty”. Consider clamping split_idx to ensure both splits are non-empty (e.g. 1 <= split_idx <= len(all_pairs)-1) or explicitly validate a minimum number of pairs up front with a clear error.

Copilot · 2026-04-04T13:28:16Z

src/synthorg/memory/embedding/fine_tune.py

+    for i, query in enumerate(queries):
+        if cancellation is not None and i % 50 == 0:
+            cancellation.check()
+        query_emb = await asyncio.to_thread(
+            model.encode,
+            [query],
+            show_progress_bar=False,
+        )
+        sims = _cosine_similarities(query_emb[0], passage_embeddings)


mine_hard_negatives() encodes each query individually inside the loop (model.encode([query]) per item), which can be extremely slow for large datasets. Consider batching query encodes (encode all queries once) and then computing similarities, or at least encoding queries in chunks, to keep runtime feasible.

coderabbitai

Actionable comments posted: 33

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CLAUDE.md`:
- Around line 101-103: The documentation contradicts itself about available
memory backends: update the sentence that currently states "mem0 is the
currently only backend" to reflect the current reality described under the
memory/ and backends/ listings—either state that multiple backends are supported
(e.g., mem0, Qdrant+SQLite, inmemory, composite routing) or explicitly mark mem0
as the default/legacy backend; edit the CLAUDE.md section that mentions "mem0 is
the 'currently only backend'" to reference the backends/ subsection and align
wording (e.g., "mem0 (default/legacy) and other backends such as Qdrant+SQLite,
inmemory and composite routing are supported") so the memory/ and backends/
descriptions are consistent.

In `@src/synthorg/api/controllers/memory.py`:
- Around line 160-164: The preflight response omits the disk-space gate: ensure
you include the disk-space check result in the checks list before constructing
PreflightResult so can_proceed reflects storage limits. Call the existing
disk-space check helper (e.g. _run_disk_space_check or _disk_space_check) and
append its result to checks (the list produced by _run_preflight_checks()) prior
to building the tuple passed to PreflightResult (the same change must be applied
in both construction sites around the PreflightResult usage at lines shown).
Verify the aggregated can_proceed logic consumes that disk check result.
- Around line 226-244: Rollback currently only deactivates checkpoints and never
reapplies the saved backup config; after fetching cp with
SQLiteFineTuneCheckpointRepository.get_checkpoint and verifying
cp.backup_config_json you must deserialize that JSON and apply it to the live
deployment/config store (e.g., via a new or existing repository method such as
SQLiteFineTuneCheckpointRepository.apply_backup_config or
activate_checkpoint_with_config), ensure checkpoint active flags are updated
(instead of only calling deactivate_all) and persist the change, then re-query
with get_checkpoint and return the updated record; if no helper exists add a
repository method that accepts checkpoint_id and backup_config_json to perform
the config restore and activation.
- Around line 78-86: The handlers dereference app_state.fine_tune_orchestrator
unconditionally (e.g., the block that sets orchestrator =
app_state.fine_tune_orchestrator and calls await orchestrator.start(data));
guard this optional field before use and return the same controlled API error
used by get_fine_tune_status when fine-tuning is unavailable. Update each
affected handler (the start/trigger block around orchestrator.start, and the
similar uses at the other two locations referenced) to check if
app_state.fine_tune_orchestrator is None and if so return/raise the standardized
API error response; otherwise proceed to assign to orchestrator and call its
methods.
- Around line 160-161: The preflight probes run heavy synchronous work inside an
async handler: move both _run_preflight_checks() and _recommend_batch_size() off
the event loop by running them in threads and coordinating them with structured
concurrency; in the async controller handler, use asyncio.TaskGroup to launch
asyncio.to_thread(_run_preflight_checks, data) and
asyncio.to_thread(_recommend_batch_size) in parallel, await the TaskGroup,
collect the results into checks and batch_size, and propagate any exceptions so
the handler remains non-blocking.
- Around line 421-428: Update both _check_gpu and _recommend_batch_size to
access props.total_memory instead of props.total_mem when reading VRAM; in
addition, change _recommend_batch_size's exception handling to mirror
_check_gpu's pattern by re-raising system exceptions (MemoryError,
RecursionError) and only swallowing other exceptions (returning None) so system
errors propagate while other failures produce a None recommendation. Ensure you
modify the references to props and the except block around the computation in
_recommend_batch_size accordingly.

In `@src/synthorg/api/state.py`:
- Around line 389-395: set_fine_tune_orchestrator currently overwrites an
existing orchestrator silently; change it to first check
self._fine_tune_orchestrator and if it's already set log a WARNING/ERROR with
context and raise an exception to prevent silent reconfiguration, otherwise set
it and emit an INFO-level lifecycle transition log indicating the old->new
wiring; update the method (set_fine_tune_orchestrator) to follow the same
logging/guard pattern used by the other set_* lifecycle methods.

In `@src/synthorg/memory/backends/composite/adapter.py`:
- Around line 176-181: Add a brief clarifying comment above the nested async
function _check (and/or where backend: MemoryBackend = b is defined) explaining
that using a default parameter captures the current value of b to avoid Python's
late-binding closure issue; reference the async helper _check, the default arg
backend: MemoryBackend = b, results.append(await backend.health_check()), and
tg.create_task(_check()) so future maintainers understand this pattern and why
the default parameter is required.

In `@src/synthorg/memory/backends/inmemory/adapter.py`:
- Around line 336-346: The clear() method is synchronous while other CRUD
methods (store, retrieve, get, delete, count) are async; change clear to an
async method (async def clear(self, agent_id: NotBlankStr) -> int) and call
await self._require_connected() at the start for consistency with other
operations, then pop the agent store (self._store.pop(str(agent_id), {})) and
return its length as before so behavior is unchanged but the API and connection
checks match the other methods.
- Around line 81-87: The disconnect method lacks the same _connect_lock
protection used by connect, creating a race between connect() and disconnect();
wrap the body of disconnect (the checks and setting of self._connected and
logger calls—MEMORY_BACKEND_DISCONNECTING and MEMORY_BACKEND_DISCONNECTED) in
the same async lock acquisition used by connect (i.e., acquire the instance's
_connect_lock via async context or explicit acquire/release), perform the
idempotent check and state change while holding the lock, and then release the
lock so connect() and disconnect() are symmetric and thread-safe.

In `@src/synthorg/memory/embedding/cancellation.py`:
- Around line 7-45: The module lacks structured logging: import get_logger from
synthorg.observability and create logger = get_logger(__name__), then add
logger.info in CancellationToken.cancel to record the state transition (include
context like token id or class name), add logger.debug or info in the
is_cancelled property access if desired, and add logger.warning or logger.error
in CancellationToken.check immediately before raising FineTuneCancelledError
(include the same context and the message "Fine-tuning pipeline run was
cancelled"); update references to CancellationToken.cancel, .is_cancelled, and
.check to ensure logs are emitted where state changes or errors occur.

In `@src/synthorg/memory/embedding/fine_tune_models.py`:
- Around line 245-294: The FineTuneRun model allows inconsistent terminal-state
combinations (e.g., stage == FineTuneStage.FAILED with error None, terminal
stages with completed_at None, or non-terminal stages having completed_at set);
add a model_validator(mode="after") on FineTuneRun that enforces these
invariants by checking the stage, error, completed_at, and stages_completed
fields and raising ValueError with clear messages when invariants are violated
(examples: when stage.is_terminal() then ensure completed_at is not None; when
stage == FineTuneStage.FAILED ensure error is not None; when not
stage.is_terminal() ensure completed_at is None; validate that stages_completed
only contains stages prior to current stage and that progress is None for
terminal stages or within 0..1 otherwise). Ensure the validator runs after
initialization and references FineTuneRun, stage, error, completed_at,
stages_completed, and progress to locate the code to change.

In `@src/synthorg/memory/embedding/fine_tune_orchestrator.py`:
- Around line 197-201: The cancel() method currently signals cancellation via
self._cancellation.cancel() but returns before the background task finishes;
update cancel() to await the task stored in self._cancellation (e.g., await
self._cancellation) so callers know the run has stopped, and handle/ignore
asyncio.CancelledError and optional timeout using asyncio.wait_for if needed,
then log MEMORY_FINE_TUNE_CANCELLED after the awaited task completes; reference
the existing cancel(), self._cancellation, and MEMORY_FINE_TUNE_CANCELLED
symbols when making this change.
- Around line 457-483: The progress callback _cb mutates shared state
self._current_run from threads; either document the thread-safety assumption or
make updates thread-safe by scheduling the mutation on the event loop or using a
lock: inside _cb (returned by _make_progress_cb) avoid direct assignment from
worker threads and instead obtain the main asyncio loop and call
loop.call_soon_threadsafe(...) to perform the model_copy + self._current_run
assignment and _emit_ws, or protect the assignment with a threading.Lock; update
the code around references to self._current_run, run_id, run, and _emit_ws
accordingly and add a short comment describing the chosen approach.

In `@src/synthorg/memory/embedding/fine_tune.py`:
- Around line 224-226: The extractive-query fallback in the function that builds
the query uses sentences = chunk.split(".") and first = sentences[0].strip()
which can yield an empty string or an overly long query; update the logic to
trim chunk first, check if the trimmed chunk is empty and return a safe default
(e.g., "Find information about: [no content]") or a short placeholder, and if
not empty use the first sentence but cap its length (e.g., first_sentence[:100])
so the returned string from the function (the "Find information about: {first}"
return) is neither empty nor unbounded; reference variables sentences, first and
chunk when applying these checks.

In `@src/synthorg/persistence/sqlite/schema.sql`:
- Around line 418-426: The fine_tune_checkpoints table allows arbitrary integers
and multiple rows marked active; change the column definition for is_active to
enforce boolean values (INTEGER NOT NULL DEFAULT 0 CHECK(is_active IN (0,1)))
and add a schema-level uniqueness constraint for the single-active invariant by
creating a partial unique index that enforces at most one row with is_active=1
(e.g., CREATE UNIQUE INDEX ... ON fine_tune_checkpoints(is_active) WHERE
is_active=1); update the existing idx_ftc_active to the new CHECK and add the
partial unique index (referencing the table fine_tune_checkpoints, column
is_active and index name idx_ftc_active / new unique index name) so the DB
prevents invalid values and multiple active rows.

In `@tests/unit/memory/backends/composite/test_adapter.py`:
- Around line 293-384: The suite lacks a degraded fan-out test: add a new async
test method (e.g., test_retrieve_degraded_fanout) in TestRetrieveFanout that
constructs a CompositeBackend with one child backend whose retrieve() raises and
one healthy child that returns entries, then call connected.retrieve (or
CompositeBackend.retrieve) with an appropriate MemoryQuery and assert the
composite returns the healthy child's entries (ignoring the failing child) and
preserves merge ordering across children; implement the failing child as a small
stub/mock backend whose retrieve raises an exception, register both children
into the CompositeBackend used by the test, and assert results contain only the
healthy contents and are ordered as expected.

In `@tests/unit/memory/backends/inmemory/test_adapter.py`:
- Around line 229-245: Add tests that verify TTL/expiration behavior by mocking
time.monotonic() and asyncio.sleep() around existing tests like
test_get_missing_returns_none and test_namespace_preserved: store an entry with
a short TTL using connected.store(..., ttl=...), assert it is returned by
connected.get(), connected.retrieve(), and counted by connected.count(), then
advance the mocked monotonic time past the TTL (or await a mocked asyncio.sleep)
and assert the entry is no longer returned by connected.get(),
connected.retrieve(), and that connected.count() decrements; reference the
InMemoryBackend instance named connected and its methods get, store, retrieve,
and count when adding these assertions.

In `@tests/unit/memory/embedding/test_fine_tune_orchestrator.py`:
- Around line 64-74: The helper _request currently creates real files under
tmp_path (src/doc.txt) but returns a FineTuneRequest with synthetic POSIX paths
(/test/src, /test/out), which is confusing; update _request to either stop
creating the filesystem artifacts if they are unused, or create the files at the
same paths used in the FineTuneRequest (or make the intent explicit) — modify
the function _request so it either (a) removes the tmp_path/src creation and
doc.txt write, or (b) set source_dir and output_dir to the real tmp_path-based
paths (e.g., str(src) and str(tmp_path/"out")), or add a clear docstring above
_request explaining that mocks bypass file access and the created files are
intentionally unused; pick one consistent approach and update tests that rely on
_request accordingly.
- Around line 101-102: The test uses fixed asyncio.sleep() calls to wait for
background work (in tests/unit/memory/embedding/test_fine_tune_orchestrator.py);
replace these timing-based waits by awaiting a deterministic completion signal
from the orchestrator: modify the orchestrator to expose a completion
Future/Event or its running Task (e.g., orchestrator._current_task or
orchestrator.completion_event) and change the test to await that signal (e.g.,
await orchestrator._current_task or await orchestrator.completion_event.wait())
so tests block deterministically instead of using asyncio.sleep(0.05/0.1/0.2);
alternatively, mock the async timing functions (asyncio.sleep/time.monotonic) in
the test if exposing a completion signal is not feasible.
- Around line 221-224: The test is coupling to CancellationToken internals by
calling token._event.wait; add a public blocking API on CancellationToken (e.g.,
wait_until_cancelled or wait_until_cancelled_async) and update the test to call
that instead of token._event.wait; implement the new method to delegate to the
internal threading.Event.wait (or await via asyncio.to_thread) and preserve
existing behavior (return/raise consistently with token.check()), and keep the
token.check() call in the test unchanged so the test still verifies cancellation
behavior using the public method.

In `@tests/unit/memory/test_models.py`:
- Around line 118-133: Combine the two duplicate tests into a single
parameterized test: replace test_namespace_empty_rejected and
test_namespace_whitespace_rejected with a `@pytest.mark.parametrize` decorated
function (e.g. test_invalid_namespace_rejected) that iterates over tuples of
(namespace, expected_match) and asserts
MemoryStoreRequest(category=MemoryCategory.WORKING, namespace=namespace,
content="c") raises ValidationError with the expected_match. Reference
MemoryStoreRequest and MemoryCategory.WORKING in the test and keep the same
ValidationError assertions for the provided namespaces "" and "   ".

In `@web/src/pages/fine-tuning/CheckpointTable.tsx`:
- Around line 39-114: The checkpoint row JSX inside the checkpoints.map is too
large and should be extracted into a new presentational component, e.g.,
CheckpointTableRow; create a new component that accepts a checkpoint prop plus
the actions/handlers (deployCheckpointAction, rollbackCheckpointAction,
setDeletingId) and any helpers used (formatBytes, MetricDelta, StatusBadge,
Button) and move the entire <tr> rendering logic into it, then replace the map
body with checkpoints.map(cp => <CheckpointTableRow key={cp.id} checkpoint={cp}
deploy={deployCheckpointAction} rollback={rollbackCheckpointAction}
setDeletingId={setDeletingId} />) so all formatting, status display and mutation
triggers remain unchanged but live in the new component.

In `@web/src/pages/fine-tuning/DependencyMissingBanner.tsx`:
- Around line 3-14: Replace the hand-rolled div in the DependencyMissingBanner
component with the shared UI primitives from web/src/components/ui/—at minimum
use SectionCard to wrap the content, and compose it with the appropriate
header/content subcomponents (e.g., CardHeader/CardTitle or the SectionCard
title prop and CardContent) and the existing code/mono text primitive from the
ui library instead of raw <code> markup; preserve the message text and
role="alert" semantics and ensure classes for spacing/typography are removed in
favor of the ui components so the banner follows the dashboard design system.

In `@web/src/pages/fine-tuning/PipelineControlPanel.tsx`:
- Around line 67-72: The advanced controls (rendered when showAdvanced is true)
are no-ops because InputField instances for "Epochs", "Learning Rate", and
"Batch Size" use fixed values and empty onChange handlers and the start action
(StartFineTuneRequest) only sends source_dir; fix by adding component state
(e.g., epochs, learningRate, batchSize initialized from
preflight?.recommended_batch_size) and pass them as value props, implement
onChange handlers to update those state variables, and include these fields in
the payload built by StartFineTuneRequest (or alternatively hide the entire
showAdvanced block until these fields are fully supported).
- Around line 14-26: Preflight validation is being reused across edits to
sourceDir so stale results can gate new requests; update the code to invalidate
the preflight state whenever the request changes by clearing the preflight
before running a new check or when sourceDir is modified: call the state setter
that clears the preflight result (e.g., setPreflight(undefined) or equivalent)
inside setSourceDir's updater and immediately before calling runPreflightCheck
in handlePreflight and before calling startRun in handleStart (and similarly in
the other handlers referenced on lines ~34-35 and ~49) so each new request
triggers a fresh validation.

In `@web/src/pages/fine-tuning/PipelineProgressBar.tsx`:
- Around line 9-10: The computed percentage pct in PipelineProgressBar uses
progress directly, which can produce values outside 0–100; clamp progress to the
valid range before converting (e.g., ensure progress is between 0 and 1, then
compute pct) so that aria-valuenow and the width style always receive values
within 0–100; update the logic around the pct calculation and any other usages
(e.g., the aria-valuenow attribute and the inline width style) to use the
clamped value derived from the progress prop or state.
- Around line 19-31: The PipelineProgressBar component is re-implementing a
progress primitive using raw divs (pct variable); replace that markup with the
shared UI Progress component from web/src/components/ui (import Progress and its
child ProgressIndicator or the appropriate API), pass pct as the
value/aria-valuenow equivalent and keep aria-label, min/max props, and the same
classes/transition via existing component props or wrapper className so styling
and behavior match the app-wide progress primitive.

In `@web/src/pages/fine-tuning/PipelineStepper.tsx`:
- Around line 24-59: The connector uses the current step's isComplete so the bar
into the active step stays muted; fix by computing connectorComplete = i <=
currentIdx (while keeping isComplete = i < currentIdx and isCurrent = s.key ===
stage) and use connectorComplete for the connector class; also extract the JSX
inside STAGES.map into a new StepItem (or StepRenderer) component that accepts
props (s, i, currentIdx, stage) and renders the connector, circle and label,
then replace the inline map body with <StepItem ... /> to keep the map simple.

In `@web/src/pages/fine-tuning/RunHistoryTable.tsx`:
- Around line 7-11: STAGE_STATUS_MAP in RunHistoryTable.tsx currently maps only
complete/failed/idle so in-progress stages fall back to 'idle'; update
STAGE_STATUS_MAP to include the in-progress stage keys generating_data,
mining_negatives, training, evaluating, and deploying and map them to 'active'
(so the component that uses STAGE_STATUS_MAP to render badges shows a non-idle
badge for in-progress stages).
- Around line 38-63: Extract the JSX inside runs.map into a new presentational
component (e.g., RunHistoryRow) that accepts a single prop "run" and reuses
existing helpers (formatDuration, formatStage), constants (STAGE_STATUS_MAP),
and components (StatusBadge); replace the inline JSX in RunHistoryTable's
runs.map with <RunHistoryRow key={run.id} run={run} /> and implement
RunHistoryRow to render the same <tr> structure, preserving the key, date
formatting (new Date(run.started_at).toLocaleString()), duration fallback,
stages_completed count, and run.config.source_dir so the table behavior remains
identical but the map body is simplified.
- Around line 70-75: The formatDuration function incorrectly uses Math.round for
minute calculations which can roll over to 60m; update formatDuration so seconds
(<60) still use Math.round, but for minute-only and hour+minute branches compute
minutes with Math.floor((seconds % 3600) / 60) or Math.floor(seconds / 60)
respectively (and compute hours with Math.floor(seconds / 3600)), ensuring no
60-minute value appears and no extra rollover logic is needed.

In `@web/src/stores/fine-tuning.ts`:
- Around line 107-114: runPreflightCheck never clears prior errors, so add
clearing of the error state when starting and on successful completion: in
runPreflightCheck (the async function that calls runPreflight) update the
initial set call to include error: null (e.g., set({ loading: true, preflight:
null, error: null })) and ensure the success branch also clears error (e.g.,
set({ preflight: result, loading: false, error: null })); leave the catch to set
the failure message as it does.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 71295937-4984-484b-907f-a1cd8cd06d5e

📥 Commits

Reviewing files that changed from the base of the PR and between 2af5dd2 and d7fd392.

📒 Files selected for processing (54)

CLAUDE.md
src/synthorg/api/controllers/memory.py
src/synthorg/api/state.py
src/synthorg/api/ws_models.py
src/synthorg/engine/agent_engine.py
src/synthorg/engine/workflow/execution_observer.py
src/synthorg/memory/backends/__init__.py
src/synthorg/memory/backends/composite/__init__.py
src/synthorg/memory/backends/composite/adapter.py
src/synthorg/memory/backends/composite/config.py
src/synthorg/memory/backends/inmemory/__init__.py
src/synthorg/memory/backends/inmemory/adapter.py
src/synthorg/memory/backends/mem0/mappers.py
src/synthorg/memory/config.py
src/synthorg/memory/embedding/cancellation.py
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/memory/embedding/fine_tune_models.py
src/synthorg/memory/embedding/fine_tune_orchestrator.py
src/synthorg/memory/errors.py
src/synthorg/memory/factory.py
src/synthorg/memory/models.py
src/synthorg/observability/events/memory.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/persistence/sqlite/schema.sql
tests/unit/api/auth/test_session_store.py
tests/unit/memory/backends/composite/__init__.py
tests/unit/memory/backends/composite/test_adapter.py
tests/unit/memory/backends/composite/test_config.py
tests/unit/memory/backends/inmemory/__init__.py
tests/unit/memory/backends/inmemory/test_adapter.py
tests/unit/memory/backends/mem0/test_mappers.py
tests/unit/memory/embedding/test_fine_tune.py
tests/unit/memory/embedding/test_fine_tune_models.py
tests/unit/memory/embedding/test_fine_tune_orchestrator.py
tests/unit/memory/test_models.py
tests/unit/persistence/sqlite/test_fine_tune_repo.py
tests/unit/persistence/sqlite/test_migrations.py
web/CLAUDE.md
web/src/api/endpoints/fine-tuning.ts
web/src/api/types.ts
web/src/components/layout/Sidebar.tsx
web/src/pages/FineTuningPage.tsx
web/src/pages/fine-tuning/CheckpointTable.tsx
web/src/pages/fine-tuning/DependencyMissingBanner.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
web/src/pages/fine-tuning/PipelineProgressBar.tsx
web/src/pages/fine-tuning/PipelineStepper.tsx
web/src/pages/fine-tuning/PreflightResultPanel.tsx
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/workflow-editor/WorkflowNodeDrawer.tsx
web/src/pages/workflows/WorkflowCreateDrawer.tsx
web/src/router/index.tsx
web/src/router/routes.ts
web/src/stores/fine-tuning.ts

CLAUDE.md

src/synthorg/api/controllers/memory.py

coderabbitai · 2026-04-04T13:35:03Z

web/src/pages/fine-tuning/PipelineProgressBar.tsx

+      <div
+        className="h-2 w-full overflow-hidden rounded-full bg-muted"
+        role="progressbar"
+        aria-valuenow={pct}
+        aria-valuemin={0}
+        aria-valuemax={100}
+        aria-label="Fine-tuning progress"
+      >
+        <div
+          className="h-full rounded-full bg-accent transition-all duration-300"
+          style={{ width: `${pct}%` }}
+        />
+      </div>


🛠️ Refactor suggestion | 🟠 Major

Use the shared UI progress component instead of custom progress markup.

This page-level component recreates a progress primitive; please reuse the existing UI component to keep behavior and styling consistent across pages.

As per coding guidelines, "ALWAYS reuse existing components from web/src/components/ui/ before creating new ones."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@web/src/pages/fine-tuning/PipelineProgressBar.tsx` around lines 19 - 31, The PipelineProgressBar component is re-implementing a progress primitive using raw divs (pct variable); replace that markup with the shared UI Progress component from web/src/components/ui (import Progress and its child ProgressIndicator or the appropriate API), pass pct as the value/aria-valuenow equivalent and keep aria-label, min/max props, and the same classes/transition via existing component props or wrapper className so styling and behavior match the app-wide progress primitive.

web/src/pages/fine-tuning/PipelineStepper.tsx

web/src/pages/fine-tuning/RunHistoryTable.tsx

web/src/stores/fine-tuning.ts

coderabbitai

Actionable comments posted: 10

♻️ Duplicate comments (7)

web/src/pages/fine-tuning/CheckpointTable.tsx (1)

45-120: 🛠️ Refactor suggestion | 🟠 Major

Extract the checkpoint row out of the .map() block.

The row is already handling metric formatting, status presentation, and three mutations inline.

As per coding guidelines "Do not create complex (>8 line) JSX inside .map() blocks; extract to a shared component instead".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@web/src/pages/fine-tuning/CheckpointTable.tsx` around lines 45 - 120, The map
callback in CheckpointTable.tsx contains a large chunk of JSX for each row —
extract that JSX into a new presentational component (e.g., CheckpointRow) that
accepts a checkpoint object (cp) and the action handlers
(deployCheckpointAction, rollbackCheckpointAction, setDeletingId) as props; move
all logic for date/bytes/metric formatting, StatusBadge/MetricDelta usage, and
the conditional Buttons into that component, then replace the inline map body
with <CheckpointRow cp={cp} ... /> to keep the .map() concise and reuseable
while preserving existing symbols like MetricDelta, StatusBadge, formatBytes,
and the three action functions.

web/src/pages/fine-tuning/PipelineStepper.tsx (1)

24-60: 🛠️ Refactor suggestion | 🟠 Major

Pull the step item out of STAGES.map().

Connector state, badge state, and label rendering are all mixed into one long inline block here.

As per coding guidelines "Do not create complex (>8 line) JSX inside .map() blocks; extract to a shared component instead".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/pages/fine-tuning/PipelineStepper.tsx` around lines 24 - 60, Extract
the JSX inside STAGES.map in PipelineStepper into a new presentational component
(e.g., StepItem) to keep the map callback simple; move the logic that computes
isComplete, isCurrent, connectorComplete and the rendering of the connector div,
badge div (checkmark/number), and label span into that component, accept props
like stage, index, currentIdx, and key, and replace the inline block in
STAGES.map with a single StepItem call so the map callback is under 8 lines and
the state handling stays the same.

web/src/pages/fine-tuning/RunHistoryTable.tsx (1)

45-70: 🛠️ Refactor suggestion | 🟠 Major

Extract the run row out of the .map() block.

This renderer is still carrying date, duration, status, and source formatting inline, which makes the table harder to change safely.

As per coding guidelines "Do not create complex (>8 line) JSX inside .map() blocks; extract to a shared component instead".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/pages/fine-tuning/RunHistoryTable.tsx` around lines 45 - 70, Extract
the JSX inside runs.map into a new presentational component (e.g., RunRow) that
accepts a single prop like run: RunSummary (or the existing run type), moves all
formatting/logic (new Date(run.started_at).toLocaleString(), formatDuration,
STAGE_STATUS_MAP lookup, formatStage, run.config.source_dir,
run.stages_completed.length) into that component, and returns the <tr> markup
currently inline; update runs.map to just render <RunRow key={run.id} run={run}
/> and ensure RunRow is typed/exported appropriately and imports StatusBadge,
formatDuration, formatStage, and STAGE_STATUS_MAP so the visual output and
classNames remain identical.

web/src/pages/fine-tuning/PipelineControlPanel.tsx (1)

24-33: ⚠️ Potential issue | 🟠 Major

Treat the advanced fields as optional overrides, not hidden defaults.

These values are always positive, so buildRequest() sends epochs, learning_rate, and batch_size even while Advanced is collapsed. batchSize also never updates from a later preflight.recommended_batch_size, and preflight is only cleared when sourceDir changes. That lets stale validation gate a different request and silently pins runs to batch_size=128.

Suggested fix

-  const [epochs, setEpochs] = useState('3')
-  const [learningRate, setLearningRate] = useState('1e-5')
-  const [batchSize, setBatchSize] = useState(
-    String(preflight?.recommended_batch_size ?? 128),
-  )
+  const [epochs, setEpochs] = useState('')
+  const [learningRate, setLearningRate] = useState('')
+  const [batchSize, setBatchSize] = useState('')

   useEffect(() => {
     useFineTuningStore.setState({ preflight: null })
-  }, [sourceDir])
+  }, [sourceDir, showAdvanced, epochs, learningRate, batchSize])
+
+  useEffect(() => {
+    if (preflight?.recommended_batch_size != null && batchSize === '') {
+      setBatchSize(String(preflight.recommended_batch_size))
+    }
+  }, [preflight?.recommended_batch_size, batchSize])

   const buildRequest = (): StartFineTuneRequest => {
     const request: StartFineTuneRequest = { source_dir: sourceDir }
     const parsedEpochs = Number(epochs)
-    if (!Number.isNaN(parsedEpochs) && parsedEpochs > 0) request.epochs = parsedEpochs
+    if (showAdvanced && !Number.isNaN(parsedEpochs) && parsedEpochs > 0) {
+      request.epochs = parsedEpochs
+    }
     const parsedLr = Number(learningRate)
-    if (!Number.isNaN(parsedLr) && parsedLr > 0) request.learning_rate = parsedLr
+    if (showAdvanced && !Number.isNaN(parsedLr) && parsedLr > 0) {
+      request.learning_rate = parsedLr
+    }
     const parsedBatch = Number(batchSize)
-    if (!Number.isNaN(parsedBatch) && parsedBatch > 0) request.batch_size = parsedBatch
+    if (showAdvanced && !Number.isNaN(parsedBatch) && parsedBatch > 0) {
+      request.batch_size = parsedBatch
+    }
     return request
   }

Also applies to: 37-54

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@web/src/pages/fine-tuning/PipelineControlPanel.tsx` around lines 24 - 33, The
advanced fields (epochs, learningRate, batchSize) are being sent as hard
defaults even when Advanced is collapsed and batchSize never picks up a later
preflight value; change their handling so they act as optional overrides: stop
initializing batchSize/epochs/learningRate to concrete positive defaults (e.g.
'3'/'1e-5'/'128'); instead initialize them to undefined or empty string and when
a preflight arrives set batchSize only if the user hasn't entered an override
(useEffect watching preflight to call
setBatchSize(preflight.recommended_batch_size) only when current batchSize is
empty). Also modify buildRequest() to include epochs, learning_rate, and
batch_size only if the user has explicitly set them or Advanced is open (check
the Advanced toggle state) rather than always sending the fields. Reference:
epochs, setEpochs, learningRate, setLearningRate, batchSize, setBatchSize,
useFineTuningStore.preflight, buildRequest.

tests/unit/memory/embedding/test_fine_tune_orchestrator.py (1)

64-74: 🧹 Nitpick | 🔵 Trivial

Test helper creates filesystem artifacts but uses synthetic paths.

The _request() helper creates real files in tmp_path but returns a FineTuneRequest with hardcoded POSIX paths (/test/src, /test/out). The comment explains this is intentional since mocks bypass filesystem access. Consider adding a brief docstring to make this clearer for future maintainers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/memory/embedding/test_fine_tune_orchestrator.py` around lines 64 -
74, The test helper _request currently creates real files under tmp_path but
returns a FineTuneRequest with synthetic POSIX paths ("/test/src", "/test/out")
which can be confusing; update the _request function to include a concise
docstring that explains it deliberately writes local artifacts for test
diagnostics while returning synthetic POSIX paths because the FineTuneRequest is
used with an orchestrator mock that does not access the filesystem, and mention
FineTuneRequest and the tmp_path behavior so future maintainers understand the
intentional mismatch.

src/synthorg/memory/embedding/fine_tune_orchestrator.py (1)

213-222: 🧹 Nitpick | 🔵 Trivial

cancel() signals cancellation but doesn't await task completion.

The cancel() method sets the cancellation flag and logs, but doesn't await the background task to terminate. Callers may expect the run to be fully stopped when cancel() returns. The current behavior is cooperative cancellation that returns immediately.

Consider either awaiting task completion with a timeout, or documenting this as intentional fire-and-forget cancellation in the docstring.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/memory/embedding/fine_tune_orchestrator.py` around lines 213 -
222, The cancel() method currently sets self._cancellation and logs
MEMORY_FINE_TUNE_CANCELLED but does not wait for the background pipeline to
finish; update cancel() (protected by self._op_lock) to await the background
task's completion (e.g., wait on the task/future that is canceled or a stored
asyncio.Task representing the run) with a reasonable timeout, handle
asyncio.TimeoutError and cancellation exceptions by logging and returning, and
ensure the method returns only after the run task has terminated (or document in
the docstring if you intentionally want fire-and-forget behavior). Reference
symbols: cancel(), self._op_lock, self._cancellation, the background task/future
variable that represents the active run, logger, and MEMORY_FINE_TUNE_CANCELLED.

src/synthorg/api/controllers/memory.py (1)

388-396: ⚠️ Potential issue | 🟠 Major

Preflight validation is still missing the disk-space gate.

The PR objectives explicitly require disk-space validation as part of pre-flight checks: "Pre-flight validation endpoint: ... document-count and disk-space checks". However, _run_preflight_checks() only performs dependency, GPU, and document checks. This means can_proceed can return true on low-storage hosts.

🔧 Proposed fix to add disk-space check

 def _run_preflight_checks(
     request: FineTuneRequest,
 ) -> list[PreflightCheck]:
     """Run all pre-flight validation checks."""
     checks: list[PreflightCheck] = []
     checks.append(_check_dependencies())
     checks.append(_check_gpu())
     checks.append(_check_documents(request.source_dir))
+    checks.append(_check_disk_space(request.source_dir))
     return checks
+
+
+def _check_disk_space(source_dir: str) -> PreflightCheck:
+    """Check available disk space for fine-tuning artifacts."""
+    from pathlib import Path  # noqa: PLC0415
+    import shutil  # noqa: PLC0415
+
+    try:
+        src = Path(source_dir)
+        if not src.exists():
+            return PreflightCheck(
+                name="disk_space",
+                status="warn",
+                message="Cannot check disk space (source dir missing)",
+            )
+        usage = shutil.disk_usage(src)
+        free_gb = usage.free / (1024**3)
+        if free_gb < 5:  # noqa: PLR2004
+            return PreflightCheck(
+                name="disk_space",
+                status="fail",
+                message=f"Insufficient disk space ({free_gb:.1f} GB free)",
+                detail="At least 5 GB required for checkpoints and artifacts",
+            )
+        if free_gb < 10:  # noqa: PLR2004
+            return PreflightCheck(
+                name="disk_space",
+                status="warn",
+                message=f"Low disk space ({free_gb:.1f} GB free)",
+                detail="10+ GB recommended",
+            )
+        return PreflightCheck(
+            name="disk_space",
+            status="pass",
+            message=f"{free_gb:.1f} GB available",
+        )
+    except MemoryError, RecursionError:
+        raise
+    except Exception as exc:
+        return PreflightCheck(
+            name="disk_space",
+            status="warn",
+            message=f"Disk space check failed: {type(exc).__name__}",
+            detail=str(exc),
+        )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/api/controllers/memory.py` around lines 388 - 396, The preflight
flow in _run_preflight_checks currently omits disk-space validation; add a call
to the disk-space check (e.g. _check_disk_space(...)) into the checks list so
disk usage is validated along with _check_dependencies, _check_gpu, and
_check_documents; update _run_preflight_checks to append
_check_disk_space(request.source_dir) (or appropriate path argument) so the
returned checks include the disk-space PreflightCheck used by can_proceed.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/page-structure.md`:
- Around line 207-208: The Fine-Tuning sub-page documentation for the memory
namespace (/settings/memory/fine-tuning) omits the checkpoint lifecycle; update
the page-structure text for the Fine-Tuning sub-page to include a concise IA
spec and route note that document checkpoint list, deploy, rollback, and delete
flows (including UI affordances, required API actions, user permissions, and
expected status transitions) alongside the existing pipeline status/history,
preflight, and start/cancel descriptions so operators can discover and manage
checkpoints from the same route.

In `@src/synthorg/api/controllers/memory.py`:
- Around line 248-258: Wrap the activation + settings update in a try/except so
activation is not left half-done: after calling repo.set_active(checkpoint_id)
attempt the settings updates via app_state.settings_service.set(...) inside a
try block; on any exception catch it, roll back the repo activation by restoring
the previous active checkpoint (use repo.get_checkpoint to read prior active id
before changing, or call a repo method to unset/deactivate the new checkpoint)
and re-raise or return a controlled error, and stop relying on
cp.backup_config_json to gate settings updates (use the incoming checkpoint's
model_path or an explicit flag instead) so settings updates reflect the current
deployment intent. Ensure you reference repo.set_active, repo.get_checkpoint,
and app_state.settings_service.set in the fix.
- Around line 284-291: The code assumes cp.backup_config_json is valid JSON and
that repo.get_checkpoint(checkpoint_id) returns a value; update the handler to
catch json.JSONDecodeError around json.loads(cp.backup_config_json) (log the
error with context and skip applying the corrupted backup or raise a controlled
ApiError) and add a null check after updated = await
repo.get_checkpoint(checkpoint_id) to handle a None result (return a 404
ApiError or appropriate ApiResponse error instead of returning None). Use the
existing symbols svc.set, repo.deactivate_all, repo.get_checkpoint,
cp.backup_config_json and ApiResponse/ApiError to locate and implement the fix.

In `@src/synthorg/memory/backends/composite/adapter.py`:
- Around line 382-423: The _fanout_retrieve path currently converts all child
exceptions into warnings and returns an empty results list even when every
backend failed; modify the logic in the async TaskGroup/_fetch closure to track
successful backend completions (e.g., increment a success_count or append
successful backend IDs when entries are retrieved) and after the TaskGroup
finishes, if success_count == 0 raise MemoryRetrievalError (include aggregated
errors for context) instead of returning [] ; keep existing logging for partial
failures (errors list and MEMORY_COMPOSITE_FANOUT_PARTIAL) but ensure callers
see an exception when no backends succeeded.

In `@src/synthorg/memory/backends/inmemory/adapter.py`:
- Around line 171-177: store() currently counts agent_store length before
removing expired entries so expired memories still consume quota; update store()
to prune expired items from self._store[str(agent_id)] before checking len and
enforcing self._max_memories_per_agent, reusing the same expiry logic used in
retrieve() (or extract that logic into a helper like _prune_expired(agent_id)
and call it from store(), retrieve(), and any direct lookup/count methods) so
counts and lookups consistently ignore expired entries and new writes are
allowed once expired items are pruned.

In `@src/synthorg/memory/embedding/fine_tune.py`:
- Around line 155-157: Validate the validation_split input at the start of the
fine-tune routine (the function that accepts validation_split, ProgressCallback,
CancellationToken) and before any dataset slicing: reject values not strictly
between 0 and 1 and compute split indices such that both train and validation
get at least one example (e.g., enforce min_validation = 1 and min_train = 1,
adjust split to floor/max accordingly), throwing a clear
ValueError/ArgumentError if the dataset is too small or the split cannot satisfy
these constraints; apply the same checks where validation_split is used later
(the alternative slicing path referenced around the current use) so you never
attempt to slice into empty train/validation sets.
- Around line 628-641: The function deploy_checkpoint currently logs config_path
but never applies it; ensure that when settings_service is None and a
config_path is provided you perform the file-based update (backup existing file,
modify the embedder config entry to point to checkpoint_path, and atomically
write the updated config), and when neither settings_service nor config_path can
be used, raise/return an error before logging success; locate deploy_checkpoint
and the references to config_path and settings_service and either implement the
backup+modify+write flow for the config file or explicitly reject that code path
so deployment cannot falsely report success.

In `@src/synthorg/persistence/sqlite/fine_tune_repo.py`:
- Around line 136-143: Change parenthesized except clauses to PEP 758
comma-separated form: replace constructs like "except (sqlite3.Error,
aiosqlite.Error) as exc:" with "except sqlite3.Error, aiosqlite.Error as exc:"
(and similarly for other multi-exception handlers). Update the occurrence in the
fine-tune save block that logs MEMORY_FINE_TUNE_PERSIST_FAILED and raises
QueryError (the except handling around run.id and logger.exception), and apply
the same pattern for all other multi-exception excepts in this module (the other
occurrences noted in the review) so they use the comma-separated syntax without
parentheses.

In `@web/src/pages/fine-tuning/PipelineProgressBar.tsx`:
- Around line 8-30: PipelineProgressBar currently coerces a null progress to 0%
which misrepresents an indeterminate state; update PipelineProgressBar (props:
stage, progress) to treat progress === null as indeterminate by: 1) not
computing/printing pct or the "{pct}%" label when progress is null, 2) omitting
aria-valuenow from the outer progressbar element in that case (keep
aria-valuemin/max), and 3) rendering an indeterminate visual (e.g., a full-width
animated/striped/gradient inner bar or a CSS class like "indeterminate") instead
of setting style.width; make the conditional logic around pct, aria-valuenow and
the inner bar appearance in the PipelineProgressBar function so assistive tech
and visuals reflect indeterminate states correctly.

In `@web/src/pages/FineTuningPage.tsx`:
- Around line 28-31: The component currently only registers a local handler via
useWebSocketStore's onChannelEvent/offChannelEvent, but doesn't request server
delivery for the 'system' channel so progress can stall; update the effect that
registers the handler (where useWebSocketStore, useShallow, onChannelEvent and
offChannelEvent are used) to also call the store's channel subscription API to
subscribe to 'system' when mounting and unsubscribe from 'system' in the cleanup
— i.e., grab the subscribe/unsubscribe functions from useWebSocketStore, call
subscribe('system') before registering the handler, and call offChannelEvent
followed by unsubscribe('system') in the effect cleanup.

---

Duplicate comments:
In `@src/synthorg/api/controllers/memory.py`:
- Around line 388-396: The preflight flow in _run_preflight_checks currently
omits disk-space validation; add a call to the disk-space check (e.g.
_check_disk_space(...)) into the checks list so disk usage is validated along
with _check_dependencies, _check_gpu, and _check_documents; update
_run_preflight_checks to append _check_disk_space(request.source_dir) (or
appropriate path argument) so the returned checks include the disk-space
PreflightCheck used by can_proceed.

In `@src/synthorg/memory/embedding/fine_tune_orchestrator.py`:
- Around line 213-222: The cancel() method currently sets self._cancellation and
logs MEMORY_FINE_TUNE_CANCELLED but does not wait for the background pipeline to
finish; update cancel() (protected by self._op_lock) to await the background
task's completion (e.g., wait on the task/future that is canceled or a stored
asyncio.Task representing the run) with a reasonable timeout, handle
asyncio.TimeoutError and cancellation exceptions by logging and returning, and
ensure the method returns only after the run task has terminated (or document in
the docstring if you intentionally want fire-and-forget behavior). Reference
symbols: cancel(), self._op_lock, self._cancellation, the background task/future
variable that represents the active run, logger, and MEMORY_FINE_TUNE_CANCELLED.

In `@tests/unit/memory/embedding/test_fine_tune_orchestrator.py`:
- Around line 64-74: The test helper _request currently creates real files under
tmp_path but returns a FineTuneRequest with synthetic POSIX paths ("/test/src",
"/test/out") which can be confusing; update the _request function to include a
concise docstring that explains it deliberately writes local artifacts for test
diagnostics while returning synthetic POSIX paths because the FineTuneRequest is
used with an orchestrator mock that does not access the filesystem, and mention
FineTuneRequest and the tmp_path behavior so future maintainers understand the
intentional mismatch.

In `@web/src/pages/fine-tuning/CheckpointTable.tsx`:
- Around line 45-120: The map callback in CheckpointTable.tsx contains a large
chunk of JSX for each row — extract that JSX into a new presentational component
(e.g., CheckpointRow) that accepts a checkpoint object (cp) and the action
handlers (deployCheckpointAction, rollbackCheckpointAction, setDeletingId) as
props; move all logic for date/bytes/metric formatting, StatusBadge/MetricDelta
usage, and the conditional Buttons into that component, then replace the inline
map body with <CheckpointRow cp={cp} ... /> to keep the .map() concise and
reuseable while preserving existing symbols like MetricDelta, StatusBadge,
formatBytes, and the three action functions.

In `@web/src/pages/fine-tuning/PipelineControlPanel.tsx`:
- Around line 24-33: The advanced fields (epochs, learningRate, batchSize) are
being sent as hard defaults even when Advanced is collapsed and batchSize never
picks up a later preflight value; change their handling so they act as optional
overrides: stop initializing batchSize/epochs/learningRate to concrete positive
defaults (e.g. '3'/'1e-5'/'128'); instead initialize them to undefined or empty
string and when a preflight arrives set batchSize only if the user hasn't
entered an override (useEffect watching preflight to call
setBatchSize(preflight.recommended_batch_size) only when current batchSize is
empty). Also modify buildRequest() to include epochs, learning_rate, and
batch_size only if the user has explicitly set them or Advanced is open (check
the Advanced toggle state) rather than always sending the fields. Reference:
epochs, setEpochs, learningRate, setLearningRate, batchSize, setBatchSize,
useFineTuningStore.preflight, buildRequest.

In `@web/src/pages/fine-tuning/PipelineStepper.tsx`:
- Around line 24-60: Extract the JSX inside STAGES.map in PipelineStepper into a
new presentational component (e.g., StepItem) to keep the map callback simple;
move the logic that computes isComplete, isCurrent, connectorComplete and the
rendering of the connector div, badge div (checkmark/number), and label span
into that component, accept props like stage, index, currentIdx, and key, and
replace the inline block in STAGES.map with a single StepItem call so the map
callback is under 8 lines and the state handling stays the same.

In `@web/src/pages/fine-tuning/RunHistoryTable.tsx`:
- Around line 45-70: Extract the JSX inside runs.map into a new presentational
component (e.g., RunRow) that accepts a single prop like run: RunSummary (or the
existing run type), moves all formatting/logic (new
Date(run.started_at).toLocaleString(), formatDuration, STAGE_STATUS_MAP lookup,
formatStage, run.config.source_dir, run.stages_completed.length) into that
component, and returns the <tr> markup currently inline; update runs.map to just
render <RunRow key={run.id} run={run} /> and ensure RunRow is typed/exported
appropriately and imports StatusBadge, formatDuration, formatStage, and
STAGE_STATUS_MAP so the visual output and classNames remain identical.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2672fa48-c842-43d4-8bf2-1f99d1e911fe

📥 Commits

Reviewing files that changed from the base of the PR and between d7fd392 and 7f2d203.

📒 Files selected for processing (25)

CLAUDE.md
docs/design/memory.md
docs/design/page-structure.md
src/synthorg/api/controllers/memory.py
src/synthorg/api/state.py
src/synthorg/api/ws_models.py
src/synthorg/memory/backends/composite/adapter.py
src/synthorg/memory/backends/inmemory/adapter.py
src/synthorg/memory/embedding/cancellation.py
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/memory/embedding/fine_tune_models.py
src/synthorg/memory/embedding/fine_tune_orchestrator.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/persistence/sqlite/schema.sql
tests/unit/memory/embedding/test_fine_tune_models.py
tests/unit/memory/embedding/test_fine_tune_orchestrator.py
tests/unit/persistence/sqlite/test_fine_tune_repo.py
tests/unit/persistence/sqlite/test_migrations.py
web/src/pages/FineTuningPage.tsx
web/src/pages/fine-tuning/CheckpointTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
web/src/pages/fine-tuning/PipelineProgressBar.tsx
web/src/pages/fine-tuning/PipelineStepper.tsx
web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/stores/fine-tuning.ts

docs/design/page-structure.md

src/synthorg/api/controllers/memory.py

src/synthorg/memory/backends/composite/adapter.py

src/synthorg/memory/backends/inmemory/adapter.py

src/synthorg/memory/embedding/fine_tune.py

coderabbitai · 2026-04-04T14:30:23Z

src/synthorg/persistence/sqlite/fine_tune_repo.py

+        except (sqlite3.Error, aiosqlite.Error) as exc:
+            msg = f"Failed to save fine-tune run {run.id}"
+            logger.exception(
+                MEMORY_FINE_TUNE_PERSIST_FAILED,
+                run_id=run.id,
+                error=str(exc),
+            )
+            raise QueryError(msg) from exc


⚠️ Potential issue | 🟡 Minor

Use PEP 758 except syntax without parentheses.

Python 3.14 requires except A, B: syntax (comma-separated, no parentheses) per PEP 758. The parenthesized form except (sqlite3.Error, aiosqlite.Error): should be changed to the unpacked form.

This same pattern appears throughout the file at lines 153, 176, 213, 244, 279, 340, 360, 398, 434, 452, 490, 508.

🔧 Suggested fix for all occurrences

- except (sqlite3.Error, aiosqlite.Error) as exc: + except sqlite3.Error, aiosqlite.Error as exc:

Apply this pattern to all except statements in the file.

As per coding guidelines: "Use except A, B: syntax (no parentheses) for exception handling — PEP 758 except syntax. ruff enforces this on Python 3.14."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

except (sqlite3.Error, aiosqlite.Error) as exc:

msg = f"Failed to save fine-tune run {run.id}"

logger.exception(

MEMORY_FINE_TUNE_PERSIST_FAILED,

run_id=run.id,

error=str(exc),

)

raise QueryError(msg) from exc

except sqlite3.Error, aiosqlite.Error as exc:

msg = f"Failed to save fine-tune run {run.id}"

logger.exception(

MEMORY_FINE_TUNE_PERSIST_FAILED,

run_id=run.id,

error=str(exc),

)

raise QueryError(msg) from exc

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/synthorg/persistence/sqlite/fine_tune_repo.py` around lines 136 - 143, Change parenthesized except clauses to PEP 758 comma-separated form: replace constructs like "except (sqlite3.Error, aiosqlite.Error) as exc:" with "except sqlite3.Error, aiosqlite.Error as exc:" (and similarly for other multi-exception handlers). Update the occurrence in the fine-tune save block that logs MEMORY_FINE_TUNE_PERSIST_FAILED and raises QueryError (the except handling around run.id and logger.exception), and apply the same pattern for all other multi-exception excepts in this module (the other occurrences noted in the review) so they use the comma-separated syntax without parentheses.

web/src/pages/fine-tuning/PipelineProgressBar.tsx

web/src/pages/FineTuningPage.tsx

coderabbitai

Actionable comments posted: 7

♻️ Duplicate comments (3)

web/src/pages/fine-tuning/PipelineProgressBar.tsx (1)
23-38: 🛠️ Refactor suggestion | 🟠 Major

Use the shared UI progress component instead of rebuilding it here.

This page-local component is still hand-rolling determinate and indeterminate progress states with raw divs. Please reuse the existing progress primitive from web/src/components/ui/ so behavior, styling, and accessibility stay consistent across the dashboard.

As per coding guidelines, "ALWAYS reuse existing components from web/src/components/ui/ before creating new ones."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/pages/fine-tuning/PipelineProgressBar.tsx` around lines 23 - 38,
Replace the hand-rolled progress markup inside the PipelineProgressBar component
with the shared UI progress primitive: import the shared Progress component and
use it instead of the raw divs, passing pct as the determinate value (e.g.,
value={pct}) and indeterminate as the prop to toggle the indeterminate state,
and preserve aria-label ("Fine-tuning progress") and aria min/max when
determinate; remove the duplicated determinate/indeterminate div branches and
any inline style-based width logic so the shared component handles styling,
animation and accessibility.
tests/unit/memory/embedding/test_fine_tune_orchestrator.py (1)
64-81: ⚠️ Potential issue | 🟠 Major

Use the real tmp_path locations in _request() instead of /test/....

_mock_all_stages() writes files under output_dir, so these placeholder root paths make the tests depend on write access to the filesystem root and still leave source_dir out of sync with the files created above. Point the request at the temp directories you create here, or make the stage mocks fully in-memory.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/memory/embedding/test_fine_tune_orchestrator.py` around lines 64 -
81, The test helper _request() currently returns synthetic "/test/..." paths
which makes tests write to the filesystem root and leaves source_dir out of sync
with the files you created; change it to point at the real tmp_path locations
you create (e.g. use the src Path you create and an output dir under tmp_path)
and return them as POSIX strings so FineTuneRequest still rejects Windows drive
letters; ensure you also create the output directory before returning so
_mock_all_stages() can write files there (references: _request, FineTuneRequest,
_mock_all_stages).
src/synthorg/api/controllers/memory.py (1)
252-262: ⚠️ Potential issue | 🔴 Critical

Deploy and rollback can still leave runtime settings out of sync with checkpoint state.

These flows persist settings and checkpoint flags as separate commits. If a later svc.set() or repo.deactivate_all() fails after an earlier step succeeded, the API returns an error while one side has already changed. Move this cross-store transition behind a single service method, or capture and restore the prior settings values in the failure path as well.

Also applies to: 313-317
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/api/controllers/memory.py` around lines 252 - 262, The
activation flow updates settings and checkpoint state in separate commits which
can leave them inconsistent on partial failure; modify the logic around
app_state.settings_service (svc) and repo operations (repo.set_active,
repo.deactivate_all) so the cross-store transition is atomic: either (A) add a
new service method on settings_service that performs both the settings writes
(e.g., embedder_model/embedder_provider) and the checkpoint activation in one
operation, or (B) capture the prior settings values before calling svc.set (read
current "memory.embedder_model" and "memory.embedder_provider"), perform the two
svc.set calls, then call repo.set_active/prior handling, and on any exception
restore the captured prior settings values and the prior checkpoint state (using
repo.set_active(prior.id) or repo.deactivate_all()); ensure the rollback path
updates both the settings and repo consistently and reference cp.model_path,
svc.set, prior, repo.set_active, and repo.deactivate_all in your changes.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/api/controllers/memory.py`:
- Around line 425-428: The preflight is calling _check_disk_space with
request.source_dir but the pipeline writes to request.output_dir; change the
call in the checks list to use request.output_dir (i.e., replace
_check_disk_space(request.source_dir) with
_check_disk_space(request.output_dir)) so disk-space is validated against the
actual write destination (or optionally call _check_disk_space for both
request.source_dir and request.output_dir if you want to validate both mounts).

In `@src/synthorg/memory/backends/composite/adapter.py`:
- Around line 357-368: The method _resolve_retrieve_targets currently treats
only query.namespaces is None as "no filter" and returns all backends, while an
empty frozenset returns no targets; update the logic in
_resolve_retrieve_targets to treat falsy namespaces (e.g., empty frozenset) the
same as None by checking "if not query.namespaces" so empty sets return
list(self._unique_backends), leaving the rest of the resolve loop (seen dict +
self._resolve(ns)) unchanged; reference _resolve_retrieve_targets,
MemoryQuery.namespaces, self._unique_backends, and self._resolve when making
this change.
- Around line 370-426: The except clause inside _fanout_retrieve currently uses
"except MemoryError, RecursionError:" which binds to builtins.MemoryError and is
syntactically wrong for grouping; update the imports to include the synthorg
custom MemoryError from synthorg.memory.errors and change the clause to properly
group exceptions (e.g., except (MemoryError, RecursionError):) so synthorg's
MemoryConnectionError/MemoryRetrievalError are re-raised instead of being
swallowed; reference the import block that imports from synthorg.memory.errors
and the _fanout_retrieve function's except clause to make the fixes.

In `@src/synthorg/memory/backends/inmemory/adapter.py`:
- Around line 244-272: The get() method currently returns entries even if
expired, causing inconsistency with retrieve(); update get(self, agent_id,
memory_id) to check the entry with the existing _is_expired(entry) helper and
return None when expired (and optionally log an expiration event similar to
MEMORY_ENTRY_FETCHED); ensure you reference the same keys used in _store lookup
and keep the existing logger usage (MEMORY_ENTRY_FETCHED) but add a debug or
trace log for expired entries so semantics match retrieve().
- Around line 304-335: count() currently tallies all entries from self._store
regardless of expiry, causing mismatch with retrieve() which filters out expired
items; update count() (in src/synthorg/memory/backends/inmemory/adapter.py) to
apply the same expiration check used by retrieve() when iterating
agent_store.values() (i.e., skip entries where the entry is expired based on its
expires_at/expiration logic or by calling the same helper used by retrieve()),
preserve the category filter logic, and log MEMORY_ENTRY_COUNTED with the
filtered total so counts and retrievals are consistent.

In `@src/synthorg/memory/embedding/fine_tune.py`:
- Around line 706-712: The call to SettingsService.set is passing the timestamp
as a positional 4th argument which violates the keyword-only parameter
expected_updated_at; update the call in fine_tune.py to call
settings_service.set("memory", "embedder_model", checkpoint_path,
expected_updated_at=now) so expected_updated_at is passed by name (refer to
settings_service.set and the local variables checkpoint_path and now) and remove
the positional argument usage.

In `@tests/unit/memory/embedding/test_fine_tune_orchestrator.py`:
- Around line 128-130: The test currently suppresses all exceptions from
awaiting orchestrator._current_task using contextlib.suppress(Exception); change
this to either explicitly assert that the background task was cancelled (await
and expect asyncio.CancelledError) or catch only the concrete cancellation
exception type instead of Exception, e.g., handle asyncio.CancelledError around
awaiting orchestrator._current_task (and similarly for the second occurrence at
lines 150-152) so unrelated errors are not silently swallowed and the test fails
for unexpected failures.

---

Duplicate comments:
In `@src/synthorg/api/controllers/memory.py`:
- Around line 252-262: The activation flow updates settings and checkpoint state
in separate commits which can leave them inconsistent on partial failure; modify
the logic around app_state.settings_service (svc) and repo operations
(repo.set_active, repo.deactivate_all) so the cross-store transition is atomic:
either (A) add a new service method on settings_service that performs both the
settings writes (e.g., embedder_model/embedder_provider) and the checkpoint
activation in one operation, or (B) capture the prior settings values before
calling svc.set (read current "memory.embedder_model" and
"memory.embedder_provider"), perform the two svc.set calls, then call
repo.set_active/prior handling, and on any exception restore the captured prior
settings values and the prior checkpoint state (using repo.set_active(prior.id)
or repo.deactivate_all()); ensure the rollback path updates both the settings
and repo consistently and reference cp.model_path, svc.set, prior,
repo.set_active, and repo.deactivate_all in your changes.

In `@tests/unit/memory/embedding/test_fine_tune_orchestrator.py`:
- Around line 64-81: The test helper _request() currently returns synthetic
"/test/..." paths which makes tests write to the filesystem root and leaves
source_dir out of sync with the files you created; change it to point at the
real tmp_path locations you create (e.g. use the src Path you create and an
output dir under tmp_path) and return them as POSIX strings so FineTuneRequest
still rejects Windows drive letters; ensure you also create the output directory
before returning so _mock_all_stages() can write files there (references:
_request, FineTuneRequest, _mock_all_stages).

In `@web/src/pages/fine-tuning/PipelineProgressBar.tsx`:
- Around line 23-38: Replace the hand-rolled progress markup inside the
PipelineProgressBar component with the shared UI progress primitive: import the
shared Progress component and use it instead of the raw divs, passing pct as the
determinate value (e.g., value={pct}) and indeterminate as the prop to toggle
the indeterminate state, and preserve aria-label ("Fine-tuning progress") and
aria min/max when determinate; remove the duplicated determinate/indeterminate
div branches and any inline style-based width logic so the shared component
handles styling, animation and accessibility.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5a0627c0-b5bd-4429-b28a-b3760275e3f8

📥 Commits

Reviewing files that changed from the base of the PR and between 7f2d203 and a28b2f9.

📒 Files selected for processing (14)

docs/design/page-structure.md
src/synthorg/api/controllers/memory.py
src/synthorg/memory/backends/composite/adapter.py
src/synthorg/memory/backends/inmemory/adapter.py
src/synthorg/memory/embedding/fine_tune.py
src/synthorg/memory/embedding/fine_tune_orchestrator.py
tests/unit/memory/embedding/test_fine_tune.py
tests/unit/memory/embedding/test_fine_tune_orchestrator.py
web/src/pages/FineTuningPage.tsx
web/src/pages/fine-tuning/CheckpointTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx
web/src/pages/fine-tuning/PipelineProgressBar.tsx
web/src/pages/fine-tuning/PipelineStepper.tsx
web/src/pages/fine-tuning/RunHistoryTable.tsx

src/synthorg/api/controllers/memory.py

src/synthorg/memory/backends/composite/adapter.py

coderabbitai · 2026-04-04T15:19:42Z

src/synthorg/memory/backends/composite/adapter.py

+    async def _fanout_retrieve(
+        self,
+        agent_id: NotBlankStr,
+        query: MemoryQuery,
+        targets: list[MemoryBackend],
+    ) -> list[MemoryEntry]:
+        """Fan out retrieve to *targets* with graceful degradation."""
+        if len(targets) == 1:
+            entries = await targets[0].retrieve(agent_id, query)
+            return [self._rewrite_entry(targets[0], e) for e in entries]
+        results: list[MemoryEntry] = []
+        errors: list[str] = []
+        async with asyncio.TaskGroup() as tg:
+            for b in targets:
+
+                async def _fetch(
+                    backend: MemoryBackend = b,
+                ) -> None:
+                    try:
+                        entries = await backend.retrieve(
+                            agent_id,
+                            query,
+                        )
+                        results.extend(self._rewrite_entry(backend, e) for e in entries)
+                    except MemoryError, RecursionError:
+                        raise
+                    except Exception as exc:
+                        name = self._backend_to_name.get(
+                            id(backend),
+                            "?",
+                        )
+                        errors.append(f"{name}: {exc}")
+                        logger.warning(
+                            MEMORY_COMPOSITE_FANOUT_PARTIAL,
+                            operation="retrieve",
+                            backend=name,
+                            error=str(exc),
+                        )
+
+                tg.create_task(_fetch())
+        if errors:
+            if not results:
+                msg = f"All backends failed during retrieve: {'; '.join(errors)}"
+                raise MemoryRetrievalError(msg)
+            logger.warning(
+                MEMORY_COMPOSITE_FANOUT_PARTIAL,
+                operation="retrieve",
+                failed_backends=errors,
+                successful_count=len(results),
+            )
+        else:
+            logger.debug(
+                MEMORY_COMPOSITE_FANOUT_COMPLETE,
+                operation="retrieve",
+                total=len(results),
+            )
+        return results


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify MemoryError is not imported and check if builtins.MemoryError is being caught echo "=== Checking imports in composite/adapter.py ===" rg -n "^from synthorg\.memory\.errors import" src/synthorg/memory/backends/composite/adapter.py echo "" echo "=== Checking what MemoryError refers to at line 394 ===" # If MemoryError were imported from synthorg.memory.errors, it would appear in the import block rg -n "MemoryError" src/synthorg/memory/backends/composite/adapter.py echo "" echo "=== Verify synthorg.memory.errors.MemoryError exists ===" rg -n "^class MemoryError" src/synthorg/memory/errors.py

Repository: Aureliolo/synthorg

Length of output: 372

🏁 Script executed:

sed -n '18,25p' src/synthorg/memory/backends/composite/adapter.py

Repository: Aureliolo/synthorg

Length of output: 277

Critical: MemoryError at line 394 refers to the builtin, not synthorg.memory.errors.MemoryError.

The except clause except MemoryError, RecursionError: intends to re-raise synthorg's memory exceptions, but MemoryError is not imported from synthorg.memory.errors (which defines MemoryConfigError, MemoryConnectionError, and MemoryRetrievalError but not MemoryError in the import block). This catches the builtin builtins.MemoryError (out-of-memory conditions) instead.

Consequence: MemoryConnectionError, MemoryRetrievalError, and other synthorg memory exceptions will fall through to except Exception as exc: and be logged as warnings rather than re-raised. This breaks the error propagation contract and violates the logging guideline that error paths must log and raise (not silently convert exceptions to warnings).

Fix: Add MemoryError to the import from synthorg.memory.errors:

from synthorg.memory.errors import ( MemoryConfigError, MemoryConnectionError, + MemoryError, MemoryRetrievalError, )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/synthorg/memory/backends/composite/adapter.py` around lines 370 - 426, The except clause inside _fanout_retrieve currently uses "except MemoryError, RecursionError:" which binds to builtins.MemoryError and is syntactically wrong for grouping; update the imports to include the synthorg custom MemoryError from synthorg.memory.errors and change the clause to properly group exceptions (e.g., except (MemoryError, RecursionError):) so synthorg's MemoryConnectionError/MemoryRetrievalError are re-raised instead of being swallowed; reference the import block that imports from synthorg.memory.errors and the _fanout_retrieve function's except clause to make the fixes.

src/synthorg/memory/backends/inmemory/adapter.py

src/synthorg/memory/embedding/fine_tune.py

tests/unit/memory/embedding/test_fine_tune_orchestrator.py

Implement namespace-based memory routing with two new backends: - InMemoryBackend: dict-based session-scoped backend implementing MemoryBackend + MemoryCapabilities protocols. Supports namespace/ category/tag/time filtering, max_memories_per_agent enforcement, and clear() for session cleanup. - CompositeBackend: dispatches memory operations to child backends based on a new 'namespace' field on MemoryStoreRequest/Query/Entry. Prefixed memory IDs (e.g. 'mem0:abc123') enable O(1) routing on get()/delete(). Fan-out retrieve with graceful degradation. Data model changes: - MemoryStoreRequest: add namespace (default='default') - MemoryQuery: add namespaces filter (frozenset, default None) - MemoryEntry: add namespace (default='default') Also fixes pre-existing circular import between engine.agent_engine and memory.procedural.pipeline by deferring the imports.

Add foundation layer for the fine-tuning pipeline: - FineTuneDependencyError + FineTuneCancelledError in errors.py - CancellationToken (asyncio.Event wrapper) for cooperative cancellation - EVALUATING stage added to FineTuneStage enum - New models: EvalMetrics (computed improvement fields), FineTuneRunConfig, FineTuneRun, CheckpointRecord, PreflightCheck, PreflightResult (computed can_proceed) - Extended FineTuneRequest with override fields + resume_run_id - Extended FineTuneStatus with run_id - 14 new observability event constants for fine-tune lifecycle

Add fine_tune_runs and fine_tune_checkpoints tables to schema.sql. - SQLiteFineTuneRunRepository: save/get/list/update runs, get_active_run, mark_interrupted (startup recovery) - SQLiteFineTuneCheckpointRepository: save/get/list checkpoints, set_active, delete (with active-checkpoint guard), get_active_checkpoint Both repos follow existing patterns: aiosqlite, parameterized queries, QueryError wrapping, structured logging.

Replace NotImplementedError stubs with actual implementations: Stage 1 (generate_training_data): scans source documents, chunks text, generates extractive queries, outputs training.jsonl + validation.jsonl with configurable holdout split. Stage 2 (mine_hard_negatives): lazy-imports sentence-transformers, encodes passages, computes cosine similarity, selects top-k hard negatives with margin filter. GPU required. Stage 3 (contrastive_fine_tune): biencoder training with InfoNCE loss via MultipleNegativesRankingLoss. Configurable epochs, lr, temperature, batch_size. GPU required. Stage 4 (evaluate_checkpoint): NEW stage. Compares fine-tuned vs base model on held-out validation data. Computes NDCG@10 and Recall@10, saves eval_metrics.json. Stage 5 (deploy_checkpoint): backs up current embedder config, updates settings to point to fine-tuned model. All ML deps (torch, sentence-transformers) are lazy-imported with FineTuneDependencyError on failure. All stages accept progress_callback and CancellationToken parameters.

FineTuneOrchestrator manages background pipeline execution: - start(): single-run guard, creates FineTuneRun, spawns background asyncio.Task for sequential stage execution - resume(): loads failed/cancelled run, skips completed stages - cancel(): sets CancellationToken for cooperative stop - recover_interrupted(): marks active runs as FAILED on startup - get_status(): returns current or most recent run status Progress callbacks are throttled to 1/sec. WebSocket events emitted on stage transitions and completion/failure via ChannelsPlugin (best-effort, no crash on missing plugin).

- Add 4 WsEventType values for fine-tune progress/stage/complete/fail - Add FineTuneOrchestrator slot + property on AppState - Replace controller stubs with real endpoints: POST /admin/memory/fine-tune (start with 409 guard) POST /admin/memory/fine-tune/resume/{run_id} GET /admin/memory/fine-tune/status POST /admin/memory/fine-tune/cancel POST /admin/memory/fine-tune/preflight (deps + GPU checks) GET /admin/memory/fine-tune/checkpoints (paginated) POST /admin/memory/fine-tune/checkpoints/{id}/deploy POST /admin/memory/fine-tune/checkpoints/{id}/rollback DELETE /admin/memory/fine-tune/checkpoints/{id} (409 if active) GET /admin/memory/fine-tune/runs (paginated history) Preflight includes VRAM-based batch size recommendation.

Dashboard page with 6 sub-components: - PipelineControlPanel: start/cancel, source dir, preflight, advanced overrides (epochs, lr, batch size) - PreflightResultPanel: pass/warn/fail badges per check, recommended batch size - PipelineStepper: horizontal 5-stage stepper with completion indicators - PipelineProgressBar: animated progress bar with stage label - CheckpointTable: date, model, doc count, NDCG@10, Recall@10, size, deploy/rollback/delete actions, metric delta badges - RunHistoryTable: date, duration, status, stages completed - DependencyMissingBanner: install instructions when deps missing Zustand store with WebSocket event handling for real-time progress updates. API client with 11 endpoint functions.

Pre-reviewed by 18 specialized agents (docs-consistency, code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, type-design-analyzer, logging-audit, resilience-audit, conventions-enforcer, security-reviewer, frontend-reviewer, design-token-audit, api-contract-drift, persistence-reviewer, test-quality-reviewer, async-concurrency-reviewer, comment-analyzer, issue-resolution-verifier). 42 findings addressed: Critical fixes: - Register FineTuningPage route in router (was unreachable) - CancellationToken: threading.Event instead of asyncio.Event (thread-safety) - Replace bare string event names with constants from observability/events - Fix WS progress events sending stale run data - Add 4 fine-tune WS event types to frontend WsEventType Backend fixes: - Fix wrong event constant (STARTED -> PERSIST_FAILED) in save_run - Add MemoryError/RecursionError guards before broad except - Atomic set_active with existence check via BEGIN IMMEDIATE - Eliminate TOCTOU in delete_checkpoint (conditional DELETE) - mark_interrupted now updates updated_at + completed_at - _build_config uses explicit None checks instead of falsy-or - Wrap _read_jsonl in asyncio.to_thread (was blocking event loop) - Add done-callback on fire-and-forget pipeline tasks - Fix progress callback to read self._current_run (not stale closure) - Add deactivate_all() for rollback (replaces set_active('')) - Narrow preflight except clauses (catch specific types) - Add document count preflight check - Clamp pagination limit to [1, 200] - Move stdlib json/datetime imports to module level Frontend fixes: - Add Fine-Tuning to sidebar navigation - Add ConfirmDialog for checkpoint deletion - Extract ACTIVE_STAGES to shared constant - Add resume_run_id to StartFineTuneRequest - Fix design token gap-4 -> gap-section-gap/gap-grid-gap (4 files) - Add role=progressbar + ARIA attrs to PipelineProgressBar - Add role=alert to DependencyMissingBanner Schema fixes: - CHECK constraint on fine_tune_runs.stage - ON DELETE CASCADE on fine_tune_checkpoints.run_id FK - Index on fine_tune_runs(updated_at) Docs: - CLAUDE.md: update memory/ and persistence/ package descriptions - web/CLAUDE.md: update domain count, pages, stores Pre-existing bug fixes: - Session store: wrap time-sensitive list/revoke calls in _patch_now() - Migration test: add fine_tune_runs/checkpoints tables + indexes

…t, and Gemini

coderabbitai

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/memory/embedding/fine_tune.py`:
- Around line 3-5: The module-level docstring incorrectly claims "LLM-powered"
synthetic data generation while the implementation of _generate_query ignores
llm_provider and always uses an extractive fallback; update the module docstring
to state that synthetic data generation currently uses an extractive fallback
(no LLM calls) and reference the limitation of _generate_query, or alternatively
implement LLM usage inside _generate_query to match the docstring—ensure the
module docstring and _generate_query behavior (and its docstring) are
consistent.
- Around line 720-745: The backup_config.json is written unconditionally via
backup_path.write_text even when deployment may be skipped; move the backup
write so it only happens after you confirm deployment will proceed (i.e., after
checking settings_service or config_path and before calling settings_service.set
or writing the config), or alternatively only write when returning a json backup
for downstream use; update the logic around backup_path/backup and the
conditional using settings_service, config_path, checkpoint_path and
MEMORY_FINE_TUNE_CHECKPOINT_DEPLOYED so orphaned backup files are not created
when the function logs the warning and returns early.
- Around line 708-718: The current broad except block around the
settings_service.get call swallows all Exceptions; change it to only catch
settings-related errors by replacing the generic "except Exception" with "except
SettingsError" (while still re-raising MemoryError and RecursionError) so
SettingNotFoundError, SettingsEncryptionError, QueryError, etc. are logged via
MEMORY_FINE_TUNE_BACKUP_READ_SKIPPED and other unexpected exceptions can
propagate; locate the try/except surrounding settings_service.get(...) in the
fine_tune backup logic (references: settings_service.get, backup,
MEMORY_FINE_TUNE_BACKUP_READ_SKIPPED) and update the except clause accordingly.

In `@src/synthorg/persistence/sqlite/fine_tune_repo.py`:
- Around line 40-55: Map row-parsing failures to the repository's QueryError:
wrap calls that can raise (datetime.fromisoformat in _parse_ts and
_parse_required_ts, any json.loads and Pydantic model_validate_json usages that
map DB rows) in a try/except that catches ValueError, TypeError, and
pydantic.ValidationError and re-raises as QueryError with context (similar to
the wrapper used in _checkpoint_from_row); apply the same pattern for the
row-to-model mapping code paths used by list_runs() and list_checkpoints() so a
single corrupted row produces a QueryError instead of bubbling raw exceptions.
- Around line 129-131: The code persists
run.started_at/created_at/updated_at/completed_at by calling .isoformat() which
preserves timezone offsets and breaks lexical ordering of TEXT columns; update
the persistence logic in fine_tune_repo.py to normalize datetimes to UTC before
formatting (e.g., call .astimezone(datetime.timezone.utc) on run.started_at,
run.created_at, run.updated_at and run.completed_at when building the
VALUES/parameters) and then store the UTC ISO string (optionally with 'Z' for
+00:00) so ORDER BY ... DESC on the TEXT column matches chronological order;
apply this change to all places that serialize these fields (the code paths
referencing run.started_at, run.created_at, run.updated_at, run.completed_at and
any insert/update helpers used by get_active_run and list queries).
- Around line 103-104: All write methods (save_run, update_run,
mark_interrupted, save_checkpoint, set_active, deactivate_all,
delete_checkpoint) must acquire a single shared async lock for the underlying
aiosqlite.Connection before starting execute()/commit() and perform a rollback
on error; implement a module-level mapping (e.g., a WeakKeyDictionary or attach
a lock attribute on the Connection keyed by the Connection object) so the same
lock is used across all repository instances that share the same db connection,
then wrap each write method with: await lock.acquire(); try: await
self._db.execute(...); await self._db.commit(); except Exception: await
self._db.rollback(); raise; finally: lock.release(); ensure imports for asyncio
and weakref (or equivalent) are added and apply this pattern to all seven DML
methods listed.

In `@web/src/pages/fine-tuning/PipelineControlPanel.tsx`:
- Around line 41-44: The component is directly calling
useFineTuningStore.setState({ preflight: null }) which bypasses the store's
coordinated actions and can create races; add a clearPreflight action to the
fine-tuning store (e.g., clearPreflight: () => set({ preflight: null, error:
null })) alongside existing runPreflightCheck, then replace the direct setState
call in PipelineControlPanel with the new action (invoke clearPreflight via
useFineTuningStore) so preflight is cleared through the store's action pattern.
- Around line 116-126: In PipelineControlPanel, remove the redundant aria-hidden
attribute from the advanced options container (the div with id
"advanced-options-panel") because the panel is conditionally rendered via
showAdvanced; keep the conditional rendering and ensure the toggle control's
aria-expanded and aria-controls attributes remain on the toggle button so
accessibility is preserved.

In `@web/src/pages/fine-tuning/RunHistoryTable.tsx`:
- Around line 39-40: The progress display uses run.stages_completed.length which
can include duplicates or non-active stages; instead compute the count of unique
stages that are present in ACTIVE_STAGES and render that. In
RunHistoryTable.tsx, replace usage of run.stages_completed.length with a derived
value (e.g., create a Set from run.stages_completed filtered by ACTIVE_STAGES
membership and take its size) so you count only unique, active stages before
outputting "{completedActiveCount}/{ACTIVE_STAGES.size}". Ensure you reference
run.stages_completed and ACTIVE_STAGES in the new computation.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 952d81bf-b75e-4943-b9ba-007091054e10

📥 Commits

Reviewing files that changed from the base of the PR and between 32b62bc and c1a6650.

📒 Files selected for processing (6)

src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
tests/unit/memory/backends/composite/test_adapter.py
tests/unit/memory/backends/inmemory/test_adapter.py
web/src/pages/fine-tuning/PipelineControlPanel.tsx
web/src/pages/fine-tuning/RunHistoryTable.tsx

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Dashboard Test
GitHub Check: Test (Python 3.14)
GitHub Check: Build Backend
GitHub Check: Build Web
GitHub Check: Dependency Review
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (7)

web/src/{components,pages}/**/*.tsx

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/{components,pages}/**/*.tsx: Always reuse existing components from web/src/components/ui/ before creating new ones; every page must compose from shared building blocks
Do not recreate status dots inline; use the <StatusBadge> component from @/components/ui/status-badge
Do not build card-with-header layouts from scratch; use the <SectionCard> component from @/components/ui/section-card
Do not create metric displays with text-metric font-bold inline; use the <MetricCard> component from @/components/ui/metric-card
Do not render initials circles manually; use the <Avatar> component from @/components/ui/avatar
Do not create complex (>8 line) JSX inside .map() blocks; extract to a shared component instead

Files:

web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx

web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx}: Use Tailwind semantic classes (e.g., text-foreground, bg-card, text-accent, text-success, bg-danger) or CSS variables (e.g., var(--so-accent)) for colors; never hardcode hex values in .tsx/.ts files
Use font-sans or font-mono for typography; never set fontFamily directly
Use density-aware tokens (e.g., p-card, gap-section-gap, gap-grid-gap) or standard Tailwind spacing; never hardcode pixel values for layout spacing
Use token variables (e.g., var(--so-shadow-card-hover), border-border, border-bright) for shadows and borders; never hardcode values
Import cn from @/lib/utils for conditional class merging in component files
Do not use rgba() with hardcoded values; use design token variables instead
Do not use hardcoded Framer Motion transition durations; use presets from @/lib/motion instead

web/src/**/*.{ts,tsx}: Web dashboard: ALWAYS reuse existing components from web/src/components/ui/ before creating new ones. NEVER hardcode hex colors, font-family, pixel spacing, or Framer Motion transitions -- use design tokens and @/lib/motion presets.
Web dashboard: design token rules enforced by PostToolUse hook (scripts/check_web_design_system.py)
TypeScript/JavaScript: Node.js 22+, TypeScript 6.0+, ESLint with zero warnings enforced on web/src/**/*.{ts,tsx}
Web dashboard YAML parsing: use js-yaml library (in package.json)

Files:

web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx

web/src/**/*.{ts,tsx,css}

📄 CodeRabbit inference engine (CLAUDE.md)

Web dashboard styling: Tailwind CSS 4 for utility classes; never hardcode pixel values or colors

Files:

web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python 3.14+ required; no from __future__ import annotations (PEP 649 native lazy annotations)
Use PEP 758 except syntax: except A, B: (no parentheses) -- ruff enforces this on Python 3.14
Type hints required on all public functions; mypy strict mode enforced
Docstrings in Google style, required on all public classes and functions (enforced by ruff D rules)
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state. Never mix static config fields with mutable runtime fields in one model.
Pydantic v2: use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time
Use @computed_field for derived values in Pydantic models instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields in Pydantic models -- including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants -- instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff enforced)
Functions must be < 50 lines, files < 800 lines
Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).

Files:

src/synthorg/memory/embedding/fine_tune.py
tests/unit/memory/backends/inmemory/test_adapter.py
tests/unit/memory/backends/composite/test_adapter.py
src/synthorg/persistence/sqlite/fine_tune_repo.py

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code. Exception: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, and observability/http_handler.py may use stdlib logging and print(..., file=sys.stderr) for handler construction
Logger variable name must always be logger (not _logger, not log)
Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Structured kwargs in logging: always logger.info(EVENT, key=value) -- never logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging for object creation, internal flow, entry/exit of key functions
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Vendor names allowed only in: Operations design page, .claude/ files, third-party import paths, and src/synthorg/providers/presets.py

Files:

src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py

src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Vendor names in test files must use test-provider, test-small-001, etc.
Test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Coverage minimum 80% enforced in CI
Async testing: asyncio_mode = "auto" in conftest -- no manual @pytest.mark.asyncio needed
Default timeout: 30 seconds per test (global in pyproject.toml). Non-default overrides like timeout(60) are allowed but avoid adding per-file markers.
Prefer @pytest.mark.parametrize for testing similar cases in Python tests
Property-based testing in Python: use Hypothesis (@given + @settings). Profiles in tests/conftest.py: ci (deterministic, 10 examples), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples). Controlled via HYPOTHESIS_PROFILE env var.
When Hypothesis finds a failure, fix the underlying bug and add an explicit @example(...) decorator to permanently cover the case in CI
Never skip, dismiss, or ignore flaky tests -- always fix them fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep(). For indefinite blocking, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

tests/unit/memory/backends/inmemory/test_adapter.py
tests/unit/memory/backends/composite/test_adapter.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/unit/memory/backends/inmemory/test_adapter.py
tests/unit/memory/backends/composite/test_adapter.py

🧠 Learnings (51)

📓 Common learnings

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/engine/coordination/**/*.py : Task coordination uses multi-agent pipeline with 4 dispatchers (SAS/centralized/decentralized/context-dependent), wave execution, and workspace lifecycle integration.

📚 Learning: 2026-03-27T22:32:26.927Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T22:32:26.927Z
Learning: Applies to web/src/components/ui/*.{tsx,ts} : For new shared React components: place in web/src/components/ui/ with kebab-case filename, create .stories.tsx with all states, export props as TypeScript interface, use design tokens exclusively

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/components/ui/**/*.stories.tsx : Create a `.stories.tsx` file alongside each new shared component in `web/src/components/ui/` with all states (default, hover, loading, error, empty)

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-27T12:44:29.466Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T12:44:29.466Z
Learning: Applies to web/src/stores/**/*.{ts,tsx} : Use Zustand stores in web dashboard for state management (auth, WebSocket, toast, analytics, domain shells)

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-30T10:20:08.544Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Always reuse existing components from web/src/components/ui/ (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast/ToastContainer, Skeleton variants, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup/StaggerItem, Drawer, form fields, TaskStatusIndicator, PriorityBadge, ProviderHealthBadge, TokenUsageBar, CodeMirrorEditor, SegmentedControl, ThemeToggle, LiveRegion, MobileUnsupportedOverlay, LazyCodeMirrorEditor) before creating new components

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not create complex (>8 line) JSX inside `.map()` blocks; extract to a shared component instead

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-04-02T12:21:16.739Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-02T12:21:16.739Z
Learning: Applies to web/src/**/*.{tsx,ts} : Do NOT create complex (>8 line) JSX inside `.map()` -- extract to a shared component

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-30T10:41:40.176Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/**/*.{ts,tsx} : Do NOT create complex (>8 line) JSX inside `.map()`; extract to a shared component

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-30T10:20:08.544Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Do NOT create complex (>8 line) JSX inside .map()—extract to a shared component

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-04-02T12:21:16.739Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-02T12:21:16.739Z
Learning: Applies to web/src/**/*.{tsx,ts} : ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast, Skeleton, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup, Drawer, InputField, SelectField, SliderField, ToggleField, TaskStatusIndicator, PriorityBadge, ProviderHealthBadge, TokenUsageBar, CodeMirrorEditor, SegmentedControl, ThemeToggle, LiveRegion, MobileUnsupportedOverlay, LazyCodeMirrorEditor, TagInput, MetadataGrid, ProjectStatusBadge, ContentTypeBadge)

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not recreate status dots inline; use the `<StatusBadge>` component from `@/components/ui/status-badge`

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-27T12:44:29.466Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T12:44:29.466Z
Learning: Applies to web/src/**/*.{ts,tsx} : Always reuse existing components from `web/src/components/ui/` (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast, Skeleton, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup/StaggerItem) before creating new ones

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-03-31T14:31:11.894Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T14:31:11.894Z
Learning: Applies to web/src/**/*.{ts,tsx} : Use React 19, TypeScript 6.0+, and design system tokens from shadcn/ui + Tailwind CSS 4 + Radix UI in web dashboard

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-03-15T21:49:53.264Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T21:49:53.264Z
Learning: Fix everything valid — never skip when review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes). No deferring, no 'out of scope' skipping.

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-01T05:46:17.064Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T05:46:17.064Z
Learning: Fix all valid issues found by review agents (including pre-existing issues in surrounding code, suggestions, and adjacent findings) — never skip or defer

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-04T15:08:44.009Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: When review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes), fix them all. No deferring, no "out of scope" skipping.

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-04T15:08:44.009Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to web/src/**/*.{ts,tsx} : Web dashboard: ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones. NEVER hardcode hex colors, font-family, pixel spacing, or Framer Motion transitions -- use design tokens and `@/lib/motion` presets.

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-03-30T10:41:40.176Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/**/*.{ts,tsx} : ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones; refer to design system inventory (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, etc.)

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Always reuse existing components from `web/src/components/ui/` before creating new ones; every page must compose from shared building blocks

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-03-30T10:20:08.544Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Web dashboard shadows/borders: use token variables (var(--so-shadow-card-hover), border-border, border-bright); never hardcode shadow or border values

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not create metric displays with `text-metric font-bold` inline; use the `<MetricCard>` component from `@/components/ui/metric-card`

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/components/ui/**/*.tsx : Place new shared components in `web/src/components/ui/` with descriptive kebab-case filenames

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-03-20T21:44:04.528Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (without parentheses) per PEP 758 for exception handling in Python 3.14

Applied to files:

src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-03-16T07:22:28.134Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Handle errors explicitly; never silently swallow exceptions

Applied to files:

src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-03-31T21:07:37.470Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to **/*.py : Use `except A, B:` (no parentheses) per PEP 758 exception syntax on Python 3.14

Applied to files:

src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-03-16T07:22:28.134Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (no parentheses) for exception handling — PEP 758 exception syntax enforced by ruff on Python 3.14

Applied to files:

src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-03-14T15:43:05.601Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to **/*.py : Handle errors explicitly, never silently swallow exceptions

Applied to files:

src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-03-14T16:18:57.267Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to **/*.py : Use PEP 758 except syntax with `except A, B:` (no parentheses) for multiple exceptions—ruff enforces this on Python 3.14.

Applied to files:

src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-03-14T16:18:57.267Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to **/*.py : Handle errors explicitly—never silently swallow exceptions.

Applied to files:

src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-03-19T21:11:37.538Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T21:11:37.538Z
Learning: Applies to **/*.py : Errors: handle explicitly, never silently swallow.

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-14T15:43:05.601Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14

Applied to files:

src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-04-02T21:38:30.127Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T21:38:30.127Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Handle errors explicitly; never silently swallow exceptions

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-17T22:08:13.456Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-26T15:18:16.848Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-26T15:18:16.848Z
Learning: Applies to src/**/*.py : Handle errors explicitly, never silently swallow them. All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-04-04T15:08:44.009Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to **/*.py : Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-16T20:14:00.937Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:14:00.937Z
Learning: Applies to **/*.py : Validate: at system boundaries (user input, external APIs, config files).

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-19T07:13:44.964Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files)

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-20T08:28:32.845Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/providers/**/*.py : Providers: LLM provider abstraction (LiteLLM adapter), auth types (api_key/oauth/custom_header/none), presets (PROVIDER_PRESETS), runtime CRUD (ProviderManagementService with asyncio.Lock serialization), hot-reload via AppState swap.

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-04-01T15:36:39.993Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to {src/synthorg/**/*.py,tests/**/*.py,web/src/**/*.{ts,tsx}} : NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small`

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-16T06:24:56.341Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/memory/**/*.py : Use MemoryBackend protocol with pluggable backends (Mem0 adapter available at backends/mem0/) for persistent agent memory

Applied to files:

tests/unit/memory/backends/inmemory/test_adapter.py
tests/unit/memory/backends/composite/test_adapter.py

📚 Learning: 2026-03-19T07:12:14.508Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

Applied to files:

tests/unit/memory/backends/inmemory/test_adapter.py
tests/unit/memory/backends/composite/test_adapter.py

📚 Learning: 2026-03-31T21:07:37.470Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`

Applied to files:

tests/unit/memory/backends/inmemory/test_adapter.py

📚 Learning: 2026-04-02T07:18:02.381Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to tests/**/*.py : Never skip, dismiss, or ignore flaky tests; fix them fully and fundamentally by mocking `time.monotonic()` and `asyncio.sleep()` for determinism or using `asyncio.Event().wait()` for indefinite blocking

Applied to files:

tests/unit/memory/backends/inmemory/test_adapter.py

📚 Learning: 2026-03-20T21:44:04.528Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

tests/unit/memory/backends/inmemory/test_adapter.py

📚 Learning: 2026-04-04T15:08:44.009Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to tests/**/*.py : Never skip, dismiss, or ignore flaky tests -- always fix them fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()`. For indefinite blocking, use `asyncio.Event().wait()` instead of `asyncio.sleep(large_number)`.

Applied to files:

tests/unit/memory/backends/inmemory/test_adapter.py

📚 Learning: 2026-03-16T07:22:28.134Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

tests/unit/memory/backends/inmemory/test_adapter.py

📚 Learning: 2026-04-01T15:36:39.993Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to tests/**/*.py : Flaky tests: NEVER skip, dismiss, or ignore flaky tests -- always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

tests/unit/memory/backends/inmemory/test_adapter.py

📚 Learning: 2026-03-17T06:30:14.180Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/persistence/**/*.py : Persistence uses pluggable PersistenceBackend protocol. SQLite is the initial backend. Settings use SettingsRepository (namespaced settings CRUD).

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-03-17T22:08:13.456Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Persistence backend: pluggable PersistenceBackend protocol in `src/synthorg/persistence/`, SQLite initial, SettingsRepository (namespaced settings CRUD).

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-03-15T16:55:07.730Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14.

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-04-04T15:08:44.009Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: `except A, B:` (no parentheses) -- ruff enforces this on Python 3.14

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py

🔇 Additional comments (15)

web/src/pages/fine-tuning/RunHistoryTable.tsx (1)

74-76: Nice cleanup of table row rendering.

Extracting row markup into RunRow keeps the .map() body minimal and easier to maintain.

web/src/pages/fine-tuning/PipelineControlPanel.tsx (4)

1-11: LGTM!

Imports are well-organized with proper separation of React primitives, API types/constants, UI components from the shared library, and local component imports.

12-31: LGTM!

State setup uses the correct Zustand 5.0 pattern with useShallow for selective subscription. The batchSizeTouchedRef pattern correctly tracks user interaction without causing re-renders.

46-65: LGTM!

The isActive derivation and buildRequest function are well-implemented. The numeric validation correctly handles empty strings, NaN, and non-positive values before including optional fields in the request.

75-101: LGTM!

The JSX structure properly uses shared Button and InputField components with appropriate variants, design tokens for spacing (gap-section-gap, gap-4, gap-2), and conditional rendering for the Cancel/Start button based on isActive state.

tests/unit/memory/backends/composite/test_adapter.py (1)

1-510: Comprehensive test coverage for CompositeBackend.

The test module thoroughly covers:

Constructor validation (unknown backends in routes/default)

Protocol conformance

Lifecycle fan-out (connect/disconnect/health_check)

Namespace-based routing with fallback to default

Prefixed ID semantics (format, parsing, error cases)

Retrieve fan-out with graceful degradation on partial failures

Count aggregation and capabilities

The past review comment regarding the assertion strength in the degraded fanout test (line 386) has been addressed—now correctly asserts len(result) == 1.

tests/unit/memory/backends/inmemory/test_adapter.py (2)

1-542: Comprehensive test coverage for InMemoryBackend.

The test module thoroughly covers:

Protocol conformance and capability properties

Lifecycle (connect/disconnect/health_check) with idempotency tests

Connection guard for all CRUD operations

Store/Get with namespace preservation and per-agent isolation

Retrieve filtering (category, namespace, tags, text search, limit, ordering)

Count with category filtering

Max memories enforcement (constructor and runtime)

Expiry behavior (get/retrieve/count exclude expired entries)

Clear helper with agent isolation

The expiry tests (lines 461-512) correctly use model_copy(update=...) to work with frozen MemoryEntry models, and the approach of setting a past timestamp is deterministic without requiring time mocking.

463-477: Expiry test approach is sound.

Using model_copy(update={"expires_at": past}) to replace the stored entry is the correct pattern for frozen Pydantic models. Setting an explicit past timestamp (datetime.now(tz=UTC) - timedelta(hours=1)) is deterministic and avoids the complexity of time mocking for these straightforward expiration checks.
src/synthorg/memory/embedding/fine_tune.py (7)
1-40: LGTM!

Module docstring accurately reflects the 5-stage pipeline, imports are well-organized with TYPE_CHECKING guards for type-only imports, and logger setup follows project conventions.

58-95: LGTM!

Lazy import pattern with friendly error messages and proper logging before raising FineTuneDependencyError is well-implemented.

137-154: LGTM!

The _scan_documents function now properly catches UnicodeDecodeError explicitly and logs the skipped file with context, rather than silently ignoring invalid bytes.

196-227: LGTM!

Proper validation of validation_split bounds and minimum pair count, with split index calculation ensuring both training and validation sets are non-empty.

260-342: LGTM!

Hard negative mining implementation is correct with proper input validation, lazy imports, cancellation checks at reasonable intervals (every 50 queries), and numerically stable cosine similarity computation.

552-587: LGTM!

Cancellation checks are properly placed between each expensive encode() call (lines 562-563, 569-570, 579-580), addressing the cooperative-cancellation contract for the evaluation stage.

448-477: No action needed. The _progress_hook callback signature (score: float, epoch: int, steps: int) -> None matches the sentence-transformers API exactly as documented in the official source code and reference guide.
			> Likely an incorrect or invalid review comment.

coderabbitai · 2026-04-04T16:46:19Z

src/synthorg/memory/embedding/fine_tune.py

+Five-stage offline pipeline for domain-specific embedding fine-tuning:

 1. Synthetic data generation (LLM-powered, no manual annotation)


🧹 Nitpick | 🔵 Trivial

Module docstring claims LLM-powered generation, but implementation uses extractive fallback only.

The module docstring at line 5 states "Synthetic data generation (LLM-powered, no manual annotation)" but _generate_query() explicitly ignores llm_provider and always returns extractive fallback (line 248: _ = llm_provider). The function docstring correctly documents this limitation, but the module-level documentation is misleading.

Consider updating the module docstring to reflect actual behavior:

📝 Suggested docstring update

-1. Synthetic data generation (LLM-powered, no manual annotation) +1. Synthetic data generation (extractive fallback; LLM support planned)

Also applies to: 237-254

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/synthorg/memory/embedding/fine_tune.py` around lines 3 - 5, The module-level docstring incorrectly claims "LLM-powered" synthetic data generation while the implementation of _generate_query ignores llm_provider and always uses an extractive fallback; update the module docstring to state that synthetic data generation currently uses an extractive fallback (no LLM calls) and reference the limitation of _generate_query, or alternatively implement LLM usage inside _generate_query to match the docstring—ensure the module docstring and _generate_query behavior (and its docstring) are consistent.

coderabbitai · 2026-04-04T16:46:19Z

src/synthorg/memory/embedding/fine_tune.py

+            try:
+                sv = await settings_service.get("memory", key)
+                if sv and hasattr(sv, "value") and sv.value:
+                    backup[key] = sv.value
+            except MemoryError, RecursionError:
+                raise
+            except Exception:
+                logger.warning(
+                    MEMORY_FINE_TUNE_BACKUP_READ_SKIPPED,
+                    key=key,
+                )


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Find the exception type raised by SettingsService.get() when a setting is not found ast-grep --pattern $'async def get( self, $$$ ) -> $_: $$$ ' # Also check for custom exception types in the settings module rg -n 'class.*Error.*Exception' src/synthorg/settings/

Repository: Aureliolo/synthorg

Length of output: 50374

🏁 Script executed:

#!/bin/bash # Find SettingsService interface/protocol rg -n "class SettingsService\|class.*SettingsRepo\|def get" src/synthorg/persistence/settings* -A 5 # Also look for QueryError definition rg -n "class QueryError" src/synthorg/ # Check what's imported in settings_repo head -20 src/synthorg/persistence/sqlite/settings_repo.py

Repository: Aureliolo/synthorg

Length of output: 702

🏁 Script executed:

#!/bin/bash # Find SettingsService protocol/interface rg -n "class.*SettingsService\|SettingsRepository" src/synthorg/persistence/ --type py # Look at fine_tune.py to see settings_service type sed -n '1,50p' src/synthorg/memory/embedding/fine_tune.py # Check imports in fine_tune.py head -30 src/synthorg/memory/embedding/fine_tune.py | grep -E "^import|^from"

Repository: Aureliolo/synthorg

Length of output: 2010

🏁 Script executed:

#!/bin/bash # Find SettingsRepository protocol rg -n "protocol.*Settings\|class.*SettingsRepository" src/synthorg/persistence/ # Find deploy_checkpoint function to see the type of settings_service rg -n "async def deploy_checkpoint" src/synthorg/memory/embedding/fine_tune.py -A 3 # Find SettingsRepository in protocol files find src/synthorg/persistence -name "*.py" -type f | xargs grep -l "SettingsRepository" | head -5

Repository: Aureliolo/synthorg

Length of output: 345

🏁 Script executed:

#!/bin/bash # Find SettingsRepository protocol definition rg -n "class SettingsRepository" src/synthorg/persistence/protocol.py -A 30 # Also check for any method signatures rg -n "async def get" src/synthorg/persistence/protocol.py -B 2 -A 8

Repository: Aureliolo/synthorg

Length of output: 348

🏁 Script executed:

#!/bin/bash # Find SettingsRepository protocol more carefully rg -n "class SettingsRepository" src/synthorg/persistence/protocol.py -B 2 -A 50 | head -70 # Also check what settings_service is passed to deploy_checkpoint rg -n "settings_service.*:" src/synthorg/memory/embedding/fine_tune.py | head -10

Repository: Aureliolo/synthorg

Length of output: 303

🏁 Script executed:

#!/bin/bash # Get the exact code snippet being reviewed (lines 708-718) sed -n '708,718p' src/synthorg/memory/embedding/fine_tune.py # Also get broader context around it to understand the loop and settings_service usage sed -n '700,730p' src/synthorg/memory/embedding/fine_tune.py # Check if there's a SettingsRepository protocol in protocol.py grep -n "SettingsRepository\|class.*Settings" src/synthorg/persistence/protocol.py | head -20

Repository: Aureliolo/synthorg

Length of output: 1522

🏁 Script executed:

#!/bin/bash # Check if settings_service.get() actually returns tuple or an object with .value # Look at the actual SettingsRepository.get() method signature in protocol.py grep -n "async def get" src/synthorg/persistence/protocol.py -B 5 -A 15 | grep -A 15 "namespace" # Also check QueryError class hierarchy grep -n "class QueryError" src/synthorg/persistence/errors.py -A 5 # Check what exceptions are documented as possible from get() grep -n "Raises:" src/synthorg/persistence/sqlite/settings_repo.py -A 3

Repository: Aureliolo/synthorg

Length of output: 307

🏁 Script executed:

#!/bin/bash # Get the full SettingsRepository protocol definition sed -n '1,500p' src/synthorg/persistence/protocol.py | grep -A 30 "class SettingsRepository"

Repository: Aureliolo/synthorg

Length of output: 44

🏁 Script executed:

#!/bin/bash # Search more broadly for SettingsRepository rg "SettingsRepository" src/synthorg/persistence/protocol.py -B 3 -A 30 # Also check if there's a settings service protocol elsewhere find src/synthorg -name "*.py" -type f | xargs grep -l "SettingsRepository" | head -3

Repository: Aureliolo/synthorg

Length of output: 2553

🏁 Script executed:

#!/bin/bash # Find the actual SettingsRepository protocol definition rg "class SettingsRepository" src/synthorg/persistence/ -B 2 -A 30 # Also check the repositories.py file grep -n "SettingsRepository" src/synthorg/persistence/repositories.py -B 5 -A 20

Repository: Aureliolo/synthorg

Length of output: 3704

🏁 Script executed:

#!/bin/bash # Check the actual line 708-718 again and see what sv really is sed -n '708,725p' src/synthorg/memory/embedding/fine_tune.py # Search for other uses of settings_service.get() to see the pattern rg "settings_service\.get\(" src/ -B 2 -A 3 | head -40

Repository: Aureliolo/synthorg

Length of output: 4102

🏁 Script executed:

#!/bin/bash # Check if there's a different SettingsService (not Repository) that has different behavior rg "class SettingsService" src/ -A 15 # Check how the code actually uses the result - is .value actually valid somewhere? rg "\.value" src/synthorg/memory/embedding/fine_tune.py -B 3 -A 1 # Look at the git history or find where this pattern with .value comes from rg "hasattr.*value" src/synthorg/memory/ -B 2 -A 2

Repository: Aureliolo/synthorg

Length of output: 1904

🏁 Script executed:

#!/bin/bash # Find SettingsService.get() method rg "class SettingsService" src/synthorg/settings/service.py -A 100 | grep -A 30 "async def get" # Or just get the method directly sed -n '1,500p' src/synthorg/settings/service.py | grep -A 30 "async def get"

Repository: Aureliolo/synthorg

Length of output: 5095

🏁 Script executed:

#!/bin/bash # Find SettingNotFoundError definition and SettingValue class rg "class SettingNotFoundError\|class SettingValue" src/synthorg/settings/ -A 5 # Check what exceptions SettingsService.get() might raise besides SettingNotFoundError sed -n '1,300p' src/synthorg/settings/service.py | grep -A 50 "async def get" | head -70

Repository: Aureliolo/synthorg

Length of output: 2670

🏁 Script executed:

#!/bin/bash # Find SettingValue and SettingNotFoundError rg "class SettingValue\|class SettingNotFoundError\|class SettingsEncryptionError" src/synthorg/settings/ -A 3 # Understand the full flow - what exceptions can bubble up sed -n '1,150p' src/synthorg/settings/service.py | head -100

Repository: Aureliolo/synthorg

Length of output: 3519

🏁 Script executed:

#!/bin/bash # Find SettingValue class definition rg "class SettingValue" src/synthorg/settings/ -A 10 # Find all possible exceptions that could be raised rg "class.*Error.*Exception" src/synthorg/settings/errors.py -A 2

Repository: Aureliolo/synthorg

Length of output: 822

Backup-read exception handling is overly broad and may hide real failures.

Lines 712-718 catch all Exception types except MemoryError and RecursionError, including legitimate settings errors that indicate deployment cannot proceed safely. SettingsService.get() can raise SettingNotFoundError (missing key), SettingsEncryptionError (decryption failure), or QueryError (database/connectivity failure). All of these should be caught and logged, but catching the generic Exception class risks hiding unexpected errors.

Catch SettingsError (the base class for all settings-related exceptions) instead, allowing other unexpected exceptions to propagate and abort deployment.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/synthorg/memory/embedding/fine_tune.py` around lines 708 - 718, The current broad except block around the settings_service.get call swallows all Exceptions; change it to only catch settings-related errors by replacing the generic "except Exception" with "except SettingsError" (while still re-raising MemoryError and RecursionError) so SettingNotFoundError, SettingsEncryptionError, QueryError, etc. are logged via MEMORY_FINE_TUNE_BACKUP_READ_SKIPPED and other unexpected exceptions can propagate; locate the try/except surrounding settings_service.get(...) in the fine_tune backup logic (references: settings_service.get, backup, MEMORY_FINE_TUNE_BACKUP_READ_SKIPPED) and update the except clause accordingly.

src/synthorg/memory/embedding/fine_tune.py

src/synthorg/persistence/sqlite/fine_tune_repo.py

coderabbitai · 2026-04-04T16:46:19Z

src/synthorg/persistence/sqlite/fine_tune_repo.py

+    def __init__(self, db: aiosqlite.Connection) -> None:
+        self._db = db


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

In aiosqlite 0.22.1, if multiple coroutines share one Connection and each does await execute(...)followed later byawait commit()orawait rollback(), are transactions scoped to the connection so that statements from different coroutines can interleave into the same transaction?

💡 Result:

Yes, in aiosqlite 0.22.1, transactions are scoped to the connection. When multiple coroutines share one Connection and each calls await execute(...) followed later by await commit or await rollback, statements from different coroutines can interleave into the same transaction because all operations are serialized through a single shared request queue executed sequentially in a background thread.

Citations:

1: https://aiosqlite.omnilib.dev/en/v0.22.0/

2: https://github.com/omnilib/aiosqlite/blob/main/aiosqlite/core.py

3: https://aiosqlite.omnilib.dev/en/stable

4: commit parity with sqlite3 doesn't exist with conn.commit() omnilib/aiosqlite#286

5: https://stackoverflow.com/questions/53908615/reusing-aiosqlite-connection/53963739

🏁 Script executed:

#!/bin/bash # First, let's read the file to confirm the patterns described wc -l src/synthorg/persistence/sqlite/fine_tune_repo.py

Repository: Aureliolo/synthorg

Length of output: 115

🏁 Script executed:

# Read the __init__ method and overall structure head -120 src/synthorg/persistence/sqlite/fine_tune_repo.py | tail -30

Repository: Aureliolo/synthorg

Length of output: 1113

🏁 Script executed:

# Read a write method to see the execute/commit pattern sed -n '106,150p' src/synthorg/persistence/sqlite/fine_tune_repo.py

Repository: Aureliolo/synthorg

Length of output: 1946

🏁 Script executed:

# Check error handling in another write method sed -n '222,260p' src/synthorg/persistence/sqlite/fine_tune_repo.py

Repository: Aureliolo/synthorg

Length of output: 1544

🏁 Script executed:

# Check if there's any existing synchronization or transaction management rg -A 3 "def set_active|def save_run|async def.*await.*commit" src/synthorg/persistence/sqlite/fine_tune_repo.py

Repository: Aureliolo/synthorg

Length of output: 402

🏁 Script executed:

# Let's look at the set_active method to see how it handles transactions sed -n '301,350p' src/synthorg/persistence/sqlite/fine_tune_repo.py

Repository: Aureliolo/synthorg

Length of output: 2078

🏁 Script executed:

# Check all DML methods for transaction cleanup patterns rg -B 2 -A 10 "await self._db.commit" src/synthorg/persistence/sqlite/fine_tune_repo.py

Repository: Aureliolo/synthorg

Length of output: 3247

🏁 Script executed:

# Check for any rollback calls rg "rollback" src/synthorg/persistence/sqlite/fine_tune_repo.py

Repository: Aureliolo/synthorg

Length of output: 244

🏁 Script executed:

# Let's see the full mark_interrupted and delete_run methods sed -n '253,310p' src/synthorg/persistence/sqlite/fine_tune_repo.py

Repository: Aureliolo/synthorg

Length of output: 1864

🏁 Script executed:

# Let's check the set_active method fully to see the rollback pattern sed -n '407,450p' src/synthorg/persistence/sqlite/fine_tune_repo.py

Repository: Aureliolo/synthorg

Length of output: 1732

🏁 Script executed:

# Check if there's any locking or async context manager used elsewhere rg -B 3 -A 5 "asyncio.Lock|Semaphore|contextlib.suppress" src/synthorg/persistence/sqlite/fine_tune_repo.py

Repository: Aureliolo/synthorg

Length of output: 471

🏁 Script executed:

# Check how these repositories are instantiated and used rg -B 2 -A 2 "SQLiteFineTuneRunRepository\|SQLiteFineTuneCheckpointRepository" src/synthorg/persistence/ --type py

Repository: Aureliolo/synthorg

Length of output: 44

🏁 Script executed:

# Check the imports to see what's available head -30 src/synthorg/persistence/sqlite/fine_tune_repo.py

Repository: Aureliolo/synthorg

Length of output: 772

🏁 Script executed:

# Check how these repositories are used - are they called concurrently? rg "SQLiteFineTuneRunRepository\|SQLiteFineTuneCheckpointRepository" src/ --type py -B 2 -A 2

Repository: Aureliolo/synthorg

Length of output: 44

🏁 Script executed:

# Search for the instantiation point and how many instances exist rg "SQLiteFineTune.*Repository\(" src/ --type py -B 3 -A 3

Repository: Aureliolo/synthorg

Length of output: 2553

🏁 Script executed:

# Check if there's a shared connection pool or factory fd -t f "*.py" src/synthorg/persistence/ | xargs grep -l "aiosqlite\|Connection" | head -10

Repository: Aureliolo/synthorg

Length of output: 345

🏁 Script executed:

# Look at the actual startup/factory code find src/synthorg/persistence -name "*.py" -type f | xargs grep -l "async def.*init\|def.*init" | head -5

Repository: Aureliolo/synthorg

Length of output: 307

🏁 Script executed:

# Check if there's a background worker or other concurrent access pattern rg -i "background\|worker\|task.*fine.tune\|concurrent" src/synthorg/memory/embedding/ --type py -B 2 -A 2

Repository: Aureliolo/synthorg

Length of output: 44

🏁 Script executed:

# Look at memory integration to see if run repo is accessed from multiple places concurrently rg "SQLiteFineTuneRunRepository\|FineTuneRun" src/synthorg/memory/embedding/ --type py -l

Repository: Aureliolo/synthorg

Length of output: 44

🏁 Script executed:

# Check the database initialization to understand connection handling rg -B 5 -A 10 "aiosqlite.connect\|get_db_connection" src/synthorg/persistence/ --type py

Repository: Aureliolo/synthorg

Length of output: 44

🏁 Script executed:

# Verify the actual error handling - check deactivate_all which has no rollback sed -n '445,460p' src/synthorg/persistence/sqlite/fine_tune_repo.py

Repository: Aureliolo/synthorg

Length of output: 640

🏁 Script executed:

# Search more carefully for concurrent access patterns - look at fine_tune_pipeline or worker find src/synthorg -name "*.py" -type f | xargs grep -l "FineTuneRun\|fine_tune.*worker\|background" 2>/dev/null | head -10

Repository: Aureliolo/synthorg

Length of output: 485

🏁 Script executed:

# Look at the actual transaction semantics - does aiosqlite require explicit transaction begin? rg "BEGIN\|autocommit\|isolation" src/synthorg/persistence/sqlite/ --type py

Repository: Aureliolo/synthorg

Length of output: 44

🏁 Script executed:

# Count the write methods to verify the scope of the issue rg "async def (save|update|mark|delete|deactivate|set_active)" src/synthorg/persistence/sqlite/fine_tune_repo.py

Repository: Aureliolo/synthorg

Length of output: 388

🏁 Script executed:

# Check if the db connection is actually a shared singleton rg -B 10 -A 5 "get_db_connection\|Depends.*db\|provide.*db" src/synthorg/api/ --type py | head -50

Repository: Aureliolo/synthorg

Length of output: 44

🏁 Script executed:

# Look at a few write methods more closely to confirm no rollback in most sed -n '176,220p' src/synthorg/persistence/sqlite/fine_tune_repo.py

Repository: Aureliolo/synthorg

Length of output: 1597

🏁 Script executed:

# Check save_checkpoint for rollback presence sed -n '304,347p' src/synthorg/persistence/sqlite/fine_tune_repo.py

Repository: Aureliolo/synthorg

Length of output: 1952

Make transaction boundaries explicit on the shared SQLite connection.

Given this feature set has concurrent control APIs and background workers, multiple coroutines can access the same aiosqlite.Connection simultaneously. All write methods here (save_run, update_run, mark_interrupted, save_checkpoint, set_active, deactivate_all, delete_checkpoint) follow a vulnerable pattern: they await execute() and commit() separately on the same connection, allowing statements from different coroutines to interleave into the same transaction. This can cause one repository call to commit or roll back another call's work. Cleanup is also inconsistent—only set_active and delete_checkpoint attempt rollback on errors, while most other write methods skip cleanup entirely, leaving failed transactions to interfere with concurrent operations.

🛠️ Suggested direction

+import asyncio + class SQLiteFineTuneRunRepository: @@ - def __init__(self, db: aiosqlite.Connection) -> None: + def __init__( + self, + db: aiosqlite.Connection, + write_lock: asyncio.Lock, + ) -> None: self._db = db + self._write_lock = write_lock @@ async def save_run(self, run: FineTuneRun) -> None: """Persist a run (upsert semantics).""" - try: - await self._db.execute( - ... - ) - await self._db.commit() - except (sqlite3.Error, aiosqlite.Error) as exc: - msg = f"Failed to save fine-tune run {run.id}" - ... - raise QueryError(msg) from exc + async with self._write_lock: + try: + await self._db.execute( + ... + ) + await self._db.commit() + except (sqlite3.Error, aiosqlite.Error) as exc: + await self._db.rollback() + msg = f"Failed to save fine-tune run {run.id}" + ... + raise QueryError(msg) from exc

Apply the shared lock/rollback pattern to all seven DML methods in both repositories. The lock must be shared across all repository instances that use the same aiosqlite.Connection; a per-instance lock is insufficient.

Also applies to: 106–143, 176–213, 222–251, 253–291, 304–347, 407–443, 445–458

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/synthorg/persistence/sqlite/fine_tune_repo.py` around lines 103 - 104, All write methods (save_run, update_run, mark_interrupted, save_checkpoint, set_active, deactivate_all, delete_checkpoint) must acquire a single shared async lock for the underlying aiosqlite.Connection before starting execute()/commit() and perform a rollback on error; implement a module-level mapping (e.g., a WeakKeyDictionary or attach a lock attribute on the Connection keyed by the Connection object) so the same lock is used across all repository instances that share the same db connection, then wrap each write method with: await lock.acquire(); try: await self._db.execute(...); await self._db.commit(); except Exception: await self._db.rollback(); raise; finally: lock.release(); ensure imports for asyncio and weakref (or equivalent) are added and apply this pattern to all seven DML methods listed.

coderabbitai · 2026-04-04T16:46:19Z

src/synthorg/persistence/sqlite/fine_tune_repo.py

+                    run.started_at.isoformat(),
+                    run.updated_at.isoformat(),
+                    run.completed_at.isoformat() if run.completed_at else None,


⚠️ Potential issue | 🟠 Major

Normalize persisted datetimes to UTC before storing them as TEXT.

started_at and created_at are later ordered with plain ORDER BY ... DESC, but src/synthorg/persistence/sqlite/schema.sql:377-401 stores them as TEXT. Writing arbitrary timezone offsets via .isoformat() means lexical order stops matching chronological order, so get_active_run() and the list queries can return the wrong “latest” row when offsets differ.

🕒 Proposed fix

+def _serialize_ts(value: datetime | None) -> str | None: + if value is None: + return None + dt = value if value.tzinfo is not None else value.replace(tzinfo=UTC) + return dt.astimezone(UTC).isoformat() + @@ - run.started_at.isoformat(), - run.updated_at.isoformat(), - run.completed_at.isoformat() if run.completed_at else None, + _serialize_ts(run.started_at), + _serialize_ts(run.updated_at), + _serialize_ts(run.completed_at), @@ - run.updated_at.isoformat(), - run.completed_at.isoformat() if run.completed_at else None, + _serialize_ts(run.updated_at), + _serialize_ts(run.completed_at), @@ - checkpoint.created_at.isoformat(), + _serialize_ts(checkpoint.created_at),

Also applies to: 168-171, 207-210, 237-238, 334-334, 392-395

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/synthorg/persistence/sqlite/fine_tune_repo.py` around lines 129 - 131, The code persists run.started_at/created_at/updated_at/completed_at by calling .isoformat() which preserves timezone offsets and breaks lexical ordering of TEXT columns; update the persistence logic in fine_tune_repo.py to normalize datetimes to UTC before formatting (e.g., call .astimezone(datetime.timezone.utc) on run.started_at, run.created_at, run.updated_at and run.completed_at when building the VALUES/parameters) and then store the UTC ISO string (optionally with 'Z' for +00:00) so ORDER BY ... DESC on the TEXT column matches chronological order; apply this change to all places that serialize these fields (the code paths referencing run.started_at, run.created_at, run.updated_at, run.completed_at and any insert/update helpers used by get_active_run and list queries).

coderabbitai · 2026-04-04T16:46:19Z

web/src/pages/fine-tuning/PipelineControlPanel.tsx

+  // Clear stale preflight when sourceDir changes.
+  useEffect(() => {
+    useFineTuningStore.setState({ preflight: null })
+  }, [sourceDir])


🛠️ Refactor suggestion | 🟠 Major

Avoid direct setState for coordinated store fields.

Calling useFineTuningStore.setState({ preflight: null }) bypasses the store's action pattern. Per the store implementation, runPreflightCheck coordinates state updates with set({ loading: true, preflight: null, error: null }). Direct setState can race with WebSocket-driven updates and leaves loading/error in inconsistent states.

Consider adding a dedicated clearPreflight action to the store, or invoking an existing action that resets preflight state.

♻️ Proposed fix: Add store action

In web/src/stores/fine-tuning.ts, add a clearPreflight action:

clearPreflight: () => set({ preflight: null, error: null }),

Then in this component:

+ const { status, preflight, loading, startRun, cancelRun, runPreflightCheck, clearPreflight } = useFineTuningStore(useShallow((s) => ({ status: s.status, preflight: s.preflight, loading: s.loading, startRun: s.startRun, cancelRun: s.cancelRun, runPreflightCheck: s.runPreflightCheck, + clearPreflight: s.clearPreflight, }))) // Clear stale preflight when sourceDir changes. useEffect(() => { - useFineTuningStore.setState({ preflight: null }) + clearPreflight() }, [sourceDir])

Based on learnings: "The runPreflightCheck action properly manages preflight state using set({ loading: true, preflight: null, error: null }). Using direct setState() calls in components bypasses this coordinated state management and could overwrite concurrent updates from WebSocket handlers."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@web/src/pages/fine-tuning/PipelineControlPanel.tsx` around lines 41 - 44, The component is directly calling useFineTuningStore.setState({ preflight: null }) which bypasses the store's coordinated actions and can create races; add a clearPreflight action to the fine-tuning store (e.g., clearPreflight: () => set({ preflight: null, error: null })) alongside existing runPreflightCheck, then replace the direct setState call in PipelineControlPanel with the new action (invoke clearPreflight via useFineTuningStore) so preflight is cleared through the store's action pattern.

web/src/pages/fine-tuning/PipelineControlPanel.tsx

web/src/pages/fine-tuning/RunHistoryTable.tsx

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/unit/memory/embedding/test_fine_tune.py (1)

213-238: 🧹 Nitpick | 🔵 Trivial

Consider adding tests for other numeric parameter validations.

contrastive_fine_tune now validates batch_size, learning_rate, and temperature in addition to epochs. Consider adding tests for these validation paths:

async def test_rejects_zero_batch_size(self) -> None:
    with pytest.raises(ValueError, match="batch_size"):
        await contrastive_fine_tune(
            training_data_path="/data",
            base_model="test-model",
            output_dir="/output",
            batch_size=0,
        )

async def test_rejects_negative_learning_rate(self) -> None:
    with pytest.raises(ValueError, match="learning_rate"):
        await contrastive_fine_tune(
            training_data_path="/data",
            base_model="test-model",
            output_dir="/output",
            learning_rate=-0.001,
        )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/unit/memory/embedding/test_fine_tune.py` around lines 213 - 238, Add
unit tests to cover numeric validation paths for contrastive_fine_tune: inside
TestContrastiveFineTune add async tests that assert ValueError for invalid
batch_size, learning_rate, and temperature (e.g., test_rejects_zero_batch_size,
test_rejects_negative_learning_rate, test_rejects_invalid_temperature) by
calling contrastive_fine_tune with training_data_path, base_model, output_dir
and the offending parameter (batch_size=0, learning_rate=-0.001, temperature
values outside allowed range) and use pytest.raises(ValueError,
match="batch_size"/"learning_rate"/"temperature") to verify the error messages
reference the correct parameter names.

♻️ Duplicate comments (3)

src/synthorg/memory/embedding/fine_tune.py (1)
246-248: 🧹 Nitpick | 🔵 Trivial

Reminder: LLM-powered query generation is stubbed.

The llm_provider parameter is accepted but explicitly unused (_ = llm_provider). Stage 1 always uses extractive fallback. This is documented but worth noting for future implementation.

Would you like me to help implement LLM-powered query generation when a provider is available?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/memory/embedding/fine_tune.py` around lines 246 - 248, The code
currently accepts llm_provider but ignores it ("_ = llm_provider"), forcing
Stage 1 to always use the extractive fallback; replace that stub with a
conditional path that, when llm_provider is provided, invokes the provider to
generate the query and otherwise falls back to the existing extractive logic.
Specifically, remove the placeholder assignment and add logic to call the
provider's generation method (e.g., llm_provider.generate_query or
llm_provider.create_query) with the same inputs used by the extractive path,
handle errors/timeouts and validate the returned query, and only use the
extractive fallback when llm_provider is None or the provider call fails — keep
the current extractive behavior as the fallback.
web/src/pages/fine-tuning/PipelineControlPanel.tsx (1)
41-44: ⚠️ Potential issue | 🟠 Major

Invalidate preflight against the full form state.

buildRequest() depends on showAdvanced, epochs, learningRate, and effectiveBatchSize, but Lines 41-44 only clear preflight when sourceDir changes. After an advanced-field edit, Line 95 can still enable or block Start with the previous result, and runPreflightCheck() in web/src/stores/fine-tuning.ts:115-124 can repopulate that stale result after the form has changed. Track preflight against the full request before rendering PreflightResultPanel or using can_proceed to gate startRun().

Also applies to: 48-72, 95-95, 103-103, 121-123
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/pages/fine-tuning/PipelineControlPanel.tsx` around lines 41 - 44, The
preflight is only cleared on sourceDir changes (useEffect that calls
useFineTuningStore.setState({ preflight: null })), but buildRequest() depends on
showAdvanced, epochs, learningRate, and effectiveBatchSize so the preflight
becomes stale after edits; update the invalidation logic to clear preflight
whenever the full request shape changes (derive the full request via
buildRequest() or include showAdvanced/epochs/learningRate/effectiveBatchSize in
the useEffect dependency array) so PreflightResultPanel and can_proceed used by
startRun() always reflect the current form, and ensure runPreflightCheck() in
the fine-tuning store cannot repopulate an outdated preflight result after form
edits.
web/src/pages/fine-tuning/RunHistoryTable.tsx (1)
83-89: ⚠️ Potential issue | 🟡 Minor

Normalize seconds before choosing the display unit.

FineTuneRun.duration_seconds is number | null in web/src/api/endpoints/fine-tuning.ts, so Line 84 can receive fractional values. Because the <60 check happens before rounding, values like 59.6 still render as 60s. Round/clamp once up front and branch on that normalized value instead.
Proposed fix
 function formatDuration(seconds: number): string {
-  if (seconds < 60) return `${Math.round(seconds)}s`
-  if (seconds < 3600) return `${Math.floor(seconds / 60)}m`
-  const h = Math.floor(seconds / 3600)
-  const m = Math.floor((seconds % 3600) / 60)
+  const totalSeconds = Math.max(0, Math.round(seconds))
+  if (totalSeconds < 60) return `${totalSeconds}s`
+  if (totalSeconds < 3600) return `${Math.floor(totalSeconds / 60)}m`
+  const h = Math.floor(totalSeconds / 3600)
+  const m = Math.floor((totalSeconds % 3600) / 60)
   return `${h}h ${m}m`
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/pages/fine-tuning/RunHistoryTable.tsx` around lines 83 - 89,
formatDuration may receive fractional seconds from FineTuneRun.duration_seconds
(number | null); normalize the input once at the top of formatDuration (e.g.,
clamp to >=0 and Math.round) and use that normalized value for all comparisons
and outputs so values like 59.6 round to 60 and render as "1m". Update the
function referenced as formatDuration to compute a normalized seconds variable
and then use it for the <60, <3600 checks and for computing h/m/s.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@tests/unit/memory/embedding/test_fine_tune.py`:
- Around line 213-238: Add unit tests to cover numeric validation paths for
contrastive_fine_tune: inside TestContrastiveFineTune add async tests that
assert ValueError for invalid batch_size, learning_rate, and temperature (e.g.,
test_rejects_zero_batch_size, test_rejects_negative_learning_rate,
test_rejects_invalid_temperature) by calling contrastive_fine_tune with
training_data_path, base_model, output_dir and the offending parameter
(batch_size=0, learning_rate=-0.001, temperature values outside allowed range)
and use pytest.raises(ValueError,
match="batch_size"/"learning_rate"/"temperature") to verify the error messages
reference the correct parameter names.

---

Duplicate comments:
In `@src/synthorg/memory/embedding/fine_tune.py`:
- Around line 246-248: The code currently accepts llm_provider but ignores it
("_ = llm_provider"), forcing Stage 1 to always use the extractive fallback;
replace that stub with a conditional path that, when llm_provider is provided,
invokes the provider to generate the query and otherwise falls back to the
existing extractive logic. Specifically, remove the placeholder assignment and
add logic to call the provider's generation method (e.g.,
llm_provider.generate_query or llm_provider.create_query) with the same inputs
used by the extractive path, handle errors/timeouts and validate the returned
query, and only use the extractive fallback when llm_provider is None or the
provider call fails — keep the current extractive behavior as the fallback.

In `@web/src/pages/fine-tuning/PipelineControlPanel.tsx`:
- Around line 41-44: The preflight is only cleared on sourceDir changes
(useEffect that calls useFineTuningStore.setState({ preflight: null })), but
buildRequest() depends on showAdvanced, epochs, learningRate, and
effectiveBatchSize so the preflight becomes stale after edits; update the
invalidation logic to clear preflight whenever the full request shape changes
(derive the full request via buildRequest() or include
showAdvanced/epochs/learningRate/effectiveBatchSize in the useEffect dependency
array) so PreflightResultPanel and can_proceed used by startRun() always reflect
the current form, and ensure runPreflightCheck() in the fine-tuning store cannot
repopulate an outdated preflight result after form edits.

In `@web/src/pages/fine-tuning/RunHistoryTable.tsx`:
- Around line 83-89: formatDuration may receive fractional seconds from
FineTuneRun.duration_seconds (number | null); normalize the input once at the
top of formatDuration (e.g., clamp to >=0 and Math.round) and use that
normalized value for all comparisons and outputs so values like 59.6 round to 60
and render as "1m". Update the function referenced as formatDuration to compute
a normalized seconds variable and then use it for the <60, <3600 checks and for
computing h/m/s.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 28f2d71f-24fe-495f-8ab7-0819bc979655

📥 Commits

Reviewing files that changed from the base of the PR and between c1a6650 and 22831d8.

📒 Files selected for processing (5)

src/synthorg/memory/embedding/fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
tests/unit/memory/embedding/test_fine_tune.py
web/src/pages/fine-tuning/PipelineControlPanel.tsx
web/src/pages/fine-tuning/RunHistoryTable.tsx

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: Build Backend
GitHub Check: Build Sandbox
GitHub Check: Build Web
GitHub Check: Test (Python 3.14)
GitHub Check: Dashboard Test
GitHub Check: Dependency Review
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (7)

web/src/{components,pages}/**/*.tsx

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/{components,pages}/**/*.tsx: Always reuse existing components from web/src/components/ui/ before creating new ones; every page must compose from shared building blocks
Do not recreate status dots inline; use the <StatusBadge> component from @/components/ui/status-badge
Do not build card-with-header layouts from scratch; use the <SectionCard> component from @/components/ui/section-card
Do not create metric displays with text-metric font-bold inline; use the <MetricCard> component from @/components/ui/metric-card
Do not render initials circles manually; use the <Avatar> component from @/components/ui/avatar
Do not create complex (>8 line) JSX inside .map() blocks; extract to a shared component instead

Files:

web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx

web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx}: Use Tailwind semantic classes (e.g., text-foreground, bg-card, text-accent, text-success, bg-danger) or CSS variables (e.g., var(--so-accent)) for colors; never hardcode hex values in .tsx/.ts files
Use font-sans or font-mono for typography; never set fontFamily directly
Use density-aware tokens (e.g., p-card, gap-section-gap, gap-grid-gap) or standard Tailwind spacing; never hardcode pixel values for layout spacing
Use token variables (e.g., var(--so-shadow-card-hover), border-border, border-bright) for shadows and borders; never hardcode values
Import cn from @/lib/utils for conditional class merging in component files
Do not use rgba() with hardcoded values; use design token variables instead
Do not use hardcoded Framer Motion transition durations; use presets from @/lib/motion instead

web/src/**/*.{ts,tsx}: Web dashboard: ALWAYS reuse existing components from web/src/components/ui/ before creating new ones. NEVER hardcode hex colors, font-family, pixel spacing, or Framer Motion transitions -- use design tokens and @/lib/motion presets.
Web dashboard: design token rules enforced by PostToolUse hook (scripts/check_web_design_system.py)
TypeScript/JavaScript: Node.js 22+, TypeScript 6.0+, ESLint with zero warnings enforced on web/src/**/*.{ts,tsx}
Web dashboard YAML parsing: use js-yaml library (in package.json)

Files:

web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx

web/src/**/*.{ts,tsx,css}

📄 CodeRabbit inference engine (CLAUDE.md)

Web dashboard styling: Tailwind CSS 4 for utility classes; never hardcode pixel values or colors

Files:

web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python 3.14+ required; no from __future__ import annotations (PEP 649 native lazy annotations)
Use PEP 758 except syntax: except A, B: (no parentheses) -- ruff enforces this on Python 3.14
Type hints required on all public functions; mypy strict mode enforced
Docstrings in Google style, required on all public classes and functions (enforced by ruff D rules)
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state. Never mix static config fields with mutable runtime fields in one model.
Pydantic v2: use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time
Use @computed_field for derived values in Pydantic models instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields in Pydantic models -- including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants -- instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff enforced)
Functions must be < 50 lines, files < 800 lines
Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).

Files:

tests/unit/memory/embedding/test_fine_tune.py
src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Vendor names in test files must use test-provider, test-small-001, etc.
Test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Coverage minimum 80% enforced in CI
Async testing: asyncio_mode = "auto" in conftest -- no manual @pytest.mark.asyncio needed
Default timeout: 30 seconds per test (global in pyproject.toml). Non-default overrides like timeout(60) are allowed but avoid adding per-file markers.
Prefer @pytest.mark.parametrize for testing similar cases in Python tests
Property-based testing in Python: use Hypothesis (@given + @settings). Profiles in tests/conftest.py: ci (deterministic, 10 examples), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples). Controlled via HYPOTHESIS_PROFILE env var.
When Hypothesis finds a failure, fix the underlying bug and add an explicit @example(...) decorator to permanently cover the case in CI
Never skip, dismiss, or ignore flaky tests -- always fix them fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep(). For indefinite blocking, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

tests/unit/memory/embedding/test_fine_tune.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/unit/memory/embedding/test_fine_tune.py

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code. Exception: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, and observability/http_handler.py may use stdlib logging and print(..., file=sys.stderr) for handler construction
Logger variable name must always be logger (not _logger, not log)
Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Structured kwargs in logging: always logger.info(EVENT, key=value) -- never logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging for object creation, internal flow, entry/exit of key functions
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Vendor names allowed only in: Operations design page, .claude/ files, third-party import paths, and src/synthorg/providers/presets.py

Files:

src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py

src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py

🧠 Learnings (48)

📓 Common learnings

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

📚 Learning: 2026-03-27T22:32:26.927Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T22:32:26.927Z
Learning: Applies to web/src/components/ui/*.{tsx,ts} : For new shared React components: place in web/src/components/ui/ with kebab-case filename, create .stories.tsx with all states, export props as TypeScript interface, use design tokens exclusively

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/components/ui/**/*.stories.tsx : Create a `.stories.tsx` file alongside each new shared component in `web/src/components/ui/` with all states (default, hover, loading, error, empty)

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-30T10:20:08.544Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Always reuse existing components from web/src/components/ui/ (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast/ToastContainer, Skeleton variants, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup/StaggerItem, Drawer, form fields, TaskStatusIndicator, PriorityBadge, ProviderHealthBadge, TokenUsageBar, CodeMirrorEditor, SegmentedControl, ThemeToggle, LiveRegion, MobileUnsupportedOverlay, LazyCodeMirrorEditor) before creating new components

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-03-30T10:41:40.176Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/components/ui/**/*.{ts,tsx} : Create new shared components in `web/src/components/ui/` with `.stories.tsx` Storybook file covering all states (default, hover, loading, error, empty)

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not create complex (>8 line) JSX inside `.map()` blocks; extract to a shared component instead

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-04-02T12:21:16.739Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-02T12:21:16.739Z
Learning: Applies to web/src/**/*.{tsx,ts} : Do NOT create complex (>8 line) JSX inside `.map()` -- extract to a shared component

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-30T10:41:40.176Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/**/*.{ts,tsx} : Do NOT create complex (>8 line) JSX inside `.map()`; extract to a shared component

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-30T10:20:08.544Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Do NOT create complex (>8 line) JSX inside .map()—extract to a shared component

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-04-02T12:21:16.739Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-02T12:21:16.739Z
Learning: Applies to web/src/**/*.{tsx,ts} : ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast, Skeleton, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup, Drawer, InputField, SelectField, SliderField, ToggleField, TaskStatusIndicator, PriorityBadge, ProviderHealthBadge, TokenUsageBar, CodeMirrorEditor, SegmentedControl, ThemeToggle, LiveRegion, MobileUnsupportedOverlay, LazyCodeMirrorEditor, TagInput, MetadataGrid, ProjectStatusBadge, ContentTypeBadge)

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not recreate status dots inline; use the `<StatusBadge>` component from `@/components/ui/status-badge`

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx
web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-03-27T12:44:29.466Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T12:44:29.466Z
Learning: Applies to web/src/**/*.{ts,tsx} : Always reuse existing components from `web/src/components/ui/` (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast, Skeleton, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup/StaggerItem) before creating new ones

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-03-15T21:49:53.264Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T21:49:53.264Z
Learning: Fix everything valid — never skip when review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes). No deferring, no 'out of scope' skipping.

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-01T05:46:17.064Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T05:46:17.064Z
Learning: Fix all valid issues found by review agents (including pre-existing issues in surrounding code, suggestions, and adjacent findings) — never skip or defer

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-04T15:08:44.009Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: When review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes), fix them all. No deferring, no "out of scope" skipping.

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-04T15:08:44.009Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to web/src/**/*.{ts,tsx} : Web dashboard: ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones. NEVER hardcode hex colors, font-family, pixel spacing, or Framer Motion transitions -- use design tokens and `@/lib/motion` presets.

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-03-30T10:41:40.176Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/**/*.{ts,tsx} : ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones; refer to design system inventory (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, etc.)

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Always reuse existing components from `web/src/components/ui/` before creating new ones; every page must compose from shared building blocks

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-03-30T10:20:08.544Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Web dashboard shadows/borders: use token variables (var(--so-shadow-card-hover), border-border, border-bright); never hardcode shadow or border values

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not create metric displays with `text-metric font-bold` inline; use the `<MetricCard>` component from `@/components/ui/metric-card`

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/components/ui/**/*.tsx : Place new shared components in `web/src/components/ui/` with descriptive kebab-case filenames

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-03-27T12:44:29.466Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T12:44:29.466Z
Learning: Applies to web/src/stores/**/*.{ts,tsx} : Use Zustand stores in web dashboard for state management (auth, WebSocket, toast, analytics, domain shells)

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-03-30T10:41:40.176Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/**/*.{ts,tsx} : Use Zustand stores for state management in the web dashboard; each domain has its own store module (auth, WebSocket, toast, analytics, setup, company, agents, budget, tasks, settings, providers, theme, per-domain stores)

Applied to files:

web/src/pages/fine-tuning/PipelineControlPanel.tsx

📚 Learning: 2026-03-17T22:08:13.456Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to tests/**/*.py : Test markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`. Coverage: 80% minimum. Async: `asyncio_mode = 'auto'` — no manual `pytest.mark.asyncio` needed. Timeout: 30 seconds per test. Parallelism: `pytest-xdist` via `-n auto` — ALWAYS include `-n auto` when running pytest, never run tests sequentially.

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-03-17T06:30:14.180Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/persistence/**/*.py : Persistence uses pluggable PersistenceBackend protocol. SQLite is the initial backend. Settings use SettingsRepository (namespaced settings CRUD).

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-03-17T22:08:13.456Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Persistence backend: pluggable PersistenceBackend protocol in `src/synthorg/persistence/`, SQLite initial, SettingsRepository (namespaced settings CRUD).

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-03-31T21:07:37.470Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to **/*.py : Use `except A, B:` (no parentheses) per PEP 758 exception syntax on Python 3.14

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-16T07:22:28.134Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (no parentheses) for exception handling — PEP 758 exception syntax enforced by ruff on Python 3.14

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-20T21:44:04.528Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (without parentheses) per PEP 758 for exception handling in Python 3.14

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-14T15:43:05.601Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-14T16:18:57.267Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to **/*.py : Use PEP 758 except syntax with `except A, B:` (no parentheses) for multiple exceptions—ruff enforces this on Python 3.14.

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-15T16:55:07.730Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14.

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-04-04T15:08:44.009Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: `except A, B:` (no parentheses) -- ruff enforces this on Python 3.14

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-03-14T16:18:57.267Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to **/*.py : Handle errors explicitly—never silently swallow exceptions.

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-14T15:43:05.601Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to **/*.py : Handle errors explicitly, never silently swallow exceptions

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-16T07:22:28.134Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Handle errors explicitly; never silently swallow exceptions

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py
src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-31T21:07:37.470Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`

Applied to files:

src/synthorg/persistence/sqlite/fine_tune_repo.py

📚 Learning: 2026-03-19T21:11:37.538Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T21:11:37.538Z
Learning: Applies to **/*.py : Errors: handle explicitly, never silently swallow.

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-04-02T21:38:30.127Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T21:38:30.127Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Handle errors explicitly; never silently swallow exceptions

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-17T22:08:13.456Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-26T15:18:16.848Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-26T15:18:16.848Z
Learning: Applies to src/**/*.py : Handle errors explicitly, never silently swallow them. All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-04-04T15:08:44.009Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to **/*.py : Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-16T20:14:00.937Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:14:00.937Z
Learning: Applies to **/*.py : Validate: at system boundaries (user input, external APIs, config files).

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-19T07:13:44.964Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files)

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-04-01T14:22:06.315Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T14:22:06.315Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Validate at system boundaries (user input, external APIs, config files)

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-20T08:28:32.845Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/providers/**/*.py : Providers: LLM provider abstraction (LiteLLM adapter), auth types (api_key/oauth/custom_header/none), presets (PROVIDER_PRESETS), runtime CRUD (ProviderManagementService with asyncio.Lock serialization), hot-reload via AppState swap.

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-04-01T15:36:39.993Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to {src/synthorg/**/*.py,tests/**/*.py,web/src/**/*.{ts,tsx}} : NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small`

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

📚 Learning: 2026-03-19T07:12:14.508Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/backup/**/*.py : Backup package (backup/): scheduled/manual/lifecycle backups of persistence DB, agent memory, company config. BackupService orchestrator, BackupScheduler (periodic asyncio task), RetentionManager (count + age pruning), tar.gz compression, SHA-256 checksums, manifest tracking, validated restore with atomic rollback and safety backup. handlers/ subpackage: ComponentHandler protocol + concrete handlers (PersistenceComponentHandler, MemoryComponentHandler, ConfigComponentHandler)

Applied to files:

src/synthorg/memory/embedding/fine_tune.py

🔇 Additional comments (30)

src/synthorg/persistence/sqlite/fine_tune_repo.py (8)
152-159: Use PEP 758 except syntax without parentheses.

All except (sqlite3.Error, aiosqlite.Error) as exc: clauses should use the comma-separated form without parentheses per PEP 758 on Python 3.14.
-        except (sqlite3.Error, aiosqlite.Error) as exc:
+        except sqlite3.Error, aiosqlite.Error as exc:
Apply this pattern to all occurrences throughout the file.

As per coding guidelines: "Use except A, B: syntax (no parentheses) for exception handling — PEP 758 exception syntax enforced by ruff on Python 3.14."

Also applies to: 169-176, 192-198, 229-235, 260-267, 295-301, 356-363, 376-383, 414-420, 450-459, 468-474, 516-523, 534-540

78-80: Use PEP 758 except syntax in row mappers.

The exception handling in _run_from_row and _checkpoint_from_row uses parenthesized form. Apply the comma-separated PEP 758 syntax:
-    except (ValueError, TypeError, json.JSONDecodeError) as exc:
+    except ValueError, TypeError, json.JSONDecodeError as exc:
As per coding guidelines: "Use PEP 758 except syntax: except A, B: (no parentheses) — enforced by ruff on Python 3.14."

Also applies to: 107-109

145-147: Normalize timestamps to UTC before storing.

The .isoformat() calls preserve arbitrary timezone offsets. Since timestamps are stored as TEXT and sorted lexically via ORDER BY ... DESC, different offsets will break chronological ordering.
🕒 Suggested helper
+def _serialize_ts(value: datetime | None) -> str | None:
+    """Serialize datetime to UTC ISO-8601 string."""
+    if value is None:
+        return None
+    dt = value if value.tzinfo is not None else value.replace(tzinfo=UTC)
+    return dt.astimezone(UTC).isoformat()
Then replace all .isoformat() calls with _serialize_ts().
119-120: Transaction boundaries need explicit synchronization.

Multiple coroutines sharing the same aiosqlite.Connection can interleave their execute()/commit() calls into the same transaction. Additionally, most write methods lack rollback on error (only set_active and delete_checkpoint roll back), leaving failed transactions to affect concurrent operations.

Consider adding a shared asyncio.Lock for all write operations on the same connection, and ensure consistent rollback on all error paths.

26-35: LGTM: Active stages tuple is well-defined.

The _ACTIVE_STAGES tuple correctly derives from FineTuneStage enum, excluding terminal states (IDLE, COMPLETE, FAILED). This ensures get_active_run() and mark_interrupted() target the correct stage values.

40-56: LGTM: Timestamp parsing and row mappers with proper error handling.

The timestamp helpers (_parse_ts, _parse_required_ts) correctly handle UTC fallback for naive datetimes. The row mappers (_run_from_row, _checkpoint_from_row) now properly wrap parsing failures as QueryError, preventing raw exceptions from bubbling out.

Also applies to: 58-81, 83-109

423-459: LGTM: set_active uses proper atomic deactivate-then-activate with rollback.

The set_active method correctly:

Deactivates all checkpoints

Activates the target checkpoint

Checks rowcount to verify the checkpoint exists

Rolls back on both logical failure (checkpoint not found) and DB errors

476-523: LGTM: delete_checkpoint has TOCTOU protection.

The DELETE ... WHERE id = ? AND is_active = 0 guard combined with the rowcount check properly handles the race condition where a checkpoint becomes active between the SELECT and DELETE. The rollback is correctly called before raising.
src/synthorg/memory/embedding/fine_tune.py (12)

712-718: Backup-read exception handling is overly broad.

The code correctly re-raises MemoryError and RecursionError, but catching generic Exception can hide real settings failures like SettingsEncryptionError or QueryError that should abort deployment rather than just skip the backup key.

Consider catching SettingsError (the base class for settings-related exceptions) instead of Exception.

1-14: LGTM: Module docstring accurately reflects implementation.

The docstring correctly documents the 5-stage pipeline and explicitly notes that synthetic data generation uses extractive fallback with LLM path stubbed. This addresses the previous inconsistency concern.

61-76: LGTM: Lazy dependency imports with friendly errors.

The _import_sentence_transformers() and _import_torch() helpers properly:

Attempt lazy import

Log the missing dependency with structured event

Raise FineTuneDependencyError with install instructions

Also applies to: 79-94

143-151: LGTM: UTF-8 decode errors handled explicitly.

The _scan_documents function now catches UnicodeDecodeError explicitly, logs with context, and skips the problematic file rather than silently dropping bytes with errors="ignore".

196-227: LGTM: Validation split and minimum pairs properly validated.

The function now:

Rejects validation_split outside (0, 1) exclusive range

Requires at least 2 query-document pairs

Clamps split_idx to ensure both train and validation sets have at least 1 item

237-254: LGTM: Extractive query generation with proper bounds.

The _generate_query function now:

Handles empty first by falling back to chunk[:100]

Caps length to 200 characters

Correctly documents that llm_provider is unused (forward compatibility)

260-342: LGTM: Hard negative mining implementation.

The function correctly:

Lazy-imports sentence-transformers

Encodes all passages and queries

Selects hard negatives using a margin-based approach (95% of positive similarity)

Checks cancellation periodically (every 50 queries)

Reports progress incrementally

421-432: LGTM: All numeric training parameters validated.

The function now validates all numeric parameters before use:

epochs >= 1

batch_size >= 1

learning_rate > 0

temperature > 0

This addresses the previous concern about invalid values reaching the optimizer.

448-458: LGTM: Cancellation checked during training.

The _progress_hook callback checks cancellation every 10 steps, enabling cooperative cancellation during the training loop.

552-587: LGTM: Cancellation checks between evaluation phases.

The function now checks cancellation between each expensive encoding phase:

After loading models (line 552)

After fine-tuned query encoding (line 562)

After fine-tuned passage encoding (line 569)

After base query encoding (line 579)

This addresses the previous concern about long-running evaluation not observing cancel requests.

736-742: LGTM: settings_service.set() call is correct.

The call passes only the three required positional arguments (namespace, key, value), which matches the SettingsService.set() signature. The optional expected_updated_at is not needed here.

720-734: LGTM: Backup file only written when deployment proceeds.

The early return at line 727 correctly avoids writing backup_config.json when no settings_service is available. The backup file is only written (lines 729-734) when deployment will actually update settings.

tests/unit/memory/embedding/test_fine_tune.py (8)

25-36: LGTM: Stage enum test includes EVALUATING.

The test correctly verifies all FineTuneStage values including the new EVALUATING stage.

41-78: LGTM: Helper function tests cover edge cases.

Tests for _chunk_text, _generate_query, and _scan_documents appropriately cover:

Basic functionality

Empty inputs

Edge cases (empty files, non-text files)

100-134: LGTM: Training data generation tests are comprehensive.

Tests cover:

Empty source directory error

Correct JSONL output format with required fields (query, positive_passage)

Validation split producing both training and validation files

154-169: LGTM: Cancellation test uses pre-cancelled token.

The test correctly verifies that a pre-cancelled CancellationToken raises FineTuneCancelledError immediately.

189-207: LGTM: Dependency error test properly mocks lazy import.

The test patches _import_sentence_transformers to raise FineTuneDependencyError, verifying the error propagates correctly without requiring the actual ML dependencies.

244-279: LGTM: Evaluation tests cover validation paths.

Tests verify:

Blank checkpoint_path rejection

Empty validation data raises ValueError with "empty" message

301-311: LGTM: Deploy test verifies no backup without settings_service.

The test correctly asserts that:

deploy_checkpoint returns None when no settings_service is provided

No backup_config.json file is created in this case

317-337: LGTM: Metrics computation tests verify boundary conditions.

The tests correctly verify:

Perfect ranking: Identity embeddings yield NDCG@10 = 1.0 and Recall@10 = 1.0 (each query's ground truth passage is at rank 1)

Random embeddings: Scores are bounded within [0.0, 1.0]

web/src/pages/fine-tuning/RunHistoryTable.tsx (2)

20-76: Good decomposition of the row rendering.

Pulling the row markup out of runs.map() keeps the table body easy to scan, and reusing StatusBadge matches the existing UI primitives.

As per coding guidelines "Do not create complex (>8 line) JSX inside .map() blocks; extract to a shared component instead" and "Do not recreate status dots inline; use the <StatusBadge> component from @/components/ui/status-badge".

9-18: Nice use of exhaustive stage maps.

Typing both lookup tables with FineTuneStage keeps the badge state and label rendering aligned if the backend adds a new stage.

Also applies to: 91-104

- Tests for batch_size=0, learning_rate<0, temperature=0 rejection - formatDuration normalizes fractional seconds at top (59.6 -> 1m)

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/memory/embedding/test_fine_tune.py`:
- Around line 227-277: Consolidate the four nearly identical tests into one
parametrized test using pytest.mark.parametrize: replace the separate async
test_rejects_* functions with a single async test (e.g.,
test_rejects_invalid_hyperparameters) that parametrizes tuples of (param_name,
invalid_value, match_string) for ("epochs", 0, "epochs"), ("batch_size", 0,
"batch_size"), ("learning_rate", -0.001, "learning_rate"), and ("temperature",
0.0, "temperature"), and inside the test call contrastive_fine_tune with the
given param_name set to invalid_value and assert with pytest.raises(ValueError,
match=match_string); keep imports of contrastive_fine_tune and ensure await is
used for the async call.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6e129d0f-a311-480a-9d4a-5b9d72ae0415

📥 Commits

Reviewing files that changed from the base of the PR and between 22831d8 and aac735b.

📒 Files selected for processing (2)

tests/unit/memory/embedding/test_fine_tune.py
web/src/pages/fine-tuning/RunHistoryTable.tsx

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Test (Python 3.14)
GitHub Check: Dashboard Test
GitHub Check: Build Backend
GitHub Check: Build Web
GitHub Check: Dependency Review
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (5)

web/src/{components,pages}/**/*.tsx

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/{components,pages}/**/*.tsx: Always reuse existing components from web/src/components/ui/ before creating new ones; every page must compose from shared building blocks
Do not recreate status dots inline; use the <StatusBadge> component from @/components/ui/status-badge
Do not build card-with-header layouts from scratch; use the <SectionCard> component from @/components/ui/section-card
Do not create metric displays with text-metric font-bold inline; use the <MetricCard> component from @/components/ui/metric-card
Do not render initials circles manually; use the <Avatar> component from @/components/ui/avatar
Do not create complex (>8 line) JSX inside .map() blocks; extract to a shared component instead

Files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx}: Use Tailwind semantic classes (e.g., text-foreground, bg-card, text-accent, text-success, bg-danger) or CSS variables (e.g., var(--so-accent)) for colors; never hardcode hex values in .tsx/.ts files
Use font-sans or font-mono for typography; never set fontFamily directly
Use density-aware tokens (e.g., p-card, gap-section-gap, gap-grid-gap) or standard Tailwind spacing; never hardcode pixel values for layout spacing
Use token variables (e.g., var(--so-shadow-card-hover), border-border, border-bright) for shadows and borders; never hardcode values
Import cn from @/lib/utils for conditional class merging in component files
Do not use rgba() with hardcoded values; use design token variables instead
Do not use hardcoded Framer Motion transition durations; use presets from @/lib/motion instead

web/src/**/*.{ts,tsx}: Web dashboard: ALWAYS reuse existing components from web/src/components/ui/ before creating new ones. NEVER hardcode hex colors, font-family, pixel spacing, or Framer Motion transitions -- use design tokens and @/lib/motion presets.
Web dashboard: design token rules enforced by PostToolUse hook (scripts/check_web_design_system.py)
TypeScript/JavaScript: Node.js 22+, TypeScript 6.0+, ESLint with zero warnings enforced on web/src/**/*.{ts,tsx}
Web dashboard YAML parsing: use js-yaml library (in package.json)

Files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

web/src/**/*.{ts,tsx,css}

📄 CodeRabbit inference engine (CLAUDE.md)

Web dashboard styling: Tailwind CSS 4 for utility classes; never hardcode pixel values or colors

Files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python 3.14+ required; no from __future__ import annotations (PEP 649 native lazy annotations)
Use PEP 758 except syntax: except A, B: (no parentheses) -- ruff enforces this on Python 3.14
Type hints required on all public functions; mypy strict mode enforced
Docstrings in Google style, required on all public classes and functions (enforced by ruff D rules)
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state. Never mix static config fields with mutable runtime fields in one model.
Pydantic v2: use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time
Use @computed_field for derived values in Pydantic models instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields in Pydantic models -- including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants -- instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff enforced)
Functions must be < 50 lines, files < 800 lines
Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).

Files:

tests/unit/memory/embedding/test_fine_tune.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Vendor names in test files must use test-provider, test-small-001, etc.
Test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Coverage minimum 80% enforced in CI
Async testing: asyncio_mode = "auto" in conftest -- no manual @pytest.mark.asyncio needed
Default timeout: 30 seconds per test (global in pyproject.toml). Non-default overrides like timeout(60) are allowed but avoid adding per-file markers.
Prefer @pytest.mark.parametrize for testing similar cases in Python tests
Property-based testing in Python: use Hypothesis (@given + @settings). Profiles in tests/conftest.py: ci (deterministic, 10 examples), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples). Controlled via HYPOTHESIS_PROFILE env var.
When Hypothesis finds a failure, fix the underlying bug and add an explicit @example(...) decorator to permanently cover the case in CI
Never skip, dismiss, or ignore flaky tests -- always fix them fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep(). For indefinite blocking, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

tests/unit/memory/embedding/test_fine_tune.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/unit/memory/embedding/test_fine_tune.py

🧠 Learnings (16)

📓 Common learnings

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

📚 Learning: 2026-03-27T22:32:26.927Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T22:32:26.927Z
Learning: Applies to web/src/components/ui/*.{tsx,ts} : For new shared React components: place in web/src/components/ui/ with kebab-case filename, create .stories.tsx with all states, export props as TypeScript interface, use design tokens exclusively

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/components/ui/**/*.stories.tsx : Create a `.stories.tsx` file alongside each new shared component in `web/src/components/ui/` with all states (default, hover, loading, error, empty)

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-27T12:44:29.466Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T12:44:29.466Z
Learning: Applies to web/src/stores/**/*.{ts,tsx} : Use Zustand stores in web dashboard for state management (auth, WebSocket, toast, analytics, domain shells)

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-30T10:41:40.176Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/components/ui/**/*.{ts,tsx} : Create new shared components in `web/src/components/ui/` with `.stories.tsx` Storybook file covering all states (default, hover, loading, error, empty)

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-30T10:20:08.544Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Always reuse existing components from web/src/components/ui/ (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast/ToastContainer, Skeleton variants, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup/StaggerItem, Drawer, form fields, TaskStatusIndicator, PriorityBadge, ProviderHealthBadge, TokenUsageBar, CodeMirrorEditor, SegmentedControl, ThemeToggle, LiveRegion, MobileUnsupportedOverlay, LazyCodeMirrorEditor) before creating new components

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-27T12:44:29.466Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-27T12:44:29.466Z
Learning: Applies to web/src/**/*.{ts,tsx} : Always reuse existing components from `web/src/components/ui/` (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast, Skeleton, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup/StaggerItem) before creating new ones

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not create metric displays with `text-metric font-bold` inline; use the `<MetricCard>` component from `@/components/ui/metric-card`

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-31T14:31:11.894Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T14:31:11.894Z
Learning: Applies to web/src/**/*.{ts,tsx} : Use React 19, TypeScript 6.0+, and design system tokens from shadcn/ui + Tailwind CSS 4 + Radix UI in web dashboard

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-30T10:41:40.176Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/**/*.{ts,tsx} : Use Zustand stores for state management in the web dashboard; each domain has its own store module (auth, WebSocket, toast, analytics, setup, company, agents, budget, tasks, settings, providers, theme, per-domain stores)

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not recreate status dots inline; use the `<StatusBadge>` component from `@/components/ui/status-badge`

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-04-04T10:51:58.893Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-04T10:51:58.893Z
Learning: Applies to web/src/{components,pages}/**/*.tsx : Do not create complex (>8 line) JSX inside `.map()` blocks; extract to a shared component instead

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-04-02T12:21:16.739Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-02T12:21:16.739Z
Learning: Applies to web/src/**/*.{tsx,ts} : Do NOT create complex (>8 line) JSX inside `.map()` -- extract to a shared component

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-30T10:41:40.176Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:41:40.176Z
Learning: Applies to web/src/**/*.{ts,tsx} : Do NOT create complex (>8 line) JSX inside `.map()`; extract to a shared component

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-03-30T10:20:08.544Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-30T10:20:08.544Z
Learning: Applies to web/src/**/*.{ts,tsx} : Do NOT create complex (>8 line) JSX inside .map()—extract to a shared component

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

📚 Learning: 2026-04-02T12:21:16.739Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: web/CLAUDE.md:0-0
Timestamp: 2026-04-02T12:21:16.739Z
Learning: Applies to web/src/**/*.{tsx,ts} : ALWAYS reuse existing components from `web/src/components/ui/` before creating new ones (StatusBadge, MetricCard, Sparkline, SectionCard, AgentCard, DeptHealthBar, ProgressGauge, StatPill, Avatar, Button, Toast, Skeleton, EmptyState, ErrorBoundary, ConfirmDialog, CommandPalette, InlineEdit, AnimatedPresence, StaggerGroup, Drawer, InputField, SelectField, SliderField, ToggleField, TaskStatusIndicator, PriorityBadge, ProviderHealthBadge, TokenUsageBar, CodeMirrorEditor, SegmentedControl, ThemeToggle, LiveRegion, MobileUnsupportedOverlay, LazyCodeMirrorEditor, TagInput, MetadataGrid, ProjectStatusBadge, ContentTypeBadge)

Applied to files:

web/src/pages/fine-tuning/RunHistoryTable.tsx

🔇 Additional comments (9)

web/src/pages/fine-tuning/RunHistoryTable.tsx (3)

20-47: Good row extraction and shared UI usage.

RunRow keeps the table rendering clean, and status rendering correctly reuses the shared <StatusBadge> component.

As per coding guidelines "Do not recreate status dots inline; use the <StatusBadge> component from @/components/ui/status-badge".

49-80: Table/empty-state flow is clean and maintainable.

Empty-state handling is straightforward, and the .map() body stays minimal by delegating row JSX to RunRow.

As per coding guidelines "Do not create complex (>8 line) JSX inside .map() blocks; extract to a shared component instead".

83-105: Formatting helpers look correct and consistent.

formatDuration avoids minute rollover edge cases, and stage label mapping remains explicit and easy to reason about.

tests/unit/memory/embedding/test_fine_tune.py (6)

32-35: Stage enum assertion update looks correct.

The added EVALUATING assertion matches the current pipeline lifecycle and keeps enum coverage complete.

41-79: Helper-function test coverage is solid.

The chunk/query/scan tests cover empty, nominal, and filtering behavior well without over-mocking.

100-170: Stage 1 tests now validate critical runtime behaviors well.

Great additions for no-document failure, artifact creation, progress callback completion, and cooperative cancellation.

298-318: Empty-validation-data guard test is well targeted.

Patching _read_jsonl to [] directly validates the intended failure path in evaluation.

330-350: Deploy tests correctly cover invalid path and no-settings-service behavior.

The assertions align with current deploy semantics, including ensuring no backup artifact is created when settings updates are unavailable.

356-376: Metrics tests give good confidence on bounds and ideal ranking.

Both perfect-match and randomized-range assertions provide useful sanity coverage for _compute_metrics.

tests/unit/memory/embedding/test_fine_tune.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/memory/embedding/test_fine_tune.py`:
- Around line 248-252: The test uses a nonstandard base model name; update the
kwargs dictionary (variable kwargs used to pass "training_data_path",
"base_model", "output_dir") and replace the "base_model" value "test-model" with
the repository-standard test model name such as "test-small-001" so the test
follows the vendor/model naming convention.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 247090f9-344c-4ce8-aceb-c3676a7a2497

📥 Commits

Reviewing files that changed from the base of the PR and between aac735b and 3edc2bd.

📒 Files selected for processing (1)

tests/unit/memory/embedding/test_fine_tune.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: Dashboard Test
GitHub Check: Test (Python 3.14)
GitHub Check: Build Sandbox
GitHub Check: Build Web
GitHub Check: Build Backend
GitHub Check: Dependency Review
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python 3.14+ required; no from __future__ import annotations (PEP 649 native lazy annotations)
Use PEP 758 except syntax: except A, B: (no parentheses) -- ruff enforces this on Python 3.14
Type hints required on all public functions; mypy strict mode enforced
Docstrings in Google style, required on all public classes and functions (enforced by ruff D rules)
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state. Never mix static config fields with mutable runtime fields in one model.
Pydantic v2: use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time
Use @computed_field for derived values in Pydantic models instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields in Pydantic models -- including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants -- instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff enforced)
Functions must be < 50 lines, files < 800 lines
Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).

Files:

tests/unit/memory/embedding/test_fine_tune.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Vendor names in test files must use test-provider, test-small-001, etc.
Test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Coverage minimum 80% enforced in CI
Async testing: asyncio_mode = "auto" in conftest -- no manual @pytest.mark.asyncio needed
Default timeout: 30 seconds per test (global in pyproject.toml). Non-default overrides like timeout(60) are allowed but avoid adding per-file markers.
Prefer @pytest.mark.parametrize for testing similar cases in Python tests
Property-based testing in Python: use Hypothesis (@given + @settings). Profiles in tests/conftest.py: ci (deterministic, 10 examples), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples). Controlled via HYPOTHESIS_PROFILE env var.
When Hypothesis finds a failure, fix the underlying bug and add an explicit @example(...) decorator to permanently cover the case in CI
Never skip, dismiss, or ignore flaky tests -- always fix them fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep(). For indefinite blocking, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

tests/unit/memory/embedding/test_fine_tune.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/unit/memory/embedding/test_fine_tune.py

🧠 Learnings (7)

📓 Common learnings

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

📚 Learning: 2026-03-17T22:08:13.456Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases.

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-03-31T20:29:10.177Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T20:29:10.177Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-03-15T18:28:13.207Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to tests/**/*.py : Parametrize: Prefer pytest.mark.parametrize for testing similar cases.

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-04-04T15:08:44.009Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases in Python tests

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-03-17T22:08:13.456Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to tests/**/*.py : Test markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`. Coverage: 80% minimum. Async: `asyncio_mode = 'auto'` — no manual `pytest.mark.asyncio` needed. Timeout: 30 seconds per test. Parallelism: `pytest-xdist` via `-n auto` — ALWAYS include `-n auto` when running pytest, never run tests sequentially.

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-03-20T21:44:04.528Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

🔇 Additional comments (8)

tests/unit/memory/embedding/test_fine_tune.py (8)

32-33: Nice stage enum coverage update.

Adding the explicit EVALUATING assertion closes the gap for the new lifecycle stage.

41-79: Helper tests are concrete and behavior-focused.

Good shift from placeholder expectations to observable behavior for chunking, query fallback, and document scanning rules.

100-169: Stage-1 tests cover key operational paths well.

The no-documents, output generation, progress callback, and cooperative cancellation checks are all present and practical.

189-207: Dependency-failure path is correctly exercised.

Mocking _import_sentence_transformers to raise FineTuneDependencyError gives a clean, deterministic guard for missing ML deps.

227-241: Great consolidation with parametrized validation cases.

This removes repetitive tests while preserving per-parameter error assertions.

Based on learnings: Applies to tests/**/*.py : Prefer pytest.mark.parametrize for testing similar cases.

261-296: Evaluation-stage negative-path tests look solid.

Blank checkpoint-path validation and empty validation-data handling are both now explicitly covered.

308-328: Deploy-stage edge cases are well covered.

Nonexistent checkpoint rejection and “no settings service” behavior (including no backup file side-effect) are tested clearly.

334-354: Metric tests provide good sanity guarantees.

Perfect-match and bounded-random cases together give useful confidence for NDCG/Recall outputs.

tests/unit/memory/embedding/test_fine_tune.py

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/memory/embedding/test_fine_tune.py`:
- Around line 341-343: Replace exact float equality checks in the test that
calls _compute_metrics by using pytest.approx: change "assert ndcg == 1.0" to
"assert ndcg == pytest.approx(1.0)" and "assert recall == 1.0" to "assert recall
== pytest.approx(1.0)"; if pytest is not already imported in
tests/unit/memory/embedding/test_fine_tune.py, add "import pytest" at the top so
the approx matcher is available.
- Around line 127-130: The current assertions use train_lines =
train_path.read_text().strip().split("\n") and val_lines =
val_path.read_text().strip().split("\n") which yields [""] for an empty file and
makes len(val_lines) > 0 a false positive; change the checks to filter out blank
lines (e.g., use .splitlines() then keep only lines where line.strip() != "")
and assert there is at least one non-blank line for both train_lines and
val_lines, or even better, attempt to json.loads the first non-blank line from
val_lines to ensure it is valid JSONL before asserting; update the variables
train_lines and val_lines accordingly and adjust the assertions to reflect the
filtered/parsed content.
- Around line 337-347: The tests import NumPy directly (see test functions like
test_random_embeddings_positive and the use of _compute_metrics that loads numpy
arrays), but NumPy is missing from the test optional dependencies; update the
test dependency list in pyproject.toml to include "numpy" in the test extras so
the imports in these tests succeed during CI and local runs.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5faee6d9-a1f1-4f38-946c-716d2f910706

📥 Commits

Reviewing files that changed from the base of the PR and between 3edc2bd and f7b9c40.

📒 Files selected for processing (1)

tests/unit/memory/embedding/test_fine_tune.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Dashboard Test
GitHub Check: Test (Python 3.14)
GitHub Check: Build Web
GitHub Check: Build Backend
GitHub Check: Dependency Review
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python 3.14+ required; no from __future__ import annotations (PEP 649 native lazy annotations)
Use PEP 758 except syntax: except A, B: (no parentheses) -- ruff enforces this on Python 3.14
Type hints required on all public functions; mypy strict mode enforced
Docstrings in Google style, required on all public classes and functions (enforced by ruff D rules)
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state. Never mix static config fields with mutable runtime fields in one model.
Pydantic v2: use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time
Use @computed_field for derived values in Pydantic models instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields in Pydantic models -- including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants -- instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff enforced)
Functions must be < 50 lines, files < 800 lines
Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).

Files:

tests/unit/memory/embedding/test_fine_tune.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Vendor names in test files must use test-provider, test-small-001, etc.
Test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Coverage minimum 80% enforced in CI
Async testing: asyncio_mode = "auto" in conftest -- no manual @pytest.mark.asyncio needed
Default timeout: 30 seconds per test (global in pyproject.toml). Non-default overrides like timeout(60) are allowed but avoid adding per-file markers.
Prefer @pytest.mark.parametrize for testing similar cases in Python tests
Property-based testing in Python: use Hypothesis (@given + @settings). Profiles in tests/conftest.py: ci (deterministic, 10 examples), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples). Controlled via HYPOTHESIS_PROFILE env var.
When Hypothesis finds a failure, fix the underlying bug and add an explicit @example(...) decorator to permanently cover the case in CI
Never skip, dismiss, or ignore flaky tests -- always fix them fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep(). For indefinite blocking, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

tests/unit/memory/embedding/test_fine_tune.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/unit/memory/embedding/test_fine_tune.py

🧠 Learnings (18)

📓 Common learnings

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/engine/coordination/**/*.py : Task coordination uses multi-agent pipeline with 4 dispatchers (SAS/centralized/decentralized/context-dependent), wave execution, and workspace lifecycle integration.

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

📚 Learning: 2026-03-17T22:08:13.456Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases.

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-03-15T18:28:13.207Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to tests/**/*.py : Parametrize: Prefer pytest.mark.parametrize for testing similar cases.

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-03-31T20:29:10.177Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T20:29:10.177Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-04-04T15:08:44.009Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases in Python tests

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-03-20T21:44:04.528Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-03-16T07:22:28.134Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-03-31T21:07:37.470Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-03-17T22:08:13.456Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to tests/**/*.py : Test markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`. Coverage: 80% minimum. Async: `asyncio_mode = 'auto'` — no manual `pytest.mark.asyncio` needed. Timeout: 30 seconds per test. Parallelism: `pytest-xdist` via `-n auto` — ALWAYS include `-n auto` when running pytest, never run tests sequentially.

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-04-02T07:18:02.381Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to tests/**/*.py : Never skip, dismiss, or ignore flaky tests; fix them fully and fundamentally by mocking `time.monotonic()` and `asyncio.sleep()` for determinism or using `asyncio.Event().wait()` for indefinite blocking

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-04-04T15:08:44.009Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to tests/**/*.py : Vendor names in test files must use `test-provider`, `test-small-001`, etc.

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-04-01T15:36:39.993Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to tests/**/*.py : Tests must use `test-provider`, `test-small-001`, etc. instead of real vendor names

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-04-01T15:36:39.993Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to {src,tests}/**/*.py : Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small` as aliases

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-03-15T18:28:13.207Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to tests/**/*.py : Tests must use test-provider, test-small-001, etc. for vendor-agnostic test data.

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-04-01T15:36:39.993Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to {src/synthorg/**/*.py,tests/**/*.py,web/src/**/*.{ts,tsx}} : NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small`

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-03-15T18:28:13.207Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to {src,tests,web,cli,site}/**/*.{py,ts,tsx,go,astro} : Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names.

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-04-04T15:08:44.009Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T15:08:44.009Z
Learning: Applies to src/synthorg/**/*.py : Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`. Vendor names allowed only in: Operations design page, `.claude/` files, third-party import paths, and `src/synthorg/providers/presets.py`

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

📚 Learning: 2026-03-26T15:18:16.848Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-26T15:18:16.848Z
Learning: Applies to src/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. Use `computed_field` for derived values instead of storing redundant fields. Use `NotBlankStr` for all identifier/name fields.

Applied to files:

tests/unit/memory/embedding/test_fine_tune.py

🔇 Additional comments (3)

tests/unit/memory/embedding/test_fine_tune.py (3)

32-33: Good enum contract coverage for the new EVALUATING stage.

This keeps the test aligned with the expanded pipeline lifecycle.

154-169: Cancellation-path test is solid.

Pre-cancelling CancellationToken and asserting FineTuneCancelledError gives good coverage for cooperative cancellation checks.

227-255: Nice use of parametrization for invalid hyperparameters.

This reduces duplication and keeps validation coverage concise and maintainable.

tests/unit/memory/embedding/test_fine_tune.py

coderabbitai · 2026-04-04T17:36:57Z

tests/unit/memory/embedding/test_fine_tune.py

+        import numpy as np
+
+        n = 5
+        embs = np.eye(n, dtype=np.float32)
+        ndcg, recall = _compute_metrics(embs, embs)
+        assert ndcg == 1.0
+        assert recall == 1.0
+
+    def test_random_embeddings_positive(self) -> None:
+        import numpy as np
+


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Check dependency manifests for numpy/test extras declarations. fd -a -t f 'pyproject.toml|setup.py|setup.cfg|requirements.*txt' rg -n --iglob 'pyproject.toml' --iglob 'setup.py' --iglob 'setup.cfg' --iglob 'requirements*.txt' \ '\bnumpy\b|optional-dependencies|extras_require|\[project.optional-dependencies\]|test' # Confirm direct numpy imports in changed areas. rg -n -C2 --type=py 'import numpy as np|from numpy' \ tests/unit/memory/embedding/test_fine_tune.py \ src/synthorg/memory/embedding/fine_tune.py

Repository: Aureliolo/synthorg

Length of output: 3987

Add NumPy to test dependencies.

Tests in this file import NumPy directly (lines 337, 346), but NumPy is not listed in the test optional dependency group in pyproject.toml. Tests will fail at import time without it. Add numpy to the test dependencies in pyproject.toml:

test = [ "pytest==9.0.2", "pytest-asyncio==1.3.0", "pytest-cov==7.1.0", "pytest-mock==3.15.1", "pytest-timeout==2.4.0", "pytest-xdist==3.8.0", "numpy", # Add this ]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/unit/memory/embedding/test_fine_tune.py` around lines 337 - 347, The tests import NumPy directly (see test functions like test_random_embeddings_positive and the use of _compute_metrics that loads numpy arrays), but NumPy is missing from the test optional dependencies; update the test dependency list in pyproject.toml to include "numpy" in the test extras so the imports in these tests succeed during CI and local runs.

coderabbitai · 2026-04-04T17:36:57Z

tests/unit/memory/embedding/test_fine_tune.py

+        ndcg, recall = _compute_metrics(embs, embs)
+        assert ndcg == 1.0
+        assert recall == 1.0


⚠️ Potential issue | 🟡 Minor

Use tolerant float assertions for metrics.

Exact equality on floating-point results can be brittle; prefer pytest.approx.

♻️ Proposed fix

- assert ndcg == 1.0 - assert recall == 1.0 + assert ndcg == pytest.approx(1.0) + assert recall == pytest.approx(1.0)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/unit/memory/embedding/test_fine_tune.py` around lines 341 - 343, Replace exact float equality checks in the test that calls _compute_metrics by using pytest.approx: change "assert ndcg == 1.0" to "assert ndcg == pytest.approx(1.0)" and "assert recall == 1.0" to "assert recall == pytest.approx(1.0)"; if pytest is not already imported in tests/unit/memory/embedding/test_fine_tune.py, add "import pytest" at the top so the approx matcher is available.

github-actions · 2026-04-04T17:43:49Z

Preview Deployment

URL: https://pr-1065.synthorg-pr-preview.pages.dev

Built from commit a514767

🤖 I have created a release *beep* *boop* --- ## [0.6.2](v0.6.1...v0.6.2) (2026-04-06) ### Features * add issue analyzer script for priority/scope management ([#1084](#1084)) ([1ccba27](1ccba27)) * config fixes and deferred improvements from PR [#1058](#1058) review ([#1067](#1067)) ([2cac2d3](2cac2d3)), closes [#1061](#1061) [#1060](#1060) * cumulative risk-unit action budgets ([#806](#806)) and automated reporting ([#245](#245)) ([#1063](#1063)) ([4689816](4689816)) * fine-tuning pipeline + CompositeBackend + workflow lifecycle ([#1065](#1065)) ([85b05bc](85b05bc)), closes [#1001](#1001) [#850](#850) [#1058](#1058) * memory consolidation upgrades (LLM Merge, Search-and-Ask, diversity penalty, distillation capture) ([#1071](#1071)) ([174e2be](174e2be)), closes [#704](#704) * migrate web dashboard from Radix UI to Base UI, activate CSP nonce, rebuild org chart page, and fix agent routing ([#1083](#1083)) ([ebc6921](ebc6921)) * v0.7.0 engine foundations -- structured failure diagnosis + auditable decisions ([#1072](#1072)) ([d341d37](d341d37)) * workflow templates and versioning with diff and rollback ([#1069](#1069)) ([7af94de](7af94de)), closes [#1006](#1006) [#1008](#1008) ### Documentation * unify REST API docs under /docs/openapi/ and patch sitemap ([#1073](#1073)) ([af19382](af19382)) ### Maintenance * bump hypothesis from 6.151.10 to 6.151.11 in the all group ([#1086](#1086)) ([3176318](3176318)) * bump nginxinc/nginx-unprivileged from `f99cc61` to `601c823` in /docker/web in the all group ([#1085](#1085)) ([5eb99ac](5eb99ac)) * bump the all group in /web with 3 updates ([#1087](#1087)) ([8deae44](8deae44)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 4, 2026 13:21

Aureliolo temporarily deployed to ci April 4, 2026 13:21 — with GitHub Actions Inactive

Copilot started reviewing on behalf of Aureliolo April 4, 2026 13:21 View session

Aureliolo temporarily deployed to cloudflare-preview April 4, 2026 13:22 — with GitHub Actions Inactive

gemini-code-assist bot reviewed Apr 4, 2026

View reviewed changes

Copilot AI reviewed Apr 4, 2026

View reviewed changes

coderabbitai bot requested changes Apr 4, 2026

View reviewed changes

Aureliolo temporarily deployed to ci April 4, 2026 14:17 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview April 4, 2026 14:18 — with GitHub Actions Inactive

coderabbitai bot requested changes Apr 4, 2026

View reviewed changes

Aureliolo temporarily deployed to ci April 4, 2026 15:07 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview April 4, 2026 15:08 — with GitHub Actions Inactive

coderabbitai bot requested changes Apr 4, 2026

View reviewed changes

Aureliolo added 14 commits April 4, 2026 17:25

fix: delete endpoint status_code=200 for body response

abe5521

fix: address 75 PR review items from local agents, CodeRabbit, Copilo…

3209363

…t, and Gemini

fix: deploy_checkpoint returns backup JSON for checkpoint persistence

b46230c

fix: address 16 CodeRabbit round-2 findings

82d2020

style: ruff format after round-3 fixes

d0e1835

fix: use 'is not None' for SettingValue truthiness (mypy truthy-bool)

46746e2

Aureliolo temporarily deployed to ci April 4, 2026 16:33 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview April 4, 2026 16:34 — with GitHub Actions Inactive

coderabbitai bot requested changes Apr 4, 2026

View reviewed changes

fix: address 5 CodeRabbit round-6 findings

22831d8

Aureliolo temporarily deployed to ci April 4, 2026 16:56 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview April 4, 2026 16:57 — with GitHub Actions Inactive

coderabbitai bot reviewed Apr 4, 2026

View reviewed changes

fix: add param validation tests + normalize formatDuration

aac735b

- Tests for batch_size=0, learning_rate<0, temperature=0 rejection - formatDuration normalizes fractional seconds at top (59.6 -> 1m)

Aureliolo temporarily deployed to ci April 4, 2026 17:10 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview April 4, 2026 17:11 — with GitHub Actions Inactive

coderabbitai bot requested changes Apr 4, 2026

View reviewed changes

tests/unit/memory/embedding/test_fine_tune.py Outdated Show resolved Hide resolved

Aureliolo added 3 commits April 4, 2026 19:16

refactor: parametrize hyperparameter validation tests

6e36c7e

refactor: parametrize hyperparameter validation tests

cd6f4f9

refactor: parametrize hyperparameter validation tests

3edc2bd

Aureliolo temporarily deployed to ci April 4, 2026 17:24 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview April 4, 2026 17:25 — with GitHub Actions Inactive

coderabbitai bot requested changes Apr 4, 2026

View reviewed changes

tests/unit/memory/embedding/test_fine_tune.py Show resolved Hide resolved

fix: use vendor-agnostic test-small-001 model name in fine-tune tests

f7b9c40

Aureliolo temporarily deployed to ci April 4, 2026 17:32 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview April 4, 2026 17:33 — with GitHub Actions Inactive

coderabbitai bot requested changes Apr 4, 2026

View reviewed changes

fix: use pytest.approx for float metrics + filter blank JSONL lines

a514767

Aureliolo merged commit 85b05bc into main Apr 4, 2026
7 of 8 checks passed

Aureliolo deleted the feat/memory-fine-tuning-composite branch April 4, 2026 17:41

Aureliolo temporarily deployed to cloudflare-preview April 4, 2026 17:41 — with GitHub Actions Inactive

Aureliolo mentioned this pull request Apr 4, 2026

chore(main): release 0.6.2 #1066

Merged

Aureliolo temporarily deployed to ci April 4, 2026 17:42 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview April 4, 2026 17:43 — with GitHub Actions Inactive

		Five-stage offline pipeline for domain-specific embedding fine-tuning:

		1. Synthetic data generation (LLM-powered, no manual annotation)

		def __init__(self, db: aiosqlite.Connection) -> None:
		self._db = db

Conversation

Aureliolo commented Apr 4, 2026

Summary

Fine-tuning pipeline + dashboard (#1001)

CompositeBackend (#850)

Workflow execution lifecycle (#1058)

Test plan

Review coverage

Uh oh!

coderabbitai bot commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Suggested labels

Uh oh!

github-actions bot commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Dependency Review

Snapshot Warnings

Scanned Files

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

coderabbitai bot commented Apr 4, 2026 •

edited

Loading

github-actions bot commented Apr 4, 2026 •

edited

Loading

codecov bot commented Apr 4, 2026 •

edited

Loading