
Commit a4cbc4e

feat: auto-select embedding model + fine-tuning pipeline wiring (#999)
## Summary

Implements embedding model auto-selection from LMEB rankings (#965) and wires the fine-tuning checkpoint lookup into the Mem0 adapter (#966).

### #965: Auto-Select Embedding Model

- **LMEB ranking data** (`memory/embedding/rankings.py`): 6 models with per-memory-type NDCG@10 scores, deployment tier classification, and output dimensions
- **Model selector** (`memory/embedding/selector.py`): `select_embedding_model()` intersects available models with LMEB rankings (substring + case-insensitive match for Ollama tags); `infer_deployment_tier()` maps provider presets to GPU_FULL/GPU_CONSUMER/CPU
- **Embedder config resolution** (`memory/embedding/resolve.py`): `resolve_embedder_config()` priority chain -- settings DB > YAML config override > auto-selection with tier-filtered fallback
- **Config integration**: `EmbedderOverrideConfig` on `CompanyMemoryConfig` (model requires dims validator); 3 new ADVANCED-level settings in memory namespace; `memory` dict field on `CompanyTemplate`
- **Setup wizard**: `auto_select_embedder()` wired into `complete_setup()` -- best-effort, does not block setup on failure

### #966: Fine-Tuning Pipeline Wiring

- **Checkpoint lookup**: Removed `_reject_unimplemented_fine_tune` validator; added `_resolve_effective_model()` to `build_mem0_config_dict()` -- when `fine_tune.enabled=True`, uses checkpoint path as model identifier
- **Pipeline stubs** (`memory/embedding/fine_tune.py`): 4-stage async functions (generate data, mine negatives, contrastive train, deploy) with input validation and `NotImplementedError` -- actual ML logic deferred to when `synthorg[fine-tune]` extra is installed
- **Admin API** (`api/controllers/memory.py`): `MemoryAdminController` at `/admin/memory/` with `POST /fine-tune`, `GET /fine-tune/status`, `GET /embedder` endpoints (CEO/SYSTEM role guard)
- **Optional deps**: `[project.optional-dependencies]` fine-tune group (torch, sentence-transformers)
- **Docs updated**: Removed "not yet implemented" language from design spec and reference docs

## Test Plan

- 109 new tests across 8 test files covering: LMEB data integrity, selector logic, resolution priority chain, fine-tune stage validation, checkpoint lookup (enabled/disabled/none), setup auto-selection, controller model validation
- Full suite: 12337 passed, 0 failed
- Pre-reviewed by 4 agents (code-reviewer, conventions-audit, test-analyzer, docs-silence-audit), 15 findings addressed

## Review Coverage

| Agent | Findings |
|-------|----------|
| code-reviewer | 5 (2 critical, 3 major) |
| conventions-audit | 4 (2 critical, 2 major) |
| test-analyzer | 9 (3 critical, 3 major, 3 medium) |
| docs-silence-audit | 6 (1 critical, 5 major) |

All 15 valid findings implemented. Key fixes: event constants for log calls, MemoryError/RecursionError re-raise guards, explicit `is not None` merge logic, missing logger in selector, test coverage for controller/models/edge cases.

Closes #965
Closes #966
1 parent 5cb232d commit a4cbc4e

34 files changed

Lines changed: 2651 additions & 58 deletions

.github/workflows/dependency-review.yml

Lines changed: 27 additions & 1 deletion
```diff
@@ -90,6 +90,12 @@ jobs:
   # only invoked during development and CI linting. GPL copyleft does
   # not apply to the project output. The action cannot distinguish
   # tool deps from runtime deps, so they need per-package exemptions.
+  #
+  # Fine-tune optional dep group (torch, sentence-transformers) and
+  # transitive CUDA/NVIDIA deps. Optional -- only installed via
+  # synthorg[fine-tune]. License metadata missing from PyPI for NVIDIA
+  # CUDA packages (proprietary, freely redistributable). torch is
+  # BSD-style. scikit-learn has compound BSD-3-Clause AND scancode tag.
   allow-dependencies-licenses: >-
     pkg:pypi/mem0ai@1.0.9,
     pkg:pypi/numpy@2.4.4,
@@ -113,5 +119,25 @@ jobs:
     pkg:golang/github.com/alfatraining/structtag@1.0.0,
     pkg:golang/github.com/fatih/structtag@1.2.0,
     pkg:npm/json-schema-typed@8.0.2,
-    pkg:npm/victory-vendor@37.3.6
+    pkg:npm/victory-vendor@37.3.6,
+    pkg:pypi/scikit-learn@1.8.0,
+    pkg:pypi/torch@2.11.0,
+    pkg:pypi/cuda-bindings@13.2.0,
+    pkg:pypi/cuda-pathfinder@1.5.0,
+    pkg:pypi/cuda-toolkit@13.0.2,
+    pkg:pypi/nvidia-cublas@13.1.0.3,
+    pkg:pypi/nvidia-cuda-cupti@13.0.85,
+    pkg:pypi/nvidia-cuda-nvrtc@13.0.88,
+    pkg:pypi/nvidia-cuda-runtime@13.0.96,
+    pkg:pypi/nvidia-cudnn-cu13@9.19.0.56,
+    pkg:pypi/nvidia-cufft@12.0.0.61,
+    pkg:pypi/nvidia-cufile@1.15.1.6,
+    pkg:pypi/nvidia-curand@10.4.0.35,
+    pkg:pypi/nvidia-cusolver@12.0.4.66,
+    pkg:pypi/nvidia-cusparse@12.6.3.3,
+    pkg:pypi/nvidia-cusparselt-cu13@0.8.0,
+    pkg:pypi/nvidia-nccl-cu13@2.28.9,
+    pkg:pypi/nvidia-nvjitlink@13.0.88,
+    pkg:pypi/nvidia-nvshmem-cu13@3.4.5,
+    pkg:pypi/nvidia-nvtx@13.0.85
   comment-summary-in-pr: always
```

CLAUDE.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -88,7 +88,7 @@ curl http://localhost:3000/api/v1/health # backend (via web proxy)
 
 ```text
 src/synthorg/
-  api/           # Litestar REST + WebSocket API, RFC 9457 errors, setup wizard, personality presets, auth/, guards (role-based access control), user management, auto-wiring, lifecycle, bootstrap (agent registry init from config), template packs (list + live-apply)
+  api/           # Litestar REST + WebSocket API, RFC 9457 errors, setup wizard, personality presets, auth/, guards (role-based access control), user management, auto-wiring, lifecycle, bootstrap (agent registry init from config), template packs (list + live-apply), memory admin (fine-tuning pipeline, embedder queries)
   backup/        # Backup/restore orchestrator, scheduler, retention, handlers/
   budget/        # Cost tracking, budget enforcement, quota degradation (including synchronous peek for routing-time selector hints), CFO optimization, trend analysis, budget forecasting, configurable currency formatting
   cli/           # Python CLI module (superseded by top-level cli/ Go binary)
@@ -97,7 +97,7 @@ src/synthorg/
   core/          # Shared domain models, base classes, resilience config
   engine/        # Orchestration, execution loops, task engine, coordination, checkpoint recovery, approval/review gates, stagnation detection, context budget, compaction, hybrid loop, workspace/ (git worktree isolation, merge orchestration, semantic conflict detection), workflow/ (Kanban board, Agile sprints, WIP limits, sprint lifecycle, velocity tracking, ceremony scheduling, strategies/ (pluggable scheduling strategies), velocity_calculators/ (pluggable velocity calculators))
   hr/            # Hiring, firing, onboarding, agent registry, performance tracking, activity timeline, activity event types, cost event redaction, career history, promotion/demotion
-  memory/        # Pluggable MemoryBackend, retrieval pipeline, org memory, consolidation
+  memory/        # Pluggable MemoryBackend, retrieval pipeline, org memory, consolidation, embedding/ (LMEB-ranked model selection, embedder config resolution, fine-tuning pipeline)
   persistence/   # Pluggable PersistenceBackend, SQLite, settings + user + artifact + project + preset repositories, artifact content storage (pluggable ArtifactStorageBackend, filesystem impl)
   observability/ # Structured logging, correlation tracking, redaction, third-party logger taming, log shipping (syslog, HTTP), compressed archival, events/
   providers/     # LLM provider abstraction, presets, model auto-discovery, capabilities, runtime CRUD (management/), provider families, discovery SSRF allowlist, health tracking, active health probing, routing/ (strategy-based model routing, multi-provider resolution with ModelCandidateSelector protocol, QuotaAwareSelector, CheapestSelector)
````

docs/architecture/decisions.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -100,7 +100,7 @@ All significant design and architecture decisions, organized by domain. Each ent
 | MTEB | General passage retrieval | MTEB performance does not transfer to memory retrieval (Pearson: -0.115). Optimizing for MTEB may actively harm memory retrieval quality |
 | Manual evaluation | Custom retrieval benchmarks | Too expensive to maintain. LMEB provides a standardized, reproducible alternative |
 
-**Model selection:** Three deployment tiers recommended based on LMEB scores. See [Embedding Evaluation](../reference/embedding-evaluation.md) for the full analysis. Domain-specific fine-tuning (+10-27% improvement) documented as a planned configuration stub via `EmbeddingFineTuneConfig`; the Mem0 adapter does not yet consume this config at initialization.
+**Model selection:** Three deployment tiers recommended based on LMEB scores. See [Embedding Evaluation](../reference/embedding-evaluation.md) for the full analysis. Domain-specific fine-tuning (+10-27% improvement) configured via `EmbeddingFineTuneConfig`; when enabled, the Mem0 adapter uses the checkpoint path as the model identifier. The fine-tuning pipeline stages themselves raise `NotImplementedError` -- only the checkpoint lookup is wired (see #1001).
 
 ## Overarching Pattern
```
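The tier-based recommendation behind `infer_deployment_tier()` (mentioned in the commit summary as mapping provider presets to GPU_FULL/GPU_CONSUMER/CPU) can be sketched like this. The tier names come from the commit; the preset-matching rules and function signature below are illustrative assumptions, not the project's actual logic in `memory/embedding/selector.py`.

```python
from enum import Enum


class DeploymentTier(Enum):
    GPU_FULL = "gpu_full"          # datacenter-class GPU: largest ranked models fit
    GPU_CONSUMER = "gpu_consumer"  # single consumer GPU: mid-size models
    CPU = "cpu"                    # CPU-only: smallest models


def infer_deployment_tier(provider_preset: str, has_gpu: bool) -> DeploymentTier:
    """Hypothetical preset-to-tier mapping used to filter LMEB candidates."""
    if not has_gpu:
        return DeploymentTier.CPU
    # Assumption: datacenter presets are identified by name; the real
    # selector may use hardware probes or an explicit preset attribute.
    if "datacenter" in provider_preset.lower():
        return DeploymentTier.GPU_FULL
    return DeploymentTier.GPU_CONSUMER
```

A tier-filtered fallback then restricts auto-selection to models whose deployment tier is at or below the inferred one.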

docs/design/memory.md

Lines changed: 10 additions & 6 deletions
````diff
@@ -359,12 +359,12 @@ Key findings:
    lr=1e-5). Single GPU, 1-2 hours for ~500 documents
 4. **Deploy** -- save checkpoint; update `Mem0EmbedderConfig` to point to fine-tuned model
 
-**Integration design (planned):** fine-tuning is an offline pipeline, not a runtime
-operation. The optional `EmbeddingFineTuneConfig` (disabled by default) stores the
-checkpoint path. In a future implementation, backend initialization will check for a
-checkpoint and prefer the fine-tuned model when available, falling back to the base
-model with a logged warning. The config is currently defined but not wired into the
-Mem0 adapter initialization.
+**Integration design:** fine-tuning is an offline pipeline triggered via
+`POST /admin/memory/fine-tune` (see `MemoryAdminController`). The optional
+`EmbeddingFineTuneConfig` (disabled by default) stores the checkpoint path. When
+`enabled=True` and `checkpoint_path` is set, backend initialization uses the
+checkpoint path as the model identifier passed to the Mem0 SDK. The embedding
+provider must serve the fine-tuned model under this identifier.
 
 ```python
 class EmbeddingFineTuneConfig(BaseModel):
@@ -376,6 +376,10 @@ Key findings:
     training_data_dir: NotBlankStr | None = None
 ```
 
+When `enabled=True`, both `checkpoint_path` and `base_model` are required
+(enforced by model validation). Path traversal (`..`) and Windows-style
+paths are rejected to prevent container path escapes.
+
 A future `FineTuningPipeline` protocol would formalize the four stages:
 
 ```python
````

docs/reference/embedding-evaluation.md

Lines changed: 5 additions & 4 deletions
```diff
@@ -214,12 +214,13 @@ single GPU.
 
 Fine-tuning is an **offline pipeline**, not a runtime operation. The `EmbeddingFineTuneConfig`
 (see [Memory Design Spec](../design/memory.md#embedding-model-selection))
-stores the configuration. Planned initialization behavior (not yet implemented in the Mem0 adapter):
+stores the configuration. Initialization behavior in the Mem0 adapter:
 
-1. If `fine_tune.enabled` and checkpoint exists at `fine_tune.checkpoint_path`: use fine-tuned model
-2. If `fine_tune.enabled` but no checkpoint: log warning, use base model
-3. If `fine_tune.enabled` is `False` (default): use base model, no checkpoint check
+1. If `fine_tune.enabled` and `checkpoint_path` is set: the checkpoint path is used as the model
+   identifier passed to the Mem0 SDK (the embedding provider must serve the fine-tuned model)
+2. If `fine_tune.enabled` is `False` (default): the base model is used, no checkpoint check
 
+The pipeline is triggered via `POST /admin/memory/fine-tune` (see `MemoryAdminController`).
 This follows the project's pattern of disabled-by-default optional features
 (cf. `DualModeConfig` in consolidation).
```

docs/roadmap/open-questions.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -25,7 +25,7 @@ Numbers are stable identifiers -- resolved questions are removed without renumbe
 | Cost explosion from agent loops | High | Budget hard stops, loop detection, max iterations per task. |
 | Agent quality degradation with cheap models | Medium | Quality gates, minimum model requirements per task type. |
 | Third-party library breaking changes | Medium | Pin versions, integration tests, abstraction layers. |
-| Memory retrieval quality | Medium | Mem0 selected as initial backend (see [Decision Log](../architecture/decisions.md)). LMEB evaluation ([arXiv:2603.12572](https://arxiv.org/abs/2603.12572)) shows MTEB scores do not predict memory retrieval quality (Spearman: -0.130). Embedding model selection should be guided by LMEB episodic + procedural scores. Optional domain fine-tuning (+10-27%) planned via an offline pipeline configured with `EmbeddingFineTuneConfig` (currently a stub; the Mem0 adapter does not yet use it). See [Embedding Evaluation](../reference/embedding-evaluation.md). |
+| Memory retrieval quality | Medium | Mem0 selected as initial backend (see [Decision Log](../architecture/decisions.md)). LMEB evaluation ([arXiv:2603.12572](https://arxiv.org/abs/2603.12572)) shows MTEB scores do not predict memory retrieval quality (Spearman: -0.130). Embedding model selection should be guided by LMEB episodic + procedural scores. Optional domain fine-tuning (+10-27%) via an offline pipeline configured with `EmbeddingFineTuneConfig`. Checkpoint lookup is wired into the Mem0 adapter; pipeline stages (data generation, hard negative mining, contrastive training) are not yet implemented (see #1001). See [Embedding Evaluation](../reference/embedding-evaluation.md). |
 | Agent personality inconsistency | Low | Strong system prompts, few-shot examples, personality tests. |
 | WebSocket scaling | Low | Start local, add Redis pub/sub when needed. |
```

pyproject.toml

Lines changed: 3 additions & 0 deletions
```diff
@@ -34,6 +34,9 @@ dependencies = [
 requires = ["hatchling==1.29.0"]
 build-backend = "hatchling.build"
 
+[project.optional-dependencies]
+fine-tune = ["torch==2.11.0", "sentence-transformers==5.3.0"]
+
 [tool.hatch.version]
 path = "src/synthorg/__init__.py"
```
src/synthorg/api/controllers/__init__.py

Lines changed: 3 additions & 0 deletions
```diff
@@ -17,6 +17,7 @@
 from synthorg.api.controllers.departments import DepartmentController
 from synthorg.api.controllers.health import HealthController
 from synthorg.api.controllers.meetings import MeetingController
+from synthorg.api.controllers.memory import MemoryAdminController
 from synthorg.api.controllers.messages import MessageController
 from synthorg.api.controllers.personalities import (
     PersonalityPresetController,
@@ -57,6 +58,7 @@
     SetupPersonalityController,
     PersonalityPresetController,
     BackupController,
+    MemoryAdminController,
     TemplatePackController,
     UserController,
 )
@@ -79,6 +81,7 @@
     "DepartmentController",
     "HealthController",
     "MeetingController",
+    "MemoryAdminController",
     "MessageController",
     "PersonalityPresetController",
     "ProjectController",
```

src/synthorg/api/controllers/memory.py
Lines changed: 155 additions & 0 deletions
New file (155 lines). Note: the scraped diff contained invalid `except A, B:` Python 2 syntax in two places; the tuple form `except (A, B):` is shown below.

```python
"""Memory admin controller -- fine-tuning and embedder endpoints.

All endpoints require CEO or the internal SYSTEM role
(used by the CLI for admin operations).
"""

from litestar import Controller, get, post
from litestar.datastructures import State  # noqa: TC002
from pydantic import BaseModel, ConfigDict, Field

from synthorg.api.dto import ApiResponse
from synthorg.api.guards import HumanRole, require_roles
from synthorg.api.state import AppState  # noqa: TC001
from synthorg.core.types import NotBlankStr  # noqa: TC001
from synthorg.memory.embedding.fine_tune import FineTuneStage
from synthorg.memory.embedding.fine_tune_models import (
    FineTuneRequest,
    FineTuneStatus,
)
from synthorg.observability import get_logger
from synthorg.observability.events.memory import (
    MEMORY_EMBEDDER_SETTINGS_READ_FAILED,
    MEMORY_FINE_TUNE_REQUESTED,
)

logger = get_logger(__name__)


class ActiveEmbedderResponse(BaseModel):
    """Active embedder configuration read from settings."""

    model_config = ConfigDict(frozen=True, allow_inf_nan=False)

    provider: NotBlankStr | None = Field(
        default=None,
        description="Embedding provider name",
    )
    model: NotBlankStr | None = Field(
        default=None,
        description="Embedding model identifier",
    )
    dims: int | None = Field(
        default=None,
        ge=1,
        description="Embedding vector dimensions",
    )


class MemoryAdminController(Controller):
    """Admin endpoints for memory management.

    Provides fine-tuning pipeline control and embedder configuration
    queries. All endpoints require CEO or SYSTEM role.
    """

    path = "/admin/memory"
    tags = ("admin", "memory")
    guards = [require_roles(HumanRole.CEO, HumanRole.SYSTEM)]  # noqa: RUF012

    @post("/fine-tune")
    async def start_fine_tune(
        self,
        state: State,  # noqa: ARG002
        data: FineTuneRequest,
    ) -> ApiResponse[FineTuneStatus]:
        """Trigger a fine-tuning pipeline run.

        Args:
            state: Application state.
            data: Fine-tuning request parameters.

        Returns:
            Current pipeline status.
        """
        logger.info(
            MEMORY_FINE_TUNE_REQUESTED,
            source_dir=data.source_dir,
            base_model=data.base_model,
        )
        # Pipeline stages are not yet implemented -- return status
        # indicating the pipeline is idle with a descriptive error.
        # See issue #1001 for the implementation roadmap.
        return ApiResponse(
            data=FineTuneStatus(
                stage=FineTuneStage.FAILED,
                error=(
                    "Fine-tuning pipeline stages are not yet "
                    "implemented. Install synthorg[fine-tune] "
                    "and check back in a future release."
                ),
            ),
        )

    @get("/fine-tune/status")
    async def get_fine_tune_status(
        self,
        state: State,  # noqa: ARG002
    ) -> ApiResponse[FineTuneStatus]:
        """Get the current fine-tuning pipeline status.

        Args:
            state: Application state.

        Returns:
            Current pipeline status.
        """
        return ApiResponse(
            data=FineTuneStatus(stage=FineTuneStage.IDLE),
        )

    @get("/embedder")
    async def get_active_embedder(
        self,
        state: State,
    ) -> ApiResponse[ActiveEmbedderResponse]:
        """Get the active embedder configuration.

        Args:
            state: Application state.

        Returns:
            Active embedder provider, model, and dims.
        """
        app_state: AppState = state.app_state
        result = ActiveEmbedderResponse()
        if app_state.has_settings_service:
            svc = app_state.settings_service
            try:
                provider_sv = await svc.get("memory", "embedder_provider")
                model_sv = await svc.get("memory", "embedder_model")
                dims_sv = await svc.get("memory", "embedder_dims")
                dims_value: int | None = None
                if dims_sv.value:
                    try:
                        dims_value = int(dims_sv.value)
                    except (ValueError, TypeError):
                        logger.warning(
                            MEMORY_EMBEDDER_SETTINGS_READ_FAILED,
                            setting="embedder_dims",
                            value=dims_sv.value,
                            reason="invalid integer value",
                        )
                result = ActiveEmbedderResponse(
                    provider=provider_sv.value or None,
                    model=model_sv.value or None,
                    dims=dims_value,
                )
            except (MemoryError, RecursionError):
                raise
            except Exception:
                logger.warning(
                    MEMORY_EMBEDDER_SETTINGS_READ_FAILED,
                    exc_info=True,
                )
        return ApiResponse(data=result)
```

src/synthorg/api/controllers/setup.py

Lines changed: 26 additions & 0 deletions
```diff
@@ -27,6 +27,9 @@
 from synthorg.api.controllers.setup_helpers import (
     auto_create_template_agents as _auto_create_template_agents,
 )
+from synthorg.api.controllers.setup_helpers import (
+    auto_select_embedder,
+)
 from synthorg.api.controllers.setup_helpers import (
     check_has_agents as _check_has_agents,
 )
@@ -45,6 +48,9 @@
 from synthorg.api.controllers.setup_helpers import (
     check_setup_not_complete as _check_setup_not_complete,
 )
+from synthorg.api.controllers.setup_helpers import (
+    collect_model_ids as _collect_model_ids,
+)
 from synthorg.api.controllers.setup_helpers import (
     persist_company_settings as _persist_company_settings,
 )
@@ -96,6 +102,7 @@
     SETUP_AGENTS_AUTO_CREATED,
     SETUP_AGENTS_LISTED,
     SETUP_COMPANY_CREATED,
+    SETUP_COMPLETE_CHECK_ERROR,
     SETUP_COMPLETED,
     SETUP_NAME_LOCALES_LISTED,
     SETUP_NAME_LOCALES_SAVED,
@@ -745,6 +752,25 @@ async def complete_setup(
         logger.warning(SETUP_NO_PROVIDERS)
         raise ApiValidationError(msg)
 
+    # Auto-select embedding model from configured providers.
+    # Best-effort: does not block setup completion on failure.
+    # TODO(#1001): forward provider_preset_name and has_gpu from
+    # the setup context so tier inference uses real hardware info.
+    try:
+        model_ids = await _collect_model_ids(app_state)
+        await auto_select_embedder(
+            settings_svc=settings_svc,
+            available_model_ids=model_ids,
+        )
+    except (MemoryError, RecursionError):
+        raise
+    except Exception:
+        logger.warning(
+            SETUP_COMPLETE_CHECK_ERROR,
+            check="auto_select_embedder",
+            exc_info=True,
+        )
+
     await settings_svc.set("api", "setup_complete", "true")
 
     logger.info(SETUP_COMPLETED)
```

Note: the scraped diff showed `except MemoryError, RecursionError:`, which is invalid Python 3 syntax; the tuple form `except (MemoryError, RecursionError):` is shown above.
