feat: implement fine-tuning pipeline stages + CLI command #1001
Description
Context
PR #999 shipped the fine-tuning pipeline wiring -- checkpoint lookup in the Mem0 adapter, admin API stubs, config models, and settings. The four pipeline stage functions (generate_training_data, mine_hard_negatives, contrastive_fine_tune, deploy_checkpoint) validate inputs but raise NotImplementedError.
This issue covers implementing the actual ML logic and the CLI command.
Requirements
1. Implement pipeline stages
Replace NotImplementedError stubs in src/synthorg/memory/embedding/fine_tune.py with actual implementations:
- Synthetic data generation -- LLM generates query-document pairs from org documents
- Hard negative mining -- base model embeds all passages, selects top-k confusing negatives
- Contrastive fine-tuning -- biencoder training with InfoNCE loss (tau=0.02, 3 epochs, lr=1e-5)
- Deploy -- save checkpoint, update config to point to fine-tuned model
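The hard negative mining step in the list above can be sketched as follows. This is a minimal illustration, not the project's implementation: `top_k_hard_negatives` and `cosine` are hypothetical helper names, and the toy 2-D vectors stand in for embeddings that the base model would produce.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_hard_negatives(query_vec, passage_vecs, positive_idx, k):
    """Return indices of the k passages most similar to the query,
    excluding the known positive passage (hypothetical helper)."""
    scored = [
        (cosine(query_vec, vec), idx)
        for idx, vec in enumerate(passage_vecs)
        if idx != positive_idx
    ]
    # Highest similarity first: these are the most confusing negatives.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy embeddings (assumed data, for illustration only).
passages = [
    [1.0, 0.0],  # 0: the positive for this query
    [0.9, 0.1],  # 1: close to the query, a confusing negative
    [0.0, 1.0],  # 2: orthogonal, an easy negative
    [0.7, 0.7],  # 3: moderately similar
]
query = [1.0, 0.0]
hard = top_k_hard_negatives(query, passages, positive_idx=0, k=2)
```

A real stage would likely batch the similarity computation on the GPU; the selection rule (rank non-positive passages by similarity, keep the top k) is the part this sketch is meant to show.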
Pipeline design is documented in docs/reference/embedding-evaluation.md.
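For reference, the InfoNCE objective named in the requirements can be written in a few lines. The sketch below assumes in-batch negatives (document i is the positive for query i, every other document in the batch is a negative) and cosine similarity scaled by the temperature tau=0.02; whether the design doc uses exactly this batching scheme is an assumption, and `info_nce_loss` is a hypothetical name. It uses plain Python so the arithmetic is visible; a real training loop would use a tensor library.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def info_nce_loss(queries, documents, tau=0.02):
    """Mean InfoNCE loss where documents[i] is the positive for queries[i]
    and all other documents in the batch act as negatives."""
    q = [normalize(v) for v in queries]
    d = [normalize(v) for v in documents]
    total = 0.0
    for i, qi in enumerate(q):
        # Temperature-scaled cosine similarities against every document.
        logits = [sum(a * b for a, b in zip(qi, dj)) / tau for dj in d]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the positive at index i.
        total += log_denominator - logits[i]
    return total / len(q)


# Toy batch (assumed data): perfectly aligned pairs vs. swapped pairs.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

With tau=0.02 the similarities are multiplied by 50 before the softmax, so the loss is near zero for well-separated positives and grows quickly when a negative outranks the positive.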
2. CLI command
Add a memory fine-tune subcommand to the Go CLI (cli/) that:
- Calls POST /admin/memory/fine-tune to trigger the pipeline
- Polls GET /admin/memory/fine-tune/status for progress
- Displays stage transitions and progress
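The trigger-then-poll flow above could look roughly like this. The actual subcommand belongs in the Go CLI; Python is used here only to keep the examples in one language. The response fields (`stage`, `status`) and the terminal status values are assumptions, since the status payload is not specified in this issue, and `post`/`get` are injected callables standing in for an HTTP client.

```python
import time

# Hypothetical terminal values; the real API may name these differently.
TERMINAL_STATUSES = {"completed", "failed"}


def run_fine_tune(post, get, sleep=time.sleep, interval=2.0, max_polls=1000):
    """Trigger the pipeline, then poll until a terminal status is seen.

    `post` and `get` take a path and return a dict (assumed shape).
    Returns the ordered list of distinct stages seen and the final status.
    """
    post("/admin/memory/fine-tune")
    seen = []
    for _ in range(max_polls):
        status = get("/admin/memory/fine-tune/status")
        stage = status.get("stage")
        # Report a stage only when it changes, i.e. on a transition.
        if stage and (not seen or seen[-1] != stage):
            seen.append(stage)
            print(f"stage: {stage}")
        if status.get("status") in TERMINAL_STATUSES:
            return seen, status["status"]
        sleep(interval)
    raise TimeoutError("fine-tune pipeline did not finish in time")
```

Injecting `post`, `get`, and `sleep` keeps the loop testable without a running server, which mirrors the requirement to mock heavy dependencies in unit tests.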
3. Update admin API
Replace the hardcoded FAILED response in MemoryAdminController.start_fine_tune with actual pipeline orchestration (enqueue/start the pipeline, return tracking info).
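One possible shape for the orchestration is a small run record that the controller creates on POST and the status endpoint reads back. Everything here is a sketch under assumptions: `PipelineRun`, `RunStatus`, and `execute` are hypothetical names, and the real stage functions take arguments that are omitted for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class RunStatus(str, Enum):
    """Hypothetical run states; the real config models may differ."""
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class PipelineRun:
    """Tracking info for one pipeline run (assumed fields)."""
    run_id: str
    status: RunStatus = RunStatus.PENDING
    stage: Optional[str] = None
    error: Optional[str] = None
    completed_stages: list = field(default_factory=list)


def execute(run, stages):
    """Run (name, callable) stages in order, recording progress on `run`.

    Stops at the first stage that raises and marks the run as failed.
    """
    run.status = RunStatus.RUNNING
    for name, stage_fn in stages:
        run.stage = name
        try:
            stage_fn()
        except Exception as exc:
            run.status = RunStatus.FAILED
            run.error = f"{name}: {exc}"
            return run
        run.completed_stages.append(name)
    run.status = RunStatus.COMPLETED
    return run
```

In the controller, `stages` would pair the four names (generate_training_data, mine_hard_negatives, contrastive_fine_tune, deploy_checkpoint) with calls into fine_tune.py, and `execute` would run in a background task so the POST can return the `run_id` immediately.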
Acceptance Criteria
- Pipeline stages implement actual ML logic (not stubs)
- Pipeline works on single GPU with no manual annotation
- CLI command runs the 4-stage pipeline end-to-end
- POST /admin/memory/fine-tune enqueues and tracks pipeline runs
- GET /admin/memory/fine-tune/status returns real progress
- Tests cover pipeline execution (mock ML deps for unit tests)
- synthorg[fine-tune] extra installs required ML dependencies
References
- PR #999 (feat: auto-select embedding model + fine-tuning pipeline wiring) -- pipeline wiring, config models, admin API stubs
- #966 (feat: embedding fine-tuning pipeline -- wire checkpoint lookup + CLI command) -- original issue (wiring portion delivered in PR #999)
- src/synthorg/memory/embedding/fine_tune.py -- stage stubs
- src/synthorg/api/controllers/memory.py -- admin controller
- docs/reference/embedding-evaluation.md -- pipeline design
- NVIDIA domain-specific fine-tuning blog