feat: implement fine-tuning pipeline stages + CLI command #1001

@Aureliolo

Context

PR #999 shipped the fine-tuning pipeline wiring: checkpoint lookup in the Mem0 adapter, admin API stubs, config models, and settings. The four pipeline stage functions (generate_training_data, mine_hard_negatives, contrastive_fine_tune, deploy_checkpoint) currently validate their inputs and then raise NotImplementedError.

This issue covers implementing the actual ML logic and the CLI command.

Requirements

1. Implement pipeline stages

Replace NotImplementedError stubs in src/synthorg/memory/embedding/fine_tune.py with actual implementations:

  1. Synthetic data generation -- an LLM generates query-document pairs from org documents
  2. Hard negative mining -- the base model embeds all passages and selects the top-k most confusing negatives (passages that score high against a query but are not its positive)
  3. Contrastive fine-tuning -- bi-encoder training with InfoNCE loss (tau=0.02, 3 epochs, lr=1e-5)
  4. Deploy -- save the checkpoint and update the config to point to the fine-tuned model
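
To make stage 2 concrete, here is a minimal sketch of top-k hard negative selection. It is an illustration only: it assumes cosine similarity as the scoring function and uses a hypothetical helper name (mine_top_k_negatives), not the actual mine_hard_negatives signature from fine_tune.py. The toy vectors stand in for embeddings produced by the base model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mine_top_k_negatives(query_vec, passage_vecs, positive_idx, k):
    """Hypothetical helper: indices of the k passages most similar to the
    query, excluding the known positive passage."""
    scored = [
        (cosine(query_vec, vec), idx)
        for idx, vec in enumerate(passage_vecs)
        if idx != positive_idx
    ]
    # Highest similarity first: these are the "confusing" negatives.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy embeddings; in the real pipeline these would come from the base model.
query = [1.0, 0.0]
passages = [[1.0, 0.1], [0.9, 0.4], [0.0, 1.0], [-1.0, 0.0]]
hard_negatives = mine_top_k_negatives(query, passages, positive_idx=0, k=2)
```

In practice the similarity search would likely be batched or use an approximate nearest-neighbour index; the brute-force loop here only shows the selection rule.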

Pipeline design is documented in docs/reference/embedding-evaluation.md.
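
For reference, the InfoNCE objective named in stage 3 can be sketched in plain Python as below. This assumes in-batch negatives with the positive document for query i on the diagonal of the similarity matrix, which may differ from what embedding-evaluation.md specifies; the function name is hypothetical and a real implementation would use a tensor library with autograd.

```python
import math


def info_nce_loss(sim_matrix, tau=0.02):
    """Mean InfoNCE loss over a batch.

    sim_matrix[i][j] is the similarity between query i and document j.
    Assumption: document i is the positive for query i, and every other
    column in the row acts as a negative.
    """
    total = 0.0
    for i, row in enumerate(sim_matrix):
        logits = [s / tau for s in row]
        # Log-sum-exp with max subtraction for numerical stability.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        # -log softmax probability assigned to the positive.
        total += log_sum_exp - logits[i]
    return total / len(sim_matrix)


# Well-separated positives give a loss near zero;
# indistinguishable candidates give log(number of candidates).
separated = info_nce_loss([[1.0, 0.0], [0.0, 1.0]])
uniform = info_nce_loss([[0.5, 0.5], [0.5, 0.5]])
```

With tau=0.02, a similarity gap of 1.0 becomes a logit gap of 50, so small differences in cosine similarity are penalised sharply. That is the usual reason for pairing a low temperature with hard negatives.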

2. CLI command

Add a memory fine-tune subcommand to the Go CLI (cli/) that:

  • Calls POST /admin/memory/fine-tune to trigger the pipeline
  • Polls GET /admin/memory/fine-tune/status for progress
  • Displays stage transitions and progress
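
The polling behaviour described above could look roughly like the following. The real command belongs in the Go CLI; this sketch is in Python only to stay consistent with the pipeline code, and the response fields ("stage", "state") and terminal state names are assumptions, not the actual status schema.

```python
import time

# Assumed terminal states; the real API may use different names.
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def poll_until_done(fetch_status, on_change, interval_s=2.0,
                    max_polls=1000, sleep=time.sleep):
    """Poll a status endpoint until the run reaches a terminal state.

    fetch_status: callable returning the decoded status dict, e.g. a
        wrapper around GET /admin/memory/fine-tune/status.
    on_change: called once whenever the (stage, state) pair changes,
        so the CLI can print stage transitions without repeating lines.
    """
    last = None
    for _ in range(max_polls):
        status = fetch_status()
        key = (status.get("stage"), status.get("state"))
        if key != last:
            on_change(status)
            last = key
        if status.get("state") in TERMINAL_STATES:
            return status
        sleep(interval_s)
    raise TimeoutError("pipeline did not finish within max_polls polls")


# Usage with canned responses standing in for the HTTP call.
responses = iter([
    {"stage": "generate_training_data", "state": "RUNNING"},
    {"stage": "generate_training_data", "state": "RUNNING"},
    {"stage": "deploy_checkpoint", "state": "COMPLETED"},
])
seen = []
final = poll_until_done(lambda: next(responses), seen.append,
                        interval_s=0, sleep=lambda s: None)
```

Injecting fetch_status and sleep keeps the loop testable without a running server; the Go version could take the same shape with an interface for the HTTP client.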

3. Update admin API

Replace the hardcoded FAILED response in MemoryAdminController.start_fine_tune with actual pipeline orchestration (enqueue/start the pipeline, return tracking info).
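
One possible shape for the orchestration and tracking info is sketched below. All names (PipelineRun, RunTracker, the state strings) are hypothetical and do not reflect the existing MemoryAdminController code. It runs stages synchronously for clarity, whereas the real endpoint would presumably hand execution to a background task and return the run ID immediately.

```python
import uuid
from dataclasses import asdict, dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple


@dataclass
class PipelineRun:
    """Tracking record for one fine-tune run (hypothetical schema)."""
    run_id: str
    state: str = "PENDING"
    stage: Optional[str] = None
    error: Optional[str] = None


class RunTracker:
    """In-memory tracker; a real service would likely persist runs."""

    def __init__(self) -> None:
        self._runs: Dict[str, PipelineRun] = {}

    def enqueue(self) -> PipelineRun:
        """Register a new run; this is what the POST handler would return."""
        run = PipelineRun(run_id=uuid.uuid4().hex)
        self._runs[run.run_id] = run
        return run

    def execute(self, run_id: str,
                stages: Sequence[Tuple[str, Callable[[], None]]]) -> None:
        """Run the stages in order, recording the current stage and
        stopping at the first failure."""
        run = self._runs[run_id]
        run.state = "RUNNING"
        for name, fn in stages:
            run.stage = name
            try:
                fn()
            except Exception as exc:
                run.state = "FAILED"
                run.error = f"{type(exc).__name__}: {exc}"
                return
        run.state = "COMPLETED"

    def status(self, run_id: str) -> dict:
        """Snapshot for the status endpoint."""
        return asdict(self._runs[run_id])
```

Recording the failing stage name alongside the error gives GET /admin/memory/fine-tune/status enough detail for the CLI to show where a run stopped.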

Acceptance Criteria

  • Pipeline stages implement actual ML logic (not stubs)
  • Pipeline works on single GPU with no manual annotation
  • CLI command runs the 4-stage pipeline end-to-end
  • POST /admin/memory/fine-tune enqueues and tracks pipeline runs
  • GET /admin/memory/fine-tune/status returns real progress
  • Tests cover pipeline execution (mock ML deps for unit tests)
  • synthorg[fine-tune] extra installs required ML dependencies


Labels

  • prio:medium -- Should do, but not blocking
  • scope:large -- 3+ days of work
  • spec:memory -- DESIGN_SPEC Section 7 - Memory & Persistence
  • type:feature -- New feature implementation
  • v0.6 -- Minor version v0.6
  • v0.6.4 -- Patch release v0.6.4
