research: LMEB-guided embedding model selection + domain fine-tuning for org memory #695
Closed
Labels
- prio:high (Important, should be prioritized)
- scope:medium (1-3 days of work)
- spec:memory (DESIGN_SPEC Section 7 - Memory & Persistence)
- type:research (Evaluate options, make tech decisions)
- v0.6 (Minor version)
- v0.6.0 (Patch release)
Description
Context
Two findings on embedding quality for agent memory:
- LMEB Benchmark -- 22 datasets, 193 tasks. MTEB performance does NOT generalize to memory retrieval (correlation ~-0.13). Episodic/dialogue/procedural taxonomy maps directly to SynthOrg's memory use cases.
- NVIDIA Domain-Specific Embedding Fine-Tune -- Automated pipeline (synthetic data gen, hard negative mining, contrastive fine-tuning). No manual annotation. Single GPU. +10-27% retrieval improvement.
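The contrastive fine-tuning step in that pipeline typically minimizes an InfoNCE-style loss over (query, positive, hard negatives) triples. A minimal sketch in plain Python (the function name and temperature value are illustrative, not from either source):

```python
import math

def info_nce_loss(query, positive, negatives, temperature=0.05):
    """InfoNCE loss for one (query, positive, negatives) triple.

    Vectors are plain lists of floats; real pipelines use batched
    tensors, but the math is the same: -log softmax of the positive's
    similarity against the positive plus all hard negatives.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Scaled similarity logits: positive first, then the hard negatives.
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]

    # Numerically stable log-sum-exp for the softmax denominator.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]
```

Training pushes query embeddings toward their positives and away from mined hard negatives; the harder the negatives, the more informative the gradient, which is why the mining step matters.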
Action Items
- Evaluate current embedding model against LMEB leaderboard (not MTEB)
- Select embedding model optimized for episodic + procedural memory retrieval patterns
- Design optional embedding fine-tuning as an `OrgMemoryBackend` initialization hook
- Pipeline: synthetic data from org documents -> hard negative mining -> fine-tune -> deploy
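The initialization-hook design above could be sketched as follows. Everything here is an assumption about shape, not the actual SynthOrg API: `OrgMemoryBackend`'s fields, `FineTuneConfig`, and the stand-in pipeline functions are all hypothetical, and the real steps (LLM query generation, ANN/BM25 negative mining, contrastive training) are stubbed out:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class FineTuneConfig:
    # Hypothetical config; base_model is a placeholder, not a recommendation.
    base_model: str = "base-embedding-model"
    num_hard_negatives: int = 4

def generate_synthetic_pairs(docs: List[str]) -> List[Tuple[str, str]]:
    """Stand-in for synthetic (query, document) generation from org documents.

    A real pipeline would have an LLM write queries per document chunk.
    """
    return [(f"query about: {doc[:30]}", doc) for doc in docs]

def mine_hard_negatives(pairs, corpus, k):
    """Stand-in for hard-negative mining (e.g. BM25 or ANN over the corpus)."""
    triples = []
    for query, pos in pairs:
        negatives = [d for d in corpus if d != pos][:k]
        triples.append((query, pos, negatives))
    return triples

@dataclass
class OrgMemoryBackend:
    docs: List[str]
    finetune: Optional[FineTuneConfig] = None
    model_path: str = "base"

    def __post_init__(self):
        # Optional fine-tuning hook: runs once when the backend is created.
        # With finetune=None the backend keeps the off-the-shelf model.
        if self.finetune is not None:
            pairs = generate_synthetic_pairs(self.docs)
            triples = mine_hard_negatives(
                pairs, self.docs, self.finetune.num_hard_negatives
            )
            # A real contrastive_finetune(triples) would train and return a
            # new checkpoint path; here we only record that the hook ran.
            self.model_path = (
                f"finetuned::{self.finetune.base_model}::{len(triples)}-triples"
            )
```

Making the hook optional keeps the default path (off-the-shelf model, zero training cost) untouched while letting deployments with enough org documents opt into the +10-27% retrieval gains the NVIDIA pipeline reports.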