Context
The budget system (`CostRecord`, `CostTracker`, `BudgetEnforcer`) tracks LLM completion API calls -- input/output tokens, cost per call, categorized as productive/coordination/system. However, embedding model calls are completely invisible to the budget system.
Embedding calls happen inside the Mem0 SDK on every `store()` (embed content before storing the vector) and `retrieve()` (embed the query for similarity search). For cloud embedding APIs, each call has a cost that currently goes untracked. For local models (Ollama), it's free compute but still worth tracking for observability.
This gap will become more significant as:
Requirements
1. Embedding cost tracking
- Add an `EMBEDDING` category (or a separate cost type) to distinguish embedding calls from LLM completion calls
- Instrument Mem0 SDK calls (or wrap them) to capture per-call metrics: model, token count (input only -- embeddings have no output tokens), cost
- Record embedding costs as `CostRecord` entries (or a new `EmbeddingCostRecord` if the schema diverges too much)
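The record shape above can be sketched as follows. This is a simplified stand-in, not the real model: `CostCategory`, the field names, and `record_embedding_call` are all illustrative, since the final schema (new category vs. new record type) is still an open question.

```python
from dataclasses import dataclass
from enum import Enum


class CostCategory(Enum):
    # Existing categories per the budget system, plus the proposed one.
    PRODUCTIVE = "productive"
    COORDINATION = "coordination"
    SYSTEM = "system"
    EMBEDDING = "embedding"  # proposed: distinguishes embedding calls


@dataclass
class CostRecord:
    """Simplified stand-in for the real CostRecord model."""
    agent_id: str
    model: str
    input_tokens: int
    output_tokens: int  # embeddings produce no output tokens, so 0 is valid
    cost_usd: float
    category: CostCategory


def record_embedding_call(agent_id: str, model: str, input_tokens: int,
                          cost_usd: float) -> CostRecord:
    # Embedding calls have input tokens only; output_tokens stays 0,
    # which already satisfies the output_tokens >= 0 constraint.
    return CostRecord(agent_id=agent_id, model=model,
                      input_tokens=input_tokens, output_tokens=0,
                      cost_usd=cost_usd, category=CostCategory.EMBEDDING)
```

If the schema stays this close to `CostRecord`, a new category is probably enough and a separate `EmbeddingCostRecord` type is not needed.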
2. Budget enforcement integration
- Include embedding costs in budget totals and per-agent spend
- Evaluate whether embedding calls should be gated by budget enforcement (currently only LLM completions are gated)
- Consider: embedding is on the critical path for memory store/retrieve -- budget-gating it could break memory entirely
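One way to resolve the gating question above, sketched with hypothetical names (`check_call` is not a real `BudgetEnforcer` method): count embedding spend toward budget totals, but downgrade enforcement to a warning for embedding calls so memory store/retrieve never hard-fails.

```python
def check_call(category: str, spent_usd: float, budget_usd: float) -> str:
    """Return 'allow', 'warn', or 'block' for a pending call."""
    if spent_usd < budget_usd:
        return "allow"
    if category == "embedding":
        # Over budget, but embedding is on the memory critical path:
        # record the overrun and surface a warning instead of failing.
        return "warn"
    return "block"  # LLM completions stay hard-gated as today
```

This keeps embedding costs visible in per-agent spend while only completions can actually be blocked.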
3. Dashboard visibility
4. Fine-tuning cost tracking (#966 follow-up)
- Stage 1 (synthetic data generation) makes LLM API calls -- these should flow through the provider system and get tracked as `SYSTEM` category costs
- Stage 3 (GPU training) is compute-only, not an API cost -- consider whether to track duration/resource usage separately
Design Considerations
- Mem0 SDK calls the embedding provider directly -- SynthOrg doesn't intercept these calls. Options:
- Wrap the Mem0 client with a proxy that intercepts embedding calls
- Use Mem0's callback/hook system if available
- Estimate costs from known model pricing + input text length
- For local models (Ollama), cost is zero but call count and latency are still useful metrics
- `CostRecord` currently requires `output_tokens >= 0` -- embedding calls have zero output tokens (just the vector), so this should work as-is
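The wrapper and cost-estimation options above could be combined as below. Everything here is an assumption: the pricing table, the ~4 chars/token heuristic (used because the Mem0 SDK does not surface per-call token counts to us), and the `store`/`retrieve` method shapes on the wrapped client.

```python
# Illustrative pricing table (USD per 1K input tokens); real values
# would come from provider pricing pages or configuration.
PRICING_PER_1K_TOKENS = {
    "text-embedding-3-small": 0.00002,
    "ollama/nomic-embed-text": 0.0,  # local model: free, but still logged
}


def estimate_cost(model: str, text: str) -> tuple[int, float]:
    # Crude token estimate (~4 chars/token) since the SDK does not
    # expose real token counts for embedding calls.
    tokens = max(1, len(text) // 4)
    rate = PRICING_PER_1K_TOKENS.get(model, 0.0)
    return tokens, tokens / 1000 * rate


class TrackedMemoryClient:
    """Proxy that records an embedding cost for every store/retrieve."""

    def __init__(self, inner, model, on_cost):
        self.inner = inner      # the real Mem0-backed client
        self.model = model
        self.on_cost = on_cost  # callback into the cost tracker

    def store(self, content, **kwargs):
        tokens, cost = estimate_cost(self.model, content)
        self.on_cost(self.model, tokens, cost)
        return self.inner.store(content, **kwargs)

    def retrieve(self, query, **kwargs):
        tokens, cost = estimate_cost(self.model, query)
        self.on_cost(self.model, tokens, cost)
        return self.inner.retrieve(query, **kwargs)
```

For local models the estimated cost is zero, but the callback still fires, which preserves call counts for observability; latency could be captured the same way by timing the inner call.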
References
- `src/synthorg/budget/cost_record.py` -- CostRecord model
- `src/synthorg/budget/tracker.py` -- CostTracker service
- `src/synthorg/budget/enforcer.py` -- BudgetEnforcer
- `src/synthorg/memory/backends/mem0/adapter.py` -- where Mem0 SDK calls happen