feat: track embedding model costs in budget system #997

@Aureliolo

Description

Context

The budget system (CostRecord, CostTracker, BudgetEnforcer) tracks LLM completion API calls -- input/output tokens, cost per call, categorized as productive/coordination/system. Embedding model calls, however, are completely invisible to the budget system.

Embedding calls happen inside the Mem0 SDK on every store() (the content is embedded before the vector is stored) and retrieve() (the query is embedded for similarity search). For cloud embedding APIs, each call has a cost that currently goes untracked. For local models (Ollama), the compute is free, but the calls are still worth tracking for observability.

This gap will become more significant as embedding usage grows.

Requirements

1. Embedding cost tracking

  • Add an EMBEDDING category or separate cost type to distinguish embedding calls from LLM completion calls
  • Instrument Mem0 SDK calls (or wrap them) to capture per-call metrics: model, token count (input only -- embeddings have no output tokens), cost
  • Record embedding costs as CostRecord entries (or a new EmbeddingCostRecord if the schema diverges too much)
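A minimal sketch of what an EMBEDDING category could look like -- the field names here are assumptions, not the project's actual CostRecord schema:

```python
from dataclasses import dataclass
from enum import Enum

class CostCategory(Enum):
    PRODUCTIVE = "productive"
    COORDINATION = "coordination"
    SYSTEM = "system"
    EMBEDDING = "embedding"  # new: distinguishes embedding calls from completions

@dataclass
class CostRecord:
    agent_id: str
    model: str
    category: CostCategory
    input_tokens: int
    output_tokens: int  # always 0 for embedding calls
    cost_usd: float

# Example: one embedding call, assuming $0.02 per 1M input tokens
record = CostRecord(
    agent_id="agent-7",
    model="text-embedding-3-small",
    category=CostCategory.EMBEDDING,
    input_tokens=512,
    output_tokens=0,
    cost_usd=512 / 1_000_000 * 0.02,
)
```

Keeping output_tokens in the schema (fixed at 0) avoids a separate EmbeddingCostRecord as long as nothing else diverges.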

2. Budget enforcement integration

  • Include embedding costs in budget totals and per-agent spend
  • Evaluate whether embedding calls should be gated by budget enforcement (currently only LLM completions are gated)
  • Consider: embedding is on the critical path for memory store/retrieve -- budget-gating it could break memory entirely
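One way to square that tension is soft enforcement: embedding costs count toward totals, but an overspend warns rather than blocks, since blocking would break memory. A sketch with a stand-in tracker (names assumed):

```python
class CostTracker:
    """Minimal stand-in for the project's CostTracker (interface assumed)."""
    def __init__(self):
        self._costs = []

    def add(self, cost_usd):
        self._costs.append(cost_usd)

    def total_spend(self):
        return sum(self._costs)

def record_embedding_cost(tracker, cost_usd, budget_limit_usd):
    """Record an embedding cost without gating the call itself.

    Embedding sits on the memory store/retrieve critical path, so an
    overspend logs a warning instead of raising or blocking the call.
    """
    tracker.add(cost_usd)
    total = tracker.total_spend()
    if total > budget_limit_usd:
        print(f"warning: embedding spend pushed total to ${total:.4f}")
    return total
```

LLM completions would keep their existing hard gate; only the embedding path takes this warn-only branch.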

3. Dashboard visibility

4. Fine-tuning cost tracking (#966 follow-up)

  • Stage 1 (synthetic data generation) makes LLM API calls -- these should flow through the provider system and get tracked as SYSTEM category costs
  • Stage 3 (GPU training) is compute-only, not an API cost -- consider whether to track duration/resource usage separately
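Since Stage 3 is compute-only, its usage fits a duration record rather than a CostRecord. A hypothetical sketch (all names invented here):

```python
import time
from dataclasses import dataclass

@dataclass
class TrainingUsageRecord:
    """Hypothetical record for compute-only fine-tuning stages."""
    run_id: str
    wall_seconds: float = 0.0

class TimedRun:
    """Context manager that captures wall-clock duration of a training run."""
    def __init__(self, run_id):
        self.record = TrainingUsageRecord(run_id)

    def __enter__(self):
        self._t0 = time.perf_counter()
        return self.record

    def __exit__(self, *exc):
        self.record.wall_seconds = time.perf_counter() - self._t0
```

GPU-seconds or peak memory could be added as fields later without touching the budget schema.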

Design Considerations

  • Mem0 SDK calls the embedding provider directly -- SynthOrg doesn't intercept these calls. Options:
    • Wrap the Mem0 client with a proxy that intercepts embedding calls
    • Use Mem0's callback/hook system if available
    • Estimate costs from known model pricing + input text length
  • For local models (Ollama), cost is zero but call count and latency are still useful metrics
  • CostRecord currently requires output_tokens >= 0 -- embedding calls have zero output tokens (just the vector), so this should work as-is
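The first and third options above can be combined: a thin proxy around the Mem0 client that estimates cost from pricing and input length before delegating. This sketch assumes the client exposes add() and search(), uses an assumed price table, and approximates tokens as len(text) // 4 -- exact counts would need the embedder's tokenizer:

```python
# Assumed pricing, USD per 1M input tokens
PRICE_PER_MTOKEN = {"text-embedding-3-small": 0.02}

class CostTrackingMemory:
    """Proxy around a Mem0-style client that estimates embedding cost per call."""

    def __init__(self, client, model, on_cost):
        self._client = client
        self._model = model
        self._on_cost = on_cost  # callback receiving (tokens, cost_usd)

    def _estimate(self, text):
        # Rough heuristic: ~4 characters per token
        tokens = max(1, len(text) // 4)
        cost = tokens / 1_000_000 * PRICE_PER_MTOKEN.get(self._model, 0.0)
        self._on_cost(tokens, cost)

    def add(self, text, **kwargs):
        self._estimate(text)  # store() embeds the content
        return self._client.add(text, **kwargs)

    def search(self, query, **kwargs):
        self._estimate(query)  # retrieve() embeds the query
        return self._client.search(query, **kwargs)
```

For Ollama the price lookup falls through to 0.0, so call counts still flow to the callback while cost stays zero.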

Metadata


Labels

  • prio:medium -- Should do, but not blocking
  • scope:medium -- 1-3 days of work
  • spec:memory -- DESIGN_SPEC Section 7 - Memory & Persistence
  • type:feature -- New feature implementation
  • v0.7 -- Minor version v0.7
  • v0.7.2 -- Patch release v0.7.2
