[Feature]: Scheduled Agent Memory Indexing for RAG-Optimized Context Retrieval #27848

### Summary

Implement automated nightly indexing of agent memories to build a RAG (Retrieval-Augmented Generation) layer that enables efficient semantic search before LLM context injection, significantly reducing token consumption as agents accumulate user knowledge over time.

### Problem to solve

As agents learn more about users through extended interactions, their memory context grows substantially, leading to:

- Token bloat: The full memory context is loaded into every LLM call, so input token counts grow linearly with memory size
- Performance degradation: Larger contexts slow response times
- Inefficient retrieval: Memory is currently loaded linearly with no semantic ranking, which pulls irrelevant memories into context
- Cost escalation: Token costs grow in proportion to memory size, making long-term agent usage increasingly expensive

For users with months of interaction history, a single query can consume thousands of tokens just loading memory context, even when only a small subset is relevant.

### Proposed solution

**Implement a scheduled memory indexing system with the following components:**

**Off-Peak Indexing:**

- Run memory embedding/indexing during configurable low-usage hours (default: 2-6 AM local time)
- Process only new/modified memory entries since the last indexing cycle
- Generate vector embeddings for semantic search capabilities
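To make the "only new/modified entries" step concrete, here is a minimal sketch that uses a content hash per memory to decide what needs re-embedding. The index layout, the function names, and the `fake_embed` stand-in are all hypothetical and only for illustration; a real implementation would call whichever embedding provider is configured.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical index layout: memory id -> {"hash": content hash, "vector": embedding}
Index = Dict[str, dict]


def content_hash(text: str) -> str:
    """Stable fingerprint of a memory's text, used to detect modifications."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def incremental_index(
    memories: Dict[str, str],
    index: Index,
    embed: Callable[[str], List[float]],
) -> List[str]:
    """Embed only new or changed memories; return the ids that were (re)embedded."""
    updated = []
    for memory_id, text in memories.items():
        digest = content_hash(text)
        entry = index.get(memory_id)
        if entry is not None and entry["hash"] == digest:
            continue  # unchanged since the last cycle, so skip the embedding call
        index[memory_id] = {"hash": digest, "vector": embed(text)}
        updated.append(memory_id)
    # Drop index entries whose memory no longer exists.
    for removed_id in set(index) - set(memories):
        del index[removed_id]
    return updated


def fake_embed(text: str) -> List[float]:
    """Stand-in for a real embedding call (assumption, for demonstration only)."""
    return [float(len(text))]


index: Index = {}
first_run = incremental_index({"a": "likes tea", "b": "lives in Oslo"}, index, fake_embed)
second_run = incremental_index({"a": "likes tea", "b": "lives in Bergen"}, index, fake_embed)
```

On the second run only memory `b` is re-embedded, which is the property that keeps nightly embedding cost proportional to the day's changes rather than to total history.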


**RAG-First Retrieval Pipeline:**

- Before LLM context injection, query the indexed memory with the user's current input
- Retrieve only the top-K most semantically relevant memory chunks
- Inject only the filtered memories into LLM context, reducing token load
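The retrieval step could look roughly like the sketch below: brute-force cosine similarity over stored vectors, returning the top-K ids. This assumes the query has already been embedded with the same model as the memories; the function names and the toy two-dimensional vectors are illustrative, not part of any existing OpenClaw API. A real vector store would replace the linear scan.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_vector: List[float],
    index: Dict[str, List[float]],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Score every indexed memory against the query and keep the k best."""
    scored = [
        (memory_id, cosine_similarity(query_vector, vector))
        for memory_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings.
toy_index = {
    "prefers-coffee": [1.0, 0.0],
    "prefers-tea": [0.9, 0.1],
    "tax-deadline": [0.0, 1.0],
}
top_two = [memory_id for memory_id, _ in retrieve_top_k([1.0, 0.0], toy_index, k=2)]
```

Only the ids in `top_two` would be resolved back to memory text and injected into the prompt.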


**Configuration Options:**

    {
      agents: {
        defaults: {
          memory: {
            indexing: {
              enabled: true,
              schedule: "0 2 * * *",  // cron format: 2 AM daily
              provider: "openai",      // or local embeddings
              chunkSize: 512,
              retrievalTopK: 10        // max memories per query
            }
          }
        }
      }
    }
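As an illustration of how `chunkSize` might be applied, the sketch below splits a memory into chunks of at most `chunk_size` whitespace-separated words. Treating words as tokens is an assumption made to keep the example dependency-free; the proposal does not say whether the unit is tokens or characters, and a real implementation would likely use the embedding model's tokenizer. The function name is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 512) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size words.

    Words are used as a rough proxy for tokens (assumption).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


chunks = chunk_text("user prefers dark roast coffee in the morning", chunk_size=3)
```

With an average memory of around 150 tokens, most entries would fit in a single chunk at the default of 512, so chunking mainly matters for unusually long memories.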

**Backward Compatibility:**

- Graceful fallback to current full-memory loading if indexing fails
- Optional feature that doesn't break existing workflows

**Benefits**

- Massive token savings: Reduce memory-related token consumption by an estimated 70-90% for users with extensive histories
- Improved relevance: Semantic search surfaces only contextually relevant memories
- Faster responses: Smaller context = faster LLM processing
- Scalability: Enables indefinite memory growth without linear cost increase
- Cost predictability: Token costs stabilize rather than scaling with agent tenure
- Resource efficiency: Off-peak processing minimizes impact on active usage hours

**Technical Considerations**

- Leverage the existing `memorySearch` configuration infrastructure (already supports embeddings)
- Store the index in `~/.openclaw/agents/<agentId>/memory-index/`
- Apply incremental updates to avoid re-indexing the entire memory history nightly
- Monitor via `openclaw doctor` to verify index health and freshness

### Alternatives considered

**Real-time embedding on every query**

- ❌ Adds latency to every user interaction
- ❌ Multiplies embedding API costs (embedding on every query vs. once nightly)
- ❌ Doesn't solve the core problem of growing token consumption

**Manual memory pruning/archiving**

- ❌ Requires constant user intervention
- ❌ Risk of losing valuable context permanently
- ❌ Doesn't scale for non-technical users

**Sliding window approach (keep only recent N memories)**

- ❌ Loses valuable long-term context about user preferences
- ❌ Arbitrary cutoff ignores semantic relevance
- ❌ "Old but relevant" memories get dropped incorrectly

**Compression-based approaches**

- ❌ Lossy compression degrades memory quality
- ❌ Still requires loading full compressed context into LLM
- ❌ Adds computational overhead without semantic filtering

**Why RAG indexing is superior:** It preserves all memories permanently while loading only the relevant subset for each query, combining comprehensive memory with efficient retrieval.

### Impact

**For Users:**
- **Lower costs**: 70-90% reduction in memory-related token consumption for long-term users
- **Faster responses**: Smaller context windows = quicker LLM processing
- **Better quality**: More relevant memories surfaced instead of information overload
- **Future-proof**: Agents can accumulate unlimited memories without degrading performance

**For OpenClaw:**
- **Competitive advantage**: Enables sustainable long-term agent relationships that competitors can't match economically
- **Resource efficiency**: Reduced token usage = lower infrastructure costs at scale
- **User retention**: Users won't abandon agents due to escalating costs
- **Differentiator**: "Memory that scales" becomes a key product feature

**For Developers:**
- **Reuses existing infrastructure**: Leverages `memorySearch` provider system already in place
- **Modular implementation**: Can be developed incrementally without breaking changes
- **Clear metrics**: Easy to measure token savings and retrieval relevance

### Evidence/examples

**Illustrative scenario:**
```
User with 6 months of daily OpenClaw usage:
- Total memories accumulated: ~1,200 entries
- Average memory size: 150 tokens
- Full context load: 180,000 tokens per query
- Anthropic API cost: ~$0.54 per query (Claude Sonnet input tokens, assuming $3 per million)

With RAG indexing (top-10 retrieval):
- Indexed memories: 1,200 (one-time embedding cost)
- Context load per query: ~1,500 tokens (10 memories)
- API cost per query: ~$0.0045
- Savings: ~99.2% reduction in memory-related token costs
```
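The arithmetic behind this scenario can be reproduced with a few lines. The memory counts and sizes come from the scenario; the price of $3 per million input tokens is an assumption about Claude Sonnet pricing and should be replaced with the current rate. Dollar figures scale linearly with that price, while the percentage reduction depends only on the token counts.

```python
from typing import Tuple

USD_PER_MILLION_INPUT_TOKENS = 3.00  # assumption: substitute the current rate


def memory_cost_per_query(
    memory_count: int,
    tokens_per_memory: int,
    usd_per_million_tokens: float,
) -> Tuple[int, float]:
    """Return (tokens loaded, input cost in USD) for one query."""
    tokens = memory_count * tokens_per_memory
    return tokens, tokens * usd_per_million_tokens / 1_000_000


# Full-memory loading: all 1,200 memories at ~150 tokens each.
full_tokens, full_cost = memory_cost_per_query(1200, 150, USD_PER_MILLION_INPUT_TOKENS)

# RAG retrieval: only the top 10 memories are injected.
rag_tokens, rag_cost = memory_cost_per_query(10, 150, USD_PER_MILLION_INPUT_TOKENS)

# Fraction of memory-related tokens avoided per query.
reduction = 1 - rag_tokens / full_tokens

print(f"full: {full_tokens} tokens, ${full_cost:.4f}")
print(f"rag:  {rag_tokens} tokens, ${rag_cost:.4f}")
print(f"reduction: {reduction:.1%}")
```

Note that this covers only memory-related input tokens; the one-time embedding cost and the small per-query cost of embedding the user's input are not included.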

**Existing patterns in the ecosystem:**

- LangChain, LlamaIndex, and other frameworks use RAG for exactly this purpose
- Enterprise AI assistants (GitHub Copilot Workspace, Cursor) use vector stores for code context
- Anthropic's own Claude Projects feature likely uses similar retrieval mechanisms internally

**Similar OpenClaw features that prove feasibility:**

- Issue #4461 shows OpenClaw already has embedding provider infrastructure
- Existing `memorySearch` configuration supports custom embedding endpoints
- Cron system (#5452, Configuration docs) provides scheduling foundation

### Additional information

**Implementation phases:**

- Phase 1 (MVP):
  - Basic nightly indexing with OpenAI embeddings
  - Simple top-K retrieval before context injection
  - Opt-in configuration flag
- Phase 2 (Enhancement):
  - Local embedding models (no API dependency)
  - Hybrid search (vector + keyword)
  - Per-agent indexing schedules
- Phase 3 (Advanced):
  - Real-time incremental indexing for high-activity agents
  - Memory clustering for topic-based retrieval
  - Automatic relevance tuning based on user feedback


**Storage estimates:**

- 1,000 memories → ~6 MB of raw vectors (1536-dim float32 embeddings at 4 bytes per value), before any index overhead or compression
- Negligible disk space impact compared to session logs
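A quick back-of-the-envelope check of the raw vector size, assuming one 1536-dimension embedding per memory stored as float32 (4 bytes per value). The function name is illustrative, and real vector stores add metadata and index structures on top of this figure.

```python
def raw_vector_bytes(vector_count: int, dimensions: int = 1536, bytes_per_value: int = 4) -> int:
    """Bytes needed to store the embedding values alone, with no index overhead."""
    return vector_count * dimensions * bytes_per_value


float32_bytes = raw_vector_bytes(1000)                     # 4 bytes per value
float16_bytes = raw_vector_bytes(1000, bytes_per_value=2)  # half precision

print(f"float32: {float32_bytes / 1_000_000:.1f} MB")
print(f"float16: {float16_bytes / 1_000_000:.1f} MB")
```

Either way the footprint stays in the single-digit megabyte range per thousand memories, so disk usage is unlikely to be the constraint.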

**Fallback behavior:**

- If the indexing service is unavailable → fall back to current full-memory loading
- If the index is stale (>7 days) → warn the user via `openclaw doctor`
- If retrieval fails → graceful degradation to recent memories only
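The three fallback rules could be wired together roughly as follows. This is a sketch only: the loader callables, the mode labels, and the function names are hypothetical, and the 7-day threshold is the one proposed above.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Tuple

STALE_AFTER = timedelta(days=7)  # staleness threshold from the proposal


def index_is_stale(last_indexed: datetime, now: datetime) -> bool:
    """True when the last successful indexing run is older than the threshold."""
    return now - last_indexed > STALE_AFTER


def load_memory_context(
    query: str,
    index_available: bool,
    retrieve: Callable[[str], List[str]],
    load_all: Callable[[], List[str]],
    load_recent: Callable[[], List[str]],
) -> Tuple[List[str], str]:
    """Return (memories, mode), where mode records which path was taken."""
    if not index_available:
        return load_all(), "full"  # no index: current full-memory behavior
    try:
        return retrieve(query), "rag"
    except Exception:
        return load_recent(), "recent"  # retrieval failed: recent memories only


def broken_retrieve(query: str) -> List[str]:
    """Simulates a retrieval failure for the demonstration below."""
    raise RuntimeError("index unreadable")


memories, mode = load_memory_context(
    "what coffee do I like?",
    index_available=True,
    retrieve=broken_retrieve,
    load_all=lambda: ["m1", "m2", "m3"],
    load_recent=lambda: ["m3"],
)
```

Returning the mode alongside the memories would make it straightforward to log how often each fallback path is taken, and `index_is_stale` is the kind of check a health command could surface as a warning.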

**Related issues/features:**

- Complements #9264 (Cross-Channel Context Sharing): indexed memories could be shared across channels
- Builds on the existing `memorySearch` configuration framework
- Aligns with OpenClaw's philosophy of "personal AI that learns and scales"

**Open questions for maintainers:**

- Should local embedding models be bundled by default, or require separate installation?
- Preferred vector store backend (FAISS, Chroma, custom SQLite-based)?
- Should users be able to manually trigger re-indexing via `openclaw memory reindex`?
