Feature Request: Warm-up / session reuse for embedded agents (active-memory) #67000

@TNTest

Description
Problem

runEmbeddedPiAgent performs a full cold start on every invocation. For active-memory, this means every user message triggers:

  1. Session lane enqueue (queue wait)
  2. Workspace resolution (resolveRunWorkspaceDir)
  3. Plugin loading (ensureRuntimePluginsLoaded)
  4. Model resolution (resolveModelAsync + resolveEffectiveRuntimeModel)
  5. Auth profile lookup (authStore, profileOrder)
  6. Hook execution (resolveHookModelSelection)
  7. Only then does the actual LLM call begin

Even when the target model responds in ~1.5s (e.g., GLM-4.5-Air), the total active-memory latency is 10-30s due to initialization overhead. This makes active-memory impractical for real-time conversations.

Evidence

  • Direct API call to GLM-4.5-Air: 1.5s
  • active-memory with GLM-4.5-Air: 10s+ timeout (never completes first LLM call)
  • active-memory with MiniMax-M2.7: 19-29s (completes but extremely slow)
  • No intermediate tool_call logs between start and timeout — initialization itself is the bottleneck

Proposed Solution

Add a warm-standby or session reuse mechanism for embedded agents:

Option A: Lazy session pool

  • After first successful run, keep the session warm for a configurable TTL (e.g., warmStandbyMs)
  • Subsequent calls reuse the initialized session, skipping steps 1-6
  • Only re-initialize on TTL expiry or config change
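Option A could be sketched as a small TTL-keyed cache in front of the cold-start path. This is a minimal illustration, not OpenClaw's actual API: `SessionPool`, `WarmSession`, and the `init` callback are hypothetical names, and the real implementation would need to key on config/model identity and handle concurrent access.

```typescript
// Hypothetical sketch of Option A: a TTL-based warm session cache.
// None of these names are existing OpenClaw APIs.

interface WarmSession<S> {
  session: S;
  expiresAt: number; // epoch ms after which this warm session is stale
}

class SessionPool<S> {
  private entries = new Map<string, WarmSession<S>>();

  constructor(
    private warmStandbyMs: number,        // proposed warmStandbyMs config value
    private init: (key: string) => S,     // full cold-start path (steps 1-6)
  ) {}

  // Return a warm session if one exists and is fresh; otherwise cold-start.
  acquire(key: string, now = Date.now()): { session: S; warm: boolean } {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > now) {
      entry.expiresAt = now + this.warmStandbyMs; // sliding TTL on reuse
      return { session: entry.session, warm: true };
    }
    const session = this.init(key);
    this.entries.set(key, { session, expiresAt: now + this.warmStandbyMs });
    return { session, warm: false };
  }
}
```

The sliding TTL means an actively used session never re-initializes; only an idle session pays the cold-start cost again.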

Option B: Pre-initialization on gateway start

  • Allow active-memory config to specify warmStart: true
  • On gateway startup (or plugin load), pre-initialize the embedded agent session
  • First user message benefits from warm session
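Option B amounts to a startup hook that pays the cold-start cost once, before any user message arrives. A minimal sketch, assuming a hypothetical `warmStart` config field and an `initEmbeddedSession` hook (neither exists in OpenClaw today):

```typescript
// Hypothetical sketch of Option B: eager pre-initialization at gateway startup.
// The config shape and initEmbeddedSession hook are assumptions.

interface ActiveMemoryWarmup {
  enabled: boolean;
  warmStart?: boolean; // proposed opt-in flag
}

// Returns true if the embedded session was pre-initialized at startup.
function onGatewayStart(
  config: ActiveMemoryWarmup,
  initEmbeddedSession: () => void,
): boolean {
  if (config.enabled && config.warmStart) {
    initEmbeddedSession(); // run the cold-start path once, up front
    return true;
  }
  return false; // first user message cold-starts, as today
}
```

Keeping the flag opt-in avoids slowing gateway startup for users who never call the embedded agent.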

Option C: Persistent embedded session

  • Keep the embedded agent session alive across multiple runEmbeddedPiAgent calls
  • Similar to how the main session persists: the sub-agent session would retain its workspace, plugin, and model-resolution state across calls

Configuration Example

```json
{
  "plugins": {
    "entries": {
      "active-memory": {
        "config": {
          "enabled": true,
          "warmStandbyMs": 300000,
          "preInitialize": true
        }
      }
    }
  }
}
```
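The plugin would need to resolve these settings with sensible defaults when they are unset. A sketch of that, with illustrative defaults (the field names mirror the proposed config above; the default values are assumptions):

```typescript
// Hypothetical sketch: resolving the proposed warm-up settings with defaults.
// Defaults shown here are illustrative, not decided.

interface ActiveMemoryConfig {
  enabled: boolean;
  warmStandbyMs?: number;
  preInitialize?: boolean;
}

function resolveWarmupSettings(raw: ActiveMemoryConfig): {
  warmStandbyMs: number;
  preInitialize: boolean;
} {
  return {
    warmStandbyMs: raw.warmStandbyMs ?? 300_000, // 5 minutes if unset
    preInitialize: raw.preInitialize ?? false,   // pre-init stays opt-in
  };
}
```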

Impact

This would reduce active-memory latency from 10-30s to ~2-5s (model response time + minimal overhead), making it practical for real-time use.

Environment

  • OpenClaw: 2026.4.14
  • Provider: trapi (Anthropic Messages API proxy)
  • Models tested: GLM-4.5-Air (1.5s direct, 10s+ embedded), MiniMax-M2.7 (19-29s embedded)
