Feature Request: Warm-up / session reuse for embedded agents (active-memory) #67000

@TNTest

Description
Problem

runEmbeddedPiAgent performs a full cold start on every invocation. For active-memory, this means every user message triggers:

  1. Session lane enqueue (queue wait)
  2. Workspace resolution (resolveRunWorkspaceDir)
  3. Plugin loading (ensureRuntimePluginsLoaded)
  4. Model resolution (resolveModelAsync + resolveEffectiveRuntimeModel)
  5. Auth profile lookup (authStore, profileOrder)
  6. Hook execution (resolveHookModelSelection)
  7. Only then does the actual LLM call begin

Even when the target model responds in ~1.5s (e.g., GLM-4.5-Air), the total active-memory latency is 10-30s due to initialization overhead. This makes active-memory impractical for real-time conversations.

Evidence

  • Direct API call to GLM-4.5-Air: 1.5s
  • active-memory with GLM-4.5-Air: 10s+ timeout (never completes first LLM call)
  • active-memory with MiniMax-M2.7: 19-29s (completes but extremely slow)
  • No intermediate tool_call logs between start and timeout — initialization itself is the bottleneck

Proposed Solution

Add a warm-standby or session reuse mechanism for embedded agents:

Option A: Lazy session pool

  • After first successful run, keep the session warm for a configurable TTL (e.g., warmStandbyMs)
  • Subsequent calls reuse the initialized session, skipping steps 1-6
  • Only re-initialize on TTL expiry or config change
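Option A could be sketched as a small TTL-keyed cache in front of the cold-start path. This is a minimal illustration, not OpenClaw's actual API: `SessionPool`, `WarmSession`, and the `init` callback are hypothetical names, and the real implementation would need to key on config/model identity and handle concurrent access.

```typescript
// Hypothetical sketch of Option A: a TTL-based warm session cache.
// None of these names are existing OpenClaw APIs.

interface WarmSession<S> {
  session: S;
  expiresAt: number; // epoch ms after which this warm session is stale
}

class SessionPool<S> {
  private entries = new Map<string, WarmSession<S>>();

  constructor(
    private warmStandbyMs: number,        // proposed warmStandbyMs config value
    private init: (key: string) => S,     // full cold-start path (steps 1-6)
  ) {}

  // Return a warm session if one exists and is fresh; otherwise cold-start.
  acquire(key: string, now = Date.now()): { session: S; warm: boolean } {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > now) {
      entry.expiresAt = now + this.warmStandbyMs; // sliding TTL on reuse
      return { session: entry.session, warm: true };
    }
    const session = this.init(key);
    this.entries.set(key, { session, expiresAt: now + this.warmStandbyMs });
    return { session, warm: false };
  }
}
```

The sliding TTL means an actively used session never re-initializes; only an idle session pays the cold-start cost again.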

Option B: Pre-initialization on gateway start

  • Allow active-memory config to specify warmStart: true
  • On gateway startup (or plugin load), pre-initialize the embedded agent session
  • First user message benefits from warm session
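Option B amounts to a startup hook that pays the cold-start cost once, before any user message arrives. A minimal sketch, assuming a hypothetical `warmStart` config field and an `initEmbeddedSession` hook (neither exists in OpenClaw today):

```typescript
// Hypothetical sketch of Option B: eager pre-initialization at gateway startup.
// The config shape and initEmbeddedSession hook are assumptions.

interface ActiveMemoryWarmup {
  enabled: boolean;
  warmStart?: boolean; // proposed opt-in flag
}

// Returns true if the embedded session was pre-initialized at startup.
function onGatewayStart(
  config: ActiveMemoryWarmup,
  initEmbeddedSession: () => void,
): boolean {
  if (config.enabled && config.warmStart) {
    initEmbeddedSession(); // run the cold-start path once, up front
    return true;
  }
  return false; // first user message cold-starts, as today
}
```

Keeping the flag opt-in avoids slowing gateway startup for users who never call the embedded agent.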

Option C: Persistent embedded session

  • Keep the embedded agent session alive across multiple runEmbeddedPiAgent calls
  • Similar to how the main session persists: the sub-agent session would retain its workspace, plugin, and model-resolution state across calls

Configuration Example

```json
{
  "plugins": {
    "entries": {
      "active-memory": {
        "config": {
          "enabled": true,
          "warmStandbyMs": 300000,
          "preInitialize": true
        }
      }
    }
  }
}
```
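The plugin would need to resolve these settings with sensible defaults when they are unset. A sketch of that, with illustrative defaults (the field names mirror the proposed config above; the default values are assumptions):

```typescript
// Hypothetical sketch: resolving the proposed warm-up settings with defaults.
// Defaults shown here are illustrative, not decided.

interface ActiveMemoryConfig {
  enabled: boolean;
  warmStandbyMs?: number;
  preInitialize?: boolean;
}

function resolveWarmupSettings(raw: ActiveMemoryConfig): {
  warmStandbyMs: number;
  preInitialize: boolean;
} {
  return {
    warmStandbyMs: raw.warmStandbyMs ?? 300_000, // 5 minutes if unset
    preInitialize: raw.preInitialize ?? false,   // pre-init stays opt-in
  };
}
```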

Impact

This would reduce active-memory latency from 10-30s to ~2-5s (model response time + minimal overhead), making it practical for real-time use.

Environment

  • OpenClaw: 2026.4.14
  • Provider: trapi (Anthropic Messages API proxy)
  • Models tested: GLM-4.5-Air (1.5s direct, 10s+ embedded), MiniMax-M2.7 (19-29s embedded)
