Problem
`runEmbeddedPiAgent` performs a full cold start on every invocation. For active-memory, this means every user message triggers:

1. Session lane enqueue (queue wait)
2. Workspace resolution (`resolveRunWorkspaceDir`)
3. Plugin loading (`ensureRuntimePluginsLoaded`)
4. Model resolution (`resolveModelAsync` + `resolveEffectiveRuntimeModel`)
5. Auth profile lookup (`authStore`, `profileOrder`)
6. Hook execution (`resolveHookModelSelection`)

Only then does the actual LLM call begin.
Even when the target model responds in ~1.5s (e.g., GLM-4.5-Air), the total active-memory latency is 10-30s due to initialization overhead. This makes active-memory impractical for real-time conversations.
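To make the cost concrete, the flow above is essentially a serial chain of awaited initialization steps that must all finish before the first model token arrives. The sketch below is illustrative only: the step names come from this report, but the signatures are assumptions, not OpenClaw's actual internals.

```ts
// Illustrative shape of the per-invocation cold start described above.
// All signatures here are assumptions for this sketch, not OpenClaw internals.
declare function enqueueOnSessionLane(): Promise<void>;
declare function resolveRunWorkspaceDir(): Promise<string>;
declare function ensureRuntimePluginsLoaded(workspaceDir: string): Promise<void>;
declare function resolveModelAsync(): Promise<string>;
declare function resolveEffectiveRuntimeModel(model: string): string;
declare function lookupAuthProfile(): Promise<string>;
declare function resolveHookModelSelection(model: string, profile: string): Promise<void>;
declare function callModel(model: string, profile: string, message: string): Promise<string>;

async function runEmbeddedColdStart(message: string): Promise<string> {
  await enqueueOnSessionLane();                        // 1. session lane enqueue (queue wait)
  const workspaceDir = await resolveRunWorkspaceDir(); // 2. workspace resolution
  await ensureRuntimePluginsLoaded(workspaceDir);      // 3. plugin loading
  const model = resolveEffectiveRuntimeModel(await resolveModelAsync()); // 4. model resolution
  const profile = await lookupAuthProfile();           // 5. auth profile lookup
  await resolveHookModelSelection(model, profile);     // 6. hook execution
  return callModel(model, profile, message);           // only now does the LLM call start
}
```

Because every step is awaited in sequence on every message, the fixed overhead dominates even when the model itself is fast.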
Evidence
- Direct API call to GLM-4.5-Air: 1.5s
- active-memory with GLM-4.5-Air: 10s+ timeout (never completes first LLM call)
- active-memory with MiniMax-M2.7: 19-29s (completes but extremely slow)
- No intermediate `tool_call` logs between start and timeout; initialization itself is the bottleneck
Proposed Solution
Add a warm-standby or session reuse mechanism for embedded agents:
Option A: Lazy session pool
- After first successful run, keep the session warm for a configurable TTL (e.g., `warmStandbyMs`)
- Subsequent calls reuse the initialized session, skipping steps 1-6
- Only re-initialize on TTL expiry or config change (see the sketch below)
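A minimal sketch of how Option A could look, assuming a hypothetical `EmbeddedSession` handle and an `initializeEmbeddedSession` helper that wraps the existing cold-start path; none of these names exist in OpenClaw today.

```ts
// Sketch of a lazy warm-session cache keyed by agent id (Option A).
// EmbeddedSession and initializeEmbeddedSession are hypothetical.
interface EmbeddedSession {
  run(message: string): Promise<string>;
  dispose(): Promise<void>;
  configHash: string;
}

declare function initializeEmbeddedSession(agentId: string): Promise<EmbeddedSession>;
declare function currentConfigHash(agentId: string): string;

interface WarmEntry {
  session: EmbeddedSession;
  expiresAt: number;
}

const warmSessions = new Map<string, WarmEntry>();

export async function getWarmSession(
  agentId: string,
  warmStandbyMs: number,
): Promise<EmbeddedSession> {
  const entry = warmSessions.get(agentId);
  const fresh =
    entry !== undefined &&
    entry.expiresAt > Date.now() &&
    entry.session.configHash === currentConfigHash(agentId);

  if (fresh) {
    // Reuse the already-initialized session and extend its TTL.
    entry.expiresAt = Date.now() + warmStandbyMs;
    return entry.session;
  }

  // Expired or config changed: tear down and re-initialize (full cold start).
  if (entry) await entry.session.dispose();
  const session = await initializeEmbeddedSession(agentId);
  warmSessions.set(agentId, { session, expiresAt: Date.now() + warmStandbyMs });
  return session;
}
```

A call that lands within the TTL then skips steps 1-6 entirely and goes straight to `session.run(...)`.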
Option B: Pre-initialization on gateway start
- Allow `active-memory` config to specify `warmStart: true`
- On gateway startup (or plugin load), pre-initialize the embedded agent session
- First user message benefits from the warm session (see the sketch below)
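Option B could reuse the same cache and simply trigger initialization eagerly when the plugin loads. `PluginContext`, `onGatewayStart`, and the `warmStart` key are hypothetical names for whatever hook and config surface OpenClaw would expose:

```ts
// Sketch of eager warm-up at gateway start (Option B).
// PluginContext, onGatewayStart, and the config keys are hypothetical;
// getWarmSession is the helper from the Option A sketch above.
declare function getWarmSession(agentId: string, warmStandbyMs: number): Promise<unknown>;

interface ActiveMemoryConfig {
  enabled: boolean;
  warmStart?: boolean;
  warmStandbyMs?: number;
}

interface PluginContext {
  agentId: string;
  config: ActiveMemoryConfig;
  onGatewayStart(hook: () => Promise<void>): void;
}

export function registerActiveMemory(ctx: PluginContext): void {
  if (!ctx.config.enabled || !ctx.config.warmStart) return;

  ctx.onGatewayStart(async () => {
    // Pay the initialization cost once at startup so the first user
    // message already finds a warm session.
    await getWarmSession(ctx.agentId, ctx.config.warmStandbyMs ?? 300_000);
  });
}
```

Building on the same cache keeps Option B a thin layer: warm start only changes when initialization happens, not how sessions are reused.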
Option C: Persistent embedded session
- Keep the embedded agent session alive across multiple `runEmbeddedPiAgent` calls
- Similar to how the main session persists: the sub-agent session would maintain its workspace/plugin/model resolution state (see the sketch below)
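A minimal sketch of Option C, again with hypothetical types: a module-level session handle that is created once and shared by every subsequent call.

```ts
// Sketch of a persistent embedded session (Option C): initialize once, then
// reuse the same handle for every call. EmbeddedSession and
// initializeEmbeddedSession are hypothetical (same shapes as in the Option A sketch).
interface EmbeddedSession {
  run(message: string): Promise<string>;
}

declare function initializeEmbeddedSession(agentId: string): Promise<EmbeddedSession>;

let persistentSession: Promise<EmbeddedSession> | undefined;

export function runPersistent(agentId: string, message: string): Promise<string> {
  // The first call performs the cold start; later calls share the same session,
  // mirroring how the main session keeps its workspace/plugin/model state.
  persistentSession ??= initializeEmbeddedSession(agentId);
  return persistentSession.then((session) => session.run(message));
}
```

The trade-off versus Option A is that a persistent session is never reclaimed, so a TTL or explicit invalidation on config change would still be needed in practice.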
Configuration Example
```json
{
  "plugins": {
    "entries": {
      "active-memory": {
        "config": {
          "enabled": true,
          "warmStandbyMs": 300000,
          "preInitialize": true
        }
      }
    }
  }
}
```
Impact
This would reduce active-memory latency from 10-30s to ~2-5s (model response time + minimal overhead), making it practical for real-time use.
Environment
- OpenClaw: 2026.4.14
- Provider: trapi (Anthropic Messages API proxy)
- Models tested: GLM-4.5-Air (1.5s direct, 10s+ embedded), MiniMax-M2.7 (19-29s embedded)