Skip to content

MEMORY/RUNTIME: persistent agent workshop — total agent control over context #547

@Hmbown

Description

@Hmbown

Thesis

The agent should own its context end-to-end. Today RLM is paper-spec one-shot (Algorithm 1 from Zhang/Kraska/Khattab arXiv:2512.24601, implemented in crates/tui/src/rlm/mod.rs:5-13): tool invoked, sandboxed Python REPL is born and dies, returns one synthesized string. Per the Mismanaged Geniuses Hypothesis (Zhang 2026), the leverage isn't the model — it's the management substrate around it.

Promote the REPL from a per-call sandbox to a persistent workshop: lives across turns, accumulates state, and exposes context-control primitives that let the agent build its own context piece by piece. Combined with cache discipline (#528-533), offload-by-default (sibling issue), reasoning-capture (#544), and the memory cluster (#534-539), the agent gets total control over what's resident, what's in the workshop, what's archived — and can branch back through any of it as needed.

Current behavior

  • crates/tui/src/rlm/mod.rs:21 already runs a long-lived python3 -u subprocess via single stdin/stdout pipe — but only within a single tool call. Subprocess dies when rlm returns.
  • crates/tui/src/rlm/bridge.rs exposes the seam: RpcDispatcher handles Llm, LlmBatch, Rlm, RlmBatch request types. Adding new helpers = add an RPC variant + Python wrapper + dispatch handler.
  • crates/tui/src/session_manager.rs:174-177 already has save_checkpoint writing full conversation state to ~/.deepseek/sessions/checkpoints/latest.json. Pattern exists; just doesn't include REPL state.
  • crates/tui/src/cycle_manager.rs:12-16 archives prior cycles to JSONL so future tooling can search. Pattern for "transient → archived" already established.
  • Existing REPL helpers (per crates/tui/src/rlm/prompt.rs:18-26): context/ctx, llm_query, llm_query_batched, rlm_query, rlm_query_batched, SHOW_VARS, FINAL, FINAL_VAR, print.

Proposed change

Phase A — Persistent within session

  • Don't kill the python3 subprocess between turns. Keep RpcDispatcher alive for the session's lifetime.
  • New always-loaded tool: workshop_exec(code) — executes in the persistent REPL. Variables, helpers, loaded data persist across calls.
  • New tool: workshop_inspect() — returns the current variable namespace and types from outside the REPL.
  • The legacy rlm tool stays as a one-shot convenience wrapper for sub-agent dispatch.

Phase B — Standard orchestration helpers

Add RPC variants + Python wrappers for:

Phase C — Agent context-control primitives

The agent owns its context. New tools / REPL helpers:

  • promote_to_context(workshop_var | recall_hit | archive_id) — pull a workshop result, memory hit, or archived turn back into the parent's resident context.
  • evict_from_context(range) — drop a section that's no longer earning its place. No summarization — just drop.
  • branch_from(turn_id) — load a prior turn's reasoning trace + state into the workshop, work it from there.
  • pin(item) — mark resident; never evict on compaction.
  • archive(item) — move to cycle archive (#cycle_manager) without losing.

Phase D — Cross-session persistence

  • Extend SavedSession schema (session_manager.rs:102+) with a workshop_state field.
  • Serialize REPL globals (the serializable subset; non-serializable values are re-derived on resume from a recorded load_file / load_context log).
  • New tool: workshop_resume() — reloads globals + replays the load log on session restoration.
  • Workshop state checkpoints alongside conversation state — same save_checkpoint mechanism.

Phase E — Pattern capture to memory

Cache discipline integration

Workshop work happens outside the parent's context — only synthesis returns. This protects the cache prefix from haystack pollution. Combined with #528-533 (cache-maximal context defaults), the agent has both: rich cached resident context for active work, plus unlimited workshop scratch for everything else.

V4 specifics

  • Sub-LLM calls in the workshop should use V4 sampling (temperature=1.0, top_p=1.0) — see V4: sampling defaults + thinking-mode parameter wiring #540 for the bridge.rs:108-109 fix that this issue depends on for correct sub-call behavior.
  • Workshop default child model: deepseek-v4-flash (matches current tools/rlm.rs:24). Cheap, fast, RL'd hard on tool use.

Open questions / risks

Acceptance signals

  • Workshop loaded with a corpus on turn 1 is queryable on turn 5 without reloading.
  • Helper functions defined in turn 1 are callable in turn 5.
  • recall() from inside the workshop returns memory objects the model can .filter() / .map() over.
  • After a successful extraction pattern, the same task shape in a later session surfaces the prior pattern as a memory recall hit.
  • Measurable shift: fewer "load haystack into context" patterns, more workshop usage; cache hit rate stays high because the parent context isn't getting polluted.
  • The agent demonstrates context curation: pulls workshop results into context only when needed, evicts when done, branches from prior turns when revisiting.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    Status
    In progress

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions