feat(cli): add --resume and -p flags to netclaw chat for scripted multi-turn sessions #611

@Aaronontheweb

Summary

The netclaw -p (headless single-prompt) mode and netclaw chat --resume <id> (interactive TUI resume) are separate code paths that don't compose. This blocks multi-turn eval cases, KV cache benchmarking, compaction regression testing, and any scripted conversation that needs more than one turn.

What Changes

1. Move -p under chat as a flag

Current:

netclaw -p "hello"              # top-level shortcut, headless
netclaw chat                    # interactive TUI
netclaw chat --resume <id>      # resume, interactive TUI

Proposed:

netclaw chat -p "hello"                         # new session, headless
netclaw chat -p --resume <id> "follow-up"       # resume session, headless
netclaw chat --resume <id>                      # resume session, interactive TUI
netclaw chat                                    # new session, interactive TUI

Keep netclaw -p as a backward-compat alias that delegates to netclaw chat -p.
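
One way to picture the alias is as a pure argument rewrite that runs before normal command dispatch. The sketch below uses shell only for illustration; `rewrite_legacy_args` is a hypothetical name, and the real CLI would presumably do the equivalent inside its own argument parser.

```shell
# Hypothetical helper: map the legacy top-level form onto the proposed
# `chat` subcommand. Prints the rewritten arguments, one per line.
rewrite_legacy_args() {
  if [ "${1:-}" = "-p" ]; then
    # Legacy form: netclaw -p "prompt"  ->  netclaw chat -p "prompt"
    set -- chat "$@"
  fi
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$@"
  fi
}
```

Anything that does not start with `-p` passes through unchanged, so the rewrite cannot affect existing `netclaw chat` invocations.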

2. --resume with "ensure session" semantics

When --resume <id> specifies a session ID that doesn't exist, create a new session with that ID instead of failing. This gives callers deterministic session naming without needing to capture IDs from prior turns.

Semantics:

  • Session exists → resume it (append to existing conversation)
  • Session doesn't exist → create it with the given ID
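
These two cases amount to a get-or-create operation, much like `mkdir -p`. The toy model below uses one directory per session to show the behavior; the `SESSION_STORE` layout and the `ensure_session` helper are invented for illustration and say nothing about how the daemon actually persists sessions.

```shell
# Toy model only: one directory per session under a hypothetical store path.
SESSION_STORE="${SESSION_STORE:-${TMPDIR:-/tmp}/netclaw-sessions-demo}"

# Prints "created" the first time an ID is seen and "resumed" afterwards.
ensure_session() {
  if [ -d "$SESSION_STORE/$1" ]; then
    echo "resumed"
  else
    # IDs such as "eval/grounding-test-1" simply become nested directories.
    mkdir -p "$SESSION_STORE/$1"
    echo "created"
  fi
}
```

Calling `ensure_session` twice with the same ID never fails: the first call creates the session, the second finds it.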

This is idempotent, which makes scripting trivial:

# Eval runner: deterministic session names, no capture/parse between turns
netclaw chat -p --resume "eval/grounding-test-1" "hello"
netclaw chat -p --resume "eval/grounding-test-1" "what did I just say?"
netclaw chat -p --resume "eval/grounding-test-1" "what is your session id?"
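
The same pattern generalizes to a small helper that replays a list of prompts against one named session. This is a sketch: `run_turns` and the `NETCLAW` override variable are hypothetical, and the `chat -p --resume` syntax is the one proposed in this issue, not something that exists today.

```shell
# Command to invoke; override to point at a different build or a test double.
NETCLAW="${NETCLAW:-netclaw}"

# Usage: run_turns <session-id> <prompt>...
# Sends each prompt as one headless turn against the same session and
# stops at the first failing turn.
run_turns() {
  local session="$1" prompt
  shift
  for prompt in "$@"; do
    "$NETCLAW" chat -p --resume "$session" "$prompt" || return 1
  done
}
```

An eval case then reduces to one line, for example `run_turns "eval/grounding-test-1" "hello" "what did I just say?"`.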

3. Output format for headless resume

chat -p --resume should produce output in the same format as the current -p mode:

  • Default: plain text (assistant's response)
  • --json: structured JSON including sessionId, response text, tool calls, usage

The sessionId field in --json output is how callers discover the ID of an auto-generated session, i.e. when they don't pass --resume. When they do pass --resume, the field echoes back the ID they specified.
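
A caller that lets the CLI generate the ID could capture it roughly as follows. This is a sketch under assumptions: the issue only says the JSON includes a sessionId field, so treating it as a top-level string key is a guess, as is the position of `--json` relative to `-p`. The helper names and the `NETCLAW` variable are hypothetical.

```shell
# Command to invoke; override to point at a different build or a test double.
NETCLAW="${NETCLAW:-netclaw}"

# Reads one JSON document on stdin and prints its top-level sessionId.
extract_session_id() {
  if command -v jq >/dev/null 2>&1; then
    jq -r '.sessionId'
  else
    # Crude fallback for environments without jq; assumes the first
    # "sessionId" key in the document is the top-level one.
    grep -o '"sessionId"[[:space:]]*:[[:space:]]*"[^"]*"' \
      | head -n 1 \
      | sed 's/^"sessionId"[[:space:]]*:[[:space:]]*"\(.*\)"$/\1/'
  fi
}

# Starts a new session headlessly and prints the auto-generated ID.
# The flag order (-p --json "<prompt>") is an assumption.
new_session_id() {
  "$NETCLAW" chat -p --json "$1" | extract_session_id
}
```

A script would then run `SESSION_ID=$(new_session_id "hello")` and pass `"$SESSION_ID"` to `--resume` on later turns.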

Motivation

Multi-turn evals

The eval suite (evals/run-evals.sh) is single-turn only: every case runs netclaw -p, which creates a fresh session. This means we can't test:

  • Compaction behavior (needs 10+ turns to trigger)
  • Post-compaction grounding (does the agent remember context after compaction?)
  • KV cache performance (does turn N respond faster than turn 1?)
  • Conversation continuity (does the agent maintain coherence across turns?)
  • Session ID self-awareness (does the agent know its own session ID after compaction?)

Each of these areas had real production failures during the compaction rework (PRs #597 and #598).

KV cache benchmarking

Session-sticky LLM routing (PR #610 / issue #609) pins same-session requests to the same GPU for KV cache reuse. Measuring the impact requires multi-turn conversations where turn 2+ should be measurably faster than turn 1. Single-turn evals can't observe this.
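
A per-turn timing helper along these lines could feed such a benchmark. It is a sketch, not part of the existing eval suite: `time_turn_ms` and the `NETCLAW` variable are hypothetical, the command syntax is the one proposed above, and the nanosecond timestamp depends on GNU `date`.

```shell
# Command to invoke; override to point at a different build or a test double.
NETCLAW="${NETCLAW:-netclaw}"

# Prints the wall-clock duration of one headless turn in milliseconds.
# Relies on GNU date for nanosecond timestamps (%N); BSD/macOS date lacks it.
time_turn_ms() {
  local session="$1" prompt="$2" start end
  start=$(date +%s%N)
  "$NETCLAW" chat -p --resume "$session" "$prompt" >/dev/null
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}
```

Comparing `time_turn_ms "bench/kv" "hello"` with a second call on the same session gives a rough turn-1 versus turn-2 figure. A single sample is noisy, so a real benchmark would repeat the measurement across several sessions.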

Scripted test scenarios

QA workflows, regression tests, and demo scripts all benefit from scripted multi-turn conversations without needing the interactive TUI.

Docker Smoke Test

The Smoke Sandbox CI check (scripts/docker/smoke-test.sh or equivalent) should gain a basic multi-turn validation:

# Turn 1: create named session
netclaw chat -p --resume "smoke/multi-turn" "hello"

# Turn 2: resume and verify continuity
RESPONSE=$(netclaw chat -p --resume "smoke/multi-turn" "what was my first message?")

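# Assert the agent references "hello" in some form; fail the check otherwise
echo "$RESPONSE" | grep -qi "hello" || { echo "FAIL: turn 2 did not reference the first message" >&2; exit 1; }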

This validates that --resume creates a session on first use, resumes it on the next call, and that conversation state survives between calls through the daemon's persistence layer.

Acceptance Criteria

  • netclaw chat -p "prompt" works identically to current netclaw -p "prompt"
  • netclaw chat -p --resume <id> "prompt" sends a headless prompt to an existing or new session with the given ID
  • netclaw -p remains as a backward-compat alias
  • --resume with a non-existent ID creates the session with that ID (ensure semantics)
  • --json output includes sessionId field
  • Docker smoke test validates a 2-turn conversation via chat -p --resume
  • Existing -p tests continue to pass

Out of scope

Labels

  • cleanup: Code quality improvements and tech debt reduction
  • sessions: LLM session actor, turn lifecycle, pipelines
  • tui: Terminal UI (Termina) issues
