[Feat] Meta operations MCP server for agent-driven CAO session management #161

@patricka3125

Problem Statement

All CAO management operations — discovering agent profiles, installing them, launching sessions, monitoring state, and cleanup — require either the cao CLI or direct HTTP API calls. Users working inside an AI agent session must leave the agent interface to perform these operations, breaking workflow continuity and preventing agents from autonomously composing multi-step CAO workflows.

The existing cao-mcp-server does not address this — its tools (handoff, assign, send_message) are scoped to inter-agent orchestration within active CAO sessions, not to managing CAO itself.

Motivation

Enable end-to-end agent-driven CAO workflows without leaving the agent interface:

  • Profile discovery and recommendation — an agent discovers available profiles and recommends one based on the user's task, without the user running CLI commands in another terminal
  • AI-assisted profile creation and evaluation — an agent creates a profile, launches a session with a test prompt, evaluates the result, and iterates
  • Session lifecycle management — an agent lists active sessions, identifies stale ones, and offers cleanup
  • Environment configuration — an agent sets required environment variables before installing and launching profiles

Example Use Cases

Automatic agent profile discovery and session launch. A user asks their agent to "set up an agent to review my PR." The agent calls discover_profiles, which returns every installed profile with its name, description, role, and provider metadata. The agent examines the list and identifies the code_supervisor profile as the appropriate entry point, since all CAO sessions are launched through a supervisor that coordinates worker agents. It then either recommends that profile to the user for confirmation or immediately calls launch_session with the supervisor profile and a prompt derived from the user's original request (e.g., "review PR #42 on the main branch"). All of this happens within the same conversation, with no terminal switching. The supervisor then spawns and coordinates the appropriate worker agents (such as a reviewer) within the CAO session.
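To make the selection step concrete, here is a minimal Python sketch of how an agent-side helper might pick the entry-point profile and build the follow-up tool call. The Profile fields mirror the metadata named above; the role value "supervisor", the argument names profile and prompt, and the sample data are assumptions for illustration, not a settled interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Profile:
    # Fields mirror the metadata discover_profiles is described as returning.
    name: str
    description: str
    role: str
    provider: str


def pick_entry_profile(profiles: List[Profile]) -> Optional[Profile]:
    """Return the first supervisor profile, or None if there is none."""
    for profile in profiles:
        if profile.role == "supervisor":  # assumed role value
            return profile
    return None


def build_launch_call(profile: Profile, user_request: str) -> Dict[str, object]:
    """Build the arguments for a hypothetical launch_session tool call."""
    return {
        "tool": "launch_session",
        "arguments": {"profile": profile.name, "prompt": user_request},
    }


# Stand-in for a discover_profiles response (made-up data).
discovered = [
    Profile("reviewer", "Reviews code changes", "worker", "example_provider"),
    Profile("code_supervisor", "Coordinates worker agents", "supervisor", "example_provider"),
]

entry = pick_entry_profile(discovered)
if entry is not None:
    call = build_launch_call(entry, "review PR #42 on the main branch")
    print(call)
```

In practice the agent would make this choice itself from the list in its context; the helper only shows what information the tool response needs to carry for that choice to be possible.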

AI-assisted profile creation and evaluation. A user asks for a "security auditor agent." The agent's tool call chain would look like:

  1. discover_profiles — check if a suitable profile already exists. None found.
  2. The agent authors a new profile with an appropriate name, description, role, system prompt, and tool restrictions based on its understanding of what a security auditor needs.
  3. install_profile — install the newly created profile so it's available for use.
  4. launch_session with the new profile and a test prompt (e.g., "audit the authentication module in src/auth/ for common vulnerabilities") — a session is created, the provider initializes, and the test task is sent automatically.
  5. get_session_info — the agent checks back on the session's terminal status. Once completed, it can review the results.
  6. If the output is unsatisfactory, the agent iterates: updates the profile's system prompt or tool restrictions via create_profile, re-installs, and launches another test session — a fully autonomous create-install-launch-evaluate loop.

The user's involvement is limited to the initial request and approving the final profile. The agent handles the entire iteration cycle.
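The create-install-launch-evaluate loop above could be sketched roughly as follows. This is an illustration under assumptions: the ops object stands in for the proposed tools, the status value "completed" and the output field on get_session_info are hypothetical, and FakeOps is an in-memory stand-in so the loop runs without a CAO server. The revise callback plays the part of updating the profile before re-installing it.

```python
import time
from typing import Callable, Dict, Optional

Profile = Dict[str, str]


def refine_profile(
    ops,
    profile: Profile,
    test_prompt: str,
    is_acceptable: Callable[[str], bool],
    revise: Callable[[Profile, str], Profile],
    max_rounds: int = 3,
    poll_seconds: float = 1.0,
    max_polls: int = 60,
) -> Optional[Profile]:
    """Install, launch, poll, evaluate; revise and retry up to max_rounds."""
    for _ in range(max_rounds):
        ops.install_profile(profile)
        session_id = ops.launch_session(profile["name"], test_prompt)
        info = ops.get_session_info(session_id)
        polls = 0
        # Poll until the test session reports completion or we give up.
        while info["status"] != "completed" and polls < max_polls:
            time.sleep(poll_seconds)
            info = ops.get_session_info(session_id)
            polls += 1
        if info["status"] == "completed" and is_acceptable(info["output"]):
            return profile
        profile = revise(profile, info.get("output", ""))
    return None  # no acceptable profile within the round budget


class FakeOps:
    """In-memory stand-in for the proposed tools (not a real CAO client)."""

    def __init__(self) -> None:
        self.installed: Dict[str, Profile] = {}
        self.launches = 0

    def install_profile(self, profile: Profile) -> None:
        self.installed[profile["name"]] = dict(profile)

    def launch_session(self, profile_name: str, prompt: str) -> str:
        self.launches += 1
        return f"{profile_name}-{self.launches}"

    def get_session_info(self, session_id: str) -> Dict[str, str]:
        name = session_id.rsplit("-", 1)[0]
        prompt = self.installed[name]["system_prompt"]
        output = "findings: 2" if "OWASP" in prompt else "no findings"
        return {"status": "completed", "output": output}


ops = FakeOps()
draft = {"name": "security_auditor", "system_prompt": "Audit code."}
final = refine_profile(
    ops,
    draft,
    "audit the authentication module in src/auth/ for common vulnerabilities",
    is_acceptable=lambda out: out.startswith("findings"),
    revise=lambda p, out: {**p, "system_prompt": p["system_prompt"] + " Check the OWASP Top 10."},
    poll_seconds=0,
)
print(final)
```

Bounding both the number of rounds and the number of polls keeps an autonomous loop from running indefinitely when a test session never completes or no revision satisfies the evaluation.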

Session management and cleanup. A user starts their day and asks "what CAO sessions are running?" The agent calls list_sessions, which returns all active sessions with their terminal counts, statuses, providers, and last activity timestamps. The agent presents a summary — e.g., "You have 3 sessions: one active from today, two idle since yesterday." The user says "clean up the old ones." The agent calls get_session_info on each to confirm they're safe to remove (no in-progress terminals), then calls shutdown_session on the stale sessions and confirms the cleanup. What would normally require tmux ls, cross-referencing terminal IDs, and manual cao shutdown commands becomes a brief conversational exchange.
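The safety check in this flow (only shut down sessions that are both idle and free of in-progress terminals) can be sketched as below. The field names id, last_activity, terminals, and status, the "in_progress" value, and the 12-hour idle threshold are assumptions for illustration; the actual response shapes of list_sessions and get_session_info are not specified here.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List


def select_stale_sessions(
    sessions: List[Dict],
    get_session_info: Callable[[str], Dict],
    now: datetime,
    max_idle: timedelta = timedelta(hours=12),
) -> List[str]:
    """Return IDs of sessions that are idle and have no in-progress terminals."""
    stale = []
    for session in sessions:
        if now - session["last_activity"] <= max_idle:
            continue  # recently active, keep it
        info = get_session_info(session["id"])
        if any(t["status"] == "in_progress" for t in info["terminals"]):
            continue  # work is still running, keep it
        stale.append(session["id"])
    return stale


# Made-up data standing in for list_sessions and get_session_info responses.
now = datetime(2025, 1, 2, 9, 0, tzinfo=timezone.utc)
sessions = [
    {"id": "s1", "last_activity": now - timedelta(minutes=5)},
    {"id": "s2", "last_activity": now - timedelta(hours=20)},
    {"id": "s3", "last_activity": now - timedelta(hours=18)},
]
details = {
    "s1": {"terminals": [{"status": "in_progress"}]},
    "s2": {"terminals": [{"status": "completed"}]},
    "s3": {"terminals": [{"status": "in_progress"}]},
}

to_shut_down = select_stale_sessions(sessions, details.__getitem__, now)
print(to_shut_down)
```

Each ID in the result would then be passed to shutdown_session. Note that s3 is old but still has a running terminal, so it is kept.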

Proposed Solution

Introduce a new MCP server (cao-ops-mcp) separate from the existing cao-mcp-server, designed to be added to a user's primary agent's MCP configuration. It would expose tools for:

  • Profile management — discover, inspect, install, and create agent profiles
  • Session lifecycle — launch (with optional initial prompt), list, inspect, and shut down sessions
  • Provider and config — list available providers, manage environment variables

The launch_session tool would accept an optional prompt parameter so that a session can be created and started with a task in a single tool call — no terminal switching required.
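A rough sketch of the optional-prompt behavior, assuming a backend with hypothetical create_session and send_input operations (these names, the session ID format, and the response fields are placeholders, not the CAO API):

```python
import itertools
from typing import Dict, List, Optional


class InMemoryBackend:
    """Stand-in for the CAO API so the example runs on its own."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.inputs: Dict[str, List[str]] = {}

    def create_session(self, profile: str) -> str:
        session_id = f"session-{next(self._counter)}"
        self.inputs[session_id] = []
        return session_id

    def send_input(self, session_id: str, text: str) -> None:
        self.inputs[session_id].append(text)


def launch_session(backend, profile: str, prompt: Optional[str] = None) -> Dict[str, object]:
    """Create a session and, if a prompt is given, send it as the first task."""
    session_id = backend.create_session(profile)
    if prompt is not None:
        backend.send_input(session_id, prompt)
    # Structured response the calling agent can act on without parsing CLI text.
    return {"session_id": session_id, "profile": profile, "prompt_sent": prompt is not None}


backend = InMemoryBackend()
idle = launch_session(backend, "code_supervisor")
busy = launch_session(backend, "code_supervisor", prompt="review PR #42 on the main branch")
print(idle)
print(busy)
```

Keeping the prompt optional lets the same tool cover both cases: starting an empty session for later use, and starting a session that begins work immediately.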

This server is separate in both scope and implementation from cao-mcp-server. The existing server serves agents inside CAO sessions; the new server serves the user's primary agent managing CAO from outside.

Alternatives

  1. CLI usage skill — A skill file that teaches agents how to invoke cao CLI commands via bash. This is a lightweight alternative for users who prefer not to add an MCP server. It covers command reference, common workflows, and output parsing guidance. Requires the agent to have execute_bash permission. No new code in CAO — just a reference document.

  2. Meta operations MCP server — The full-capability path described above. Provides structured, typed tool responses rather than raw CLI output. Does not require bash access. Enables autonomous agent-driven workflows that are impractical through CLI invocation alone.

  3. Web UI — The existing web interface provides a visual control plane for users to manage sessions, profiles, and configuration. The meta MCP server is complementary — the Web UI serves as the user control plane while cao-ops-mcp serves as the agentic control plane. Both interact with the same CAO API backend and can coexist; the choice depends on whether a human or an agent is driving the workflow.

The skill serves as a low-setup on-ramp; the MCP server is the full-capability path; the Web UI is the visual management layer. All three can coexist.

Labels: enhancement (New feature or request)
