PRD: AgentsKit v3 — Full Ecosystem Vision

## Problem Statement

AgentsKit today is a **chat streaming layer** — it handles sending messages to LLMs and rendering responses in React, Ink, and CLI. But building real AI agents requires much more: autonomous execution loops, tool ecosystems, persistent memory, RAG, sandboxed code execution, observability, and evaluation. Developers currently have to stitch together dozens of unrelated libraries to build production agents, with no unified contracts or interoperability guarantees.

The ecosystem needs to evolve from "chat UI kit" to "the complete foundation for the agent era in JavaScript" — while keeping its core principle of being lightweight, plug-and-play, and easy to integrate.

## Solution

Expand AgentsKit from 6 packages to 14, organized in layers:

- **Foundation**: `core` (contracts + primitives), `adapters` (LLM providers)
- **UI**: `react` (browser), `ink` (terminal), `cli` (command-line)
- **Agent Runtime**: `runtime` (autonomous execution), `tools` (executable actions), `skills` (behavioral prompts)
- **Data**: `memory` (persistence + vector), `rag` (retrieval-augmented generation)
- **Infrastructure**: `sandbox` (secure code execution), `observability` (logging/tracing), `eval` (benchmarking)
- **Compatibility**: `compat-react-core` (legacy bridge)

Three rules govern the ecosystem:
1. `@agentskit/core` stays extremely lightweight — zero external dependencies, under 10 KB gzipped
2. Every package is plug-and-play — simple imports, minimal configuration, clear contracts
3. Any combination of packages works together seamlessly

## User Stories

1. As a frontend developer, I want to add AI chat to my React app with `useChat` and pre-built components, so that I can ship a streaming chat UI in under 10 lines of code
2. As a backend developer, I want to run autonomous agents without any UI, so that I can automate tasks like research, code generation, and data analysis
3. As an agent builder, I want a ReAct loop with tool execution, so that my agent can observe, think, and act iteratively until a task is complete
4. As a developer, I want to swap LLM providers in one line (OpenAI → Anthropic → Gemini), so that I'm not locked into any single vendor
5. As a developer, I want to use pre-built tools (web search, filesystem, browser, code execution), so that I don't have to implement common integrations from scratch
6. As a developer, I want to create custom tools with a simple contract (name, description, schema, execute), so that I can extend agent capabilities for my domain
7. As a developer, I want tools to support JSON Schema for their parameters, so that LLMs can reliably generate valid function calls
8. As a developer, I want tools with optional `init()`/`dispose()` lifecycle methods, so that stateful tools (browser sessions, DB connections) are properly managed
9. As a developer, I want tools that can stream incremental output via `AsyncIterable`, so that long-running tools (web scraping, code execution) show progress
10. As a developer, I want to use pre-built skills (researcher, coder, planner, critic), so that I can give agents specialized behavior without writing complex prompts
11. As a developer, I want skills to reference other skills via `delegates`, so that a planner skill can delegate subtasks to researcher and coder skills
12. As a developer, I want skills with `onActivate` hooks, so that activating a skill can auto-register relevant tools
13. As a developer, I want persistent conversation memory across sessions, so that my chat app remembers previous conversations
14. As a developer, I want vector memory with semantic search, so that my agent can retrieve relevant context from large knowledge bases
15. As a developer, I want multiple memory backends (in-memory, localStorage, file, SQLite, Redis, LanceDB), so that I can choose the right storage for my use case
16. As a developer, I want plug-and-play RAG with document chunking, embedding, and retrieval, so that I can add knowledge retrieval in a few lines
17. As a developer, I want to bring my own embedding function to RAG, so that I'm not locked into any specific embedding provider
18. As a developer, I want adapter-provided embedders (openaiEmbedder, geminiEmbedder) for convenience, so that setup is a one-liner
19. As a developer, I want secure sandboxed code execution for agent-generated code, so that untrusted code can't harm my system
20. As a developer, I want sandbox with configurable security (network, timeout, memory limits), so that I can tune restrictions per use case
21. As a developer, I want the sandbox exposed as a standalone primitive, so that my custom tools can also execute code securely
22. As a developer, I want observability that captures LLM calls, tool executions, memory operations, and agent steps, so that I can debug and monitor my agents
23. As a developer, I want to use my own custom logger/observer, so that I can integrate with whatever logging system my team uses (Datadog, Sentry, etc.)
24. As a developer, I want built-in observers for LangSmith, OpenTelemetry, and console, so that common integrations work out of the box
25. As a developer, I want observability to be non-blocking and optional, so that it never slows down my agent and I only install it when I need it
26. As a developer, I want to evaluate my agents with metrics (accuracy, latency, cost, token usage), so that I can measure and improve performance
27. As a developer, I want multi-agent orchestration with directed delegation, so that a parent agent can assign subtasks to specialized child agents
28. As a developer, I want a shared context for collaborating agents, so that agents in a multi-agent setup can share information
29. As a CLI user, I want `agentskit chat` to launch an interactive terminal chat, so that I can quickly test different providers and models
30. As a CLI user, I want `agentskit init` to scaffold a new project with React or Ink templates, so that I can start building immediately
31. As a CLI user, I want `agentskit run` to execute runtime agents from the terminal, so that I can run automation tasks without writing an app
32. As a terminal developer, I want Ink components identical to the React ones, so that the developer experience is consistent across platforms
33. As an existing user, I want backward compatibility with `@agentskit-react/core` imports, so that my v2 code keeps working
34. As a library author, I want to build custom adapters with a clear contract, so that I can add support for new LLM providers
35. As a library author, I want to build custom memory backends with a clear contract, so that I can integrate with my existing data infrastructure
36. As a developer, I want all packages independently installable, so that I only pull in what I need

## Implementation Decisions

### Architecture: Separation of Chat and Agent Engines (Option B)

The system has two execution engines that share primitives but serve different purposes:

- **ChatController** (in core) — UI-oriented chat state machine. Manages messages, streaming, input state. Designed for request-response cycles driven by user interaction.
- **AgentRunner** (in runtime) — Autonomous execution engine. Manages ReAct loops, reflection, planning, multi-agent delegation. Designed for headless execution without UI state overhead.

Shared primitives extracted into core: `executeToolCall`, `consumeStream`, `buildMessage`. Both engines use these, eliminating duplication while keeping their execution models independent.
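To make the AgentRunner side concrete, here is a minimal sketch of a ReAct-style loop of the kind it is meant to manage. None of the names below (`runReActLoop`, `Model`, `ToolMap`, `ModelTurn`) are confirmed AgentsKit APIs; they are hypothetical stand-ins, and the model is an injected function so the loop can run without a provider.

```typescript
// Hypothetical types: none of these names are confirmed AgentsKit APIs.
type ToolCall = { tool: string; args: Record<string, unknown> };
type ModelTurn =
  | { kind: "final"; text: string }
  | { kind: "tool"; call: ToolCall };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type Model = (history: readonly Message[]) => Promise<ModelTurn>;
type ToolMap = Record<string, (args: Record<string, unknown>) => unknown>;

async function runReActLoop(
  model: Model,
  tools: ToolMap,
  task: string,
  maxSteps = 5,
): Promise<string> {
  const history: Message[] = [{ role: "user", content: task }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(history); // think
    if (turn.kind === "final") return turn.text; // termination condition
    const tool = tools[turn.call.tool];
    const result = tool
      ? await tool(turn.call.args) // act
      : `unknown tool: ${turn.call.tool}`;
    // Re-inject the call and its result so the next turn can observe them.
    history.push({ role: "assistant", content: JSON.stringify(turn.call) });
    history.push({ role: "tool", content: JSON.stringify(result) });
  }
  // Second termination condition: give up after a bounded number of steps.
  throw new Error(`no final answer after ${maxSteps} steps`);
}
```

The step cap is one possible termination policy; a real runner would likely also support aborting and budget limits.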

### Core Contracts (types only, zero runtime cost)

Core defines all contracts as TypeScript interfaces. New types added to core:

- **ToolDefinition** (evolved): `schema` becomes `JSONSchema7` type (enforced), adds optional `init()`/`dispose()` lifecycle, `execute` can return `MaybePromise<unknown> | AsyncIterable<unknown>`, adds optional `tags`/`category` for discovery
- **SkillDefinition** (new): `name`, `description`, `systemPrompt`, optional `examples`, `tools` (tool name hints), `delegates` (other skill names), `temperature`, `onActivate` hook
- **VectorMemory** (new): `store(docs)`, `search(query, options)`, `delete(ids)` — separate from ChatMemory. ChatMemory stays as conversation persistence, VectorMemory handles embeddings/semantic search
- **AgentEvent** (new): Union type for lifecycle events (`llm:start`, `llm:end`, `tool:start`, `tool:end`, `memory:load`, `agent:step`, `error`, etc.)
- **Observer** (new): `{ name: string, on: (event: AgentEvent) => void | Promise<void> }` — fully extensible, anyone can implement custom observers
- **EvalSuite / EvalResult** (new): Minimal contract for evaluation
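The `ToolDefinition` fields listed above might translate into TypeScript roughly as follows. This is a sketch, not the shipped contract: the generic `Args` parameter, the local `JSONSchema7` stand-in (the real type ships in `@types/json-schema`), and the `addTool` example are all assumptions made for illustration.

```typescript
// Local stand-in so the sketch is self-contained; the real JSONSchema7 type
// comes from @types/json-schema.
type JSONSchema7 = { [key: string]: unknown };
type MaybePromise<T> = T | Promise<T>;

// Hypothetical shape inferred from the field list above.
interface ToolDefinition<Args = Record<string, unknown>> {
  name: string;
  description: string;
  schema: JSONSchema7; // JSON Schema describing Args
  execute: (args: Args) => MaybePromise<unknown> | AsyncIterable<unknown>;
  init?: () => MaybePromise<void>; // optional lifecycle
  dispose?: () => MaybePromise<void>;
  tags?: string[]; // optional discovery metadata
  category?: string;
}

// Example tool (hypothetical): stateless, so no init/dispose needed.
const addTool: ToolDefinition<{ a: number; b: number }> = {
  name: "add",
  description: "Adds two numbers.",
  schema: {
    type: "object",
    properties: { a: { type: "number" }, b: { type: "number" } },
    required: ["a", "b"],
  },
  tags: ["math"],
  execute: ({ a, b }) => a + b,
};
```

Because the contract is types only, a definition like `addTool` is a plain object and adds nothing to the core bundle.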

### Adapter Embedding Extension

`@agentskit/adapters` exports standalone embedder functions (`openaiEmbedder`, `geminiEmbedder`, etc.) that satisfy the `embed: (text: string) => MaybePromise<number[]>` contract used by RAG. These are separate from the chat adapters — no coupling between chat and embedding.

### Tool System

- Tools follow a strict contract with enforced JSON Schema for parameters
- Stateful tools use `init()`/`dispose()` lifecycle (e.g., browser tool opens/closes Puppeteer)
- Streaming tools return `AsyncIterable` for incremental output
- Auto-discovery via `tags`/`category` on the tool definition itself (no separate registry/manifest)
- Tools support parallel execution and human confirmation (`requiresConfirmation`)
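The lifecycle and streaming rules above can be sketched as a small runner that calls `init()`, consumes either a plain result or an `AsyncIterable`, and always calls `dispose()`. The names `runTool`, `StreamingTool`, and `counter` are hypothetical, and this is not the actual `executeToolCall` primitive.

```typescript
type MaybePromise<T> = T | Promise<T>;

// Hypothetical, trimmed-down tool shape for this sketch.
interface StreamingTool<Args> {
  name: string;
  init?: () => MaybePromise<void>;
  dispose?: () => MaybePromise<void>;
  execute: (args: Args) => MaybePromise<unknown> | AsyncIterable<unknown>;
}

function isAsyncIterable(value: unknown): value is AsyncIterable<unknown> {
  return typeof value === "object" && value !== null && Symbol.asyncIterator in value;
}

// Runs one tool call and returns every chunk it produced.
// A non-streaming tool yields a single-element array.
async function runTool<Args>(
  tool: StreamingTool<Args>,
  args: Args,
  onChunk?: (chunk: unknown) => void,
): Promise<unknown[]> {
  await tool.init?.();
  try {
    const result = tool.execute(args);
    if (isAsyncIterable(result)) {
      const chunks: unknown[] = [];
      for await (const chunk of result) {
        onChunk?.(chunk); // surface incremental progress
        chunks.push(chunk);
      }
      return chunks;
    }
    return [await result];
  } finally {
    await tool.dispose?.(); // runs even if execute throws
  }
}

// Example stateful, streaming tool that records its lifecycle calls.
const lifecycleLog: string[] = [];
const counter: StreamingTool<{ count: number }> = {
  name: "counter",
  init: () => { lifecycleLog.push("init"); },
  dispose: () => { lifecycleLog.push("dispose"); },
  async *execute({ count }) {
    for (let i = 1; i <= count; i++) yield i;
  },
};
```

Wrapping the call in `try`/`finally` is the design choice that matters here: a browser session or database connection is released even when the tool fails partway through.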

### Skill Composition

- Multiple skills compose by concatenating `systemPrompt` values with clear delimiters
- `delegates` field enables multi-agent patterns: a planner skill references researcher and coder skills
- `onActivate` hook runs when a skill is activated, enabling auto-registration of related tools
- Skills are pure data + optional hooks — no runtime logic, no state
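Prompt composition as described above could look like the following sketch. The delimiter format and the `composeSystemPrompt` helper are assumptions, since the PRD only says that prompts are concatenated "with clear delimiters".

```typescript
// Hypothetical subset of SkillDefinition, limited to what composition needs.
interface SkillDefinition {
  name: string;
  description: string;
  systemPrompt: string;
  tools?: string[];
  delegates?: string[];
  temperature?: number;
}

// Concatenates system prompts, labelling each section with its skill name.
function composeSystemPrompt(skills: readonly SkillDefinition[]): string {
  return skills
    .map((skill) => `### Skill: ${skill.name}\n${skill.systemPrompt.trim()}`)
    .join("\n\n---\n\n");
}

const planner: SkillDefinition = {
  name: "planner",
  description: "Breaks a task into steps.",
  systemPrompt: "Split the task into numbered steps.",
  delegates: ["researcher", "coder"],
};

const critic: SkillDefinition = {
  name: "critic",
  description: "Reviews output for flaws.",
  systemPrompt: "List concrete problems before suggesting fixes.",
};

const prompt = composeSystemPrompt([planner, critic]);
```

Since skills are plain data, composition is a pure function of its inputs and is easy to cover with the prompt composition tests listed under Testing Decisions.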

### Multi-Agent Orchestration

- Directed delegation as the default: parent agent explicitly assigns subtasks to child agents
- Optional shared context (`createSharedContext()`) for collaborative patterns (debate, consensus)
- The parent reads and writes the shared context; children can read it but write only to their own result
- Tree-shaped execution: parent waits for child results before proceeding
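One way to get the read/write isolation described above is to hand children a read-only view of the same store. Only the name `createSharedContext()` comes from this PRD; the returned shape (`parent`, `forChild()`) and the `runChildren` helper are hypothetical.

```typescript
interface ReadonlyContext {
  get(key: string): unknown;
  keys(): string[];
}

interface WritableContext extends ReadonlyContext {
  set(key: string, value: unknown): void;
}

// Hypothetical return shape: the parent gets a writable handle,
// children get a view with no `set` method at all.
function createSharedContext(): {
  parent: WritableContext;
  forChild(): ReadonlyContext;
} {
  const store = new Map<string, unknown>();
  const readView: ReadonlyContext = {
    get: (key) => store.get(key),
    keys: () => Array.from(store.keys()),
  };
  return {
    parent: { ...readView, set: (key, value) => { store.set(key, value); } },
    forChild: () => readView,
  };
}

type ChildAgent = (context: ReadonlyContext) => Promise<string>;

// Tree-shaped execution: the parent waits for every child result.
async function runChildren(
  children: readonly ChildAgent[],
  context: ReadonlyContext,
): Promise<string[]> {
  return Promise.all(children.map((child) => child(context)));
}
```

Children communicate upward only through their return value, which keeps the execution tree easy to reason about.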

### Sandbox Architecture

- Exposed as a standalone primitive (`sandbox.execute(code, options)`) — not just a tool wrapper
- Primary backend: E2B; fallback: WebContainer
- `@agentskit/tools` ships a `codeExecution` tool that uses the sandbox internally
- Security defaults: no network, 30 s timeout, 50 MB memory limit (all overridable)
- Supports JavaScript and Python
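The "secure by default, overridable per call" behavior could be modeled as a defaults object merged with caller overrides. The option names (`network`, `timeoutMs`, `memoryMb`, `language`) and the default language are assumptions; only the default values for network, timeout, and memory come from the list above.

```typescript
// Hypothetical option names; the PRD fixes only the default values.
interface SandboxOptions {
  language: "javascript" | "python";
  network: boolean;
  timeoutMs: number;
  memoryMb: number;
}

const SANDBOX_DEFAULTS: SandboxOptions = {
  language: "javascript", // assumption: the PRD does not name a default language
  network: false, // no network
  timeoutMs: 30_000, // 30 s timeout
  memoryMb: 50, // 50 MB memory
};

// Caller overrides win; anything left unspecified keeps the secure default.
function resolveSandboxOptions(overrides: Partial<SandboxOptions> = {}): SandboxOptions {
  return { ...SANDBOX_DEFAULTS, ...overrides };
}

const relaxed = resolveSandboxOptions({ language: "python", timeoutMs: 5_000 });
```

A call such as `sandbox.execute(code, options)` would then pass the resolved options on to whichever backend is active.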

### Observability: Event-Based, Fully Extensible

- Core emits `AgentEvent` from both ChatController and AgentRunner
- Observer interface is trivially simple: `{ name, on(event) }`
- Multiple observers run in parallel, asynchronously, non-blocking
- `@agentskit/observability` provides convenience implementations (LangSmith, OpenTelemetry, console)
- Anyone can implement custom observers for any logging system — the package is optional
- Zero coupling: if observability isn't installed, events are emitted but ignored
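A sketch of the dispatch behavior described above: observers run in parallel, off the caller's synchronous path, and a failing observer does not affect the others. The `emit` helper and the event payload fields are hypothetical; only the event names and the `{ name, on }` shape come from this PRD.

```typescript
// Subset of the event union, with assumed payload fields.
type AgentEvent =
  | { type: "llm:start"; model: string }
  | { type: "tool:end"; tool: string; durationMs: number }
  | { type: "error"; error: unknown };

interface Observer {
  name: string;
  on: (event: AgentEvent) => void | Promise<void>;
}

// Fans an event out to all observers. The caller is not expected to await
// the returned promise; it exists so tests can wait for delivery.
function emit(observers: readonly Observer[], event: AgentEvent): Promise<void> {
  const deliveries = observers.map((observer) =>
    Promise.resolve()
      .then(() => observer.on(event)) // deferred to a microtask, never inline
      .catch((err: unknown) => {
        // Isolate failures: one broken observer must not break the rest.
        console.warn(`observer "${observer.name}" failed`, err);
      }),
  );
  return Promise.all(deliveries).then(() => undefined);
}

// Custom observer example: any object with `name` and `on` qualifies.
const seenTypes: string[] = [];
const recorder: Observer = {
  name: "recorder",
  on: (event) => { seenTypes.push(event.type); },
};
```

With an empty observer list, `emit` does no work beyond allocating a resolved promise, which matches the "emitted but ignored" behavior when no observers are registered.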

### RAG Pipeline

- `embed` function provided by the user (no default provider)
- `@agentskit/adapters` exports convenience embedders for common providers
- RAG handles chunking, embedding, retrieval, and context injection
- Default integration with `@agentskit/memory`'s VectorMemory for storage
- Simple API: `rag.retrieve(query)` and `useRAGChat()` hook for React
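The pipeline stages above (chunking, embedding, retrieval) can be sketched end to end with a user-supplied `embed` function. Everything except the `embed` contract and the `retrieve(query)` name is an assumption: `createRag`, `ingest`, `chunkText`, the fixed-size character chunking, and the in-memory cosine-similarity index stand in for whatever `@agentskit/rag` and VectorMemory actually do.

```typescript
type MaybePromise<T> = T | Promise<T>;
type Embedder = (text: string) => MaybePromise<number[]>;

// Fixed-size character chunks with optional overlap between neighbours.
function chunkText(text: string, size: number, overlap = 0): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Hypothetical factory: the embedder is injected, so no provider is assumed.
function createRag(embed: Embedder) {
  const index: { text: string; vector: number[] }[] = [];
  return {
    async ingest(doc: string, size = 200, overlap = 20): Promise<void> {
      for (const text of chunkText(doc, size, overlap)) {
        index.push({ text, vector: await embed(text) });
      }
    },
    async retrieve(query: string, topK = 3): Promise<string[]> {
      const queryVector = await embed(query);
      return index
        .map((entry) => ({ text: entry.text, score: cosineSimilarity(queryVector, entry.vector) }))
        .sort((x, y) => y.score - x.score)
        .slice(0, topK)
        .map((entry) => entry.text);
    },
  };
}
```

Injecting `embed` is what keeps the sketch provider-agnostic: an adapter-provided embedder or a custom function can be passed in without touching the pipeline.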

### Eval (Minimal for v3)

- Contract defined in core (`EvalSuite`, `EvalResult` types)
- One basic runner in `@agentskit/eval`: accuracy + latency metrics
- Designed for CI/CD: `eval.run(agent, testCases)` returns structured results
- Advanced metrics, test suites, and dashboard deferred to v3.1+
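A minimal runner with the two v3 metrics might look like the sketch below. The field names on `TestCase` and `EvalResult`, the `Agent` function type, and the exact-match definition of accuracy are assumptions; the PRD fixes only that `eval.run(agent, testCases)` returns structured results covering accuracy and latency.

```typescript
// Hypothetical shapes; the PRD does not spell out these fields.
interface TestCase {
  input: string;
  expected: string;
}

interface EvalResult {
  total: number;
  passed: number;
  accuracy: number; // passed / total, in [0, 1]
  meanLatencyMs: number;
}

type Agent = (input: string) => string | Promise<string>;

// Runs cases sequentially so latency measurements do not overlap.
async function runEval(agent: Agent, testCases: readonly TestCase[]): Promise<EvalResult> {
  let passed = 0;
  let totalMs = 0;
  for (const testCase of testCases) {
    const startedAt = Date.now();
    const output = await agent(testCase.input);
    totalMs += Date.now() - startedAt;
    // Assumption: accuracy means exact match after trimming whitespace.
    if (output.trim() === testCase.expected.trim()) passed++;
  }
  const total = testCases.length;
  return {
    total,
    passed,
    accuracy: total === 0 ? 0 : passed / total,
    meanLatencyMs: total === 0 ? 0 : totalMs / total,
  };
}
```

Returning a plain object makes it straightforward for a CI job to fail a build when `accuracy` drops below a chosen threshold.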

## Testing Decisions

Good tests verify external behavior through public interfaces, not implementation details. If a test would break from an internal refactor that changes no behavior, it is testing the wrong thing.

### Critical (must ship with v3)

- **`core`**: Unit tests for ChatController state machine (message lifecycle, streaming, abort, retry), memory implementations (load/save/clear round-trips), shared primitives (tool execution, stream consumption, message building). These are the foundation — if they break, everything breaks.
- **`runtime`**: Unit tests for AgentRunner ReAct loop (observe→think→act cycle, tool result re-injection, termination conditions), delegation (parent→child task assignment, result collection), shared context (read/write isolation between parent and children).

### High Priority

- **`tools`**: Contract compliance tests (every tool satisfies ToolDefinition shape), unit tests per tool with mocked externals (browser tool doesn't launch real Puppeteer), lifecycle tests (init/dispose called correctly for stateful tools).
- **`memory`**: Integration tests per backend — SQLite, Redis, and LanceDB with test containers or in-process equivalents. Test ChatMemory and VectorMemory contracts against each backend.

### Medium Priority

- **`adapters`**: Contract tests (every adapter returns valid StreamSource shape, streams valid StreamChunks). Integration tests optional (require API keys, better suited for CI with secrets).
- **`rag`**: Unit tests for chunking logic and retrieval scoring. Integration test combining embedder + VectorMemory + retrieval pipeline.
- **`sandbox`**: Integration tests with E2B (requires API key) and WebContainer. Security boundary tests (verify network blocked, timeout enforced, memory limited).
- **`ink`**: Expand from smoke test to component render tests (message rendering, input handling, tool call display).

### Low Priority

- **`skills`**: Contract compliance (shape validation), prompt composition tests (multiple skills concatenate correctly).
- **`observability`**: Unit tests for event routing (events reach all registered observers, errors in one observer don't break others).
- **`eval`**: Contract tests only — verify EvalResult shape from basic runner.

### Maintain Current

- **`react`**: Already has thorough coverage (10+ tests). Maintain and extend as new features land.
- **`cli`**: Already decent. Add tests for new `agentskit run` command.

### Prior Art

Existing test patterns in the repo: `packages/react/tests/` for component and hook testing with mock adapters, `packages/cli/tests/init.test.ts` for CLI command testing, `packages/adapters/tests/` for contract shape tests.

## Out of Scope

- **Authentication/authorization middleware** — AgentsKit is a client-side/runtime toolkit, not a server framework
- **Server-side API routes or hosting** — use Vercel AI SDK, Express, or Next.js for that layer
- **Visual/GUI agent builder** — AgentsKit is code-first
- **Billing or usage metering** — belongs in the application layer
- **Training or fine-tuning** — use provider-specific tools
- **Advanced eval dashboard and CI integration** — deferred to v3.1+
- **Agent marketplace or registry service** — tools and skills are local packages, not a hosted registry
- **Mobile (React Native)** — React package targets browser DOM only

## Further Notes

### Phasing Recommendation

While this PRD covers the full v3 vision, implementation should be phased by dependency order:

1. **Phase 1 — Foundation**: Evolve `core` contracts (ToolDefinition, SkillDefinition, VectorMemory, AgentEvent, Observer types), extract shared primitives, add core tests
2. **Phase 2 — Runtime + Tools**: Build `runtime` (AgentRunner, ReAct loop, delegation), `tools` (contract + first batch of tools), `skills` (contract + starter skills)
3. **Phase 3 — Data Layer**: Build `memory` (SQLite, Redis, LanceDB backends), `rag` (chunking, embedding, retrieval pipeline)
4. **Phase 4 — Infrastructure**: Build `sandbox` (E2B + WebContainer), `observability` (event routing + built-in observers), `eval` (minimal runner)
5. **Phase 5 — Polish**: Update `cli` with `agentskit run`, update docs, update examples, ensure all packages interop cleanly

### Naming

All packages use the `@agentskit` scope. The GitHub repo is `EmersonBraun/agentskit`. The ecosystem is referred to as "AgentsKit" in prose.

### Versioning

All packages currently ship at `0.2.0`. The v3 work should bump them to `0.3.0` per phase (or to `1.0.0` once the full vision is stable). Changesets manages versioning across the monorepo.
