Problem Statement
AgentsKit today is a chat streaming layer — it handles sending messages to LLMs and rendering responses in React, Ink, and CLI. But building real AI agents requires much more: autonomous execution loops, tool ecosystems, persistent memory, RAG, sandboxed code execution, observability, and evaluation. Developers currently have to stitch together dozens of unrelated libraries to build production agents, with no unified contracts or interoperability guarantees.
The ecosystem needs to evolve from "chat UI kit" to "the complete foundation for the agent era in JavaScript" — while keeping its core principle of being lightweight, plug-and-play, and easy to integrate.
Solution
Expand AgentsKit from 6 packages to 14, organized in layers:
- Foundation: core (contracts + primitives), adapters (LLM providers)
- UI: react (browser), ink (terminal), cli (command-line)
- Agent Runtime: runtime (autonomous execution), tools (executable actions), skills (behavioral prompts)
- Data: memory (persistence + vector), rag (retrieval-augmented generation)
- Infrastructure: sandbox (secure code execution), observability (logging/tracing), eval (benchmarking)
- Compatibility: compat-react-core (legacy bridge)
Every package follows three vital rules:
- @agentskit/core stays extremely lightweight: zero external dependencies, under 10 KB gzipped
- Every package is plug-and-play — simple imports, minimal configuration, clear contracts
- Any combination of packages works together seamlessly
User Stories
- As a frontend developer, I want to add AI chat to my React app with useChat and pre-built components, so that I can ship a streaming chat UI in under 10 lines of code
- As a backend developer, I want to run autonomous agents without any UI, so that I can automate tasks like research, code generation, and data analysis
- As an agent builder, I want a ReAct loop with tool execution, so that my agent can observe, think, and act iteratively until a task is complete
- As a developer, I want to swap LLM providers in one line (OpenAI → Anthropic → Gemini), so that I'm not locked into any single vendor
- As a developer, I want to use pre-built tools (web search, filesystem, browser, code execution), so that I don't have to implement common integrations from scratch
- As a developer, I want to create custom tools with a simple contract (name, description, schema, execute), so that I can extend agent capabilities for my domain
- As a developer, I want tools to support JSON Schema for their parameters, so that LLMs can reliably generate valid function calls
- As a developer, I want tools with optional init()/dispose() lifecycle methods, so that stateful tools (browser sessions, DB connections) are properly managed
- As a developer, I want tools that can stream incremental output via AsyncIterable, so that long-running tools (web scraping, code execution) show progress
- As a developer, I want to use pre-built skills (researcher, coder, planner, critic), so that I can give agents specialized behavior without writing complex prompts
- As a developer, I want skills to reference other skills via delegates, so that a planner skill can delegate subtasks to researcher and coder skills
- As a developer, I want skills with onActivate hooks, so that activating a skill can auto-register relevant tools
- As a developer, I want persistent conversation memory across sessions, so that my chat app remembers previous conversations
- As a developer, I want vector memory with semantic search, so that my agent can retrieve relevant context from large knowledge bases
- As a developer, I want multiple memory backends (in-memory, localStorage, file, SQLite, Redis, LanceDB), so that I can choose the right storage for my use case
- As a developer, I want plug-and-play RAG with document chunking, embedding, and retrieval, so that I can add knowledge retrieval in a few lines
- As a developer, I want to bring my own embedding function to RAG, so that I'm not locked into any specific embedding provider
- As a developer, I want adapter-provided embedders (openaiEmbedder, geminiEmbedder) for convenience, so that setup is a one-liner
- As a developer, I want secure sandboxed code execution for agent-generated code, so that untrusted code can't harm my system
- As a developer, I want sandbox with configurable security (network, timeout, memory limits), so that I can tune restrictions per use case
- As a developer, I want the sandbox exposed as a standalone primitive, so that my custom tools can also execute code securely
- As a developer, I want observability that captures LLM calls, tool executions, memory operations, and agent steps, so that I can debug and monitor my agents
- As a developer, I want to use my own custom logger/observer, so that I can integrate with whatever logging system my team uses (Datadog, Sentry, etc.)
- As a developer, I want built-in observers for LangSmith, OpenTelemetry, and console, so that common integrations work out of the box
- As a developer, I want observability to be non-blocking and optional, so that it never slows down my agent and I only install it when I need it
- As a developer, I want to evaluate my agents with metrics (accuracy, latency, cost, token usage), so that I can measure and improve performance
- As a developer, I want multi-agent orchestration with directed delegation, so that a parent agent can assign subtasks to specialized child agents
- As a developer, I want a shared context for collaborating agents, so that agents in a multi-agent setup can share information
- As a CLI user, I want agentskit chat to launch an interactive terminal chat, so that I can quickly test different providers and models
- As a CLI user, I want agentskit init to scaffold a new project with React or Ink templates, so that I can start building immediately
- As a CLI user, I want agentskit run to execute runtime agents from the terminal, so that I can run automation tasks without writing an app
- As a terminal developer, I want Ink components identical to the React ones, so that the developer experience is consistent across platforms
- As an existing user, I want backward compatibility with @agentskit-react/core imports, so that my v2 code keeps working
- As a library author, I want to build custom adapters with a clear contract, so that I can add support for new LLM providers
- As a library author, I want to build custom memory backends with a clear contract, so that I can integrate with my existing data infrastructure
- As a developer, I want all packages independently installable, so that I only pull in what I need
Implementation Decisions
Architecture: Separation of Chat and Agent Engines (Option B)
The system has two execution engines that share primitives but serve different purposes:
- ChatController (in core) — UI-oriented chat state machine. Manages messages, streaming, input state. Designed for request-response cycles driven by user interaction.
- AgentRunner (in runtime) — Autonomous execution engine. Manages ReAct loops, reflection, planning, multi-agent delegation. Designed for headless execution without UI state overhead.
Shared primitives extracted into core: executeToolCall, consumeStream, buildMessage. Both engines use these, eliminating duplication while keeping their execution models independent.
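To make the split concrete, here is a minimal sketch of how a headless runner might drive a ReAct-style loop on top of a shared executeToolCall primitive. Only the names executeToolCall and AgentRunner come from this document; every signature, the Message and ModelTurn shapes, the runAgent function, and the maxSteps parameter are assumptions for illustration, not the shipped API.

```typescript
// Illustrative only: every signature in this block is an assumption.
type ToolCall = { name: string; args: Record<string, unknown> };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ModelTurn = { text: string; toolCall?: ToolCall };
type Tool = {
  name: string;
  execute: (args: Record<string, unknown>) => unknown | Promise<unknown>;
};

// Shared primitive: a chat controller and an agent runner could both call this.
// Failures are returned as text so the model can see them and recover.
async function executeToolCall(tools: Tool[], call: ToolCall): Promise<string> {
  const tool = tools.find((t) => t.name === call.name);
  if (!tool) return `Error: unknown tool "${call.name}"`;
  try {
    return JSON.stringify(await tool.execute(call.args));
  } catch (err) {
    return `Error: ${err instanceof Error ? err.message : String(err)}`;
  }
}

// Headless loop: think (model turn), act (tool call), observe (re-inject the result).
// Terminates when the model stops requesting tools or the step limit is reached.
async function runAgent(
  model: (history: Message[]) => Promise<ModelTurn>,
  tools: Tool[],
  task: string,
  maxSteps = 5,
): Promise<{ answer: string; steps: number }> {
  const history: Message[] = [{ role: "user", content: task }];
  for (let step = 1; step <= maxSteps; step++) {
    const turn = await model(history);
    history.push({ role: "assistant", content: turn.text });
    if (!turn.toolCall) return { answer: turn.text, steps: step };
    const result = await executeToolCall(tools, turn.toolCall);
    history.push({ role: "tool", content: result });
  }
  return { answer: "Stopped: step limit reached", steps: maxSteps };
}
```

In this sketch the loop keeps no UI state (input value, streaming flags), which is the overhead the headless engine is meant to avoid.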
Core Contracts (types only, zero runtime cost)
Core defines all contracts as TypeScript interfaces. New types added to core:
- ToolDefinition (evolved): schema becomes a JSONSchema7 type (enforced); adds optional init()/dispose() lifecycle; execute can return MaybePromise<unknown> | AsyncIterable<unknown>; adds optional tags/category for discovery
- SkillDefinition (new): name, description, systemPrompt, optional examples, tools (tool name hints), delegates (other skill names), temperature, onActivate hook
- VectorMemory (new): store(docs), search(query, options), delete(ids). Separate from ChatMemory: ChatMemory stays as conversation persistence, while VectorMemory handles embeddings and semantic search
- AgentEvent (new): Union type for lifecycle events (llm:start, llm:end, tool:start, tool:end, memory:load, agent:step, error, etc.)
- Observer (new): { name: string, on: (event: AgentEvent) => void | Promise<void> }. Fully extensible; anyone can implement custom observers
- EvalSuite / EvalResult (new): Minimal contract for evaluation
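As a rough illustration of the tool and skill contracts listed above, the sketch below writes them out as TypeScript interfaces and defines one value of each. The field names follow the list; the JSONSchemaLike stand-in (used in place of the real JSONSchema7 type to avoid an import), the exact optionality of each field, and the word_count and researcher examples are assumptions.

```typescript
// Illustrative shapes only; the real definitions would live in @agentskit/core.
type MaybePromise<T> = T | Promise<T>;

// Minimal stand-in for JSONSchema7 so the sketch needs no imports (assumption).
interface JSONSchemaLike {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required?: string[];
}

interface ToolDefinition {
  name: string;
  description: string;
  schema: JSONSchemaLike;
  execute: (args: Record<string, unknown>) => MaybePromise<unknown> | AsyncIterable<unknown>;
  init?: () => MaybePromise<void>;
  dispose?: () => MaybePromise<void>;
  tags?: string[];
  category?: string;
}

interface SkillDefinition {
  name: string;
  description: string;
  systemPrompt: string;
  examples?: string[];
  tools?: string[]; // tool name hints
  delegates?: string[]; // names of other skills
  temperature?: number;
  onActivate?: () => MaybePromise<void>;
}

// A custom tool is plain data plus an execute function.
const wordCount: ToolDefinition = {
  name: "word_count",
  description: "Counts whitespace-separated words in a text.",
  schema: {
    type: "object",
    properties: { text: { type: "string", description: "Text to count" } },
    required: ["text"],
  },
  execute: (args) => String(args.text ?? "").trim().split(/\s+/).filter(Boolean).length,
  tags: ["text"],
};

// A skill is data: a prompt plus hints about which tools it expects.
const researcher: SkillDefinition = {
  name: "researcher",
  description: "Gathers and summarizes sources.",
  systemPrompt: "You research topics carefully and cite what you find.",
  tools: ["word_count"],
  temperature: 0.2,
};
```

Because both definitions are plain objects, a contract compliance test can check their shape without running any model.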
Adapter Embedding Extension
@agentskit/adapters exports standalone embedder functions (openaiEmbedder, geminiEmbedder, etc.) that satisfy the embed: (text: string) => MaybePromise<number[]> contract used by RAG. These are separate from the chat adapters — no coupling between chat and embedding.
Tool System
- Tools follow a strict contract with enforced JSON Schema for parameters
- Stateful tools use the init()/dispose() lifecycle (e.g., the browser tool opens/closes Puppeteer)
- Streaming tools return AsyncIterable for incremental output
- Auto-discovery via tags/category on the tool definition itself (no separate registry/manifest)
- Tools support parallel execution and human confirmation (requiresConfirmation)
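The lifecycle and streaming rules above could be honored by a small runner like the one sketched here. The runTool helper, the StreamingTool type, and the crawler example are hypothetical; a real runtime might also call init() once per session rather than once per call.

```typescript
// Sketch under assumptions: runTool and StreamingTool are not part of the documented API.
type MaybePromise<T> = T | Promise<T>;

interface StreamingTool {
  name: string;
  init?: () => MaybePromise<void>;
  dispose?: () => MaybePromise<void>;
  execute: (args: Record<string, unknown>) => MaybePromise<unknown> | AsyncIterable<unknown>;
}

function isAsyncIterable(value: unknown): value is AsyncIterable<unknown> {
  return typeof value === "object" && value !== null && Symbol.asyncIterator in value;
}

// Runs one call: init, execute (draining any stream), then dispose even on failure.
// Always resolves to an array: the streamed chunks, or a single-element array
// holding the plain return value.
async function runTool(
  tool: StreamingTool,
  args: Record<string, unknown>,
  onChunk: (chunk: unknown) => void = () => {},
): Promise<unknown[]> {
  await tool.init?.();
  try {
    const result = tool.execute(args);
    if (!isAsyncIterable(result)) return [await result];
    const chunks: unknown[] = [];
    for await (const chunk of result) {
      onChunk(chunk); // surface progress as it arrives
      chunks.push(chunk);
    }
    return chunks;
  } finally {
    await tool.dispose?.();
  }
}

// Hypothetical stateful tool that streams one progress line per page.
const lifecycleLog: string[] = [];
const crawler: StreamingTool = {
  name: "crawler",
  init: () => { lifecycleLog.push("init"); },
  dispose: () => { lifecycleLog.push("dispose"); },
  async *execute(args) {
    const pages = Number(args.pages ?? 1);
    for (let i = 1; i <= pages; i++) yield `page ${i} done`;
  },
};
```

Putting dispose() in a finally block means a browser session or database connection is released even when execute throws midway through a stream.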
Skill Composition
- Multiple skills compose by concatenating systemPrompt values with clear delimiters
- The delegates field enables multi-agent patterns: a planner skill references researcher and coder skills
- The onActivate hook runs when a skill is activated, enabling auto-registration of related tools
- Skills are pure data + optional hooks — no runtime logic, no state
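One way the composition rule could look in practice is sketched below. The delimiter format, the composeSystemPrompt and unresolvedDelegates helpers, and the trimmed-down Skill type are assumptions chosen for illustration; the document only specifies that prompts are concatenated with clear delimiters.

```typescript
// Hypothetical helpers; only systemPrompt and delegates are named in the PRD.
interface Skill {
  name: string;
  systemPrompt: string;
  delegates?: string[];
}

// Joins prompts under a labelled header per skill so the sections stay distinguishable.
function composeSystemPrompt(skills: Skill[]): string {
  return skills
    .map((skill) => `### Skill: ${skill.name}\n${skill.systemPrompt.trim()}`)
    .join("\n\n---\n\n");
}

// Returns delegate names that do not resolve to any of the given skills.
function unresolvedDelegates(skills: Skill[]): string[] {
  const known = new Set(skills.map((skill) => skill.name));
  const missing: string[] = [];
  for (const skill of skills) {
    for (const name of skill.delegates ?? []) {
      if (!known.has(name) && missing.indexOf(name) === -1) missing.push(name);
    }
  }
  return missing;
}
```

Since skills carry no state, both helpers are pure functions over data, which keeps prompt composition easy to test.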
Multi-Agent Orchestration
- Directed delegation as the default: parent agent explicitly assigns subtasks to child agents
- Optional shared context (createSharedContext()) for collaborative patterns (debate, consensus)
- Parent writes and reads shared context; children read shared context but only write to their result
- Tree-shaped execution: parent waits for child results before proceeding
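The read/write rule above can be enforced structurally by handing children a view that has no setter. In this sketch only the name createSharedContext comes from the document; its signature, the asReadonly method, and the delegateAll helper are assumptions.

```typescript
// Sketch under assumptions: the real createSharedContext signature is not specified.
interface ReadonlyContext {
  get(key: string): unknown;
}

interface SharedContext extends ReadonlyContext {
  set(key: string, value: unknown): void;
  asReadonly(): ReadonlyContext;
}

function createSharedContext(initial: Record<string, unknown> = {}): SharedContext {
  const store = new Map<string, unknown>();
  for (const key of Object.keys(initial)) store.set(key, initial[key]);
  const get = (key: string): unknown => store.get(key);
  return {
    get,
    set: (key, value) => { store.set(key, value); },
    // The child view exposes get only, so children cannot write to the store.
    asReadonly: () => ({ get }),
  };
}

type ChildAgent = (task: string, context: ReadonlyContext) => Promise<string>;

// Directed delegation: the parent assigns tasks and waits for every child result.
async function delegateAll(
  context: SharedContext,
  jobs: { task: string; agent: ChildAgent }[],
): Promise<string[]> {
  const view = context.asReadonly();
  return Promise.all(jobs.map((job) => job.agent(job.task, view)));
}
```

Each child contributes only through its returned string, which matches the rule that children write to their result and nothing else.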
Sandbox Architecture
- Exposed as a standalone primitive (sandbox.execute(code, options)), not just a tool wrapper
- Primary backend: E2B; fallback: WebContainer
- @agentskit/tools ships a codeExecution tool that uses the sandbox internally
- Security defaults: no network, 30s timeout, 50MB memory — all overridable
- Supports JavaScript and Python
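The secure-by-default, overridable configuration could be modeled as a defaults object merged with caller overrides, as in this sketch. The option names (network, timeoutMs, memoryMb, language), the default language, and the resolveSandboxOptions helper are assumptions; only the default values (no network, 30 s, 50 MB) and the two supported languages come from the list above.

```typescript
// Hypothetical option names; only the default values are taken from the PRD.
interface SandboxOptions {
  language: "javascript" | "python";
  network: boolean;
  timeoutMs: number;
  memoryMb: number;
}

const SANDBOX_DEFAULTS: SandboxOptions = {
  language: "javascript", // assumed default; the PRD does not name one
  network: false,
  timeoutMs: 30000,
  memoryMb: 50,
};

// Merges caller overrides over the restrictive defaults and rejects nonsensical limits.
function resolveSandboxOptions(overrides: Partial<SandboxOptions> = {}): SandboxOptions {
  const merged: SandboxOptions = { ...SANDBOX_DEFAULTS, ...overrides };
  if (merged.timeoutMs <= 0 || merged.memoryMb <= 0) {
    throw new RangeError("timeoutMs and memoryMb must be positive");
  }
  return merged;
}
```

Starting from the restrictive defaults means a caller has to opt in to each relaxation explicitly, for example `resolveSandboxOptions({ network: true })`.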
Observability: Event-Based, Fully Extensible
- Core emits AgentEvent from both ChatController and AgentRunner
- Observer interface is trivially simple: { name, on(event) }
- Multiple observers run in parallel, asynchronously, non-blocking
- @agentskit/observability provides convenience implementations (LangSmith, OpenTelemetry, console)
- Anyone can implement custom observers for any logging system — the package is optional
- Zero coupling: if observability isn't installed, events are emitted but ignored
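A minimal sketch of the dispatch behavior described above follows: observers run off the caller's synchronous path, and a failing observer affects neither its peers nor the agent. The Observer shape comes from this document; the event payload fields, the createEmitter helper, and its flush method are assumptions.

```typescript
// Payload fields are illustrative; the PRD only names the event types.
type AgentEvent =
  | { type: "llm:start"; model: string }
  | { type: "tool:end"; tool: string; durationMs: number }
  | { type: "error"; message: string };

interface Observer {
  name: string;
  on: (event: AgentEvent) => void | Promise<void>;
}

// Hypothetical emitter: emit() returns immediately, observers run on microtasks.
function createEmitter(observers: Observer[]) {
  const pending = new Set<Promise<void>>();
  return {
    emit(event: AgentEvent): void {
      for (const observer of observers) {
        const run: Promise<void> = Promise.resolve()
          .then(() => observer.on(event))
          .catch(() => undefined) // isolate failures per observer
          .then(() => { pending.delete(run); });
        pending.add(run);
      }
    },
    // Optional: lets tests or shutdown code wait for in-flight observers.
    async flush(): Promise<void> {
      await Promise.all(Array.from(pending));
    },
  };
}

// A custom observer is just an object; this one collects events in memory.
const collected: AgentEvent[] = [];
const memoryObserver: Observer = {
  name: "memory",
  on: (event) => { collected.push(event); },
};
```

With an empty observer list, emit() loops over nothing, which corresponds to events being emitted but ignored when no observability package is installed.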
RAG Pipeline
- embed function provided by the user (no default provider)
- @agentskit/adapters exports convenience embedders for common providers
- RAG handles chunking, embedding, retrieval, and context injection
- Default integration with @agentskit/memory's VectorMemory for storage
- Simple API: rag.retrieve(query) and useRAGChat() hook for React
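The pipeline above can be sketched end to end with a user-supplied embed function. Only the embed contract and the retrieve name come from this document; the createRag factory, the fixed-size character chunker, the in-memory array standing in for VectorMemory, cosine similarity as the scoring function, and the toy letter-frequency embedder are all assumptions for illustration.

```typescript
type MaybePromise<T> = T | Promise<T>;
type Embed = (text: string) => MaybePromise<number[]>;

// Fixed-size character chunks with overlap (one of many possible strategies).
function chunk(text: string, size = 200, overlap = 20): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // avoid a trailing, fully overlapped chunk
  }
  return chunks;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Hypothetical factory; the in-memory index stands in for a VectorMemory backend.
function createRag(embed: Embed) {
  const index: { text: string; vector: number[] }[] = [];
  return {
    async ingest(docs: string[], size = 200, overlap = 20): Promise<void> {
      for (const doc of docs) {
        for (const piece of chunk(doc, size, overlap)) {
          index.push({ text: piece, vector: await embed(piece) });
        }
      }
    },
    async retrieve(query: string, topK = 3): Promise<string[]> {
      const queryVector = await embed(query);
      return index
        .map((entry) => ({ text: entry.text, score: cosine(queryVector, entry.vector) }))
        .sort((a, b) => b.score - a.score)
        .slice(0, topK)
        .map((entry) => entry.text);
    },
  };
}

// Toy embedder (letter frequencies a-z), only to make the sketch runnable offline.
const toyEmbed: Embed = (text) => {
  const vector = new Array<number>(26).fill(0);
  const lower = text.toLowerCase();
  for (let i = 0; i < lower.length; i++) {
    const code = lower.charCodeAt(i) - 97;
    if (code >= 0 && code < 26) vector[code] += 1;
  }
  return vector;
};
```

Swapping toyEmbed for a provider-backed embedder requires no change to createRag, since both satisfy the same function type.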
Eval (Minimal for v3)
- Contract defined in core (EvalSuite, EvalResult types)
- One basic runner in @agentskit/eval: accuracy + latency metrics
- Designed for CI/CD: eval.run(agent, testCases) returns structured results
- Advanced metrics, test suites, and dashboard deferred to v3.1+
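A basic runner reporting accuracy and latency might look like the sketch below. The EvalResult fields, the TestCase shape, exact-match scoring, and the runEval function (standing in for eval.run) are assumptions; the document fixes only the two metrics and the (agent, testCases) inputs.

```typescript
// Hypothetical shapes; the real EvalSuite/EvalResult contracts are defined in core.
interface TestCase {
  input: string;
  expected: string;
}

interface EvalResult {
  total: number;
  passed: number;
  accuracy: number; // passed / total, in [0, 1]
  avgLatencyMs: number;
  failures: { input: string; expected: string; actual: string }[];
}

type Agent = (input: string) => Promise<string>;

// Runs cases sequentially so latencies are not skewed by concurrency.
// The clock is injectable to keep the runner deterministic under test.
async function runEval(
  agent: Agent,
  testCases: TestCase[],
  now: () => number = Date.now,
): Promise<EvalResult> {
  const failures: EvalResult["failures"] = [];
  let passed = 0;
  let totalLatencyMs = 0;
  for (const testCase of testCases) {
    const start = now();
    let actual: string;
    try {
      actual = await agent(testCase.input);
    } catch (err) {
      actual = `Error: ${err instanceof Error ? err.message : String(err)}`;
    }
    totalLatencyMs += now() - start;
    if (actual.trim() === testCase.expected.trim()) passed++;
    else failures.push({ input: testCase.input, expected: testCase.expected, actual });
  }
  const total = testCases.length;
  return {
    total,
    passed,
    accuracy: total === 0 ? 0 : passed / total,
    avgLatencyMs: total === 0 ? 0 : totalLatencyMs / total,
    failures,
  };
}
```

A CI job could fail the build when accuracy drops below a chosen threshold, since the result is a plain serializable object.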
Testing Decisions
Good tests verify external behavior through public interfaces — not implementation details. If the test would break from an internal refactor without any behavior change, it's testing the wrong thing.
Critical (must ship with v3)
- core: Unit tests for ChatController state machine (message lifecycle, streaming, abort, retry), memory implementations (load/save/clear round-trips), shared primitives (tool execution, stream consumption, message building). These are the foundation: if they break, everything breaks.
- runtime: Unit tests for AgentRunner ReAct loop (observe→think→act cycle, tool result re-injection, termination conditions), delegation (parent→child task assignment, result collection), shared context (read/write isolation between parent and children).
High Priority
- tools: Contract compliance tests (every tool satisfies ToolDefinition shape), unit tests per tool with mocked externals (browser tool doesn't launch real Puppeteer), lifecycle tests (init/dispose called correctly for stateful tools).
- memory: Integration tests per backend (SQLite, Redis, and LanceDB) with test containers or in-process equivalents. Test ChatMemory and VectorMemory contracts against each backend.
Medium Priority
- adapters: Contract tests (every adapter returns valid StreamSource shape, streams valid StreamChunks). Integration tests optional (require API keys, better suited for CI with secrets).
- rag: Unit tests for chunking logic and retrieval scoring. Integration test combining embedder + VectorMemory + retrieval pipeline.
- sandbox: Integration tests with E2B (requires API key) and WebContainer. Security boundary tests (verify network blocked, timeout enforced, memory limited).
- ink: Expand from smoke test to component render tests (message rendering, input handling, tool call display).
Low Priority
- skills: Contract compliance (shape validation), prompt composition tests (multiple skills concatenate correctly).
- observability: Unit tests for event routing (events reach all registered observers, errors in one observer don't break others).
- eval: Contract tests only; verify EvalResult shape from basic runner.
Maintain Current
- react: Already has thorough coverage (10+ tests). Maintain and extend as new features land.
- cli: Already decent. Add tests for the new agentskit run command.
Prior Art
Existing test patterns in the repo: packages/react/tests/ for component and hook testing with mock adapters, packages/cli/tests/init.test.ts for CLI command testing, packages/adapters/tests/ for contract shape tests.
Out of Scope
- Authentication/authorization middleware — AgentsKit is a client-side/runtime toolkit, not a server framework
- Server-side API routes or hosting — use Vercel AI SDK, Express, or Next.js for that layer
- Visual/GUI agent builder — AgentsKit is code-first
- Billing or usage metering — belongs in the application layer
- Training or fine-tuning — use provider-specific tools
- Advanced eval dashboard and CI integration — deferred to v3.1+
- Agent marketplace or registry service — tools and skills are local packages, not a hosted registry
- Mobile (React Native) — React package targets browser DOM only
Further Notes
Phasing Recommendation
While this PRD covers the full v3 vision, implementation should be phased by dependency order:
- Phase 1 (Foundation): Evolve core contracts (ToolDefinition, SkillDefinition, VectorMemory, AgentEvent, Observer types), extract shared primitives, add core tests
- Phase 2 (Runtime + Tools): Build runtime (AgentRunner, ReAct loop, delegation), tools (contract + first batch of tools), skills (contract + starter skills)
- Phase 3 (Data Layer): Build memory (SQLite, Redis, LanceDB backends), rag (chunking, embedding, retrieval pipeline)
- Phase 4 (Infrastructure): Build sandbox (E2B + WebContainer), observability (event routing + built-in observers), eval (minimal runner)
- Phase 5 (Polish): Update cli with agentskit run, update docs, update examples, ensure all packages interop cleanly
Naming
All packages use the @agentskit scope. The GitHub repo is EmersonBraun/agentskit. The ecosystem is referred to as "AgentsKit" in prose.
Versioning
All packages ship at 0.2.0 currently. The v3 work should bump to 0.3.0 per phase (or 1.0.0 when the full vision is stable). Changesets manage versioning across the monorepo.