PRD: AgentsKit v3 — Full Ecosystem Vision #2

@EmersonBraun

Problem Statement

AgentsKit today is a chat streaming layer — it handles sending messages to LLMs and rendering responses in React, Ink, and CLI. But building real AI agents requires much more: autonomous execution loops, tool ecosystems, persistent memory, RAG, sandboxed code execution, observability, and evaluation. Developers currently have to stitch together dozens of unrelated libraries to build production agents, with no unified contracts or interoperability guarantees.

The ecosystem needs to evolve from "chat UI kit" to "the complete foundation for the agent era in JavaScript" — while keeping its core principle of being lightweight, plug-and-play, and easy to integrate.

Solution

Expand AgentsKit from 6 packages to 14, organized in layers:

  • Foundation: core (contracts + primitives), adapters (LLM providers)
  • UI: react (browser), ink (terminal), cli (command-line)
  • Agent Runtime: runtime (autonomous execution), tools (executable actions), skills (behavioral prompts)
  • Data: memory (persistence + vector), rag (retrieval-augmented generation)
  • Infrastructure: sandbox (secure code execution), observability (logging/tracing), eval (benchmarking)
  • Compatibility: compat-react-core (legacy bridge)

Three vital rules apply across the ecosystem:

  1. @agentskit/core stays extremely lightweight — zero external dependencies, under 10 KB gzipped
  2. Every package is plug-and-play — simple imports, minimal configuration, clear contracts
  3. Any combination of packages works together seamlessly

User Stories

  1. As a frontend developer, I want to add AI chat to my React app with useChat and pre-built components, so that I can ship a streaming chat UI in under 10 lines of code
  2. As a backend developer, I want to run autonomous agents without any UI, so that I can automate tasks like research, code generation, and data analysis
  3. As an agent builder, I want a ReAct loop with tool execution, so that my agent can observe, think, and act iteratively until a task is complete
  4. As a developer, I want to swap LLM providers in one line (OpenAI → Anthropic → Gemini), so that I'm not locked into any single vendor
  5. As a developer, I want to use pre-built tools (web search, filesystem, browser, code execution), so that I don't have to implement common integrations from scratch
  6. As a developer, I want to create custom tools with a simple contract (name, description, schema, execute), so that I can extend agent capabilities for my domain
  7. As a developer, I want tools to support JSON Schema for their parameters, so that LLMs can reliably generate valid function calls
  8. As a developer, I want tools with optional init()/dispose() lifecycle methods, so that stateful tools (browser sessions, DB connections) are properly managed
  9. As a developer, I want tools that can stream incremental output via AsyncIterable, so that long-running tools (web scraping, code execution) show progress
  10. As a developer, I want to use pre-built skills (researcher, coder, planner, critic), so that I can give agents specialized behavior without writing complex prompts
  11. As a developer, I want skills to reference other skills via delegates, so that a planner skill can delegate subtasks to researcher and coder skills
  12. As a developer, I want skills with onActivate hooks, so that activating a skill can auto-register relevant tools
  13. As a developer, I want persistent conversation memory across sessions, so that my chat app remembers previous conversations
  14. As a developer, I want vector memory with semantic search, so that my agent can retrieve relevant context from large knowledge bases
  15. As a developer, I want multiple memory backends (in-memory, localStorage, file, SQLite, Redis, LanceDB), so that I can choose the right storage for my use case
  16. As a developer, I want plug-and-play RAG with document chunking, embedding, and retrieval, so that I can add knowledge retrieval in a few lines
  17. As a developer, I want to bring my own embedding function to RAG, so that I'm not locked into any specific embedding provider
  18. As a developer, I want adapter-provided embedders (openaiEmbedder, geminiEmbedder) for convenience, so that setup is a one-liner
  19. As a developer, I want secure sandboxed code execution for agent-generated code, so that untrusted code can't harm my system
  20. As a developer, I want sandbox with configurable security (network, timeout, memory limits), so that I can tune restrictions per use case
  21. As a developer, I want the sandbox exposed as a standalone primitive, so that my custom tools can also execute code securely
  22. As a developer, I want observability that captures LLM calls, tool executions, memory operations, and agent steps, so that I can debug and monitor my agents
  23. As a developer, I want to use my own custom logger/observer, so that I can integrate with whatever logging system my team uses (Datadog, Sentry, etc.)
  24. As a developer, I want built-in observers for LangSmith, OpenTelemetry, and console, so that common integrations work out of the box
  25. As a developer, I want observability to be non-blocking and optional, so that it never slows down my agent and I only install it when I need it
  26. As a developer, I want to evaluate my agents with metrics (accuracy, latency, cost, token usage), so that I can measure and improve performance
  27. As a developer, I want multi-agent orchestration with directed delegation, so that a parent agent can assign subtasks to specialized child agents
  28. As a developer, I want a shared context for collaborating agents, so that agents in a multi-agent setup can share information
  29. As a CLI user, I want agentskit chat to launch an interactive terminal chat, so that I can quickly test different providers and models
  30. As a CLI user, I want agentskit init to scaffold a new project with React or Ink templates, so that I can start building immediately
  31. As a CLI user, I want agentskit run to execute runtime agents from the terminal, so that I can run automation tasks without writing an app
  32. As a terminal developer, I want Ink components identical to the React ones, so that the developer experience is consistent across platforms
  33. As an existing user, I want backward compatibility with @agentskit-react/core imports, so that my v2 code keeps working
  34. As a library author, I want to build custom adapters with a clear contract, so that I can add support for new LLM providers
  35. As a library author, I want to build custom memory backends with a clear contract, so that I can integrate with my existing data infrastructure
  36. As a developer, I want all packages independently installable, so that I only pull in what I need

Implementation Decisions

Architecture: Separation of Chat and Agent Engines (Option B)

The system has two execution engines that share primitives but serve different purposes:

  • ChatController (in core) — UI-oriented chat state machine. Manages messages, streaming, input state. Designed for request-response cycles driven by user interaction.
  • AgentRunner (in runtime) — Autonomous execution engine. Manages ReAct loops, reflection, planning, multi-agent delegation. Designed for headless execution without UI state overhead.

Shared primitives extracted into core: executeToolCall, consumeStream, buildMessage. Both engines use these, eliminating duplication while keeping their execution models independent.
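
To make the split concrete, here is a minimal sketch of a headless loop built on a shared executeToolCall primitive. The PRD names executeToolCall and AgentRunner but does not define their signatures, so every type below (ToolCall, Tool, ModelTurn, Model), the runAgent function, and the scripted model are assumptions for illustration only.

```typescript
// Hypothetical shapes: only the name executeToolCall comes from the PRD.
type ToolCall = { name: string; args: Record<string, unknown> };
type Tool = {
  name: string;
  execute: (args: Record<string, unknown>) => Promise<unknown> | unknown;
};
type ModelTurn = { toolCall: ToolCall } | { answer: string };
type Model = (history: string[]) => Promise<ModelTurn>;

// Shared primitive: look up a tool by name and run it.
async function executeToolCall(tools: Tool[], call: ToolCall): Promise<unknown> {
  const tool = tools.find((t) => t.name === call.name);
  if (!tool) throw new Error(`Unknown tool: ${call.name}`);
  return tool.execute(call.args);
}

// Headless loop in the spirit of AgentRunner: think, act, observe, repeat.
async function runAgent(
  model: Model,
  tools: Tool[],
  task: string,
  maxSteps = 5,
): Promise<string> {
  const history = [`task: ${task}`];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(history);
    if ("answer" in turn) return turn.answer; // termination condition
    const result = await executeToolCall(tools, turn.toolCall);
    history.push(`observation: ${JSON.stringify(result)}`); // re-inject the tool result
  }
  throw new Error(`No answer after ${maxSteps} steps`);
}

// Scripted stand-in for an LLM: calls the tool once, then answers.
const scriptedModel: Model = async (history) =>
  history.length === 1
    ? { toolCall: { name: "add", args: { a: 2, b: 3 } } }
    : { answer: `done after ${history.length - 1} observation(s)` };

const addTool: Tool = {
  name: "add",
  execute: (args) => Number(args.a) + Number(args.b),
};
```

A UI-oriented controller could call the same executeToolCall when a chat message contains a tool call, which is how the two engines would avoid duplicating that logic.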

Core Contracts (types only, zero runtime cost)

Core defines all contracts as TypeScript interfaces. New types added to core:

  • ToolDefinition (evolved): schema becomes JSONSchema7 type (enforced), adds optional init()/dispose() lifecycle, execute can return MaybePromise<unknown> | AsyncIterable<unknown>, adds optional tags/category for discovery
  • SkillDefinition (new): name, description, systemPrompt, optional examples, tools (tool name hints), delegates (other skill names), temperature, onActivate hook
  • VectorMemory (new): store(docs), search(query, options), delete(ids) — separate from ChatMemory. ChatMemory stays as conversation persistence, VectorMemory handles embeddings/semantic search
  • AgentEvent (new): Union type for lifecycle events (llm:start, llm:end, tool:start, tool:end, memory:load, agent:step, error, etc.)
  • Observer (new): { name: string, on: (event: AgentEvent) => void | Promise<void> } — fully extensible, anyone can implement custom observers
  • EvalSuite / EvalResult (new): Minimal contract for evaluation
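
The contracts above could be written roughly as follows. Field names are taken from the list; the exact types are assumptions, including the JSONSchema alias (a stand-in for JSONSchema7 from the json-schema typings), the VectorDoc shape, the type of the search query, and the payload fields on AgentEvent.

```typescript
// A sketch only: field names follow the PRD, exact types are assumptions.
type MaybePromise<T> = T | Promise<T>;
type JSONSchema = Record<string, unknown>; // stand-in for JSONSchema7

interface ToolDefinition {
  name: string;
  description: string;
  schema: JSONSchema;
  execute: (
    args: Record<string, unknown>,
  ) => MaybePromise<unknown> | AsyncIterable<unknown>;
  init?: () => MaybePromise<void>;
  dispose?: () => MaybePromise<void>;
  tags?: string[];
  category?: string;
}

interface SkillDefinition {
  name: string;
  description: string;
  systemPrompt: string;
  examples?: string[];
  tools?: string[]; // tool name hints
  delegates?: string[]; // other skill names
  temperature?: number;
  onActivate?: () => MaybePromise<void>;
}

interface VectorDoc { id: string; text: string } // hypothetical document shape
interface VectorMemory {
  store(docs: VectorDoc[]): MaybePromise<void>;
  search(query: string, options?: { topK?: number }): MaybePromise<VectorDoc[]>;
  delete(ids: string[]): MaybePromise<void>;
}

// Subset of the event union; payload fields are illustrative.
type AgentEvent =
  | { type: "llm:start" | "llm:end"; model: string }
  | { type: "tool:start" | "tool:end"; tool: string }
  | { type: "error"; error: Error };

interface Observer {
  name: string;
  on: (event: AgentEvent) => void | Promise<void>;
}

// Two values that satisfy the contracts.
const echoTool: ToolDefinition = {
  name: "echo",
  description: "Returns its input text unchanged",
  schema: { type: "object", properties: { text: { type: "string" } }, required: ["text"] },
  execute: (args) => args.text,
};

const seenEventTypes: string[] = [];
const recordingObserver: Observer = {
  name: "recorder",
  on: (event) => { seenEventTypes.push(event.type); },
};
```

Because these are interfaces and type aliases, they compile away entirely, which is consistent with the "zero runtime cost" goal for core.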

Adapter Embedding Extension

@agentskit/adapters exports standalone embedder functions (openaiEmbedder, geminiEmbedder, etc.) that satisfy the embed: (text: string) => MaybePromise<number[]> contract used by RAG. These are separate from the chat adapters — no coupling between chat and embedding.
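
Any function with the stated signature can act as an embedder. The sketch below uses a toy letter-count embedder so it runs offline; letterCountEmbedder and embedAll are hypothetical names, and a real embedder such as openaiEmbedder would call a provider instead.

```typescript
type MaybePromise<T> = T | Promise<T>;
type Embedder = (text: string) => MaybePromise<number[]>;

// Toy embedder for illustration: counts letters a-z, then L2-normalises.
const letterCountEmbedder: Embedder = (text) => {
  const counts = new Array<number>(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) counts[i] += 1;
  }
  const norm = Math.sqrt(counts.reduce((sum, c) => sum + c * c, 0)) || 1;
  return counts.map((c) => c / norm);
};

// Callers should always await, since an embedder may be sync or async.
async function embedAll(embed: Embedder, texts: string[]): Promise<number[][]> {
  return Promise.all(texts.map((t) => embed(t)));
}
```

Typing the contract as MaybePromise lets synchronous local embedders and asynchronous provider calls share one code path.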

Tool System

  • Tools follow a strict contract with enforced JSON Schema for parameters
  • Stateful tools use init()/dispose() lifecycle (e.g., browser tool opens/closes Puppeteer)
  • Streaming tools return AsyncIterable for incremental output
  • Auto-discovery via tags/category on the tool definition itself (no separate registry/manifest)
  • Tools support parallel execution and human confirmation (requiresConfirmation)
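
As an illustration of the lifecycle and streaming rules, the following sketch defines a stateful tool that streams its output and a helper that runs any tool while honoring init()/dispose(). The trimmed ToolDefinition, createCounterTool, and runTool are assumptions, not the package's actual API.

```typescript
type MaybePromise<T> = T | Promise<T>;

// Trimmed contract; only the fields this example needs.
interface ToolDefinition {
  name: string;
  description: string;
  schema: Record<string, unknown>;
  execute: (
    args: Record<string, unknown>,
  ) => MaybePromise<unknown> | AsyncIterable<unknown>;
  init?: () => MaybePromise<void>;
  dispose?: () => MaybePromise<void>;
  tags?: string[];
}

// Hypothetical stateful tool: init() opens a resource, execute() streams
// progress, dispose() closes the resource again.
function createCounterTool() {
  let open = false;
  const tool: ToolDefinition = {
    name: "count",
    description: "Streams the integers 1..n",
    schema: {
      type: "object",
      properties: { n: { type: "integer", minimum: 1 } },
      required: ["n"],
    },
    tags: ["demo"],
    init: () => { open = true; },
    dispose: () => { open = false; },
    async *execute(args) {
      if (!open) throw new Error("tool not initialised");
      for (let i = 1; i <= Number(args.n); i++) yield i;
    },
  };
  return { tool, isOpen: () => open };
}

function isAsyncIterable(value: unknown): value is AsyncIterable<unknown> {
  return typeof value === "object" && value !== null && Symbol.asyncIterator in value;
}

// Runs a tool, collecting either a single result or every streamed chunk.
// dispose() runs even if execute() throws.
async function runTool(
  tool: ToolDefinition,
  args: Record<string, unknown>,
): Promise<unknown[]> {
  await tool.init?.();
  try {
    const out = tool.execute(args);
    const chunks: unknown[] = [];
    if (isAsyncIterable(out)) {
      for await (const chunk of out) chunks.push(chunk);
    } else {
      chunks.push(await out);
    }
    return chunks;
  } finally {
    await tool.dispose?.();
  }
}
```

A real runner would likely forward chunks as they arrive rather than buffering them; buffering here just keeps the example short.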

Skill Composition

  • Multiple skills compose by concatenating systemPrompt values with clear delimiters
  • delegates field enables multi-agent patterns: a planner skill references researcher and coder skills
  • onActivate hook runs when a skill is activated, enabling auto-registration of related tools
  • Skills are pure data + optional hooks — no runtime logic, no state
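
Prompt concatenation and delegate lookup could look like the sketch below. The "### Skill:" delimiter is an assumption (the PRD only asks for clear delimiters), and composeSystemPrompt and resolveDelegates are hypothetical helper names.

```typescript
// Trimmed contract; only the fields this example needs.
interface SkillDefinition {
  name: string;
  description: string;
  systemPrompt: string;
  delegates?: string[];
}

// Concatenates prompts in order, each under its own delimiter line.
function composeSystemPrompt(skills: SkillDefinition[]): string {
  return skills
    .map((s) => `### Skill: ${s.name}\n${s.systemPrompt.trim()}`)
    .join("\n\n");
}

// Resolves delegate names against a registry, failing loudly on unknown names.
function resolveDelegates(
  skill: SkillDefinition,
  registry: Map<string, SkillDefinition>,
): SkillDefinition[] {
  return (skill.delegates ?? []).map((name) => {
    const found = registry.get(name);
    if (!found) throw new Error(`Skill "${skill.name}" delegates to unknown skill "${name}"`);
    return found;
  });
}

const researcher: SkillDefinition = {
  name: "researcher",
  description: "Finds and summarises sources",
  systemPrompt: "Cite every claim.",
};
const planner: SkillDefinition = {
  name: "planner",
  description: "Breaks a goal into subtasks",
  systemPrompt: "Produce a numbered plan.",
  delegates: ["researcher"],
};
```

Since skills hold no state, both helpers are pure functions over plain data.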

Multi-Agent Orchestration

  • Directed delegation as the default: parent agent explicitly assigns subtasks to child agents
  • Optional shared context (createSharedContext()) for collaborative patterns (debate, consensus)
  • Parent writes and reads shared context; children read shared context but only write to their result
  • Tree-shaped execution: parent waits for child results before proceeding
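
One way to enforce these read/write rules is to hand children a read-only view of the context. The PRD names createSharedContext() but not what it returns, so the interfaces, readonlyView(), ChildAgent, and delegate below are assumptions.

```typescript
// Hypothetical API shape for the shared context.
interface ReadonlyContext {
  get(key: string): unknown;
}
interface SharedContext extends ReadonlyContext {
  set(key: string, value: unknown): void;
  readonlyView(): ReadonlyContext;
}

function createSharedContext(): SharedContext {
  const data = new Map<string, unknown>();
  const get = (key: string) => data.get(key);
  return {
    get,
    set: (key, value) => { data.set(key, value); },
    // Children receive an object with no set(), so they cannot write.
    readonlyView: () => ({ get }),
  };
}

type ChildAgent = (task: string, ctx: ReadonlyContext) => Promise<string>;

// Directed delegation: the parent assigns tasks and waits for every child
// result before continuing (tree-shaped execution).
async function delegate(
  ctx: SharedContext,
  assignments: Array<{ child: ChildAgent; task: string }>,
): Promise<string[]> {
  const view = ctx.readonlyView();
  return Promise.all(assignments.map(({ child, task }) => child(task, view)));
}
```

Each child's only output channel is its returned result, which matches the rule that children write only to their result.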

Sandbox Architecture

  • Exposed as a standalone primitive (sandbox.execute(code, options)) — not just a tool wrapper
  • Primary backend: E2B; fallback: WebContainer
  • @agentskit/tools ships a codeExecution tool that uses the sandbox internally
  • Security defaults: no network, 30s timeout, 50 MB memory, all overridable
  • Supports JavaScript and Python
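
The override behavior can be sketched independently of any backend. SandboxOptions, its field names, the default language, resolveSandboxOptions, and withTimeout are all assumptions; the E2B and WebContainer integrations are deliberately left out.

```typescript
// Hypothetical option names; the defaults mirror the values listed above.
interface SandboxOptions {
  network: boolean;
  timeoutMs: number;
  memoryMb: number;
  language: "javascript" | "python";
}

const SANDBOX_DEFAULTS: SandboxOptions = {
  network: false,
  timeoutMs: 30000,
  memoryMb: 50,
  language: "javascript", // assumed default, not stated in the PRD
};

// Merges caller overrides onto the secure defaults and validates the limits.
function resolveSandboxOptions(
  overrides: Partial<SandboxOptions> = {},
): SandboxOptions {
  const merged = { ...SANDBOX_DEFAULTS, ...overrides };
  if (merged.timeoutMs <= 0 || merged.memoryMb <= 0) {
    throw new RangeError("timeoutMs and memoryMb must be positive");
  }
  return merged;
}

// Generic timeout guard a backend wrapper could apply around execution.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}
```

Note that withTimeout only stops waiting; actually terminating the sandboxed process would be the backend's responsibility.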

Observability: Event-Based, Fully Extensible

  • Core emits AgentEvent from both ChatController and AgentRunner
  • Observer interface is trivially simple: { name, on(event) }
  • Multiple observers run in parallel, asynchronously, non-blocking
  • @agentskit/observability provides convenience implementations (LangSmith, OpenTelemetry, console)
  • Anyone can implement custom observers for any logging system — the package is optional
  • Zero coupling: if observability isn't installed, events are emitted but ignored
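
A minimal sketch of non-blocking fan-out with error isolation follows. The Observer shape matches the contract in this PRD; the trimmed AgentEvent, createEmitter, flush, and the failures list are assumptions made for the example.

```typescript
// Subset of the event union, enough for the example.
type AgentEvent = {
  type: "llm:start" | "llm:end" | "tool:start" | "tool:end" | "error";
  payload?: unknown;
};

interface Observer {
  name: string;
  on: (event: AgentEvent) => void | Promise<void>;
}

function createEmitter(observers: Observer[] = []) {
  const failures: string[] = [];
  const pending = new Set<Promise<void>>();

  // emit() returns immediately; observers run later on the microtask queue.
  function emit(event: AgentEvent): void {
    for (const observer of observers) {
      const run: Promise<void> = Promise.resolve()
        .then(() => observer.on(event)) // sync throws become rejections
        .catch(() => { failures.push(observer.name); }) // isolate failures
        .finally(() => { pending.delete(run); });
      pending.add(run);
    }
  }

  // Lets tests or shutdown code wait for in-flight observers.
  async function flush(): Promise<void> {
    await Promise.all(Array.from(pending));
  }

  return { emit, flush, failures };
}
```

With an empty observer list, emit() does nothing, which is the "emitted but ignored" case described above.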

RAG Pipeline

  • embed function provided by the user (no default provider)
  • @agentskit/adapters exports convenience embedders for common providers
  • RAG handles chunking, embedding, retrieval, and context injection
  • Default integration with @agentskit/memory's VectorMemory for storage
  • Simple API: rag.retrieve(query) and useRAGChat() hook for React
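
The chunk, embed, and retrieve steps can be sketched end to end with an in-memory index. Only the embed contract and the retrieve(query) name come from the PRD; createRAG, ingest, the word-based chunking, cosine scoring, and the keyword embedder are assumptions chosen so the example runs offline.

```typescript
type MaybePromise<T> = T | Promise<T>;
type Embedder = (text: string) => MaybePromise<number[]>;
interface IndexedChunk { text: string; embedding: number[] }

// Fixed-size chunking by word count (one of many possible strategies).
function chunkText(text: string, wordsPerChunk: number): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += wordsPerChunk) {
    chunks.push(words.slice(i, i + wordsPerChunk).join(" "));
  }
  return chunks;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

function createRAG(embed: Embedder) {
  const index: IndexedChunk[] = []; // stand-in for a VectorMemory backend
  return {
    async ingest(text: string, wordsPerChunk = 50): Promise<number> {
      for (const piece of chunkText(text, wordsPerChunk)) {
        index.push({ text: piece, embedding: await embed(piece) });
      }
      return index.length;
    },
    async retrieve(query: string, topK = 3): Promise<string[]> {
      const q = await embed(query);
      return index
        .map((c) => ({ text: c.text, score: cosine(q, c.embedding) }))
        .sort((x, y) => y.score - x.score)
        .slice(0, topK)
        .map((c) => c.text);
    },
  };
}

// Toy embedder: one dimension per vocabulary word, valued by occurrence count.
const VOCAB = ["cat", "dog", "fish"];
const keywordEmbedder: Embedder = (text) => {
  const tokens = text.toLowerCase().split(/\W+/);
  return VOCAB.map((word) => tokens.filter((t) => t === word).length);
};
```

Context injection would then prepend the retrieved chunks to the prompt before the LLM call.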

Eval (Minimal for v3)

  • Contract defined in core (EvalSuite, EvalResult types)
  • One basic runner in @agentskit/eval: accuracy + latency metrics
  • Designed for CI/CD: eval.run(agent, testCases) returns structured results
  • Advanced metrics, test suites, and dashboard deferred to v3.1+
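
A basic runner with the two v3 metrics might look like this. The TestCase and EvalResult fields, the Agent type, exact-match scoring, and the runEval name (standing in for eval.run) are assumptions; the PRD only fixes the call shape and the accuracy and latency metrics.

```typescript
// Hypothetical shapes for the minimal eval contract.
interface TestCase { input: string; expected: string }
interface EvalResult {
  total: number;
  passed: number;
  accuracy: number; // passed / total, in [0, 1]
  meanLatencyMs: number;
}
type Agent = (input: string) => Promise<string>;

// Runs cases sequentially so latency measurements do not overlap.
async function runEval(agent: Agent, testCases: TestCase[]): Promise<EvalResult> {
  let passed = 0;
  let totalMs = 0;
  for (const testCase of testCases) {
    const start = Date.now();
    const output = await agent(testCase.input);
    totalMs += Date.now() - start;
    if (output.trim() === testCase.expected) passed++; // exact-match scoring
  }
  const total = testCases.length;
  return {
    total,
    passed,
    accuracy: total ? passed / total : 0,
    meanLatencyMs: total ? totalMs / total : 0,
  };
}
```

A CI job could fail the build when accuracy drops below a threshold, since the result is plain structured data.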

Testing Decisions

Good tests verify external behavior through public interfaces, not implementation details. If a test would break from an internal refactor that changes no behavior, it is testing the wrong thing.

Critical (must ship with v3)

  • core: Unit tests for ChatController state machine (message lifecycle, streaming, abort, retry), memory implementations (load/save/clear round-trips), shared primitives (tool execution, stream consumption, message building). These are the foundation — if they break, everything breaks.
  • runtime: Unit tests for AgentRunner ReAct loop (observe→think→act cycle, tool result re-injection, termination conditions), delegation (parent→child task assignment, result collection), shared context (read/write isolation between parent and children).

High Priority

  • tools: Contract compliance tests (every tool satisfies ToolDefinition shape), unit tests per tool with mocked externals (browser tool doesn't launch real Puppeteer), lifecycle tests (init/dispose called correctly for stateful tools).
  • memory: Integration tests per backend — SQLite, Redis, and LanceDB with test containers or in-process equivalents. Test ChatMemory and VectorMemory contracts against each backend.

Medium Priority

  • adapters: Contract tests (every adapter returns valid StreamSource shape, streams valid StreamChunks). Integration tests optional (require API keys, better suited for CI with secrets).
  • rag: Unit tests for chunking logic and retrieval scoring. Integration test combining embedder + VectorMemory + retrieval pipeline.
  • sandbox: Integration tests with E2B (requires API key) and WebContainer. Security boundary tests (verify network blocked, timeout enforced, memory limited).
  • ink: Expand from smoke test to component render tests (message rendering, input handling, tool call display).

Low Priority

  • skills: Contract compliance (shape validation), prompt composition tests (multiple skills concatenate correctly).
  • observability: Unit tests for event routing (events reach all registered observers, errors in one observer don't break others).
  • eval: Contract tests only — verify EvalResult shape from basic runner.

Maintain Current

  • react: Already has thorough coverage (10+ tests). Maintain and extend as new features land.
  • cli: Already decent. Add tests for new agentskit run command.

Prior Art

Existing test patterns in the repo: packages/react/tests/ for component and hook testing with mock adapters, packages/cli/tests/init.test.ts for CLI command testing, packages/adapters/tests/ for contract shape tests.

Out of Scope

  • Authentication/authorization middleware — AgentsKit is a client-side/runtime toolkit, not a server framework
  • Server-side API routes or hosting — use Vercel AI SDK, Express, or Next.js for that layer
  • Visual/GUI agent builder — AgentsKit is code-first
  • Billing or usage metering — belongs in the application layer
  • Training or fine-tuning — use provider-specific tools
  • Advanced eval dashboard and CI integration — deferred to v3.1+
  • Agent marketplace or registry service — tools and skills are local packages, not a hosted registry
  • Mobile (React Native) — React package targets browser DOM only

Further Notes

Phasing Recommendation

While this PRD covers the full v3 vision, implementation should be phased by dependency order:

  1. Phase 1 — Foundation: Evolve core contracts (ToolDefinition, SkillDefinition, VectorMemory, AgentEvent, Observer types), extract shared primitives, add core tests
  2. Phase 2 — Runtime + Tools: Build runtime (AgentRunner, ReAct loop, delegation), tools (contract + first batch of tools), skills (contract + starter skills)
  3. Phase 3 — Data Layer: Build memory (SQLite, Redis, LanceDB backends), rag (chunking, embedding, retrieval pipeline)
  4. Phase 4 — Infrastructure: Build sandbox (E2B + WebContainer), observability (event routing + built-in observers), eval (minimal runner)
  5. Phase 5 — Polish: Update cli with agentskit run, update docs, update examples, ensure all packages interop cleanly

Naming

All packages use the @agentskit scope. The GitHub repo is EmersonBraun/agentskit. The ecosystem is referred to as "AgentsKit" in prose.

Versioning

All packages ship at 0.2.0 currently. The v3 work should bump to 0.3.0 per phase (or 1.0.0 when the full vision is stable). Changesets manage versioning across the monorepo.
