Open-source, agent-first Python framework for building AI agents that hold up under real, long, tool-heavy work — reliable, observable, and portable across LLM providers. Not a coding copilot. Not a chatbot. The framework you build serious agents with.
Quick Start · Features · Execution Modes · Packages · Docs · Changelog · Release Notes · Agent coverage
If NucleusIQ helps you build maintainable Python agents, please ⭐ star the repo so other engineers can find it while the project is still early.
NucleusIQ Agent Coverage — what we ship today vs planned (May 2026, audited on v0.7.12)
Interactive scorecard on Nucleusbox: every major agent type (research, multi-agent, coding, and more) with support today vs planned (v0.8–v0.9) — radars, category tables, 28-pattern matrix, and gap analysis in one page.
| Today (shipped) | Planned (roadmap) |
|---|---|
| 3 Gearbox modes (Direct / Standard / Autonomous) | Agent-as-Tool, structured sub-agent handoff |
| Context engine + telemetry | A2A (thin adapter) |
| Parallel sub-agents + Critic/Refiner | Public benchmark scorecards & Context Report |
| File tools + MCP 0.1.0 Stable | Optional shell tool for coding/terminal benches |
Best fit now: research and analysis agents. Honest gaps: no LLM-callable AgentTool yet, no published benchmark proof, sub-agent synthesis capped at 2K chars.
- Agent-first Python runtime — build with `Agent`, `Task`, `@tool`, and typed results instead of chains, graphs, or a custom DSL.
- Stable provider ecosystem — OpenAI, Gemini, Anthropic, Groq, Ollama, MCP, and Mock LLM aligned on `nucleusiq>=0.7.12`.
- Context management before overflow — NucleusIQ can compact / mask / recall before the LLM API rejects an oversized prompt.
- Production posture — 3,700+ tests across the monorepo, provider-agnostic observability, usage tracking, plugins, and clear execution modes.
`nucleusiq` 0.7.12 (May 2026): a coordinated multi-package release that promotes every alpha/beta provider to its first stable line: `nucleusiq-anthropic` 0.2.0 Stable (Phase B feature-complete: native server tools, prompt caching, extended thinking, server-tool observability) · `nucleusiq-ollama` 0.2.0 Stable (vision wire + `provider="ollama"` enrichment) · `nucleusiq-groq` 0.1.0 Stable (hosted-tool observability stub + enrichment) · `nucleusiq-mcp` 0.1.0 Stable (drops `b1`; no API changes) · `nucleusiq-openai` 0.7.0 + `nucleusiq-gemini` 0.3.0 (native-tool observability + enrichment). Core also gains cross-cutting, provider-agnostic native-tool observability: `ToolCallRecord.executed_by ∈ {"local", "provider"}` and `LLMCallRecord.provider` / `request_id` / `organization_id` / `stop_reason` / `cache_read_input_tokens` / `cache_creation_input_tokens` / `metadata`.
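To make the new telemetry fields concrete, here is a minimal sketch of how you might post-process them, for example to separate server-side (provider) tool calls from local ones and to total prompt-cache reads per provider. The `ToolCall` / `LLMCall` dataclasses and the `summarize()` helper are hypothetical stand-ins that only borrow field names from the list above; the real record classes may carry more fields and different types.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Literal


# Hypothetical stand-ins that mirror a few field names listed above.
# They are NOT the real nucleusiq ToolCallRecord / LLMCallRecord classes.
@dataclass
class ToolCall:
    name: str
    executed_by: Literal["local", "provider"]


@dataclass
class LLMCall:
    provider: str
    cache_read_input_tokens: int = 0
    cache_creation_input_tokens: int = 0


def summarize(tool_calls: list[ToolCall], llm_calls: list[LLMCall]) -> dict:
    """Count tool calls by executor and total prompt-cache reads per provider."""
    by_executor = Counter(call.executed_by for call in tool_calls)
    cache_reads: dict[str, int] = {}
    for call in llm_calls:
        cache_reads[call.provider] = cache_reads.get(call.provider, 0) + call.cache_read_input_tokens
    return {
        "local_tool_calls": by_executor["local"],        # ran in your process
        "provider_tool_calls": by_executor["provider"],  # ran server-side
        "cache_read_tokens_by_provider": cache_reads,
    }
```

Splitting on `executed_by` matters for cost reporting, because provider-executed tools are typically billed by the provider rather than consuming local compute.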
- 🧩 MCP Tool Adapter (Stable) — Plug any Model Context Protocol server (Slack, GitHub, Postgres, Stripe, …) into a NucleusIQ Agent in one line. Supports stdio + Streamable HTTP + SSE, OAuth 2.1 / Bearer / Env auth, graceful degradation, health checks, and full source-attributed tracing. 98.68% coverage, 235 unit + 13 live integration tests.
- 🪝 Core `ExpandableTool` protocol — Any tool factory (like `MCPTool`) can expand into many `BaseTool` instances during `Agent.initialize()`, without the core knowing what MCP is.
- 🔭 `ToolCallRecord.source` — Telemetry now records the origin of every tool call (e.g. `mcp://server=github (path=A)`).
- 🛡️ Parallel-safe initialization — `Agent.initialize()` cleans up all expandable tools even when peers fail or the process is cancelled.

See CHANGELOG.md for the full release notes.
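The expand-then-cleanup behaviour in the last two bullets can be illustrated with a stand-alone asyncio sketch. The `Expandable` protocol, its `expand()` / `aclose()` methods, `FakeServer`, and `initialize()` are all hypothetical names chosen for the example, not the actual `ExpandableTool` interface. The point is the pattern: expand factories concurrently, and if any peer fails or the task is cancelled, close every factory before propagating the error.

```python
import asyncio
from typing import Protocol


class Expandable(Protocol):
    """Hypothetical shape of a tool factory; method names are assumptions."""

    async def expand(self) -> list[str]: ...

    async def aclose(self) -> None: ...


class FakeServer:
    """Stand-in for an MCP-style factory that expands into several tool names."""

    def __init__(self, name: str, tools: list[str], fail: bool = False) -> None:
        self.name, self.tools, self.fail = name, tools, fail
        self.closed = False

    async def expand(self) -> list[str]:
        if self.fail:
            raise ConnectionError(f"{self.name} unreachable")
        return [f"{self.name}.{tool}" for tool in self.tools]

    async def aclose(self) -> None:
        self.closed = True


async def _close_all(factories: list[Expandable]) -> None:
    # return_exceptions=True: one failing close must not skip the others.
    await asyncio.gather(*(f.aclose() for f in factories), return_exceptions=True)


async def initialize(factories: list[Expandable]) -> list[str]:
    try:
        # Expand concurrently; collect failures instead of aborting on the first one.
        results = await asyncio.gather(
            *(f.expand() for f in factories), return_exceptions=True
        )
    except asyncio.CancelledError:
        # Cancelled mid-initialization: release every factory, then propagate.
        await _close_all(factories)
        raise
    errors = [r for r in results if isinstance(r, BaseException)]
    if errors:
        # A peer failed: clean up all factories, including those that succeeded.
        await _close_all(factories)
        raise errors[0]
    # Flatten each factory's tools into one list.
    return [tool for batch in results for tool in batch]
```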
NucleusIQ is an open-source, agent-first Python framework for building AI agents that survive production — long runs, messy tool output, and real stakes — instead of falling apart the moment they leave the demo.
In one line:
Every AI agent works in the demo. Almost none survive production. NucleusIQ is the framework for the ones that have to — reliable, observable, and provider-portable.
Why "survive production"? Most agents work in a demo, then forget what they read, choke when tool output piles up, or fail silently with no explanation. NucleusIQ manages the context window before the model breaks, runs across any LLM provider, and returns a scorecard of exactly what happened on every run.
NucleusIQ is built on a simple belief:
An agent is not a single model call. An agent is a managed runtime with memory, tools, policy, streaming, structure, and responsibilities.
A shared doctrine for what NucleusIQ stands for, why it exists, and how it should evolve over time.
See NucleusIQ Philosophy.
```bash
pip install nucleusiq nucleusiq-openai
export OPENAI_API_KEY=sk-...
```

```python
import asyncio

from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.agents.task import Task
from nucleusiq.prompts.zero_shot import ZeroShotPrompt
from nucleusiq_openai import BaseOpenAI


async def main() -> None:
    agent = Agent(
        name="analyst",
        prompt=ZeroShotPrompt().configure(
            system="You are a concise assistant. Answer in one short sentence.",
        ),
        llm=BaseOpenAI(model_name="gpt-4o-mini"),
        config=AgentConfig(execution_mode=ExecutionMode.DIRECT),
    )
    await agent.initialize()
    result = await agent.execute(
        Task(id="hello-1", objective="What is the capital of France?"),
    )
    print(result.output)


asyncio.run(main())
```

See the Quickstart docs for provider setup, `.env` loading, tools, streaming, and structured output.
```bash
# Google Gemini
pip install nucleusiq nucleusiq-gemini
# Anthropic Claude
pip install nucleusiq nucleusiq-anthropic
# Groq inference
pip install nucleusiq nucleusiq-groq
# Ollama for local / remote models
pip install nucleusiq nucleusiq-ollama
# MCP tool adapter — plug any MCP server in as a tool
pip install nucleusiq-mcp nucleusiq-anthropic  # or any provider
# uv works too
uv pip install nucleusiq nucleusiq-openai
```

```python
import asyncio

from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.agents.task import Task
from nucleusiq.prompts.zero_shot import ZeroShotPrompt
from nucleusiq_anthropic import BaseAnthropic
from nucleusiq_mcp import MCPTool


async def main() -> None:
    agent = Agent(
        name="researcher",
        prompt=ZeroShotPrompt().configure(
            system="You are a careful research assistant. Cite source ids when available.",
        ),
        llm=BaseAnthropic(model_name="claude-sonnet-4-5-20250929", async_mode=True),
        tools=[
            # Transport is auto-detected from URL / command; auth is auto-wired.
            MCPTool("npx -y @modelcontextprotocol/server-github"),
            MCPTool("https://mcp.slack.com/api", auth="xoxb-..."),
        ],
        config=AgentConfig(execution_mode=ExecutionMode.STANDARD, enable_tracing=True),
    )
    await agent.initialize()  # connects to MCP servers and discovers tools
    result = await agent.execute(
        Task(
            id="repo-summary",
            objective="Summarise the last 5 issues in nucleusbox/NucleusIQ.",
        ),
    )
    print(result.output)


asyncio.run(main())
```

See INSTALLATION.md for full setup instructions (pip, uv, development mode).
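The comment in the MCP example says the transport is picked from the URL or command. One plausible heuristic is sketched below. `detect_transport()`, `auth_headers()`, the `/sse` path convention, and the Bearer mapping are assumptions for illustration only; this is not necessarily how `nucleusiq-mcp` decides.

```python
from typing import Optional
from urllib.parse import urlparse


def detect_transport(target: str) -> str:
    """Guess an MCP transport from the string passed to the tool (hypothetical)."""
    parsed = urlparse(target)
    if parsed.scheme in ("http", "https"):
        # Assumption: a path ending in /sse means the legacy SSE transport;
        # any other HTTP(S) endpoint is treated as Streamable HTTP.
        if parsed.path.rstrip("/").endswith("/sse"):
            return "sse"
        return "streamable_http"
    # Anything that is not an HTTP(S) URL is treated as a stdio command line.
    return "stdio"


def auth_headers(auth: Optional[str]) -> dict[str, str]:
    """Assumption: a bare string token is sent as a Bearer credential."""
    return {"Authorization": f"Bearer {auth}"} if auth else {}
```

With this heuristic, `"npx -y @modelcontextprotocol/server-github"` maps to `stdio` and `"https://mcp.slack.com/api"` to `streamable_http`.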
| Component | What it does |
|---|---|
| 3 Execution Modes | DIRECT (single call), STANDARD (tool loop), AUTONOMOUS (orchestration + validation + retry) |
| Streaming | execute_stream() — real-time token-by-token output with tool call visibility across all modes |
| 7 Prompt Techniques | ZeroShot, FewShot, ChainOfThought, AutoCoT, RAG, PromptComposer, MetaPrompt |
| Multimodal Attachments | 7 attachment types (text, PDF, images, files) with provider-native optimisation |
| Built-in File Tools | FileReadTool, FileSearchTool, DirectoryListTool, FileExtractTool — sandboxed to workspace |
| Tool System | BaseTool interface + @tool decorator + provider native tools (OpenAI: code_interpreter, file_search, web_search; Gemini: Google Search, Code Execution, URL Context, Maps; Anthropic: web_search, web_fetch, code_execution + extended thinking) |
| MCP Tool Adapter | Connect any Model Context Protocol server (Slack, GitHub, Postgres, Stripe, …) as native tools — stdio + Streamable HTTP + SSE; OAuth/Bearer/Env auth |
| Memory | 5 strategies (full history, sliding window, summary, summary+window, token budget) with file-aware metadata |
| Plugins | 10 built-in: call limits, retry, fallback, PII guard, human approval, tool guard, attachment guard, context window, result validator |
| Usage Tracking | Token usage per call with purpose tagging (main, planning, tool loop, critic, refiner) and cost estimation |
| Structured Output | Schema-based output parsing with Pydantic, dataclass, TypedDict support |
| Observability | ExecutionTracer records every model call + tool call with source attribution (e.g. mcp://server=github) |
| Provider Portability | Swap providers (OpenAI, Gemini, Anthropic, Groq, Ollama, …) with one line — same agent code, same tools, same plugins |
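Two of the five memory strategies in the table are easy to sketch in plain Python: a sliding window that keeps the last *k* messages, and a token budget that keeps the newest messages that fit. This is a minimal illustration assuming a crude ~4 characters-per-token estimate; `Message`, `sliding_window()`, and `token_budget()` are hypothetical names, not NucleusIQ's memory classes.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token. Real tokenizers differ.
    return max(1, len(text) // 4)


def sliding_window(history: list[Message], k: int) -> list[Message]:
    """Keep only the last k messages (guard k <= 0, since history[-0:] is everything)."""
    return history[-k:] if k > 0 else []


def token_budget(history: list[Message], budget: int) -> list[Message]:
    """Keep the newest messages whose estimated tokens fit within budget."""
    kept: list[Message] = []
    used = 0
    for msg in reversed(history):  # walk newest to oldest
        cost = estimate_tokens(msg.content)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

A token budget degrades more gracefully than a fixed window when message sizes vary a lot, which is typical once tool output lands in history.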
Tool-heavy agents fail when every tool result stays in the active prompt forever. NucleusIQ treats context as a managed runtime resource:
- `ContextEngine.prepare()` runs before LLM calls, not after the provider rejects an oversized prompt.
- `ContextLedger` tracks prompt regions (system, user, assistant, tool calls, tool results) so the framework can compact the right thing first.
- Large tool results can be masked / offloaded while staying recoverable through recall.
- `AgentResult.context_telemetry` reports peak utilization, compaction events, tokens saved, and estimated savings.
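The behaviour described above (prepare before the call, track regions, mask the largest tool results first, keep them recallable, report telemetry) can be sketched in a few dozen lines. `MiniContext`, `Region`, `Telemetry`, and the 4-characters-per-token estimate are assumptions made for this illustration; they are not the `ContextEngine` / `ContextLedger` API.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    # Crude ~4 characters per token; a real engine would use the provider tokenizer.
    return max(1, len(text) // 4)


@dataclass
class Region:
    kind: str      # "system", "user", "assistant", "tool_call" or "tool_result"
    text: str
    key: str = ""  # recall key, used for tool results


@dataclass
class Telemetry:
    peak_tokens: int = 0
    compactions: int = 0
    tokens_saved: int = 0


class MiniContext:
    """Hypothetical sketch: mask the largest tool results before the prompt overflows."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.regions: list[Region] = []
        self.offloaded: dict[str, str] = {}  # recall store for masked results
        self.telemetry = Telemetry()

    def add(self, region: Region) -> None:
        self.regions.append(region)

    def size(self) -> int:
        return sum(estimate_tokens(r.text) for r in self.regions)

    def prepare(self) -> list[Region]:
        """Run before each LLM call: compact until the prompt fits the budget."""
        self.telemetry.peak_tokens = max(self.telemetry.peak_tokens, self.size())
        # Only tool results that are not yet masked are candidates, biggest first.
        candidates = sorted(
            (r for r in self.regions if r.kind == "tool_result" and r.key not in self.offloaded),
            key=lambda r: estimate_tokens(r.text),
            reverse=True,
        )
        for region in candidates:
            if self.size() <= self.budget:
                break
            placeholder = f"[masked tool result; recall('{region.key}')]"
            before = estimate_tokens(region.text)
            if estimate_tokens(placeholder) >= before:
                continue  # masking would not save anything
            self.offloaded[region.key] = region.text
            region.text = placeholder
            self.telemetry.compactions += 1
            self.telemetry.tokens_saved += before - estimate_tokens(placeholder)
        return self.regions

    def recall(self, key: str) -> str:
        """Restore a masked tool result by key."""
        return self.offloaded[key]
```

Masking the biggest tool results first tends to free the most room per compaction while leaving system and user turns untouched.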
See the context management guide and the observability guide.
Try the runnable Agent Engineering Challenge 01 to test context pressure, noisy tool outputs, and evidence quality on a concrete task.
NucleusIQ agents use the Gearbox Strategy — three execution modes that scale from simple chat to autonomous reasoning:
| Capability | Direct | Standard | Autonomous |
|---|---|---|---|
| Memory | Yes | Yes | Yes |
| Plugins | Yes | Yes | Yes |
| Tools | Yes (max 25) | Yes (max 80) | Yes (max 300) |
| Tool loop | Yes | Yes | Yes |
| Task decomposition | No | No | Yes |
| Independent verification (Critic) | No | No | Yes |
| Targeted correction (Refiner) | No | No | Yes |
| Validation pipeline | No | No | Yes |
Tool limits are configurable via AgentConfig(max_tool_calls=N). The framework validates tool count at agent creation and raises a clear error if the limit is exceeded.
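A minimal sketch of the construction-time check described above, assuming the per-mode defaults from the table and treating `override` as a stand-in for `AgentConfig(max_tool_calls=N)`. `Mode`, `DEFAULT_TOOL_LIMITS`, `ToolLimitError`, and `validate_tool_count()` are hypothetical; NucleusIQ's real check and error type may differ.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    DIRECT = "direct"
    STANDARD = "standard"
    AUTONOMOUS = "autonomous"


# Default caps copied from the Gearbox table; `override` stands in for
# AgentConfig(max_tool_calls=N).
DEFAULT_TOOL_LIMITS = {Mode.DIRECT: 25, Mode.STANDARD: 80, Mode.AUTONOMOUS: 300}


class ToolLimitError(ValueError):
    """Raised at construction time when too many tools are registered."""


def validate_tool_count(mode: Mode, tools: list[str], override: Optional[int] = None) -> int:
    # An explicit override wins; otherwise fall back to the mode's default cap.
    limit = override if override is not None else DEFAULT_TOOL_LIMITS[mode]
    if len(tools) > limit:
        raise ToolLimitError(
            f"{mode.name} mode allows at most {limit} tools, got {len(tools)}; "
            "raise the limit or switch to a higher gear."
        )
    return limit
```

Failing at construction rather than mid-run means a misconfigured agent never spends tokens before the problem surfaces.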
```python
from nucleusiq.agents.config import AgentConfig, ExecutionMode

# Direct: fast Q&A, simple lookups (max 25 tool calls)
AgentConfig(execution_mode=ExecutionMode.DIRECT)

# Standard (default): multi-step tool workflows (max 80 tool calls)
AgentConfig(execution_mode=ExecutionMode.STANDARD)

# Autonomous: orchestration + Critic/Refiner verification (max 300 tool calls)
AgentConfig(execution_mode=ExecutionMode.AUTONOMOUS)
```

See the PE Due Diligence notebook for a real-world demo of Autonomous mode achieving 100% accuracy on 8 complex financial analyses with external validation.
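The Autonomous gear's verify-and-correct step can be pictured as a small loop: execute each subtask, let an independent critic judge the answer, and hand rejected answers to a refiner together with the critic's feedback. This sketch assumes decomposition has already produced `subtasks` and that `execute`, `critic`, and `refine` are plain callables (in practice each would be an LLM call); `Verdict` and `autonomous_run()` are hypothetical names, not NucleusIQ's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    ok: bool
    feedback: str = ""


def autonomous_run(
    subtasks: list[str],
    execute: Callable[[str], str],
    critic: Callable[[str, str], Verdict],
    refine: Callable[[str, str, str], str],
    max_rounds: int = 2,
) -> dict[str, str]:
    """Execute each subtask, let a critic check it, and refine on rejection."""
    results: dict[str, str] = {}
    for task in subtasks:
        answer = execute(task)
        for _ in range(max_rounds):
            verdict = critic(task, answer)  # independent check of the answer
            if verdict.ok:
                break
            # Targeted correction: the refiner sees the critic's feedback.
            answer = refine(task, answer, verdict.feedback)
        # Note: if every round is rejected, the last refinement is returned unverified.
        results[task] = answer
    return results
```

Bounding `max_rounds` keeps a stubborn critic from looping forever, at the cost of occasionally returning an answer it never approved.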
NucleusIQ ships as a core framework + thin provider/tool packages. Install only what you need — every package can be added or removed independently.
| Package | Status | Version | Description |
|---|---|---|---|
| `nucleusiq` | 🟢 Stable | 0.7.12 | Core framework: agents, prompts, tools, memory, plugins, modes, tracing |
| Package | Status | Version | Description |
|---|---|---|---|
| `nucleusiq-openai` | 🟢 Stable | 0.7.0 | OpenAI (gpt-4o, o-series); Responses API + Chat Completions; native code_interpreter, file_search, web_search; now surfaces `server_tool_calls` for tracer-side cost split |
| `nucleusiq-gemini` | 🟢 Stable | 0.3.0 | Google Gemini; native Google Search + Code Execution emitted as `ToolCallRecord(executed_by="provider")`; URL Context, Maps grounding |
| `nucleusiq-anthropic` | 🟢 Stable | 0.2.0 | Anthropic Claude (Messages API); native server tools (`AnthropicTool.web_search()` / `web_fetch()` / `code_execution()`, with the `anthropic-beta` header set automatically), prompt caching (`cache_tools` / `cache_system`), extended thinking (`thinking="low"`, `"medium"`, `"high"`, or `"max"`), server-tool observability · README |
| Package | Status | Version | Description |
|---|---|---|---|
| `nucleusiq-groq` | 🟢 Stable | 0.1.0 | Groq inference (Chat Completions) via the official `groq` SDK; hosted-tool observability stub (`message.executed_tools` → `server_tool_calls`) · README · Guide |
| `nucleusiq-ollama` | 🟢 Stable | 0.2.0 | Local/remote Ollama via the official `ollama` SDK; vision wire for OpenAI-style multimodal messages; structured output, `think` pass-through · README · Guide |
| Package | Status | Version | Description |
|---|---|---|---|
| `nucleusiq-mcp` | 🟢 Stable | 0.1.0 | Model Context Protocol adapter: turn any MCP server (Slack, GitHub, Postgres, Stripe, …) into NucleusIQ tools; stdio + Streamable HTTP + SSE; OAuth 2.1 / Bearer / Env auth · README · Guide |
Maturity legend: 🟢 Stable (production-ready, SemVer guarantees). Future pre-release packages may use 🟡 Beta / 🟠 Alpha while they mature.
```text
src/
  nucleusiq/core/        # Core framework (agents, prompts, tools, memory, plugins, modes, tracing)
  providers/
    llms/
      openai/            # nucleusiq-openai
      gemini/            # nucleusiq-gemini
      anthropic/         # nucleusiq-anthropic
    inference/
      groq/              # nucleusiq-groq
      ollama/            # nucleusiq-ollama
    tools/
      mcp/               # nucleusiq-mcp (Model Context Protocol adapter)
notebooks/agents/        # Example notebooks (PE due diligence, MCP showcase, …)
docs/                    # Internal design/strategy docs (published docs live in nucleusiq-docs)
scripts/                 # Repo-wide tooling (e.g. verify_core_package_layout.py)
```
```bash
# Monorepo: verify core setuptools packages + all Hatch provider/tool wheel roots
python scripts/verify_core_package_layout.py

# Core tests (1,795+ passing)
cd src/nucleusiq && python -m pytest tests/ -q

# OpenAI provider tests (224 passing)
cd src/providers/llms/openai && python -m pytest tests/ -q

# Gemini provider unit tests (221 passing)
cd src/providers/llms/gemini && python -m pytest tests/unit/ -q

# Anthropic provider tests (>=95% coverage gate)
cd src/providers/llms/anthropic && python -m pytest tests/ -q

# Groq provider tests (requires dev group / uv; >=90% coverage gate)
cd src/providers/inference/groq && uv run pytest -q

# Ollama provider tests (>=95% coverage gate; 100% line coverage on package)
cd src/providers/inference/ollama && uv run pytest -q

# MCP tool adapter — unit (235 passing; 98.68% coverage; >=90% gate)
cd src/providers/tools/mcp && python -m pytest tests/unit/ -q -m "not integration"

# MCP tool adapter — live integration (requires Node.js + npx)
cd src/providers/tools/mcp && python -m pytest tests/integration/ -m integration -v

# Gemini integration tests (requires GEMINI_API_KEY)
cd src/providers/llms/gemini && python -m pytest tests/integration/ -q
```

- Published docs — https://nucleusbox.github.io/nucleusiq-docs/
- Docs repository — https://github.com/nucleusbox/nucleusiq-docs
- INSTALLATION.md — Setup instructions (pip, uv, development)
- CHANGELOG.md — Release notes
- RELEASE.md — Release process and branching strategy
- v0.7.12 release notes — latest stable release summary
- Provider guides — OpenAI, Gemini, Anthropic, Groq, Ollama, MCP
- MCP integration guide — MCP adapter usage
- File handling guide — Attachment vs Tool vs Both decision guide
- Fork the repository
- Create a branch: `git checkout -b yourname/my-feature main`
- Make your changes and add tests
- Submit a pull request to `main`
See CONTRIBUTING.md for full details, coding standards, and the dev-setup walkthrough.
- 🐛 Bugs & feature requests — GitHub Issues
- 💬 Questions & ideas — GitHub Discussions
- ⭐ If NucleusIQ is useful to you, please consider starring the repo — it helps a lot.
MIT © Nucleusbox