GitHub - nucleusbox/NucleusIQ: NucleusIQ is an open-source Agent first framework

Agents that survive production.

Open-source, agent-first Python framework for building AI agents that hold up under real, long, tool-heavy work — reliable, observable, and portable across LLM providers. Not a coding copilot. Not a chatbot. The framework you build serious agents with.

Quick Start · Features · Execution Modes · Packages · Docs · Changelog · Release Notes · Agent coverage

If NucleusIQ helps you build maintainable Python agents, please ⭐ star the repo so other engineers can find it while the project is still early.

🗺️ Agent coverage map

NucleusIQ Agent Coverage — what we ship today vs planned (May 2026, audited on v0.7.12)

Interactive scorecard on Nucleusbox: every major agent type (research, multi-agent, coding, and more) with support today vs planned (v0.8–v0.9) — radars, category tables, 28-pattern matrix, and gap analysis in one page.

Today (shipped)	Planned (roadmap)
3 Gearbox modes (Direct / Standard / Autonomous)	Agent-as-Tool, structured sub-agent handoff
Context engine + telemetry	A2A (thin adapter)
Parallel sub-agents + Critic/Refiner	Public benchmark scorecards & Context Report
File tools + MCP 0.1.0 Stable	Optional shell tool for coding/terminal benches

Best fit now: research and analysis agents. Honest gaps: no LLM-callable AgentTool yet, no published benchmark proof, sub-agent synthesis capped at 2K chars.

Why Star NucleusIQ?

Agent-first Python runtime — build with Agent, Task, @tool, and typed results instead of chains, graphs, or a custom DSL.
Stable provider ecosystem — OpenAI, Gemini, Anthropic, Groq, Ollama, MCP, and Mock LLM aligned on nucleusiq>=0.7.12.
Context management before overflow — NucleusIQ can compact / mask / recall before the LLM API rejects an oversized prompt.
Production posture — 3,700+ tests across the monorepo, provider-agnostic observability, usage tracking, plugins, and clear execution modes.

✨ What's New

nucleusiq 0.7.12 — May 2026 Coordinated multi-package release promoting every alpha/beta provider to its first stable line: nucleusiq-anthropic 0.2.0 Stable (Phase B feature-complete: native server tools, prompt caching, extended thinking, server-tool observability) · nucleusiq-ollama 0.2.0 Stable (vision wire + provider="ollama" enrichment) · nucleusiq-groq 0.1.0 Stable (hosted-tool observability stub + enrichment) · nucleusiq-mcp 0.1.0 Stable (drops b1; no API changes) · nucleusiq-openai 0.7.0 + nucleusiq-gemini 0.3.0 (native-tool observability + enrichment). Plus cross-cutting provider-agnostic native-tool observability in core: ToolCallRecord.executed_by ∈ {"local","provider"}, LLMCallRecord.provider / request_id / organization_id / stop_reason / cache_read_input_tokens / cache_creation_input_tokens / metadata.

🧩 MCP Tool Adapter (Stable) — Plug any Model Context Protocol server (Slack, GitHub, Postgres, Stripe, …) into a NucleusIQ Agent in one line. Supports stdio + Streamable HTTP + SSE, OAuth 2.1 / Bearer / Env auth, graceful degradation, health checks, and full source-attributed tracing. 98.68% coverage, 235 unit + 13 live integration tests.

🪝 Core ExpandableTool protocol — Any tool factory (like MCPTool) can expand into many BaseTool instances during Agent.initialize() — without the core knowing what MCP is.

🔭 ToolCallRecord.source — Telemetry now records the origin of every tool call (e.g. mcp://server=github (path=A)).

🛡️ Parallel-safe initialization — Agent.initialize() cleans up all expandable tools even when peers fail or the process is cancelled.

See CHANGELOG.md for the full release notes.

What Is NucleusIQ?

NucleusIQ is an open-source, agent-first Python framework for building AI agents that survive production — long runs, messy tool output, and real stakes — instead of falling apart the moment they leave the demo.

In one line:

Every AI agent works in the demo. Almost none survive production. NucleusIQ is the framework for the ones that have to — reliable, observable, and provider-portable.

Why "survive production"? Most agents work in a demo, then forget what they read, choke when tool output piles up, or fail silently with no explanation. NucleusIQ manages the context window before the model breaks, runs across any LLM provider, and returns a scorecard of exactly what happened on every run.

NucleusIQ is built on a simple belief:

An agent is not a single model call. An agent is a managed runtime with memory, tools, policy, streaming, structure, and responsibilities.

NucleusIQ Philosophy

A shared doctrine for what NucleusIQ stands for, why it exists, and how it should evolve over time.

See NucleusIQ Philosophy.

🚀 Quick Start

Fastest path

pip install nucleusiq nucleusiq-openai
export OPENAI_API_KEY=sk-...

Hello agent

import asyncio

from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.agents.task import Task
from nucleusiq.prompts.zero_shot import ZeroShotPrompt
from nucleusiq_openai import BaseOpenAI


async def main() -> None:
    agent = Agent(
        name="analyst",
        prompt=ZeroShotPrompt().configure(
            system="You are a concise assistant. Answer in one short sentence.",
        ),
        llm=BaseOpenAI(model_name="gpt-4o-mini"),
        config=AgentConfig(execution_mode=ExecutionMode.DIRECT),
    )

    await agent.initialize()
    result = await agent.execute(
        Task(id="hello-1", objective="What is the capital of France?"),
    )
    print(result.output)


asyncio.run(main())

See the Quickstart docs for provider setup, .env loading, tools, streaming, and structured output.

Install other stable packages

# Google Gemini
pip install nucleusiq nucleusiq-gemini

# Anthropic Claude
pip install nucleusiq nucleusiq-anthropic

# Groq inference
pip install nucleusiq nucleusiq-groq

# Ollama for local / remote models
pip install nucleusiq nucleusiq-ollama

# MCP tool adapter — plug any MCP server in as a tool
pip install nucleusiq-mcp nucleusiq-anthropic   # or any provider

# uv works too
uv pip install nucleusiq nucleusiq-openai

Hello agent + MCP tools

import asyncio

from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.agents.task import Task
from nucleusiq.prompts.zero_shot import ZeroShotPrompt
from nucleusiq_anthropic import BaseAnthropic
from nucleusiq_mcp import MCPTool


async def main() -> None:
    agent = Agent(
        name="researcher",
        prompt=ZeroShotPrompt().configure(
            system="You are a careful research assistant. Cite source ids when available.",
        ),
        llm=BaseAnthropic(model_name="claude-sonnet-4-5-20250929", async_mode=True),
        tools=[
            # Transport is auto-detected from URL / command; auth is auto-wired.
            MCPTool("npx -y @modelcontextprotocol/server-github"),
            MCPTool("https://mcp.slack.com/api", auth="xoxb-..."),
        ],
        config=AgentConfig(execution_mode=ExecutionMode.STANDARD, enable_tracing=True),
    )

    await agent.initialize()  # connects to MCP servers and discovers tools
    result = await agent.execute(
        Task(
            id="repo-summary",
            objective="Summarise the last 5 issues in nucleusbox/NucleusIQ.",
        ),
    )
    print(result.output)


asyncio.run(main())

See INSTALLATION.md for full setup instructions (pip, uv, development mode).

🧩 What's Inside

Component	What it does
3 Execution Modes	`DIRECT` (single call), `STANDARD` (tool loop), `AUTONOMOUS` (orchestration + validation + retry)
Streaming	`execute_stream()` — real-time token-by-token output with tool call visibility across all modes
7 Prompt Techniques	ZeroShot, FewShot, ChainOfThought, AutoCoT, RAG, PromptComposer, MetaPrompt
Multimodal Attachments	7 attachment types (text, PDF, images, files) with provider-native optimisation
Built-in File Tools	`FileReadTool`, `FileSearchTool`, `DirectoryListTool`, `FileExtractTool` — sandboxed to workspace
Tool System	`BaseTool` interface + `@tool` decorator + provider native tools (OpenAI: code_interpreter, file_search, web_search; Gemini: Google Search, Code Execution, URL Context, Maps; Anthropic: web_search, web_fetch, code_execution + extended thinking)
MCP Tool Adapter	Connect any Model Context Protocol server (Slack, GitHub, Postgres, Stripe, …) as native tools — stdio + Streamable HTTP + SSE; OAuth/Bearer/Env auth
Memory	5 strategies (full history, sliding window, summary, summary+window, token budget) with file-aware metadata
Plugins	10 built-in: call limits, retry, fallback, PII guard, human approval, tool guard, attachment guard, context window, result validator
Usage Tracking	Token usage per call with purpose tagging (main, planning, tool loop, critic, refiner) and cost estimation
Structured Output	Schema-based output parsing with Pydantic, dataclass, TypedDict support
Observability	`ExecutionTracer` records every model call + tool call with `source` attribution (e.g. `mcp://server=github`)
Provider Portability	Swap providers (OpenAI, Gemini, Anthropic, Groq, Ollama, …) with one line — same agent code, same tools, same plugins

🧠 Context Management

Tool-heavy agents fail when every tool result stays in the active prompt forever. NucleusIQ treats context as a managed runtime resource:

ContextEngine.prepare() runs before LLM calls, not after the provider rejects an oversized prompt.
ContextLedger tracks prompt regions (system, user, assistant, tool calls, tool results) so the framework can compact the right thing first.
Large tool results can be masked / offloaded while staying recoverable through recall.
AgentResult.context_telemetry reports peak utilization, compaction events, tokens saved, and estimated savings.

See the context management guide and the observability guide.

Try the runnable Agent Engineering Challenge 01 to test context pressure, noisy tool outputs, and evidence quality on a concrete task.

⚙️ Execution Modes

NucleusIQ agents use the Gearbox Strategy — three execution modes that scale from simple chat to autonomous reasoning:

Capability	Direct	Standard	Autonomous
Memory	Yes	Yes	Yes
Plugins	Yes	Yes	Yes
Tools	Yes (max 25)	Yes (max 80)	Yes (max 300)
Tool loop	Yes	Yes	Yes
Task decomposition	No	No	Yes
Independent verification (Critic)	No	No	Yes
Targeted correction (Refiner)	No	No	Yes
Validation pipeline	No	No	Yes

Tool limits are configurable via AgentConfig(max_tool_calls=N). The framework validates tool count at agent creation and raises a clear error if the limit is exceeded.

# Direct: fast Q&A, simple lookups (max 25 tool calls)
AgentConfig(execution_mode=ExecutionMode.DIRECT)

# Standard: multi-step tool workflows (max 80 tool calls) — default
AgentConfig(execution_mode=ExecutionMode.STANDARD)

# Autonomous: orchestration + Critic/Refiner verification (max 300 tool calls)
AgentConfig(execution_mode=ExecutionMode.AUTONOMOUS)

See the PE Due Diligence notebook for a real-world demo of Autonomous mode achieving 100% accuracy on 8 complex financial analyses with external validation.

📦 Packages & Ecosystem

NucleusIQ ships as a core framework + thin provider/tool packages. Install only what you need — every package can be added or removed independently.

Core

Package	Status	Version	Description
`nucleusiq`	🟢 Stable	`0.7.12`	Core framework: agents, prompts, tools, memory, plugins, modes, tracing

LLM Providers

Package	Status	Version	Description
`nucleusiq-openai`	🟢 Stable	`0.7.0`	OpenAI (gpt-4o, o-series); Responses API + Chat Completions; native `code_interpreter`, `file_search`, `web_search` — now surfaces `server_tool_calls` for tracer-side cost split
`nucleusiq-gemini`	🟢 Stable	`0.3.0`	Google Gemini; native Google Search + Code Execution emitted as `ToolCallRecord(executed_by="provider")`; URL Context, Maps grounding
`nucleusiq-anthropic`	🟢 Stable	`0.2.0`	Anthropic Claude (Messages API); native server tools (`AnthropicTool.web_search()` / `web_fetch()` / `code_execution()` w/ auto-`anthropic-beta`), prompt caching (`cache_tools` / `cache_system`), extended thinking (`thinking="low"\|"medium"\|"high"\|"max"`), server-tool observability · README

Inference Backends

Package	Status	Version	Description
`nucleusiq-groq`	🟢 Stable	`0.1.0`	Groq inference (Chat Completions) via official `groq` SDK; hosted-tool observability stub (`message.executed_tools` → `server_tool_calls`) · README · Guide
`nucleusiq-ollama`	🟢 Stable	`0.2.0`	Local/remote Ollama via official `ollama` SDK; vision wire for OpenAI-style multimodal messages; structured output, `think` pass-through · README · Guide

Tool Adapters

Package	Status	Version	Description
`nucleusiq-mcp`	🟢 Stable	`0.1.0`	Model Context Protocol adapter — turn any MCP server (Slack, GitHub, Postgres, Stripe, …) into NucleusIQ tools; stdio + Streamable HTTP + SSE; OAuth 2.1 / Bearer / Env auth · README · Guide

Maturity legend: 🟢 Stable (production-ready, SemVer guarantees). Future pre-release packages may use 🟡 Beta / 🟠 Alpha while they mature.

🗂️ Project Structure

src/
  nucleusiq/core/                # Core framework (agents, prompts, tools, memory, plugins, modes, tracing)
  providers/
    llms/
      openai/                    # nucleusiq-openai
      gemini/                    # nucleusiq-gemini
      anthropic/                 # nucleusiq-anthropic
    inference/
      groq/                      # nucleusiq-groq
      ollama/                    # nucleusiq-ollama
    tools/
      mcp/                       # nucleusiq-mcp (Model Context Protocol adapter)
notebooks/agents/                # Example notebooks (PE due diligence, MCP showcase, …)
docs/                            # Internal design/strategy docs (published docs live in nucleusiq-docs)
scripts/                         # Repo-wide tooling (e.g. verify_core_package_layout.py)

🧪 Testing

# Monorepo: verify core setuptools packages + all Hatch provider/tool wheel roots
python scripts/verify_core_package_layout.py

# Core tests (1,795+ passing)
cd src/nucleusiq && python -m pytest tests/ -q

# OpenAI provider tests (224 passing)
cd src/providers/llms/openai && python -m pytest tests/ -q

# Gemini provider unit tests (221 passing)
cd src/providers/llms/gemini && python -m pytest tests/unit/ -q

# Anthropic provider tests (>=95% coverage gate)
cd src/providers/llms/anthropic && python -m pytest tests/ -q

# Groq provider tests (requires dev group / uv; >=90% coverage gate)
cd src/providers/inference/groq && uv run pytest -q

# Ollama provider tests (>=95% coverage gate; 100% line coverage on package)
cd src/providers/inference/ollama && uv run pytest -q

# MCP tool adapter — unit (235 passing; 98.68% coverage; >=90% gate)
cd src/providers/tools/mcp && python -m pytest tests/unit/ -q -m "not integration"

# MCP tool adapter — live integration (requires Node.js + npx)
cd src/providers/tools/mcp && python -m pytest tests/integration/ -m integration -v

# Gemini integration tests (requires GEMINI_API_KEY)
cd src/providers/llms/gemini && python -m pytest tests/integration/ -q

📚 Documentation

Published docs — https://nucleusbox.github.io/nucleusiq-docs/
Docs repository — https://github.com/nucleusbox/nucleusiq-docs
INSTALLATION.md — Setup instructions (pip, uv, development)
CHANGELOG.md — Release notes
RELEASE.md — Release process and branching strategy
v0.7.12 release notes — latest stable release summary
Provider guides — OpenAI, Gemini, Anthropic, Groq, Ollama, MCP
MCP integration guide — MCP adapter usage
File handling guide — Attachment vs Tool vs Both decision guide

🤝 Contributing

Fork the repository
Create a branch: git checkout -b yourname/my-feature main
Make your changes and add tests
Submit a pull request to main

See CONTRIBUTING.md for full details, coding standards, and the dev-setup walkthrough.

Get in touch

🐛 Bugs & feature requests — GitHub Issues
💬 Questions & ideas — GitHub Discussions
⭐ If NucleusIQ is useful to you, please consider starring the repo — it helps a lot.

Name		Name	Last commit message	Last commit date
Latest commit History 194 Commits
.github		.github
assets/images		assets/images
notebooks/agents		notebooks/agents
scripts		scripts
src		src
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
CONTRIBUTION_OPPORTUNITIES.md		CONTRIBUTION_OPPORTUNITIES.md
INSTALLATION.md		INSTALLATION.md
LICENSE		LICENSE
NucleusIQ_Philosphy.md		NucleusIQ_Philosphy.md
README.md		README.md
RELEASE.md		RELEASE.md
RELEASE_PLAN_v070.md		RELEASE_PLAN_v070.md
ruff.toml		ruff.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Agents that survive production.

🗺️ Agent coverage map

Why Star NucleusIQ?

✨ What's New

What Is NucleusIQ?

NucleusIQ Philosophy

🚀 Quick Start

Fastest path

Hello agent

Install other stable packages

Hello agent + MCP tools

🧩 What's Inside

🧠 Context Management

⚙️ Execution Modes

📦 Packages & Ecosystem

Core

LLM Providers

Inference Backends

Tool Adapters

🗂️ Project Structure

🧪 Testing

📚 Documentation

🤝 Contributing

Get in touch

📄 License

About

Uh oh!

Releases 21

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Agents that survive production.

🗺️ Agent coverage map

Why Star NucleusIQ?

✨ What's New

What Is NucleusIQ?

NucleusIQ Philosophy

🚀 Quick Start

Fastest path

Hello agent

Install other stable packages

Hello agent + MCP tools

🧩 What's Inside

🧠 Context Management

⚙️ Execution Modes

📦 Packages & Ecosystem

Core

LLM Providers

Inference Backends

Tool Adapters

🗂️ Project Structure

🧪 Testing

📚 Documentation

🤝 Contributing

Get in touch

📄 License

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 21

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages