nucleusbox/NucleusIQ

Agents that survive production.

Open-source, agent-first Python framework for building AI agents that hold up under real, long, tool-heavy work — reliable, observable, and portable across LLM providers. Not a coding copilot. Not a chatbot. The framework you build serious agents with.

Quick Start · Features · Execution Modes · Packages · Docs · Changelog · Release Notes · Agent coverage


If NucleusIQ helps you build maintainable Python agents, please ⭐ star the repo so other engineers can find it while the project is still early.

🗺️ Agent coverage map

NucleusIQ Agent Coverage — what we ship today vs planned (May 2026, audited on v0.7.12)

Interactive scorecard on Nucleusbox: every major agent type (research, multi-agent, coding, and more) with support today vs planned (v0.8–v0.9) — radars, category tables, 28-pattern matrix, and gap analysis in one page.

| Today (shipped) | Planned (roadmap) |
| --- | --- |
| 3 Gearbox modes (Direct / Standard / Autonomous) | Agent-as-Tool, structured sub-agent handoff |
| Context engine + telemetry | A2A (thin adapter) |
| Parallel sub-agents + Critic/Refiner | Public benchmark scorecards & Context Report |
| File tools + MCP 0.1.0 Stable | Optional shell tool for coding/terminal benches |

Best fit now: research and analysis agents. Honest gaps: no LLM-callable AgentTool yet, no published benchmark proof, sub-agent synthesis capped at 2K chars.

Why Star NucleusIQ?

  • Agent-first Python runtime — build with Agent, Task, @tool, and typed results instead of chains, graphs, or a custom DSL.
  • Stable provider ecosystem — OpenAI, Gemini, Anthropic, Groq, Ollama, MCP, and Mock LLM aligned on nucleusiq>=0.7.12.
  • Context management before overflow — NucleusIQ can compact / mask / recall before the LLM API rejects an oversized prompt.
  • Production posture — 3,700+ tests across the monorepo, provider-agnostic observability, usage tracking, plugins, and clear execution modes.

✨ What's New

nucleusiq 0.7.12 (May 2026)

Coordinated multi-package release promoting every alpha/beta provider to its first stable line:

  • nucleusiq-anthropic 0.2.0 Stable (Phase B feature-complete: native server tools, prompt caching, extended thinking, server-tool observability)
  • nucleusiq-ollama 0.2.0 Stable (vision wire + provider="ollama" enrichment)
  • nucleusiq-groq 0.1.0 Stable (hosted-tool observability stub + enrichment)
  • nucleusiq-mcp 0.1.0 Stable (drops b1; no API changes)
  • nucleusiq-openai 0.7.0 + nucleusiq-gemini 0.3.0 (native-tool observability + enrichment)

The core package also gains cross-cutting, provider-agnostic native-tool observability: ToolCallRecord.executed_by ∈ {"local","provider"}, and LLMCallRecord.provider / request_id / organization_id / stop_reason / cache_read_input_tokens / cache_creation_input_tokens / metadata.

  • 🧩 MCP Tool Adapter (Stable) — Plug any Model Context Protocol server (Slack, GitHub, Postgres, Stripe, …) into a NucleusIQ Agent in one line. Supports stdio + Streamable HTTP + SSE, OAuth 2.1 / Bearer / Env auth, graceful degradation, health checks, and full source-attributed tracing. 98.68% coverage, 235 unit + 13 live integration tests.
  • 🪝 Core ExpandableTool protocol — Any tool factory (like MCPTool) can expand into many BaseTool instances during Agent.initialize() — without the core knowing what MCP is.
  • 🔭 ToolCallRecord.source — Telemetry now records the origin of every tool call (e.g. mcp://server=github (path=A)).
  • 🛡️ Parallel-safe initialization: Agent.initialize() cleans up all expandable tools even when peers fail or the process is cancelled.
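
As a rough illustration of the expansion and cleanup behaviour described in the bullets above, the sketch below shows a factory expanding into several tools during initialization, with every factory closed if any peer fails or the run is cancelled. It is a standalone stand-in, not NucleusIQ code: `Expandable`, `SimpleTool`, `FakeServerTools`, and `initialize_tools` are hypothetical names, and the real ExpandableTool protocol may look different.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a concrete tool instance."""
    name: str
    source: str  # origin label, recorded for telemetry


@runtime_checkable
class Expandable(Protocol):
    """Hypothetical protocol: a factory that expands into many tools."""

    async def expand(self) -> list[SimpleTool]: ...
    async def close(self) -> None: ...


class FakeServerTools:
    """Pretends to connect to a tool server and list its tools."""

    def __init__(self, server: str, tool_names: list[str], fail: bool = False):
        self.server = server
        self.tool_names = tool_names
        self.fail = fail
        self.closed = False

    async def expand(self) -> list[SimpleTool]:
        await asyncio.sleep(0)  # simulate network I/O
        if self.fail:
            raise ConnectionError(f"cannot reach {self.server}")
        return [SimpleTool(n, f"mcp://server={self.server}") for n in self.tool_names]

    async def close(self) -> None:
        self.closed = True


async def _close_all(factories: list[Expandable]) -> None:
    # Errors raised while closing are collected, not re-raised.
    await asyncio.gather(*(f.close() for f in factories), return_exceptions=True)


async def initialize_tools(items: list[object]) -> list[SimpleTool]:
    """Expand every factory concurrently; close all of them if any fails."""
    factories = [i for i in items if isinstance(i, Expandable)]
    plain = [i for i in items if isinstance(i, SimpleTool)]
    try:
        results = await asyncio.gather(
            *(f.expand() for f in factories), return_exceptions=True
        )
    except asyncio.CancelledError:
        await _close_all(factories)  # cancelled mid-flight: still release everything
        raise
    errors = [r for r in results if isinstance(r, BaseException)]
    if errors:
        await _close_all(factories)  # one peer failed: release the healthy ones too
        raise errors[0]
    return plain + [tool for group in results for tool in group]
```

The core loop only checks for the two protocol methods, so it never needs to know which kind of server sits behind a factory.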

See CHANGELOG.md for the full release notes.


What Is NucleusIQ?

NucleusIQ is an open-source, agent-first Python framework for building AI agents that survive production — long runs, messy tool output, and real stakes — instead of falling apart the moment they leave the demo.

In one line:

Every AI agent works in the demo. Almost none survive production. NucleusIQ is the framework for the ones that have to — reliable, observable, and provider-portable.

Why "survive production"? Most agents work in a demo, then forget what they read, choke when tool output piles up, or fail silently with no explanation. NucleusIQ manages the context window before the model breaks, runs across any LLM provider, and returns a scorecard of exactly what happened on every run.

NucleusIQ is built on a simple belief:

An agent is not a single model call. An agent is a managed runtime with memory, tools, policy, streaming, structure, and responsibilities.

NucleusIQ Philosophy

A shared doctrine for what NucleusIQ stands for, why it exists, and how it should evolve over time.

See NucleusIQ Philosophy.


🚀 Quick Start

Fastest path

pip install nucleusiq nucleusiq-openai
export OPENAI_API_KEY=sk-...

Hello agent

import asyncio

from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.agents.task import Task
from nucleusiq.prompts.zero_shot import ZeroShotPrompt
from nucleusiq_openai import BaseOpenAI


async def main() -> None:
    agent = Agent(
        name="analyst",
        prompt=ZeroShotPrompt().configure(
            system="You are a concise assistant. Answer in one short sentence.",
        ),
        llm=BaseOpenAI(model_name="gpt-4o-mini"),
        config=AgentConfig(execution_mode=ExecutionMode.DIRECT),
    )

    await agent.initialize()
    result = await agent.execute(
        Task(id="hello-1", objective="What is the capital of France?"),
    )
    print(result.output)


asyncio.run(main())

See the Quickstart docs for provider setup, .env loading, tools, streaming, and structured output.

Install other stable packages

# Google Gemini
pip install nucleusiq nucleusiq-gemini

# Anthropic Claude
pip install nucleusiq nucleusiq-anthropic

# Groq inference
pip install nucleusiq nucleusiq-groq

# Ollama for local / remote models
pip install nucleusiq nucleusiq-ollama

# MCP tool adapter — plug any MCP server in as a tool
pip install nucleusiq-mcp nucleusiq-anthropic   # or any provider

# uv works too
uv pip install nucleusiq nucleusiq-openai

Hello agent + MCP tools

import asyncio

from nucleusiq.agents import Agent
from nucleusiq.agents.config import AgentConfig, ExecutionMode
from nucleusiq.agents.task import Task
from nucleusiq.prompts.zero_shot import ZeroShotPrompt
from nucleusiq_anthropic import BaseAnthropic
from nucleusiq_mcp import MCPTool


async def main() -> None:
    agent = Agent(
        name="researcher",
        prompt=ZeroShotPrompt().configure(
            system="You are a careful research assistant. Cite source ids when available.",
        ),
        llm=BaseAnthropic(model_name="claude-sonnet-4-5-20250929", async_mode=True),
        tools=[
            # Transport is auto-detected from URL / command; auth is auto-wired.
            MCPTool("npx -y @modelcontextprotocol/server-github"),
            MCPTool("https://mcp.slack.com/api", auth="xoxb-..."),
        ],
        config=AgentConfig(execution_mode=ExecutionMode.STANDARD, enable_tracing=True),
    )

    await agent.initialize()  # connects to MCP servers and discovers tools
    result = await agent.execute(
        Task(
            id="repo-summary",
            objective="Summarise the last 5 issues in nucleusbox/NucleusIQ.",
        ),
    )
    print(result.output)


asyncio.run(main())

See INSTALLATION.md for full setup instructions (pip, uv, development mode).


🧩 What's Inside

| Component | What it does |
| --- | --- |
| 3 Execution Modes | DIRECT (single call), STANDARD (tool loop), AUTONOMOUS (orchestration + validation + retry) |
| Streaming | execute_stream(): real-time token-by-token output with tool call visibility across all modes |
| 7 Prompt Techniques | ZeroShot, FewShot, ChainOfThought, AutoCoT, RAG, PromptComposer, MetaPrompt |
| Multimodal Attachments | 7 attachment types (text, PDF, images, files) with provider-native optimisation |
| Built-in File Tools | FileReadTool, FileSearchTool, DirectoryListTool, FileExtractTool, all sandboxed to the workspace |
| Tool System | BaseTool interface + @tool decorator + provider native tools (OpenAI: code_interpreter, file_search, web_search; Gemini: Google Search, Code Execution, URL Context, Maps; Anthropic: web_search, web_fetch, code_execution + extended thinking) |
| MCP Tool Adapter | Connect any Model Context Protocol server (Slack, GitHub, Postgres, Stripe, …) as native tools over stdio, Streamable HTTP, or SSE, with OAuth/Bearer/Env auth |
| Memory | 5 strategies (full history, sliding window, summary, summary+window, token budget) with file-aware metadata |
| Plugins | 10 built-in, including call limits, retry, fallback, PII guard, human approval, tool guard, attachment guard, context window, and result validator |
| Usage Tracking | Token usage per call with purpose tagging (main, planning, tool loop, critic, refiner) and cost estimation |
| Structured Output | Schema-based output parsing with Pydantic, dataclass, and TypedDict support |
| Observability | ExecutionTracer records every model call and tool call with source attribution (e.g. mcp://server=github) |
| Provider Portability | Swap providers (OpenAI, Gemini, Anthropic, Groq, Ollama, …) with one line: same agent code, same tools, same plugins |
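
The Memory row names sliding-window and token-budget strategies. A minimal standalone sketch of those two ideas follows, assuming a crude estimate of four characters per token. `Message`, `estimate_tokens`, `sliding_window`, and `token_budget` are illustrative names, not the framework's own classes, and the shipped strategies may trim differently.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "user" or "assistant"
    content: str


def estimate_tokens(text: str) -> int:
    """Crude estimate (assumption): roughly four characters per token."""
    return max(1, len(text) // 4)


def sliding_window(history: list[Message], max_messages: int) -> list[Message]:
    """Keep system messages plus the most recent `max_messages` others."""
    system = [m for m in history if m.role == "system"]
    rest = [m for m in history if m.role != "system"]
    if max_messages <= 0:
        return system
    return system + rest[-max_messages:]


def token_budget(history: list[Message], budget: int) -> list[Message]:
    """Keep system messages, then the newest messages that still fit the budget."""
    system = [m for m in history if m.role == "system"]
    remaining = budget - sum(estimate_tokens(m.content) for m in system)
    kept: list[Message] = []
    for message in reversed([m for m in history if m.role != "system"]):
        cost = estimate_tokens(message.content)
        if cost > remaining:
            break  # stop at the first message that no longer fits
        kept.append(message)
        remaining -= cost
    return system + kept[::-1]  # restore chronological order
```

Both functions always keep the system message, on the assumption that dropping instructions is worse than dropping old turns.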

🧠 Context Management

Tool-heavy agents fail when every tool result stays in the active prompt forever. NucleusIQ treats context as a managed runtime resource:

  • ContextEngine.prepare() runs before LLM calls, not after the provider rejects an oversized prompt.
  • ContextLedger tracks prompt regions (system, user, assistant, tool calls, tool results) so the framework can compact the right thing first.
  • Large tool results can be masked / offloaded while staying recoverable through recall.
  • AgentResult.context_telemetry reports peak utilization, compaction events, tokens saved, and estimated savings.
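
The bullets above can be sketched in a few lines of standalone Python: track each prompt entry by region, and before a model call mask the largest tool results until the prompt fits, keeping the originals recoverable. This is a toy model under stated assumptions, not the ContextEngine or ContextLedger implementation: `MiniContext`, `Entry`, the four-characters-per-token estimate, and the placeholder wording are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    region: str    # "system", "user", "assistant", "tool_call" or "tool_result"
    content: str
    key: str = ""  # non-empty once the content has been offloaded


def estimate_tokens(text: str) -> int:
    """Crude estimate (assumption): roughly four characters per token."""
    return max(1, len(text) // 4)


@dataclass
class MiniContext:
    max_tokens: int
    entries: list[Entry] = field(default_factory=list)
    offloaded: dict[str, str] = field(default_factory=dict)
    tokens_saved: int = 0

    def total_tokens(self) -> int:
        return sum(estimate_tokens(e.content) for e in self.entries)

    def prepare(self) -> list[Entry]:
        """Mask the largest tool results until the prompt fits the budget."""
        candidates = sorted(
            (e for e in self.entries if e.region == "tool_result" and not e.key),
            key=lambda e: estimate_tokens(e.content),
            reverse=True,
        )
        for entry in candidates:
            if self.total_tokens() <= self.max_tokens:
                break  # already under budget; leave smaller results intact
            before = estimate_tokens(entry.content)
            entry.key = f"result-{len(self.offloaded) + 1}"
            self.offloaded[entry.key] = entry.content
            entry.content = f"[masked tool result; recall with key {entry.key}]"
            self.tokens_saved += before - estimate_tokens(entry.content)
        return self.entries

    def recall(self, key: str) -> str:
        """Return the original content of a masked tool result."""
        return self.offloaded[key]
```

Masking only tool results, largest first, leaves system and user text untouched, which is one plausible reading of "compact the right thing first".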

See the context management guide and the observability guide.

Try the runnable Agent Engineering Challenge 01 to test context pressure, noisy tool outputs, and evidence quality on a concrete task.


⚙️ Execution Modes

NucleusIQ agents use the Gearbox Strategy — three execution modes that scale from simple chat to autonomous reasoning:

| Capability | Direct | Standard | Autonomous |
| --- | --- | --- | --- |
| Memory | Yes | Yes | Yes |
| Plugins | Yes | Yes | Yes |
| Tools | Yes (max 25) | Yes (max 80) | Yes (max 300) |
| Tool loop | Yes | Yes | Yes |
| Task decomposition | No | No | Yes |
| Independent verification (Critic) | No | No | Yes |
| Targeted correction (Refiner) | No | No | Yes |
| Validation pipeline | No | No | Yes |
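
To make the Critic and Refiner rows concrete, here is a minimal sketch of a verify-then-correct loop in plain Python. It assumes the critic returns a list of issues (empty means the answer passes) and the refiner returns a corrected answer. `run_with_verification` and both callables are hypothetical, and the Autonomous mode's actual pipeline is likely richer than this.

```python
from typing import Callable

Critic = Callable[[str], list[str]]        # returns issues; an empty list means "pass"
Refiner = Callable[[str, list[str]], str]  # returns a corrected answer


def run_with_verification(
    draft: str, critic: Critic, refiner: Refiner, max_rounds: int = 3
) -> tuple[str, int]:
    """Refine `draft` until the critic passes it or the rounds run out.

    Returns the final answer and the number of refinement rounds used.
    If the rounds run out, the last answer is returned without a final check.
    """
    answer = draft
    for rounds_used in range(max_rounds):
        issues = critic(answer)
        if not issues:
            return answer, rounds_used
        answer = refiner(answer, issues)  # targeted correction of the listed issues
    return answer, max_rounds


def unit_critic(answer: str) -> list[str]:
    """Toy critic: flags a missing currency unit."""
    return [] if "USD" in answer else ["missing currency unit"]


def unit_refiner(answer: str, issues: list[str]) -> str:
    """Toy refiner: fixes the one issue the toy critic can raise."""
    return answer + " USD" if "missing currency unit" in issues else answer
```

Keeping the critic separate from the refiner means the check does not depend on the component that produced the fix.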

Tool limits are configurable via AgentConfig(max_tool_calls=N). The framework validates tool count at agent creation and raises a clear error if the limit is exceeded.

# Direct: fast Q&A, simple lookups (max 25 tool calls)
AgentConfig(execution_mode=ExecutionMode.DIRECT)

# Standard: multi-step tool workflows (max 80 tool calls) — default
AgentConfig(execution_mode=ExecutionMode.STANDARD)

# Autonomous: orchestration + Critic/Refiner verification (max 300 tool calls)
AgentConfig(execution_mode=ExecutionMode.AUTONOMOUS)
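
The limit check described above might look roughly like the standalone sketch below. The per-mode defaults (25 / 80 / 300) come from the table. Everything else is an assumption: `Mode`, `resolve_limit`, and `check_tool_budget` are hypothetical names, and the sketch treats the limit as a plain integer budget without claiming how the framework counts tools or calls.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    DIRECT = "direct"
    STANDARD = "standard"
    AUTONOMOUS = "autonomous"


# Defaults taken from the execution-mode table.
DEFAULT_LIMITS = {Mode.DIRECT: 25, Mode.STANDARD: 80, Mode.AUTONOMOUS: 300}


def resolve_limit(mode: Mode, max_tool_calls: Optional[int] = None) -> int:
    """Use the explicit override when given, otherwise the mode's default."""
    if max_tool_calls is None:
        return DEFAULT_LIMITS[mode]
    if max_tool_calls < 0:
        raise ValueError("max_tool_calls must be non-negative")
    return max_tool_calls


def check_tool_budget(
    mode: Mode, requested: int, max_tool_calls: Optional[int] = None
) -> None:
    """Raise a descriptive error when `requested` exceeds the resolved limit."""
    limit = resolve_limit(mode, max_tool_calls)
    if requested > limit:
        raise ValueError(
            f"{mode.name} mode allows at most {limit} tools, got {requested}; "
            "raise max_tool_calls or switch execution mode"
        )
```

Failing at construction time with the limit and the suggested fix in the message is cheaper than discovering the cap partway through a long run.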

See the PE Due Diligence notebook for a real-world demo of Autonomous mode achieving 100% accuracy on 8 complex financial analyses with external validation.


📦 Packages & Ecosystem

NucleusIQ ships as a core framework + thin provider/tool packages. Install only what you need — every package can be added or removed independently.

Core

| Package | Status | Version | Description |
| --- | --- | --- | --- |
| nucleusiq | 🟢 Stable | 0.7.12 | Core framework: agents, prompts, tools, memory, plugins, modes, tracing |

LLM Providers

| Package | Status | Version | Description |
| --- | --- | --- | --- |
| nucleusiq-openai | 🟢 Stable | 0.7.0 | OpenAI (gpt-4o, o-series); Responses API + Chat Completions; native code_interpreter, file_search, web_search; now surfaces server_tool_calls for tracer-side cost split |
| nucleusiq-gemini | 🟢 Stable | 0.3.0 | Google Gemini; native Google Search + Code Execution emitted as ToolCallRecord(executed_by="provider"); URL Context, Maps grounding |
| nucleusiq-anthropic | 🟢 Stable | 0.2.0 | Anthropic Claude (Messages API); native server tools (AnthropicTool.web_search() / web_fetch() / code_execution() w/ auto-anthropic-beta), prompt caching (cache_tools / cache_system), extended thinking (thinking set to "low", "medium", "high", or "max"), server-tool observability · README |

Inference Backends

| Package | Status | Version | Description |
| --- | --- | --- | --- |
| nucleusiq-groq | 🟢 Stable | 0.1.0 | Groq inference (Chat Completions) via official groq SDK; hosted-tool observability stub (message.executed_tools → server_tool_calls) · README · Guide |
| nucleusiq-ollama | 🟢 Stable | 0.2.0 | Local/remote Ollama via official ollama SDK; vision wire for OpenAI-style multimodal messages; structured output, think pass-through · README · Guide |

Tool Adapters

| Package | Status | Version | Description |
| --- | --- | --- | --- |
| nucleusiq-mcp | 🟢 Stable | 0.1.0 | Model Context Protocol adapter: turn any MCP server (Slack, GitHub, Postgres, Stripe, …) into NucleusIQ tools; stdio + Streamable HTTP + SSE; OAuth 2.1 / Bearer / Env auth · README · Guide |

Maturity legend: 🟢 Stable (production-ready, SemVer guarantees). Future pre-release packages may use 🟡 Beta / 🟠 Alpha while they mature.


🗂️ Project Structure

src/
  nucleusiq/core/                # Core framework (agents, prompts, tools, memory, plugins, modes, tracing)
  providers/
    llms/
      openai/                    # nucleusiq-openai
      gemini/                    # nucleusiq-gemini
      anthropic/                 # nucleusiq-anthropic
    inference/
      groq/                      # nucleusiq-groq
      ollama/                    # nucleusiq-ollama
    tools/
      mcp/                       # nucleusiq-mcp (Model Context Protocol adapter)
notebooks/agents/                # Example notebooks (PE due diligence, MCP showcase, …)
docs/                            # Internal design/strategy docs (published docs live in nucleusiq-docs)
scripts/                         # Repo-wide tooling (e.g. verify_core_package_layout.py)

🧪 Testing

# Monorepo: verify core setuptools packages + all Hatch provider/tool wheel roots
python scripts/verify_core_package_layout.py

# Core tests (1,795+ passing)
cd src/nucleusiq && python -m pytest tests/ -q

# OpenAI provider tests (224 passing)
cd src/providers/llms/openai && python -m pytest tests/ -q

# Gemini provider unit tests (221 passing)
cd src/providers/llms/gemini && python -m pytest tests/unit/ -q

# Anthropic provider tests (>=95% coverage gate)
cd src/providers/llms/anthropic && python -m pytest tests/ -q

# Groq provider tests (requires dev group / uv; >=90% coverage gate)
cd src/providers/inference/groq && uv run pytest -q

# Ollama provider tests (>=95% coverage gate; 100% line coverage on package)
cd src/providers/inference/ollama && uv run pytest -q

# MCP tool adapter — unit (235 passing; 98.68% coverage; >=90% gate)
cd src/providers/tools/mcp && python -m pytest tests/unit/ -q -m "not integration"

# MCP tool adapter — live integration (requires Node.js + npx)
cd src/providers/tools/mcp && python -m pytest tests/integration/ -m integration -v

# Gemini integration tests (requires GEMINI_API_KEY)
cd src/providers/llms/gemini && python -m pytest tests/integration/ -q

📚 Documentation


🤝 Contributing

  1. Fork the repository
  2. Create a branch: git checkout -b yourname/my-feature main
  3. Make your changes and add tests
  4. Submit a pull request to main

See CONTRIBUTING.md for full details, coding standards, and the dev-setup walkthrough.

Get in touch

  • 🐛 Bugs & feature requests — GitHub Issues
  • 💬 Questions & ideas — GitHub Discussions
  • ⭐ If NucleusIQ is useful to you, please consider starring the repo — it helps a lot.

📄 License

MIT © Nucleusbox
