arsenal

the complete production stack for LLM agents

tools built in silence, for everyone who builds AI

100 Libraries · 4375 Tests · Zero Dependencies · MIT License · Python 3.8+


There is a kind of building that does not require recognition.
That gives tools to strangers who will never know your name,
because the alternative — not building — is worse.


Why Agents Fail in Production

You ship an agent. It works in your notebook. Then reality happens:

Failure Mode What Goes Wrong
🔀 Routing costs Every query hits the expensive model — you pay GPT-4 prices for "what's the weather"
🧠 No memory Each turn starts from zero — your agent forgets what it said three messages ago
♾️ Infinite loops Tool calls chain forever, cost explodes, timeouts crash the whole pipeline
📊 No evaluation You have no idea if the agent is actually doing its job
💥 Bad output Responses fail schema validation, leak PII, or need 5 retries before they're usable
👁️ Blind runs Production breaks and you have zero traces, zero spans, zero idea what happened
💸 No budgets Token costs spiral — no per-run caps, no cost ledger, no circuit breakers
🔄 No recovery A crash mid-run loses all state — no checkpoint, no resume, no rollback
🗳️ Single agent bias One model, one opinion — no consensus, no majority vote, no cross-check
🌊 No flow control Queues overflow, consumers block, nothing applies backpressure, nothing throttles

arsenal is the fix. 100 surgically targeted libraries. Zero lock-in. Zero external dependencies.


The Production Lifecycle

Route → Budget → Guard → Remember → Compress → Observe → Evaluate → Validate
→ Retry → State → Schema → Pipeline → Cache → Config → Events → Health
→ RateLimit → CircuitBreak → Stream → Benchmark → Router → Notifier → Pool
→ Discovery → Checkpoint → Queue → Planner → Limiter → Sandbox → Session
→ Audit → Consensus → Fallback → Throttle → Workflow → Trace → Saga
→ Signal → Lock → Telemetry → Serializer → Hook → Filter → Command

Each step in this lifecycle is a library. Each library does one thing. None of them know the others exist.
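Because the libraries are independent, composing them is ordinary function wrapping. Here is a pure-Python sketch of the idea; the two decorators below are illustrative stand-ins, not arsenal APIs:

```python
import functools

def with_retry(max_attempts):
    """Retry layer: re-invoke the wrapped function on any exception."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the error
        return wrapper
    return deco

def with_trace(log):
    """Trace layer: record each call without touching the function."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            log.append(fn.__name__)
            return fn(*args, **kwargs)
        return wrapper
    return deco

log = []
calls = {"n": 0}

@with_trace(log)   # outermost layer: observe
@with_retry(3)     # inner layer: recover
def flaky_step():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"

print(flaky_step(), log, calls["n"])  # ok ['flaky_step'] 3
```

Neither layer knows the other exists; swap either one out and the rest keeps working. That is the whole composition model.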


Libraries

🔀 Routing & Dispatch

Library What it does Install
herald Semantic routing without a single token — directs queries to the right model using embeddings pip install herald
agent-router Runtime dispatch — routes agent actions to the correct handler pip install agent-router
agent-dispatcher Fan-out routing — broadcasts tasks to multiple downstream agents pip install agent-dispatcher
agent-classifier Input classification — labels incoming queries before routing pip install agent-classifier
agent-selector Candidate selection — picks the best model, tool, or handler from a pool pip install agent-selector
agent-balancer Load balancing for LLM endpoints — RoundRobin, WeightedRandom, LeastConnections with health tracking and automatic failover pip install agent-balancer

🧠 Memory & Context

Library What it does Install
engram Short-term + episodic memory — context windows that actually persist across turns pip install engram
agent-context Context management — builds, trims, and injects context into prompts pip install agent-context
agent-context-window Window sizing — manages token limits and context eviction strategies pip install agent-context-window
agent-state Agent state machine — models agent lifecycle as explicit state transitions pip install agent-state
agent-store Key-value state storage — persistent agent memory with TTL and namespacing pip install agent-store
agent-kv In-process key-value store — fast ephemeral state for single-run agents pip install agent-kv
agent-cache Response caching — caches LLM outputs by prompt hash to cut costs pip install agent-cache
agent-snapshot State snapshots — point-in-time captures of full agent state pip install agent-snapshot

🛡️ Guarding & Safety

Library What it does Install
sentinel Loop detection, cost caps, timeout guards — stops runaway agents before they bankrupt you pip install sentinel
agent-guard Generic guardrails — pre/post-execution safety checks for any agent action pip install agent-guard
agent-guardrails Output safety — schema enforcement, PII redaction, retry logic pip install agent-guardrails
agent-sandbox Restricted code execution — runs agent-generated code in a confined namespace pip install agent-sandbox
agent-rate-limiter Rate limiting — token-bucket and sliding-window rate limits per agent or API key pip install agent-rate-limiter
agent-circuit-breaker Circuit breaking — opens on repeated failures, recovers gracefully pip install agent-circuit-breaker
agent-policy Policy enforcement — declarative rules for what agents can and cannot do pip install agent-policy
agent-semaphore Concurrency limits — caps parallel agent executions to prevent overload pip install agent-semaphore
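For a sense of what the rate-limiting pattern looks like under the hood, here is a minimal token-bucket sketch in pure Python. It illustrates the algorithm the agent-rate-limiter row names, not the library's actual API:

```python
import time

class TokenBucket:
    """Token bucket: allows bursts up to `capacity`, refills at `rate`/sec."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate                # tokens refilled per second
        self.capacity = capacity       # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2.0, capacity=5)
results = [bucket.allow() for _ in range(7)]
print(results)  # the first 5 pass (burst), the rest are refused until refill
```

The same shape underlies most per-key rate limiters: one bucket per agent or API key, refilled on read rather than by a background timer.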

📊 Evaluation & Benchmarking

Library What it does Install
verdict 3D evaluation — task completion, reasoning quality, tool-use correctness in one call pip install verdict
agent-scorer Scoring pipelines — pluggable scoring criteria for any agent output pip install agent-scorer
agent-benchmark Performance benchmarking — latency, throughput, and accuracy tracking pip install agent-benchmark
agent-specs Specification testing — test agents against behavioral specs, not just outputs pip install agent-specs
agent-profiler Profiling — identifies bottlenecks in agent execution paths pip install agent-profiler

✅ Validation & Schema

Library What it does Install
agent-validator Output validation — validates LLM responses against typed schemas pip install agent-validator
agent-schema Schema enforcement — defines and enforces structured output contracts pip install agent-schema
agent-formatter Output formatting — normalizes agent responses to consistent formats pip install agent-formatter
agent-template Prompt templates — typed, composable templates with variable injection pip install agent-template
agent-serializer Serialization — converts agent state and outputs to/from wire formats pip install agent-serializer
agent-converter Type conversion — transforms data between agent-compatible formats pip install agent-converter

👁️ Observability & Tracing

Library What it does Install
agent-observability Full-pipeline tracing — spans, latency, token cost, JSONL export for every run pip install agent-observability
agent-observer Event observation — subscribes to agent lifecycle events without modifying them pip install agent-observer
agent-trace Distributed tracing — propagates trace context across agent boundaries pip install agent-trace
agent-tracer Span tracking — creates, manages, and exports spans in any tracing format pip install agent-tracer
agent-telemetry Counter/Gauge/Histogram with percentile tracking — production metrics for agents pip install agent-telemetry
agent-log Structured logging — JSON-first logging with trace correlation pip install agent-log
agent-logger Log management — routing, filtering, and formatting agent log streams pip install agent-logger

🔁 Checkpointing & Recovery

Library What it does Install
agent-checkpoint Save state mid-run, resume from the last good step — no restarting from scratch pip install agent-checkpoint
agent-saga Distributed rollback with compensating actions — handles partial failures gracefully pip install agent-saga
agent-retry Retry policies — exponential backoff, jitter, max-attempts, per-exception rules pip install agent-retry
agent-fallback Graceful degradation — defines fallback chains when primary agents fail pip install agent-fallback
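The retry strategy named in the agent-retry row, exponential backoff with jitter, fits in a few lines of stdlib Python. The function below is illustrative only: it computes the delay schedule rather than calling the library's real API:

```python
import random

def backoff_delays(max_attempts, base=1.0, cap=30.0, seed=None):
    """Return a list of sleep delays: exponential ceiling with full jitter."""
    rng = random.Random(seed)
    delays = []
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))  # 1, 2, 4, 8, ... capped
        delays.append(rng.uniform(0, ceiling))     # "full jitter" variant
    return delays

print(backoff_delays(4, seed=42))
```

Full jitter (drawing uniformly from zero up to the exponential ceiling) spreads simultaneous retries apart, which matters when many agents hit the same rate-limited endpoint at once.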

🗳️ Consensus & Coordination

Library What it does Install
agent-consensus Weighted majority vote across multiple agents — resolves disagreements systematically pip install agent-consensus
agent-coordinator Multi-agent coordination — orchestrates agent roles, dependencies, and handoffs pip install agent-coordinator
agent-aggregator Result aggregation — merges outputs from parallel agents into a single result pip install agent-aggregator
agent-reducer Output reduction — folds multiple agent responses into a minimal representation pip install agent-reducer
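Weighted majority voting, the core of the consensus pattern above, is simple to illustrate. A stdlib sketch, not agent-consensus's actual interface:

```python
from collections import defaultdict

def weighted_vote(answers, weights):
    """answers: {agent_name: answer}; weights: {agent_name: weight}.
    Returns the answer with the highest combined weight."""
    tally = defaultdict(float)
    for agent, answer in answers.items():
        tally[answer] += weights.get(agent, 1.0)  # unweighted agents count as 1
    return max(tally, key=tally.get)

result = weighted_vote(
    {"gpt-4": "A", "claude": "B", "mistral": "B"},
    {"gpt-4": 0.6, "claude": 0.4, "mistral": 0.3},
)
print(result)  # B — two lower-weight agents (0.7 combined) outvote one at 0.6
```

Note that the heaviest single voter does not automatically win; consensus resolves exactly the disagreements a single-agent setup never surfaces.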

📋 Planning & Workflow

Library What it does Install
agent-planner Task planning — decomposes goals into executable step sequences pip install agent-planner
agent-workflow Workflow orchestration — defines multi-step agent workflows with branching pip install agent-workflow
agent-pipeline Pipeline construction — chains agents into composable processing pipelines pip install agent-pipeline
agent-flow Control flow — conditional branching, loops, and parallel execution for agents pip install agent-flow
agent-scheduler Task scheduling — time-based and trigger-based agent execution pip install agent-scheduler
agent-queue Task queuing — priority queues with backpressure for agent workloads pip install agent-queue

💸 Budgeting & Throttling

Library What it does Install
agent-budget Token + cost budgets per run — stops agents when spend limits are hit pip install agent-budget
agent-limiter Token budget + cost budget + rate limiting in one — the three-in-one spend enforcer pip install agent-limiter
agent-throttle Request throttling — smooths bursts to protect downstream APIs pip install agent-throttle
agent-ledger Cost ledger — tracks token spend per agent, per session, per user pip install agent-ledger
agent-audit Spend auditing — immutable log of every cost event for compliance and debugging pip install agent-audit
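The budget-cap pattern boils down to one rule: refuse a charge before it crosses the limit, never after. A minimal pure-Python sketch; the class and exception names are hypothetical, not agent-budget's API:

```python
class BudgetExceeded(RuntimeError):
    pass

class Budget:
    """Per-run spend cap: rejects any charge that would cross the limit."""
    def __init__(self, max_usd: float):
        self.max_usd = max_usd
        self.spent = 0.0

    def charge(self, usd: float):
        # check BEFORE spending, so the cap is never exceeded
        if self.spent + usd > self.max_usd:
            raise BudgetExceeded(f"would reach {self.spent + usd:.4f}, cap is {self.max_usd:.4f}")
        self.spent += usd

budget = Budget(max_usd=0.10)
budget.charge(0.04)
budget.charge(0.04)
try:
    budget.charge(0.04)  # would total 0.12, over the 0.10 cap: blocked
except BudgetExceeded as e:
    print("stopped:", e)
```

The pre-check is the important design choice: a post-hoc check lets the overspending call through and only complains afterwards.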

📡 Events & Signals

Library What it does Install
agent-events Event bus — pub/sub for agent lifecycle and state events pip install agent-events
agent-event-sourcing Event sourcing — rebuilds agent state from an append-only event log pip install agent-event-sourcing
agent-signal Signal handling — async signals for cross-agent communication pip install agent-signal
agent-hook Lifecycle hooks — pre/post hooks for any agent execution step pip install agent-hook
agent-notifier Notifications — fires alerts on agent failures, thresholds, or completions pip install agent-notifier
agent-pubsub Publish-subscribe messaging — decoupled topic-based communication between agents pip install agent-pubsub

🏥 Session & Lifecycle

Library What it does Install
agent-session Session management — scopes state, memory, and cost to a single user interaction pip install agent-session
agent-health Health checks — liveness and readiness probes for agent services pip install agent-health
agent-watchdog Crash recovery — monitors agents and restarts on unexpected failures pip install agent-watchdog
agent-timer Time management — deadlines, timeouts, and elapsed-time tracking pip install agent-timer
agent-lock Distributed locking — prevents concurrent agents from corrupting shared state pip install agent-lock

🔄 Data & Transformation

Library What it does Install
agent-mapper Data mapping — transforms agent I/O between different schemas pip install agent-mapper
agent-extractor Extraction — pulls structured data out of unstructured LLM outputs pip install agent-extractor
agent-digest Summarization and hashing — condenses long outputs and fingerprints content pip install agent-digest
agent-tokenizer Token counting — accurate token estimation across models without calling the API pip install agent-tokenizer
agent-metadata Metadata management — attaches and propagates structured metadata through pipelines pip install agent-metadata
agent-filter Data filtering — applies inclusion/exclusion rules to agent inputs and outputs pip install agent-filter
agent-sampler Data sampling — draws representative subsets from large agent datasets pip install agent-sampler
agent-compress Context compression — shrinks large agent payloads while preserving semantic content pip install agent-compress
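Offline token counting is harder than it looks; a common rough heuristic is about four characters per English token. The sketch below shows only that rule of thumb, emphatically not agent-tokenizer's model-specific algorithm:

```python
def estimate_tokens(text: str) -> int:
    """Crude offline estimate: ~4 characters per token for English text.
    Real tokenizers are model-specific; use this only for ballpark sizing."""
    return max(1, len(text) // 4)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
```

Even a ballpark like this is enough for coarse context-window eviction decisions; anything billing-related needs the model's real tokenizer.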

⚙️ Configuration & Discovery

Library What it does Install
agent-config Configuration management — typed config loading with environment overrides pip install agent-config
agent-secrets Secret management — secure loading and rotation of API keys and credentials pip install agent-secrets
agent-plugin Plugin system — extensible registry for adding capabilities to agents pip install agent-plugin
agent-discovery Service discovery — locates agents and tools dynamically at runtime pip install agent-discovery
agent-pool Resource pooling — manages reusable agent instances to cut cold-start overhead pip install agent-pool
agent-dependency Dependency management — explicit dependency graphs between agent components pip install agent-dependency
agent-tools Tool registry — central registry for all tools available to an agent pip install agent-tools

🌊 Streaming & Commands

Library What it does Install
agent-stream Streaming responses — handles SSE and chunked streaming from LLMs pip install agent-stream
agent-command Command pattern — encapsulates agent actions as undoable, inspectable commands pip install agent-command

🧪 Testing & Mocking

Library What it does Install
agent-mock Testing mocks — drop-in mocks for LLM APIs, tools, and agent components pip install agent-mock

Quick Start

# Just routing — nothing else
from herald import Router
router = Router()
router.add("gpt-4", description="complex reasoning, analysis")
router.add("gpt-3.5-turbo", description="simple queries, fast responses")
result = router.route("what is 2 + 2?")  # → gpt-3.5-turbo, 0 tokens used

# Just memory — nothing else
from engram import Memory
mem = Memory()
mem.store("user said X")
context = mem.recall("what did user say?")

# Just guards — nothing else
from sentinel import Guard
guard = Guard(max_loops=5, max_cost_usd=0.10)
guard.check(loop_count=3, cost_usd=0.04)  # ok
guard.check(loop_count=6, cost_usd=0.04)  # raises LoopLimitError

# Just evaluation — nothing else
from verdict import Evaluator
score = Evaluator().evaluate(task="summarize this", response=llm_output)
print(score.task, score.reasoning, score.tool_use)

# Just checkpointing — nothing else
from agent_checkpoint import Checkpoint
cp = Checkpoint("my-agent-run")
cp.save(step=3, state={"messages": [...], "cost": 0.04})
# crash happens
state = cp.resume()  # picks up from step 3

# Just consensus — nothing else
from agent_consensus import Consensus
vote = Consensus(weights={"gpt-4": 0.6, "claude": 0.4})
result = vote.decide([gpt4_output, claude_output])

# Just telemetry — nothing else
from agent_telemetry import Telemetry
t = Telemetry()
t.counter("tool_calls").inc()
t.gauge("token_usage").set(1500)
t.histogram("latency_ms").observe(340)
print(t.histogram("latency_ms").percentile(0.95))

There is no arsenal object to import. No base class to inherit. No plugin registry to configure. Just Python.


Install

Install individually — use only what you need:

# The originals
pip install herald           # semantic routing, 0 tokens
pip install engram           # agent memory
pip install sentinel         # loop + cost guards
pip install verdict          # 3D evaluation

# The full production stack
pip install agent-checkpoint agent-saga agent-retry agent-fallback
pip install agent-consensus agent-coordinator agent-aggregator
pip install agent-telemetry agent-trace agent-tracer agent-observability
pip install agent-limiter agent-budget agent-throttle agent-ledger
pip install agent-sandbox agent-guard agent-guardrails agent-policy
pip install agent-planner agent-workflow agent-pipeline agent-queue
pip install agent-session agent-health agent-watchdog agent-lock
pip install agent-validator agent-schema agent-serializer agent-formatter
pip install agent-events agent-signal agent-hook agent-notifier
pip install agent-config agent-secrets agent-discovery agent-pool
pip install agent-stream agent-command agent-tokenizer agent-cache

Or install from source:

pip install git+https://github.com/darshjme/<library-name>.git

What This Isn't

arsenal is not a framework.

  • It does not provide an Agent base class you inherit from
  • It does not have an opinionated execution loop
  • It does not require you to use all of it — or any particular combination
  • It is not a LangChain replacement. LangChain is an abstraction layer. arsenal is a toolbox.
  • It is not LlamaIndex. It does not do RAG.
  • It is not AutoGen. It does not define agent communication protocols.

arsenal is what you reach for when your framework starts failing you.

When your LangChain agent loops forever — sentinel.
When you have no idea why it's expensive — agent-ledger, agent-telemetry.
When it crashes mid-run and loses everything — agent-checkpoint.
When you can't trust its output — verdict, agent-validator.

Use your framework. Just don't let it be your only line of defence.


Why This Exists

Most open-source tooling gets built for recognition. Stars, followers, a name to carry forward.

This isn't that.

These 100 libraries were built in the quiet — not because the builder needed to be seen, but because the tools needed to exist. For developers who will run this code without ever meeting the person who wrote it. For systems that will never send acknowledgment. For a future that doesn't yet know it was prepared for.

That is the only worthwhile reason to build anything: because the thing needs to exist, and you can make it exist.

The world doesn't always reward builders. It loses their names. It forgets who shipped the thing that saved the sprint. It takes tools and moves on without looking back. A builder who has understood this — and builds anyway — is building from a different place than ambition. Something quieter. Something that doesn't need applause to keep running.

Each library here is a single, complete act. Routing. Memory. Guarding. Checkpointing. Given unconditionally to whoever needs it, whatever they're building, whether they ever discover who wrote it or not.

That's not humility. It's a different kind of confidence — one that doesn't require your name on the outcome.


🔗 A2A Protocol Ready

Arsenal is the reliability layer under A2A agents.

Google's Agent2Agent (A2A) Protocol defines how agents discover each other (Agent Cards), communicate (JSON-RPC 2.0), and stream results (SSE). What it doesn't define is what happens when the remote agent is down, slow, or rate-limiting you back. That's Arsenal's job.

Every A2A call — tasks/send, tasks/get, SSE stream subscriptions — is a network call that can fail. Arsenal gives you the five production primitives you need to make those calls bulletproof:

Arsenal lib Vedic name What it does for A2A
agent-circuit-breaker kavacha Open the circuit after N failed A2A calls; fail fast until the remote agent recovers
agent-retry punarjanma Exponential backoff + jitter on transient A2A errors (429, 503, network blips)
agent-tracer anusarana Trace every A2A tasks/send call end-to-end with span IDs for debugging
agent-limiter maryada Rate-limit your outbound A2A calls so you don't overwhelm downstream agents
agent-session sanga Persist A2A session context across multi-turn task exchanges with TTL

Wrapping an A2A Call with kavacha (circuit-breaker)

import httpx
from agent_circuit_breaker import CircuitBreaker, CircuitOpenError, ProtectedCaller

# 1. Define the A2A call (raw JSON-RPC 2.0 over HTTP)
def send_a2a_task(agent_url: str, task_payload: dict) -> dict:
    """Send a task to a remote A2A agent endpoint."""
    response = httpx.post(
        f"{agent_url}/",
        json={
            "jsonrpc": "2.0",
            "method": "tasks/send",
            "params": task_payload,
            "id": task_payload.get("id", 1),
        },
        timeout=30.0,
    )
    response.raise_for_status()
    return response.json()

# 2. Protect it with kavacha — open after 3 failures, recover after 60 s
breaker = CircuitBreaker(
    name="a2a-research-agent",
    failure_threshold=3,
    recovery_timeout_seconds=60.0,
    success_threshold=2,
)
protected_send = ProtectedCaller(send_a2a_task, breaker)

# 3. Wrap with punarjanma for retry on transient errors
from agent_retry import RetryExecutor, RetryPolicy

retry_executor = RetryExecutor(
    RetryPolicy(max_attempts=3, base_delay=1.0, jitter=True)
)

# 4. Wrap with anusarana for end-to-end tracing
from agent_tracer import Tracer

tracer = Tracer(service="orchestrator-agent")

# 5. Full production-grade A2A call
RESEARCH_AGENT_URL = "https://research-agent.example.com/a2a"

def call_research_agent(query: str, session_id: str) -> dict:
    with tracer.span("a2a.tasks/send", tags={"agent": "research-agent"}) as span:
        try:
            payload = {
                "id": session_id,
                "message": {
                    "role": "user",
                    "parts": [{"type": "text", "text": query}],
                },
            }
            # circuit-breaker wraps the retry-wrapped call
            result = retry_executor.execute(
                lambda: protected_send(RESEARCH_AGENT_URL, payload)
            )
            span.set_tag("status", "ok")
            return result
        except CircuitOpenError as e:
            span.set_tag("status", "circuit_open")
            span.set_tag("retry_after", e.retry_after)
            raise  # surface to caller — don't hide outages
        except Exception as e:
            span.set_tag("status", "error")
            span.set_tag("error", str(e))
            raise

# Usage
result = call_research_agent(
    query="Summarize Q1 2025 earnings for NVDA",
    session_id="session-abc123",
)
print(result["result"]["status"])  # completed / working / failed

What this gives you:

  • Circuit breaker: if research-agent is down, you fail fast after 3 attempts instead of hammering it
  • Retry with jitter: transient 429s or network blips are handled automatically
  • Distributed tracing: every A2A hop has a span ID — debug multi-agent pipelines in seconds
  • Zero external dependencies: the entire stack is pure Python, ships in any container

Why Arsenal is the Reliability Layer Under A2A Agents

A2A standardises how agents talk. Arsenal standardises how those conversations survive the real world.

┌─────────────────────────────────────────┐
│         Your Orchestrator Agent         │
│                                         │
│  ┌─────────┐  ┌──────────┐  ┌────────┐ │
│  │ kavacha │  │punarjanma│  │anusara.│ │
│  │circuit  │  │  retry   │  │ trace  │ │
│  │breaker  │  │ backoff  │  │  span  │ │
│  └────┬────┘  └────┬─────┘  └───┬────┘ │
│       └────────────┴────────────┘       │
│              A2A JSON-RPC 2.0           │
└─────────────────────────────────────────┘
            │                │
  ┌─────────▼──┐      ┌──────▼──────┐
  │ Agent Card │      │ Agent Card  │
  │ tasks/send │      │ tasks/get   │
  │ SSE stream │      │ SSE stream  │
  └────────────┘      └─────────────┘
  Remote Agent A      Remote Agent B

The A2A protocol spec contains zero lines of reliability code. That's by design — protocol specs shouldn't prescribe runtime behaviour. Arsenal fills that gap: 100 libraries, 4,375 tests, zero dependencies. Drop it into any A2A-based system in minutes.


Quick install

pip install agent-circuit-breaker agent-retry agent-tracer agent-limiter agent-session
# or with Vedic names (same packages, new PyPI names in v2)
pip install kavacha punarjanma anusarana maryada sanga

Part of the Vedic Arsenal — 100 production-grade Python libraries for LLM agent reliability, each named from the Vedas, Puranas, and Mahakavyas.


Philosophy

कर्मण्येवाधिकारस्ते मा फलेषु कदाचन
You have a right to your work, not to the fruits of your work.

Build what needs building. Ship it correctly.

Most agent frameworks promise to handle everything — and then fail quietly in production when routing misfires, memory runs out, loops spin forever, and you have no idea why. arsenal is the opposite. 100 small, focused libraries, each doing one thing extremely well. They do not hide the LLM from you. They do not abstract away your control. They give you sharp instruments and get out of the way.

Your agents do the work. arsenal makes sure the work survives failure.


Related Projects

Project What it is
a2a-reliability-starter Production-ready reliability layer for Google A2A Protocol agents — uses Arsenal patterns (kavacha, punarjanma, anusarana, maryada, sanga)
llm-reliability-starter Reliability monitoring starter for any LLM pipeline — circuit breaker, evaluator, monitor

Built by

Darshankumar Joshi — Gujarat, India.

Building production-grade LLM infrastructure, in silence, for everyone.


4375 tests · 100 libraries · zero external dependencies · MIT licensed · production-tested

Use one. Use all. They compose.
