Inspiration
Anyone who has run an autonomous agent has watched it happen: the agent decides the file it needs is package.json, calls read_file("package.json"), gets "not found," and then — with total confidence — tries ./package.json, then app/package.json, then ../package.json, forever. The loop is invisible until the bill arrives. The model never says "I'm stuck." It just keeps spending tokens, keeps calling tools, and keeps failing in slightly different ways.
We wanted a circuit breaker for agents — something that sits beside any agent, notices when it has stopped making progress, and either stops it or fixes it. Not a hardcoded retry counter (real work repeats tool calls all the time), but something that understands when repetition is semantically a loop rather than legitimate iteration. That became LoopGuard.
What it does
LoopGuard is a local semantic circuit breaker for AI agents. It observes an agent's stream of tool calls and outputs, detects when the agent is stuck, and intervenes.
Detection runs in two layers:
- Cheap, local, deterministic detectors (no API cost) that flag suspicious repetition.
- An LLM judge that confirms the loop and proposes a concrete fix — only invoked once a detector trips, so the expensive layer runs rarely.
When a loop is confirmed, LoopGuard applies one of four modes:
- warn: log it and move on
- flag: report without blocking
- pause: hand control to a human with the full t / c / a / i flow (terminate, continue, allowlist the action, or inject a correction)
- auto: inject the judge's suggested fix and let the agent recover on its own
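To make the four modes concrete, here is a minimal sketch of how a confirmed loop could be turned into an action. It is an illustration under stated assumptions, not LoopGuard's actual code: the names `Mode`, `Action`, and `intervene` are hypothetical, and treating warn and flag identically at this level (both let the agent continue) is our reading of the descriptions above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    WARN = "warn"
    FLAG = "flag"
    PAUSE = "pause"
    AUTO = "auto"


@dataclass
class Action:
    # Hypothetical action kinds: "continue", "terminate", "allowlist", "inject".
    kind: str
    message: Optional[str] = None


def intervene(mode: Mode, suggested_fix: str,
              human_key: Optional[str] = None,
              correction: Optional[str] = None) -> Action:
    """Map a confirmed loop to an action (illustrative only)."""
    if mode in (Mode.WARN, Mode.FLAG):
        # Both modes record the loop elsewhere and never block the agent.
        return Action("continue")
    if mode is Mode.AUTO:
        # Feed the judge's suggestion straight back to the agent.
        return Action("inject", suggested_fix)
    # Mode.PAUSE: a human answers with one of t / c / a / i.
    choices = {
        "t": Action("terminate"),
        "c": Action("continue"),
        "a": Action("allowlist"),
        "i": Action("inject", correction),
    }
    if human_key not in choices:
        raise ValueError("pause mode needs one of: t, c, a, i")
    return choices[human_key]
```

In this sketch, `intervene(Mode.AUTO, "read pyproject.toml")` yields an inject action carrying the fix, while `intervene(Mode.PAUSE, fix, human_key="t")` ends the run.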
It ships as a Python engine, a CLI, a FastAPI server that streams runs over WebSocket, and an Expo / React Native mobile "Mission Control" app so you can watch agents loop — and intervene — from your phone. It tracks real token and dollar cost (agent and judge) live, logs every auto-fix, and supports a persistent allowlist of actions you've decided are fine to repeat. It also guards multiple agents at once, catching the A→B→A→B ping-pong where two agents trap each other.
How we built it
The detection math. Every event (a tool call plus its arguments and result) is normalized to a canonical string and embedded locally with an L2-normalized hashing vectorizer — no embedding API, no cost. For the last $n$ events (default $n = 3$), we compute pairwise cosine similarity
$$ \text{sim}(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} $$
and trip only when the minimum pairwise similarity clears the threshold:
$$ \min_{0 \le i < j < n} \text{sim}(v_i, v_j) \;\ge\; \tau, \qquad \tau = 0.86 $$
Using the minimum (not the mean) means all of the recent events must be mutually similar — one genuinely new action breaks the loop and resets the guard. We layered several detectors on top of the same embedding:
- exact — identical normalized signatures ($\text{sim} = 1.0$)
- semantic — the cosine test above
- ping-pong — for a 4-event tail $[A, B, A, B]$, we check $\text{sim}(v_0, v_2)$ and $\text{sim}(v_1, v_3)$, catching multi-agent oscillation
- budget — hard ceilings on tool calls and dollar cost
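The semantic and ping-pong checks above can be sketched with the standard library alone. This is a simplified illustration, not the project's implementation: the tokenizer, the 256-bucket dimension, the MD5-based bucketing, and the function names `embed`, `semantic_trip`, and `ping_pong_trip` are all assumptions. Only $\tau = 0.86$, the window $n = 3$, and the minimum-pairwise rule come from the text.

```python
import hashlib
import math
import re
from itertools import combinations

DIM = 256    # assumed number of hash buckets
TAU = 0.86   # similarity threshold from the text
WINDOW = 3   # default n from the text


def embed(text: str, dim: int = DIM) -> list[float]:
    """Hash each token into one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9_]+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Vectors from embed() are unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def semantic_trip(events: list[str], n: int = WINDOW, tau: float = TAU) -> bool:
    """Trip only if every pair among the last n events is at least tau-similar."""
    if n < 2 or len(events) < n:
        return False
    vecs = [embed(e) for e in events[-n:]]
    return min(cosine(a, b) for a, b in combinations(vecs, 2)) >= tau


def ping_pong_trip(events: list[str], tau: float = TAU) -> bool:
    """Trip on an [A, B, A, B] tail: compare v0 with v2 and v1 with v3."""
    if len(events) < 4:
        return False
    v0, v1, v2, v3 = (embed(e) for e in events[-4:])
    return cosine(v0, v2) >= tau and cosine(v1, v3) >= tau
```

With this crude tokenizer, three failed reads of `package.json`, `./package.json`, and `app/package.json` differ by at most one token, so `semantic_trip` fires. Replacing any one of them with a genuinely different call drags the minimum below $\tau$, which is the behavior the minimum rule is meant to give.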
The judge. Once a detector trips, the run's recent history (plus a listing of the files that actually exist in the workspace) is sent to a judge running on Cerebras gpt-oss-120b. Cerebras's inference speed is what makes the two-layer design viable — the confirmation step returns fast enough to sit inline in an agent's loop. The judge returns a structured verdict: is this a loop, a confidence score, its reasoning, and a suggested_fix. In auto mode that fix is injected straight back into the agent's context.
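As a rough sketch of consuming that verdict, the snippet below parses the judge's reply and rejects anything truncated or malformed. Only `suggested_fix` is a field name taken from the text; `is_loop`, `confidence`, `reasoning`, the `Verdict` class, and the `min_confidence` cutoff of 0.8 are assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    is_loop: bool
    confidence: float
    reasoning: str
    suggested_fix: str


def parse_verdict(raw: str) -> Optional[Verdict]:
    """Return a Verdict, or None if the reply is truncated or malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # e.g. the model was cut off mid-object
    if not isinstance(data, dict) or not isinstance(data.get("is_loop"), bool):
        return None
    try:
        verdict = Verdict(
            is_loop=data["is_loop"],
            confidence=float(data["confidence"]),
            reasoning=str(data["reasoning"]),
            suggested_fix=str(data["suggested_fix"]),
        )
    except (KeyError, TypeError, ValueError):
        return None
    if not 0.0 <= verdict.confidence <= 1.0:
        return None
    return verdict


def should_intervene(verdict: Optional[Verdict], min_confidence: float = 0.8) -> bool:
    # Assumed policy: act only on a well-formed, confident "loop" verdict.
    return verdict is not None and verdict.is_loop and verdict.confidence >= min_confidence
```

Treating an unparseable reply as "no verdict" keeps a cut-off response from ever being injected into the agent's context.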
The stack. A provider-agnostic agent runtime (run_agent / run_multi_agent) drives real tool-calling agents through either the Cerebras SDK or LiteLLM. A FastAPI server exposes /projects, /runs, /runs/{id}, /allowlist, /autofixes and a per-run WebSocket; the React Native app subscribes to that stream and renders a live decision card, a token/\$ meter, and run history. We seeded a set of demo projects — each a real workspace with an agent deliberately biased toward the wrong well-known file (an npm agent let loose in a Python repo) — so the loops are genuine, not scripted.
Challenges we ran into
- Telling a loop apart from progress. This is the whole problem. Too sensitive and you kill legitimate retries; too lax and the agent burns \$5 before you notice. The fixes were the minimum-similarity rule, careful event normalization (so argument noise doesn't hide a repeat), and the two-layer design — the cheap detector can be a little trigger-happy because the judge has the final say.
- Reasoning models truncate their JSON. Our first judge calls came back as half-finished JSON objects: gpt-oss was spending its token budget thinking and getting cut off mid-verdict. We had to raise the completion-token budget specifically for the reasoning model so the structured output survived.
- Vague fixes. Early on the judge would say "read the manifest" without knowing which file existed, so it guessed the wrong extension. Passing the actual workspace file listing as context turned "try the manifest" into "read pyproject.toml."
- A pause/resume race in the server. Pausing a streaming run and resuming it from an intervention had a race between the run loop and the HTTP handler; we had to close it so a decision could never be applied to a stale run state.
- Real cost accounting. We refused to fake the meter. Cost is real token usage times real per-model pricing, and the judge's own cost is folded into the displayed total — guarding an agent isn't free, and the UI says so.
- Streaming to a phone. WebSocket reconnection, persisting the server URL across launches, and keeping the mobile reducer in sync with server-side run state took more care than the "happy path" demo suggested.
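The cost rule described above (real token usage times per-model pricing, with the judge folded into the total) reduces to a few lines of arithmetic. The sketch below is illustrative only: the model keys and the per-million-token prices are hypothetical placeholders, not real Cerebras or LoopGuard figures, and `Usage`, `call_cost`, and `run_cost` are invented names.

```python
from dataclasses import dataclass

# HYPOTHETICAL prices in USD per million tokens: (input, output).
PRICE_PER_MTOK = {
    "agent-model": (0.50, 1.50),
    "judge-model": (0.25, 0.75),
}


@dataclass
class Usage:
    model: str
    prompt_tokens: int
    completion_tokens: int


def call_cost(u: Usage) -> float:
    """Dollar cost of one model call from its reported token usage."""
    price_in, price_out = PRICE_PER_MTOK[u.model]
    return (u.prompt_tokens * price_in + u.completion_tokens * price_out) / 1_000_000


def run_cost(calls: list[Usage]) -> float:
    """Total for a run: agent calls and judge calls are summed together."""
    return sum(call_cost(u) for u in calls)
```

Passing both agent and judge `Usage` records to `run_cost` is what keeps the displayed total honest about the cost of guarding.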
Accomplishments that we're proud of
- It actually works end to end against the real Cerebras API, not a mock. In a live run, an npm-minded agent loops on package.json in a Python repo, the semantic detector trips at similarity $0.89$, the judge fires at confidence $0.97$, auto mode injects "read pyproject.toml," and the agent recovers and reports acme-cli v2.3.1, all for a fraction of a cent.
- A clean two-layer architecture where the expensive model only runs when a free local check says it's worth it.
- Multi-agent guarding with a dedicated ping-pong detector.
- A genuinely useful mobile control surface for something usually buried in terminal logs.
- A real test suite — 71 backend tests plus the app reducer tests and a clean typecheck — rather than a demo held together with tape.
What we learned
- Repetition is not the signal; semantic repetition is. A counter can't tell the difference between an agent making progress and an agent flailing. Embeddings + a minimum-similarity rule can.
- Cheap-filter, expensive-confirm is the right shape for agent safety. Running an LLM judge on every step would be slow and costly; running it only after a deterministic trip gives you the judge's nuance at almost none of its cost — and fast inference (Cerebras) is what lets the confirm step stay inline.
- Context is the difference between a useless and a useful fix. The same judge prompt went from guessing to precise the moment we handed it the real file listing.
- Reasoning models need token headroom for structured output — the "thinking" tokens are real and they eat your JSON if you don't budget for them.
What's next for LoopGuard
- Pluggable embeddings — swap the local hashing vectorizer for a real sentence-embedding model when accuracy matters more than zero cost.
- More loop shapes — $k$-cycle detection beyond ping-pong, and "slow drift" loops where the agent makes microscopic non-progress over many steps.
- A drop-in middleware/decorator so any existing agent framework gets a circuit breaker in one line.
- Persistent storage and analytics — the server is in-memory for the demo; next is durable run history and an allowlist that learns across sessions.
- Hardening for real deployments — auth, rate limits, and per-tenant budgets so teams can put LoopGuard in front of production agents.
- Codex, Claude, and Cursor integration, so you can keep coding with your favorite agents.