Inspiration

Launching something online is weirdly random.

You pick one angle. You post it somewhere. You just see what happens.

We wanted to explore competing directions before committing in public and let the strongest one prove itself, rather than guessing.

What it does

You give it a goal. Several strategy agents then compete to reach it, each betting on a different approach.

In the demo, the product is TileShift, a minimalist browser puzzle, and the goal is reaching its first 10 users. Three agents run: gameplay-first, technical-first, and founder-story.

They consume web signals across five channels: Reddit, Hacker News, Product Hunt, X, and LinkedIn. A scoring engine rates each agent on six criteria (early-user potential, audience fit, clarity, friction, learning value, risk) and combines them in a tuned weighted sum in which early-user potential carries the most weight.
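
To make the weighted sum concrete, here is a minimal JavaScript sketch of a scorer over the six criteria. The weights are hypothetical: the writeup only says the sum is tuned and that early-user potential dominates, so the 0.35 on `earlyUserPotential` simply makes it the largest term. The sketch also assumes every criterion sits on a 0 to 10 scale where higher is better, meaning friction and risk are scored so that less friction or lower risk earns more points.

```javascript
// Hypothetical weights: the writeup says the sum is tuned and that
// early-user potential dominates, but does not publish the numbers.
const WEIGHTS = {
  earlyUserPotential: 0.35,
  audienceFit: 0.2,
  clarity: 0.15,
  friction: 0.1, // assumed scored so that LESS friction earns more points
  learningValue: 0.1,
  risk: 0.1, // assumed scored so that LOWER risk earns more points
};

// Weighted sum of six 0-10 criterion scores, rounded to two decimals.
function scoreStrategy(criteria) {
  let total = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const value = criteria[name];
    if (typeof value !== "number") throw new Error(`missing criterion: ${name}`);
    total += weight * value;
  }
  return Math.round(total * 100) / 100;
}

const gameplayFirst = { earlyUserPotential: 8, audienceFit: 7, clarity: 8,
  friction: 6, learningValue: 5, risk: 7 };
console.log(scoreStrategy(gameplayFirst)); // 7.2
```

With these weights, one extra point of early-user potential moves the total 3.5 times as far as one extra point on friction, which is one simple way to encode "early-user pull dominates".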

The twist is that it runs as a live, uncertain tournament rather than a static readout. Evidence arrives round by round, the early leader gets overtaken, and each agent adapts to its biggest objection before the final ranking is decided. You walk away with a ranked recommendation, a next experiment, and a draft launch message.
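
The round-by-round flow can be sketched as a loop that applies each round's signals to every agent's criteria and re-ranks after every round. This is a sketch under assumptions, not the project's engine: the signal shape (`effects` mapping an agent id to per-criterion deltas), the 0 to 10 clamp, the weights, and the example numbers are all hypothetical.

```javascript
// Hypothetical weights (the writeup only says early-user potential dominates).
const WEIGHTS = { earlyUserPotential: 0.35, audienceFit: 0.2, clarity: 0.15,
  friction: 0.1, learningValue: 0.1, risk: 0.1 };

// Weighted sum over the six criteria, each assumed to be on a 0-10 scale.
const score = (c) => Object.entries(WEIGHTS).reduce((sum, [k, w]) => sum + w * c[k], 0);

// Play the tournament round by round. `agents` maps an agent id to its criteria;
// `rounds` is an array of rounds, each an array of signals with per-agent deltas.
function runTournament(agents, rounds) {
  // Work on copies so the inputs are untouched and a replay gives the same result.
  const state = {};
  for (const [id, criteria] of Object.entries(agents)) state[id] = { ...criteria };

  return rounds.map((signals, i) => {
    for (const signal of signals) {
      for (const [id, deltas] of Object.entries(signal.effects)) {
        for (const [criterion, delta] of Object.entries(deltas)) {
          // Keep every criterion inside the assumed 0-10 range.
          state[id][criterion] = Math.min(10, Math.max(0, state[id][criterion] + delta));
        }
      }
    }
    const scores = {};
    for (const id of Object.keys(state)) scores[id] = Number(score(state[id]).toFixed(2));
    // Highest score first; ties broken by agent id so the ordering is deterministic.
    const ranking = Object.keys(scores).sort((a, b) => scores[b] - scores[a] || a.localeCompare(b));
    return { round: i + 1, scores, ranking };
  });
}

const agents = {
  "gameplay-first": { earlyUserPotential: 8, audienceFit: 7, clarity: 8, friction: 6, learningValue: 5, risk: 7 },
  "technical-first": { earlyUserPotential: 7, audienceFit: 7, clarity: 7, friction: 7, learningValue: 7, risk: 7 },
  "founder-story": { earlyUserPotential: 6, audienceFit: 6, clarity: 7, friction: 6, learningValue: 8, risk: 6 },
};
const rounds = [
  [{ effects: { "technical-first": { earlyUserPotential: 2 } } }], // novelty boost
  [{ effects: { "gameplay-first": { earlyUserPotential: 2 } } },   // playability boost
   { effects: { "technical-first": { friction: -3 } } }],          // friction hit
];
const history = runTournament(agents, rounds);
for (const { round, ranking } of history) console.log(round, ranking[0]);
// 1 technical-first
// 2 gameplay-first
```

In this toy run the novelty signal puts `technical-first` ahead after round 1, and the playability boost plus the friction hit hand the lead to `gameplay-first` in round 2. Because the function copies its inputs and uses no randomness, the same inputs always produce the same history.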

How we built it

The whole thing is a zero-dependency Node service built around a five-stage pipeline:

web signals → strategy agents → scoring engine → tournament → recommendation UI
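
One minimal way to express that pipeline is a list of stage functions folded over a context object, so each stage only sees the previous stage's output. The stage bodies below are toy stand-ins (the real agents, scorer, and tournament are much richer), and the field names `signals`, `agents`, `scores`, and `ranking` are assumptions.

```javascript
// Toy stand-ins for the five stages; each takes the previous stage's output.
// Field names and the scoring rule here are hypothetical.
const stages = [
  // web signals (the real build calls getSignals(context) in signalProvider.js)
  (ctx) => ({ ...ctx, signals: [{ channel: "hackernews", favors: "technical-first" }] }),
  // strategy agents: one per approach
  (s) => ({ ...s, agents: ["gameplay-first", "technical-first", "founder-story"] }),
  // scoring engine: toy score = number of signals that favor each agent
  (s) => ({
    ...s,
    scores: Object.fromEntries(
      s.agents.map((a) => [a, s.signals.filter((sig) => sig.favors === a).length])
    ),
  }),
  // tournament: highest score first, ties broken by name for determinism
  (s) => ({
    ...s,
    ranking: [...s.agents].sort((a, b) => s.scores[b] - s.scores[a] || a.localeCompare(b)),
  }),
  // recommendation payload handed to the UI
  (s) => ({ goal: s.goal, recommendation: s.ranking[0], ranking: s.ranking }),
];

// Fold the context through every stage in order.
function runPipeline(context) {
  return stages.reduce((value, stage) => stage(value), context);
}

const result = runPipeline({ product: "TileShift", goal: "first 10 users" });
console.log(result.recommendation); // technical-first
```

Keeping each stage a plain function of its input is what lets the first stage be replaced without the later ones noticing.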

The agents, scorer, and tournament are deterministic modules. The signal source is deliberately isolated behind a single getSignals(context) function in signalProvider.js, the one seam built to swap in live web data without touching anything downstream.
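
A sketch of what that seam might look like. Only the name `getSignals(context)` comes from the writeup; the signal shape (`round`, `channel`, `text`, per-agent `effects`), the `context.channels` field, `isValidSignal`, and the contents of `MOCK_SIGNALS` are illustrative assumptions.

```javascript
// Hypothetical mock data; the signal shape (round, channel, text, effects) is an assumption.
const MOCK_SIGNALS = [
  { round: 2, channel: "reddit", text: "Played a few rounds on my phone, fun",
    effects: { "gameplay-first": { earlyUserPotential: 2 } } },
  { round: 1, channel: "hackernews", text: "Curious how the puzzle engine works",
    effects: { "technical-first": { earlyUserPotential: 2 } } },
  { round: 1, channel: "linkedin", text: "Nice solo-founder story",
    effects: { "founder-story": { audienceFit: 1 } } },
];

// Minimal contract check that a live provider would also have to satisfy.
function isValidSignal(s) {
  return Number.isInteger(s.round) && typeof s.channel === "string" &&
    typeof s.text === "string" && typeof s.effects === "object" && s.effects !== null;
}

// The single seam: everything downstream depends only on this return shape.
// A live version would fetch from the web instead of reading MOCK_SIGNALS.
function getSignals(context) {
  const channels = new Set(context.channels);
  return MOCK_SIGNALS
    .filter((s) => channels.has(s.channel) && isValidSignal(s))
    .sort((a, b) => a.round - b.round); // stable sort keeps same-round signals in source order
}

const signals = getSignals({ channels: ["reddit", "hackernews"] });
console.log(signals.map((s) => `${s.round} ${s.channel}`).join(", ")); // 1 hackernews, 2 reddit
```

A live provider only has to return the same shape. If it needs network calls it would likely become async, in which case callers would `await` it, but the agents, scorer, and tournament would not need to change.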

The UI is vanilla HTML/CSS/JS and replays exactly what the engine emitted: per-round score trajectories, a signal feed showing which agents each signal helps or hurts, and the strategy updates each signal triggered. The service ships as a container on Cloud Run, with Firebase Hosting serving the static UI and rewriting /api/** requests to it.
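
If the engine emits a flat event log, the UI's per-round view can be rebuilt from it without the UI computing any scores of its own. The event types (`scores`, `signal`, `update`), their fields, and the example values are hypothetical; the point of the sketch is that every number the UI draws comes from an engine event.

```javascript
// Group a flat engine event log into per-round entries for the UI.
// Event types and fields are hypothetical.
function buildReplay(events) {
  const rounds = new Map();
  for (const event of events) {
    if (!rounds.has(event.round)) {
      rounds.set(event.round, { round: event.round, scores: null, signals: [], updates: [] });
    }
    const entry = rounds.get(event.round);
    if (event.type === "scores") entry.scores = event.scores;
    else if (event.type === "signal") entry.signals.push({ text: event.text, helps: event.helps, hurts: event.hurts });
    else if (event.type === "update") entry.updates.push({ agent: event.agent, change: event.change, triggeredBy: event.triggeredBy });
    else throw new Error(`unknown event type: ${event.type}`);
  }
  return [...rounds.values()].sort((a, b) => a.round - b.round);
}

// Per-agent score series for the trajectory chart, read straight from the replay.
// Assumes every round carries one "scores" event; rounds without one are skipped.
function trajectories(replay) {
  const series = {};
  for (const { scores } of replay) {
    if (!scores) continue;
    for (const [agent, value] of Object.entries(scores)) {
      if (!series[agent]) series[agent] = [];
      series[agent].push(value);
    }
  }
  return series;
}

const events = [
  { round: 1, type: "signal", text: "HN likes the technical angle", helps: ["technical-first"], hurts: [] },
  { round: 1, type: "scores", scores: { "technical-first": 7.7, "gameplay-first": 7.2 } },
  { round: 2, type: "signal", text: "Setup feels heavy", helps: [], hurts: ["technical-first"] },
  { round: 2, type: "update", agent: "technical-first", change: "cut setup steps", triggeredBy: "Setup feels heavy" },
  { round: 2, type: "scores", scores: { "technical-first": 7.4, "gameplay-first": 7.9 } },
];
const replay = buildReplay(events);
console.log(JSON.stringify(trajectories(replay)));
// {"technical-first":[7.7,7.4],"gameplay-first":[7.2,7.9]}
```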

This build validates the multi-agent engine against a curated set of mock web signals while the live integration point stays ready to go.

Challenges we ran into

Making the tournament genuinely uncertain was the hard part. A leaderboard that just sorts scores is boring and dishonest. We had to engineer real lead changes (novelty signals rewarding the technical-first agent early, then playability and friction signals reshuffling the order) while keeping the engine deterministic and replayable.
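
One way to check that a run really produces a lead change, rather than just claiming one, is to derive the leader of each round from the per-round score trajectories. This sketch assumes one score per agent per round, and breaks ties toward the alphabetically first agent name so the answer is identical on every replay; the trajectory numbers are made up for illustration.

```javascript
// Leader of each round from per-agent score trajectories (one score per round).
// Agents are scanned in name order and only a strictly higher score takes the lead,
// so ties always go to the alphabetically first name and replays agree.
function leadersByRound(trajectories) {
  const agents = Object.keys(trajectories).sort();
  const roundCount = Math.max(...agents.map((a) => trajectories[a].length));
  const leaders = [];
  for (let r = 0; r < roundCount; r++) {
    let leader = null;
    for (const agent of agents) {
      const s = trajectories[agent][r];
      if (s === undefined) continue;
      if (leader === null || s > trajectories[leader][r]) leader = agent;
    }
    leaders.push(leader);
  }
  return leaders;
}

// Rounds (1-based) where the lead changed hands.
function leadChanges(trajectories) {
  const leaders = leadersByRound(trajectories);
  const changes = [];
  for (let r = 1; r < leaders.length; r++) {
    if (leaders[r] !== leaders[r - 1]) {
      changes.push({ round: r + 1, from: leaders[r - 1], to: leaders[r] });
    }
  }
  return changes;
}

// Illustrative numbers only: technical-first leads early, gameplay-first overtakes.
const example = {
  "gameplay-first": [7.2, 7.9, 8.1],
  "technical-first": [7.7, 7.4, 7.5],
  "founder-story": [6.35, 6.6, 7.0],
};
console.log(JSON.stringify(leadChanges(example)));
// [{"round":2,"from":"technical-first","to":"gameplay-first"}]
```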

The other challenge was the adaptation loop: letting each agent respond to its biggest objection by recovering some of its risk score and lifting its weakest criterion, then re-deciding the ranking on the adapted score.
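
The adaptation step can be sketched as follows: find the signal that hurt an agent most, give the agent back some risk, lift its weakest criterion, and re-rank on the adapted scores. `RISK_RECOVERY`, `LIFT`, the weights, and the rule that the biggest objection is the signal with the most negative weighted effect are all assumptions, not the project's tuned values. The sketch also only revises agents that some signal actually hurt; the real engine may adapt every agent.

```javascript
// Hypothetical weights and tuning constants; the project's tuned values are not published.
const WEIGHTS = { earlyUserPotential: 0.35, audienceFit: 0.2, clarity: 0.15,
  friction: 0.1, learningValue: 0.1, risk: 0.1 };
const RISK_RECOVERY = 1; // risk points an agent can win back by revising
const LIFT = 1.5;        // points added to the weakest criterion

const score = (c) => Object.entries(WEIGHTS).reduce((sum, [k, w]) => sum + w * c[k], 0);

// Assumed rule: the biggest objection is the signal with the most negative
// weighted effect on this agent. Returns null if nothing hurt the agent.
function biggestObjection(agentId, signals) {
  let worst = null;
  let worstImpact = 0;
  for (const signal of signals) {
    const deltas = signal.effects[agentId] || {};
    const impact = Object.entries(deltas).reduce((sum, [k, d]) => sum + WEIGHTS[k] * d, 0);
    if (impact < worstImpact) {
      worst = signal;
      worstImpact = impact;
    }
  }
  return worst;
}

// Revise a copy: recover some risk, then lift whichever criterion is now weakest
// (ties go to the first criterion in WEIGHTS order). Values stay capped at 10.
function adapt(criteria) {
  const revised = { ...criteria, risk: Math.min(10, criteria.risk + RISK_RECOVERY) };
  let weakest = null;
  for (const name of Object.keys(WEIGHTS)) {
    if (weakest === null || revised[name] < revised[weakest]) weakest = name;
  }
  revised[weakest] = Math.min(10, revised[weakest] + LIFT);
  return { revised, lifted: weakest };
}

// Adapt each agent to its biggest objection, then re-rank on the adapted scores.
function adaptAndRerank(agents, signals) {
  const results = Object.entries(agents).map(([id, criteria]) => {
    const objection = biggestObjection(id, signals);
    if (!objection) return { id, score: score(criteria), objection: null, lifted: null };
    const { revised, lifted } = adapt(criteria);
    return { id, score: score(revised), objection: objection.text, lifted };
  });
  return results.sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

const agents = {
  "gameplay-first": { earlyUserPotential: 10, audienceFit: 7, clarity: 8, friction: 6, learningValue: 5, risk: 7 },
  "technical-first": { earlyUserPotential: 9, audienceFit: 7, clarity: 7, friction: 4, learningValue: 7, risk: 7 },
};
const signals = [{ text: "Setup takes too long", effects: { "technical-first": { friction: -3 } } }];
for (const r of adaptAndRerank(agents, signals)) {
  console.log(r.id, r.score.toFixed(2), r.lifted);
}
// gameplay-first 7.90 null
// technical-first 7.65 friction
```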

Accomplishments that we're proud of

The adaptive loop is the magic moment: an agent hits real rejection, learns from the specific pushback, revises, and converges on a stronger version of its strategy.

The tournament is honest and fully replayable. Every round, every signal's effect, and every strategy update is emitted by the engine, not faked in the UI. And it runs with zero runtime dependencies and deploys cleanly.

What we learned

Ranking strategies isn't actually the value; surfacing tradeoffs is. Fast adoption vs. deeper engagement, broad reach vs. retention quality: different futures emerge depending on which side you commit to.

Staying specific to one vertical (product launches) made the whole concept far more credible than trying to generalize. And isolating the data source behind one function is what makes "go live" a small change instead of a rewrite.

What's next for Agentic Sandbox

Wire signalProvider.js to Nimble's Search, Extract, Crawl, and Web Search Agents so the agents reason over real, live web signals instead of mock data. Then: generalize beyond launches to other decision types, persist tournament runs, and let users define their own agents.
