For the video, see: http://oft.ovh/ny2A

In just a few clicks, Hive AI Agents transforms plain-English directives into fully tested, documented, containerized microservices and then dynamically stitches them into complex, multi-step workflows under a root orchestrator. Built on Flask, Docker, and Google Cloud Run, and powered by Gemini and Anthropic LLMs, it learns which peers to include via custom LLM-guided CSV slicing and self-optimizes through recursive agent creation—no manual wiring, no boilerplate.

Inspiration

We saw a growing trend in treating AI “agents” as first-class microservices that can be composed into larger workflows—borrowing best practices from event-driven microservices architecture and EDA patterns to scale agents reliably. At the same time, leaders like Meta predict that AI will function as a mid-level engineer by 2025, writing and reviewing code in real time. We wanted to build a meta-AI: an AI that not only writes code but manages, tests, deploys, and orchestrates its own creations.

What it does

  • Automated Agent Generation: Users post a JSON prompt (“Create an agent that summarizes research papers…”), and the backend spins up a new Flask microservice, complete with Pydantic validation, pytest suites, documentation, Docker packaging, and a Cloud Run deployment—all via LLMs.
  • Hierarchical Orchestration: A top-level “ResearchMaster” agent exposes /research-master, accepts a pipeline of registered agents, calls each in sequence, and returns a full execution trace or error details.
  • Service Registry & Semantic Discovery: Every agent’s metadata is appended to agents.csv and exposed via a REST /registry endpoint—allowing Claude/Gemini to pick which peers to include contextually, thus avoiding prompt bloat and ensuring relevance.
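The ResearchMaster flow above can be sketched as a small Flask service: it accepts a pipeline of registered agent URLs, calls each in sequence (feeding one agent's output into the next), and returns either a full execution trace or structured error details. This is a minimal sketch, not the actual implementation—field names like `pipeline` and `input` are assumptions.

```python
# Hypothetical sketch of the hierarchical orchestrator; only the
# /research-master route name comes from the project description.
from flask import Flask, request, jsonify
import requests

app = Flask(__name__)

@app.route("/research-master", methods=["POST"])
def research_master():
    body = request.get_json(force=True)
    pipeline = body.get("pipeline", [])  # ordered list of registered agent URLs
    payload = body.get("input", {})
    trace = []
    for url in pipeline:
        try:
            resp = requests.post(url, json=payload, timeout=60)
            resp.raise_for_status()
            payload = resp.json()  # each agent's output becomes the next input
            trace.append({"agent": url, "status": "ok", "output": payload})
        except Exception as exc:  # surface the failure instead of an opaque 500
            trace.append({"agent": url, "status": "error", "detail": str(exc)})
            return jsonify({"trace": trace}), 502
    return jsonify({"trace": trace, "result": payload})
```

An empty pipeline simply echoes the input back with an empty trace, which makes the endpoint easy to smoke-test before any agents are registered.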

How we built it

We chose Flask for its minimal footprint and ease of writing microservices. Each generated agent lives in its own folder, and we use os.makedirs(..., exist_ok=True) to avoid mkdir collisions. pipreqs (with --mode no-pin) auto-generates requirements.txt without strict version pins—preventing deployment breakage on numpy upgrades. For orchestration we integrated Orkes Conductor via the Python SDK, defining three tasks (choose_or_create_agent, ask_agent, log_usage) and a simple DAG—all running alongside Flask in the same container. Containers deploy to Google Cloud Run with a --revision-suffix to enforce unique revisions on every build.
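The per-agent scaffolding and deploy steps described above can be sketched as follows. The `--mode no-pin` and `--revision-suffix` flags come from the text; the folder layout, function name, and the way generated code arrives as a string are assumptions for illustration.

```python
# Hypothetical scaffolding/deploy helper; flags match the ones named in the
# write-up, everything else is an assumption.
import os
import subprocess
import uuid

def scaffold_and_deploy(name: str, code: str, base_dir: str = "agents") -> str:
    agent_dir = os.path.join(base_dir, name)
    os.makedirs(agent_dir, exist_ok=True)  # idempotent: no mkdir collisions
    with open(os.path.join(agent_dir, "main.py"), "w") as f:
        f.write(code)
    # Unpinned requirements so a numpy point release can't break the image.
    subprocess.run(["pipreqs", agent_dir, "--mode", "no-pin", "--force"],
                   check=True)
    suffix = uuid.uuid4().hex[:8]  # unique revision suffix per build
    subprocess.run(["gcloud", "run", "deploy", name,
                    "--source", agent_dir,
                    "--revision-suffix", suffix],
                   check=True)
    return suffix
```

Generating a fresh suffix per build is what enforces a unique Cloud Run revision on every deploy, even when the agent name is unchanged.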

Challenges we ran into

  • Name collisions & idempotency: Re-deploying the same agent name caused Cloud Run ALREADY_EXISTS errors; fixed by adding a random suffix per deploy.
  • Prompt context bloat: Feeding the entire registry to the LLM exceeded token limits; solved with a custom self-hosted LLM-guided CSV slicer that asks an assistant which agents matter most, then clamps to 30 rows.
  • Silent failures: Initial 500s from uncaught exceptions left us blind; we added a global sys.excepthook to log full stack traces and wrapped each expensive step (build_agent, gcloud, ping_ai) in try/except blocks for clear JSON errors.
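The error-surfacing fix in the last bullet can be sketched in a few lines: a global `sys.excepthook` that logs full stack traces, plus a wrapper that turns each expensive step into a structured JSON-friendly result instead of an uncaught 500. This is a minimal sketch under assumed names (`run_step`, `log_uncaught`); the real handlers may differ.

```python
# Hypothetical error-surfacing guardrails matching the description above.
import sys
import logging
import traceback

logging.basicConfig(level=logging.ERROR)

def log_uncaught(exc_type, exc_value, exc_tb):
    # Log the full stack trace instead of failing silently.
    logging.error("uncaught: %s",
                  "".join(traceback.format_exception(exc_type, exc_value, exc_tb)))

sys.excepthook = log_uncaught

def run_step(step_name, fn, *args):
    """Wrap an expensive step (e.g. build_agent, gcloud, ping_ai) so failures
    come back as structured data rather than an opaque 500."""
    try:
        return {"step": step_name, "ok": True, "result": fn(*args)}
    except Exception as exc:
        return {"step": step_name, "ok": False, "error": str(exc),
                "trace": traceback.format_exc()}
```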

Accomplishments that we're proud of

  • Recursive agent creators: We built agents that generate agents—e.g. documentation generators, test writers, and even a new agent factory—demonstrating true self-improvement.
  • Seamless orchestration: The three-node Conductor workflow immediately visualizes each request’s path in the Orkes UI, complete with retries, SLA timeouts, and audit logs.
  • Plug-and-play registry: Judges can hit /registry in a browser, see every agent spun up, and chain them in new pipelines on the fly.

What we learned

  • Microservices patterns like the Service Registry are invaluable for AI workflows—treat agents as replaceable, discoverable services rather than monolithic code.
  • Prompt engineering is just as critical in system architecture: crafting clear system prompts and fallback logic can make or break reliability under token constraints.
  • DevOps for AI requires new guardrails (constraints files, revision suffixes, file locks) to handle the unpredictability of LLM-generated code.

What's next for Hive AI Agents

We plan to add parallel fan-out, where multiple agent variants run concurrently and the platform selects the best output. We’ll explore dynamic sub-workflows that fork mid-pipeline based on runtime signals. Finally, we aim to launch a dashboard UI to visualize agent trees, live executions, and semantic-search-powered recommendations—all in real time.

Built With

Flask · Docker · Google Cloud Run · Orkes Conductor · Gemini · Anthropic (Claude) · Python
