ZIP FILE FOR PRE-REFORM SCREENSHOTS:

https://drive.google.com/file/d/1n_Yk2ln6uW9XDQ0Vbmdq-fVIBAaNsasq/view?usp=sharing

IMPORTANT NOTE

We had a couple of issues with Reform, such as with it identifying the proper page routes, and with it actually applying its suggested changes to our repo. So, although our Reform UI transformations weren't extreme makeovers, they were still very helpful for making things look more interesting! We really tried our absolute best to make it work, and we left plenty of feedback in the feedback form.

Inspiration

We've both been Minecraft players for years, and we've always been fascinated by how the game creates a sandbox where anything feels possible. Then we wondered: what if we could build something that lets anyone deploy AI-powered bots into their world, without writing a single line of code? Most existing tools are research prototypes: command-line only, single-agent, no UI. We wanted to build the missing layer: a polished, real-time dashboard that makes spawning and commanding autonomous agents feel as intuitive as playing the game itself.

What it does

MineMobs is a real-time web dashboard for spawning, commanding, and chatting with autonomous Minecraft bots. You connect it to any local PaperMC server and get a browser-based control center where you can:

Spawn agents across several specialized roles, such as Lumberjack (chops trees), Miner (mines ores), Fighter (hunts hostile mobs), and more. Each agent has its own personality, toolkit, and loadout.

Chat with agents in natural language. Tell a miner "find me some diamonds" or a fighter "defend this area." Gemini parses the intent and maps it to a structured task, and the bot executes it autonomously: pathfinding, mining, and combat are all handled. Progress streams back to the chat panel in real time.

You can use multi-agent mode to describe a high-level goal like "set up a base camp." The AI decomposes it into a multi-agent plan (e.g., one lumberjack gathering wood, one miner collecting stone, one scout surveying the perimeter), you review the plan, and deploy all agents simultaneously. The system monitors them every 5 seconds and generates a mission summary when they finish.
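To make the plan step concrete, here is a minimal sketch of what a decomposed multi-agent plan might look like, plus a deployability check. The type names, field names, and role-to-action mapping are illustrative assumptions, not the project's actual schema.

```typescript
// Illustrative shape for a Commander-mode plan (assumed, not the real schema).
type Role = "lumberjack" | "miner" | "fighter" | "scout";

interface PlanStep {
  role: Role;
  task: string;   // natural-language task shown to the user for review
  action: string; // structured action the executor understands
}

interface CommanderPlan {
  goal: string;
  steps: PlanStep[];
}

// Example decomposition for "set up a base camp".
const baseCampPlan: CommanderPlan = {
  goal: "set up a base camp",
  steps: [
    { role: "lumberjack", task: "gather 32 logs", action: "chop" },
    { role: "miner", task: "collect 64 cobblestone", action: "mine" },
    { role: "scout", task: "survey the perimeter", action: "scout" },
  ],
};

// A plan is deployable only if every step uses an action its role supports.
const ROLE_ACTIONS: Record<Role, string[]> = {
  lumberjack: ["chop"],
  miner: ["mine"],
  fighter: ["fight", "defend"],
  scout: ["scout"],
};

function isDeployable(plan: CommanderPlan): boolean {
  return plan.steps.every((s) => ROLE_ACTIONS[s.role].includes(s.action));
}
```

Validating the plan before deployment is what lets the review step reject nonsense like a scout being asked to mine.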

You can monitor each agent's health bar, hunger, inventory grid, task progress, and position, and open multiple tabs to run independent sessions at the same time.

How we built it

Backend: Node.js + Express + Socket.io server that bridges the web dashboard to Minecraft. Each agent is a Mineflayer bot instance with the pathfinder and PVP plugins loaded. When a user sends a chat message, it goes through Gemini 2.5 Flash for task parsing: the model returns a structured JSON action (mine, chop, fight, scout, defend), which a task executor then carries out, handling pathfinding, block-breaking, mob targeting, and inventory management. RCON integration handles server-side setup (gamemode, loadouts). A state emitter polls every agent's position, health, and inventory every 2 seconds and broadcasts via Socket.io. Commander mode uses Gemini to decompose goals into multi-agent plans, with a monitoring loop that detects completion and generates mission summaries.
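The "structured JSON action" step can be sketched as a defensive parser: LLMs sometimes wrap JSON in markdown fences or return malformed output, so the parse has to fail soft. The field names ("action", "target") are assumptions for illustration, not MineMobs' actual schema.

```typescript
type AgentAction = "mine" | "chop" | "fight" | "scout" | "defend";

interface ParsedTask {
  action: AgentAction;
  target?: string;
}

const VALID_ACTIONS: AgentAction[] = ["mine", "chop", "fight", "scout", "defend"];

// Parse the model's reply defensively: strip markdown fences, tolerate
// malformed JSON, and reject actions outside the known set. Returning null
// signals the caller to fall back to a non-LLM parser.
function parseModelResponse(raw: string): ParsedTask | null {
  const cleaned = raw.replace(/```(json)?/g, "").trim();
  try {
    const obj = JSON.parse(cleaned);
    if (VALID_ACTIONS.includes(obj.action)) {
      return {
        action: obj.action,
        target: typeof obj.target === "string" ? obj.target : undefined,
      };
    }
  } catch {
    // malformed JSON falls through to null
  }
  return null;
}
```

The null return is the seam where a keyword-based fallback can take over, so one bad model reply never crashes the pipeline.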

Frontend: React 19 + Vite + TypeScript + Tailwind CSS v4. The entire UI is custom-themed to look and feel like Minecraft: a pixel font (Press Start 2P), beveled buttons matching Java Edition's menu style, dark panel backgrounds with 3D inset borders, SVG pixel-art role icons, and actual Minecraft textures from the game's asset repository for inventory items. Each page has a unique animated background (Canvas particle effects with page-specific color palettes, slow-drifting screenshots, radial vignettes). The dashboard is fully real-time via Socket.io hooks (useAgents, useAgentChat, useSocket) that stream state updates into React state. The minimap renders on a Canvas 2D context with role-colored dots, position history trails, and zoom controls. Framer Motion handles page transitions, layout animations, and staggered reveals throughout.

Integration: The frontend connects directly to localhost:3001; no auth, no hosting, purely local. REST endpoints handle spawning/dismissing agents and plan generation, while Socket.io carries all real-time state (agent updates, chat messages, activity logs, commander progress).

Challenges we ran into

Mineflayer's learning curve was steep. Bot spawning, pathfinding, PVP, and inventory management all use different plugins with different APIs. Getting agents to reliably navigate terrain, break the right blocks, and handle edge cases (falling in water, getting stuck, dying mid-task) took significant debugging. We had to implement safe username generation, spawn timeouts, and RCON-based loadout assignment to make the pipeline reliable.
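The "safe username generation" mentioned above can be sketched as a small helper. Minecraft usernames must be 3-16 characters of letters, digits, and underscores, and concurrent bots need unique names; the exact rules MineMobs applies are an assumption here.

```typescript
// Sanitize a role name into a valid, unique Minecraft username (assumed
// approach: strip invalid characters, append an index, respect the
// 16-character limit).
function safeUsername(role: string, index: number): string {
  // Drop anything the server would reject; fall back to a generic base.
  const base = role.replace(/[^A-Za-z0-9_]/g, "") || "Agent";
  const suffix = `_${index}`;
  // Truncate the base so base + suffix stays within 16 characters.
  return base.slice(0, 16 - suffix.length) + suffix;
}
```

Appending the index also guarantees two concurrently spawned agents of the same role never collide on the server.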

Gemini task parsing needed fallbacks everywhere. LLMs don't always return valid JSON, and even when they do, they sometimes assign actions that don't match the agent's role (a scout trying to mine). We built a keyword-based fallback parser and role-action validation to catch these cases gracefully, so the system never fully breaks even when the model is unreliable.
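A minimal sketch of the keyword-based fallback parser with role-action validation described above. The keyword lists and role mappings here are illustrative assumptions; the real ones are surely richer.

```typescript
type Action = "mine" | "chop" | "fight" | "scout" | "defend";

// Keywords that hint at each action (assumed, abbreviated lists).
const KEYWORDS: Record<Action, string[]> = {
  mine: ["mine", "dig", "ore", "diamond", "iron"],
  chop: ["chop", "wood", "tree", "log"],
  fight: ["fight", "kill", "attack", "hunt"],
  defend: ["defend", "guard", "protect"],
  scout: ["scout", "explore", "survey"],
};

// Which actions each role is allowed to perform.
const ROLE_ACTIONS: Record<string, Action[]> = {
  miner: ["mine"],
  lumberjack: ["chop"],
  fighter: ["fight", "defend"],
  scout: ["scout"],
};

// When the LLM fails or returns an off-role action (a scout trying to mine),
// match keywords restricted to what this role can actually do, and fall back
// to the role's default action so the result is never invalid.
function fallbackParse(message: string, role: string): Action {
  const text = message.toLowerCase();
  const allowed = ROLE_ACTIONS[role] ?? [];
  for (const action of allowed) {
    if (KEYWORDS[action].some((kw) => text.includes(kw))) return action;
  }
  return allowed[0]; // role default: e.g. a miner with no match just mines
}
```

Because the search space is limited to the role's allowed actions up front, an off-role suggestion can never slip through.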

Real-time state synchronization was tricky. With multiple agents running concurrently, each with health, position, inventory, and task progress changing constantly, keeping the frontend in sync required careful polling intervals (2s for state, 5s for commander monitoring) and deduplication of Socket.io events. Early on we had double-message bugs and inventory slots mapping to the wrong indices because Mineflayer uses Java Edition's protocol slot numbering (hotbar starts at slot 36, not slot 0).
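The slot-numbering bug above comes down to one mapping: in Java Edition's protocol numbering for the player inventory window, the nine hotbar slots sit at indices 36-44, while a UI grid usually wants them at 0-8. A small helper, written as an assumption of how the fix might look rather than the project's actual code:

```typescript
// Java Edition player inventory window: hotbar occupies protocol slots 36-44.
const HOTBAR_START = 36;
const HOTBAR_SIZE = 9;

// Map a protocol slot index to the hotbar position the dashboard renders,
// or null if the slot isn't part of the hotbar.
function hotbarIndex(protocolSlot: number): number | null {
  const offset = protocolSlot - HOTBAR_START;
  return offset >= 0 && offset < HOTBAR_SIZE ? offset : null;
}
```

Centralizing the offset in one function is what prevents the "wrong indices" class of bug from reappearing in every render path.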

Getting the UI to look authentically Minecraft-themed without being cheesy was harder than expected. We went through multiple iterations: tiled block textures (looked terrible at scale), then generic dark cards (too generic), before finally landing on the beveled button style from Java Edition's actual menu UI, combined with SVG pixel-art icons and real Minecraft textures pulled from the game's asset repository.

Accomplishments that we're proud of

Commander mode genuinely feels like magic. Typing "set up a base camp" and watching the AI decompose it into three agents with specific tasks, then deploying them all at once and monitoring their collective progress on a minimap: that's the moment where the project goes from "cool demo" to something that feels like a real product.

The full loop works end-to-end. You can go from the landing page, pick a world seed, start a server, spawn a miner, type "mine some iron," and watch it pathfind to a cave, break iron ore, and update its inventory in your browser, all within a few minutes of setup.

The UI quality. Every page has its own animated background, particle system, and color palette. The inventory grid uses actual Minecraft textures. The buttons have authentic Java Edition bevels. The health/hunger bars are SVG pixel hearts and drumsticks. We didn't cut corners on the visual polish.

What we learned

Mineflayer is powerful but underdocumented. The community wiki covers basics, but advanced patterns (concurrent bots, PVP state machines, RCON integration) required reading source code and experimenting. We now have a much deeper understanding of Minecraft's protocol layer.

LLM output needs guard rails. Even with structured JSON schemas and role-specific system prompts, Gemini occasionally hallucinates actions or returns malformed responses. Building robust fallback chains (LLM, then keyword parser, then role default) is essential for any AI-powered system that needs to be reliable.
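The fallback-chain lesson generalizes into a small combinator: try each parser in order, take the first non-null result, and end with a role default that can never fail. The parser signatures below are assumptions for illustration, not MineMobs' actual interfaces.

```typescript
// A parser either produces an action string or null (meaning "I couldn't tell").
type Parser = (message: string, role: string) => string | null;

// Compose parsers into a chain that always produces some action.
function withFallbacks(
  parsers: Parser[],
  roleDefault: (role: string) => string
) {
  return (message: string, role: string): string => {
    for (const parse of parsers) {
      const result = parse(message, role);
      if (result !== null) return result;
    }
    return roleDefault(role);
  };
}

// Usage: an always-failing stand-in for an unreliable LLM, followed by a
// trivial keyword check standing in for the fallback parser.
const parseTask = withFallbacks(
  [
    (_msg, _role) => null, // pretend the LLM returned malformed JSON
    (msg, _role) => (msg.includes("mine") ? "mine" : null),
  ],
  (role) => (role === "scout" ? "scout" : "idle")
);
```

The key property is the total return type: every layer may fail, but the chain as a whole cannot, which is exactly what "the system never fully breaks" requires.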

UI theming is a design system problem. We initially thought "just make it dark and add pixel font." But authentic Minecraft styling required understanding the actual visual language: border bevel directions, specific grey values (#555 for primary, #5b8731 for success), inset shadow patterns, and pixelated image rendering. Once we extracted those rules into reusable components (MinecraftButton, MinecraftCard), the whole UI snapped into place.

Socket.io is the right tool for this kind of real-time multiplayer state. REST polling would've been too slow, WebSocket raw would've been too low-level. Socket.io's room-based broadcasting and automatic reconnection made the real-time layer surprisingly smooth.

What's next for MineMobs

Agent memory and learning. Right now agents are stateless between tasks. We want to add persistent memory so a miner remembers where it found diamonds last time, or a scout builds up a map of the world over multiple expeditions.

Multi-agent coordination. Currently agents in a Commander plan execute independently. We want agents to share resources, communicate with each other, and dynamically reassign tasks: a fighter that clears mobs for a miner, a lumberjack that delivers wood to a builder.

Builder role. A fifth agent role that can construct structures from blueprints or natural-language descriptions ("build a 5x5 cobblestone house with a door facing east").

Voice commands. Using Whisper or a similar speech model to let users talk to their agents instead of typing, making it feel like you're actually commanding a crew.

Hosted mode. Right now everything runs locally. We'd like to offer a hosted version where you can connect to remote servers, share your dashboard with friends, and persist agent state across sessions.
