Make-It

Make-It System Design Diagram
Make-It Logo

🌟 Inspiration

Your group chats are full of good intentions: “Thursday night works for me! What about everyone else?” “Can we get tacos instead? I had pizza last night.” “I don’t have a car, but I guess I could look into an Uber.”

And then someone finally says: “Sounds good! Who wants to make the reservation?” And then nothing. 🦗🦗🦗

We realized the problem isn’t excitement; it’s logistics. Make-It exists to eliminate the invisible friction that quietly kills plans before they ever "Make-It" out of the group chat.

🍽 What It Does

Make-It turns chaotic, high-activation-energy group chat planning into confirmed, coordinated real-world action.

Make-It sits in your iMessage group chat. When people start planning, it handles the entire pipeline: preference extraction, restaurant search, consensus tracking, reservation booking, calendar invites, and ride cost estimates, all through natural conversation. No app downloads, no links to click, no one person stuck being the organizer.

🛠 How We Built It

We designed Make-It as a modular AI orchestration system:

Message Routing + LLM Judge

We built a real-time message bridge using Photon's iMessage SDK that polls the group chat every 2 seconds. Every message passes through an LLM Judge, a Claude Haiku-powered classifier running a two-state machine (IDLE/PLANNING), that filters out chatter and only forwards planning-relevant messages to the AI agent. This prevents the agent from responding to "lol" or "did you see that meme" while still catching "I'm free after 7" buried in noise.
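The routing logic around the judge can be sketched roughly as follows. This is a minimal illustration, not our production code: the label names (`relevant`, `done`, `chatter`) and the `keyword_classifier` stand-in are hypothetical, and in the real system the classification comes from the Claude Haiku call rather than keyword matching.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class State(Enum):
    IDLE = "IDLE"
    PLANNING = "PLANNING"


# Hypothetical classifier signature: (current state, message text) -> label.
# Injected so the routing logic can run without any network access.
Classifier = Callable[[State, str], str]


@dataclass
class Judge:
    classify: Classifier
    state: State = State.IDLE

    def should_forward(self, text: str) -> bool:
        """Update the state from the classifier's label; return True to forward."""
        label = self.classify(self.state, text)
        if label == "relevant":
            # Planning-relevant message: enter (or stay in) PLANNING and forward it.
            self.state = State.PLANNING
            return True
        if label == "done":
            # The plan has wrapped up: go back to IDLE, nothing to forward.
            self.state = State.IDLE
            return False
        # Anything else is chatter: drop it and leave the state unchanged.
        return False


def keyword_classifier(state: State, text: str) -> str:
    """Toy stand-in for the LLM call (assumption: keywords instead of a model)."""
    lowered = text.lower()
    if any(word in lowered for word in ("free", "dinner", "tacos", "reservation")):
        return "relevant"
    if state is State.PLANNING and "booked" in lowered:
        return "done"
    return "chatter"
```

With this sketch, `Judge(keyword_classifier).should_forward("lol")` is dropped, while "I'm free after 7" flips the state to PLANNING and is forwarded. Passing the current state to the classifier lets the same message be judged differently depending on whether a plan is already in progress.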

Conversational Agent (Poke)

Poke handles the reasoning layer: understanding that "soda hall" means UC Berkeley, "7ish" means around 7pm, and that three "sounds good" messages from different people means the group is ready to book. It connects to our MCP server over HTTP, discovers available tools automatically, and calls them as needed during conversation.

MCP Server on Modal

We deployed a FastMCP server on Modal that exposes custom tools the agent can call. The Modal image bundles a full Node.js runtime alongside our browser automation scripts so everything runs in one container. Each tool follows the same pattern: Python receives the MCP call, pipes JSON into a TypeScript subprocess via stdin, and parses structured output from stdout. Currently two tools are live: restaurant booking and Uber price estimation.
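The Python-to-TypeScript handoff described above can be sketched like this. It is a simplified illustration using only the standard library: the helper name `run_json_subprocess`, the timeout value, and the payload fields are assumptions, and the FastMCP tool registration is left out. A tiny Python child process stands in for the Node script so the example runs anywhere.

```python
import json
import subprocess
import sys
from typing import Any, Dict, Sequence


def run_json_subprocess(
    cmd: Sequence[str], payload: Dict[str, Any], timeout: float = 120.0
) -> Dict[str, Any]:
    """Write payload as JSON to the child's stdin and parse JSON from its stdout."""
    proc = subprocess.run(
        list(cmd),
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        # Surface the child's stderr so the caller can see what went wrong.
        raise RuntimeError(
            f"subprocess failed ({proc.returncode}): {proc.stderr.strip()}"
        )
    return json.loads(proc.stdout)


# Stand-in for the TypeScript automation script (assumption): reads a JSON
# request from stdin and prints a JSON result to stdout.
ECHO_CHILD = (
    "import json, sys; "
    "req = json.load(sys.stdin); "
    "print(json.dumps({'ok': True, 'party_size': req['party_size']}))"
)
```

For example, `run_json_subprocess([sys.executable, "-c", ECHO_CHILD], {"party_size": 4})` returns the parsed result dictionary. In a deployment like ours, the command would instead launch the Node runtime with the automation script, and the script must print nothing but the JSON result to stdout for the parse to succeed.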

Browser Automation (Stagehand + Browserbase)

For reservations, Stagehand navigates OpenTable's full booking flow: selecting party size, picking a time slot, entering contact info, and confirming. For Uber estimates, we needed an authenticated session since their price estimator requires login. We used Browserbase's Contexts API to solve this: we log into Uber once through Browserbase's live browser debugger and persist the session cookies to a context ID, and every future automation run reuses that authenticated state. The script enters pickup/dropoff on uber.com/price-estimate and extracts live prices for UberX, UberXL, and Share.

Loop Prevention + Message Dedup

Since the agent both reads from and writes to the same iMessage pipe, we had to prevent infinite echo loops. We track every outbound message in a set with a 10-second TTL. If an incoming message matches something we just sent, we drop it silently.
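A minimal sketch of that dedup set, assuming exact text matching after whitespace trimming (the class name, the trimming, and the injectable clock are illustrative choices, not necessarily what our watcher does):

```python
import time
from typing import Callable, Dict


class SentMessageCache:
    """Remembers recently sent messages so their echoes can be dropped."""

    def __init__(
        self,
        ttl_seconds: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._expires_at: Dict[str, float] = {}

    def _purge(self) -> None:
        """Remove entries whose TTL has elapsed."""
        now = self._clock()
        expired = [text for text, exp in self._expires_at.items() if exp <= now]
        for text in expired:
            del self._expires_at[text]

    def record_outbound(self, text: str) -> None:
        """Call this right before sending a message to the chat."""
        self._purge()
        self._expires_at[text.strip()] = self._clock() + self._ttl

    def is_echo(self, text: str) -> bool:
        """True if an incoming message matches something sent within the TTL."""
        self._purge()
        return text.strip() in self._expires_at
```

The watcher would call `record_outbound` before each send and skip any incoming message for which `is_echo` returns true. The TTL keeps the set small and means a human who later types the same words is not suppressed forever.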

Calendar Integration

We generate Google Calendar links and send them directly to attendees’ emails.
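Building such a link can look roughly like the sketch below. It assumes the widely used Google Calendar event-template URL (`action=TEMPLATE` with `text`, `dates`, and `location` query parameters); the function name and arguments are illustrative, and the email delivery step is not shown.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def calendar_link(
    title: str, start: datetime, duration: timedelta, location: str = ""
) -> str:
    """Build a prefilled 'add event' link for a timezone-aware start time."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    fmt = "%Y%m%dT%H%M%SZ"  # compact UTC timestamps, e.g. 20250306T190000Z
    start_utc = start.astimezone(timezone.utc)
    end_utc = start_utc + duration
    params = {
        "action": "TEMPLATE",
        "text": title,
        "dates": f"{start_utc.strftime(fmt)}/{end_utc.strftime(fmt)}",
        "location": location,
    }
    # urlencode percent-escapes the values, so titles with spaces are safe.
    return "https://calendar.google.com/calendar/render?" + urlencode(params)
```

Converting to UTC before formatting avoids ambiguity when group members are in different time zones, since the recipient's calendar displays the event in their own local time.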

⚡ Challenges We Ran Into

Browserbase authentication persistence: Uber and OpenTable require login to access pricing/booking. We used Browserbase Contexts to persist auth cookies across sessions, but getting the context IDs synced between local dev and our Modal deployment took several rounds of debugging.

Poke MCP integration: Getting Poke to reliably call our custom MCP tools and relay the results back to the group chat required careful iteration on transport configuration (streamable-http) and tool response formatting. Some accounts weren't able to access MCP tools.

Stagehand fragility: Website inputs sometimes use async autocomplete dropdowns that don't respond well to normal form fills. After diagnosing the issue in the playground, we had to drop down to raw keystrokes to select suggestions reliably.

Message loop prevention: With messages flowing between iMessage, our watcher, and Poke, preventing echo loops required a dedup set with TTL-based expiration and careful routing logic.

The hardest part was making everything feel invisible: like a friend helping, not a bot intruding.

🧠 What We Learned

Most plans fail at the micro-friction layer (transport, confirmation, ambiguity). AI is powerful not because it answers questions, but because it coordinates. Logistics is emotional: if getting there feels hard, the plan dies. A good AI assistant feels proactive but not controlling.

🔮 What’s Next for Make-It