Inspiration
Every day I see someone ship something new with v0, Claude Code, or Cursor. You describe what you want, you watch it appear, you tweak it, and you feel the design come together. But that whole loop is visual. A blind developer can generate a UI with AI just like anyone else, and then hit a wall: they cannot see what came out or feel whether it actually looks right. The tools got radically easier to build with, but blind and low-vision devs still can't really do "vibe coding." That's why I built MUSE: it looks at the page for them and describes it the way a sighted friend sitting next to them would, from a basic overview down to technical details like font sizes and HTML tags.
What it does
You paste a v0 link, a URL, a screenshot, or some UI code. MUSE fetches the page, takes a screenshot, and reads its HTML structure. Then you just talk to it. Ask "what is this page," "tell me about the footer," "which product is in the hero," "is this too cluttered," or "give me a prompt to make it more premium." It answers out loud, grounded in what it actually saw, with real opinions about the design and how it fits your taste. That way you can plan the changes you want, then ask for a prompt, and it writes a clean one you can paste straight back into v0 or any other coding agent. The whole thing runs as a hands-free call: press space to talk, interrupt it any time, go quiet and it just waits.
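As a rough illustration of the first step, here is one way pasted text could be sorted into a v0 link, a plain URL, or UI code. This is a sketch, not MUSE's actual code: the `PastedInput` type, the `classifyInput` function, and the `V0_HOSTS` list are all hypothetical, and the host names in that list are an assumption. Screenshots arrive as files rather than text, so they are left out here.

```typescript
// Hypothetical result type: what kind of thing the user pasted.
type PastedInput =
  | { kind: "v0-link"; url: string }
  | { kind: "url"; url: string }
  | { kind: "code"; source: string };

// Assumption: hosts treated as v0 links. Adjust to whatever v0 actually serves from.
const V0_HOSTS = ["v0.dev", "v0.app"];

function classifyInput(raw: string): PastedInput {
  const text = raw.trim();
  // A single http(s) token with no whitespace counts as a URL; anything else is code.
  const match = /^https?:\/\/([^\/\s?#]+)\S*$/i.exec(text);
  if (match) {
    const host = match[1].toLowerCase();
    // Match the host itself or any subdomain of it.
    const isV0 = V0_HOSTS.some(
      (h) => host === h || host.slice(-(h.length + 1)) === "." + h
    );
    return isV0 ? { kind: "v0-link", url: text } : { kind: "url", url: text };
  }
  return { kind: "code", source: text };
}
```

Treating everything that is not a single URL as code keeps the entry point forgiving: a pasted component, a snippet of HTML, or a whole file all land in the same branch.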
How we built it
The front end is Next.js and TypeScript. Groq runs the brain (Llama 3.3 70B), Whisper handles speech to text, and Llama 4 Scout actually looks at the screenshot. AWS Polly speaks the replies and DynamoDB remembers your style taste. Playwright fetches the page and captures the screenshot on the server. We turn the HTML into a page map of sections and actions, pull design signals like colors and spacing out of the CSS, and feed all of that plus the vision read into one clean briefing the model reasons from.
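To make the "one clean briefing" idea concrete, here is a minimal sketch of how a page map, a few CSS signals, and the vision read could be merged into a single block of text. The types and function names (`PageSection`, `DesignSignals`, `extractDesignSignals`, `buildBriefing`) are hypothetical, and the regex-based CSS scan is a simplification I am assuming for illustration. The real pipeline covers more signals, such as spacing.

```typescript
// Hypothetical shapes for illustration; not MUSE's real types.
interface PageSection {
  tag: string;       // e.g. "header", "main", "footer"
  heading?: string;  // first heading text in the section, if any
  actions: string[]; // visible labels of links and buttons
}

interface DesignSignals {
  colors: string[];
  fontSizes: string[];
}

// Keep the first occurrence of each value, preserving order.
function unique(values: string[]): string[] {
  return values.filter((v, i) => values.indexOf(v) === i);
}

// Rough scan of raw CSS text. Approximation: an id selector made only of
// hex characters (such as "#add") would also be picked up as a color.
function extractDesignSignals(css: string): DesignSignals {
  const colors = (css.match(/#[0-9a-fA-F]{3,8}\b/g) || []).map((c) => c.toLowerCase());
  const fontSizes: string[] = [];
  const re = /font-size:\s*([^;}]+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(css)) !== null) {
    fontSizes.push(m[1].trim());
  }
  return { colors: unique(colors), fontSizes: unique(fontSizes) };
}

// Merge everything the agent can "see" into one text block for the prompt.
function buildBriefing(
  sections: PageSection[],
  signals: DesignSignals,
  visionRead: string
): string {
  const map = sections.map((s, i) => {
    const title = s.heading ? ` "${s.heading}"` : "";
    const actions = s.actions.length > 0 ? ` (actions: ${s.actions.join(", ")})` : "";
    return `${i + 1}. <${s.tag}>${title}${actions}`;
  });
  return [
    "WHAT YOU CAN SEE",
    "Page map:",
    ...map,
    `Colors: ${signals.colors.join(", ") || "none found"}`,
    `Font sizes: ${signals.fontSizes.join(", ") || "none found"}`,
    `Screenshot description: ${visionRead}`,
  ].join("\n");
}
```

The point of a single briefing is that the model gets one consistent statement of what it knows, instead of several partial sources it has to reconcile on its own.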
Challenges we ran into
The agent kept hedging, insisting it could not see the page even when it had the screenshot and the map right in front of it. The fix was to stop patching symptoms and rebuild the foundation: one honest briefing of exactly what MUSE can see, and one positive set of behaviors instead of a pile of "never do this" rules. The voice loop was the other hard part. Making interruptions, silence, and turn-taking feel natural took real work, especially making sure it never talks into silence and never freezes after you cut it off.
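One way to reason about the voice loop is as a small state machine. The sketch below is an assumption about how such a loop could be modeled, not the actual MUSE implementation: the state names, the event names, and the `nextState` function are all hypothetical. It encodes the two rules above: an empty transcript never triggers a reply, and pressing talk always reopens the mic, even in the middle of a reply.

```typescript
// Hypothetical call states and events for a press-to-talk voice loop.
type CallState = "idle" | "listening" | "thinking" | "speaking";

type CallEvent =
  | { type: "pressTalk" }                    // user pressed space
  | { type: "speechEnd"; transcript: string } // speech-to-text finished
  | { type: "replyReady" }                    // model reply is ready to be spoken
  | { type: "playbackDone" };                 // audio finished playing

// Pure transition function: given a state and an event, return the next state.
function nextState(state: CallState, event: CallEvent): CallState {
  switch (event.type) {
    case "pressTalk":
      // Always open the mic. From "thinking" or "speaking" this is the interrupt,
      // so the loop can never stay stuck after the user cuts in.
      return "listening";
    case "speechEnd":
      if (state !== "listening") return state;
      // Silence: go back to waiting instead of answering nothing.
      return event.transcript.trim() === "" ? "idle" : "thinking";
    case "replyReady":
      // A reply that arrives after an interrupt is stale and gets ignored.
      return state === "thinking" ? "speaking" : state;
    case "playbackDone":
      return state === "speaking" ? "idle" : state;
    default:
      return state;
  }
}
```

Keeping the transitions in one pure function makes the awkward cases (a stale reply, a double key press, a silent turn) easy to list and check without touching audio code.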
What I learned
A good agent is mostly a good foundation, not a stack of patches. Once the model had a single clear picture of what it knew, most of the strange behavior disappeared. I also learned how much real design feedback you can convey without a screen, as long as every sentence is grounded in actual structure and a real screenshot.
What's next (beyond the hackathon)
I want to ship MUSE as a browser extension (or, even better, a system-level app) so it rides along inside v0, Cursor, and any website, describing and advising right where you are instead of asking you to paste links. That turns it from a demo into a daily assistant that sits beside a blind developer wherever they build.
Built With
- amazon-dynamodb
- aws-polly
- groq
- llama-3.3-70b
- llama-4-scout
- next.js
- playwright
- react
- tailwind-css
- typescript
- v0
- web-audio-api
- whisper