MUSE stands for "Make UI Speak & Edit."
MUSE is a hackathon MVP for blind developers who build UI with AI coding tools like v0. It accepts a v0 preview link, pasted UI code, screenshot, or spoken design goal, then returns a structured explanation of the likely visual layout, taste/design feedback, accessibility notes, and a copyable prompt for the next v0 iteration.
This is the product interface, not a landing page.
AI UI builders can generate visual interfaces quickly, but blind developers still need fast, practical feedback about what changed visually, whether the result feels polished, and what prompt to try next. MUSE turns UI output into reviewable text and optional speech so the design loop can stay keyboard-first and voice-first.
- Next.js App Router
- TypeScript
- Tailwind CSS
- Groq for structured review and speech-to-text
- ElevenLabs or AWS Polly for text-to-speech
- DynamoDB for persisted review sessions, user profile, and compact MUSE memory, with browser local fallback
- Server-side PageMap extraction for URL/code navigation guidance
MUSE now routes each call turn through a conversation controller before using a model. The controller classifies intents such as onboarding, overview, current section, next section, list actions, design critique, accessibility review, revision comparison, v0 prompt generation, preference memory, and session end.
- Deterministic PageMap intents answer without model calls and use mapped page title, section labels, nearby sections, actions, warnings, and the current goal.
- Call start greets once, then either asks for missing UI input or orients the user to the mapped page.
- Onboarding stores a heard name as pending first and asks for confirmation before saving it permanently.
- Preference memory only stores explicit user statements, such as preferring clean minimal layouts or prioritizing screen-reader clarity.
- Screenshot-only conversations stay graceful: MUSE explains the limitation and asks for URL/code instead of pretending to visually inspect the screenshot.
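The controller behavior listed above can be sketched as a keyword classifier followed by a deterministic answer step. This is a minimal illustration, not the MUSE implementation: the intent names, the `PageMap` shape, the keyword rules, and both function names are assumptions made for the example.

```typescript
// Hypothetical intent names modeled on the list above; the real controller may differ.
type Intent =
  | "overview"
  | "current_section"
  | "next_section"
  | "list_actions"
  | "design_critique"
  | "v0_prompt"
  | "session_end"
  | "unknown";

// Hypothetical, simplified PageMap shape used only in this sketch.
interface PageMap {
  title: string;
  sections: { label: string; actions: string[] }[];
}

// Ordered keyword rules: the first matching pattern wins.
const RULES: [RegExp, Intent][] = [
  [/\b(end|stop|goodbye)\b/, "session_end"],
  [/\bnext section\b/, "next_section"],
  [/\b(current|this) section\b/, "current_section"],
  [/\blist actions\b/, "list_actions"],
  [/\bv0 prompt\b/, "v0_prompt"],
  [/\b(premium|polished|design)\b/, "design_critique"],
  [/\bwhat am i looking at\b/, "overview"],
];

function classifyIntent(utterance: string): Intent {
  const text = utterance.toLowerCase();
  for (const [pattern, intent] of RULES) {
    if (pattern.test(text)) return intent;
  }
  return "unknown";
}

// Returns a reply built only from mapped data, or null when a model call is needed.
function answerDeterministically(intent: Intent, map: PageMap, index: number): string | null {
  const section = map.sections[index];
  switch (intent) {
    case "overview": {
      const labels = map.sections.map((s) => s.label).join(", ");
      return `${map.title}. Sections: ${labels}.`;
    }
    case "current_section":
      return section ? `You are in ${section.label}.` : null;
    case "list_actions":
      return section ? `Actions: ${section.actions.join(", ") || "none"}.` : null;
    default:
      return null;
  }
}
```

In this sketch a `null` result is the signal to fall through to the model path, so intents such as design critique never get a canned answer.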
npm install
cp .env.example .env.local
npm run dev

Open http://localhost:3000.
Required for AI review and voice transcription:
GROQ_API_KEY=

Optional for ElevenLabs read-aloud:
ELEVENLABS_API_KEY=
ELEVENLABS_VOICE_ID=
DEFAULT_TTS_PROVIDER=elevenlabs

Optional for AWS Polly and DynamoDB:
AWS_REGION=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_POLLY_VOICE_ID=
AWS_DYNAMODB_TABLE=

Optional for LiveKit realtime voice:
LIVEKIT_URL=
LIVEKIT_API_KEY=
LIVEKIT_API_SECRET=
MUSE_APP_URL=http://localhost:3000

To persist completed reviews, comparisons, companion profile, and compact memory server-side, create a DynamoDB table with:
- Partition key: `pk` as string
- Sort key: `sk` as string
The same table stores:
- `SESSION#<sessionId>/METADATA` for review and comparison sessions
- `USER#<userId>/PROFILE` for onboarding and preferences
- `USER#<userId>/MEMORY#ACTIVE` for compact conversational memory
- `USER#<userId>/CALL#<callId>#TURN#...` for recent call turns
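The key layout above can be sketched as a few small builder functions. This assumes the text before each `/` is the partition key (`pk`) and the text after it is the sort key (`sk`). The helper names are hypothetical, and because the suffix after `TURN#` is not given in this README, the last helper returns only a prefix that a `begins_with` query on `sk` could use.

```typescript
// Key pair matching the table's string partition key (pk) and sort key (sk).
interface TableKey {
  pk: string;
  sk: string;
}

// Hypothetical helper names; the pk/sk split of each pattern is an assumption.
function sessionKey(sessionId: string): TableKey {
  return { pk: `SESSION#${sessionId}`, sk: "METADATA" };
}

function profileKey(userId: string): TableKey {
  return { pk: `USER#${userId}`, sk: "PROFILE" };
}

function memoryKey(userId: string): TableKey {
  return { pk: `USER#${userId}`, sk: "MEMORY#ACTIVE" };
}

// Only the prefix is built here: the turn suffix is not specified in the source.
function callTurnPrefix(userId: string, callId: string): TableKey {
  return { pk: `USER#${userId}`, sk: `CALL#${callId}#TURN#` };
}
```

Keeping every user-scoped item under one `USER#<userId>` partition means profile, memory, and recent turns can be read with a single `Query` on that partition.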
IAM actions needed for the hackathon app:
- DynamoDB: `PutItem`, `GetItem`, `UpdateItem`, `Query`
- Polly: `SynthesizeSpeech`
When DynamoDB is not configured or a save fails, MUSE still returns the review and stores local fallback data in this browser under muse.localSessions, muse.profile, and muse.memory.
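The fallback rule can be sketched as a small decision function that takes the outcome of the server-side save and writes to browser storage when that save did not succeed. Only the `muse.localSessions` key comes from this README. The type names, function names, and the choice to store sessions as a JSON array are assumptions for illustration.

```typescript
// Minimal storage contract; in a browser, window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical, simplified session shape.
interface ReviewSession {
  sessionId: string;
  summary: string;
}

// Outcome of the server-side DynamoDB attempt, as reported to the client.
type RemoteOutcome =
  | { status: "saved" }
  | { status: "failed"; reason: string }
  | { status: "unconfigured" };

type SaveResult = { stored: "dynamodb" | "local"; note: string };

const LOCAL_SESSIONS_KEY = "muse.localSessions";

// Appends to a JSON array under the key; the array format is an assumption.
function appendLocalSession(store: KeyValueStore, session: ReviewSession): void {
  const raw = store.getItem(LOCAL_SESSIONS_KEY);
  let sessions: ReviewSession[] = [];
  if (raw) {
    try {
      const parsed = JSON.parse(raw);
      if (Array.isArray(parsed)) sessions = parsed;
    } catch (error) {
      // Corrupted local data: start a fresh list rather than failing the review.
    }
  }
  sessions.push(session);
  store.setItem(LOCAL_SESSIONS_KEY, JSON.stringify(sessions));
}

// The review is always kept: remotely when possible, otherwise in this browser.
function recordSession(session: ReviewSession, remote: RemoteOutcome, store: KeyValueStore): SaveResult {
  if (remote.status === "saved") {
    return { stored: "dynamodb", note: "Saved to DynamoDB." };
  }
  appendLocalSession(store, session);
  const why = remote.status === "failed" ? remote.reason : "DynamoDB is not configured";
  return { stored: "local", note: `Saved in this browser only (${why}).` };
}
```

Passing the store in as a parameter keeps the sketch testable outside a browser; in the app, `window.localStorage` would be the natural argument.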
MUSE can upgrade from browser-recorded turns to LiveKit Cloud realtime transport when LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET are configured. The Next.js app remains the source of truth: it mints short-lived room tokens at POST /api/livekit/token, stores profile/memory through DynamoDB, maps pages through PageMap, and routes turns through /api/conversation.
- Realtime calls have a hard 10-minute cap. The client shows remaining time, the token expires at the cap, and the UI ends the call with a concise status message.
- The settings panel includes a `Realtime voice` toggle. If LiveKit token minting or connection fails, MUSE falls back to the existing browser recording, `/api/stt`, and `/api/tts` flow.
- The separate worker entrypoint is `workers/muse-livekit-agent.ts`; run it with `npm run livekit:agent` in an environment that has LiveKit credentials and `MUSE_APP_URL`.
- The worker joins LiveKit rooms as MUSE, listens for voice turns through LiveKit Agents, calls `/api/conversation` for the MUSE brain, and speaks the backend reply. Room metadata can carry current URL/code summary, PageMap, and navigation state.
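The 10-minute cap described above can be sketched as pure time arithmetic that a client timer could call once per second. Only the 10-minute value comes from this README. The function names and the status wording are assumptions, and the sketch does not touch the LiveKit SDK.

```typescript
// Hard cap from the README: 10 minutes per realtime call.
const CALL_CAP_MS = 10 * 60 * 1000;

// Milliseconds left before the cap, never negative.
function remainingMs(startedAt: number, now: number): number {
  return Math.max(0, startedAt + CALL_CAP_MS - now);
}

// Formats a duration as m:ss, rounding up so the display never shows 0:00 early.
function formatRemaining(ms: number): string {
  const totalSeconds = Math.ceil(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${seconds < 10 ? "0" + seconds : String(seconds)}`;
}

// Hypothetical status shape: the UI would end the call when `ended` is true.
function callStatus(startedAt: number, now: number): { ended: boolean; message: string } {
  const left = remainingMs(startedAt, now);
  if (left === 0) {
    return { ended: true, message: "Call ended: 10-minute limit reached." };
  }
  return { ended: false, message: `${formatRemaining(left)} remaining` };
}
```

Deriving the countdown from the start timestamp, rather than decrementing a counter, keeps the display correct even if a timer tick is delayed.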
- Start the app with `npm run dev`.
- Paste a short v0/React/HTML snippet into the workspace with `Cmd+V`, or open the F-options panel and choose `Paste code`.
- MUSE maps the page into sections and actions when URL/code input is available.
- Press `F`, then use `Describe current section`, `Next section`, `Previous section`, or `List actions` to navigate the mapped page.
- Add a goal such as `Make this page feel more premium and accessible`.
- Press `F`, choose `Review UI`, and wait for the structured MUSE review.
- Copy the v0 prompt from the result.
- Press `Space` once to start a MUSE call. First-time users should hear onboarding and be asked to confirm the heard name before it is saved.
- Ask `what am I looking at?`, `does this feel premium?`, or `make a v0 prompt`; replies should stay concise and grounded in PageMap labels/actions.
- Speak a short navigation command like `next section` or `list actions`, then pause. MUSE should auto-submit after the silence gap, reply concisely, and return to listening.
- Press `Escape` to end the call; MUSE saves and shows a compact session summary.
- Use `Read aloud` if ElevenLabs or Polly is configured.
- Press `F`, choose `Compare versions`, paste before/after snippets, and run the comparison flow.
- For the screenshot limitation, upload only a screenshot and choose `Review UI`; MUSE should ask for HTML/JSX/CSS or page code for accurate review.
- `Tab` reaches all primary controls.
- `F` opens the command/options panel when focus is not inside a text field.
- `Space` starts or resumes the MUSE call when focus is not inside a text field. During speech playback, `Space` interrupts MUSE and returns to listening.
- `Escape` closes panels or ends the active call.
- Voice turns auto-submit after a silence gap; the transcript display remains read-only.
- Pasted URLs/code are mapped into sections and actions for guide-only navigation. MUSE does not click target websites from the web app.
- Status changes are announced through `role="status"`, with blocking errors exposed through `role="alert"`.
- Conversation replies include suggested commands when useful, and those suggestions are exposed in the reply region.
- Icon-only controls have accessible labels and visible focus rings.
- MUSE uses one concise voice personality instead of user-facing description modes. It should describe one specific part at a time, then ask whether to continue.
- Screenshot-only analysis is intentionally limited. With the current MVP model path, paste HTML/JSX/CSS or a v0 preview link for accurate review.
- PageMap navigation is server-side HTML/code mapping, not a rendered browser or extension. JavaScript-heavy pages may have missing controls.
- Live speech-to-text requires microphone permission and `GROQ_API_KEY` in fallback mode. LiveKit realtime mode also requires LiveKit Cloud credentials and a separately running agent worker.
- Read-aloud requires `ELEVENLABS_API_KEY` or AWS Polly credentials.
- DynamoDB table creation is manual for the hackathon build.
- Browser local fallback sessions stay on the current device and browser only.
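The credential requirements in the list above can be sketched as a capability check over the environment variables. The variable names come from this README. The function names, the result shape, and the assumption that "AWS Polly credentials" means `AWS_REGION`, `AWS_ACCESS_KEY_ID`, and `AWS_SECRET_ACCESS_KEY` together are illustrative only.

```typescript
// Plain map of environment variables, for example a copy of process.env.
type Env = { [name: string]: string | undefined };

// Hypothetical summary of which voice features the configuration allows.
interface VoiceCapabilities {
  speechToText: boolean;
  readAloud: "elevenlabs" | "polly" | null;
  realtime: boolean;
}

// True only when every named variable is set to a non-blank value.
function hasAll(env: Env, names: string[]): boolean {
  return names.every((name) => (env[name] || "").trim() !== "");
}

function resolveCapabilities(env: Env): VoiceCapabilities {
  const eleven = hasAll(env, ["ELEVENLABS_API_KEY"]);
  // Assumption: these three variables are what Polly needs.
  const polly = hasAll(env, ["AWS_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"]);

  let readAloud: "elevenlabs" | "polly" | null = null;
  if (eleven && env["DEFAULT_TTS_PROVIDER"] === "elevenlabs") readAloud = "elevenlabs";
  else if (polly) readAloud = "polly";
  else if (eleven) readAloud = "elevenlabs";

  return {
    speechToText: hasAll(env, ["GROQ_API_KEY"]),
    readAloud,
    realtime: hasAll(env, ["LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"]),
  };
}
```

A check like this could run once at startup so the UI hides `Read aloud` or the `Realtime voice` toggle instead of failing on first use.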
The API routes include in-memory per-client limits before calling paid providers:
- Review/comparison: 8 requests per 15 minutes, max 30000 input characters.
- Speech-to-text: 10 requests per 15 minutes, max 8 MB audio upload.
- Text-to-speech: 20 requests per 15 minutes, max 3000 characters.
- Conversation: 20 turns per 15 minutes, max 20000 context characters.
- PageMap: max 90000 input characters per mapping request.
- LiveKit token minting: 10 requests per 15 minutes, 10-minute token/call cap.
This protects the local/demo app from accidental credit burn. For multi-instance production hosting, replace this with a shared limiter such as Redis, Upstash, or a provider gateway limit.
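An in-memory per-client limiter of this kind can be sketched as a fixed-window counter. The request counts and the 15-minute window come from the list above. The route labels, the class name, and the choice of a fixed window rather than a sliding one are assumptions, so the app's actual limiter may behave differently at window boundaries.

```typescript
interface Limit {
  maxRequests: number;
  windowMs: number;
}

const WINDOW_MS = 15 * 60 * 1000;

// Counts from the README; the route labels are hypothetical.
const LIMITS: { [route: string]: Limit } = {
  review: { maxRequests: 8, windowMs: WINDOW_MS },
  stt: { maxRequests: 10, windowMs: WINDOW_MS },
  tts: { maxRequests: 20, windowMs: WINDOW_MS },
  conversation: { maxRequests: 20, windowMs: WINDOW_MS },
  livekitToken: { maxRequests: 10, windowMs: WINDOW_MS },
};

// Fixed-window counter held in process memory, so it resets on restart
// and is not shared between server instances.
class FixedWindowLimiter {
  private windows: { [clientId: string]: { start: number; count: number } } = {};

  constructor(private limit: Limit) {}

  // Returns true and counts the request if the client is under its limit.
  allow(clientId: string, now: number): boolean {
    const current = this.windows[clientId];
    if (!current || now - current.start >= this.limit.windowMs) {
      this.windows[clientId] = { start: now, count: 1 };
      return true;
    }
    if (current.count >= this.limit.maxRequests) return false;
    current.count += 1;
    return true;
  }
}
```

A route handler would call `allow` before contacting a paid provider and return HTTP 429 when it is `false`. Because the state lives in one process, a shared store is needed once more than one instance serves traffic.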
npm run test includes deterministic multi-turn conversation evals for onboarding, PageMap orientation, section navigation, design judgment, preference learning, v0 prompts, revision comparison, screenshot-only limitations, LiveKit token behavior, and LiveKit call-cap/state transitions.
npm run lint
npm run test
npm run build

The app can deploy as a standard Next.js project on Vercel. Add the same environment variables in the deployment dashboard. Voice recording requires HTTPS in deployed environments, which Vercel provides by default. The LiveKit agent worker is a separate long-running process and should be deployed in a LiveKit Agents-compatible worker environment, with MUSE_APP_URL pointing at the Vercel app.
- Read `vision.md` before changing product behavior.
- Read `plan.md` before starting a new phase.
- Update `lander.md` after every completed phase.
- Keep application source TypeScript-native. Do not convert app code to plain JavaScript.
Phases 0-8 are implemented, followed by the conversational companion, PageMap navigation, and reliable call-based review upgrades: planning, scaffold, command workspace, input flows, Groq review, voice transcription, read aloud, DynamoDB-backed sessions/profile/memory, local fallback storage, before/after comparison, final docs, focused tests, QA, onboarding, call-like voice flow, guide-only UI navigation intelligence, deterministic conversation control, structured preference memory, and multi-turn evals.