Shiv-aurora/muse

MUSE

MUSE stands for "Make UI Speak & Edit."

MUSE is a hackathon MVP for blind developers who build UI with AI coding tools like v0. It accepts a v0 preview link, pasted UI code, a screenshot, or a spoken design goal, then returns a structured explanation of the likely visual layout, taste/design feedback, accessibility notes, and a copyable prompt for the next v0 iteration.

This is the product interface, not a landing page.

Why It Matters

AI UI builders can generate visual interfaces quickly, but blind developers still need fast, practical feedback about what changed visually, whether the result feels polished, and what prompt to try next. MUSE turns UI output into reviewable text and optional speech so the design loop can stay keyboard-first and voice-first.

Stack

  • Next.js App Router
  • TypeScript
  • Tailwind CSS
  • Groq for structured review and speech-to-text
  • ElevenLabs or AWS Polly for text-to-speech
  • DynamoDB for persisted review sessions, user profile, and compact MUSE memory, with browser local fallback
  • Server-side PageMap extraction for URL/code navigation guidance

Conversational Review Companion

MUSE now routes each call turn through a conversation controller before using a model. The controller classifies intents such as onboarding, overview, current section, next section, list actions, design critique, accessibility review, revision comparison, v0 prompt generation, preference memory, and session end.

  • Deterministic PageMap intents answer without model calls and use mapped page title, section labels, nearby sections, actions, warnings, and the current goal.
  • Call start greets once, then either asks for missing UI input or orients the user to the mapped page.
  • Onboarding stores a heard name as pending first and asks for confirmation before saving it permanently.
  • Preference memory only stores explicit user statements, such as preferring clean minimal layouts or prioritizing screen-reader clarity.
  • Screenshot-only conversations stay graceful: MUSE explains the limitation and asks for URL/code instead of pretending to visually inspect the screenshot.
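
The routing above can be pictured as a small rule-based classifier that runs before any model call. This is a minimal sketch, not the repository's controller: the `Intent` names, `classifyIntent`, `answersFromPageMap`, and the phrase patterns are all hypothetical, and only a few of the listed intents are covered.

```typescript
// Hypothetical subset of the intents listed above; names are illustrative.
type Intent =
  | "overview"
  | "current_section"
  | "next_section"
  | "list_actions"
  | "v0_prompt"
  | "session_end"
  | "needs_model";

// Ordered rules: the first matching pattern wins.
const RULES: Array<[RegExp, Intent]> = [
  [/\b(what am i looking at|overview)\b/, "overview"],
  [/\bnext section\b/, "next_section"],
  [/\b(current section|where am i)\b/, "current_section"],
  [/\blist (the )?actions\b/, "list_actions"],
  [/\bv0 prompt\b/, "v0_prompt"],
  [/\b(end (the )?(call|session)|goodbye)\b/, "session_end"],
];

// Intents assumed to be answerable from mapped page data alone.
const PAGEMAP_INTENTS = new Set<Intent>([
  "overview",
  "current_section",
  "next_section",
  "list_actions",
]);

function classifyIntent(utterance: string): Intent {
  const text = utterance.toLowerCase().trim();
  for (const [pattern, intent] of RULES) {
    if (pattern.test(text)) return intent;
  }
  return "needs_model"; // anything unmatched falls through to the model
}

function answersFromPageMap(intent: Intent): boolean {
  return PAGEMAP_INTENTS.has(intent);
}
```

Keeping the rules in an ordered table makes the deterministic paths easy to cover with evals, since each phrase maps to exactly one intent.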

Local Setup

npm install
cp .env.example .env.local
npm run dev

Open http://localhost:3000.

Environment Variables

Required for AI review and voice transcription:

GROQ_API_KEY=

Optional for ElevenLabs read-aloud:

ELEVENLABS_API_KEY=
ELEVENLABS_VOICE_ID=
DEFAULT_TTS_PROVIDER=elevenlabs

Optional for AWS Polly and DynamoDB:

AWS_REGION=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_POLLY_VOICE_ID=
AWS_DYNAMODB_TABLE=

Optional for LiveKit realtime voice:

LIVEKIT_URL=
LIVEKIT_API_KEY=
LIVEKIT_API_SECRET=
MUSE_APP_URL=http://localhost:3000
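
One way to reason about these groups is a startup check that reports which optional features are usable. The sketch below is an assumption-heavy illustration: `detectFeatures`, `hasAll`, and the `Features` shape are hypothetical, and the choice of which variables each feature needs (for example, static AWS keys rather than an IAM role) is a guess, not the app's actual logic.

```typescript
type Env = Record<string, string | undefined>;

interface Features {
  review: boolean; // Groq review and speech-to-text
  elevenLabs: boolean;
  polly: boolean;
  dynamo: boolean;
  liveKit: boolean;
}

// True only when every named variable is set to a non-empty value.
function hasAll(env: Env, names: string[]): boolean {
  return names.every((name) => (env[name] ?? "").trim() !== "");
}

function detectFeatures(env: Env): Features {
  // Assumption: AWS features use the three static credential variables.
  const aws = ["AWS_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"];
  return {
    review: hasAll(env, ["GROQ_API_KEY"]),
    elevenLabs: hasAll(env, ["ELEVENLABS_API_KEY"]),
    polly: hasAll(env, aws),
    dynamo: hasAll(env, [...aws, "AWS_DYNAMODB_TABLE"]),
    liveKit: hasAll(env, ["LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"]),
  };
}
```

In a Node.js process this could be called as `detectFeatures(process.env)` and logged once at boot, so a missing key shows up before the first request.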

DynamoDB Storage

To persist completed reviews, comparisons, companion profile, and compact memory server-side, create a DynamoDB table with:

  • Partition key: pk as string
  • Sort key: sk as string

The same table stores:

  • SESSION#<sessionId> / METADATA for review and comparison sessions
  • USER#<userId> / PROFILE for onboarding and preferences
  • USER#<userId> / MEMORY#ACTIVE for compact conversational memory
  • USER#<userId> / CALL#<callId>#TURN#... for recent call turns
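
The single-table layout above can be captured in a few key builders. This is a sketch with hypothetical helper names (`sessionKey`, `profileKey`, `memoryKey`, `callTurnPrefix`). The call-turn sort key is elided in the list, so only its prefix is built here, which is the form a `begins_with` query on `sk` would need.

```typescript
interface TableKey {
  pk: string;
  sk: string;
}

const sessionKey = (sessionId: string): TableKey => ({
  pk: `SESSION#${sessionId}`,
  sk: "METADATA",
});

const profileKey = (userId: string): TableKey => ({
  pk: `USER#${userId}`,
  sk: "PROFILE",
});

const memoryKey = (userId: string): TableKey => ({
  pk: `USER#${userId}`,
  sk: "MEMORY#ACTIVE",
});

// Prefix only: the suffix after TURN# is not specified in this README.
const callTurnPrefix = (userId: string, callId: string) => ({
  pk: `USER#${userId}`,
  skPrefix: `CALL#${callId}#TURN#`,
});
```

Because profile, memory, and call turns share the `USER#<userId>` partition, one `Query` on that partition key can read a user's related items together.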

IAM actions needed for the hackathon app:

  • DynamoDB: PutItem, GetItem, UpdateItem, Query
  • Polly: SynthesizeSpeech

When DynamoDB is not configured or a save fails, MUSE still returns the review and stores local fallback data in the browser under muse.localSessions, muse.profile, and muse.memory.
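
A local fallback write might look like the sketch below. It assumes the data sits behind a Web Storage style interface and that `muse.localSessions` holds a JSON array, newest first. Neither detail is confirmed by this README, and `appendLocalSession`, `KeyValueStore`, and the `limit` parameter are hypothetical.

```typescript
// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const LOCAL_SESSIONS_KEY = "muse.localSessions";

// Assumed shape: a JSON array of session objects, newest first.
function appendLocalSession(
  store: KeyValueStore,
  session: { id: string },
  limit = 20,
): number {
  let sessions: Array<{ id: string }> = [];
  try {
    const parsed: unknown = JSON.parse(store.getItem(LOCAL_SESSIONS_KEY) ?? "[]");
    if (Array.isArray(parsed)) sessions = parsed;
  } catch {
    // Corrupt JSON: start over rather than block the review.
  }
  // Replace any older copy of the same session and cap the list length.
  const next = [session, ...sessions.filter((s) => s.id !== session.id)].slice(0, limit);
  store.setItem(LOCAL_SESSIONS_KEY, JSON.stringify(next));
  return next.length;
}
```

In a browser, `window.localStorage` satisfies `KeyValueStore` directly; in tests, a small in-memory object can stand in for it.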

LiveKit Realtime Voice

MUSE can upgrade from browser-recorded turns to LiveKit Cloud realtime transport when LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET are configured. The Next.js app remains the source of truth: it mints short-lived room tokens at POST /api/livekit/token, stores profile/memory through DynamoDB, maps pages through PageMap, and routes turns through /api/conversation.

  • Realtime calls have a hard 10-minute cap. The client shows remaining time, the token expires at the cap, and the UI ends the call with a concise status message.
  • The settings panel includes a Realtime voice toggle. If LiveKit token minting or connection fails, MUSE falls back to the existing browser recording, /api/stt, and /api/tts flow.
  • The separate worker entrypoint is workers/muse-livekit-agent.ts; run it with npm run livekit:agent in an environment that has LiveKit credentials and MUSE_APP_URL.
  • The worker joins LiveKit rooms as MUSE, listens for voice turns through LiveKit Agents, calls /api/conversation for the MUSE brain, and speaks the backend reply. Room metadata can carry current URL/code summary, PageMap, and navigation state.
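
The 10-minute cap and the remaining-time display can be sketched as a pure function of the call start time and the current time. `callClock` and `CallClock` are hypothetical names, and rounding the display up to the next whole second is an assumption.

```typescript
const CALL_CAP_MS = 10 * 60 * 1000; // the 10-minute cap described above

interface CallClock {
  remainingMs: number;
  label: string; // "m:ss" for display
  expired: boolean;
}

function callClock(startedAtMs: number, nowMs: number): CallClock {
  const elapsed = Math.max(0, nowMs - startedAtMs);
  const remainingMs = Math.max(0, CALL_CAP_MS - elapsed);
  // Round up so the display only reaches 0:00 once the cap is actually hit.
  const totalSeconds = Math.ceil(remainingMs / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  const label = `${minutes}:${seconds < 10 ? "0" : ""}${seconds}`;
  return { remainingMs, label, expired: remainingMs === 0 };
}
```

A client timer could call this once per second with `Date.now()` and end the call when `expired` becomes true, independent of when the token itself expires.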

Demo Script

  1. Start the app with npm run dev.
  2. Paste a short v0/React/HTML snippet into the workspace with Cmd+V, or open the F-options panel and choose Paste code.
  3. MUSE maps the page into sections and actions when URL/code input is available.
  4. Press F, then use Describe current section, Next section, Previous section, or List actions to navigate the mapped page.
  5. Add a goal such as Make this page feel more premium and accessible.
  6. Press F, choose Review UI, and wait for the structured MUSE review.
  7. Copy the v0 prompt from the result.
  8. Press Space once to start a MUSE call. First-time users should hear onboarding and be asked to confirm the heard name before it is saved.
  9. Ask what am I looking at?, does this feel premium?, or make a v0 prompt; replies should stay concise and grounded in PageMap labels/actions.
  10. Speak a short navigation command like next section or list actions, then pause. MUSE should auto-submit after the silence gap, reply concisely, and return to listening.
  11. Press Escape to end the call; MUSE saves and shows a compact session summary.
  12. Use Read aloud if ElevenLabs or Polly is configured.
  13. Press F, choose Compare versions, paste before/after snippets, and run the comparison flow.
  14. For the screenshot limitation, upload only a screenshot and choose Review UI; MUSE should ask for HTML/JSX/CSS or page code for accurate review.
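
The section navigation in steps 4 and 10 can be modeled as a pure function over the mapped page. This README does not show the real PageMap structure, so the `PageMap`, `PageSection`, and `navigate` definitions and the reply wording below are hypothetical.

```typescript
// Hypothetical PageMap shape; the real structure is not shown in this README.
interface PageSection {
  label: string;
  actions: string[];
}

interface PageMap {
  title: string;
  sections: PageSection[];
}

type NavCommand = "current" | "next" | "previous" | "list_actions";

interface NavResult {
  index: number; // section index after the command
  reply: string; // short text to show or speak
}

function navigate(map: PageMap, index: number, command: NavCommand): NavResult {
  if (map.sections.length === 0) {
    return { index: 0, reply: `${map.title} has no mapped sections yet.` };
  }
  const last = map.sections.length - 1;
  // Clamp so commands at either end stay on a valid section.
  let next = Math.min(Math.max(index, 0), last);
  if (command === "next") next = Math.min(next + 1, last);
  if (command === "previous") next = Math.max(next - 1, 0);
  const section = map.sections[next];
  if (command === "list_actions") {
    const actions = section.actions.length > 0 ? section.actions.join(", ") : "no actions";
    return { index: next, reply: `${section.label}: ${actions}.` };
  }
  return { index: next, reply: `Section ${next + 1} of ${last + 1}: ${section.label}.` };
}
```

Because the function only reads mapped labels and actions, the same input always yields the same reply, which is what makes these turns answerable without a model call.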

Keyboard And Accessibility Notes

  • Tab reaches all primary controls.
  • F opens the command/options panel when focus is not inside a text field.
  • Space starts or resumes the MUSE call when focus is not inside a text field. During speech playback, Space interrupts MUSE and returns to listening.
  • Escape closes panels or ends the active call.
  • Voice turns auto-submit after a silence gap; the transcript display remains read-only.
  • Pasted URLs/code are mapped into sections and actions for guide-only navigation. MUSE does not click target websites from the web app.
  • Status changes are announced through role="status", with blocking errors exposed through role="alert".
  • Conversation replies include suggested commands when useful, and those suggestions are exposed in the reply region.
  • Icon-only controls have accessible labels and visible focus rings.
  • MUSE uses one concise voice personality instead of user-facing description modes. It should describe one specific part at a time, then ask whether to continue.
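
The F, Space, and Escape rules above can be expressed as one pure resolver that a `keydown` handler calls. This is a sketch under assumptions: `resolveKey`, `UiState`, and the action names are hypothetical, and giving an open panel priority over an active call on Escape is a guess at the intended order.

```typescript
type KeyAction =
  | "open_options"
  | "start_or_resume_call"
  | "interrupt_speech"
  | "close_panel"
  | "end_call"
  | "none";

interface UiState {
  focusInTextField: boolean;
  panelOpen: boolean;
  callActive: boolean;
  speaking: boolean; // MUSE speech playback in progress
}

// `key` is expected to be a KeyboardEvent.key value (Space is " ").
function resolveKey(key: string, state: UiState): KeyAction {
  if (key === "Escape") {
    // Assumption: an open panel closes before the call is ended.
    if (state.panelOpen) return "close_panel";
    return state.callActive ? "end_call" : "none";
  }
  // F and Space are ignored while the user is typing.
  if (state.focusInTextField) return "none";
  if (key === "f" || key === "F") return "open_options";
  if (key === " ") return state.speaking ? "interrupt_speech" : "start_or_resume_call";
  return "none";
}
```

Keeping the decision separate from the DOM event makes the text-field guard easy to unit test without a browser.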

Known MVP Limitations

  • Screenshot-only analysis is intentionally limited. With the current MVP model path, paste HTML/JSX/CSS or a v0 preview link for accurate review.
  • PageMap navigation is server-side HTML/code mapping, not a rendered browser or extension. JavaScript-heavy pages may have missing controls.
  • In fallback (browser recording) mode, live speech-to-text requires microphone permission and GROQ_API_KEY. LiveKit realtime mode also requires LiveKit Cloud credentials and a separately running agent worker.
  • Read-aloud requires ELEVENLABS_API_KEY or AWS Polly credentials.
  • DynamoDB table creation is manual for the hackathon build.
  • Browser local fallback sessions stay on the current device and browser only.

Credit Protection

The API routes include in-memory per-client limits before calling paid providers:

  • Review/comparison: 8 requests per 15 minutes, max 30000 input characters.
  • Speech-to-text: 10 requests per 15 minutes, max 8 MB audio upload.
  • Text-to-speech: 20 requests per 15 minutes, max 3000 characters.
  • Conversation: 20 turns per 15 minutes, max 20000 context characters.
  • PageMap: max 90000 input characters per mapping request.
  • LiveKit token minting: 10 requests per 15 minutes, 10-minute token/call cap.

This protects the local/demo app from accidental credit burn. For multi-instance production hosting, replace this with a shared limiter such as Redis, Upstash, or a provider gateway limit.
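
An in-memory per-client limit of this kind can be sketched as a fixed-window counter. The request counts below come from the list above, but the fixed-window strategy, the route names, and the `FixedWindowLimiter` class are assumptions. The README does not say which algorithm the routes use.

```typescript
interface Limit {
  maxRequests: number;
  windowMs: number;
}

const FIFTEEN_MINUTES = 15 * 60 * 1000;

// Request counts taken from the list above; the route names are illustrative.
const LIMITS: Record<string, Limit> = {
  review: { maxRequests: 8, windowMs: FIFTEEN_MINUTES },
  stt: { maxRequests: 10, windowMs: FIFTEEN_MINUTES },
  tts: { maxRequests: 20, windowMs: FIFTEEN_MINUTES },
  conversation: { maxRequests: 20, windowMs: FIFTEEN_MINUTES },
};

class FixedWindowLimiter {
  // State lives in process memory, so it resets on restart and is not shared.
  private windows = new Map<string, { start: number; count: number }>();

  // Returns true and counts the request if allowed; false if over the limit.
  allow(route: string, clientId: string, nowMs: number): boolean {
    const limit = LIMITS[route];
    if (!limit) return true; // unknown routes are not limited in this sketch
    const key = `${route}:${clientId}`;
    const current = this.windows.get(key);
    if (!current || nowMs - current.start >= limit.windowMs) {
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count >= limit.maxRequests) return false;
    current.count += 1;
    return true;
  }
}
```

Because the counters live in a `Map` inside one process, two server instances would each allow the full quota, which is why a shared store is the suggested replacement for multi-instance hosting.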

Validation

npm run test includes deterministic multi-turn conversation evals for onboarding, PageMap orientation, section navigation, design judgment, preference learning, v0 prompts, revision comparison, screenshot-only limitations, LiveKit token behavior, and LiveKit call-cap/state transitions.

npm run lint
npm run test
npm run build

Deployment Notes

The app can deploy as a standard Next.js project on Vercel. Add the same environment variables in the deployment dashboard. Voice recording requires HTTPS in deployed environments, which Vercel provides by default. The LiveKit agent worker is a separate long-running process and should be deployed in a LiveKit Agents-compatible worker environment, with MUSE_APP_URL pointing at the Vercel app.

Project Handoff

  • Read vision.md before changing product behavior.
  • Read plan.md before starting a new phase.
  • Update lander.md after every completed phase.
  • Keep application source TypeScript-native. Do not convert app code to plain JavaScript.

Phase Status

Phases 0-8 are implemented, followed by the conversational companion, PageMap navigation, and reliable call-based review upgrades. Completed work: planning, scaffold, command workspace, input flows, Groq review, voice transcription, read aloud, DynamoDB-backed sessions/profile/memory, local fallback storage, before/after comparison, final docs, focused tests, QA, onboarding, call-like voice flow, guide-only UI navigation intelligence, deterministic conversation control, structured preference memory, and multi-turn evals.
