A 3D urban planning platform where you design buildings with your voice, place them on a live map of Kingston, and simulate environmental impact -- before the first shovel hits the ground.
Built for the City of Kingston at QHacks 2026.
KingsView is a real-time, voice-driven urban planning simulator built on top of a 3D model of Kingston, Ontario. It allows city planners, residents, and public officials to design buildings using natural language, place them at real-world coordinates, and immediately see the projected environmental, traffic, and community impact.
The platform combines three core technologies into a single interactive experience:
- ElevenLabs provides the voice and audio layer -- real-time narration of building designs, AI-generated sound effects for every editor action, and spoken feedback that makes the tool accessible to users who cannot or prefer not to read dense technical output.
- Google Gemini 2.5 Flash serves as the reasoning engine -- interpreting freeform speech, resolving ambiguity, generating structured building parameters, producing environmental impact reports, and recommending tree species from Kingston's municipal database.
- Three.js and React Three Fiber power a full 3D simulation of Kingston with over 100 vehicles, real traffic signals, A* pathfinding, construction noise propagation, and zoning overlays.
The result is a platform where anyone -- regardless of technical skill, visual ability, or planning expertise -- can participate in shaping their city.
Try it now: https://kingsview.vercel.app
- Open the Build page -- Navigate to the Building Editor.
- Use Voice Design -- Click "Design with Voice" and say something like:
  - "Make me a 5-story modern glass tower with a flat roof"
  - "A small brick house with arched windows and a gable roof"
  - "A 10-floor concrete office building, 20 meters wide"
- Listen -- ElevenLabs will speak back a confirmation of what was built.
- Hear the sounds -- Every action in the editor (adding floors, resizing, rotating, placing windows) triggers an AI-generated sound effect created by ElevenLabs.
- Open the Map -- Place your building on Kingston's live 3D map.
- Zoom into a street -- Scroll in close to street level and hear an AI-generated metropolitan city ambiance produced by ElevenLabs in real time. The volume fades smoothly based on how close you are to the ground.
- Generate an Environmental Report -- See carbon footprint, noise levels, habitat impact, and community effects analyzed by Gemini.
- Ask the Tree Advisor -- Get tree recommendations from Kingston's official planting program, powered by Gemini.
KingsView makes urban planning conversational, audible, and visual -- all at once.
- The user speaks a building description (e.g., "A 3-story brick building with round windows")
- Gemini interprets the request and generates a structured JSON configuration
- The configuration is validated against a Zod schema with automatic retry on failure
- The 3D building renders instantly in the editor
- ElevenLabs narrates a spoken confirmation of what was built
- The user places the building on Kingston's 3D map, sets a construction timeline, and generates a full environmental impact report
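The validate-and-retry step in the flow above can be sketched roughly as follows. This is a minimal illustration, not the project's code: a hand-written validator stands in for the Zod schema, the `BuildingConfig` fields are hypothetical, and `generate` is a synchronous placeholder for the Gemini call (the real endpoint would be asynchronous). Each failed attempt passes its validation error back to the generator so the next attempt can correct it.

```typescript
// Hypothetical building shape; the real schema's fields are not shown in this README.
interface BuildingConfig {
  floors: number;
  widthMeters: number;
  roof: "flat" | "gable";
}

type ValidationResult =
  | { ok: true; value: BuildingConfig }
  | { ok: false; error: string };

// Hand-rolled check standing in for a Zod schema (assumption: no external dependency).
function validateConfig(raw: string): ValidationResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "response was not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "expected a JSON object" };
  }
  const fields = data as { [key: string]: unknown };
  const floors = fields["floors"];
  const widthMeters = fields["widthMeters"];
  const roof = fields["roof"];
  if (typeof floors !== "number" || !Number.isInteger(floors) || floors < 1) {
    return { ok: false, error: "floors must be a positive integer" };
  }
  if (typeof widthMeters !== "number" || widthMeters <= 0) {
    return { ok: false, error: "widthMeters must be a positive number" };
  }
  if (roof !== "flat" && roof !== "gable") {
    return { ok: false, error: "roof must be 'flat' or 'gable'" };
  }
  return { ok: true, value: { floors, widthMeters, roof } };
}

// `generate` is a placeholder for the model call; it receives the previous
// validation error (or null on the first attempt) and returns raw JSON text.
function designWithRetry(
  generate: (previousError: string | null) => string,
  maxRetries: number = 3,
): BuildingConfig {
  let lastError: string | null = null;
  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const result = validateConfig(generate(lastError));
    if (result.ok) {
      return result.value;
    }
    lastError = result.error;
  }
  throw new Error(`no valid configuration after ${maxRetries} retries: ${lastError}`);
}
```

Feeding the error message back into the next attempt is one common way to make retries converge; whether KingsView does this is not stated here.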
Non-technical users, seniors, and visually impaired users can participate in city design. The barrier to civic engagement drops from "knows CAD software" to "can describe a building in a sentence."
Voice Input → Web Speech API → /api/design → Gemini 2.5 Flash (parse + validate)
↓
3D Building Editor (Three.js)
+ ElevenLabs Sound Effects (9 AI-generated sounds)
↓
/api/speak → ElevenLabs TTS (spoken confirmation)
↓
3D Kingston Map (100+ vehicles, traffic, zoning)
+ /api/street-sound → ElevenLabs Sound Gen (city ambiance on zoom)
↓
/api/environmental-report → Gemini (carbon, noise, habitat, community)
/api/tree-advisor → Gemini (40+ Kingston tree species, planting advice)
ElevenLabs is not a cosmetic addition to KingsView. It is a core layer of the platform that makes the tool accessible, engaging, and usable in contexts where visual interfaces alone are insufficient.
When a user designs a building by voice, KingsView does not simply display the result on screen. It speaks the result back.
After Gemini generates a building configuration, a one-sentence confirmation is sent to the ElevenLabs Text-to-Speech API via the /api/speak endpoint. The confirmation is streamed as MP3 audio and played immediately in the browser.
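As a rough sketch of what the server side of /api/speak might send, the helper below builds the request for the ElevenLabs streaming Text-to-Speech endpoint. The helper name, the `HttpRequest` shape, and the choice of `model_id` are assumptions for illustration; the voice ID and API key are supplied by the caller. The function only describes the request, so it can be run without a network call.

```typescript
// Plain description of an HTTP request, so the sketch runs without network access.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: { [name: string]: string };
  body: string;
}

// Hypothetical helper: builds the request a route like /api/speak could forward
// to ElevenLabs. The model_id value is an assumption, not taken from the project.
function buildSpeakRequest(text: string, voiceId: string, apiKey: string): HttpRequest {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}/stream`,
    method: "POST",
    headers: {
      "xi-api-key": apiKey,
      "Content-Type": "application/json",
      Accept: "audio/mpeg", // ask for MP3 audio
    },
    body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
  };
}
```

A route handler could pass the result to `fetch(request.url, request)` and pipe the response body straight back to the browser, which is what lets playback begin before the full clip has been generated.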
This voice feedback serves several critical purposes:
- Accessibility: Users with visual impairments or reading difficulties receive confirmation of their design without needing to read anything on screen.
- Hands-free interaction: Users operating the tool in a presentation, public meeting, or classroom setting can keep their attention on the 3D view while receiving audio confirmation.
- Public consultation: In a city hall presentation, a planner can speak a building description and the audience hears the system respond -- creating a conversational, transparent design process.
- Error correction: If the system misinterprets a request, the spoken confirmation makes the misinterpretation immediately obvious, allowing the user to correct it in their next voice command.
Every interaction in the Building Editor is accompanied by a sound effect generated by the ElevenLabs Sound Generation API. These are not stock audio files. Each sound was generated from a natural-language prompt describing the desired audio experience.
Nine custom sounds were generated:
| Editor Action | ElevenLabs Prompt | Duration |
|---|---|---|
| Place object | "A fast whoosh followed by a soft landing thud, like something flying in and dropping into place" | 1.0s |
| Add floor | "A satisfying plastic lego brick snapping and clicking into place, crisp snap click sound, short and punchy" | 0.6s |
| Resize building | "A rubber stretching and elastic pulling sound, like a material being stretched out longer with tension" | 1.0s |
| Change texture | "A quick light whoosh, like a card being flipped or a page turning fast in the wind" | 0.6s |
| Place brick | "A fast smooth whoosh sound effect, like an object flying through the air and landing" | 0.8s |
| Rotate object | "A quick spinning whoosh, like something rotating fast through the air with a smooth swooshing wind sound" | 0.7s |
| Move object | "A smooth gliding whoosh, like an object sliding quickly through the air" | 0.8s |
| Edit window | "A light airy whoosh, like a curtain being pulled open quickly" | 0.6s |
| Add window | "A solid block clicking into place with a satisfying snap and a short bam" | 0.7s |
The sound generation script (scripts/generate-sounds.mjs) calls the ElevenLabs Sound Generation API at https://api.elevenlabs.io/v1/sound-generation for each prompt and saves the resulting MP3 files to public/sounds/building/.
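A generation script of this kind might be organized as in the sketch below, which turns prompt and duration pairs from the table into request descriptions for the Sound Generation endpoint. The file names, the `SoundSpec` and `SoundRequest` types, and the helper name are hypothetical; the `text` and `duration_seconds` body fields follow the ElevenLabs API, and only two of the nine sounds are listed for brevity.

```typescript
interface SoundSpec {
  file: string; // hypothetical output file name
  prompt: string; // natural-language prompt from the table above
  durationSeconds: number;
}

interface SoundRequest {
  url: string;
  headers: { [name: string]: string };
  body: string;
  outputPath: string;
}

const SOUND_API_URL = "https://api.elevenlabs.io/v1/sound-generation";
const OUTPUT_DIR = "public/sounds/building/";

// Two entries copied from the table; the remaining seven would follow the same shape.
const SOUNDS: SoundSpec[] = [
  {
    file: "place-object.mp3",
    prompt:
      "A fast whoosh followed by a soft landing thud, like something flying in and dropping into place",
    durationSeconds: 1.0,
  },
  {
    file: "add-floor.mp3",
    prompt:
      "A satisfying plastic lego brick snapping and clicking into place, crisp snap click sound, short and punchy",
    durationSeconds: 0.6,
  },
];

// Describes one API call; the actual script would send it and write the MP3 bytes to outputPath.
function buildSoundRequest(spec: SoundSpec, apiKey: string): SoundRequest {
  return {
    url: SOUND_API_URL,
    headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ text: spec.prompt, duration_seconds: spec.durationSeconds }),
    outputPath: OUTPUT_DIR + spec.file,
  };
}
```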
Sound playback architecture:
The SoundManager class (lib/editor/utils/SoundManager.ts) manages all sound playback with:
- Audio caching: All nine sounds are preloaded on first use to eliminate latency.
- Cooldown system: Each sound has a cooldown (100-400 ms) to prevent overlap during rapid interactions like slider adjustments.
- Clone-based playback: Audio elements are cloned for each play event, allowing multiple simultaneous sounds.
- Volume control and mute toggle: Users can adjust or disable sounds.
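The cooldown behaviour described above can be illustrated with a small gate that refuses to replay a sound until its cooldown has elapsed. This is a sketch rather than the actual SoundManager: the class name, the injected clock, and the `play` callback are assumptions made so the logic runs outside a browser.

```typescript
// Minimal cooldown gate; a hypothetical stand-in for part of SoundManager.
class CooldownGate {
  private lastPlayed = new Map<string, number>();
  private cooldownsMs: { [name: string]: number };
  private now: () => number;

  // `now` is injectable so the timing logic can be exercised without real delays.
  constructor(cooldownsMs: { [name: string]: number }, now: () => number = () => Date.now()) {
    this.cooldownsMs = cooldownsMs;
    this.now = now;
  }

  // Calls `play` and returns true unless the sound is still cooling down.
  tryPlay(name: string, play: (name: string) => void): boolean {
    const cooldown = this.cooldownsMs[name] ?? 0;
    const last = this.lastPlayed.get(name);
    const current = this.now();
    if (last !== undefined && current - last < cooldown) {
      return false; // too soon since the last play of this sound
    }
    this.lastPlayed.set(name, current);
    play(name);
    return true;
  }
}
```

In a browser, the `play` callback could clone a preloaded element, for example `(cached.cloneNode() as HTMLAudioElement).play()`, so that a new play event does not cut off one already in progress.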
When a user zooms into street level on the 3D Kingston map, ElevenLabs generates a metropolitan city ambiance in real time. The /api/street-sound endpoint calls the ElevenLabs Sound Generation API with a prompt describing a busy urban street -- car horns, engines, pedestrians, distant sirens -- and streams the resulting audio back to the browser.
How it works:
- The 3D map tracks the camera's distance from the ground in every animation frame.
- When the camera crosses the street-level threshold (< 200 world units), the client calls /api/street-sound.
- ElevenLabs generates a 5-second metropolitan city ambiance clip from a natural-language prompt.
- The audio plays in a loop with volume that scales smoothly based on zoom distance -- closer to the street means louder ambiance.
- When the user zooms back out, the sound fades and stops.
- The audio is cached client-side so subsequent zoom-ins replay instantly without another API call.
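The distance-based volume in the steps above could be computed with something like the function below. The 200-unit threshold comes from the text; the linear ramp, the function name, and the `maxVolume` parameter are assumptions, since the actual fade curve is not specified.

```typescript
const STREET_LEVEL_THRESHOLD = 200; // world units, from the description above

// Returns 0 at or above the threshold and ramps linearly to maxVolume at ground level.
// The linear curve is an assumption; any monotonic easing would fit the description.
function ambianceVolume(
  cameraHeight: number,
  threshold: number = STREET_LEVEL_THRESHOLD,
  maxVolume: number = 1,
): number {
  if (cameraHeight >= threshold) {
    return 0;
  }
  const clampedHeight = Math.max(cameraHeight, 0); // guard against negative heights
  const closeness = 1 - clampedHeight / threshold; // 0 at threshold, 1 at ground
  return maxVolume * closeness;
}
```

Calling this once per animation frame and assigning the result to the looping audio element's `volume` would give the smooth fade in both directions.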
This creates an immersive experience where zooming into Kingston's streets feels like walking down a real city block. The sound is not a pre-recorded stock file -- it is generated by ElevenLabs from a text description, the same way the editor sound effects are created.
Google Gemini 2.5 Flash is the reasoning engine behind three core features.
Voice design: Gemini converts natural language like "Make me a tall glass building with round windows" into a structured building configuration (floors, dimensions, materials, roof, windows, color). Gemini resolves ambiguity -- "tall" becomes 8 floors, "glass" maps to both texture and wall color, "round windows" resolves to the circular enum. Output is validated against a Zod schema with up to 3 automatic retries. Incremental editing is supported: "Make it taller" updates only the relevant fields.
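Incremental editing of this kind is often implemented as a partial update merged over the current configuration. The sketch below assumes that approach; the `BuildingState` fields and the `applyEdit` helper are hypothetical and only illustrate how "Make it taller" could change one field while leaving the rest untouched.

```typescript
// Hypothetical subset of a building configuration.
interface BuildingState {
  floors: number;
  texture: string;
  windowShape: "rectangular" | "arched" | "circular";
}

// Merges only the fields present in `patch`; the current state is not mutated.
// The patch should omit unchanged keys rather than set them to undefined.
function applyEdit(current: BuildingState, patch: Partial<BuildingState>): BuildingState {
  return { ...current, ...patch };
}

// "Make me a tall glass building with round windows"
const initial: BuildingState = { floors: 8, texture: "glass", windowShape: "circular" };

// "Make it taller": in this sketch the model returns only the changed field.
const taller = applyEdit(initial, { floors: 12 });
```

Asking the model for a patch instead of a full configuration keeps earlier choices stable across follow-up commands, at the cost of needing the current state in each prompt.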
Environmental report: Gemini generates a full environmental and societal impact report for buildings placed on Kingston's map. It covers carbon footprint, habitat disruption, water impact, air quality, traffic projections, noise levels, community effects, risk classification, mitigation measures, and overall sustainability scores (0-100). The report is grounded in Kingston's geography, zoning, and Great Lakes-St. Lawrence ecosystem.
Tree advisor: Gemini recommends tree species from Kingston's Neighbourhood Tree Planting Program -- a real municipal dataset of 40+ species. It returns species selection, planting density, radius, reasoning, and tips. All recommendations are validated against the official Kingston dataset.
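Validating model output against a fixed dataset can be as simple as a set-membership filter. In the sketch below, the species names are placeholders and not taken from the Kingston dataset, and the `TreeRecommendation` shape and helper name are assumptions for illustration.

```typescript
interface TreeRecommendation {
  species: string;
  reasoning: string;
}

// Placeholder entries; the real list would be loaded from the municipal dataset.
const KNOWN_SPECIES: ReadonlySet<string> = new Set([
  "placeholder maple",
  "placeholder oak",
  "placeholder spruce",
]);

// Normalizes names before comparison so casing and stray whitespace do not cause false rejections.
function normalizeSpecies(name: string): string {
  return name.trim().toLowerCase();
}

// Keeps only recommendations whose species appears in the dataset.
function filterToDataset(
  recommendations: TreeRecommendation[],
  dataset: ReadonlySet<string> = KNOWN_SPECIES,
): TreeRecommendation[] {
  return recommendations.filter((rec) => dataset.has(normalizeSpecies(rec.species)));
}
```

Dropping unknown species rather than trusting the model guards against hallucinated names; a stricter variant could reject the whole response and retry instead.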
KingsView was designed with accessibility as a primary constraint, not an afterthought. The entire building design workflow can be completed without touching a keyboard or reading output -- press "Design with Voice," describe the building, and hear the spoken confirmation from ElevenLabs.
This serves visually impaired users, seniors, non-native English speakers, and motor-impaired users. The user never needs to learn parameter names, unit systems, or menu structures. KingsView requires the ability to describe a building in a sentence -- nothing more.
- City planning -- Describe a development, place it on the map, and generate an environmental impact report in under a minute.
- Public consultations -- Project KingsView at a town hall. A facilitator speaks, the audience sees the 3D result, and ElevenLabs narrates the output. Residents suggest modifications verbally in real time.
- Education -- Students prototype developments and see environmental consequences without learning CAD software.
- Real estate -- Developers test designs against zoning codes and generate preliminary impact reports before engaging consultants.
| Layer | Technology |
|---|---|
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS, Framer Motion |
| 3D Engine | Three.js, React Three Fiber, Drei |
| AI Reasoning | Google Gemini 2.5 Flash (voice design, environmental report, tree advisor) |
| Voice and Sound | ElevenLabs Text-to-Speech API, ElevenLabs Sound Generation API (editor effects + real-time street ambiance), Web Speech API |
| Validation | Zod (schema validation with retry) |
| Geospatial | Turf.js, OpenStreetMap data, lat/lng projection |
| Traffic Simulation | A* pathfinding, spatial grid collision detection, signal coordination, vehicle state machine |
| Data Sources | Kingston Official Plan zoning (76 zones), Kingston tree planting program (40+ species), OpenStreetMap buildings and roads, traffic signal locations |
| Export | GLB (3D model), GeoJSON (geospatial data) |
git clone https://github.com/Lemirq/qhacks.git && cd qhacks
npm install

Add API keys to .env.local:

GEMINI_API_KEY=your_gemini_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key

npm run dev # Start at http://localhost:3000
npm run generate-sounds # Regenerate ElevenLabs sound effects

Built at QHacks 2026 by:
- Phineas Truong
- Jack Le
- Vihaan Sharma
- Dhan Narula