What it does

SculptNet is a browser-based WebXR application that turns your webcam into an AR canvas. Using simple hand gestures (e.g., pinch to adjust FOV, wave to sweep lighting, frame with hands for composition), users dynamically modify FIBO's structured JSON parameters in real time. A trigger gesture generates a new image via Bria's API, overlaying it in 3D space on your webcam feed. SculptNet bridges abstract JSON controls with tactile interaction, enabling rapid iteration on cinematic visuals, product mocks, or concept art, all with zero hardware beyond a laptop or phone.

How we built it

We built SculptNet entirely in the browser for accessibility:

  • AR Rendering: A-Frame with AR.js for markerless webcam AR scenes.
  • Gesture Detection: MediaPipe Tasks Vision (@mediapipe/tasks-vision) for real-time hand landmark tracking and gesture mapping (e.g., pinch distance → camera.fov delta).
  • FIBO Integration: direct calls to Bria AI's v2 API endpoints (/structured_prompt/generate for VLM expansion plus /image/generate for JSON-to-image). We maintained a dynamic JSON object, validated it with AJV, and handled base64 responses for instant overlays.
  • Polish: the Web Vibration API for haptic feedback on "balanced" compositions, the Fetch API for async calls, and deployment on Vercel for live demos.

Everything runs client-side with cloud inference; no backend needed.
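The gesture-to-JSON loop above can be sketched with a few pure helpers. MediaPipe hand landmarks are normalized to [0, 1]; everything else here (the `fov` field name, the FOV range, the pinch scale) is an illustrative assumption, not Bria/FIBO's actual schema:

```typescript
// Minimal sketch of the gesture -> JSON -> overlay loop.
// Field names and ranges are hypothetical, not FIBO's real schema.
interface Point { x: number; y: number }
type StructuredPrompt = Record<string, unknown>;

const FOV_MIN = 20;   // narrow, telephoto-style framing
const FOV_MAX = 110;  // wide-angle framing

// Distance between thumb tip and index tip in normalized screen space.
function pinchDistance(thumbTip: Point, indexTip: Point): number {
  return Math.hypot(thumbTip.x - indexTip.x, thumbTip.y - indexTip.y);
}

// Map a pinch distance (roughly 0..0.3 in normalized space) onto the FOV
// range, clamping so extreme hand poses never produce an invalid camera value.
function pinchToFov(distance: number, maxDistance = 0.3): number {
  const t = Math.min(Math.max(distance / maxDistance, 0), 1);
  return FOV_MIN + t * (FOV_MAX - FOV_MIN);
}

// Write a numeric value into the structured-prompt object immutably,
// keeping the previous prompt around for undo or diffing.
function setParam(prompt: StructuredPrompt, field: string, value: number): StructuredPrompt {
  return { ...prompt, [field]: value };
}

// Wrap a base64 image payload from the generation endpoint as a data URL
// usable directly as an A-Frame image/texture source.
function base64ToDataUrl(b64: string, mime = "image/png"): string {
  return `data:${mime};base64,${b64}`;
}
```

In the real app these helpers would sit inside the MediaPipe results callback, with the updated prompt object POSTed to the image-generation endpoint via fetch.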

Challenges we ran into

  • Mapping fluid hand gestures accurately to discrete JSON parameters without lag; solved by optimizing the MediaPipe loop to 15-30 fps and debouncing deltas.
  • Handling API latency (5-15 s per generation); we added spinners, progressive previews, and fallback images.
  • Keeping webcam AR stable across devices; we tuned AR.js for markerless tracking and tested against Chrome flags.
  • Balancing intuitive gestures with FIBO's deep schema; we iterated on mappings like wrist rotation for Dutch tilts.
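The delta-debouncing fix can be illustrated with a small stateful filter: jitter below a dead zone is ignored, and accepted updates are rate-limited so a 15-30 fps detection loop does not spam parameter changes. The thresholds below are made-up defaults, not SculptNet's tuned values:

```typescript
// Sketch of debouncing noisy gesture deltas. Dead-zone and interval
// thresholds are illustrative assumptions.
function makeGestureDebouncer(deadZone = 0.5, minIntervalMs = 100) {
  let lastValue: number | null = null;
  let lastEmit = -Infinity;

  // Returns the value when it should be applied, or null to drop it.
  return function filter(value: number, nowMs: number): number | null {
    // Ignore changes smaller than the dead zone (landmark jitter).
    if (lastValue !== null && Math.abs(value - lastValue) < deadZone) return null;
    // Rate-limit how often parameter updates fire.
    if (nowMs - lastEmit < minIntervalMs) return null;
    lastValue = value;
    lastEmit = nowMs;
    return value;
  };
}
```

Each mapped parameter (FOV, lighting sweep, tilt) would get its own debouncer instance, since the dead zone is scale-dependent.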

Accomplishments that we're proud of

  • Creating a truly novel UX that makes FIBO's JSON feel magical and embodied; judges can "sculpt" images hands-on in our demo video.
  • Achieving a polished, zero-cost prototype that runs on any device, showcasing FIBO's controllability in an unexpected, immersive way.
  • Deep native integration with Bria's API, including VLM expansion for gesture-to-full-JSON enrichment.
  • Building something that feels like the future of creative tools: tactile, collaborative-ready, and pro-export capable.

What we learned

We gained hands-on mastery of FIBO's disentangled parameters and of how structured JSON unlocks precise, scalable generation. Integrating MediaPipe with WebXR taught us the power (and quirks) of browser-based computer vision. Most importantly, we learned that the best innovations hide complexity: turning code-heavy controls into natural body movements dramatically lowers barriers for creatives.

What's next for SculptNet

  • Multi-user collaboration via Socket.io for remote team sculpting sessions.
  • Mobile-first optimizations and phone AR (WebXR hit-testing).
  • Advanced exports (layered PSDs with JSON metadata) and integrations with tools like Blender and Nuke.
  • Haptic enhancements on supported devices and a potential Bria partnership for deeper VLM gesture interpretation.
  • Expansion to video previs or 3D asset generation, evolving into a full professional embodied-AI workflow tool.

Built With

  • nextjs