fibostudio - AI-Powered 3D Product Photography Studio
Inspiration
The inspiration for fibostudio came from observing the massive gap between how product photography is traditionally done and how it could be done with modern AI. We noticed:
- E-commerce businesses spend thousands on professional photographers and studio equipment
- Content creators struggle to maintain visual consistency across product catalogs
- Small businesses can't afford professional photography for their products
- Iteration cycles are painfully slow - one photoshoot can take hours or days
We asked ourselves: What if you could design your product scene visually, then let AI generate perfect product photos that match exactly what you see?
The breakthrough came when we realized that combining 3D scene composition with AI image generation could replace the traditional photoshoot. Users could become their own studio directors: positioning objects, adjusting lighting, and speaking commands to modify their scene in real time.
What it does
fibostudio is a web-based virtual photography studio that turns a 3D scene you compose into AI-generated product images:
Core Functionality
3D Scene Editor
- Drag-and-drop object positioning with real-time 3D preview
- Intuitive transform controls (Move, Rotate, Scale)
- Real-time lighting adjustments (key light, fill light, rim light)
- Background and environment customization
- Multiple mood presets (Clean, Dark, Warm, Cool)
Voice-Controlled Studio Director
- Speak natural language commands to modify your scene
- "Make it look cinematic with dramatic lighting"
- "Add warm golden hour lighting"
- "Change background to white"
- Real-time voice feedback using Eleven Labs text-to-speech
AI-Powered Image Generation
- Detects exact camera angle from 3D preview
- Generates photorealistic product images matching your scene
- Two generation modes:
- Plain: White background, minimal styling, clean product shots
- Professional: YouTube-ready, product launch quality, dramatic lighting
Project Management
- Save and organize multiple product photography projects
- View production gallery of all generated images
- Download high-quality product photos
- User authentication with JWT
Exact View Matching
- AI understands precise camera positioning (front, side, top-down, angles)
- Generates images that match your 3D preview exactly
- No AI creativity - follows your specifications precisely
- Respects object positioning, rotation, and scale
How we built it
Architecture
Frontend Stack:
- React 18 with TypeScript for type-safe UI
- Three.js + React Three Fiber for 3D scene rendering
- Tailwind CSS for responsive styling
- Web Speech API for voice input
- Eleven Labs API for voice synthesis
Backend Stack:
- Node.js + Express for REST API
- MongoDB Atlas for data persistence
- JWT authentication for secure user sessions
- Middleware for CORS and request validation
AI Integration:
- Google Gemini 3 for natural language understanding of voice commands
- FAL.ai for reliable image generation
- Eleven Labs for natural voice synthesis and feedback
Key Implementation Details
3D Scene Management
- Real-time camera angle detection from Three.js scene
- Precise angle calculations (horizontal and vertical degrees)
- Distance-based framing detection (close-up, medium, full, wide shots)
- Object rotation and positioning tracking
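The angle and framing detection listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Vec3` type stands in for a Three.js vector, and the function names and distance thresholds are assumptions chosen for the example.

```typescript
// Hypothetical sketch: derive view angles and framing from a camera position.
// Vec3 stands in for a Three.js Vector3; all thresholds are illustrative.
interface Vec3 { x: number; y: number; z: number; }

interface CameraView {
  horizontalDeg: number; // 0 = in front (+Z), 90 = to the right (+X)
  verticalDeg: number;   // 0 = eye level, 90 = directly above
  distance: number;
  framing: "close-up" | "medium" | "full" | "wide";
}

const toDeg = (rad: number): number => (rad * 180) / Math.PI;

// Map camera-to-target distance (scene units) to a framing label.
function classifyFraming(distance: number): CameraView["framing"] {
  if (distance < 2) return "close-up";
  if (distance < 5) return "medium";
  if (distance < 10) return "full";
  return "wide";
}

function describeCamera(camera: Vec3, target: Vec3): CameraView {
  const dx = camera.x - target.x;
  const dy = camera.y - target.y;
  const dz = camera.z - target.z;
  const distance = Math.sqrt(dx * dx + dy * dy + dz * dz);
  // Azimuth around the vertical (Y) axis, measured from the +Z axis.
  const horizontalDeg = toDeg(Math.atan2(dx, dz));
  // Elevation above the horizontal plane through the target.
  const verticalDeg = toDeg(Math.atan2(dy, Math.sqrt(dx * dx + dz * dz)));
  return { horizontalDeg, verticalDeg, distance, framing: classifyFraming(distance) };
}
```

In a React Three Fiber app, the inputs would presumably come from the camera's position and the orbit target; the function itself stays pure so it is easy to test.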
Voice-to-Scene Pipeline
- Web Speech API captures user voice
- Transcribed text sent to Gemini for scene understanding
- Gemini generates structured scene modifications
- React state updates 3D scene in real-time
- Eleven Labs synthesizes confirmation response
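One way to handle the "structured scene modifications" step is to validate the model's reply before it touches React state. The sketch below assumes the model is asked to reply with a small JSON object; the `SceneState` fields, the intensity range, and the function names are hypothetical and only illustrate the idea.

```typescript
// Hypothetical scene state; field names are assumptions for illustration.
interface SceneState {
  mood: "Clean" | "Dark" | "Warm" | "Cool";
  background: string;
  keyLightIntensity: number;
}

type SceneModification = Partial<SceneState>;

const MOODS: readonly string[] = ["Clean", "Dark", "Warm", "Cool"];

// Parse the model's text reply and keep only fields that pass validation.
function parseModification(reply: string): SceneModification {
  let raw: unknown;
  try {
    raw = JSON.parse(reply);
  } catch {
    return {}; // Not JSON: treat as "no change" rather than throwing.
  }
  if (typeof raw !== "object" || raw === null) return {};
  const data = raw as Record<string, unknown>;
  const mod: SceneModification = {};
  if (typeof data.mood === "string" && MOODS.indexOf(data.mood) !== -1) {
    mod.mood = data.mood as SceneState["mood"];
  }
  if (typeof data.background === "string") mod.background = data.background;
  if (typeof data.keyLightIntensity === "number" && isFinite(data.keyLightIntensity)) {
    // Clamp to an assumed range so a bad reply cannot blow out the scene.
    mod.keyLightIntensity = Math.min(10, Math.max(0, data.keyLightIntensity));
  }
  return mod;
}

// Pure update, usable in a state setter: setScene(s => applyModification(s, mod)).
function applyModification(state: SceneState, mod: SceneModification): SceneState {
  return { ...state, ...mod };
}
```

Unknown fields and malformed replies are dropped, so a misheard command leaves the scene unchanged instead of corrupting it.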
Image Generation Pipeline
- Frontend detects exact camera view from 3D preview
- Builds precise prompt describing view, style, and specifications
- Backend proxies request to FAL.ai with strict constraints
- Generated image returned and displayed in production gallery
Prompt Engineering
- Single-paragraph prompts for clarity
- Exact view descriptions (e.g., "front view at eye level")
- Style-specific instructions (plain vs professional)
- No JSON parameters - pure natural language
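A prompt builder following these rules might look like the sketch below. The sentence wording, the `PromptInput` shape, and the function name are assumptions for illustration; only the overall approach (one paragraph, an exact view description, a per-mode style sentence, no JSON) comes from the list above.

```typescript
type GenerationMode = "plain" | "professional";

interface PromptInput {
  product: string;         // e.g. "a ceramic coffee mug"
  viewDescription: string; // e.g. "front view at eye level"
  framing: string;         // e.g. "medium shot"
  mode: GenerationMode;
}

// Style sentences paraphrase the two generation modes; exact wording is assumed.
const STYLE: Record<GenerationMode, string> = {
  plain: "Use a plain white background with minimal styling for a clean product shot.",
  professional: "Use dramatic lighting with product-launch quality styling.",
};

function buildPrompt(input: PromptInput): string {
  const sentences = [
    `A photorealistic product photo of ${input.product}.`,
    `The camera shows a ${input.viewDescription}, framed as a ${input.framing}.`,
    STYLE[input.mode],
    "Match this camera angle exactly and do not change the object's position, rotation, or scale.",
  ];
  // Join into a single paragraph: no line breaks and no JSON.
  return sentences.join(" ");
}
```

Keeping the builder as a pure function makes it easy to compare the prompt text against the generated image when a view comes out wrong.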
Development Process
- Phase 1: Built 3D scene editor with Three.js and React Three Fiber
- Phase 2: Integrated Gemini for natural language scene modifications
- Phase 3: Added voice input with Web Speech API
- Phase 4: Implemented Eleven Labs for voice feedback
- Phase 5: Integrated FAL.ai for image generation
- Phase 6: Refined prompt engineering for exact view matching
- Phase 7: Added project management and user authentication
Challenges we ran into
1. Exact View Matching
Challenge: AI was generating images from different camera angles than the 3D preview showed.
Solution:
- Implemented precise angle detection (horizontal and vertical degrees)
- Created detailed spatial descriptions in prompts
- Added distance-based framing detection
- Built view type classification system (top-down, high-angle, eye-level, low-angle)
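The view type classification could be sketched as below. The degree thresholds, the function names, and the phrasing of the output are assumptions; the four view types are the ones named above.

```typescript
type ViewType = "top-down" | "high-angle" | "eye-level" | "low-angle";
type SideLabel = "front" | "right side" | "back" | "left side";

// Thresholds (degrees of elevation above the horizon) are illustrative.
function classifyViewType(verticalDeg: number): ViewType {
  if (verticalDeg >= 75) return "top-down";
  if (verticalDeg >= 15) return "high-angle";
  if (verticalDeg > -15) return "eye-level";
  return "low-angle";
}

function classifySide(horizontalDeg: number): SideLabel {
  // Normalize to [0, 360) so that -90 and 270 are treated the same.
  const deg = ((horizontalDeg % 360) + 360) % 360;
  if (deg < 45 || deg >= 315) return "front";
  if (deg < 135) return "right side";
  if (deg < 225) return "back";
  return "left side";
}

const VIEW_PHRASE: Record<ViewType, string> = {
  "top-down": "top-down view",
  "high-angle": "view from a high angle",
  "eye-level": "view at eye level",
  "low-angle": "view from a low angle",
};

// Turn the two angles into a spatial description for the prompt.
function describeView(horizontalDeg: number, verticalDeg: number): string {
  const view = classifyViewType(verticalDeg);
  // From directly above, the side the camera is on is not meaningful.
  if (view === "top-down") return VIEW_PHRASE[view];
  return `${classifySide(horizontalDeg)} ${VIEW_PHRASE[view]}`;
}
```

For example, a camera straight in front of the product at the same height yields "front view at eye level", which is the kind of description the prompts rely on.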
2. CORS and API Integration
Challenge: Direct frontend-to-external-API calls were blocked by CORS policies.
Solution:
- Built backend proxy for all external API calls
- Configured CORS middleware for frontend origin
- Centralized API key management on backend
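The proxy idea can be illustrated without committing to a framework. In this sketch the upstream URL, the `IMAGE_API_KEY` variable name, the request body shape, and the `Bearer` header scheme are all placeholders; a real provider's documentation determines the actual values.

```typescript
// Framework-agnostic sketch of the backend proxy. The upstream URL, the
// IMAGE_API_KEY name, the body shape, and the auth scheme are placeholders.
interface UpstreamRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const UPSTREAM_URL = "https://image-api.example.invalid/generate"; // placeholder

// Build the server-side request. The key is read from the backend
// environment and is never sent to the browser.
function buildUpstreamRequest(
  prompt: string,
  env: Record<string, string | undefined>,
): UpstreamRequest {
  const key = env.IMAGE_API_KEY;
  if (!key) throw new Error("IMAGE_API_KEY is not configured on the backend");
  return {
    url: UPSTREAM_URL,
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${key}` },
    body: JSON.stringify({ prompt }),
  };
}

// CORS headers for the browser-facing response: allow only known frontend origins.
function corsHeaders(
  requestOrigin: string | undefined,
  allowed: readonly string[],
): Record<string, string> {
  if (!requestOrigin || allowed.indexOf(requestOrigin) === -1) return {};
  return {
    "Access-Control-Allow-Origin": requestOrigin,
    "Access-Control-Allow-Headers": "Content-Type, Authorization",
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    Vary: "Origin",
  };
}
```

In an Express app the same logic would typically live in a route handler plus CORS middleware; the point is that the browser only ever talks to the backend, which holds the keys.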
3. Voice Recognition Reliability
Challenge: Web Speech API was inconsistent and sometimes failed silently.
Solution:
- Implemented proper error handling and user feedback
- Added silence detection (2-second timeout)
- Auto-submit on speech completion
- Clear visual indicators for recording state
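The 2-second silence detection can be sketched as a small restartable timer. The function and parameter names are hypothetical, and the injectable scheduler exists only so the timer can be tested without waiting; in the browser the defaults use `setTimeout` and `clearTimeout`.

```typescript
// Sketch of a silence timer: call onSpeech() whenever a speech result
// arrives; onSilence fires once no result has arrived for timeoutMs.
type ScheduleFn = (fn: () => void, ms: number) => unknown;
type CancelFn = (handle: unknown) => void;

function createSilenceDetector(
  onSilence: () => void,
  timeoutMs = 2000,
  schedule: ScheduleFn = (fn, ms) => setTimeout(fn, ms),
  cancel: CancelFn = (h) => clearTimeout(h as ReturnType<typeof setTimeout>),
) {
  let handle: unknown = null;

  // Cancel any pending countdown (e.g. when the user stops recording manually).
  const stop = (): void => {
    if (handle !== null) {
      cancel(handle);
      handle = null;
    }
  };

  // Restart the countdown on every new speech result.
  const onSpeech = (): void => {
    stop();
    handle = schedule(() => {
      handle = null;
      onSilence(); // e.g. stop recognition and auto-submit the transcript
    }, timeoutMs);
  };

  return { onSpeech, stop };
}
```

A caller would invoke `onSpeech()` from the recognition result handler and `stop()` when recording ends or errors, so a stale timer never submits an old transcript.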
4. Prompt Complexity
Challenge: Complex JSON-based prompts confused the AI and resulted in wrong images.
Solution:
- Simplified to single-paragraph natural language prompts
- Removed all JSON parameters from prompts
- Focused on clear, direct instructions
- Added style-specific prompt variations
5. API Key Management
Challenge: Managing multiple API keys (Gemini, FAL.ai, Eleven Labs) across environments.
Solution:
- Centralized environment variable configuration
- Separate .env files for frontend and backend
- Clear documentation for setup
- Fallback mechanisms for API failures
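Centralized key handling might look like the following sketch, which validates the environment once at startup. The variable names (`GEMINI_API_KEY`, `FAL_API_KEY`, `ELEVENLABS_API_KEY`) and the choice to treat the voice key as optional are assumptions, not the project's documented configuration.

```typescript
// Variable names are hypothetical; use whatever names your .env files define.
interface ApiConfig {
  geminiKey: string;
  imageKey: string;
  voiceKey: string | null; // voice feedback is treated as optional here
}

function loadConfig(env: Record<string, string | undefined>): ApiConfig {
  const required = ["GEMINI_API_KEY", "FAL_API_KEY"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Fail fast at startup with a message that names the missing variables.
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    geminiKey: env.GEMINI_API_KEY as string,
    imageKey: env.FAL_API_KEY as string,
    // Assumed fallback: without a voice key, show text confirmations instead.
    voiceKey: env.ELEVENLABS_API_KEY || null,
  };
}
```

On the backend this would be called once as `loadConfig(process.env)`, so a missing key surfaces as a clear startup error rather than a failed request later.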
6. Real-time 3D Rendering Performance
Challenge: Keeping the 3D preview smooth while running complex lighting calculations.
Solution:
- Optimized the Three.js scene with appropriate level of detail (LOD)
- Efficient lighting setup with key, fill, and rim lights
- Responsive camera controls with OrbitControls
- Proper shadow mapping configuration
7. Exact Object Positioning
Challenge: Objects were shown in side view in the 3D scene, but the AI generated front views.
Solution:
- Implemented precise camera angle detection from 3D scene
- Added degree-based angle calculations
- Created view type classification system
- Included exact positioning in prompts
Accomplishments that we're proud of
1. Voice-Controlled Studio Director
Successfully integrated Web Speech API, Gemini, and Eleven Labs into a voice-controlled interface. Users can describe what they want out loud and watch the studio update in real time.
2. Exact View Matching
Solved the critical problem of getting the AI to generate images that match the exact camera angle of the 3D preview. This required precise angle detection and careful prompt engineering.
3. Full-Stack Integration
Built a complete end-to-end system connecting:
- 3D scene editor (Three.js)
- Natural language processing (Gemini)
- Voice I/O (Web Speech API + Eleven Labs)
- Image generation (FAL.ai)
- User management (MongoDB + JWT)
4. Intuitive User Experience
Created an interface so intuitive that users can generate professional product photos without any photography knowledge.
5. Plain vs Professional Modes
Implemented two distinct generation modes that produce completely different results based on user intent - from minimal white-background shots to YouTube-ready professional photography.
6. Real-time Feedback Loop
Built a system where users see their 3D scene, hear voice confirmation, and get generated images - all in a seamless workflow.
7. Scalable Architecture
Designed a backend that can handle multiple concurrent image generation requests with proper error handling and fallbacks.
What we learned
1. Prompt Engineering is Critical
- Simple, direct prompts work better than complex JSON structures
- For the image model we used, natural language prompts were more effective than structured data
- Single-paragraph prompts are clearer than multi-part instructions
2. Camera Angle Detection Requires Precision
- Exact degree calculations matter for view matching
- Combining horizontal and vertical angles creates unique view types
- Distance-based framing significantly impacts composition
3. Voice Interfaces Need Careful UX
- Users need clear visual feedback during recording
- Silence detection is essential for auto-submission
- Error messages should be helpful, not technical
4. API Integration Complexity
- Browser CORS restrictions on external APIs are best handled by proxying requests through the backend
- Multiple API keys need centralized management
- Fallback mechanisms are essential for reliability
5. 3D Rendering Performance Matters
- Proper scene optimization is crucial for smooth interaction
- Lighting calculations can be expensive
- Real-time updates require efficient state management
6. User Authentication is Non-Negotiable
- JWTs provide stateless authentication for user sessions
- Demo mode is valuable for user onboarding
- Project persistence requires robust database design
7. AI Creativity vs Precision
- Sometimes you want AI to follow instructions exactly, not be creative
- Negative prompts are as important as positive ones
- Style consistency requires explicit specification
What's next for fibostudio
Short Term (Next 1-2 months)
Enhanced Voice Commands
- Support for more complex scene modifications
- Batch operations ("Generate 5 variations")
- Undo/redo for voice commands
Improved Image Generation
- Support for multiple objects in single scene
- Custom background images
- Texture and material customization
Performance Optimization
- Faster image generation with caching
- Optimized 3D scene rendering
- Progressive image loading
Medium Term (3-6 months)
Advanced Features
- 360-degree product views
- Animation support (rotating products)
- A/B testing for different product shots
- Batch generation with variations
Collaboration Tools
- Team projects and sharing
- Comments and feedback on generated images
- Version history and rollback
Integration Ecosystem
- Shopify integration for direct product uploads
- WooCommerce plugin
- API for third-party developers
- Zapier integration for automation
Long Term (6-12 months)
Enterprise Features
- White-label solution for agencies
- Custom branding and watermarks
- Advanced analytics and reporting
- Bulk processing for large catalogs
AI Enhancements
- Fine-tuned models for specific product categories
- Automatic lighting optimization
- Smart background suggestions
- Consistency across product lines
Monetization
- Freemium model with usage limits
- Pro tier with unlimited generations
- Enterprise licensing
- API access for developers
Mobile Experience
- Native mobile apps (iOS/Android)
- Mobile-optimized 3D editor
- Voice commands on mobile
- Offline support
Vision
fibostudio aims to become the standard tool for product photography in e-commerce. We envision a future where:
- Small businesses can compete with large enterprises on product image quality
- Content creators generate consistent, professional product shots in minutes
- E-commerce platforms offer built-in product photography tools
- Photographers focus on creative direction rather than technical execution
- Product launches happen faster with AI-generated marketing materials
The ultimate goal is to democratize professional product photography and make it accessible to everyone.
Technical Specifications
Frontend:
- React 18, TypeScript, Vite
- Three.js, React Three Fiber, @react-three/drei
- Tailwind CSS, Lucide React icons
- Web Speech API, Eleven Labs SDK
Backend:
- Node.js, Express, TypeScript
- MongoDB Atlas, Mongoose ODM
- JWT authentication, bcryptjs
- CORS middleware, error handling
APIs:
- Google Gemini 3 (natural language)
- FAL.ai (image generation)
- Eleven Labs (voice synthesis)
