fibostudio - AI-Powered 3D Product Photography Studio

Inspiration

The inspiration for fibostudio came from observing the massive gap between how product photography is traditionally done and how it could be done with modern AI. We noticed:

E-commerce businesses spend thousands on professional photographers and studio equipment
Content creators struggle to maintain visual consistency across product catalogs
Small businesses can't afford professional photography for their products
Iteration cycles are painfully slow - one photoshoot can take hours or days

We asked ourselves: What if you could design your product scene visually, then let AI generate perfect product photos that match exactly what you see?

The breakthrough came when we realized that combining 3D scene composition with AI image generation could eliminate the photographer entirely. Users could become their own studio directors: positioning objects, adjusting lighting, and speaking commands to modify their scene in real time.

What it does

fibostudio is a web-based virtual photography studio that turns a 3D scene you compose in the browser into finished product images:

Core Functionality

3D Scene Editor
- Drag-and-drop object positioning with real-time 3D preview
- Intuitive transform controls (Move, Rotate, Scale)
- Real-time lighting adjustments (key light, fill light, rim light)
- Background and environment customization
- Multiple mood presets (Clean, Dark, Warm, Cool)
Voice-Controlled Studio Director
- Speak natural language commands to modify your scene
- "Make it look cinematic with dramatic lighting"
- "Add warm golden hour lighting"
- "Change background to white"
- Real-time voice feedback using Eleven Labs text-to-speech
AI-Powered Image Generation
- Detects the exact camera angle from the 3D preview
- Generates photorealistic product images that match your scene
- Two generation modes:
  - Plain: White background, minimal styling, clean product shots
  - Professional: YouTube-ready, product-launch quality, dramatic lighting
Project Management
- Save and organize multiple product photography projects
- View production gallery of all generated images
- Download high-quality product photos
- User authentication with JWT
Exact View Matching
- AI understands precise camera positioning (front, side, top-down, angles)
- Generates images that match your 3D preview exactly
- No creative liberties: the AI follows your specifications precisely
- Respects object positioning, rotation, and scale

How we built it

Architecture

Frontend Stack:

React 18 with TypeScript for type-safe UI
Three.js + React Three Fiber for 3D scene rendering
Tailwind CSS for responsive styling
Web Speech API for voice input
Eleven Labs API for voice synthesis

Backend Stack:

Node.js + Express for REST API
MongoDB Atlas for data persistence
JWT authentication for secure user sessions
Middleware for CORS and request validation

AI Integration:

Google Gemini 3 for natural language understanding of voice commands
FAL.ai for reliable image generation
Eleven Labs for natural voice synthesis and feedback

Key Implementation Details

3D Scene Management
- Real-time camera angle detection from Three.js scene
- Precise angle calculations (horizontal and vertical degrees)
- Distance-based framing detection (close-up, medium, full, wide shots)
- Object rotation and positioning tracking
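
As an illustration of the angle and framing detection above, here is a minimal sketch that derives horizontal and vertical degrees plus a framing label from a camera position and the point it orbits. It assumes the product's front faces +Z and uses a small `Vec3` interface instead of three.js, so it runs standalone; the framing thresholds are hypothetical placeholders, not fibostudio's actual values.

```typescript
// Minimal vector type so the sketch runs without three.js. A THREE.Vector3
// (e.g. camera.position or an OrbitControls target) has x, y, z and fits it.
interface Vec3 {
  x: number;
  y: number;
  z: number;
}

interface CameraView {
  horizontalDeg: number; // angle around the vertical (Y) axis; 0 = camera on +Z, in front (assumed)
  verticalDeg: number; // elevation above the product's horizontal plane; 90 = directly above
  distance: number; // straight-line distance from camera to target
}

type Framing = "close-up" | "medium" | "full" | "wide";

const toDeg = (rad: number): number => (rad * 180) / Math.PI;

function detectCameraView(camera: Vec3, target: Vec3): CameraView {
  const dx = camera.x - target.x;
  const dy = camera.y - target.y;
  const dz = camera.z - target.z;
  // atan2(dx, dz) measures from +Z toward +X; shift the result into [0, 360).
  const horizontalDeg = (toDeg(Math.atan2(dx, dz)) + 360) % 360;
  // Elevation of the camera above the XZ plane passing through the target.
  const verticalDeg = toDeg(Math.atan2(dy, Math.hypot(dx, dz)));
  return { horizontalDeg, verticalDeg, distance: Math.hypot(dx, dy, dz) };
}

// Hypothetical thresholds: distance measured in multiples of the product's largest dimension.
function classifyFraming(distance: number, productSize: number): Framing {
  const ratio = distance / productSize;
  if (ratio < 1.5) return "close-up";
  if (ratio < 3) return "medium";
  if (ratio < 5) return "full";
  return "wide";
}
```

Measuring distance relative to product size keeps the framing label stable whether the model is a ring or a sofa.
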
Voice-to-Scene Pipeline
- Web Speech API captures user voice
- Transcribed text sent to Gemini for scene understanding
- Gemini generates structured scene modifications
- React state changes update the 3D scene in real time
- Eleven Labs synthesizes confirmation response
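
The "structured scene modifications" step can be sketched as follows, assuming Gemini is prompted to reply with a JSON object containing only the fields to change. The `SceneState` shape, field names, and the 0-10 light scale are hypothetical; only the four mood presets come from the feature list. Validating each field before it reaches React state means a malformed reply changes nothing instead of breaking the scene.

```typescript
type Mood = "clean" | "dark" | "warm" | "cool";

// Hypothetical scene state; fibostudio's actual state shape is not documented.
interface SceneState {
  background: string; // any CSS color, e.g. "#ffffff"
  mood: Mood;
  keyLight: number; // light intensities on a hypothetical 0-10 scale
  fillLight: number;
  rimLight: number;
}

// The model is asked to return only the fields it wants to change.
type SceneChange = Partial<SceneState>;

const MOODS: readonly string[] = ["clean", "dark", "warm", "cool"];
const LIGHTS = ["keyLight", "fillLight", "rimLight"] as const;

// Validate the model's raw reply field by field; anything unexpected is ignored.
function parseSceneChange(raw: string): SceneChange {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return {}; // not JSON: change nothing rather than corrupt the scene
  }
  if (typeof data !== "object" || data === null) return {};
  const obj = data as Record<string, unknown>;
  const change: SceneChange = {};
  const { background, mood } = obj;
  if (typeof background === "string") change.background = background;
  if (typeof mood === "string" && MOODS.includes(mood)) change.mood = mood as Mood;
  for (const light of LIGHTS) {
    const value = obj[light];
    if (typeof value === "number" && Number.isFinite(value)) {
      change[light] = Math.min(10, Math.max(0, value)); // clamp to the 0-10 scale
    }
  }
  return change;
}

// Reducer-style update: returns a new object, so React re-renders the scene.
function applySceneChange(state: SceneState, change: SceneChange): SceneState {
  return { ...state, ...change };
}
```

For "Change background to white", the model would be expected to answer with something like `{"background": "#ffffff"}`, which merges into the current scene and leaves every other setting untouched.
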
Image Generation Pipeline
- Frontend detects exact camera view from 3D preview
- Builds precise prompt describing view, style, and specifications
- Backend proxies request to FAL.ai with strict constraints
- Generated image returned and displayed in production gallery
Prompt Engineering
- Single-paragraph prompts for clarity
- Exact view descriptions (e.g., "front view at eye level")
- Style-specific instructions (plain vs professional)
- No JSON parameters - pure natural language
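
A minimal sketch of the single-paragraph prompt style described above. The `PromptInput` fields and the exact wording for the plain and professional modes are assumptions that paraphrase the feature list, not the project's actual prompt text. Negative constraints are written as plain sentences inside the paragraph instead of as separate parameters.

```typescript
type Style = "plain" | "professional";

interface PromptInput {
  product: string; // e.g. "a ceramic mug"
  view: string; // e.g. "front view at eye level", from camera-angle detection
  framing: string; // e.g. "close-up"
  style: Style;
}

// Wording paraphrases the two generation modes; the exact phrasing is an assumption.
const STYLE_TEXT: Record<Style, string> = {
  plain: "Photographed on a plain white background with soft, even lighting and minimal styling.",
  professional:
    "Photographed with dramatic studio lighting and a polished, product-launch look suitable for YouTube.",
};

// Builds one natural-language paragraph: no JSON, no lists, no line breaks.
function buildPrompt({ product, view, framing, style }: PromptInput): string {
  return [
    `A photorealistic ${framing} of ${product}. Camera view: ${view}.`,
    STYLE_TEXT[style],
    "Keep the camera angle, product position, rotation, and scale exactly as described.",
    // Negative constraints stated in plain words rather than as a separate parameter.
    "Do not add extra objects, text, or logos, and do not change the viewpoint.",
  ].join(" ");
}
```
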

Development Process

Phase 1: Built 3D scene editor with Three.js and React Three Fiber
Phase 2: Integrated Gemini for natural language scene modifications
Phase 3: Added voice input with Web Speech API
Phase 4: Implemented Eleven Labs for voice feedback
Phase 5: Integrated FAL.ai for image generation
Phase 6: Refined prompt engineering for exact view matching
Phase 7: Added project management and user authentication

Challenges we ran into

1. Exact View Matching

Challenge: AI was generating images from different camera angles than the 3D preview showed.

Solution:

Implemented precise angle detection (horizontal and vertical degrees)
Created detailed spatial descriptions in prompts
Added distance-based framing detection
Built view type classification system (top-down, high-angle, eye-level, low-angle)
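
The view type classification can be sketched as a pair of threshold functions. This assumes the horizontal angle is in degrees with 0 at the product's front and the vertical angle is the camera's elevation in degrees; the cut-off values are hypothetical. Their output is the kind of spatial phrase that goes into the prompt.

```typescript
type ViewType = "top-down" | "high-angle" | "eye-level" | "low-angle";
type Facing = "front" | "side" | "back";

// Hypothetical elevation thresholds, in degrees.
function classifyViewType(verticalDeg: number): ViewType {
  if (verticalDeg >= 70) return "top-down";
  if (verticalDeg >= 20) return "high-angle";
  if (verticalDeg > -20) return "eye-level";
  return "low-angle";
}

function classifyFacing(horizontalDeg: number): Facing {
  const h = ((horizontalDeg % 360) + 360) % 360; // normalize to [0, 360)
  if (h < 45 || h >= 315) return "front";
  if (h >= 135 && h < 225) return "back";
  return "side";
}

// Combines both angles into a phrase such as "high-angle side view".
function describeView(horizontalDeg: number, verticalDeg: number): string {
  const viewType = classifyViewType(verticalDeg);
  if (viewType === "top-down") return "top-down view looking straight down at the product";
  return `${viewType} ${classifyFacing(horizontalDeg)} view`;
}
```

A top-down shot ignores the horizontal angle, since rotating a camera that looks straight down only spins the image.
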

2. CORS and API Integration

Challenge: Direct frontend-to-external-API calls were blocked by CORS policies.

Solution:

Built backend proxy for all external API calls
Configured CORS middleware for frontend origin
Centralized API key management on backend
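
Restricting CORS to the frontend origin amounts to echoing back only allowlisted origins. The sketch below is a framework-free version of that check; the allowlisted URL is a hypothetical local dev address. In Express, the `cors` middleware package's `origin` option does the same job. External APIs are then called from the server, so their keys never reach the browser.

```typescript
// Hypothetical allowlist; in practice the frontend origin would come from configuration.
const ALLOWED_ORIGINS: ReadonlySet<string> = new Set(["http://localhost:5173"]);

// Returns CORS response headers for a request from `origin`, or none when the
// origin is not allowed (the browser then refuses to expose the response).
function corsHeaders(
  origin: string | undefined,
  allowed: ReadonlySet<string> = ALLOWED_ORIGINS,
): Record<string, string> {
  if (!origin || !allowed.has(origin)) return {};
  return {
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    // Authorization is needed so the frontend can send its JWT.
    "Access-Control-Allow-Headers": "Content-Type, Authorization",
    Vary: "Origin", // responses differ per origin, so caches must key on it
  };
}
```
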

3. Voice Recognition Reliability

Challenge: Web Speech API was inconsistent and sometimes failed silently.

Solution:

Implemented proper error handling and user feedback
Added silence detection (2-second timeout)
Auto-submit on speech completion
Clear visual indicators for recording state
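
The 2-second silence detection with auto-submit can be sketched as a small polling class. Timestamps are passed in explicitly, which keeps the timing logic testable without real timers; the class and method names are hypothetical.

```typescript
// Auto-submits the latest transcript once no new speech has arrived for `timeoutMs`.
class SilenceDetector {
  private transcript = "";
  private lastSpeechAt: number | null = null;
  private readonly onSubmit: (text: string) => void;
  private readonly timeoutMs: number;

  constructor(onSubmit: (text: string) => void, timeoutMs = 2000) {
    this.onSubmit = onSubmit;
    this.timeoutMs = timeoutMs;
  }

  // Call whenever the recognizer reports a result, with the full transcript so far.
  onSpeech(transcript: string, now: number): void {
    this.transcript = transcript;
    this.lastSpeechAt = now;
  }

  // Call periodically; returns true if this tick submitted the transcript.
  tick(now: number): boolean {
    if (this.lastSpeechAt === null || now - this.lastSpeechAt < this.timeoutMs) return false;
    const text = this.transcript.trim();
    this.transcript = "";
    this.lastSpeechAt = null;
    if (text.length === 0) return false; // silence with nothing said: don't submit
    this.onSubmit(text);
    return true;
  }
}
```

In the browser, `onSpeech` would be called from a `SpeechRecognition` `onresult` handler with `Date.now()`, and `tick(Date.now())` from a `setInterval` running every few hundred milliseconds.
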

4. Prompt Complexity

Challenge: Complex JSON-based prompts confused the AI and resulted in wrong images.

Solution:

Simplified to single-paragraph natural language prompts
Removed all JSON parameters from prompts
Focused on clear, direct instructions
Added style-specific prompt variations

5. API Key Management

Challenge: Managing multiple API keys (Gemini, FAL.ai, Eleven Labs) across environments.

Solution:

Centralized environment variable configuration
Separate .env files for frontend and backend
Clear documentation for setup
Fallback mechanisms for API failures
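
Centralized key management works best when the backend validates all required variables once at startup and reports every missing one together. A minimal sketch, with hypothetical variable names since the project's actual `.env` keys are not listed here:

```typescript
// Hypothetical variable names; the project's actual .env keys are not documented.
const REQUIRED_KEYS = ["GEMINI_API_KEY", "FAL_KEY", "ELEVENLABS_API_KEY", "MONGODB_URI", "JWT_SECRET"] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];
type BackendConfig = Record<RequiredKey, string>;

// Fails fast at startup, listing every missing key instead of
// erroring later on the first request that needs one.
function loadConfig(env: Record<string, string | undefined>): BackendConfig {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const config = {} as BackendConfig;
  for (const key of REQUIRED_KEYS) config[key] = env[key]!.trim();
  return config;
}
```

On the backend this would be called as `loadConfig(process.env)`. Keeping the keys in the backend's `.env` file rather than the frontend's means they never end up in the browser bundle.
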

6. Real-time 3D Rendering Performance

Challenge: Keeping the 3D preview smooth despite complex lighting calculations.

Solution:

Optimized the Three.js scene with level of detail (LOD)
Efficient lighting setup with key, fill, and rim lights
Responsive camera controls with OrbitControls
Proper shadow mapping configuration

7. Exact Object Positioning

Challenge: Objects set up in a side view were still generated by the AI as front views.

Solution:

Implemented precise camera angle detection from 3D scene
Added degree-based angle calculations
Created view type classification system
Included exact positioning in prompts

Accomplishments that we're proud of

1. Voice-Controlled Studio Director

Successfully integrated Web Speech API, Gemini, and Eleven Labs to create a seamless voice-controlled interface. Users can speak their vision and watch the studio update in real time.

2. Exact View Matching

Solved the critical problem of getting the AI to generate images that match the exact camera angle of the 3D preview. This required precise angle detection and careful prompt engineering.

3. Full-Stack Integration

Built a complete end-to-end system connecting:

3D scene editor (Three.js)
Natural language processing (Gemini)
Voice I/O (Web Speech API + Eleven Labs)
Image generation (FAL.ai)
User management (MongoDB + JWT)

4. Intuitive User Experience

Created an interface so intuitive that users can generate professional product photos without any photography knowledge.

5. Plain vs Professional Modes

Implemented two distinct generation modes that produce completely different results based on user intent - from minimal white-background shots to YouTube-ready professional photography.

6. Real-time Feedback Loop

Built a system where users see their 3D scene, hear voice confirmation, and get generated images - all in a seamless workflow.

7. Scalable Architecture

Designed a backend that can handle multiple concurrent image generation requests with proper error handling and fallbacks.

What we learned

1. Prompt Engineering is Critical

Simple, direct prompts work better than complex JSON structures
Natural language is more effective than structured data in image-generation prompts
Single-paragraph prompts are clearer than multi-part instructions

2. Camera Angle Detection Requires Precision

Exact degree calculations matter for view matching
Combining horizontal and vertical angles yields distinct view types
Distance-based framing significantly impacts composition

3. Voice Interfaces Need Careful UX

Users need clear visual feedback during recording
Silence detection is essential for auto-submission
Error messages should be helpful, not technical

4. API Integration Complexity

CORS issues require backend proxying
Multiple API keys need centralized management
Fallback mechanisms are essential for reliability

5. 3D Rendering Performance Matters

Proper scene optimization is crucial for smooth interaction
Lighting calculations can be expensive
Real-time updates require efficient state management

6. User Authentication is Non-Negotiable

JWTs provide secure, stateless authentication
Demo mode is valuable for user onboarding
Project persistence requires robust database design

7. AI Creativity vs Precision

Sometimes you want AI to follow instructions exactly, not be creative
Negative prompts are as important as positive ones
Style consistency requires explicit specification

What's next for fibostudio

Short Term (Next 1-2 months)

Enhanced Voice Commands
- Support for more complex scene modifications
- Batch operations ("Generate 5 variations")
- Undo/redo for voice commands
Improved Image Generation
- Support for multiple objects in single scene
- Custom background images
- Texture and material customization
Performance Optimization
- Faster image generation with caching
- Optimized 3D scene rendering
- Progressive image loading

Medium Term (3-6 months)

Advanced Features
- 360-degree product views
- Animation support (rotating products)
- A/B testing for different product shots
- Batch generation with variations
Collaboration Tools
- Team projects and sharing
- Comments and feedback on generated images
- Version history and rollback
Integration Ecosystem
- Shopify integration for direct product uploads
- WooCommerce plugin
- API for third-party developers
- Zapier integration for automation

Long Term (6-12 months)

Enterprise Features
- White-label solution for agencies
- Custom branding and watermarks
- Advanced analytics and reporting
- Bulk processing for large catalogs
AI Enhancements
- Fine-tuned models for specific product categories
- Automatic lighting optimization
- Smart background suggestions
- Consistency across product lines
Monetization
- Freemium model with usage limits
- Pro tier with unlimited generations
- Enterprise licensing
- API access for developers
Mobile Experience
- Native mobile apps (iOS/Android)
- Mobile-optimized 3D editor
- Voice commands on mobile
- Offline support

Vision

fibostudio aims to become the standard tool for product photography in e-commerce. We envision a future where:

Small businesses can compete with large enterprises on product image quality
Content creators generate consistent, professional product shots in minutes
E-commerce platforms offer built-in product photography tools
Photographers focus on creative direction rather than technical execution
Product launches happen faster with AI-generated marketing materials

The ultimate goal is to democratize professional product photography and make it accessible to everyone.