Inspiration
Construction sites are chaotic, dangerous, and constantly evolving. While AI has revolutionized text and image processing, it still struggles with spatial reality. Project managers currently rely on time-consuming manual walkthroughs and periodic reports that leave critical blind spots—blind spots that can mean the difference between safety and disaster.
We asked ourselves: What happens when AI finally learns to see in 3D?
What it does
Splatt transforms first-person POV videos from construction sites into comprehensive 3D spatial models. Upload multiple video perspectives, and our platform:
- Reconstructs the entire construction site in 3D using 4D Gaussian Splatting technology
- Maps object locations globally by combining camera positioning with AI-detected objects
- Enables intelligent queries through RAG-powered semantic search
- Tracks site evolution across different timestamps and angles
- Identifies safety risks by cross-referencing detected objects with construction safety databases
All through an intuitive web interface—no need to attach cameras to hundreds of workers' heads.
How we built it
Our architecture combines cutting-edge spatial AI with semantic understanding:
Spatial Processing:
- 4D Gaussian Splatting generates 3D scenes at specific timestamps using neural network optimization
- Custom lightweight sequential processor handles multiple messy, real-world video inputs
- Automated filtering removes motion blur and obstructed frames
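One common heuristic for the blur-filtering step above is the variance of the Laplacian: sharp frames have strong edges and therefore a high-variance Laplacian response, while motion-blurred frames score low. The sketch below is illustrative, not the project's actual filter; it assumes frames arrive as grayscale numpy arrays, and the threshold value is a placeholder that would need tuning per camera.

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Sharpness score: variance of a 4-neighbour discrete Laplacian.

    Low values indicate motion blur or a featureless (obstructed) frame.
    """
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def filter_frames(frames, threshold=100.0):
    """Keep only frames whose sharpness score exceeds the threshold.

    `threshold` is a hypothetical default; real footage needs calibration.
    """
    return [f for f in frames if laplacian_variance(f) > threshold]
```

In practice the same idea is usually applied via `cv2.Laplacian(gray, cv2.CV_64F).var()`; the pure-numpy version here just keeps the sketch dependency-free.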
Intelligence Layer:
- Gemini extracts object data and their camera-relative positions
- 3072-dimensional embeddings, matching the native output size of Gemini's embedding model, enable precise semantic search

- LangChain orchestrates secure data flow between components
- RAG integration identifies leading safety risks in real time
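At its core, the semantic-search step in this layer reduces to nearest-neighbour retrieval over embedding vectors. The toy sketch below shows the idea with cosine similarity in pure numpy; the production system uses 3072-dimensional vectors and (per the Data Infrastructure section) delegates retrieval to the vector database rather than scanning in Python. The 2-dimensional vectors here are purely illustrative.

```python
import numpy as np

def cosine_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k corpus embeddings most similar to the query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity, one score per row
    return np.argsort(scores)[::-1][:k]  # highest similarity first
```

A RAG query then stitches this together: embed the user's question, retrieve the top-k object/frame records, and pass them to the LLM as context for the answer.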
Data Infrastructure:
- Supabase stores embeddings, object locations, and video metadata
- Vector database enables semantic querying of spatial data
- Combined camera poses + object data create comprehensive site maps
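The pose + object combination above boils down to a rigid transform: a detection's camera-relative position is rotated and translated by the camera's estimated pose to get a global site coordinate. A minimal sketch, assuming the pose is available as a camera-to-world rotation matrix and a camera position in world coordinates (function and variable names here are hypothetical, not the project's API):

```python
import numpy as np

def camera_to_world(p_cam: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map a camera-relative point into global site coordinates.

    R: 3x3 camera-to-world rotation from the pose estimate.
    t: camera position in world coordinates.
    """
    return R @ p_cam + t
```

Applying this to every Gemini-detected object across all frames, and storing the results alongside the embeddings, is what turns per-frame detections into a single queryable site map.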
The modular design allows easy integration of emerging research like EgoGaussian for improved egocentric video processing.
Challenges we ran into
- Handling messy, real-world construction footage with motion blur and obstructions
- Synchronizing multiple video perspectives into a coherent 3D model
- Calculating global object positions from camera-relative data
- Building a RAG system that could meaningfully query spatial information
- Creating seamless data flow between Gaussian Splatting, Gemini, and vector storage
Accomplishments that we're proud of
- Successfully integrated 4D Gaussian Splatting with LLM-based object recognition
- Built an end-to-end pipeline that transforms raw video into queryable 3D intelligence
- Created a modular architecture that showcases how existing AI infrastructure can achieve spatial reasoning
- Demonstrated practical application for construction safety and project management
- Made spatial analysis accessible through a web interface
What we learned
- Spatial AI requires creative integration—no single model solves everything
- Combining camera pose data with semantic understanding unlocks powerful capabilities
- Vector embeddings can bridge 3D reconstruction and natural language queries
- Real-world construction footage demands robust preprocessing
- Modular architecture enables rapid experimentation with emerging research
What's next for Splatt
- Integration with EgoGaussian for improved first-person video processing
- Real-time processing for live site monitoring
- Predictive analytics for workflow optimization
- Automated compliance reporting for safety violations
- Mobile app for on-site access
- Multi-site comparison features for project managers overseeing multiple locations
Built With
4d-gaussian-splatting gemini langchain supabase vector-database rag computer-vision spatial-ai next.js python