Inspiration
Construction sites are chaotic, dangerous, and constantly evolving. While AI has revolutionized text and image processing, it still struggles with spatial reality. Project managers currently rely on time-consuming manual walkthroughs and periodic reports that leave critical blind spots—blind spots that can mean the difference between safety and disaster.
We asked ourselves: What happens when AI finally learns to see in 3D?
What it does
Splatt transforms first-person POV videos from construction sites into comprehensive 3D spatial models. Upload multiple video perspectives, and our platform:
- Reconstructs the entire construction site in 3D using 4D Gaussian Splatting technology
- Maps object locations globally by combining camera positioning with AI-detected objects
- Enables intelligent queries through RAG-powered semantic search
- Tracks site evolution across different timestamps and angles
- Identifies safety risks by cross-referencing detected objects with construction safety databases
All through an intuitive web interface—no need to attach cameras to hundreds of workers' heads.
How we built it
Our architecture combines cutting-edge spatial AI with semantic understanding:
Spatial Processing:
- 4D Gaussian Splatting generates 3D scenes at specific timestamps using neural network optimization
- Custom lightweight sequential processor handles multiple messy, real-world video inputs
- Automated filtering removes motion blur and obstructed frames
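One common heuristic for the blur-filtering step above is the variance of the Laplacian: sharp frames have strong edges and therefore a high-variance Laplacian response, while motion-blurred frames score low. The sketch below is illustrative, not the project's actual filter; it assumes frames arrive as grayscale numpy arrays, and the threshold value is a placeholder that would need tuning per camera.

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Sharpness score: variance of a 4-neighbour discrete Laplacian.

    Low values indicate motion blur or a featureless (obstructed) frame.
    """
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def filter_frames(frames, threshold=100.0):
    """Keep only frames whose sharpness score exceeds the threshold.

    `threshold` is a hypothetical default; real footage needs calibration.
    """
    return [f for f in frames if laplacian_variance(f) > threshold]
```

In practice the same idea is usually applied via `cv2.Laplacian(gray, cv2.CV_64F).var()`; the pure-numpy version here just keeps the sketch dependency-free.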
Intelligence Layer:
- Gemini extracts object data and their camera-relative positions
- 3072-dimensional embeddings, matching the native output size of Gemini's embedding model, enable precise semantic search

- LangChain orchestrates secure data flow between components
- RAG integration identifies leading safety risks in real time
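At its core, the semantic-search step in this layer reduces to nearest-neighbour retrieval over embedding vectors. The toy sketch below shows the idea with cosine similarity in pure numpy; the production system uses 3072-dimensional vectors and (per the Data Infrastructure section) delegates retrieval to the vector database rather than scanning in Python. The 2-dimensional vectors here are purely illustrative.

```python
import numpy as np

def cosine_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k corpus embeddings most similar to the query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity, one score per row
    return np.argsort(scores)[::-1][:k]  # highest similarity first
```

A RAG query then stitches this together: embed the user's question, retrieve the top-k object/frame records, and pass them to the LLM as context for the answer.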
Data Infrastructure:
- Supabase stores embeddings, object locations, and video metadata
- Vector database enables semantic querying of spatial data
- Combined camera poses + object data create comprehensive site maps
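The pose + object combination above boils down to a rigid transform: a detection's camera-relative position is rotated and translated by the camera's estimated pose to get a global site coordinate. A minimal sketch, assuming the pose is available as a camera-to-world rotation matrix and a camera position in world coordinates (function and variable names here are hypothetical, not the project's API):

```python
import numpy as np

def camera_to_world(p_cam: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map a camera-relative point into global site coordinates.

    R: 3x3 camera-to-world rotation from the pose estimate.
    t: camera position in world coordinates.
    """
    return R @ p_cam + t
```

Applying this to every Gemini-detected object across all frames, and storing the results alongside the embeddings, is what turns per-frame detections into a single queryable site map.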
The modular design allows easy integration of emerging research like EgoGaussian for improved egocentric video processing.
Challenges we ran into
- Handling messy, real-world construction footage with motion blur and obstructions
- Synchronizing multiple video perspectives into a coherent 3D model
- Calculating global object positions from camera-relative data
- Building a RAG system that could meaningfully query spatial information
- Creating seamless data flow between Gaussian Splatting, Gemini, and vector storage
Accomplishments that we're proud of
- Successfully integrated 4D Gaussian Splatting with LLM-based object recognition
- Built an end-to-end pipeline that transforms raw video into queryable 3D intelligence
- Created a modular architecture that showcases how existing AI infrastructure can achieve spatial reasoning
- Demonstrated practical application for construction safety and project management
- Made spatial analysis accessible through a web interface
What we learned
- Spatial AI requires creative integration—no single model solves everything
- Combining camera pose data with semantic understanding unlocks powerful capabilities
- Vector embeddings can bridge 3D reconstruction and natural language queries
- Real-world construction footage demands robust preprocessing
- Modular architecture enables rapid experimentation with emerging research
What's next for Splatt
- Integration with EgoGaussian for improved first-person video processing
- Real-time processing for live site monitoring
- Predictive analytics for workflow optimization
- Automated compliance reporting for safety violations
- Mobile app for on-site access
- Multi-site comparison features for project managers overseeing multiple locations
Built With
4d-gaussian-splatting gemini langchain supabase vector-database rag computer-vision spatial-ai next.js python