Inspiration

As manual labor shifts toward robotic automation, we saw an opportunity to give robots live spatial intelligence so they can manage entire factories, warehouses, supply chains, and retail stores.

Current solutions rely on LiDAR and raw camera footage, which struggle to recover full spatial geometry. They can’t produce fully labeled 3D maps, leaving robots error-prone and unable to scale safely across large spaces.

We wanted to create a solution that generates labeled 3D datasets from just minutes of video footage, marking objects, obstacles, and reference points in the environment. This structured labeling gives robots a complete understanding of the space, letting them navigate, plan, and act without trial-and-error.

With Strata’s 3D datasets, companies can now build robotic hardware that autonomously completes cognitively demanding tasks. We believe the future lies in fully autonomous warehouses, factories, and retail stores.

What it does

Strata is an intelligent spatial mapping platform that turns short video footage of any space into labeled 3D datasets, giving robots full spatial intelligence to navigate, plan, and perform tasks safely.

  1. Video clips are processed in real time with Gaussian Splatting to generate high-fidelity, navigable 3D maps of any environment.
  2. A pre-trained SAM3 model automatically labels objects and reference points in the 3D space, providing complete situational awareness for robots.
  3. The reconstructed and labeled environment is compiled into a digital twin that robots can immediately use to plan paths and perform complex tasks.
  4. Robots gain a complete understanding of the space before moving, minimizing errors and accelerating deployment in warehouses, factories, and retail spaces.

Additional features

  1. Works with just a few minutes of video
  2. Supports warehouses, factories, processing plants, and stores of any size
  3. Provides predictive planning for logistics
  4. Analyzes footage, e.g. counting objects in the scene

How we built it

Our tech stack includes Vite, Flask, WebSockets, Socket.IO, Three.js, OpenCV, TypeScript, Pi3x, SAM3, Gaussian Splatting, SVD, CUDA, Amazon EC2, OpenAI, Gemini, and Vercel.

Our pipeline combines modern tools to turn simple video footage into high-fidelity, labeled 3D environments that robots can use immediately:

  1. Short video clips are reconstructed into dense, high-resolution 3D maps that capture geometry, surfaces, and obstacles (Gaussian Splatting, SVD, Pi3x).
  2. Objects, obstacles, and reference points are labeled directly in 3D, giving robots structured spatial intelligence without manual labeling (SAM3).
  3. The reconstructed and labeled environment is compiled into a digital twin, ready for robot path planning, navigation, and task execution (Three.js, TypeScript, VITE Webserver, Flask).
  4. As the camera streams, the map is continuously updated with low latency (WebSockets, Socket.IO). All reconstruction and labeling runs in real time in the cloud (Amazon EC2, CUDA).
  5. Natural-language queries let users search for specific objects in the Gaussian splat and generate a detailed analysis of the surveyed area (OpenAI, Gemini).
  6. The frontend is served from a CDN for fast deployment (Vercel).
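The per-interval flow above can be sketched in a few lines. Everything here is an illustrative stub — `Submap`, `reconstruct`, `label`, and `update_twin` are hypothetical names standing in for the real Pi3x, SAM3, and Gaussian Splatting calls that run on the EC2 GPU instance:

```python
# Minimal sketch of the per-interval pipeline. All function bodies are
# illustrative stubs, not Strata's actual implementation.
from dataclasses import dataclass, field

@dataclass
class Submap:
    points: list                                 # reconstructed 3D points
    labels: dict = field(default_factory=dict)   # object name -> point indices

def reconstruct(frames):
    """Stand-in for Pi3x + Gaussian Splatting: frames -> 3D points."""
    return Submap(points=[(i, 0.0, 0.0) for i, _ in enumerate(frames)])

def label(submap):
    """Stand-in for SAM3: attach object labels to point indices."""
    submap.labels["box"] = list(range(len(submap.points)))
    return submap

def update_twin(twin, submap):
    """Append a labeled submap to the running digital twin."""
    twin.append(submap)
    return twin

twin = []
for interval in [["f0", "f1"], ["f2"]]:   # two recorded camera intervals
    twin = update_twin(twin, label(reconstruct(interval)))

print(len(twin))   # → 2 labeled submaps in the twin
```

Each recorded interval flows through the same three stages independently, which is what lets the cloud side parallelize the work.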

Challenges we ran into

  1. Setting up Amazon EC2 with GPUs and configuring all required libraries (CUDA, drivers, model dependencies) so Gaussian Splatting, Pi3x, and SAM3 could run reliably on the server
  2. Streaming low-latency video from the camera to the cloud while handling large amounts of data
  3. Making the Pi3x reconstruction pipeline fast enough for near real-time use by reducing memory usage and speeding up frame processing
  4. Keeping object labels consistent in 3D so SAM3 outputs are aligned correctly with the reconstructed environment
  5. Finding the right balance between map quality and processing speed so the system produces detailed 3D maps without long wait times
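For challenge 2, one common tactic for keeping a camera-to-cloud stream low-latency is to never queue stale frames: a buffer of capacity one always holds only the newest frame, so a slow uploader skips ahead instead of falling behind. This is a generic sketch of that idea, not Strata's actual streaming code:

```python
# Latest-frame buffer: a deque with maxlen=1 silently drops old frames,
# trading completeness for latency. Illustrative sketch only.
from collections import deque

class LatestFrameBuffer:
    def __init__(self):
        self._buf = deque(maxlen=1)   # old frames are dropped on append

    def push(self, frame):
        self._buf.append(frame)

    def pop(self):
        return self._buf.popleft() if self._buf else None

buf = LatestFrameBuffer()
for i in range(10):        # camera produces frames faster than we upload
    buf.push(f"frame-{i}")

print(buf.pop())           # only the newest frame survives: frame-9
print(buf.pop())           # buffer is now empty: None
```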

Accomplishments that we're proud of

  1. Running the Pi3x model to generate the Gaussian splat in real time turned out to be a much harder problem than expected. Because the map must be generated live, we developed our own submapping algorithm that records the environment at fixed intervals (e.g., every 4 or 5 seconds). Each recorded interval is processed independently on the EC2 instance. The hard part is stitching the independent submaps together: we analyze the entropy of each submap and use an SVD-based algorithm to determine which part of one submap should be appended to which part of another. A final error-correction pass ensures all submaps are joined correctly.

  2. Rendering point clouds in the browser can be computationally very taxing, as some large-scale submaps contain around 30 to 40 million points. To cope, we compress the point-cloud arrays with NumPy on the server, reducing the load on the browser.

  3. While we could’ve used a YOLO model to identify and label objects in the scene, we took this a step further and developed a SAM3 identification system. When a user queries to find “boxes” in the 3D map, CLIP embeds the camera stream and the user query (CLIP embeddings map text and images into the same space), shrinking the set of camera frames to search through. SAM3 then segments the object we’re looking for in those frames. Once the frames are segmented and the objects identified, we map those pixels back to the point cloud and pinpoint the object in 3D space with very high accuracy.
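The SVD step in the submap stitching of accomplishment 1 can be illustrated with the classic Kabsch/Procrustes rigid alignment, which recovers the rotation and translation that best maps one set of corresponding points onto another. This is a textbook sketch under that assumption — the entropy analysis and error-correction passes are omitted:

```python
import numpy as np

def rigid_align(src, dst):
    """Best-fit rotation R and translation t with R @ src_i + t ~ dst_i
    (Kabsch algorithm via SVD). src, dst: (N, 3) corresponding points."""
    src_c, dst_c = src.mean(0), dst.mean(0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Toy check: rotate and shift a point set, then recover the transform,
# as if aligning the overlap region of two submaps.
rng = np.random.default_rng(0)
pts = rng.standard_normal((50, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
moved = pts @ R_true.T + np.array([1.0, -2.0, 0.5])

R, t = rigid_align(pts, moved)
err = np.abs(moved - (pts @ R.T + t)).max()
print(err < 1e-9)   # → True: the recovered transform aligns the points
```

In the noise-free case the recovered transform is exact; with real overlapping submaps the same SVD machinery yields a least-squares fit.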
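The compression in accomplishment 2 might look something like the following sketch: downcast to float32 and serialize with NumPy's zlib-backed compressed format before streaming to the browser. The exact scheme Strata uses may differ:

```python
import io
import numpy as np

# Illustrative sketch: shrink a point cloud before shipping it to the
# browser. Assumes savez_compressed; the actual pipeline may vary.
points = np.random.default_rng(1).standard_normal((100_000, 3))  # float64

raw = points.nbytes                       # 100k * 3 * 8 bytes = 2.4 MB
buf = io.BytesIO()
np.savez_compressed(buf, points=points.astype(np.float32))
compressed = buf.getbuffer().nbytes

print(compressed < raw)                   # payload is smaller than raw

# Round trip: the browser-side decoder reads the same array back.
buf.seek(0)
restored = np.load(buf)["points"]
print(np.allclose(points, restored, atol=1e-4))
```

The float64-to-float32 downcast alone halves the payload; zlib then removes whatever redundancy the data has, which for structured indoor scans is usually far more than for the random points used here.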
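The CLIP pre-filtering in accomplishment 3 reduces the frames SAM3 must segment by ranking them with cosine similarity in a shared text-image embedding space. The sketch below uses tiny synthetic vectors as stand-ins for real CLIP embeddings:

```python
import numpy as np

def top_k_frames(frame_embs, query_emb, k=2):
    """Rank frames by cosine similarity to the text-query embedding,
    shrinking the set of frames the segmentation model has to process."""
    f = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    sims = f @ q
    return np.argsort(sims)[::-1][:k]

# Synthetic stand-ins for CLIP embeddings (real CLIP maps text and
# images into the same space; these vectors just mimic that property).
query = np.array([1.0, 0.0, 0.0])   # embedding of the query "boxes"
frames = np.array([
    [0.9, 0.1, 0.0],    # frame 0: clearly shows boxes
    [0.0, 1.0, 0.0],    # frame 1: unrelated
    [0.7, 0.0, 0.3],    # frame 2: partially shows boxes
])

best = top_k_frames(frames, query, k=2)
print(best.tolist())    # → [0, 2]: only these frames go to segmentation
```

Only the surviving frames are segmented, after which the matched pixels are projected back into the point cloud.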

What we learned

We became a lot more experienced at optimizing our algorithms for GPUs and running models on EC2. We also performed time-complexity and amortized analysis on our algorithms to improve point cloud generation speed.

What's next for Strata

Future Features:

  1. Enable multiple robots to operate in the same digital twin simultaneously, optimizing paths, avoiding collisions, and sharing spatial intelligence in real-time
  2. Use LLMs to prompt the robots to autonomously perform their regular tasks, removing the need for constant human oversight after initial scanning
  3. Use AI to simulate and optimize complex tasks like heavy lifting, inventory sorting, or assembly workflows before execution
