Inspiration
This project started from the thesis project Diffusion Models for 3D Novel View Synthesis, and was inspired by recent breakthroughs in 3D Gaussian Splatting.
What it does
This MVP takes a text prompt or a single image as input and outputs a .ply file representing the 3D model, in under 10 seconds. The .ply file contains the parameters of the Gaussian splats and can be opened in many existing viewers. The current roadmap is scaling a foundation model for text/image/video to 3D scenes that leverages the fast, real-time, photorealistic rendering inherent to 3D Gaussian Splatting.
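In such a file, each Gaussian splat is stored as one PLY vertex whose per-vertex properties hold the splat parameters. A minimal sketch of writing such a file with only the Python standard library (the property names follow the common 3D Gaussian Splatting export convention and are an assumption about this project's exact output; higher-order spherical-harmonic fields are omitted for brevity):

```python
# Sketch: write a minimal ASCII .ply holding Gaussian-splat parameters.
# Property names follow the common 3D Gaussian Splatting convention:
# position, DC (base-color) SH coefficients, opacity, log-scales, quaternion.

FIELDS = (
    ["x", "y", "z"]                      # splat center
    + [f"f_dc_{i}" for i in range(3)]    # degree-0 spherical-harmonic color
    + ["opacity"]                        # opacity (pre-activation)
    + [f"scale_{i}" for i in range(3)]   # log of per-axis scale
    + [f"rot_{i}" for i in range(4)]     # rotation quaternion
)

def write_splats_ply(path, splats):
    """splats: list of dicts mapping each field name to a float."""
    header = ["ply", "format ascii 1.0", f"element vertex {len(splats)}"]
    header += [f"property float {name}" for name in FIELDS]
    header.append("end_header")
    with open(path, "w") as f:
        f.write("\n".join(header) + "\n")
        for s in splats:
            f.write(" ".join(f"{s[name]:.6f}" for name in FIELDS) + "\n")

# One splat at the origin with an identity rotation:
demo = [{name: 0.0 for name in FIELDS} | {"rot_0": 1.0}]
write_splats_ply("demo_splats.ply", demo)
```

Real exports typically use the binary PLY format and include the full set of SH coefficients, but the overall layout is the same.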
How we built it
We used Firebase for hosting, Python for the ML model, Google Cloud for running inference on the pre-trained model, and FastAPI for deploying the API.
Challenges we ran into
We encountered bugs along the way, especially on the backend, with hosting and FastAPI, as well as some communication issues between stages of the pipeline.
Accomplishments that we're proud of
We believe this is a solid first step towards showcasing current capabilities, and that it can be scaled to large, accurate scenes using our own ongoing research based on physically-grounded approaches.
What we learned
We learned more about full-stack development, which was necessary for deploying the web app.
What's next for iXiM
iXiM aims to develop the first foundation model for spatial computing that achieves any-modality-to-scene generation with fast, real-time, photorealistic rendering.