Inspiration
The ability to describe an object with text or an image and see it come to life in 3D is empowering and revolutionary. Inspired by the origin story of Pixar Animation Studios, Can dedicated countless hours to mastering Unity, Unreal Engine, and Cinema 4D during his college years. Pixar's journey, from its early days of bringing animated characters to life through computer graphics to becoming a powerhouse in animation, deeply resonated with him. By reducing 3D creation to a simple text description, we are democratizing creativity, opening up a world of possibilities for those who previously faced barriers to entry in digital creation.
What it does
Gen3D turns text descriptions and images into 3D models. By combining advanced generative technologies with a user-centric approach, we have created a platform that opens up new possibilities for digital creation and makes it accessible to a broader audience.
Key Features
- Text-to-3D Model Generation: Allows users to input text descriptions to generate 3D models, making the creation process more accessible to those without extensive 3D modeling skills.
- Image-to-3D Conversion: Offers the capability to convert images into 3D models, further simplifying the process of creating digital content from existing 2D assets.
- Support for Various Art Styles: Gen3D supports a wide range of art styles, from voxel and realistic to cartoon and anime, catering to diverse creative needs and preferences.
- User-Friendly Interface: Designed to be artist-friendly with a simple and intuitive interface, ensuring that users can easily navigate and utilize the platform without needing to be experts in modeling or prompting.
- Speed and Efficiency: Designed for rapid content creation, allowing users to generate 3D models and textures in minutes, a significant reduction over traditional modeling and texturing workflows.
How we built it
In the development of Gen3D, our team experimented with a number of 3D generative models for creating digital assets directly from text or images. Among the various models we explored, including Wonder3D, Shap-E, DreamGaussian, Zero123++, LGM, MVDream, and Stable Zero123, Shap-E emerged as the cornerstone of our project due to its exceptional inference speed and quality. Specifically, Shap-E's ability to quickly generate complex and diverse 3D assets made it the ideal choice for our demo, aligning perfectly with our goal of providing users with immediate results upon entering their prompts.
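For readers curious what driving Shap-E directly looks like, here is a minimal text-to-mesh sketch modeled on the example notebooks in the openai/shap-e repository (the model names like "text300M" and helpers like decode_latent_mesh come from that repo; this is an illustration, not our production code):

```python
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import decode_latent_mesh

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the text-conditional diffusion model and the latent-to-mesh decoder.
xm = load_model("transmitter", device=device)
model = load_model("text300M", device=device)
diffusion = diffusion_from_config(load_config("diffusion"))

# Sample a latent for the prompt (sampler settings follow the repo's examples).
latents = sample_latents(
    batch_size=1,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=["a voxel-style treasure chest"]),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# Decode the latent into a triangle mesh and export it as an .obj file.
with open("model.obj", "w") as f:
    decode_latent_mesh(xm, latents[0]).tri_mesh().write_obj(f)
```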
To support the computational demands of these sophisticated models, particularly the resource-intensive Stable Zero123, we leveraged Microsoft Azure Cloud. This platform enabled us to host and run extensive tests on other large models such as ProlificDreamer and Fantasia3D, assessing their feasibility for web integration. However, the high GPU requirements of Stable Zero123, exceeding 48 GB of memory, posed a significant challenge, leading us to prioritize models like Shap-E that offer a balance between performance and resource efficiency.
For the user interface of Gen3D, we chose Streamlit, a decision driven by its simplicity and effectiveness in creating interactive web applications. Streamlit allowed us to build a clean and intuitive platform where users can easily input text or images to generate 3D models. This choice ensured that our website remained user-friendly, encouraging creativity and experimentation among users without requiring them to have technical expertise in 3D modeling or programming.
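As an illustration, a minimal Streamlit front end for this flow might look like the sketch below. The generate_model helper is hypothetical and stubbed here; in our stack it would wrap the hosted model call described in the next paragraph:

```python
import streamlit as st

def generate_model(prompt: str) -> str:
    """Hypothetical backend hook: in Gen3D this would call the hosted
    Shap-E model (see the Replicate sketch below) and return a link to
    the generated .obj file."""
    return "https://example.com/model.obj"  # placeholder result

st.title("Gen3D")
prompt = st.text_input("Describe the 3D asset you want")
style = st.selectbox("Art style", ["realistic", "voxel", "cartoon", "anime"])

if st.button("Generate") and prompt:
    with st.spinner("Generating your model, this takes a few minutes..."):
        obj_url = generate_model(f"{prompt}, {style} style")
    st.success("Model ready!")
    st.markdown(f"[Download .obj]({obj_url})")
```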
The integration of the Replicate API for Shap-E was a critical step in achieving our vision. This API facilitated the generation of 3D objects with just 8 GB of GPU memory in under five minutes. Additionally, we dockerized the application to streamline deployment and ensure that Gen3D can be easily accessed and used by creators worldwide. Through these technological choices, we've crafted a platform that stands at the forefront of democratizing 3D content creation, making it faster, more accessible, and more enjoyable for everyone.
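For context, calling a hosted Shap-E model through the Replicate Python client looks roughly like this; the model reference and field names below are placeholders, so check the actual Shap-E listing on replicate.com for the exact values:

```python
import replicate  # pip install replicate; reads REPLICATE_API_TOKEN from the environment

def generate_model(prompt: str) -> str:
    """Generate a 3D asset from text via a hosted Shap-E model on Replicate."""
    output = replicate.run(
        "owner/shap-e:version-id",   # placeholder "owner/name:version" reference
        input={"prompt": prompt},    # the input schema depends on the listing
    )
    # Replicate typically returns a URL (or a list of URLs) pointing at the
    # generated file(s); normalize to a single string here.
    return str(output[0] if isinstance(output, (list, tuple)) else output)
```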
Future Viability
Introducing deformable and animated objects into Gen3D could revolutionize the way users create and interact with assets for the metaverse and augmented reality. Deformable objects allow for more dynamic and realistic representations that change shape or form in response to interactions or environmental conditions. This is particularly useful for assets that need to exhibit natural behaviors, such as clothing that moves realistically with a character or objects that deform on impact.
Machine learning can be used to analyze user interactions with the design interface, adapting the interface to better suit the user's design style and preferences. It can also suggest optimizations for 3D models to ensure they are not only visually appealing but also optimized for performance within virtual environments.
Challenges we ran into
One of the primary challenges we faced was slow iteration speed, due to the time it takes to submit a prompt, wait for processing, and obtain an .obj file.
Managing a complex 3D modeling project with a three-person team was also challenging.
Accomplishments that we're proud of
Text-to-3D Model Generation
One of our proudest achievements is the development of our Text-to-3D Model Generation feature. This groundbreaking technology allows users to input text descriptions and receive detailed 3D models in return. It's a testament to our commitment to making 3D modeling accessible to those without specialized skills. This feature has not only democratized the creative process but also opened up new avenues for storytelling and design.
Image-to-3D Conversion
Another significant accomplishment is our Image-to-3D Conversion capability. By providing the option to convert 2D images into 3D models, we've simplified the creation of digital content from existing assets. This innovation has been particularly beneficial for users looking to bring their 2D concepts and photographs to life in a three-dimensional space, further bridging the gap between imagination and reality.
Speed and Efficiency
Lastly, our focus on speed and efficiency has resulted in a platform that supports rapid content creation. Users can now generate 3D models and textures in a matter of minutes, which is a drastic reduction in the time traditionally required for modeling and texturing. This efficiency is not just about speed; it's about enabling a faster iterative design process, allowing creators to experiment and refine their visions with unprecedented agility.
What we learned
To support the computational demands of sophisticated models like Stable Zero123, we utilized Microsoft Azure Cloud. This experience deepened our understanding of cloud computing platforms and the challenges of managing resources, especially for models with high GPU requirements. Choosing Streamlit for the user interface and integrating the Replicate API for Shap-E generation gave us practical experience in web development and API integration.
What's next for Gen3D
This isn't the last you've heard from us!
Our exploration of advanced 3D generative models and cloud computing platforms, combined with our efforts to make 3D content creation more accessible, positions us well for future innovations in digital creation and the metaverse.

