Deep-Daze

Inspiration 🌟

Inspired by our university's AI club.

What it does 🌟

The deep-daze GUI provides a high-level interface to the deep-daze hybrid CLIP/SIREN model, which generates a novel image from a user-entered text prompt, guided by the image-text associations CLIP learned from its training data.

The second component, simply known as IntelliVision, uses the open-source ImageAI computer vision library to classify an entire image as well as detect individual objects in a user-supplied image.
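As a rough illustration of what a dashboard can do with object detection output, the sketch below counts detected objects per label above a confidence threshold. It assumes detections arrive as a list of dictionaries with `name` and `percentage_probability` keys, which is the shape ImageAI's object detector documents (check your installed version). The helper name `summarize_detections`, the threshold, and the sample values are made up for illustration and are not taken from the IntelliVision code.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_detections(
    detections: List[Dict[str, Any]], min_probability: float = 50.0
) -> Dict[str, int]:
    """Count detections per label, keeping only those at or above min_probability."""
    kept = [
        d["name"]
        for d in detections
        if d["percentage_probability"] >= min_probability
    ]
    return dict(Counter(kept))


# Made-up sample in the assumed detector output shape (not real results).
SAMPLE_DETECTIONS = [
    {"name": "person", "percentage_probability": 98.2, "box_points": [10, 20, 110, 220]},
    {"name": "person", "percentage_probability": 61.0, "box_points": [150, 30, 240, 210]},
    {"name": "dog", "percentage_probability": 34.5, "box_points": [60, 180, 140, 260]},
]

if __name__ == "__main__":
    print(summarize_detections(SAMPLE_DETECTIONS))
```

A summary like this could feed a table or bar chart in the GUI, with the threshold exposed as a slider.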

How we built it 🌟

For the deep-daze GUI, we started from the code of the original deep-daze project by lucidrains and used CUDA to accelerate computation. We then built a skeleton of the GUI in Streamlit, a dashboard creation library. The GUI and our edited copy of the site-package were then developed in tandem until we found workable solutions to all the relevant problems (explained further in the challenges section).

The image prediction and object detection component also uses Streamlit. This portion uses more aesthetic features to emulate a professional web app.

Challenges we ran into 🌟

On the deep-daze GUI, the key challenge was reducing runtime enough to make the GUI usable. We altered the network architecture (specifically the depth and width of its layers) to improve speed while minimizing the loss of model expressiveness. We also needed to run the model in parallel with the Streamlit script and display training progress within the GUI. We tested several configurations, such as a programmatic command-line call of the model script and the creation of a separate set of subprocesses, before settling on our current solution, which uses a customized version of the deep-daze package.
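For readers curious about the subprocess approach mentioned above (one of the options tested, not the final solution), here is a minimal sketch using only the Python standard library. It launches a child process and yields its output line by line, so a GUI could refresh a status widget on every line. The function name `stream_progress` and the stand-in `FAKE_TRAINER` script are hypothetical; a real setup would launch the model script instead.

```python
import subprocess
import sys
from typing import Iterator, List


def stream_progress(cmd: List[str]) -> Iterator[str]:
    """Run cmd as a child process and yield its output lines as they arrive."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so errors are also visible
        text=True,
        bufsize=1,  # line-buffered on the reading side of the pipe
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        yield line.rstrip("\n")
    proc.stdout.close()
    returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(f"training process exited with code {returncode}")


# Hypothetical stand-in for a long-running training script: one line per step.
FAKE_TRAINER = "for step in range(3): print(f'step {step} done', flush=True)"

if __name__ == "__main__":
    for message in stream_progress([sys.executable, "-c", FAKE_TRAINER]):
        print(message)  # a GUI would update a status widget here instead
```

The child process has to flush its output after each step, otherwise the lines may arrive in one batch at the end rather than as training progresses.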

Accomplishments that we're proud of 🌟

Making our first data science dashboard app successfully!

What we learned 🌟

We learned how to use the Streamlit library and API, in addition to how to tune model parameters in one GUI and toggle between model architectures in the other. We improved our familiarity with PyTorch (with the CLIP/SIREN hybrid model) and with TensorFlow (ResNet, YOLO, etc.).
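To make the depth/width trade-off concrete, the sketch below counts the weights and biases of a plain fully connected network. The 2 inputs (pixel coordinates), 3 outputs (RGB), and the 16-layer, 256-wide baseline are assumptions chosen for illustration; the function name `mlp_param_count` is hypothetical and the numbers are not measurements from our customized deep-daze package.

```python
def mlp_param_count(
    in_features: int, hidden_size: int, num_layers: int, out_features: int
) -> int:
    """Count weights and biases of an MLP with num_layers hidden layers."""
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    sizes = [in_features] + [hidden_size] * num_layers + [out_features]
    # Each layer has fan_in * fan_out weights plus fan_out biases.
    return sum(
        fan_in * fan_out + fan_out for fan_in, fan_out in zip(sizes, sizes[1:])
    )


if __name__ == "__main__":
    baseline = mlp_param_count(2, 256, 16, 3)
    half_width = mlp_param_count(2, 128, 16, 3)
    half_depth = mlp_param_count(2, 256, 8, 3)
    print(baseline, half_width, half_depth)  # 988419 248451 462083
```

Under these assumptions, halving the width removes roughly three quarters of the parameters while halving the depth removes roughly half, which is one reason width is a strong lever for speed. Parameter count is only a proxy for runtime, so timing the actual model is still needed.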

What's next for IntelliVision and deep-daze-GUI 🌟

If we choose to continue this project, the next steps would be to add a pretrained upscaling network to one GUI to process the output of the CLIP/SIREN network, and to improve and increase the number of visualizations for object and landmark detection in the other GUI.

Updates

Caijun Qin started this project — Sep 19, 2021 03:00 AM EDT
