Inspiration
We all deal with nostalgia. Sometimes we miss our loved ones or places we visited and look back at our pictures. But what if we could revolutionize the way memories are shown? What if we said you can relive your memories and mean it literally?
What it does
retro.act takes a user prompt such as "I want uplifting 80s music" and uses sentiment analysis and Cohere's chat feature to suggest candidate songs, from which the user picks one. The user then chooses from famous dance videos (for example, ones by Michael Jackson). Finally, the user either chooses an image from their past or lets our model match images to the mood of the music, and retro.act implants the dance moves and music into the image(s).
How we built it
We used Cohere Classify for sentiment analysis and to filter out songs whose mood doesn't match the user's current state. We then use Cohere's chat and RAG over the database of filtered songs to identify songs that fit the user prompt. We match images to music by first generating a caption for each image with the Azure Computer Vision API, then running a semantic search using KNN and Cohere embeddings, and finally using Cohere Rerank to refine the final choices. Lastly, we make the image come to life by generating a skeleton of the dance moves using OpenCV and MediaPipe and then using a pretrained model to transfer the skeleton to the image.
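To illustrate the KNN semantic search step, here is a minimal sketch in plain Python. It is not our actual pipeline code: the file names and the 3-dimensional vectors are made up for illustration and stand in for real Cohere embeddings of image captions and of the song's mood, and the function names `cosine_similarity` and `knn_search` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_search(query_vec, items, k=3):
    """Return the k (name, score) pairs most similar to query_vec."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors standing in for caption embeddings (made-up values).
caption_embeddings = {
    "beach_sunset.jpg": [0.9, 0.1, 0.0],
    "birthday_party.jpg": [0.2, 0.9, 0.1],
    "rainy_street.jpg": [0.0, 0.2, 0.9],
}
# Toy vector standing in for the embedding of the song's mood.
song_mood_embedding = [0.1, 0.8, 0.2]

top_matches = knn_search(song_mood_embedding, caption_embeddings, k=2)
for name, score in top_matches:
    print(f"{name}: {score:.3f}")
```

In a setup like ours, the top-k candidates from a search of this kind would then be passed to a reranking step to pick the final images.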
Challenges we ran into
This was the most technical project any of us had ever done, and we had to overcome steep learning curves. Many of us were not familiar with some of Cohere's features, such as Rerank, RAG, and embeddings. In addition, generating the skeleton turned out to be very difficult. Beyond simply generating a skeleton from the standard MediaPipe landmarks, we realized we had to customize which landmarks we connect to make the skeleton a suitable input for the pretrained model. Lastly, understanding and being able to use the model was a huge challenge. We had to deal with issues such as dependency errors, the lack of a GPU, broken import statements, and deprecated packages.
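The idea of customizing which landmarks get connected can be sketched as follows. This is an illustrative example, not the exact connection list we used: the index constants follow MediaPipe's 33-point pose model, where landmark coordinates are normalized to the range 0 to 1, but `CUSTOM_CONNECTIONS` and `skeleton_segments` are hypothetical names, and the format a given pretrained model expects may differ.

```python
# Landmark indices from MediaPipe's 33-point pose model.
LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12
LEFT_ELBOW, RIGHT_ELBOW = 13, 14
LEFT_WRIST, RIGHT_WRIST = 15, 16
LEFT_HIP, RIGHT_HIP = 23, 24
LEFT_KNEE, RIGHT_KNEE = 25, 26
LEFT_ANKLE, RIGHT_ANKLE = 27, 28

# Hypothetical custom connection list: only the limbs and torso,
# skipping face, hand, and foot detail.
CUSTOM_CONNECTIONS = [
    (LEFT_SHOULDER, RIGHT_SHOULDER),
    (LEFT_SHOULDER, LEFT_ELBOW), (LEFT_ELBOW, LEFT_WRIST),
    (RIGHT_SHOULDER, RIGHT_ELBOW), (RIGHT_ELBOW, RIGHT_WRIST),
    (LEFT_SHOULDER, LEFT_HIP), (RIGHT_SHOULDER, RIGHT_HIP),
    (LEFT_HIP, RIGHT_HIP),
    (LEFT_HIP, LEFT_KNEE), (LEFT_KNEE, LEFT_ANKLE),
    (RIGHT_HIP, RIGHT_KNEE), (RIGHT_KNEE, RIGHT_ANKLE),
]

def skeleton_segments(landmarks, width, height, connections=CUSTOM_CONNECTIONS):
    """Convert normalized landmarks into pixel-space line segments.

    landmarks maps a landmark index to a normalized (x, y) pair.
    Connections with a missing endpoint are skipped.
    """
    segments = []
    for start, end in connections:
        if start not in landmarks or end not in landmarks:
            continue
        x1, y1 = landmarks[start]
        x2, y2 = landmarks[end]
        segments.append((
            (int(round(x1 * width)), int(round(y1 * height))),
            (int(round(x2 * width)), int(round(y2 * height))),
        ))
    return segments

# Made-up landmarks for one frame: both shoulders and the left elbow.
frame_landmarks = {
    LEFT_SHOULDER: (0.6, 0.3),
    RIGHT_SHOULDER: (0.4, 0.3),
    LEFT_ELBOW: (0.7, 0.45),
}
segments = skeleton_segments(frame_landmarks, width=640, height=480)
print(segments)
```

Each resulting pair of points could then be drawn onto a blank frame, for example with OpenCV's `cv2.line`, to produce the skeleton image for that frame.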
Accomplishments that we're proud of
We are incredibly proud of getting such an ambitious project done. Getting a skeleton of the dance moves was already difficult, and manipulating the coordinates to fit our pretrained model's specifications was even more challenging. Lastly, we are proud of the amount of experimentation and determination it took to find a working model that could successfully take in a skeleton and output an "alive" image.
What we learned
We learned how to use MediaPipe and how to manipulate a graph of coordinates depending on the output we need. We also learned how to use pretrained weights and run models from open-source code. Lastly, we learned about several Cohere features that were new to us, such as RAG and Rerank.
What's next for retro.act
We want to expand our database of songs and dance videos to give users more options, and to develop a more accurate indexing algorithm for iterating over and classifying the data in the database. We also hope to make the skeleton's motions smoother for more realistic images. Lastly, and this is very ambitious, we hope to build our own model to transfer skeletons to images instead of using a pretrained one.
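One simple way to approach smoother skeleton motion is to average each landmark's position over neighboring frames. The sketch below is only one possible approach that we have not implemented yet; `smooth_track` is a hypothetical helper, and the coordinates are made up.

```python
def smooth_track(points, window=3):
    """Centered moving average over one landmark's (x, y) position per frame.

    Near the start and end of the track, the window shrinks to the
    frames that are available.
    """
    half = window // 2
    smoothed = []
    for i in range(len(points)):
        lo = max(0, i - half)
        hi = min(len(points), i + half + 1)
        neighbors = points[lo:hi]
        avg_x = sum(p[0] for p in neighbors) / len(neighbors)
        avg_y = sum(p[1] for p in neighbors) / len(neighbors)
        smoothed.append((avg_x, avg_y))
    return smoothed

# Made-up wrist positions over three frames, with a jitter spike in the middle.
wrist_track = [(0.0, 0.0), (3.0, 0.0), (0.0, 0.0)]
smoothed_track = smooth_track(wrist_track, window=3)
print(smoothed_track)
```

A larger window gives smoother motion but also blurs fast dance moves, so the window size would need tuning per video.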