Inspiration
Creating album cover art is hard. It takes time, effort, and sometimes talent, and the result must effectively convey the idea behind a song. The work can be offloaded to a professional artist, but that still takes time and now costs money. For small artists who just need quick, simple cover art, we thought a solution could be crafted.
What it does
The user simply provides our program with a WAV file of any song they like. Then, through some magic, an image representing album cover art for that song is generated and displayed within a matter of seconds.
How we built it
The main user interface was created with Tkinter in Python, and the inputted WAV file is also processed in Python. The processing consists of first breaking the heavy WAV file into chunks, then performing speech-to-text on each chunk with the SpeechRecognition library to extract all the text from the song. Next, we perform some NLP: the importance of each word in the lyrics is computed from its frequency and placement within the lyrics. From this ranking, we pass the most important word to the WomboAI API, which responds with the album cover; the image is then displayed back to the user in Tkinter.
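The chunking and word-importance steps can be sketched with only the Python standard library. This is a minimal illustration, not our exact implementation: the chunk length, the stop-word list, and the placement bonus below are assumptions chosen for the example.

```python
import wave
from collections import Counter

def chunk_wav(path, chunk_seconds=30):
    """Split a WAV file into fixed-length blocks of raw frames.
    Each block would then be handed to the speech-to-text step."""
    with wave.open(path, "rb") as wav:
        frames_per_chunk = wav.getframerate() * chunk_seconds
        chunks = []
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:
                break
            chunks.append(frames)
    return chunks

# Illustrative stop-word list; a real one would be much larger.
STOPWORDS = {"the", "a", "an", "and", "i", "it's", "all"}

def most_important_word(lyrics):
    """Score each word by frequency plus a small bonus for appearing
    early in the lyrics (a stand-in for the 'placement' heuristic)."""
    words = [w.strip(".,!?\"'").lower() for w in lyrics.split()]
    words = [w for w in words if w and w not in STOPWORDS]
    n = len(words)
    scores = Counter()
    for i, w in enumerate(words):
        scores[w] += 1.0 + (n - i) / n  # frequency + placement bonus
    return max(scores, key=scores.get)
```

The highest-scoring word from `most_important_word` is what would be sent on to the image-generation API.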
Challenges we ran into
An astonishing number of dependency errors was the biggest challenge to overcome. The other large challenge was understanding how to properly process the inputted WAV files: how would one compute the importance of each word? How do we chunk the song down into sizeable bits to process? How do we properly interact with the APIs and integrate them into our work? A large majority of our time was spent working out how to overcome these challenges.
Accomplishments that we're proud of
We are proud not only of overcoming the challenges listed, but of finding a unique solution to a common problem and simplifying the process for the end user. We worked hard to create something about as simple as possible that still had a huge impact and real usage.
What we learned
This project mostly taught us how to find solutions to complex problems and rapidly bring our ideas to fruition. We also learned a lot about working with APIs: how to find the right API for the job, and how to read documentation and debug when that documentation is sparse. Lastly, this project taught us a lot about NLP, showing how to computationally process and derive meaning from something as simple as words.
What's next for Frieze
The biggest areas of growth for Frieze would be:
- Trying out different techniques for computing the importance of each word. A machine-learning attention model such as a Transformer could work well for this sort of task and do a better job of identifying important words.
- Generating videos. Instead of just one image produced after processing the whole song, why not produce an entire music video? Its usefulness would far exceed what we currently offer, given how much more time, effort, and money goes into creating a music video compared with how fast a computer could do it.
- Concurrency optimizations for processing. For example, each chunk of the inputted music is currently processed one-by-one on the same thread; processing the chunks concurrently on separate threads would create a far faster experience.
- Support for different languages. Being able to automatically detect the language of the song provided and create artwork for it would open Frieze up to just about anyone on the planet who enjoys music.
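The concurrency idea above can be sketched with Python's `concurrent.futures`. Here `transcribe_chunk` is a hypothetical stand-in for the real speech-to-text call; since that work is I/O-bound, a thread pool is a reasonable fit.

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe_chunk(chunk):
    # placeholder for the real SpeechRecognition call on one audio chunk
    return chunk.upper()

chunks = ["verse one", "chorus", "verse two"]

# Speech-to-text on each chunk is independent, so the chunks can be
# handed to a thread pool instead of processed one-by-one.
with ThreadPoolExecutor(max_workers=4) as pool:
    texts = list(pool.map(transcribe_chunk, chunks))  # order is preserved

print(" ".join(texts))
```

`pool.map` keeps results in the same order as the input chunks, so the lyrics come back in song order even though the chunks finish at different times.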