Inspiration
In the age of social media, video editing has become an essential skill for almost every influencer. Part of that skill is finding appropriate background music for your content. Good background music makes a video more engaging and draws a stronger emotional response from the viewer. Finding it, however, can be tedious and time-consuming. Generative AI can reduce that workload by directly generating background music that fits both the video’s content and the creator’s intentions.
What it does
Introducing AI-Tonal, a tool that combines a VLM (vision-language model), an LLM (large language model), and Suno (a music-generation AI) into an end-to-end pipeline. It takes the target video and the creator’s intent as a prompt and generates quality background music for the creator to use.
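As a rough illustration of the three-stage flow described above, here is a minimal Python sketch. It is not the actual AI-Tonal code: the names `MusicRequest`, `generate_background_music`, and the `fake_*` stand-in stages are all hypothetical, and the assumed division of labor (the VLM describes the video, the LLM writes the music prompt, the music model renders audio) is our reading of the description. In a real build, each stand-in would be replaced by a call to the corresponding model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MusicRequest:
    """Hypothetical container for the two inputs the pipeline takes."""
    video_path: str      # path to the creator's video
    creator_intent: str  # free-text description of the desired mood or goal


def generate_background_music(
    request: MusicRequest,
    describe_video: Callable[[str], str],
    write_music_prompt: Callable[[str, str], str],
    generate_music: Callable[[str], bytes],
) -> bytes:
    """Chain the three stages; each stage is injected as a plain callable."""
    # Stage 1 (VLM, assumed role): turn the video into a text description.
    video_description = describe_video(request.video_path)
    # Stage 2 (LLM, assumed role): merge the description with the creator's intent.
    music_prompt = write_music_prompt(video_description, request.creator_intent)
    # Stage 3 (music model, assumed role): render audio from the prompt.
    return generate_music(music_prompt)


# Stand-in stages so the sketch runs without any model or network access.
def fake_describe_video(video_path: str) -> str:
    return f"a sunny beach scene from {video_path}"


def fake_write_music_prompt(description: str, intent: str) -> str:
    return f"Background music for {description}; mood: {intent}"


def fake_generate_music(prompt: str) -> bytes:
    # A real stage would return audio bytes; here we just echo the prompt.
    return prompt.encode("utf-8")


if __name__ == "__main__":
    request = MusicRequest("clip.mp4", "relaxed and upbeat")
    audio = generate_background_music(
        request, fake_describe_video, fake_write_music_prompt, fake_generate_music
    )
    print(audio.decode("utf-8"))
```

Passing the stages in as callables keeps each model swappable, which helps when tuning prompts or trying a different model at one stage without touching the others.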
How we built it
Challenges we ran into
Combining different models into one pipeline and adjusting the prompts to achieve the result we wanted was difficult.
Accomplishments that we're proud of
We built a full end-to-end pipeline that creates custom background music tailored to a creator's content and intent.
What we learned
We gained a much more in-depth understanding of multimodal approaches. The hands-on process gave us valuable experience in designing our own pipeline.
What's next for AI-Tonal
Planned future development:
- Write more detailed Suno prompts so the music aligns more closely with the video.
- Add more models to better understand the video's content and combine it with the user's intention.
- Let the AI combine the music with the video, saving content creators even more effort.
Built With
- flask
- html5
- javascript
- openai
- python
- suno
- transformer
