Inspiration
My teammate and I are passionate about the entertainment tech industry and realized that we had common interests in the creative arts. Madhav has been dancing since the age of nine, and Matt has been making films since the fourth grade. Both of us enjoy watching music videos in our free time. When we did some research, we realized that making a music video from scratch is time-consuming and expensive: on average, it takes at least $78,000 and over 90 days to produce a professional song (just the audio), and another $50,000 and over 50 days to make the music video. We thought, "What if we could make a music video out of thin air?"
What it does
Outside Studios enables someone with little to no experience in music production to make music videos quickly and at little expense. It enhances anyone's creative intuition and removes the barriers to entry in exploring music. Using AI, you can generate song lyrics, a song with both instrumentals and vocals, and a video with customized visuals. Outside Studios lets you create music videos from nothing.
How we built it
We developed a novel workflow to create a music video from ground zero. The process can be split into several parts:
We started by generating lyrics using a well-designed, sufficiently detailed prompt that describes the song to be produced at a high level. For example, here is the prompt we used: "Please write a short pop song about the Outside Lands music festival that's based in San Francisco. The vibe of the song should be inspired by the music of Ariana Grande, Kesha, and Lizzo, but should not mention their names. The song should be similar to a Billboard Top 100 song. The song should be about two people who meet at the festival and fall in love. The lyrics could also include words such as 'windmills, mushrooms, beer, wine, Lands End, Twin Peaks, and Sutro.'"
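The structure we settled on for these prompts could be sketched as a small template. This is a hypothetical helper of our own (not part of any API); the field names are illustrative, and ChatGPT just receives the final string:

```python
def build_lyric_prompt(theme, genre, influences, story, keywords):
    """Assemble a structured lyric-generation prompt from its parts."""
    return (
        f"Please write a short {genre} song about {theme}. "
        f"The vibe of the song should be inspired by the music of "
        f"{', '.join(influences)}, but should not mention their names. "
        f"The song should be about {story}. "
        f"The lyrics could also include words such as "
        f"'{', '.join(keywords)}.'"
    )

prompt = build_lyric_prompt(
    theme="the Outside Lands music festival in San Francisco",
    genre="pop",
    influences=["Ariana Grande", "Kesha", "Lizzo"],
    story="two people who meet at the festival and fall in love",
    keywords=["windmills", "mushrooms", "Lands End", "Twin Peaks", "Sutro"],
)
```

Keeping theme, genre, influences, and required keywords as separate slots is what made iteration tractable: we could change one constraint at a time and regenerate.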
To generate both vocals and instrumentals, we fed the lyrics into Suno AI (one of our sponsors) in parts, generating each section of the song separately. We started with 13 iterations of the first verse until we got the right vibe/genre; this was the longest iteration because it dictated the tone of the entire song. From there, we continued to append verses to the song, making stylistic choices at branching points along the way. One key decision was shifting the tone from pop to rap. For the outro, we used the chorus as the baseline, but instead of lyrics we supplied four '.' characters on separate lines so the model would generate only instrumental music. After generating all the individual clips, we stacked them together into the full song.
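The final "stacking" step amounts to concatenating the exported audio clips in order. A minimal sketch using Python's standard `wave` module, assuming every clip shares the same sample rate and channel layout (the commented filenames are made up for illustration):

```python
import wave

def stack_clips(clip_paths, out_path):
    """Concatenate WAV clips that share the same audio parameters."""
    frames = []
    params = None
    for path in clip_paths:
        with wave.open(path, "rb") as clip:
            if params is None:
                params = clip.getparams()  # channels, sample width, rate, ...
            frames.append(clip.readframes(clip.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for chunk in frames:
            out.writeframes(chunk)

# Hypothetical section exports from Suno:
# stack_clips(["verse1.wav", "chorus.wav", "verse2_rap.wav", "outro.wav"],
#             "full_song.wav")
```

In practice we did this assembly in an editor rather than in code, but the idea is the same: each generated section is an independent clip, appended in song order.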
We then collected raw footage that embodied the lyrics at different points in the song, using Outside Lands' YouTube channel and Artlist.io. After finding these clips (over 50 of them), we used Stable Diffusion Deforum to render each one in an anime aesthetic.
After stylizing and upscaling the converted clips, we edited them together with the song in Adobe Premiere and Runway.ml to make a polished music video. We also color-corrected and deflickered our footage, using video transitions to make the overall aesthetic more energetic.
Challenges we ran into
We also ran into some challenges that taught us a lot along the way.
Neither of us had any background in music production, but we wanted to explore whether we could generate a music video from the ground up using AI/ML models. Working in a time-constrained hackathon taught us to come up with creative solutions.
Writing the right prompts for ChatGPT was challenging because the task can be too open-ended. To generate lyrics that captured our vision for the song, we needed some structure. The process took multiple iterations; adding genres, the styles of specific singers, and even specific words to include in the lyrics helped us converge faster and made iterating more systematic.
Using Suno.ai, we were able to generate a song in three hours. This process was also iterative, and there were several edge cases where the model did not generate the right clips. For example, when a rap clip ended and we generated the next clip with it as the baseline, the model produced rap whose first two lines were incoherent before the lyrics became normal. The model was also unable to infer that the chorus should stay consistent throughout the song. We also had to experiment with empty spaces and punctuation to see how the model filled them in with instrumental music.
Accomplishments that we're proud of
We both stepped outside our comfort zones to create something in the music space. Neither of us has a music production background; we relied on intuition and creativity to come up with this solution.
We are grateful to have met so many incredible people through the hackathon. We enjoyed talking to the judges, sponsors, mentors, organizers, and fellow participants, and found ways we can keep helping each other outside the hackathon.
This is the most extensive project either of us has done in terms of audio and visuals. We are happy with how the video turned out and find it very catchy.
What we learned
- Building an ambitious project in a limited amount of time.
- Utilizing available resources instead of spending huge amounts of money.
- Teaching ourselves how to use various tools purely through video tutorials.
- Gaining insights into trends in the music/entertainment industry (and beyond) through presentations and casual conversations with mentors and fellow participants.
What's next for Outside Studios
Outside Studios should democratize access, allowing anyone with little to no experience in music production to explore their creative intuition. "Outside" is a pun on the name Outside Lands, meant to show that the tool lets users think outside the box. Outside Studios can become a one-stop solution for generating an entire music video by integrating APIs from various AI models and stitching them together with a user-friendly UI and a robust back-end.
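That one-stop pipeline could be stitched together roughly as below. Every function here is a placeholder of our own design; neither Suno, ChatGPT, nor Deforum is assumed to expose these exact APIs, and each stub stands in for a real integration call:

```python
# Placeholder pipeline: each step stands in for a real API integration.
def generate_lyrics(prompt):
    """Stub for an LLM lyric-generation call (e.g. ChatGPT)."""
    return f"[lyrics for: {prompt}]"

def generate_song(lyrics):
    """Stub for a text-to-music service (e.g. Suno)."""
    return f"[audio for: {lyrics}]"

def stylize_clips(clips, style):
    """Stub for a video-stylization model (e.g. Stable Diffusion Deforum)."""
    return [f"[{style}] {clip}" for clip in clips]

def make_music_video(prompt, clips, style="anime"):
    lyrics = generate_lyrics(prompt)
    song = generate_song(lyrics)
    visuals = stylize_clips(clips, style)
    return {"song": song, "visuals": visuals}

video = make_music_video("a pop song about Outside Lands",
                         ["clip1.mp4", "clip2.mp4"])
```

The back-end's job would be orchestrating these stages and handing the results to an editor-style UI, mirroring the manual workflow we followed at the hackathon.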
Built With
- adobepremiere
- controlnet
- darksushicheckpoint
- deforum
- discord
- huggingface
- runway
- runwayml
- stablediffusion
- suno.ai
- topazvideoaiupscaler

