Inspiration

We make too many decisions these days. This idea came from the realisation that even the smallest choices, like picking the perfect music for an ambient dinner date, a house party, or a study session, are mentally taxing!

What if you had a device that could just play music according to where you are?

What it does

The Automated Ambient DJ is a real-time environmental listener that manages the vibe of the user's setting. Using the user's microphone, we detect speech, laughter, animal chewing noises (yes, that is a real category) and map them to appropriate music genres.

When it detects a shift to another genre, it performs an automatic transition: if you start chatting or place the device in a busy space, it will play chill jazz. When no audio is detected at all, total silence, it drifts back into ‘Lofi’, as if you were studying quietly.

How we built it

The project combines existing APIs, web technologies, and a pre-trained AI model:

YAMNet (Yet Another Mobile Network) is a pre-trained deep neural network developed by Google Research that can identify and classify 521 different audio events. We picked it for its efficiency and lightweight design, which would give us more flexibility if we later wanted to move to an embedded system on a Raspberry Pi.
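Once YAMNet returns a class label, turning it into a genre can be as simple as a lookup. A minimal sketch of that idea follows; the label-to-genre table is our own illustrative assumption, not the project's exact mapping (though the labels themselves are real YAMNet categories):

```python
# Illustrative sketch: map a detected YAMNet class label to a target genre.
# The label-to-genre table is an assumption for illustration only.
GENRE_FOR_LABEL = {
    "Speech": "jazz",                 # conversation -> chill jazz
    "Laughter": "upbeat",             # a lively room -> upbeat music
    "Silence": "lofi",                # nothing detected -> quiet study Lofi
    "Chewing, mastication": "jazz",   # yes, a real YAMNet category
}

def genre_for(label: str, current: str) -> str:
    """Return the genre for a detected label, keeping the current
    genre when the label has no mapping."""
    return GENRE_FOR_LABEL.get(label, current)
```

A lookup like this keeps the classifier and the music choice decoupled, so new genres can be added without touching the model.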

Music engine: during our initial design we attempted to use Spotify's API, but it required a paid subscription, so we switched to the free YTmusic API, which in turn gave us far more flexibility in choosing what our program could play.

Sounddevice allowed our Python program to listen to the microphone and analyse the surrounding sounds.

TensorFlow Hub, a machine-learning library developed by Google, allowed us to run the YAMNet model locally on the machine; this again would have given us more flexibility had we been able to pursue the embedded-system approach.

We used a Flask server to run a local web server that any device on the same network could access. This is how we controlled the system from our phones: by publishing the currently playing music to the website, any device viewing it would play the audio and could pause it locally or skip the song entirely.
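A minimal sketch of such a control server is below, assuming Flask is available. The route names and the state dictionary are illustrative, not our exact endpoints:

```python
# Minimal sketch of a LAN control server, assuming Flask is installed.
# Route names and the state dict are illustrative, not the project's
# exact implementation.
from flask import Flask, jsonify

app = Flask(__name__)
state = {"song": "lofi-beats", "playing": True}

@app.route("/now")
def now_playing():
    # Any phone on the same network can poll the current track.
    return jsonify(state)

@app.route("/toggle", methods=["POST"])
def toggle():
    # Pause or resume playback for the requesting device.
    state["playing"] = not state["playing"]
    return jsonify(state)

# To expose the server to other devices on the same network:
# app.run(host="0.0.0.0", port=5000)
```

Binding to `0.0.0.0` rather than `localhost` is what makes the page reachable from a phone on the same Wi-Fi.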

Challenges & Limitations

  • YAMNet limitations: we originally hoped to create moods and pick different music based on how you spoke, such as shouting triggering a rock or metal vibe. This wasn't possible, as YAMNet, the closest-performing system we found, could only differentiate between types of sounds.

  • Feedback loop: originally, when music from a genre played, the system would listen to its own output and move to the genre we assigned to ‘Music’. We fixed this later with the variable-points system described below.

  • Hardware shortages: we planned to build this system for an embedded device, but due to a lack of Raspberry Pis and equipment we weren't able to extend into this area, despite having all the source code at our fingertips.

Tuning & Tricks:

Variable threshold for sound and genre:

For genres that were harder to reach (due to YAMNet's sensitivity) we could lower the boundary.

Furthermore, some sounds were very unlikely to be false positives, so we could give extra points to sounds like "bird chirp" and fewer points to "speech".

Weighted Threshold system: $$P(t+1) = P(t) + (W(S) * M)$$

Where:

  • P is the current progress toward a mood shift.
  • W(S) is the Sound Weight (e.g., 1.5 for Laughter, 0.7 for Speech).
  • M is the Out-of-Genre Multiplier (a boost applied to sounds that aren't within the current genre, to promote a change if necessary).

A transition only occurs when:

$$P(t) \ge T_{\text{genre}}$$

This allows us to set high thresholds for stable genres like "Lofi" (threshold = 8) and lower, more reactive thresholds for "Upbeat" or "Rock" (threshold = 4).
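Putting the update rule and the per-genre thresholds together, a minimal sketch follows. The weights and thresholds are the example numbers from above; the class structure and the multiplier value are illustrative assumptions:

```python
# Minimal sketch of the weighted threshold system described above.
# WEIGHTS and THRESHOLDS use the example numbers from the text;
# the class itself and the multiplier value are illustrative.
WEIGHTS = {"Laughter": 1.5, "Speech": 0.7}        # W(S)
THRESHOLDS = {"Lofi": 8, "Upbeat": 4, "Rock": 4}  # T_genre
OUT_OF_GENRE_MULTIPLIER = 2.0                     # M, assumed value

class MoodTracker:
    def __init__(self, genre="Lofi"):
        self.genre = genre
        self.progress = 0.0  # P(t)

    def observe(self, sound, target_genre):
        """Apply P(t+1) = P(t) + W(S) * M, transitioning once the
        current genre's threshold is crossed."""
        w = WEIGHTS.get(sound, 1.0)
        m = OUT_OF_GENRE_MULTIPLIER if target_genre != self.genre else 1.0
        self.progress += w * m
        if self.progress >= THRESHOLDS[self.genre]:
            self.genre = target_genre
            self.progress = 0.0
        return self.genre
```

With these numbers, each burst of laughter pointing out of "Lofi" contributes 1.5 × 2.0 = 3 points, so three bursts clear Lofi's threshold of 8, while speech alone takes far longer.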

What's next for Automated Ambience:

We believe this technology can be extended further. By resolving our limitations and fine-tuning our models, we could later expand into commercial markets:

  • Embedded systems: lightweight, portable, and able to act as a normal speaker.

  • Multi-sensory integration: smart-home LEDs for lighting.

  • Natural language processing: requesting specific songs or using voice-controlled commands.

  • Connecting to your music provider for a personalised take on your genres.

What we learned:

During the initial stages of the project we struggled to choose the right model and to get it running smoothly on both of our machines. We also learned the importance of behind-the-scenes fine-tuning to ensure the program was actually incentivised to change song: early in development it almost always chose chill jazz because we were speaking, even while playing test sounds. This led us to realise we needed the weighted system to ensure all sounds were processed fairly.

When trying to implement our product on an embedded system, we found there were no Raspberry Pis left for us to experiment with to explore whether the idea was possible. This forced us to drastically scale back the project and go in a different direction: we both decided on the website-based approach, allowing access to the system from your phone. This gave us a useful demonstration of its lightweight design and its possible commercial viability in the future.

From delays to thresholds, this was a great lesson in scaling back and refining the scope of what we could achieve in 24 hours.
