Inspiration

As college students, our lives are filled with music: studying at home, partying, commuting. Music is ubiquitous in our lives. However, we find the current process of listening to music and controlling a digital music player pretty mechanical and boring: it's all clicking or tapping. We wanted to truly interact with our music; we wanted to feel it. During one brainstorming session, a team member jokingly suggested a Minority Report-inspired gesture UI system. With that suggestion, we realized we could use this hackathon as a chance to build a cool, interactive, futuristic way to play music.

What it does

Fedoract allows you to control your music in a fun and interactive way. It wirelessly streams video of your hand gestures and lets you control Spotify with them. A camera mounted on a fedora recognizes hand gestures, and depending on the gesture, we can also control other home applications over IoT. The camera sits wirelessly on the hat, and its video feed is sent to the main computer for processing.

How we built it

For the wireless fedora part, we use an ESP32-CAM module to record and transmit the video feed of the hand gestures to a computer. The ESP32-CAM is powered by a 9V battery through a 3V3/5V Elegoo power supply module. The video feed is transmitted over WiFi to the main computer, where it is analyzed with tools such as OpenCV.

The software backend is built with OpenCV and the MediaPipe library. MediaPipe includes a hand model pre-trained on a large dataset, and it is very accurate. We use this model to get the positions of different features (landmarks) of the hand, such as the fingertips, the wrist, and the knuckles, and from those positions we determine the hand gesture the user is making.

The Spotify front end is controlled and accessed through the Selenium web driver: depending on the action determined by the gesture recognition, the program clicks the corresponding button. Note that the new browser window instantiated by the web driver has no prior session information, so we log in to a Spotify account at the start of the process; after that we can access the media buttons and other important controls on the web page.

For motion classification, we combined OpenCV with our own home-grown, non-ML, trigonometric algorithm. Python scripts use OpenCV to capture webcam input and run hand recognition to find the various landmarks (joints) of the hand. For a given period after receiving hand motion input, we compute a vector of the change in X and Y from the first and last stored hand coordinates. Using deltaX and deltaY, we compute the angle of that vector in the x-y plane relative to a reference angle derived from the display's width and height. If the vector lies between the positive and negative reference angles, the motion is classified and interpreted as Play Next Song, and so on for the other actions. See the diagrams below for more details.
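To make the description above more concrete, here is a minimal sketch of the gesture side of the pipeline: capture frames, pull hand landmarks out of MediaPipe, accumulate wrist positions, and classify the resulting motion vector by its angle. It assumes a local webcam as the source, and the window length, movement threshold, and the up/down action names are illustrative choices rather than our exact tuned values.

```python
# Minimal sketch: hand-landmark capture + trigonometric motion classification.
# Assumes a local webcam; swap the capture source for the ESP32-CAM stream.
import math
from collections import deque

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def classify_motion(dx, dy, ref_angle):
    """Classify a motion vector by its angle relative to the reference angle."""
    angle = math.atan2(-dy, dx)  # image y grows downward, so flip it
    if -ref_angle <= angle <= ref_angle:
        return "NEXT_SONG"        # mostly rightward motion
    if angle >= math.pi - ref_angle or angle <= -(math.pi - ref_angle):
        return "PREVIOUS_SONG"    # mostly leftward motion
    return "PLAY_PAUSE" if angle > 0 else "VOLUME_DOWN"  # up / down (illustrative)

cap = cv2.VideoCapture(0)
history = deque(maxlen=15)  # recent wrist positions over a short time window

with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        h, w = frame.shape[:2]
        ref_angle = math.atan2(h, w)  # reference angle from display width/height

        # MediaPipe expects RGB input, while OpenCV captures BGR frames.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            wrist = results.multi_hand_landmarks[0].landmark[0]
            history.append((wrist.x * w, wrist.y * h))

        if len(history) == history.maxlen:
            dx = history[-1][0] - history[0][0]
            dy = history[-1][1] - history[0][1]
            if math.hypot(dx, dy) > 0.25 * w:  # ignore small jitters
                print(classify_motion(dx, dy, ref_angle))
                history.clear()

        cv2.imshow("fedoract", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
```

On the other end, the classified action is turned into a click on the Spotify web player through Selenium. The sketch below assumes hypothetical aria-label selectors and a manual login step; the real selectors depend on Spotify's current web UI.

```python
# Sketch of the Selenium side: clicking Spotify's web-player media buttons.
# The selectors below are assumptions about Spotify's markup, not verified ones.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://open.spotify.com")
input("Log in to Spotify in the opened window, then press Enter...")

BUTTONS = {  # hypothetical action -> aria-label mapping
    "NEXT_SONG": "Next",
    "PREVIOUS_SONG": "Previous",
    "PLAY_PAUSE": "Play",
}

def press(action):
    """Click the media button corresponding to a classified gesture action."""
    label = BUTTONS[action]
    driver.find_element(By.CSS_SELECTOR, f'button[aria-label="{label}"]').click()
```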

Challenges we ran into

The USB-to-TTL cable we got for the ESP32-CAM was defective, so we spent far too much time trying to fix it and finding alternative ways to flash the board with the parts we had. Worst of all, we also had trouble powering the ESP32-CAM, both when it was connected directly to the computer and when it was running wirelessly on its own power supply. The speaker we bought was too quiet for our purposes, and we did not have the right equipment to get our display working in time. The ESP32-CAM module is very sensitive to power fluctuations on top of having an extremely complicated code-upload process, and the community around the device is small, so we often ran into misleading advice. All of this led to a long debugging process.

The software also had many issues. First, we needed to install MediaPipe on our ARM (M1) Macs to develop effectively with OpenCV, but we only figured out it wasn't supported after spending time trying to install it. Eventually we resorted to the Intel build of PyCharm, which surprisingly worked even though our chips are not Intel-manufactured; the downside was that PyCharm became super slow, which really slowed down development. We also had minor IDE issues when importing OpenCV in our scripts, which we hotfixed by simply creating a new project (shrug). Controlling the keyboard through the OS turned out to be difficult for keys other than volume, so we resorted to Selenium to control the Spotify client. In hand gesture tracking, the thumbs-down gesture was particularly difficult because the model kept thinking other fingers were lifted as well, and in hand motion tracking the x and y coordinates were inverted, which made the classification algorithm much harder to develop. Bridging the live video stream from the ESP32-CAM to the backend was also problematic: we spent around three hours finding a simple, effective way with OpenCV to redirect the live stream into the software's input feed (see the sketch below). Lastly, linking the multiple functionality scripts together wasn't obvious.
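For reference, the bridge conceptually boils down to opening the network stream as an OpenCV capture source. This is only a sketch: the URL is a placeholder and depends on the firmware flashed on the ESP32-CAM (CameraWebServer-style examples typically expose an MJPEG stream on port 81) and on your local network addressing.

```python
# Sketch: reading the ESP32-CAM video feed as an OpenCV capture source.
# The URL is a placeholder; replace it with your module's actual address.
import cv2

ESP32_STREAM_URL = "http://192.168.1.50:81/stream"  # hypothetical address

cap = cv2.VideoCapture(ESP32_STREAM_URL)
if not cap.isOpened():
    raise RuntimeError("Could not open the ESP32-CAM stream")

while True:
    ok, frame = cap.read()  # frames then feed the same gesture pipeline as a webcam
    if not ok:
        break
    cv2.imshow("esp32-cam", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```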

Accomplishments that we're proud of

One thing the hardware team is really proud of is the perseverance we showed while debugging our hardware. Because of faulty connection cords and an unstable battery supply, it took us over 14 hours just to get the camera to connect wirelessly. Throughout this process we took an almost brute-force approach and tried every combination of potential fixes, and we are honestly surprised by our own mental toughness. We are also proud of the motion classification algorithm: it took a while to figure out but was well worth it. Finally, the hand gesture recognition was the first working piece of the product and a real boost to team spirit, and this was the first fully working Minimum Viable Product at a hackathon for all of the team members.

What we learned

We learned how OpenCV works and, in some depth, how serial connections work. We learned that the MediaPipe module can perform hand gesture recognition and other image classification on captured frames; an important thing to note is that the captured image must be converted to RGB before being passed into MediaPipe. We also learned how to use webcam capture to test during development and how to draw helpful figures on the output image for debugging (see the snippet below).
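As a small example of those last two points, this is roughly what the BGR-to-RGB conversion and the debug overlay look like with MediaPipe's drawing utilities; it is a sketch rather than our exact debug script.

```python
# Sketch: convert a BGR webcam frame to RGB for MediaPipe, then draw the
# detected hand landmarks back onto the frame for visual debugging.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1) as hands:
    ok, frame = cap.read()
    if ok:
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        for hand_landmarks in results.multi_hand_landmarks or []:
            mp_drawing.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
        cv2.imwrite("debug_frame.png", frame)  # inspect the annotated frame
cap.release()
```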

What's next for Festive Fedora

There is a lot of potential for improvement in this project. For example, we could move the computation to a cloud computing service: right now the hand gesture recognition runs locally, and running it in the cloud would give us more computing power to run more complex algorithms and the potential to connect to more devices. We could also invest in better hardware to reduce the delay in the video feed, which would make the gesture detection more accurate.

Built With

esp32, mediapipe, opencv, python, selenium, spotify