SafeSound

Inspiration

The Quest for Safety in the Modern World

In an era where personal interactions weave through the tapestry of our daily lives, the potential for face-to-face encounters with distressing content looms large. Imagine being able to move through the world with a profound sense of security, where your real-life exchanges are safeguarded, meticulously tailored to shield you from words and topics that unsettle you the most. This vision isn't merely aspirational; it's essential in cultivating safe and inclusive environments where individuals feel empowered to engage fully and confidently.

The Stark Reality

Delving into the heart of the issue, urban studies shed light on a startling reality: up to 65% of women report experiencing verbal harassment in public spaces, a statistic that underscores the pervasive issue of unsolicited and often distressing auditory encounters (Journal of Environmental Psychology, 2020). Beyond the digital realm, real-life interactions present a significant moderation challenge due to their immediate and unpredictable nature. This spontaneity, ranging from overhearing harmful conversations to direct confrontations, amplifies the difficulty of shielding oneself from potential emotional and psychological harm.

The impact of such encounters isn't fleeting; research indicates that unexpected exposure to triggering content can lead to profound psychological effects, including increased anxiety, stress, and in severe cases, re-traumatization (American Psychological Association). Moreover, the burden of navigating these treacherous waters disproportionately falls on women, who are the majority of harassment victims. A report by the World Health Organization (2021) amplifies this point, indicating that nearly 1 in 3 women globally have experienced physical or sexual violence, often manifesting in public settings where verbal harassment is a common precursor.

People who suffer from daily harassment, PTSD, nervous breakdowns, and other mental health concerns need a solution that helps them move through daily life, secure in knowing that they are protected.

Bridging Toward a Solution

The current approach to content moderation, heavily reliant on traditional trigger warnings and static tools, falls short in meeting the complex demands of a diverse and global audience. For individuals grappling with the repercussions of daily harassment, PTSD, nervous breakdowns, and various mental health challenges, there's a pressing need for a more dynamic solution that offers reassurance and protection, enabling them to safely navigate their daily lives. This is where our project, SafeSound, comes into play.

What it does

SafeSound is a dynamic, user-driven audio augmentation tool designed to give individuals control over their real-life auditory environment. It runs two parsers simultaneously: a fast sentence parser and an accumulating semantic parser. The fast parser identifies relevant trigger words in real time and removes them from the audio output. The semantic parser runs alongside it, analyzing the deeper meaning of the conversation. If the semantic parser judges a conversation to be unsafe, SafeSound blocks all audio until the conversation's topic has changed.
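As a rough illustration of the fast parser's job, the sketch below looks up each transcribed word in a set of trigger words and returns the time spans to mute. The `(text, start, end)` tuple shape, the function names, and the sample words are assumptions made for this example, not SafeSound's actual data structures.

```python
import re

def build_trigger_lookup(trigger_words):
    """Normalize trigger words into a set for constant-time membership checks."""
    return {w.strip().lower() for w in trigger_words if w.strip()}

def normalize(token):
    """Lowercase a token and strip punctuation from both ends."""
    return re.sub(r"^\W+|\W+$", "", token.lower())

def fast_parse(words, lookup):
    """Return the (start, end) spans, in seconds, that should be muted.

    `words` is a list of (text, start, end) tuples, a hypothetical shape
    for word-level timestamps coming out of the speech-to-text step.
    """
    return [(start, end) for text, start, end in words
            if normalize(text) in lookup]

lookup = build_trigger_lookup(["Spiders", "snake "])
transcript = [("I", 0.0, 0.1), ("saw", 0.1, 0.4),
              ("spiders,", 0.4, 0.9), ("today", 0.9, 1.3)]
print(fast_parse(transcript, lookup))  # [(0.4, 0.9)]
```

A set lookup keeps the per-word cost constant, which matters when every chunk has to be checked before it is played back.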

How we built it

Audio Processing Pipeline

Our project consists of two main sections: the audio processing pipeline and the user interface. The audio processing pipeline is a series of steps that gather, transform, and augment raw audio input using machine learning and multithreading. It begins by capturing audio from a microphone; the audio is immediately split into chunks and passed to both a real-time playback device and a speech-to-text model. Because the solution must run in real time, we had to optimize the NLP at every stage, which led us to a multithreaded approach built on fine-tuned models.

Using text transcription and segmentation, we interpret the speech and pass it to two parsers: a fast parser, which checks for surface-level trigger words, and a trailing parser, which analyzes the semantics of the conversation. The semantic (trailing) parser relies on an exponential running average technique, implemented with a large language model, to determine the conversation's level of sensitivity, while the fast parser relies on generated mappings for speed. Finally, the pipeline decides which chunks of audio to output based on the results from each parser. The pipeline is written entirely in Python and communicates with the user interface via GET requests.
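The exponential running average used by the semantic parser can be sketched as follows. This is a minimal sketch, assuming each transcribed segment has already been given a sensitivity score between 0 and 1 (in SafeSound that score would come from the large language model). The class name, the smoothing factor `alpha`, and the two thresholds are illustrative assumptions, as is the use of separate block and resume thresholds.

```python
class SensitivityTracker:
    """Exponential running average over per-segment sensitivity scores."""

    def __init__(self, alpha=0.5, block_above=0.6, resume_below=0.3):
        self.alpha = alpha                # weight given to the newest segment
        self.block_above = block_above    # start blocking at or above this
        self.resume_below = resume_below  # stop blocking at or below this
        self.score = 0.0
        self.blocking = False

    def update(self, segment_score):
        """Fold in one segment score and return whether audio is blocked."""
        self.score = (self.alpha * segment_score
                      + (1 - self.alpha) * self.score)
        if not self.blocking and self.score >= self.block_above:
            self.blocking = True
        elif self.blocking and self.score <= self.resume_below:
            self.blocking = False
        return self.blocking

tracker = SensitivityTracker()
decisions = [tracker.update(s) for s in [0.9, 0.9, 0.1, 0.1, 0.1]]
print(decisions)  # [False, True, True, False, False]
```

Because the average decays gradually, a single sensitive segment does not block audio on its own, and blocking only lifts after the topic has stayed calm for a few segments.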

User Interface

The second aspect of our project is the user interface, which consists of a frontend built with React.js and Python, a backend (REST API) built with Flask, and a deployment hosted on Vercel. Our landing page routes to a Python web application, which sends a POST request to a /generate endpoint that uses a large language model to generate trigger words relevant to a specific topic. The audio processing pipeline can then use these generated words as its input, enabling fluid transitions between the interface and the application.
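A Flask endpoint of this kind might look like the sketch below. Only the `/generate` route and the POST method come from the description above; the JSON field names (`topic`, `words`), the error format, and the `generate_trigger_words` helper are assumptions, and the helper returns canned data in place of a real large language model call.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate_trigger_words(topic):
    """Hypothetical stand-in for the large language model call."""
    canned = {"spiders": ["spider", "tarantula", "web"]}
    return canned.get(topic.lower(), [topic.lower()])

@app.route("/generate", methods=["POST"])
def generate():
    """Return trigger words for the topic given in the JSON request body."""
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    topic = str(payload.get("topic", "")).strip()
    if not topic:
        return jsonify({"error": "missing 'topic'"}), 400
    return jsonify({"topic": topic, "words": generate_trigger_words(topic)})

# Exercise the endpoint in-process with Flask's built-in test client.
client = app.test_client()
response = client.post("/generate", json={"topic": "Spiders"})
print(response.status_code, response.get_json())
```

Keeping the model call behind a single helper means the endpoint can be tested without network access and the model can be swapped later.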

Challenges we ran into

Due to the inherent difficulty of implementing a real-time NLP-based solution, we ran into a number of challenges with optimization and audio quality. We had to explore multithreading techniques and use fine-tuned models to reach the reliability and consistency that real-time use demands. We also ran into challenges deploying the application and integrating its two parts, which we eventually resolved with a REST API implementation on the backend.
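One common way to structure this kind of multithreading is a producer and a worker thread connected by a queue, so that audio capture never waits on transcription. The sketch below shows that pattern only; it is not SafeSound's actual threading layout. The `transcribe` and `should_mute` callables are hypothetical stand-ins for the speech-to-text model and the parsers, and plain strings stand in for audio chunks.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker to finish

def run_pipeline(chunks, transcribe, should_mute):
    """Process chunks on a worker thread; return what to play, in order.

    Muted chunks are replaced by None so the output keeps its timing.
    """
    pending = queue.Queue(maxsize=8)  # bounded, so capture cannot run far ahead
    played = []

    def worker():
        while True:
            chunk = pending.get()
            if chunk is _STOP:
                break
            text = transcribe(chunk)
            played.append(None if should_mute(text) else chunk)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    for chunk in chunks:      # the "capture" side of the pipeline
        pending.put(chunk)
    pending.put(_STOP)
    thread.join()
    return played

result = run_pipeline(
    ["hello", "a spider", "bye"],
    transcribe=str,                          # stand-in: the chunk is its own text
    should_mute=lambda text: "spider" in text,
)
print(result)  # ['hello', None, 'bye']
```

A single worker preserves chunk order; adding more workers would need an explicit reordering step before playback.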

Accomplishments that we're proud of

One of the crowning achievements of our project is the successful integration of audio processing, playback, and word filtration, which marked a significant milestone in the development of SafeSound. This component was the most challenging aspect of our project, and getting it to work was no small feat. Incorporating the Whisper model into our application was a major technical step, enabling sophisticated handling of audio data. This achievement is a testament to our team's dedication and hard work, and it propelled our project forward.

Additionally, the development of our Flask backend for the user interface is another accomplishment we hold in high regard. Managing the intricate connections, including the integration of the ChatGPT word generator wrapper and the Whisper model, was initially a formidable challenge. Working through these complexities to achieve a functional and robust backend showcased our team's resilience and technical skill.

What we learned

Throughout the course of this project, our team ventured into uncharted territory, particularly in the realms of semantic analysis and audio preprocessing. Learning and implementing concepts such as the leading parser and trailing parser in semantic analysis was a profound learning experience. This exploration into "drawing meaning from text" has not only broadened our technical expertise but has also opened up new avenues for problem-solving and innovation across various future applications.

Embarking on audio preprocessing was a significant leap for every member of our group, given its technical demands and pivotal role in SafeSound's functionality. This journey, though daunting, was incredibly rewarding, equipping us with invaluable skills in handling audio data and instilling a deeper sense of accomplishment and confidence in our abilities as innovators.

The project journey, marked by overcoming challenges, mastering new skills, and fostering a collaborative spirit, has enriched us far beyond the technical achievements alone. We have gained a richer, more nuanced understanding of innovation and perseverance, setting a solid foundation for our future endeavors in technology and problem-solving.

What's next for SafeSound

While we are incredibly satisfied with and proud of the progress SafeSound has made within just a 24-hour period, our vision for the project extends far beyond this hackathon. There remains a host of features we aspire to implement in the future.

Firstly, the development of a mobile app would significantly enhance accessibility, allowing users to benefit from SafeSound's protective measures on the go. Additionally, we aim to upgrade the system to handle multiple trigger words simultaneously, broadening the scope of conversations and environments that can be made safer.

Another critical area for expansion is the tool's ability to detect and moderate audio content from a wider array of applications, including Instagram Reels and various movie streaming platforms. This would ensure users are safeguarded across more of their digital lives. We also recognize the importance of refining the audio transcription process. With more computational power, we can improve the accuracy and responsiveness of SafeSound, making it even more effective in real-time situations.

Lastly, enhancing the user experience by making word fade-ins and fade-outs smoother is a priority. This refinement would ensure that interventions by SafeSound are as unobtrusive as possible, maintaining the natural flow of conversations while still providing protection. Each of these enhancements is a step towards realizing our ambitious vision for SafeSound, making it an even more powerful tool for individuals seeking to navigate their auditory world with confidence and security.
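One possible approach to smoother fades is to turn the hard mute decision into a gain envelope that ramps down before a muted span and back up after it. The sketch below is an illustration under assumptions rather than a planned implementation: it works on a per-sample mute mask, uses a linear ramp of `ramp` samples, and needs the audio to be buffered slightly ahead of playback so the fade-out can start early. The function names are hypothetical.

```python
def gain_envelope(muted, ramp):
    """Map a per-sample mute mask to gains in [0, 1] with linear ramps.

    Muted samples get gain 0; the gain rises by 1/ramp per sample with
    distance from the nearest muted sample, capped at 1.
    """
    if ramp < 1:
        raise ValueError("ramp must be at least 1 sample")
    n = len(muted)
    gains = [1.0] * n
    dist = ramp
    for i in range(n):                    # distance to the previous muted sample
        dist = 0 if muted[i] else min(ramp, dist + 1)
        gains[i] = dist / ramp
    dist = ramp
    for i in range(n - 1, -1, -1):        # distance to the next muted sample
        dist = 0 if muted[i] else min(ramp, dist + 1)
        gains[i] = min(gains[i], dist / ramp)
    return gains

def apply_gains(samples, gains):
    """Scale each sample by its gain."""
    return [s * g for s, g in zip(samples, gains)]

mask = [False, False, False, True, True, False, False, False]
envelope = gain_envelope(mask, ramp=2)
print(envelope)  # [1.0, 1.0, 0.5, 0.0, 0.0, 0.5, 1.0, 1.0]
```

In practice the ramp would span several milliseconds of samples, and a curved ramp (for example a raised cosine) may sound more natural than a linear one.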

Final Thoughts

Reflecting on SafeSound's development, we see it as more than a technological breakthrough; it's a beacon for creating safer, more inclusive spaces. This project has stretched the limits of what we thought possible, blending innovation with a deep commitment to enhancing individual well-being.

SafeSound's journey from idea to implementation encapsulates the essence of collaboration and the power of a vision that looks beyond immediate challenges to the broader impact on society. Our achievements thus far mark only the beginning of what we hope to accomplish. The potential for SafeSound to transform auditory experiences worldwide fuels our ambition to continue refining and expanding its capabilities.

Looking ahead, we're inspired by the possibilities that SafeSound presents for ensuring everyone can navigate their daily lives without fear of distressing encounters. We're grateful for the support and feedback from our community, which has been instrumental in shaping SafeSound's trajectory. Together, we're not just developing a tool; we're paving the way for a future where safety and inclusivity are integral to every auditory interaction. SafeSound represents our first step towards that future, and we're excited for the journey ahead.