Summit: Gamified Data Labeling Platform

Inspiration

We were deeply unsettled by the widespread practice of outsourcing data labeling to underpaid workers, often in exploitative conditions, to train AI models. This motivated us to create a platform that not only ensures fair compensation but also transforms the mundane process into an engaging and rewarding experience. By gamifying data labeling, we make it enjoyable and even addictive, benefiting both users and companies alike. Users get paid for their contributions while engaging in a fun, interactive process, and companies gain access to high-quality, demographically targeted data that enhances model training and evaluation.

What It Does

Summit is a gamified data-labeling platform that lets companies upload large datasets, define classification and segmentation parameters, and select specific user demographics for targeted data review. Through our platform, companies can upload either a dataset or a model. They can also set the number of reviewers per image and choose a target demographic to ensure high-quality validation. In addition, for classification tasks that we deem to require specialized knowledge (e.g., distinguishing RNA from DNA), we verify the documentation users provide to confirm that they are qualified.
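As a rough illustration of how the reviewer count, target demographic, and qualification settings described above could fit together, here is a minimal sketch. The `Project` and `Labeler` classes, their field names, and the `is_eligible` helper are all hypothetical and are not taken from Summit's codebase.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Project:
    """Hypothetical settings a company might define for a labeling project."""
    name: str
    labels: List[str]
    reviewers_per_image: int = 3
    # Maps a demographic attribute (e.g. "region") to the allowed values.
    target_demographics: Dict[str, Set[str]] = field(default_factory=dict)
    # Name of a credential required for specialized tasks, if any.
    required_credential: Optional[str] = None


@dataclass
class Labeler:
    """Hypothetical profile of a mobile user."""
    user_id: str
    demographics: Dict[str, str]
    verified_credentials: Set[str] = field(default_factory=set)


def is_eligible(labeler: Labeler, project: Project) -> bool:
    """Check the demographic filter first, then the credential requirement."""
    for attribute, allowed in project.target_demographics.items():
        if labeler.demographics.get(attribute) not in allowed:
            return False
    if project.required_credential is not None:
        return project.required_credential in labeler.verified_credentials
    return True
```

In this sketch, a project with no demographic filter and no required credential accepts every labeler, which matches the idea that only specialized tasks are gated.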

Classifiers (mobile users) get to experience Summit as an interactive, reward-based mobile application. Users earn money by labeling and segmenting images, progressing through a system of trophies that encourages continued engagement. Live updates of local users cashing out serve to enhance motivation and create a sense of competition. Additionally, the platform offers numerous features that allow companies to compare their AI model’s predictions with human-labeled data, helping refine their algorithms and improve accuracy.
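One simple way to compare a model's predictions with human-labeled data is to take a majority vote across reviewers for each image and measure how often the model matches that consensus. The sketch below assumes this approach; the function names and the input shapes (dictionaries keyed by image ID) are illustrative, not Summit's actual comparison features.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label, or None if there are no votes or a tie for first."""
    top_two = Counter(votes).most_common(2)
    if not top_two:
        return None
    if len(top_two) == 2 and top_two[0][1] == top_two[1][1]:
        return None
    return top_two[0][0]


def model_agreement(human_votes, model_predictions):
    """Fraction of images with a clear human consensus where the model matches it."""
    compared = 0
    matched = 0
    for image_id, votes in human_votes.items():
        consensus = majority_label(votes)
        # Skip images without a clear consensus or without a model prediction.
        if consensus is None or image_id not in model_predictions:
            continue
        compared += 1
        if model_predictions[image_id] == consensus:
            matched += 1
    return matched / compared if compared else 0.0
```

Images where reviewers tie are skipped rather than counted against the model, since there is no clear human answer to compare with.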

Technical Functionality

When a company uploads a zipped folder of images and a list of text classifications, our backend unzips the archive and stores the processed contents in MongoDB. From there, the company can see how much of the data labeling task is complete and create additional projects. We also provide A/B testing for companies that lack real-world test data. Using Modal, we host and deploy their model so it can "compete" against users, which gives their computer vision model a rigorous test.
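The upload step could look roughly like the following sketch, which reads a zip archive in memory and builds one record per image, plus a helper for the completion figure shown to the company. The document fields, the `IMAGE_EXTENSIONS` set, and both function names are assumptions made for illustration; the resulting list of dictionaries is the kind of input that could be passed to a MongoDB collection's `insert_many`, but no database call is made here.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed set of accepted image types; the real platform may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_image_documents(zip_bytes, project_id, labels, reviewers_per_image=3):
    """Build one dictionary per image found in the uploaded zip archive."""
    documents = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            path = PurePosixPath(info.filename)
            # Skip folders and any non-image files in the archive.
            if info.is_dir() or path.suffix.lower() not in IMAGE_EXTENSIONS:
                continue
            documents.append({
                "project_id": project_id,
                "filename": path.name,
                "size_bytes": info.file_size,
                "allowed_labels": list(labels),
                "reviewers_needed": reviewers_per_image,
                "votes": [],
            })
    return documents


def completion_ratio(documents):
    """Share of images that have received the required number of votes."""
    if not documents:
        return 0.0
    done = sum(1 for d in documents if len(d["votes"]) >= d["reviewers_needed"])
    return done / len(documents)
```

Keeping the vote list on each image record makes the progress figure a single pass over the project's documents.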

Users create shareable profiles and have the option to upload credentials (such as diplomas) to qualify for more specialized data labeling tasks. To verify these credentials, we use a Roboflow CV model hosted on Modal for Optical Character Recognition (OCR), ensuring only eligible users gain access to specific datasets. Users can then enter the game and classify or segment images based on their interests. They can also compete against an AI model supplied by a company to help train it. Currently, to emulate that process, we use Roboflow-integrated models like YOLOv8 to let the user complete segmentation and/or classification tasks.
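Once OCR has turned an uploaded diploma into text, the eligibility decision could be as simple as the keyword check sketched below. This is only an assumption about what might happen after the OCR step: the `CREDENTIAL_KEYWORDS` mapping, the credential names, and the `matched_credentials` function are hypothetical, and the OCR call itself is left out.

```python
import re

# Hypothetical mapping from a credential name to phrases that must all
# appear in the OCR text for that credential to be granted.
CREDENTIAL_KEYWORDS = {
    "biology_degree": ["bachelor of science", "biology"],
}


def normalize(text):
    """Collapse whitespace and lowercase so OCR line breaks do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def matched_credentials(ocr_text, holder_name):
    """Return the credentials supported by the OCR text for the named user."""
    text = normalize(ocr_text)
    # Require the profile name to appear on the document at all.
    if normalize(holder_name) not in text:
        return set()
    return {
        credential
        for credential, keywords in CREDENTIAL_KEYWORDS.items()
        if all(keyword in text for keyword in keywords)
    }
```

A plain substring match like this is easy to fool, so a production check would likely need stronger verification than this sketch provides.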

How We Built It

As a team, we divided the work into these main areas:

- Flutter Mobile App Development: Focused on creating an engaging and user-friendly interface.
- Next.js App & Auth0 API Integration: Handled frontend development and user login.
- MongoDB Data Storage & Access: Designed and implemented a scalable database architecture to efficiently store and retrieve labeled data.
- Roboflow & Modal: Our credential verification models, and any models a company uploads, are sourced from Roboflow and hosted on Modal.
- Figma: We went through several design iterations to finalize the interface and come up with features we have not seen on other platforms.

Each group built its components separately before integrating them into a cohesive system, ensuring the parts of the platform worked together.

Challenges We Ran Into

We encountered several technical and logistical hurdles throughout development. Some of our key challenges included:

- Outdated APIs: Many dependencies we initially planned to use were deprecated, forcing us to research and implement alternative solutions on the fly.
- ESLint issues: Midway through the hackathon, our web app broke due to conflicts with ESLint modules, requiring rapid debugging and fixes.
- UI: We faced significant challenges trying to optimize our UI for the best possible user experience. Oftentimes, we had to create our own custom components, making many tasks significantly more challenging than usual.
- Unfamiliar tech stack: Many of us were working with new technologies for the first time, leading to a steep learning curve. Breaking down complex tasks into manageable components helped us navigate this challenge efficiently.

Accomplishments That We're Proud Of

After more than 24 hours of coding, we successfully integrated a fully functional AI-powered data-labeling pipeline, optimizing efficiency and accuracy while ensuring a seamless user experience. To enhance engagement, we designed a dynamic gamification system, transforming tedious labeling tasks into an interactive and rewarding process. Real-time cash-out updates were implemented to keep users motivated, providing instant and transparent earnings. Despite significant technical challenges, we rapidly developed a working prototype within the tight constraints of a hackathon, demonstrating the power of innovation, agility, and problem-solving in high-pressure environments.

What We Learned

Throughout this project, we gained valuable experience in:

- Collaboration: Coordinating effectively across different teams to integrate multiple technologies.
- Problem-solving: Finding creative workarounds for deprecated APIs and other unexpected technical issues.
- Integration testing: Ensuring the mobile, web, and database components worked together.
- Client prioritization: Emphasizing a clean, engaging, and easy-to-use UI so that clients have a smooth and enjoyable experience.
- Working under pressure: Meeting tight deadlines while maintaining high-quality code and user experience.

What's Next for Summit

We have ambitious plans to enhance Summit's capabilities further, including:

- Launching: We want to get our application into the hands of users and customers as fast as possible.
- AI-powered fraud detection: Developing predictive features to flag suspicious labeling activity and ensure data integrity.
- Business-to-business features: Conducting customer interviews to learn which features we should add to our platform.

With these improvements, Summit aims to redefine how data labeling is approached, making it ethical, engaging, and highly effective for both contributors and companies.
