Overview

Machine learning has the potential to revolutionize many fields. However, that potential is severely limited by 1) the availability of data and 2) the availability of high-quality data. To address this problem, we created DataBees: a platform that crowdsources data collection to build high-quality, diverse datasets.

As part of our product, we set out to build two components: a web app where clients can request data, and a cross-platform mobile app through which users can contribute images and audio recordings to various datasets. We also planned to include manual validation (through a second round of crowdsourcing) and automated validation (through pre-trained models) to ensure data accuracy and fairness.

For the fall semester, we focused on building an MVP that, at a minimum, supports creating requests for audio and image datasets, contributing individual data points to those datasets, and viewing and downloading the collected data. We also wanted a proof of concept to demonstrate the validity of our platform. We met both goals: we used our MVP to create an audio dataset of radiology term pronunciations and used that dataset for data analysis. For the spring semester, we would like to integrate manual and automated validation into our platform, explore monetization ideas, publish our mobile app on the app store, and test our product at a larger scale.
