Originally, we wanted to build an emotion classifier that takes videos or photos as input. We had the tools for extracting features and building the models but, unfortunately, could not find a good open-source data set, which is of course a critical problem.

This led us to consider building a classifier for handwriting instead. We knew about the famous MNIST data set of handwritten digits, on which a huge amount of research has been done. It is a very clean and well-constructed data set, but it contains only digits, not other characters. After spending several more hours searching for a character data set, we realized that there was no really good one that would fit our needs.

Finally, we remembered a great TED Talk by Michael Nielsen (quantum physicist, science writer, and programmer) in which he shares his views on open-source data and crowd-sourced science. In the first couple of minutes, Dr. Nielsen refers to Timothy Gowers, who asked whether massively collaborative mathematics is possible. Dr. Gowers wanted to use his blog to attack a difficult unsolved mathematical problem. Inspired by this talk, and aware of the lack-of-data problem, we decided to build a website where we (meaning the whole public) can collectively build various kinds of data sets, so that everybody can upload or download data sets in a specified format. Of course, this is a long-term goal, but we believe that this resource will be of great value to a huge number of researchers interested in face, action, emotion, or handwriting recognition.

Potentially, this would become a huge network of scientists around the globe, not only for exchanging data but also for proposing solutions and sharing results and opinions on each other's models.

Every researcher knows that finding the data and preprocessing it takes more than half of the time. We live in a time of Big Data, and together we can build better solutions.

Finally, this website could potentially be used not just for solving various kinds of object or action recognition problems, but also for any other problem that requires a large amount of data, such as cancer or tumor detection.

We want to bring Machine Learning and Data Science to a new level, so let's collaborate and take this step together!
