Inspiration

Have you ever wondered what dinosaur is most similar to your pet? This question sparked our project, where we explored the links between modern pets and prehistoric dinosaurs using data science.

What it does

Prehistoric Pals uses an image classifier to find which dinosaur is the closest match to your pet, and it predicts when your pet would have lived if it were a dinosaur.

How we built it

We first gathered 2,500 dinosaur images of various types. We removed duplicate pictures as well as pictures that did not show the whole dinosaur. With the remaining images, we used a pre-trained VGG16 convolutional neural network (CNN) to build an image classifier for dinosaurs.
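As a rough illustration of the transfer-learning approach described above, here is a minimal Keras sketch. It is not our exact training code: the function name `build_dino_classifier`, the pooling and dropout head, the 224x224 input size, and the default of 15 classes (taken from the name of the image dataset listed under Data Sources) are all assumptions made for this example.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16


def build_dino_classifier(num_classes=15, weights="imagenet", image_size=(224, 224)):
    """Build a VGG16-based classifier with a frozen convolutional base.

    Hypothetical helper: the head layers and hyperparameters are
    illustrative assumptions, not the project's actual configuration.
    """
    # Load VGG16 without its ImageNet classification head.
    base = VGG16(weights=weights, include_top=False, input_shape=image_size + (3,))
    base.trainable = False  # keep the pre-trained filters fixed

    # Inputs are expected to be already passed through
    # keras.applications.vgg16.preprocess_input.
    inputs = keras.Input(shape=image_size + (3,))
    x = base(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer class labels
        metrics=["accuracy"],
    )
    return model
```

Freezing the base means only the small dense head is trained, which is a common choice when the dataset is as small as a few thousand images.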

We also gathered a dataset covering over 300 kinds of dinosaurs across the three periods of the Mesozoic era, with features such as species, diet, and location. We cleaned the data, for example by removing rows containing null values, and prepared it for our machine learning model. To do that, we mapped the countries the dinosaurs lived in to their broader continents, created a new column in our data frame that summarizes the years each dinosaur lived, and created dummy variables for the categorical values. Finally, we tried several algorithms to see which would produce the most accurate model. After testing them all, we chose random forest regression, which had the lowest test mean-squared error and whose non-parametric fit seemed to work well for our data.
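The preparation and model-comparison steps above could look roughly like the following pandas and scikit-learn sketch. The column names (`diet`, `lived_in`, `length_m`, `start_mya`, `end_mya`), the partial `COUNTRY_TO_CONTINENT` map, the use of a midpoint as the "summary" of the years, and the particular set of candidate models are assumptions for illustration, not the exact schema or models we used.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical, deliberately incomplete country-to-continent lookup.
COUNTRY_TO_CONTINENT = {
    "USA": "North America",
    "Canada": "North America",
    "China": "Asia",
    "Mongolia": "Asia",
    "Argentina": "South America",
}


def prepare(df):
    """Clean the raw frame and return (features, target)."""
    df = df.dropna().copy()  # drop rows with null values
    df["continent"] = df["lived_in"].map(COUNTRY_TO_CONTINENT)
    df = df.dropna(subset=["continent"])  # drop countries missing from the map
    # Assumed summary of the time range: midpoint in millions of years ago.
    df["mid_mya"] = (df["start_mya"] + df["end_mya"]) / 2
    # One dummy column per category; dtype=float keeps the matrix numeric.
    X = pd.get_dummies(
        df[["diet", "continent", "length_m"]],
        columns=["diet", "continent"],
        dtype=float,
    )
    return X, df["mid_mya"]


def compare_models(X, y, seed=0):
    """Fit each candidate and return its mean-squared error on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    candidates = {
        "linear": LinearRegression(),
        "knn": KNeighborsRegressor(n_neighbors=3),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = mean_squared_error(y_test, model.predict(X_test))
    return scores
```

Calling `min(scores, key=scores.get)` on the result picks the candidate with the lowest test mean-squared error, which is the criterion we used to select the final model.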

Challenges we ran into

We had to clean the datasets that we used. For example, we had to remove duplicate images before training our CNN model.

For our machine learning model, we struggled to find and prepare the data. We found lots of data, but we would have had to pay to use it, so the dataset we ended up using had few features. Of those, some (like dinosaur species) did not really correspond to anything about a pet.

Another challenge was figuring out how to present our work on the front end. Our group had to split up, with half of us working on machine learning while the other half focused on developing the UI for our webpage.

Accomplishments that we're proud of

Getting a functional and fast website for image processing. Building the graph of diet types vs times. Implementing different kinds of machine learning algorithms.

What we learned

Working with different kinds of machine learning algorithms from a practical perspective. Getting more comfortable with matplotlib graphing. Learning how to retrieve and clean data.

What's next for Dino Gym

There is lots of room for improvement in every aspect of the project. A simple but significant change would be to remove the backgrounds of input images for better matching. We would also like to improve the dataset our model was trained on by adding more feature information. Additionally, data on the approximate counts of each dinosaur species would likely improve our predictions.

Data Sources

https://www.kaggle.com/datasets/larserikrisholm/dinosaur-image-dataset-15-species
https://www.kaggle.com/datasets/kjanjua/jurassic-park-the-exhaustive-dinosaur-dataset/data
