We are building a recommender system that suggests businesses to users. To do this, we look at each user's preferences across business categories, which we assume depend on ratings and location. Because users are the target, the model predicts preferences for each individual user.

We looked at three features: category, ratings, and GPS location.

For Category, we wanted to group businesses of similar categories so that we could recommend them to people who often use the same types of businesses. We did this by taking Yelp's categories, parsing through each review, and labeling which group its business falls into. In the end, each business is in one of 22 types of businesses, ranging from Active Life to Food. We might recommend more Food businesses if a user tends to review many food places. Alternatively, we can suggest new Active Life businesses such as bowling if the user has tried surfing and skiing.
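As a rough illustration of the category feature, the sketch below turns a user's review history into a share per business group. The function name `category_profile`, the `business_to_group` mapping, and the sample IDs are hypothetical, made up for this example, and are not the project's actual code.

```python
from collections import Counter


def category_profile(reviewed_business_ids, business_to_group):
    """Return the share of a user's reviews that falls in each business group.

    business_to_group is an assumed lookup from business ID to one of the
    22 group labels (for example "Food" or "Active Life").
    """
    groups = [
        business_to_group[b]
        for b in reviewed_business_ids
        if b in business_to_group  # skip businesses with no known group
    ]
    counts = Counter(groups)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}


# Hypothetical sample data: three Food reviews and one Active Life review.
business_to_group = {"b1": "Food", "b2": "Food", "b3": "Active Life", "b4": "Food"}
profile = category_profile(["b1", "b2", "b3", "b4"], business_to_group)
top_group = max(profile, key=profile.get)
```

A profile like this could be used either to recommend more of the dominant group or to suggest unvisited businesses inside a group the user already likes.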

For Ratings, we wanted to scale each rating by the number of reviews behind it, to balance quality against quantity. Ideally, we would also use NLP to give more weight to well-written reviews that show more effort. We would also give more weight to people who review businesses consistently, since they are more experienced reviewers. We might also recommend places of similar popularity to users who tend to visit lesser-known businesses, since popular and crowded places might not appeal to all users.
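The write-up does not give the exact scaling formula. One common way to balance quality against quantity is a Bayesian-style average that pulls businesses with few reviews toward the overall mean. The sketch below uses that technique as an assumption; `weighted_rating` and `prior_weight` are hypothetical names and values.

```python
def weighted_rating(mean_stars, n_reviews, global_mean, prior_weight=10):
    """Shrink a business's mean rating toward the global mean.

    prior_weight is an assumed tuning knob: it acts like that many
    imaginary reviews at the global mean, so a business with only a
    handful of reviews cannot outrank a well-reviewed one by luck.
    """
    return (n_reviews * mean_stars + prior_weight * global_mean) / (
        n_reviews + prior_weight
    )


# A 5.0-star place with 2 reviews versus a 4.5-star place with 200 reviews.
few_reviews = weighted_rating(5.0, 2, global_mean=3.5)
many_reviews = weighted_rating(4.5, 200, global_mean=3.5)
```

With these sample numbers the heavily reviewed business ends up with the higher adjusted score, which is the intended effect of the scaling.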

For GPS, we gathered the location of each review a user made and averaged those locations into a single point. Based on this averaged point, we can suggest businesses close to the user's previous reviews. If we had more time, we would use a clustering algorithm to find hot spots and recommend around those instead. This way we can avoid making recommendations near a user's one-time travel destinations.
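The averaging step could look roughly like the sketch below. It assumes locations are `(latitude, longitude)` pairs in degrees and that a plain arithmetic mean is acceptable, which holds reasonably well when the reviews sit within one metro area. The helper names and the radius are hypothetical; distances use the standard haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0


def centroid(points):
    """Arithmetic mean of (lat, lon) pairs; a local approximation only."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearby(center, candidates, radius_km):
    """Return IDs of candidate businesses within radius_km of center."""
    return [
        business_id
        for business_id, location in candidates.items()
        if haversine_km(center, location) <= radius_km
    ]
```

A single averaged point is easily skewed by one distant trip, which is the weakness that clustering into hot spots is meant to address.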

Finally, we would build a classifier pipeline with all three of these features and validate it on a dataset that we set aside earlier. We would want to see that users eventually visited the places that the classifier pipeline recommended.
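One simple way to check this against the held-out data is a hit rate: the fraction of users for whom at least one recommended business appears among their held-out visits. This metric is an assumption for illustration, not something the project specifies, and `hit_rate` and its inputs are hypothetical.

```python
def hit_rate(recommendations, held_out_visits):
    """Fraction of held-out users with at least one recommended business visited.

    recommendations: dict of user ID -> list of recommended business IDs.
    held_out_visits: dict of user ID -> list of business IDs the user
    reviewed in the data set aside for validation.
    """
    # Only score users who actually have held-out activity.
    users = [u for u, visits in held_out_visits.items() if visits]
    if not users:
        return 0.0
    hits = sum(
        1
        for u in users
        if set(recommendations.get(u, [])) & set(held_out_visits[u])
    )
    return hits / len(users)


# Hypothetical example: u1's recommendations include a later visit, u2's do not.
recs = {"u1": ["b1", "b2"], "u2": ["b3"]}
held_out = {"u1": ["b2"], "u2": ["b9"]}
score = hit_rate(recs, held_out)
```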
