retAIn.hack

Inspiration

Generally speaking, the whole after-sales customer experience is broken. This covers many aspects, from delivery to returns, services and more. Another aspect is retargeting existing customers without annoying them with untargeted vouchers and newsletters or unnecessary ads. We aim to change this, and we built a foundation for better after-sales experiences: a personal inventory of purchased products.

What it does

Our solution extracts the purchase history along with all kinds of product information and categories. It then analyzes the data, derives statistics, and matches the purchase categories against the current page structure to derive the listing categories, which are in turn matched against our database. Based on these insights, we enable better retargeting and better suggestions.

How we built it

We received a large amount of sample data from idealo in two files: all purchases from April and part of May. Using the idealo API, we enriched the sample data sets with category information. We used Python, Power BI, R and Excel to analyze and enrich the data sets and to do machine learning modelling. We derived the HTML category trees by web scraping with R.
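To illustrate how a category path can be read from a page's breadcrumb, here is a minimal Python sketch. We did this step in R, so this is not our scraping code; it only shows the idea. The markup it expects (a container element with the class `breadcrumb`) and the helper name `extract_breadcrumb` are assumptions made for the example, not idealo's actual page structure.

```python
from html.parser import HTMLParser


class BreadcrumbParser(HTMLParser):
    """Collects the text of every node inside a hypothetical breadcrumb container."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside the breadcrumb container
        self.crumbs = []    # breadcrumb entries in document order

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif "breadcrumb" in (dict(attrs).get("class") or "").split():
            # Assumed marker class; a real page may use something else.
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text:
            self.crumbs.append(text)


def extract_breadcrumb(html):
    """Return the breadcrumb entries of an HTML page as a list of strings."""
    parser = BreadcrumbParser()
    parser.feed(html)
    return parser.crumbs
```

The sketch counts nesting depth by start and end tags, so it assumes the breadcrumb container holds no void elements such as `<img>`. A real scraper would more likely rely on a dedicated HTML library.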

Challenges we ran into

Unfortunately, the purchase data only contains an article SKU per purchase but no category information, so we decided on a two-step process to enrich it with full category information. Working on the whole data set would have meant around 120,000 API requests, one per article SKU. That alone would have been a huge challenge given the time available. We therefore reduced the purchase data to the buyers with at least 100 purchases each, which brought the number of unique article SKUs down from 120,000 to around 10,000.

First, we made around 10,000 API requests to get an atomic category label for each article SKU. Then we scraped the category pages of the idealo website. Using the breadcrumb information, we were able to get a full category path for each atomic category label.
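The reduction and the two-step enrichment can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code we ran during the hackathon. The field names (`buyer_id`, `sku`) and the two lookup dictionaries, which stand in for the API responses and the scraped breadcrumbs, are assumptions made for the example.

```python
from collections import Counter


def frequent_buyer_purchases(purchases, min_purchases=100):
    """Keep only the purchases of buyers with at least `min_purchases` purchases."""
    counts = Counter(p["buyer_id"] for p in purchases)
    return [p for p in purchases if counts[p["buyer_id"]] >= min_purchases]


def unique_skus(purchases):
    """SKUs that still need a category lookup (one API request each)."""
    return {p["sku"] for p in purchases}


def enrich(purchases, sku_to_label, label_to_path):
    """Step 1: SKU -> atomic category label. Step 2: label -> full category path.

    Missing lookups are kept as None so that coverage can be measured later.
    """
    enriched = []
    for p in purchases:
        label = sku_to_label.get(p["sku"])
        path = label_to_path.get(label) if label is not None else None
        enriched.append({**p, "category_label": label, "category_path": path})
    return enriched
```

Filtering by buyer first means the expensive per-SKU lookups only have to be made for `unique_skus(...)` of the reduced data set.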

Even with the scraped full category paths, we were only able to enrich around 56% of the purchase data. We would need to investigate further how to obtain the missing category paths.
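A coverage figure like this, and a starting point for investigating the gaps, could be computed along the following lines. This is a sketch under assumptions: the record fields `category_label` and `category_path` are hypothetical names for the enriched purchase data, and the function is not taken from our analysis.

```python
from collections import Counter


def coverage_report(purchases):
    """Return (share of purchases with a category path, labels lacking a path).

    The second value lists category labels without a path, ordered by how
    many purchases they affect, to show where further scraping would help most.
    """
    total = len(purchases)
    missing = [p for p in purchases if not p.get("category_path")]
    coverage = (total - len(missing)) / total if total else 0.0
    missing_labels = Counter(p.get("category_label") for p in missing)
    return coverage, missing_labels.most_common()
```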

Accomplishments that we're proud of

Having found a feasible workaround to get the full category information.

Having a clean data set now and starting some analysis.

Having (manually) created a retargeting matrix based on full category paths, so that retargeting can be fine-tuned to the categories of previous purchases.
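One way such a matrix could be represented and queried is sketched below. The matrix entries are invented placeholders, not the ones we created, and the fallback to parent categories is an assumption about how fine-grained and coarse rules might be combined.

```python
# Hypothetical entries: purchased category path -> category paths to suggest.
RETARGETING_MATRIX = {
    ("Electronics", "Cameras", "DSLR"): [
        ("Electronics", "Cameras", "Lenses"),
        ("Electronics", "Cameras", "Tripods"),
    ],
    ("Electronics", "Cameras"): [
        ("Electronics", "Memory Cards"),
    ],
}


def suggest_categories(purchased_path, matrix=RETARGETING_MATRIX):
    """Look up suggestions for the most specific matching category path.

    If the full path has no entry, the last element is dropped and the
    parent category is tried, until a match is found or the path is empty.
    """
    path = tuple(purchased_path)
    while path:
        if path in matrix:
            return matrix[path]
        path = path[:-1]
    return []
```

With these placeholder entries, a purchase in `("Electronics", "Cameras", "Compact")` has no rule of its own and falls back to the suggestions for `("Electronics", "Cameras")`.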

What we learned

No data, no AI. As is almost always the case, don't expect clean, tidy data, even from big data providers. As is often said, data preprocessing and munging take up 80% or more of your time before you actually get to the "sexy" parts of machine learning modelling.

Also, we learned that focusing on one thing is key and that building the foundation for the planned solution will take way longer than expected.

What's next for retAIn.hack

Combine the data set with other external data -> combine with image recognition -> match with personal products bought before -> integrate AR for easy user interaction -> connect with a personal assistant for service issues

Built With

idealo
python
r
webscraping

Updates

Sebastian Daus started this project — May 26, 2018 01:51 PM EDT
