Inspiration

Companies constantly collect the email addresses of potential customers through their websites. However, managing and researching these leads is hard and time-consuming. Recent AI algorithms, combined with the large amount of unstructured data available on the web, can make this task a breeze.

What it does

We use different data sources (a company's website content and screenshots, Wikipedia articles, ...) to train a model that classifies a company into the German WZ classes (Klassifikation der Wirtschaftszweige, the German classification of economic activities).

How we built it

We worked step by step, first getting familiar with the data and then with the technologies we used.

Challenges we ran into

The initial data set was quite challenging because of missing websites and an imbalanced class distribution. Also, handling huge amounts of data on local machines is difficult, while setting up cloud services takes time.

Accomplishments that we're proud of

We used a broad set of features, engineered a late-fusion step that combines the predictions of the individual per-source models, and trained our own models.
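A late fusion like this can be pictured as follows. This is a minimal sketch, not the team's actual implementation: it assumes each data source has its own model that outputs a probability for every WZ class, and it fuses those outputs with a weighted average. The source names, weights, and class labels are hypothetical. Sources without a prediction (for example, a company with no website) are skipped, which is one way to cope with the missing data mentioned under the challenges.

```python
from typing import Dict, List, Optional


def late_fusion(
    source_probs: Dict[str, Optional[List[float]]],
    source_weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-source class probabilities.

    Sources whose prediction is None (e.g. no website available) are skipped.
    Weights are hypothetical; a source without an explicit weight counts as 1.0.
    """
    fused: Optional[List[float]] = None
    total_weight = 0.0
    for source, probs in source_probs.items():
        if probs is None:  # this source has no data for the company
            continue
        weight = source_weights.get(source, 1.0)
        if fused is None:
            fused = [0.0] * len(probs)
        if len(probs) != len(fused):
            raise ValueError(f"source {source!r} predicts a different number of classes")
        for i, p in enumerate(probs):
            fused[i] += weight * p
        total_weight += weight
    if fused is None or total_weight == 0.0:
        raise ValueError("no source produced a prediction")
    # Normalise so the fused scores again form a probability distribution.
    return [value / total_weight for value in fused]


def predict_class(fused: List[float], class_labels: List[str]) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(fused)), key=fused.__getitem__)
    return class_labels[best]


if __name__ == "__main__":
    # Hypothetical per-source outputs over three placeholder classes.
    labels = ["manufacturing", "retail", "software"]
    probs = {
        "website_text": [0.2, 0.1, 0.7],
        "screenshot": [0.3, 0.3, 0.4],
        "wikipedia": None,  # no Wikipedia article found for this company
    }
    weights = {"website_text": 2.0, "screenshot": 1.0, "wikipedia": 1.5}
    fused = late_fusion(probs, weights)
    print(fused, predict_class(fused, labels))
```

Weighted averaging is only one possible fusion rule; multiplying the per-source probabilities or training a small meta-classifier on the per-source outputs (stacking) are common alternatives.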

What we learned

Training and especially tweaking machine learning models takes time ;)

What's next for leadif.ai

New data sources (Wikipedia metadata, subpages, entities), better use of website screenshots, predicting/listing competing companies, and more.
