Inspiration
Companies constantly collect email addresses of potential customers through their websites. However, managing and researching these leads is hard and time-consuming. Modern AI algorithms combined with the large amount of unstructured data on the web can make this task a breeze.
What it does
We use different data sources (a company's website content and screenshots, Wikipedia articles, ...) to train a model that classifies a company into the German WZ classes (Klassifikation der Wirtschaftszweige, the German classification of economic activities).
How we built it
We proceeded step by step, getting familiar first with the data and then with the technology we used.
Challenges we ran into
The initial data set was quite challenging due to missing websites and a non-uniform class distribution. Also, handling huge amounts of data on local machines is difficult, while setting up cloud services takes time.
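The write-up does not say how the uneven class distribution was handled. One common mitigation, shown here as an assumption rather than the team's method, is to weight each class inversely to its frequency during training. The helper `balanced_class_weights` and the toy labels are hypothetical; the formula matches the "balanced" heuristic used by scikit-learn.

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class).

    Rare classes get weights above 1 and frequent classes below 1, so every
    observed class contributes the same total weight to the loss.
    Classes that never occur in `labels` get no entry.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in sorted(counts.items())
    }


# Hypothetical, skewed label set: class 0 is three times as common as 1 or 2.
weights = balanced_class_weights([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
print(weights)
```

In Keras, a dict like this can be passed as the `class_weight` argument of `Model.fit`, assuming the labels are integer class indices.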
Accomplishments that we're proud of
We used a broad set of features, engineered a late-fusion approach for prediction, and trained our own models.
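The write-up does not describe the fusion rule itself. A minimal sketch of late fusion, assuming each source-specific model outputs a probability vector over the same WZ classes, is a weighted average of those vectors followed by an argmax. The source names, weights, and the three-section example (C, G, J) are hypothetical. Because only the available sources are passed in, a company without a website simply contributes no website entry.

```python
from typing import List, Mapping, Optional, Sequence


def late_fusion(
    source_probs: Mapping[str, Sequence[float]],
    weights: Optional[Mapping[str, float]] = None,
) -> List[float]:
    """Fuse per-source class probabilities by a weighted average.

    Only sources that are actually available should be passed in.
    Sources missing from `weights` get a weight of 1.0.
    """
    if not source_probs:
        raise ValueError("at least one source is required")
    weights = weights or {}
    n_classes = len(next(iter(source_probs.values())))
    fused = [0.0] * n_classes
    total_weight = 0.0
    for name, probs in source_probs.items():
        if len(probs) != n_classes:
            raise ValueError(
                f"source {name!r} has {len(probs)} classes, expected {n_classes}"
            )
        w = weights.get(name, 1.0)
        total_weight += w
        for i, p in enumerate(probs):
            fused[i] += w * p
    if total_weight <= 0:
        raise ValueError("source weights must sum to a positive value")
    # Normalizing by the total weight keeps the result a probability vector.
    return [value / total_weight for value in fused]


def predict_class(fused: Sequence[float], class_labels: Sequence[str]) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return class_labels[best]


# Hypothetical example: three sources, website text weighted twice as much.
fused = late_fusion(
    {
        "website_text": [0.2, 0.1, 0.7],
        "screenshot": [0.3, 0.3, 0.4],
        "wikipedia": [0.1, 0.1, 0.8],
    },
    weights={"website_text": 2.0},
)
print(predict_class(fused, ["C", "G", "J"]))  # prints "J"
```

A weighted average is only one fusion rule; the weights could instead be tuned on a validation set, or a small meta-classifier could be trained on the stacked per-source outputs.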
What we learned
Training and especially tweaking machine learning models takes time ;)
What's next for leadif.ai
New data sources (Wikipedia metadata, subpages, entities), better use of website screenshots, predicting/listing competing companies, and more.
Built With
- azure
- css
- html
- ibm-watson
- javascript
- jupyter-notebook
- keras
- python
- tensorflow