Newline Group Case Study
Using Machine Learning to classify Risk
- Estafet Team
The Story
Estafet helped an insurer speed up risk pricing by using machine learning to analyse historical claims data. Starting with input from pricing actuaries, we trained the machine learning model to find correlations between claims and key factors such as age, hospital type, and service (e.g. cardiac vs podiatry), and then processed the narrative of historic claims with natural language processing to find other relevant information such as the condition, treatment and outcome. We used this to place each hospital into a risk category (along with a measure of confidence in that classification) to help actuaries set premiums. Because we could correctly classify the “easier” 80% of cases instantly, we freed up the actuaries to work on the edge cases, increasing their productivity five-fold.

Challenges
- PRICING ON PARTIAL INFORMATION
- RISK OF MISSING KEY RISK FACTORS
- EXPENSIVE STAFF WASTING TIME ON WORK THAT COULD BE DONE BY MACHINE

Machine Learning to mine hidden information
Estafet’s approach combined domain expertise, data processing, and machine learning for effective, efficient risk pricing. This included:
- Understanding the Data: Pricing actuaries identified important data patterns, such as higher claims for children, severity variations between public and private hospitals, and the impact of medical service type.
- Data Preparation and Quality: We consolidated, translated and standardised data from various sources, and used Natural Language Processing (NLP) techniques such as lemmatization, stemming, and n-grams to find key information within the text.
- Iterative Experimentation with Machine Learning: We built frameworks to enable rapid testing of different ML techniques for specific fields, so that we could verify results and improve our models. Each iteration built on the previous one until we reached the accuracy we needed.
- Productivity Gains: The goal was never to replace actuaries but to automate easy cases (80%) so actuaries could focus on complex ones. This increased their productivity as they only needed to consider 20% of cases.
- Adaptability Across Domains: This approach and framework are adaptable, applying across diverse domains where large datasets can drive valuable business insights.
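
To illustrate the n-gram idea mentioned above, here is a minimal sketch in Python. It is not the project's code: the function names and the phrase list are hypothetical, and a production pipeline would normally apply stemming or lemmatization from an NLP library before matching. The sketch splits a claim narrative into words, builds two-word n-grams (bigrams), and reports which known key phrases appear.

```python
import re

# Hypothetical key phrases, of the kind an actuary might flag as significant.
KEY_PHRASES = {"brain haemorrhage", "secondary infection"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def find_key_phrases(narrative, phrases=KEY_PHRASES):
    """Return the two-word key phrases found in a claim narrative, sorted."""
    bigrams = set(ngrams(tokenize(narrative), 2))
    return sorted(bigrams & set(phrases))

if __name__ == "__main__":
    # Invented example narrative, for illustration only.
    narrative = "Patient developed a secondary infection following surgery."
    print(find_key_phrases(narrative))
```

Matching on n-grams rather than single words keeps phrases such as “secondary infection” intact, which matters because neither word carries the same meaning alone.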


The Solution
We knew the solution required a robust, data-driven framework, but the work was in preparing the various datasets and then integrating actuarial knowledge to make our results meaningful. Here’s what we did:
- Domain Expertise Integration: Each day, we had a focussed session with pricing actuaries to understand risk factors and data correlations that influence claim costs, such as age, treatment type, and hospital settings. This allowed us to guide the NLP towards key words and phrases, such as “brain haemorrhage” and “secondary infection”, in the body of the text describing the claim.
- Data Quality and Preparation: We took data from multiple sources and transformed it into a single, canonical form. In addition to the NLP techniques, which helped uncover new or more detailed information, we were also able to fill in missing information from the claim (e.g. age), making records more useful.
- Flexible Machine Learning Frameworks: Experimentation is a key part of the process. We developed a flexible framework to allow rapid prototyping so we could trial different models for different data fields, such as regression models for age-based claims and NLP models for analysing which medical services were involved. Feedback from each iteration improved accuracy, often in unexpected ways.
- Automated Processing to Assist Actuaries: When the ML model assigned a risk category to a hospital, it also calculated the confidence in that decision. This helped us find the cut-off below which expert actuaries needed to review individual cases. Highlighting the hospitals where there was uncertainty (and automating the classification of those where there was confidence) focussed the actuaries’ attention where it was needed most.
- Scalability Across Use Cases: The framework’s adaptability allows Estafet’s solution to be applied to other data-heavy fields, such as IT monitoring or IoT, giving actionable insights for CEOs and CTOs.
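
The confidence cut-off described above can be sketched as follows. This is an illustration under stated assumptions rather than the delivered implementation: the `Classification` record, the `route` function, the 0.9 cut-off and the sample hospitals are all hypothetical. The real cut-off was found by experimentation.

```python
from dataclasses import dataclass

@dataclass
class Classification:
    hospital: str
    risk_category: str
    confidence: float  # model confidence, from 0.0 to 1.0

def route(classifications, cutoff=0.9):
    """Split results into those accepted automatically and those sent for review.

    The cutoff value is an assumption made for this example.
    """
    automated, review = [], []
    for c in classifications:
        if c.confidence >= cutoff:
            automated.append(c)
        else:
            review.append(c)
    return automated, review

if __name__ == "__main__":
    # Invented sample data, for illustration only.
    results = [
        Classification("Hospital A", "low", 0.97),
        Classification("Hospital B", "high", 0.62),
        Classification("Hospital C", "medium", 0.91),
    ]
    automated, review = route(results)
    print("Automated:", [c.hospital for c in automated])
    print("For actuary review:", [c.hospital for c in review])
```

Raising the cut-off sends more hospitals to the actuaries and lowers the chance of an automated misclassification; lowering it does the reverse.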
Deliverables
- ML PLATFORM TO IMPORT AND ANALYSE MULTIPLE DATA SETS
- FAST AND AUTOMATED CLASSIFICATION OF RISK
- FOCUSSED SET OF CASES WHICH REQUIRE URGENT ATTENTION
The Success
The project delivered a transformative leap in efficiency and accuracy. By collaborating with Estafet, Newline dramatically accelerated insurance risk pricing. Automating 80% of routine cases freed actuaries to concentrate on complex cases, increasing their productivity. The consolidated data and tailored machine learning models provided reliable, actionable insights, allowing faster and better-informed pricing decisions. Looking to the longer term, Estafet’s flexible framework means that Newline can move beyond the initial data on hospitals to examine other untapped reserves of business intelligence.
Outcomes
- REDUCED EXPOSURE TO INSURANCE RISK
- BETTER USE OF EXPERT STAFF
- OPENED OPPORTUNITIES TO MINE COMPANY DATA TO GROW BUSINESS
