Predict health insurance costs using Linear Regression on demographic and health features. This project showcases end-to-end machine learning, from data cleaning and feature engineering to visualization and error analysis. Interaction terms capture combined effects like smoking × BMI and smoking × age.
The model predicts insurance charges from age, BMI, smoking status, and interaction features. Linear regression quantifies the relationship between each predictor and charges. The project also includes visualizations, correlation analysis, and detailed error metrics to evaluate performance.
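The interaction terms described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`age`, `bmi`, `smoker`, `charges`), the helper `add_interactions`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def add_interactions(df):
    """Return a copy of df with smoking x BMI and smoking x age columns.

    Hypothetical helper; assumes columns 'age', 'bmi', and 'smoker' ('yes'/'no').
    """
    out = df.copy()
    out["smoker_flag"] = (out["smoker"] == "yes").astype(int)
    out["smoker_bmi"] = out["smoker_flag"] * out["bmi"]
    out["smoker_age"] = out["smoker_flag"] * out["age"]
    return out


# Synthetic stand-in data (not the real dataset), built so that BMI matters
# much more for smokers than for non-smokers.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "bmi": rng.normal(30, 5, n),
    "smoker": rng.choice(["yes", "no"], n, p=[0.3, 0.7]),
})
flag = (toy["smoker"] == "yes").astype(int)
toy["charges"] = (
    2000 + 250 * toy["age"] + 100 * toy["bmi"]
    + 1000 * flag * toy["bmi"] + rng.normal(0, 500, n)
)

features = ["age", "bmi", "smoker_flag", "smoker_bmi", "smoker_age"]
X = add_interactions(toy)[features]
model = LinearRegression().fit(X, toy["charges"])

# Map each feature name to its fitted coefficient for inspection.
coefs = dict(zip(features, model.coef_))
r2_train = model.score(X, toy["charges"])
```

With the interaction columns present, the coefficient on `smoker_bmi` estimates how much more each BMI point costs a smoker than a non-smoker, which a plain additive model cannot express.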
Linear Regression Model
Feature Engineering: interaction terms for combined effects.
Data Cleaning: handles missing values and encodes categorical variables.
Exploratory Data Analysis (EDA): scatter plots, boxplots, histograms, correlations.
Error Analysis: MAE, RMSE, R², and percent error across age groups and smoker status.
Visualizations: actual vs. predicted insurance charges.
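The error analysis listed above might look something like the sketch below. It assumes a results table with hypothetical columns `actual`, `predicted`, `age`, and `smoker`, takes "percent error" to mean the mean absolute percentage error, and picks illustrative age bins; the project may define any of these differently.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def error_summary(y_true, y_pred):
    """Overall MAE, RMSE, R^2, and mean absolute percent error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "r2": r2_score(y_true, y_pred),
        "mean_pct_error": float(np.mean(np.abs(y_true - y_pred) / y_true) * 100),
    }


def grouped_pct_error(df, by):
    """Mean absolute percent error per group of column `by`."""
    pct = (df["actual"] - df["predicted"]).abs() / df["actual"] * 100
    return pct.groupby(df[by], observed=True).mean()


# Tiny made-up results table, for illustration only.
results = pd.DataFrame({
    "age": [22, 35, 47, 58],
    "smoker": ["no", "yes", "no", "yes"],
    "actual": [2000.0, 20000.0, 8000.0, 40000.0],
    "predicted": [2500.0, 18000.0, 8000.0, 44000.0],
})
# Age bins are an assumption chosen for the example.
results["age_group"] = pd.cut(
    results["age"], bins=[17, 30, 45, 65], labels=["18-30", "31-45", "46-65"]
)

summary = error_summary(results["actual"], results["predicted"])
by_smoker = grouped_pct_error(results, "smoker")
by_age = grouped_pct_error(results, "age_group")
```

Breaking percent error out by group shows whether the model is systematically worse for one segment, such as smokers or older policyholders, even when the overall metrics look acceptable.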
Python
Pandas & NumPy
Matplotlib & Seaborn
scikit-learn