This project involves analyzing the famous Titanic dataset to predict passenger survival using various machine learning models. The goal was to achieve the highest prediction accuracy possible.
- Project Overview
- Dataset
- Exploratory Data Analysis (EDA)
- Feature Engineering
- Model Building
- Model Evaluation
- Kaggle Submission
- Conclusion
- How to Run the Project
- Acknowledgments
- License
The Titanic Survival Analysis project aimed to predict whether a passenger survived the disaster using features such as age, gender, ticket class, and other attributes. The analysis includes EDA, feature engineering, model building, and evaluation.
The Titanic dataset used for this project contains the following key features:
- PassengerId: Unique identifier for each passenger
- Pclass: Ticket class (1st, 2nd, or 3rd)
- Sex: Gender of the passenger
- Age: Age of the passenger
- SibSp: Number of siblings/spouses aboard
- Parch: Number of parents/children aboard
- Ticket: Ticket number
- Fare: Passenger fare
- Cabin: Cabin number (if available)
- Embarked: Port of embarkation (C, Q, S)
The test set lacks the Survived column, requiring the model to predict survival outcomes.
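As a quick sketch, the expected schema can be checked after loading the CSVs. The tiny stand-in frames below are illustrative only; in the notebook the data would come from `pd.read_csv("train.csv")` and `pd.read_csv("test.csv")` (the standard Kaggle file names).

```python
import pandas as pd

# Feature columns listed above; train.csv additionally contains "Survived".
FEATURES = ["PassengerId", "Pclass", "Sex", "Age", "SibSp",
            "Parch", "Ticket", "Fare", "Cabin", "Embarked"]

def check_schema(df: pd.DataFrame, is_train: bool) -> bool:
    """Return True if df has the expected Titanic columns."""
    expected = set(FEATURES) | ({"Survived"} if is_train else set())
    return expected.issubset(df.columns)

# Tiny stand-in rows (not the real data).
train = pd.DataFrame([[1, 3, "male", 22.0, 1, 0, "A/5 21171", 7.25, None, "S", 0]],
                     columns=FEATURES + ["Survived"])
test = pd.DataFrame([[892, 3, "male", 34.5, 0, 0, "330911", 7.83, None, "Q"]],
                    columns=FEATURES)
```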
Key insights gained from EDA:
- Passengers in higher ticket classes had a markedly higher survival rate.
- Women had a significantly higher survival rate compared to men.
- Passengers with lower ticket fares had lower survival rates.
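These patterns can be reproduced with a simple `groupby`. The toy frame below mimics the relevant columns; it is not the real data, so the exact rates differ from the notebook's.

```python
import pandas as pd

# Toy sample mimicking the Titanic columns (not the real data).
df = pd.DataFrame({
    "Sex":      ["female", "female", "male", "male", "male", "female"],
    "Pclass":   [1, 2, 3, 1, 3, 3],
    "Fare":     [80.0, 26.0, 7.9, 52.0, 8.1, 7.8],
    "Survived": [1, 1, 0, 0, 0, 1],
})

# Survival rate by gender and by ticket class.
by_sex = df.groupby("Sex")["Survived"].mean()
by_class = df.groupby("Pclass")["Survived"].mean()
print(by_sex)
print(by_class)
```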
The following transformations and feature engineering techniques were applied:
- Handling missing values in columns like `Age`, `Cabin`, and `Embarked`.
- Creating new features based on ticket class and family size.
- Encoding categorical variables.
- Normalizing numerical features to improve model performance.
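The steps above can be sketched on a toy frame. The specific choices here (median imputation, a `FamilySize` column, min-max scaling) are common options and assumptions for illustration; the notebook's exact transformations may differ.

```python
import pandas as pd

# Toy frame with the columns the steps above touch (not the real data).
df = pd.DataFrame({
    "Age":      [22.0, None, 38.0],
    "Embarked": ["S", None, "C"],
    "Sex":      ["male", "female", "female"],
    "SibSp":    [1, 0, 1],
    "Parch":    [0, 0, 2],
    "Fare":     [7.25, 8.05, 71.28],
})

# Handle missing values: median for Age, mode for Embarked.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# New feature: family size = siblings/spouses + parents/children + self.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# Encode categorical variables.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df = pd.get_dummies(df, columns=["Embarked"], prefix="Embarked")

# Normalize a numerical feature (min-max scaling as one option).
df["Fare"] = (df["Fare"] - df["Fare"].min()) / (df["Fare"].max() - df["Fare"].min())
```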
Various machine learning models were tested, including:
- Logistic Regression
- Random Forest
- XGBoost
- CatBoost
- MLP Classifier
Hyperparameter tuning was performed to optimize model performance. The Random Forest Classifier (after resampling) provided the most accurate and balanced predictions.
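A minimal sketch of the tuning workflow, using `GridSearchCV` over a Random Forest. The synthetic features, labels, and the small parameter grid are placeholders; the notebook's actual search space and resampling step are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in features/labels (the notebook uses the engineered Titanic features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Small illustrative grid; the real search space may differ.
param_grid = {"n_estimators": [100, 200], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```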
The best-performing model achieved an accuracy score of 0.77 on Kaggle. The Random Forest Classifier (Resampled) was finalized as the best model based on evaluation metrics.
- Metrics table for comparison (`Accuracy`, `Precision`, and `Recall`)
- Confusion matrix
- ROC-AUC curve for the best-performing model
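These metrics come straight from `sklearn.metrics`; the labels and predictions below are hypothetical, just to show the calls.

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, confusion_matrix)

# Hypothetical ground truth and predictions (not the project's results).
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

acc = accuracy_score(y_true, y_pred)    # fraction of correct predictions
prec = precision_score(y_true, y_pred)  # TP / (TP + FP)
rec = recall_score(y_true, y_pred)      # TP / (TP + FN)
cm = confusion_matrix(y_true, y_pred)   # [[TN, FP], [FN, TP]]
print(acc, prec, rec)
print(cm)
```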
A final submission was made to Kaggle, achieving a prediction score of 0.77.
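The submission file Kaggle expects for this competition has exactly two columns, `PassengerId` and `Survived`. A minimal sketch (the IDs and predictions here are placeholders; real IDs come from `test.csv`):

```python
import io

import pandas as pd

# Hypothetical predictions for three test-set passengers.
passenger_ids = [892, 893, 894]
predictions = [0, 1, 0]

submission = pd.DataFrame({"PassengerId": passenger_ids, "Survived": predictions})

# Written to an in-memory buffer here; the notebook would use
# submission.to_csv("submission.csv", index=False).
buf = io.StringIO()
submission.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```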
- Effective feature engineering and hyperparameter tuning significantly contributed to the model's performance.
- The test-set prediction score of 0.77 reflects a solid, though improvable, predictive model.
- Further improvements could involve advanced ensemble methods.
- Clone this repository:
git clone <repository-url>
- Navigate to the project directory:
cd Titanic_Survival_Analysis
- Install the required dependencies:
pip install -r requirements.txt
- Run the Jupyter Notebook:
jupyter notebook Titanic_Survival_Analysis.ipynb
App Link: Click Here
- Kaggle for providing the Titanic dataset.
- Data Science Community for continuous learning and support.
- This project is licensed under the MIT License. See the LICENSE file for more information.


