Inspiration

Mental health has always been a topic close to my heart. During my time as a student, I watched peers silently struggle, showing up to class exhausted, disengaged, and overwhelmed, while no one around them had any idea how serious things had become.

The statistics are staggering: 1 in 5 students experiences a diagnosable mental health condition, yet fewer than 25% of them ever seek professional help. The gap isn't always resources; it's awareness and timing. One question stuck with me: what if we could detect the warning signs before the crisis hits?

When I saw this hackathon, I knew exactly what I wanted to build. Not another chatbot. Not another mood tracker. But a real, data-driven early warning system that gives counselors actionable intelligence weeks before a student reaches a breaking point.

What I Learned

This project taught me more than I expected, both technically and on a human level.

Technically:

- How to merge and clean real-world datasets with inconsistent formats, missing values, and encoding issues
- How to engineer a meaningful target variable from raw behavioral data using weighted composite scoring
- How deeply XGBoost understands tabular data (the feature importance results genuinely surprised me)
- How CSS injection in Streamlit can transform a basic Python script into something that looks and feels like a real product

About the problem:

- Depression and suicidal thoughts are by far the strongest predictors of risk, confirming what mental health researchers have known for years
- Sleep quality and financial stress are underrated indicators that institutions rarely track
- Early intervention doesn't require expensive tools; it requires the right data, organized correctly

How I Built It

The project was built in three distinct phases:

Phase 1: Data

I started with two small datasets totaling just 621 rows, far too small for a reliable model. After researching publicly available mental health datasets on Kaggle, I found the Student Depression Dataset with 27,870 records. The challenge was that all three datasets had completely different structures, column names, and encodings. I spent significant time:

- Standardizing column names and formats
- Converting CGPA ranges like "3.00 - 3.49" into numeric midpoints
- Mapping ordinal categories like sleep duration into numerical scales
- Engineering the Risk_Level target variable using a weighted formula:

Risk Score = 4.0 × Depression + 3.5 × Suicidal Thoughts + 1.5 × Academic Pressure + 1.2 × Financial Stress

The final merged dataset had 27,971 rows, 13 columns, and zero null values.
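The preprocessing steps and the scoring formula above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the sleep-duration labels and their numeric values, and the input scales (0/1 for depression and suicidal thoughts, 1 to 5 for pressure and stress) are all assumptions. Only the four weights come from the formula itself.

```python
import re

# Hypothetical ordinal scale for sleep duration. The category labels and the
# numeric values are assumptions, not the project's actual mapping.
SLEEP_SCALE = {
    "Less than 5 hours": 1,
    "5-6 hours": 2,
    "7-8 hours": 3,
    "More than 8 hours": 4,
}

# Weights taken from the Risk Score formula in the write-up.
RISK_WEIGHTS = {
    "depression": 4.0,
    "suicidal_thoughts": 3.5,
    "academic_pressure": 1.5,
    "financial_stress": 1.2,
}


def cgpa_midpoint(value: str) -> float:
    """Convert a CGPA range such as '3.00 - 3.49' to its numeric midpoint."""
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", value)]
    if not numbers:
        raise ValueError(f"no numeric CGPA found in {value!r}")
    # A single value is its own midpoint; a range uses its two endpoints.
    return (numbers[0] + numbers[-1]) / 2


def risk_score(record: dict) -> float:
    """Weighted sum of the four indicators used in the Risk Score formula."""
    return sum(weight * float(record[name]) for name, weight in RISK_WEIGHTS.items())


# Toy record with assumed scales: 0/1 flags and 1-5 ratings.
example = {
    "depression": 1,
    "suicidal_thoughts": 1,
    "academic_pressure": 3,
    "financial_stress": 2,
}
example_score = risk_score(example)
example_cgpa = cgpa_midpoint("3.00 - 3.49")
```

Keeping the weights in one dictionary makes it easy to adjust the formula during iteration without touching the scoring function.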

Phase 2: Model

I chose XGBoost for its proven performance on tabular data. After tuning hyperparameters (learning rate, max depth, subsample ratio), the model achieved:

- Accuracy = 97.7%
- F1 = 0.977
- ROC-AUC = 0.999

The most revealing insight came from feature importance. Depression had an importance score of 0.549, more than all other features combined. This validated the model's logic and gave me confidence in its real-world reliability.
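As a rough illustration of what the accuracy and F1 figures measure, here is a dependency-free sketch. The write-up does not say which F1 averaging was used, so the support-weighted average below is an assumption, and the labels are toy data rather than project results.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = count - tp
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is positive because
        # every label in `support` occurs at least once in y_true.
        f1 = 2 * tp / (2 * tp + fp + fn)
        total += f1 * count
    return total / len(y_true)


# Toy labels for the three risk classes (not project data).
y_true = ["Low", "Low", "Medium", "Medium", "High", "High"]
y_pred = ["Low", "Low", "Medium", "High", "High", "High"]

toy_accuracy = accuracy(y_true, y_pred)
toy_f1 = weighted_f1(y_true, y_pred)
```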

Phase 3: Web Application

I built the entire interface in Streamlit with custom CSS injection. The goal was to make it feel like a real clinical tool, not a student project. Every card, color, font, and layout decision was intentional:

- Dark theme: reduces eye strain for counselors using it daily
- Color-coded risk results: green, amber, red for instant recognition
- Probability breakdown bars: transparency into why a student is flagged
- Three-page structure: Overview for management, Predictor for counselors, Model Performance for technical reviewers
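The color-coded probability bars described above could be produced by a small helper that builds an HTML string. This is only a sketch of the idea: the class names, hex colors, and function name are hypothetical, and the app's real markup may look quite different. The helper is plain Python so it can be tested without Streamlit.

```python
# Color per risk class, following the green/amber/red scheme described above.
# The hex values are illustrative assumptions.
RISK_COLORS = {"Low": "#2ecc71", "Medium": "#f39c12", "High": "#e74c3c"}


def probability_bars_html(probabilities: dict) -> str:
    """Build one HTML row per class, with bar width proportional to probability."""
    rows = []
    for label, prob in probabilities.items():
        percent = round(prob * 100, 1)
        color = RISK_COLORS.get(label, "#888888")  # grey fallback for unknown labels
        rows.append(
            f'<div class="risk-row">'
            f'<span class="risk-label">{label}</span>'
            f'<div class="risk-bar" style="width:{percent}%;background:{color};"></div>'
            f'<span class="risk-value">{percent}%</span>'
            f"</div>"
        )
    return "\n".join(rows)


# Toy probabilities for one student (not project output).
html = probability_bars_html({"Low": 0.08, "Medium": 0.21, "High": 0.71})

# In a Streamlit app, this string could be rendered with:
#   st.markdown(html, unsafe_allow_html=True)
```

The same `st.markdown(..., unsafe_allow_html=True)` call is the usual way to inject a `<style>` block, which is how the hypothetical `risk-row` and `risk-bar` classes would get their styling.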

Challenges

  1. Small Initial Dataset: The first two datasets combined had only 101 usable rows after merging, completely insufficient for XGBoost. Finding, validating, and integrating a third dataset while maintaining column consistency was the most time-consuming part of the project.
  2. Engineering the Target Variable: There was no pre-existing "risk level" column. I had to design a composite scoring formula that was medically logical, mathematically sound, and balanced across classes. Getting the bin thresholds right so the three classes were reasonably distributed took multiple iterations.
  3. Making Streamlit Look Professional: Default Streamlit looks generic. Achieving the polished dark UI required deep CSS injection: overriding Streamlit's internal component classes, building custom HTML card components, and paying careful attention to typography and spacing. It was tedious but worth every line.
  4. Keeping It Simple Without Losing Depth: The hardest design decision was knowing what to leave out. Early versions had too many charts, too many tabs, and too much complexity. Stripping it back to three clean pages, while keeping all the technical depth judges look for, required multiple rounds of simplification.
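For the target-variable challenge in item 2, one possible way to get three reasonably balanced classes is to cut the risk scores at their terciles. The write-up only says the thresholds took several iterations, so this approach is an assumption rather than the method actually used, and the function names and scores below are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def tercile_thresholds(scores):
    """Return two cut points that split the scores into three similar-sized groups."""
    low_cut, high_cut = quantiles(scores, n=3)
    return low_cut, high_cut


def risk_level(score, low_cut, high_cut):
    """Map a numeric risk score to one of three classes."""
    if score <= low_cut:
        return "Low"
    if score <= high_cut:
        return "Medium"
    return "High"


# Toy risk scores (not project data).
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
low_cut, high_cut = tercile_thresholds(scores)
class_counts = Counter(risk_level(s, low_cut, high_cut) for s in scores)
```

With real data the split is rarely this even: a composite score built from a few discrete inputs has many ties, so the class sizes should be checked after binning and the cut points adjusted if one class dominates.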

What's Next

MindTrack in its current form is a strong proof of concept. Given more time, I would:

- Integrate a real institutional database for live student data ingestion
- Add a counselor dashboard with student tracking over time
- Build a mobile-friendly version for students to self-assess
- Explore federated learning so institutions can train on private data without sharing it
- Partner with universities to validate predictions against real counseling outcomes

Final Thought

"The best time to help a struggling student was two weeks ago. The second best time is right now — with the right tools."

MindTrack is my attempt to build those tools. I hope it starts a conversation about how data, used responsibly, can make student support systems smarter, faster, and more human.
