Inspiration
The project was inspired by the real-world challenge of preventing avoidable hospital readmissions, especially in emergency departments (EDs). Frequent ED readmissions not only indicate gaps in patient care but also contribute to overcrowding and strain hospital resources. We wanted to use data science to proactively identify patients at high risk of returning, while providing actionable support through a chatbot.
What it does
Assess a patient — Describe a patient in plain language (e.g. "72 year old male with COPD and CHF, temp 101, BP 135/85, pulse 110, pain 8/10") or upload an ED record PDF; the system parses vitals and conditions, runs the readmission model, and returns a Patient Summary, a risk score, and recommendations.

Ask the knowledge base — Ask clinical questions (e.g. "What are risk factors for ED revisits?") and get relevant excerpts from the indexed guidance.

Stats — View precomputed NHAMCS-based statistics, such as 72-hour revisit rates and admission rates by condition and region.
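As an illustration of the plain-language parsing step, here is a minimal sketch. The field names and patterns are hypothetical stand-ins; the project's real extractors combine regex with NLP and cover many more fields.

```python
import re

# Hypothetical patterns for a few vitals; illustrative only.
VITAL_PATTERNS = {
    "age": r"(\d{1,3})\s*(?:-?\s*year|yo|y/o)",
    "temp": r"temp(?:erature)?\s*(\d{2,3}(?:\.\d)?)",
    "bp": r"(?:bp|blood pressure)\s*(\d{2,3})\s*/\s*(\d{2,3})",
    "pulse": r"pulse\s*(\d{2,3})",
    "pain": r"pain\s*(\d{1,2})\s*/\s*10",
}

def parse_patient(text: str) -> dict:
    """Extract structured vitals from a free-text patient description."""
    out = {}
    for field, pattern in VITAL_PATTERNS.items():
        m = re.search(pattern, text, re.IGNORECASE)
        if m:
            groups = m.groups()
            # Keep multi-group matches (e.g. systolic/diastolic BP) as tuples.
            out[field] = groups if len(groups) > 1 else groups[0]
    return out

print(parse_patient("72 year old male with COPD, temp 101, BP 135/85, pulse 110, pain 8/10"))
```

The structured dict produced here is the kind of state that gets merged into the chat session before the model runs.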
How we built it
Data — NHAMCS ED encounter data (e.g. SAS ZIP) is loaded from paths in config.yaml; the pipeline builds stats (regional and condition-level 72-hour revisit and admission percentages) and writes artifacts/stats.json.

Readmission model — A stacked ensemble classifier (trained through our modeling pipeline) is loaded from artifacts/ at service startup. The ensemble combines predictions from LightGBM, Random Forest, and Logistic Regression base learners and produces a base readmission probability from patient features such as age, sex, vital signs, comorbidities, triage level, and visit characteristics. The chatbot then applies an evidence-informed clinical risk adjustment, implemented as log-odds shifts for factors like abnormal vitals, high-risk conditions, age brackets, and triage acuity, to produce the final probability presented to the user.

RAG knowledge base — Markdown files under med_proj/rag/knowledge_base/ are indexed in two ways: TF-IDF (scikit-learn) document-level vectors written to artifacts/kb_index.joblib (always built), and optionally FAISS, where the same documents are chunked (RecursiveCharacterTextSplitter), embedded with sentence-transformers/all-MiniLM-L6-v2, and stored under artifacts/rag_faiss/. Retrieval uses FAISS when present and otherwise falls back to TF-IDF.

ED form parsing — Uploaded PDFs are converted to text (pypdf); regex and NLP extractors fill structured state (age, sex, vitals, ESI/triage, conditions, chief complaint, allergies, disposition, diagnosis), which is merged into the chat session for the next assessment.
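The log-odds risk adjustment can be sketched as follows. The shift values below are placeholders, not the project's evidence-informed weights: the point is that each factor adds to the logit, so effects compound multiplicatively in odds space rather than being clipped at a probability ceiling.

```python
import math

# Placeholder log-odds shifts; the project's real values are
# evidence-informed and differ from these.
RISK_SHIFTS = {
    "fever": 0.4,
    "tachycardia": 0.3,
    "chf": 0.5,
    "copd": 0.4,
    "age_over_65": 0.3,
    "high_acuity_triage": 0.6,
}

def adjust_probability(base_prob: float, factors: list) -> float:
    """Shift the model's base readmission probability in log-odds space
    for each clinical risk factor present, then map back to a probability."""
    logit = math.log(base_prob / (1 - base_prob))
    logit += sum(RISK_SHIFTS.get(f, 0.0) for f in factors)
    return 1 / (1 + math.exp(-logit))

# A 20% base probability rises once fever, CHF, and age are factored in.
print(adjust_probability(0.20, ["fever", "chf", "age_over_65"]))
```

Working in log-odds keeps the adjusted value a valid probability and makes each factor's contribution easy to audit.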
Challenges we ran into
Most of our observations did not return to the ER, producing the class imbalance that is common in healthcare data. Left unaddressed, this makes the model prone to false negatives. Our data also had a large number of columns, so we needed an efficient way to select relevant features.
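One standard mitigation for this kind of imbalance is to reweight the minority class during training. A sketch of the "balanced" heuristic (the same formula scikit-learn uses for class_weight='balanced'):

```python
import numpy as np

def balanced_class_weights(y: np.ndarray) -> dict:
    """Compute 'balanced' class weights, n_samples / (n_classes * count),
    so rarer classes contribute more to the training loss."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Example: 90% of encounters do not revisit within 72 hours.
y = np.array([0] * 90 + [1] * 10)
print(balanced_class_weights(y))  # minority class gets a 5x weight
```

These weights can be passed to the base learners (e.g. class_weight in scikit-learn estimators, or the analogous imbalance parameters in LightGBM) so that missing a true revisit costs more than a false alarm.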
Accomplishments that we're proud of
We are proud of applying data science to improve patient care: we built a predictive model that identifies high-risk ED patients, helping hospitals anticipate and reduce avoidable readmissions.
What we learned
Building this project taught us how to integrate machine learning predictions with a Retrieval-Augmented Generation (RAG) chatbot, enabling the system to provide context-aware explanations alongside the risk score.
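A toy illustration of the TF-IDF fallback retrieval idea is below. This is pure Python for readability; the project itself uses scikit-learn vectors persisted with joblib, and FAISS when available.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list:
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs):
    """Build simple TF-IDF vectors over whole documents; same idea as the
    project's scikit-learn fallback index, heavily simplified."""
    toks = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for d in toks for t in set(d))
    idf = {t: math.log(n / c) + 1 for t, c in df.items()}  # +1 keeps shared terms nonzero
    vecs = [{t: (c / len(d)) * idf[t] for t, c in Counter(d).items()} for d in toks]
    return vecs, idf

def retrieve(query, docs, vecs, idf):
    """Return the index of the document most cosine-similar to the query."""
    qv = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    def cos(a, b):
        dot = sum(v * b.get(t, 0.0) for t, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0
    return max(range(len(docs)), key=lambda i: cos(qv, vecs[i]))

docs = [
    "Risk factors for ED revisits include heart failure and COPD.",
    "Triage acuity levels are assigned at arrival using ESI.",
]
vecs, idf = build_index(docs)
print(docs[retrieve("COPD revisit risk factors", docs, vecs, idf)])
```

The retrieved excerpt is what gets handed to the chatbot alongside the model's risk score, so the explanation can cite the guidance that supports it.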
What's next for Acuity.ai
Next, we would:

Refine model interpretability: integrate SHAP or other explainability tools so clinicians can see why a patient is at high risk.

Expand dataset coverage: incorporate longitudinal patient data and additional ED metrics to improve prediction accuracy.

Deploy a real-time RAG chatbot: enable live querying of patient risk and actionable recommendations directly in the ED workflow.

Explore proactive interventions: link predictions to care pathways, helping reduce avoidable readmissions.