This project was completed as part of Break Through Tech’s AI Studio in partnership with Planned Parenthood Federation of America (PPFA).
Our team analyzed and augmented chatbot interaction data to improve the accuracy and reliability of sexual and reproductive health information delivered to users.
- Apply supervised machine learning to classify chatbot responses as accurate or inaccurate.
- Augment the dataset with synthetic data to address class imbalance.
- Improve the chatbot's response-quality metrics so that users receive more reliable health information.
- Ensure ethical AI practices when handling sensitive health-related data.
- Data Preprocessing: Cleaning and normalizing text from real and synthetic chat transcripts.
- Feature Engineering: TF-IDF vectorization for text classification.
- Modeling: Logistic Regression and Random Forest classifiers, evaluated with a confusion matrix and binary classification metrics.
- Evaluation: Accuracy, precision, recall, and F1-score analysis.
- Ethical Considerations: Protecting user privacy and checking for fairness across demographic groups.
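
The steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual code: the example texts, the label encoding (1 = inaccurate response, 0 = accurate response), the train/test split, and all hyperparameters are assumptions, since the real dataset is private.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the private chat transcripts.
# Assumed label encoding: 1 = inaccurate response, 0 = accurate response.
texts = [
    "You can book an appointment online or by phone.",
    "Our health centers offer testing and treatment services.",
    "Please talk to a nurse or doctor about your options.",
    "Most insurance plans are accepted at our locations.",
    "You never need to see a doctor for any symptom.",
    "Testing is not available anywhere at any time.",
    "Appointments cannot be booked by anyone.",
    "No insurance plan is ever accepted anywhere.",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Stratified split so both classes appear in the held-out set.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)


def evaluate(model):
    """Fit a TF-IDF + classifier pipeline and return held-out metrics."""
    pipeline = make_pipeline(TfidfVectorizer(lowercase=True), model)
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)
    # Rows are true classes, columns are predicted classes, ordered [0, 1].
    cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="binary", pos_label=1, zero_division=0
    )
    return cm, precision, recall, f1


# Hyperparameters here are illustrative defaults, not tuned values.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
results = {name: evaluate(model) for name, model in models.items()}

for name, (cm, precision, recall, f1) in results.items():
    print(name, cm.tolist(), f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Wrapping the vectorizer and classifier in one pipeline means the TF-IDF vocabulary is learned from the training split only, which avoids leaking held-out text into the features.
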
- Improved classification accuracy for identifying low-quality chatbot responses.
- Reduced false negatives, enabling better identification of potentially harmful or inaccurate information.
- Developed insights to guide future chatbot training and deployment strategies.
- Confusion matrix heatmaps showing model performance.
- Precision-recall and ROC curves for model comparisons.
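
For readers unfamiliar with these plots, the sketch below shows how the numbers behind a confusion matrix, an ROC curve, and a precision-recall curve can be computed from true labels and model scores. It uses only the standard library; the labels, scores, and function names are hypothetical and are not taken from the project's results. The actual figures would typically be drawn with a plotting library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tn, fp, fn, tp


def curve_points(y_true, scores):
    """Sweep a threshold over each distinct score, highest first.

    Returns (roc, pr): roc holds (false positive rate, true positive rate)
    pairs and pr holds (recall, precision) pairs.
    Assumes y_true contains at least one positive and one negative label.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    roc = [(0.0, 0.0)]  # a threshold above every score predicts no positives
    pr = []
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tn, fp, fn, tp = confusion_counts(y_true, y_pred)
        roc.append((fp / negatives, tp / positives))
        pr.append((tp / positives, tp / (tp + fp)))
    return roc, pr


# Hypothetical labels and predicted probabilities for the positive class.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

# Confusion counts at a 0.5 decision threshold.
y_pred_at_half = [1 if s >= 0.5 else 0 for s in scores]
print(confusion_counts(y_true, y_pred_at_half))  # (2, 1, 1, 2)

roc, pr = curve_points(y_true, scores)
print(roc[-1])  # (1.0, 1.0): the lowest threshold predicts everything positive
```

Comparing two models then amounts to overlaying their `roc` or `pr` point lists on the same axes.
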
- Integrate the model into the live chatbot system.
- Expand training data for better generalization.
- Implement explainability tools (e.g., SHAP, LIME) for decision transparency.
- Camila Lightfoot: Data preprocessing, feature engineering, Logistic Regression model tuning, documentation, and presentation to PPFA stakeholders.
- [Other teammates]: (Add their specific contributions if applicable).
- GitHub Repository: Fall-AI-Studio
- Dataset: Combination of real and generated chatbot conversations (private for confidentiality).
- Camila Lightfoot: Data preprocessing, Logistic Regression model tuning, GitHub repo maintenance, presentation to mentors.
- [Other teammates]: (List their roles and contributions if applicable)