Improving Feedback in Massive Open Online Course (MOOC) Learning through EEG Analysis

As online learning platforms become increasingly popular, one of the main challenges is obtaining feedback from students about lesson clarity. In traditional classroom settings, teachers can watch their students' facial expressions and body language for signs of confusion, but online learning often lacks real-time responses, and facial expressions are unavailable altogether when cameras are turned off. To address this issue, graduate student researchers at Carnegie Mellon University designed a classifier using a commercially available electroencephalography (EEG) headset, which showed weak but above-chance performance in determining whether a student was confused (Wang, Li, et al., 2011).

Our research seeks to reproduce their results and improve upon their classifier through a more stringent model selection process, feature selection, scaling, and grid search cross-validation for hyperparameter optimization. Specifically, we explore how much these techniques improve the accuracy of the EEG-based feedback interaction model for Massive Open Online Courses (MOOCs).

Our paper and slides can be found in the reports folder of this repository.

Methods

We aimed to improve the accuracy of the confusion classification task using several methods. We tested six widely used classifiers (logistic regression, KNN, SVM, random forest, decision tree, and Naive Bayes), using grid search cross-validation to optimize each model's hyperparameters.
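As an illustration of this model comparison, here is a minimal sketch using scikit-learn's `GridSearchCV`. It is not the code from the notebooks: the synthetic data, the `candidates` dictionary, the `run_grid_searches` helper, and every hyperparameter grid are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the EEG feature table (assumption, not the real data).
X, y = make_classification(n_samples=200, n_features=11, n_informative=5,
                           random_state=0)

# Hypothetical search spaces; the grids used in the project are not listed here.
candidates = {
    "logistic_regression": (LogisticRegression(max_iter=1000),
                            {"C": [0.1, 1.0, 10.0]}),
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 11]}),
    "svm": (SVC(), {"C": [0.1, 1.0], "kernel": ["linear", "rbf"]}),
    "random_forest": (RandomForestClassifier(random_state=0),
                      {"n_estimators": [50, 100], "max_depth": [3, None]}),
    "decision_tree": (DecisionTreeClassifier(random_state=0),
                      {"max_depth": [3, 5, None]}),
    "naive_bayes": (GaussianNB(), {"var_smoothing": [1e-9, 1e-7]}),
}


def run_grid_searches(X, y, candidates, cv=5):
    """Run one grid search per classifier; return {name: (best_score, best_params)}."""
    results = {}
    for name, (estimator, grid) in candidates.items():
        search = GridSearchCV(estimator, grid, cv=cv, scoring="accuracy")
        search.fit(X, y)
        results[name] = (search.best_score_, search.best_params_)
    return results


results = run_grid_searches(X, y, candidates)
# Print the models from best to worst mean cross-validated accuracy.
for name, (score, params) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:20s} {score:.3f} {params}")
```

Each score is the mean cross-validated accuracy of the best hyperparameter setting, so models are compared on held-out folds rather than on the data they were fit to.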

We also implemented LASSO regularization for feature selection and analyzed a correlation heatmap to identify important features. We scaled the data using standardization before training, and tested each model using raw data, scaled data, feature-selected data, and scaled feature-selected data.
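The four data variants can be sketched as scikit-learn pipelines. This is a hedged illustration rather than the project's implementation: the synthetic data, the `make_variant` helper, the `alpha` value, and the choice of a decision tree as the final classifier are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the EEG feature table (assumption, not the real data).
X, y = make_classification(n_samples=200, n_features=11, n_informative=4,
                           random_state=0)


def make_variant(scale, select, alpha=0.01):
    """Build a pipeline with optional standardization and LASSO feature selection."""
    steps = []
    if scale:
        steps.append(("scale", StandardScaler()))
    if select:
        # Keep only the features whose LASSO coefficient is not shrunk to zero.
        steps.append(("select", SelectFromModel(Lasso(alpha=alpha))))
    steps.append(("clf", DecisionTreeClassifier(max_depth=5, random_state=0)))
    return Pipeline(steps)


variants = {
    "raw": make_variant(scale=False, select=False),
    "scaled": make_variant(scale=True, select=False),
    "selected": make_variant(scale=False, select=True),
    "scaled+selected": make_variant(scale=True, select=True),
}

# Mean 5-fold accuracy per variant; scaler and selector are refit inside each fold.
scores = {name: cross_val_score(pipe, X, y, cv=5).mean()
          for name, pipe in variants.items()}
for name, score in scores.items():
    print(f"{name:16s} {score:.3f}")
```

Keeping the scaler and the selector inside the pipeline means both are fit on the training folds only, which avoids leaking information from the held-out fold. Note that LASSO coefficients depend on feature scale, so the "selected" and "scaled+selected" variants may keep different features.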

Our results showed that the subset of features including Beta1, Alpha1, Gamma2, and Theta achieved the best balance between multicollinearity and individual feature importance. Additionally, our investigation into generating statistical features from the theta band did not result in significant improvements to classification accuracy.
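To make the multicollinearity check concrete, the sketch below computes pairwise Pearson correlations and flags strongly correlated feature pairs, which is the information a correlation heatmap conveys visually. The data is randomly generated, the `Theta_copy` column is a hypothetical near-duplicate added purely for demonstration, and the 0.8 threshold is an assumption; none of this reflects correlations in the real EEG data.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = sorted(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) >= threshold:
                pairs.append((first, second, round(r, 3)))
    return pairs


rng = random.Random(0)
theta = [rng.gauss(0, 1) for _ in range(500)]
columns = {
    "Theta": theta,
    "Alpha1": [rng.gauss(0, 1) for _ in range(500)],
    "Beta1": [rng.gauss(0, 1) for _ in range(500)],
    # Hypothetical near-duplicate of Theta, included only to trigger the check.
    "Theta_copy": [t + rng.gauss(0, 0.1) for t in theta],
}

pairs = correlated_pairs(columns)
print(pairs)
```

A flagged pair suggests that keeping both features adds little information, so one of them is a candidate for removal.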

Our findings, presented in the Results section below, show that combining these methods (multiple models, grid search cross-validation, LASSO feature selection, and data scaling) notably improved the accuracy of the classification task.

Architectural Diagram

Results

We were able to improve the accuracy of the classification tasks in three out of four cases. Here are our results for the student-specific, personalized models:

Our decision tree was the best model for both predefined and student-defined confusion. The accuracies in the figure above were obtained by standardizing the data and using the entire feature set rather than the feature subset described in Methods. Our model reached 83% accuracy for student-defined confusion, compared to the 56% baseline in the original paper (an increase of 27 percentage points). We were unable to increase the accuracy for the predefined confusion target.

Here are our results for the student-independent, more generalized models:

Random forest models performed the best in these student-independent experiments, with the best model for predefined confusion trained on standardized and feature-selected data. We saw an accuracy of 65% for this task, compared to 57% in the original paper (an increase of 8 percentage points). The best model for student-defined confusion used the entire feature set with no standardization. This model reached an accuracy of 74%, up from the original researchers' 51% (an increase of 23 percentage points).
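One common way to evaluate a student-independent model is to hold out all of one student's recordings per fold, so the model is always tested on a student it has not seen. The sketch below shows that splitting scheme in plain Python. Whether the notebooks use exactly this protocol is an assumption, and `leave_one_student_out` and `student_ids` are hypothetical names.

```python
def leave_one_student_out(student_ids):
    """Yield (held_out, train_idx, test_idx), holding out one student per fold."""
    for held_out in sorted(set(student_ids)):
        train_idx = [i for i, s in enumerate(student_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(student_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Toy example: six recordings from three students (two recordings each).
student_ids = [0, 0, 1, 1, 2, 2]
folds = list(leave_one_student_out(student_ids))
for held_out, train_idx, test_idx in folds:
    print(f"held out student {held_out}: train={train_idx} test={test_idx}")
```

scikit-learn's `LeaveOneGroupOut` produces the same kind of split when the student identifier is passed as the `groups` argument.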

The decision tree and KNN models had moderately worse results but took less time to train than random forests. Decision trees and random forests were the best performers overall, which we attribute to their hierarchical approach to decision making; they were also more interpretable than the other models. Our paper includes further analysis of overfitting and the impact of regularization.

Authors

Alex Bzdel

Amanda Breton

