This project explores Natural Language Processing (NLP) techniques for analyzing and modeling text data. The goal is to build and evaluate models that process human language effectively, with applications such as sentiment analysis, text classification, and explainable AI. Key features:
- Preprocessing of raw text (tokenization, stopword removal, lemmatization)
- Feature engineering with TF-IDF and word embeddings (Word2Vec, GloVe, BERT)
- Model training using classical ML algorithms (Logistic Regression, Random Forest)
- Deep learning models implemented in PyTorch (RNNs, LSTMs, Transformers)
- Model evaluation with accuracy, F1-score, and ROC-AUC
- Explainability with SHAP or attention visualization
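The preprocessing step above can be sketched as follows. The stopword set and the suffix-stripping "lemmatizer" here are toy stand-ins (assumptions made for this sketch); the real pipeline would use a library such as NLTK or spaCy for both.

```python
import re

# Toy stopword list -- a stand-in for a real one (e.g. NLTK's)
STOPWORDS = {"the", "a", "an", "is", "are", "was", "in", "on", "of", "and"}

def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def lemmatize(token: str) -> str:
    """Naive suffix stripping as a placeholder for true lemmatization."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text: str) -> list[str]:
    return [lemmatize(t) for t in tokenize(text) if t not in STOPWORDS]

print(preprocess("The cats are running in the gardens"))
# → ['cat', 'runn', 'garden']
```

Note how the naive suffix stripping produces `runn` instead of `run` — exactly the kind of error a dictionary-backed lemmatizer avoids.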
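TF-IDF features feeding a classical classifier can be combined in a single scikit-learn pipeline. The corpus and labels below are made up purely for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative sentiment corpus (invented for this sketch)
texts = [
    "great movie, loved it", "terrible plot, hated it",
    "wonderful acting", "awful and boring",
    "loved the acting", "boring plot",
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative

# TF-IDF vectorization + Logistic Regression in one estimator
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["loved the movie"]))
```

Swapping `LogisticRegression()` for `RandomForestClassifier()` requires no other changes, which is the main appeal of the pipeline abstraction.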
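A minimal PyTorch sketch of an LSTM text classifier, with illustrative (not tuned) hyperparameters — the project's actual architectures may differ:

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Embedding -> LSTM -> linear head over the final hidden state."""

    def __init__(self, vocab_size=100, embed_dim=16, hidden_dim=32, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embed(token_ids)      # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)     # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])               # logits: (batch, num_classes)

model = LSTMClassifier()
batch = torch.randint(0, 100, (4, 10))        # 4 sequences of 10 token ids
logits = model(batch)
print(logits.shape)  # torch.Size([4, 2])
```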
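The three evaluation metrics are all available in `sklearn.metrics`; note that ROC-AUC needs predicted probabilities (or scores), not hard labels. The values below are made-up example predictions:

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true  = [1, 0, 1, 1, 0, 0]                  # ground-truth labels
y_pred  = [1, 0, 1, 0, 0, 1]                  # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6]      # predicted P(class = 1)

print(f"accuracy: {accuracy_score(y_true, y_pred):.3f}")   # 0.667
print(f"F1:       {f1_score(y_true, y_pred):.3f}")         # 0.667
print(f"ROC-AUC:  {roc_auc_score(y_true, y_score):.3f}")   # 0.889
```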
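For the attention-visualization route, the quantity inspected is a row-stochastic weight matrix: each row shows how much one token attends to every other token. The sketch below computes scaled dot-product attention from scratch over random embeddings — a stand-in (assumption) for a trained Transformer's attention maps or SHAP values:

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = ["the", "movie", "was", "great"]
d = 8
X = rng.normal(size=(len(tokens), d))   # toy token embeddings (random)

scores = X @ X.T / np.sqrt(d)           # query-key similarity
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax

# Each row sums to 1; large entries mark where a token "looks"
for tok, row in zip(tokens, weights):
    print(f"{tok:>6}", np.round(row, 2))
```

A heatmap of `weights` (e.g. via `matplotlib.pyplot.imshow`) gives the familiar attention-visualization plot.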
- Language: Python
- Libraries: PyTorch, scikit-learn, Pandas, NumPy, HuggingFace Transformers