Odyssey-ASL: Real-Time American Sign Language Letter and Word Detection

Project Overview

Odyssey-ASL is a machine learning project designed to bridge communication between American Sign Language (ASL) users and non-signers. The goal was to build a real-time system that can recognize and translate ASL hand signs into letters, words, and potentially full sentences using computer vision and deep learning techniques.

Led by a student engineering team alongside Cal Poly Pomona's Software Engineering Association, this project aimed to provide both an accessible tool for communication and a hands-on learning platform for developing skills in machine learning, computer vision, and agile software practices.

Features

Video-Based Gesture Recognition: Supports real-time ASL detection from live video feeds.
Hybrid Model Architecture: Combines Convolutional Neural Networks (CNNs) for spatial feature extraction and Long Short-Term Memory Networks (LSTMs) for temporal sequence learning.
Data Preprocessing Pipeline: Includes frame extraction, resizing, and normalization for consistent input to the model.
Training Evaluation: Supports model accuracy reporting, loss tracking, and visualization of confusion matrices.
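
As an illustration of the evaluation feature, the sketch below computes a confusion matrix and overall accuracy with NumPy. The labels are toy values for three hypothetical classes, not project results, and the function names are illustrative rather than taken from the repository.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label, predicted_label] += 1
    return matrix


def accuracy(matrix):
    """Overall accuracy is the diagonal (correct) count over the total count."""
    return np.trace(matrix) / matrix.sum()


# Toy labels for three hypothetical classes (for example the letters A, B, C).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
print(f"accuracy = {accuracy(cm):.2f}")
```

Off-diagonal cells show which signs are mistaken for which, which is usually more informative than a single accuracy figure.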

Learning Objectives

This project was designed as a deep dive into:

Machine Learning fundamentals: preprocessing, model training, evaluation, and tuning.
Computer Vision techniques for gesture recognition.
Model fusion strategies for enhanced performance.
Agile team collaboration with Scrum-based sprints and code reviews.
Real-time application development with Python, TensorFlow, and OpenCV.

Dataset

Letter Detection: Sign Language MNIST
Word Detection: Word-Level American Sign Language (WLASL)
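
For the letter dataset, a minimal parsing sketch follows. It assumes the commonly distributed CSV layout of Sign Language MNIST (a label column followed by 784 grayscale pixel values forming a 28x28 image); the row used here is synthetic, and `parse_row` is a hypothetical helper, not a function from this repository.

```python
import numpy as np

IMAGE_SIDE = 28  # assumed image width and height in pixels


def parse_row(row):
    """Split one CSV row into an integer label and a normalized 28x28 image."""
    values = [float(value) for value in row.split(",")]
    label = int(values[0])
    pixels = np.array(values[1:], dtype=np.float32)
    image = pixels.reshape(IMAGE_SIDE, IMAGE_SIDE) / 255.0
    return label, image


# Synthetic row for illustration only: label 3 followed by 784 mid-gray pixels.
example_row = ",".join(["3"] + ["128"] * (IMAGE_SIDE * IMAGE_SIDE))
label, image = parse_row(example_row)
print(label, image.shape)
```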

Preprocessing Steps:

Extracted a fixed number of frames per video.
Resized frames to uniform dimensions.
Normalized pixel values from [0, 255] to [0, 1] for neural network compatibility.
Augmented the dataset using techniques like flipping, rotation, and brightness scaling to improve generalization.
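
The steps above can be sketched in NumPy as follows. The frame count (16), target size (64x64), and brightness range are assumptions chosen for illustration, since the README does not state the project's actual values. The resize is a simple nearest-neighbor version so the example has no OpenCV dependency, and rotation is omitted for brevity.

```python
import numpy as np


def sample_frame_indices(total_frames, num_frames):
    """Pick num_frames indices spread evenly across the clip."""
    return np.linspace(0, total_frames - 1, num_frames).round().astype(int)


def resize_nearest(frame, height, width):
    """Nearest-neighbor resize in NumPy; cv2.resize would be the usual choice."""
    rows = np.arange(height) * frame.shape[0] // height
    cols = np.arange(width) * frame.shape[1] // width
    return frame[rows][:, cols]


def preprocess_clip(frames, num_frames=16, size=(64, 64)):
    """Sample, resize, and scale a clip to float32 values in [0, 1]."""
    indices = sample_frame_indices(len(frames), num_frames)
    resized = np.stack([resize_nearest(frames[i], *size) for i in indices])
    return resized.astype(np.float32) / 255.0


def augment_clip(clip, rng, max_brightness_shift=0.2):
    """Apply one random horizontal flip and one brightness scale to the clip."""
    if rng.random() < 0.5:
        clip = clip[:, :, ::-1, :]  # flip the width axis of every frame
    scale = 1.0 + rng.uniform(-max_brightness_shift, max_brightness_shift)
    return np.clip(clip * scale, 0.0, 1.0)


rng = np.random.default_rng(0)
# Synthetic 40-frame clip standing in for a decoded video.
video = rng.integers(0, 256, size=(40, 120, 160, 3), dtype=np.uint8)
clip = preprocess_clip(video)
augmented = augment_clip(clip, rng)
print(clip.shape, clip.dtype)
```

Applying the same flip and brightness scale to every frame of a clip keeps the sequence temporally consistent, which matters once the frames are fed to a sequence model.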

Model Architecture

3D Convolutional Neural Networks (3D CNN): For learning spatiotemporal features from video data.
CNN-LSTM Combination: A hybrid architecture combining CNNs for frame-level spatial analysis and LSTMs for sequence modeling, enabling effective gesture interpretation over time.
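
Both architectures can be sketched in Keras as shown below. This is not the project's actual model: the layer counts, filter sizes, input shape (16 frames of 64x64 RGB), and class count are all assumptions picked to keep the example small.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Assumed input: 16 frames of 64x64 RGB per clip.
INPUT_SHAPE = (16, 64, 64, 3)


def build_cnn_lstm(num_classes=10):
    """Per-frame CNN via TimeDistributed, then an LSTM over the frame sequence."""
    return keras.Sequential([
        keras.Input(shape=INPUT_SHAPE),
        layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu", padding="same")),
        layers.BatchNormalization(),
        layers.TimeDistributed(layers.MaxPooling2D(2)),
        layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu", padding="same")),
        layers.TimeDistributed(layers.MaxPooling2D(2)),
        # Collapse each frame's feature map into one vector per time step.
        layers.TimeDistributed(layers.GlobalAveragePooling2D()),
        layers.LSTM(64),
        layers.Dense(num_classes, activation="softmax"),
    ])


def build_3d_cnn(num_classes=10):
    """3D convolutions slide over time, height, and width together."""
    return keras.Sequential([
        keras.Input(shape=INPUT_SHAPE),
        layers.Conv3D(16, 3, activation="relu", padding="same"),
        layers.MaxPooling3D(2),
        layers.Conv3D(32, 3, activation="relu", padding="same"),
        layers.GlobalAveragePooling3D(),
        layers.Dense(num_classes, activation="softmax"),
    ])


cnn_lstm = build_cnn_lstm()
cnn_3d = build_3d_cnn()
cnn_lstm.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                 metrics=["accuracy"])
print(cnn_lstm.output_shape, cnn_3d.output_shape)
```

In the hybrid model, `TimeDistributed` applies the same convolutional weights to every frame, so the LSTM receives one feature vector per time step. The 3D CNN instead learns motion and appearance jointly in a single stack of filters.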

Real-Time Implementation

Framework: TensorFlow / Keras for model training.
Video Input: OpenCV used to capture real-time webcam streams.
Output: Model predictions overlaid directly on the video stream for real-time feedback.
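
A minimal sketch of such a capture-predict-overlay loop is shown below. It assumes a trained Keras sequence model that takes 16 frames of 64x64 RGB, and the rolling `FrameWindow` buffer is an illustrative design, not necessarily how the project feeds frames to its model. OpenCV is imported inside the demo function so the buffer can be used on its own.

```python
from collections import deque

import numpy as np


class FrameWindow:
    """Keep the most recent num_frames frames as one model-ready clip."""

    def __init__(self, num_frames=16):
        self.frames = deque(maxlen=num_frames)

    def add(self, frame):
        """Store a uint8 frame scaled to float32 values in [0, 1]."""
        self.frames.append(frame.astype(np.float32) / 255.0)

    def ready(self):
        return len(self.frames) == self.frames.maxlen

    def as_batch(self):
        """Return an array of shape (1, num_frames, height, width, channels)."""
        return np.stack(list(self.frames))[np.newaxis]


def run_webcam_demo(model, labels, size=(64, 64)):
    """Capture webcam frames, classify the rolling window, and draw the label."""
    import cv2  # imported here so FrameWindow works without OpenCV installed

    window = FrameWindow()
    capture = cv2.VideoCapture(0)  # device 0 is usually the default webcam
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            window.add(cv2.resize(rgb, size))
            if window.ready():
                probabilities = model.predict(window.as_batch(), verbose=0)[0]
                best = int(np.argmax(probabilities))
                text = f"{labels[best]} ({probabilities[best]:.2f})"
                cv2.putText(frame, text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                            1.0, (0, 255, 0), 2)
            cv2.imshow("Odyssey-ASL", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
                break
    finally:
        capture.release()
        cv2.destroyAllWindows()
```

With a trained model and a list of class names, the loop would be started with `run_webcam_demo(model, labels)`. Because the buffer has a fixed maximum length, the oldest frame is dropped automatically as each new one arrives.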

Challenges & Solutions

Challenge	Solution
Limited labeled data for ASL gestures	Used data augmentation to expand training data.
Model overfitting to training conditions	Standardized background, clothing, and signing speed in both training and testing environments.
Real-time inference vs model complexity	Balanced performance using batch normalization, LSTM unit tuning, and convolution layer reduction.

Installation

Create and activate a virtual environment (recommended):

```
python -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate
```

Install project dependencies:
```
pip install -r requirements.txt
```
Download the dataset folders (grouped_videos_10) and place them in the Data directory inside Words (Words/Data).

Project Timeline

Sprint	Focus
Sprint 1	Project kickoff, research, and planning
Sprint 2	Dataset acquisition and preprocessing
Sprint 3	Model selection and baseline implementation
Sprint 4	Model tuning and feature engineering
Sprint 5	Real-time testing with MediaPipe integration
Sprint 6	Advanced techniques and model fusion (if applicable)
Sprint 7	Final prototype and demonstration

Contributors

Project Lead: Tony Gonzalez
Co-Author: Prerna Joshi
Nhan Thai
Iker Goni
Kayla Scarberry
Michael Castillo
Brisa Ramirez
Kevin Kopcinski
Vincent Terrelonge
Ben Stevenson
Kathee Avendano
