HELLO!

My name is Dongim, and I am passionate about data analysis and model training with machine learning and AI! With a strong background in natural language processing, geospatial analysis, and autonomous systems, I am highly motivated to solve real-world problems through impactful technology.

8 Patent Applications

100+ Students Tutored

5 Research Experiences

12+ Years of Playing Guitar

3 AI-Related Certifications

3rd Degree in Taekwondo

SKILLS

PROGRAMMING: Python • C/C++ • R • MATLAB • HTML

MACHINE LEARNING: PyTorch • TensorFlow • NLTK • OpenCV • Scikit-learn

DEVOPS & TOOLS: AWS • Docker • VMware • PostgreSQL • Linux/Unix • Web Scraping • Django

PROJECTS

ROAD SAFETY INTELLIGENCE WITH AUGMENTED LLM

MIT Break Through Tech, Michelin Mobility Intelligence

Develop a natural language chatbot for geospatial analysis of LA crash data.

LangChain
Geospatial Analysis
Streamlit

Developing a natural language interface for geospatial analysis of LA crash data, using LangChain function calling to handle complex queries over the geospatial dataset. Performed exploratory data analysis (EDA) and joined Points of Interest (POI) data with the crash data using the Haversine formula for accurate distance calculations, enabling automated spatial data extraction and analysis by LLMs. Building a responsive Streamlit interface that incorporates users' real-time GPS location to provide robust, interactive geospatial insights.
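The Haversine step described above computes great-circle distances between crash and POI coordinates. Below is a minimal Python sketch, assuming coordinates are (latitude, longitude) pairs in decimal degrees and a spherical Earth with a 6,371 km radius; `crashes_near_poi` and its 0.5 km default radius are hypothetical illustrations, not the project's actual matching logic.

```python
import math

# Mean Earth radius in kilometres (spherical approximation; an assumption).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def crashes_near_poi(poi, crashes, radius_km=0.5):
    """Hypothetical helper: keep crash (lat, lon) points within radius_km of a POI."""
    poi_lat, poi_lon = poi
    return [
        (lat, lon)
        for lat, lon in crashes
        if haversine_km(poi_lat, poi_lon, lat, lon) <= radius_km
    ]
```

For example, with a POI at (34.0522, -118.2437), a crash roughly 0.87 km north at (34.0600, -118.2437) falls inside a 1 km radius, while one roughly 5.3 km north at (34.1000, -118.2437) does not.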

FINE-TUNING ASR MODELS ON STUTTERING RECORDINGS

Olin Public Interest Technology

Fine-tune ASR models to reduce bias against stuttered speech.

ASR Models
SageMaker
LibriSpeech

Leading a team to address bias in Automatic Speech Recognition (ASR) models against stuttered speech using LibriSpeech/Stutter data. Identified Word Error Rate (WER) disparities on stuttered speech: OpenAI's Whisper showed a 2x increase and Facebook's Wav2Vec a 6.4x increase. Reduced Whisper's WER by 3% and Wav2Vec's by 33% by removing repeated words. Fine-tuned Wav2Vec on AWS SageMaker, built a tokenizer covering 2,268 characters for Chinese stuttered speech, and uploaded the model to Hugging Face.
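The WER measurement and repeated-word removal described above can be sketched in plain Python. This is an illustration under assumptions: the exact text normalization and repetition rule the team used are not described here, so the hypothetical `remove_repeated_words` simply collapses immediate word repeats case-insensitively, and `word_error_rate` is the standard word-level Levenshtein distance divided by the reference length.

```python
def remove_repeated_words(text):
    """Collapse immediate word repetitions, e.g. 'i i want' -> 'i want'.

    Comparison is case-insensitive and the first occurrence is kept.
    Note: this also removes legitimate repeats such as 'that that'.
    """
    kept = []
    for word in text.split():
        if not kept or word.lower() != kept[-1].lower():
            kept.append(word)
    return " ".join(kept)


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is the previous DP row: prev[j] = edit distance between the
    # reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

With the reference "i want to go home", the stuttered transcript "i i i want to to go home" has three insertions and a WER of 0.6; after `remove_repeated_words`, its WER drops to 0.0.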

BENCHMARKING STUTTERING RECORDINGS AGAINST ASR MODELS

Boston University, AImpower.org

Assess bias in Mandarin speech recognition using ASR models.

ASR Models
NLTK
Pydub

Evaluated leading ASR models (Whisper, Google Speech-to-Text, Wav2Vec, Azure, WeNet) to assess bias in recognizing Mandarin stuttered speech. Segmented 50+ hours of Mandarin speech data with labeled transcriptions using Pydub, processed it on BU's Shared Computing Cluster, handled model hallucinations, and calculated WER, CER, BLEU (via NLTK), WordNet Wu-Palmer similarity, and GloVe cosine similarity. Demonstrated that error rates increase with the number of stutter segments, with WeNet achieving a WER of 0.30 versus Wav2Vec's 0.52, and analyzed performance across stutter types, finding a 0.2 WER gap between sound repetitions and interjections. Authored comprehensive reports on model performance and bias, including technical visualizations, and held weekly meetings and presentations with the client company's CEO.
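Because written Mandarin has no spaces between words, character error rate (CER) is a natural companion to WER here. The sketch below computes CER as character-level edit distance over the reference length and averages it per stutter type. It is a minimal illustration, assuming whitespace should be ignored and that each segment carries a stutter-type label; the labels `sound_repetition` and `interjection` and the function names are hypothetical, not the project's actual schema.

```python
from collections import defaultdict


def char_error_rate(reference, hypothesis):
    """CER: character-level edit distance divided by reference length.

    Whitespace is ignored, which suits unsegmented Mandarin text.
    """
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    if not ref:
        raise ValueError("reference must contain at least one character")
    prev = list(range(len(hyp) + 1))  # previous row of the edit-distance table
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[-1] / len(ref)


def mean_cer_by_stutter_type(samples):
    """samples: iterable of (stutter_type, reference, hypothesis) triples."""
    scores = defaultdict(list)
    for stutter_type, ref, hyp in samples:
        scores[stutter_type].append(char_error_rate(ref, hyp))
    return {t: sum(v) / len(v) for t, v in scores.items()}
```

For instance, a repeated character in "我我想回家" against the reference "我想回家" counts as one insertion, giving a CER of 0.25.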

INVERTED PENDULUM ROBOT SIMULATION TEAM

Olin Autonomous Robot Training Lab

Train an inverted pendulum robot to balance in the upright position.

Reinforcement Learning
W&B
Hyperparameter Sweeps

Developed a custom Gym environment, with real-time simulation visualization, to train an inverted pendulum robot to balance in the upright position. Applied the Proximal Policy Optimization (PPO) reinforcement learning (RL) algorithm and ran hyperparameter sweeps (e.g., learning rate, reward function, entropy coefficient) to optimize performance, tracking experiments with Weights & Biases (W&B). Solved the CartPole swing-up problem, achieving stable balance in simulation through iterative RL training.
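PPO, named above, updates the policy by maximizing a clipped surrogate objective. The sketch below computes that objective for one batch in plain Python, assuming per-sample log-probabilities under the current and data-collecting policies plus precomputed advantage estimates; `clip_eps=0.2` is a common default, not necessarily the value used in this project. In full PPO, this term is combined with a value-function loss and an entropy bonus weighted by the entropy coefficient, one of the swept hyperparameters.

```python
import math


def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (higher is better).

    new_logps / old_logps: log pi(a|s) under the current and data-collecting
    policies; advantages: advantage estimates for the same (s, a) pairs.
    """
    if not advantages or not (len(new_logps) == len(old_logps) == len(advantages)):
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

Clipping the probability ratio to [1 - ε, 1 + ε] removes the incentive to move the policy far from the one that collected the data in a single update: a ratio of 2 with a positive advantage of 1 contributes only 1.2 to the objective.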

AI-GENERATED IMAGE CLASSIFICATION MODEL

ML Class Project

Build a deep learning model to classify AI-generated images from MidJourney.

CNN
Image Classification
MidJourney

Developed a convolutional neural network (CNN) to classify AI-generated images from MidJourney versus real images across 11 classes. Applied data augmentation to improve generalization and used max-pooling layers to help limit overfitting. Achieved up to 80% accuracy on the test set, demonstrating the model's ability to distinguish between real and AI-generated images.

UNDP SUDAN 2024 CONFLICT EVENTS ANALYSIS

United Nations Development Programme

Analyze the relationship between conflict events, refugee movements, and food insecurity in Sudan.

H3 Indexing
Correlation Matrix
Data Visualization

Analyzed and visualized the relationship between conflict events, refugee movements, and food insecurity in Sudan from 2019 to 2024, using H3 hexagonal indexing for geospatial analysis to identify conflict hotspots, trends, and humanitarian impacts across regions. Calculated correlation matrices that revealed strong correlations between these variables; for example, r = 0.85 between conflict events and food insecurity levels by region indicates that areas with more conflict tend to experience more severe food insecurity.
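The r = 0.85 figure is a Pearson correlation coefficient computed across regions. A minimal sketch of building a pairwise correlation matrix from per-region columns follows, assuming each variable is a list of values aligned by region (for example, one entry per H3 cell); the column names and toy values in the usage line are hypothetical and are not the UNDP data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


def correlation_matrix(columns):
    """columns: dict mapping variable name -> per-region values (same region order)."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}
```

For example, `correlation_matrix({"conflict_events": [1, 2, 3, 4], "food_insecurity": [2, 1, 4, 3]})` yields 1.0 on the diagonal and 0.6 between the two toy variables.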

CERTIFICATIONS

MACHINE LEARNING FOUNDATIONS
Cornell University, Jul 2024
DEEP LEARNING WITH PYTORCH : GENERATIVE ADVERSARIAL NETWORK
Coursera, Jun 2024
MACHINE LEARNING WITH PYTHON
IBM, Dec 2023