Major: Data Theory | Minors: Data Science Engineering and Korean Language
Hometown: Califon, New Jersey (Population: 900)
Where am I now? Los Angeles, California (Population: 3.9 million)
Hobbies: Listening to music, reading and writing poetry, hiking, exploring art museums
Coursework: Linear Algebra, Mathematical Statistics, Optimization, Machine Learning, Data Mining, Real Analysis, Korean
Programming Languages: Python, R, SQL, Java, C++, Bash
Libraries/Frameworks: BeautifulSoup, CatBoost, Matplotlib, Pandas, PyTorch, Seaborn, Sentence-Transformers, scikit-learn, sqlite3, XGBoost
Tools: ChromaDB, Git, Jupyter Notebooks, Microsoft Office Suite (Excel, PowerPoint, etc.), Microsoft SQL Server, Tableau, VS Code
Data Science Intern at Yale University
- Selected as one of 25 undergraduates from a global pool of over 850 applicants for the inaugural Big Data Summer Immersion at Yale (BDSY), working with Yale faculty on logistic regression, neural networks, advanced SQL querying, and applications of generative AI models
- Designed and led the presentation of an original research poster to an audience of ~100 Yale faculty and researchers at the Symposium on Big Data, Human Health, and Statistics (more details on this in the Projects section below!)
Machine Learning Researcher at DataRes
- Selected from a competitive university-wide applicant pool to collaborate with peers on deep learning and natural language processing projects
- Most recent project, a Car Manual LLM, uses sentence-transformers, the Chroma vector database, and a retrieval-augmented generation (RAG) pipeline (more information in the Projects section below!)
Statistical Researcher at the Lohmueller Lab
- Implementing statistical modeling techniques via Python and Bash scripts to visualize genetic variation across the Channel Island fox and gray fox populations
- Utilizing Hoffman2, UCLA's high-performance computing (HPC) cluster
Peer Learning Facilitator at UCLA AAP
- Serving as a peer mentor for low-income, underrepresented undergraduates in the AAP community
- Started as the sole Math 115A (Linear Algebra) peer tutor during the Spring 2025 quarter, now teaching Statistics 100A (Introduction to Probability)
Assessing Wealth's Influence on Child Mortality Across Peru - Big Data Summer Immersion at Yale
Presented an original research poster analyzing ~2.5 million pneumococcal cases with time-series analysis, Bayesian hierarchical modeling, and unsupervised machine learning (Apriori algorithm), finding no wealth-based disparities in the decline of child mortality
- Collaborators: Antonio Bolea (Yale University) and Kevin Truong (University of California, Berkeley)
- Notable R Libraries: ggplot2, JAGS, arules, arulesViz
Car Manual Large Language Model - DataRes Research Team
Developed Drive and Diagnose (DAD), a scalable multimodal NLP and RAG pipeline capable of processing any car manual PDF or dashboard warning light to help drivers identify and address vehicle issues
- Collaborators: Aryan Gupta, Christian Chen, and Parnika Chaturvedi
- Technologies: ChromaDB, OpenAI CLIP, OpenAI GPT-4 Vision, Python, TypeScript
- Notable Python Libraries: BeautifulSoup, PyPDF2, Sentence-Transformers
