About Me


Hey, this is Bala Swapnika Gopi!

I work at the intersection of statistics, data science, and product analytics. I focus on turning messy, large-scale datasets into clear, statistically sound insights that drive real decisions.

My background spans building end-to-end analytics pipelines, defining and owning core metrics, designing experiments, and applying statistical methods to understand behavior, measure impact, and evaluate tradeoffs. Hypothesis testing, significance, causality, and validation are central to how I work, not afterthoughts.

I am especially drawn to data science and machine learning roles where strong statistical thinking directly shapes product direction, operational efficiency, and customer experience.

Feel free to explore my work or reach out if you would like to connect or collaborate.

Skills

Languages & Tools

Python, R, SQL, Git, Docker, VS Code

Statistics

A/B Testing, Hypothesis Testing, ANOVA, Chi-Square, Causal Inference, Statistical Modeling

Frameworks & Platforms

Airflow, AWS (EMR), GCP, Flask, Jupyter, Apache Spark, Kafka, Streamlit

Libraries

Scikit-learn, TensorFlow, PyTorch, Keras, Pandas, NumPy, HuggingFace, spaCy, NLTK

ML & Modeling

Linear Regression, Logistic Regression, Decision Trees, Random Forest, AdaBoost, Gradient Boosting, K-Means, PCA, SVM, CNN, LSTM, Time Series Forecasting, BERT, EDA, ETL Pipelines

Visualization & BI

Matplotlib, Power BI, Tableau, Seaborn, Plotly

GenAI

LangChain, OpenAI API, Stable Diffusion, LlamaIndex, Prompt Engineering, Vector Databases, RAG Pipelines, LLMs (BERT, GPT), Gradio

What I have done so far


Master's Student

University of Maryland, College Park

Aug 2024 - May 2026

Pursuing a Master of Science in Data Science with a strong focus on machine learning, statistics, data engineering, analytics, and applied AI.


Operations Analyst

University of Maryland

Sep 2024 - Present

Built data-driven operational analytics solutions for campus dining, including demand forecasting, customer segmentation, transaction monitoring, and food waste prediction. Used Python, SQL, scikit-learn, and K-Means clustering to improve inventory planning, revenue visibility, menu decisions, and procurement efficiency.


Data & AI Intern

Connyct

Jun 2025 - Aug 2025

Automated data collection from 50+ university websites using Python, Selenium, and Beautiful Soup to improve dataset freshness for CampusAI analytics. Built modular data processing, cleaning, and validation workflows to improve the reliability of downstream AI and analyst-facing solutions.


Associate Machine Learning Engineer

Infor

Apr 2022 - Aug 2024

Designed and deployed ML and GenAI solutions across enterprise analytics, finance, and operations workflows. Built contextual search pipelines, natural-language-to-SQL tools, DeepAR forecasting models, OCR and LLM-based invoice extraction systems, and CLIP-based visual search to improve automation, reporting, and knowledge discovery.


Bachelor's Student

JNTUH College of Engineering Jagtial

Aug 2018 - May 2022

Completed a Bachelor of Technology in Electronics and Communication Engineering, building a strong foundation in programming, mathematics, analytical thinking, and engineering problem solving.

Projects


MedMind AI — Clinical Decision Support System

An AI-powered clinical decision support system that reads patient symptoms, retrieves evidence from PubMed using RAG (BioMedBERT + ChromaDB), and generates differential diagnoses with explainable reasoning traces. Built with LangGraph multi-node agents, FastAPI, and React, featuring self-critique reflection and safety guardrails to reduce hallucinations.

View on GitHub

Real-Time Fraud Detection Pipeline

End-to-end real-time fraud detection system using Apache Kafka for streaming transactions, Apache Spark for distributed processing, and an ML model for instant fraud classification. Features a live monitoring dashboard and is fully containerized with Docker Compose for scalable deployment.

View on GitHub

Sales Inventory Dashboard

Built an end-to-end Retail Sales & Delivery Performance Dashboard using Python, Google BigQuery, SQL, and Power BI on the Olist e-commerce dataset. Designed a layered data pipeline (raw to staging to mart to KPI), modeled business KPIs, and developed a 3-page interactive dashboard to analyze category trends, delivery reliability, and state-level sales performance.

View on GitHub

Emotion to Art Generator

Developed an end-to-end multimodal deep learning system that maps textual emotions to visual art using Hugging Face Transformers, Diffusers (Stable Diffusion v1.5), and Gradio. The pipeline classifies emotion from text and produces emotion-aligned artwork through a fine-tuned diffusion model.

View on GitHub

Evaluating Security & Robustness of Vision-Language Models

Built a VLM safety and robustness evaluation project to assess GPT-4o and GPT-4o-mini on benchmark tasks involving out-of-distribution reasoning, counterfactual questions, and sketch-based visual understanding.

View on GitHub

Real-time Bitcoin Data Processing (Ray)

End-to-end real-time data processing pipeline for Bitcoin market data using Ray, enabling scalable, distributed computation and live analytics on streaming financial data.

View on GitHub

Facial Expression Recognition

Deep learning model to recognize and classify facial expressions in real time, leveraging convolutional neural networks for accurate emotion detection across multiple categories.

View on GitHub

Credit Card Fraud Detection

Machine learning pipeline for detecting fraudulent credit card transactions using imbalanced classification techniques, anomaly detection, and ensemble methods to maximize detection accuracy.

View on GitHub

Customer Churn Prediction

Predictive modeling system to identify customers likely to churn, using feature engineering, classification algorithms, and business-driven insights to support proactive retention strategies.

View on GitHub

Fantasy Cricket Game

Interactive fantasy cricket application where users build virtual teams, track live scores, and compete based on real player performances, combining data-driven predictions with engaging gameplay.

View on GitHub

Get in touch

Have a project in mind? Contact me here.