I build event-driven microservice platforms, handle time-series data at scale, and ship applied-AI systems — LLM agents and real-time voice pipelines. I like owning services end-to-end, from architecture to production on Kubernetes.
A real-time speech-to-speech voice assistant: VAD → STT → LLM → TTS streamed over a single WebSocket on one 12 GB GPU. Re-architected a monolithic GPU pipeline into queue-backed worker services (RabbitMQ + vLLM), cutting single-turn latency 2.2s → 1.4s (~36%) and p95 under load 7.6s → 3.3s (~57%).
A web-research agent built from scratch — a hand-rolled ReAct loop (no agent framework) with search_web / read_url tools, bounded tool budgets, and source-cited answers. Ships a prompt A/B-test eval harness comparing three research strategies on accuracy, safety, tool usage, and latency.
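A minimal sketch of what a hand-rolled ReAct loop with a bounded tool budget can look like. This is an illustration, not the project's code: the tool bodies are canned stubs, and `run_agent`, `scripted_policy`, and `max_tool_calls` are hypothetical names. The `policy` callable stands in for the LLM call that picks the next action.

```python
from typing import Callable

# Hypothetical stand-ins for the real tools; they return canned text.
def search_web(query: str) -> str:
    return f"1. https://example.com/resp (result for {query!r})"

def read_url(url: str) -> str:
    return f"page text fetched from {url}"

TOOLS: dict[str, Callable[[str], str]] = {
    "search_web": search_web,
    "read_url": read_url,
}

def run_agent(question: str, policy, max_tool_calls: int = 4) -> dict:
    """Alternate policy decisions and tool calls until the policy answers
    or the tool budget is spent."""
    transcript = [("question", question)]
    sources: list[str] = []
    calls = 0
    while True:
        action, arg = policy(transcript)  # in a real agent: one LLM call
        if action == "answer":
            return {"answer": arg, "sources": sources,
                    "tool_calls": calls, "stopped_early": False}
        if action not in TOOLS or calls >= max_tool_calls:
            # Unknown tool or budget exhausted: stop instead of looping forever.
            return {"answer": None, "sources": sources,
                    "tool_calls": calls, "stopped_early": True}
        observation = TOOLS[action](arg)
        calls += 1
        if action == "read_url":
            sources.append(arg)  # every page read becomes a citable source
        transcript.append((action, observation))

def scripted_policy(transcript):
    """Deterministic stand-in for the model: search, read, then answer."""
    plan = [("search_web", "RESP protocol"),
            ("read_url", "https://example.com/resp")]
    step = len(transcript) - 1  # number of tool calls made so far
    if step < len(plan):
        return plan[step]
    return ("answer", "RESP frames commands as arrays of bulk strings [1].")

result = run_agent("What is RESP?", scripted_policy)
print(result["tool_calls"], result["sources"])
```

Keeping the budget check inside the loop, rather than in the prompt, means the cap holds even when the model keeps requesting tools.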
A Redis server implemented from scratch in Rust — RESP protocol parsing, a key/value store with expiry, and concurrent client handling.
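For illustration, here is a small sketch of the two core pieces named above: parsing a RESP command and a key/value store with lazy expiry. The project itself is written in Rust; this sketch is in Python for brevity, and `parse_command` and `Store` are hypothetical names, not the project's API. It assumes commands arrive as RESP arrays of bulk strings, which is how Redis clients send them.

```python
import time

def parse_command(buf: bytes):
    """Parse one RESP array of bulk strings.
    Returns (args, bytes_consumed), or None if the buffer is incomplete."""
    if not buf:
        return None
    if not buf.startswith(b"*"):
        raise ValueError("expected RESP array")
    end = buf.find(b"\r\n")
    if end == -1:
        return None
    count = int(buf[1:end])
    pos = end + 2
    args = []
    for _ in range(count):
        end = buf.find(b"\r\n", pos)
        if end == -1:
            return None  # length header not fully received yet
        if buf[pos:pos + 1] != b"$":
            raise ValueError("expected bulk string")
        length = int(buf[pos + 1:end])
        start = end + 2
        if len(buf) < start + length + 2:
            return None  # payload or trailing CRLF still missing
        args.append(buf[start:start + length])
        pos = start + length + 2
    return args, pos

class Store:
    """Key/value store where expired keys are dropped lazily, on read."""
    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, key, value, px=None):
        # px is a time-to-live in milliseconds, like the PX option of SET.
        deadline = None if px is None else self._clock() + px / 1000
        self._data[key] = (value, deadline)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if deadline is not None and self._clock() >= deadline:
            del self._data[key]
            return None
        return value

print(parse_command(b"*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"))
```

Returning `None` for an incomplete buffer lets the connection handler keep reading from the socket and retry, which matters once several clients are served concurrently.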
Merged a fix into the JSON language server that powers JSON validation and IntelliSense in VS Code: it adds support for enum discriminators, preventing an exponential-time validation freeze. A correctness-and-performance fix in a core Microsoft developer tool.
Merged a fix into pnpm (the fast, disk-efficient package manager, 35k+ ★): "fix(default-reporter): erase trailing characters on the progress line", a rendering fix in the CLI's terminal output.
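To show the class of bug involved: when a progress line is redrawn in place with a carriage return, a shorter new line leaves the tail of the previous, longer line on screen. The sketch below shows two common remedies. It is a generic illustration in Python, not the pnpm patch itself (pnpm is not written in Python), and both function names are hypothetical.

```python
def render_progress(line: str) -> str:
    """Redraw in place: \\r returns to column 0, then ESC[K (ANSI erase in
    line) clears everything from the cursor to the end of the line."""
    return "\r" + line + "\x1b[K"

def render_progress_padded(line: str, previous_len: int) -> str:
    """Fallback for terminals without ANSI support: overwrite the leftover
    tail of the previous line with spaces."""
    return "\r" + line.ljust(previous_len)

print(repr(render_progress("Progress: resolved 10, done")))
```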