Hi, I'm Harsh - I Build Distributed AI Systems, MCP Agents & High-Performance Inference Platforms
I'm a Senior Machine Learning & Distributed Systems Engineer with 10+ years of experience architecting large-scale AI infrastructure, GPU inference platforms, and multi-agent (MCP) systems. I specialize in mission-critical systems demanding tight control over latency, reliability, orchestration, and control-plane design.
I've shipped:
- Real-time distributed systems at Microsoft
- Identity & risk-scoring engines at AWS
- Multi-agent (MCP) platforms for automating developer workflows end-to-end
- Designed real-time platforms processing 10M+ GPU telemetry events/day on Azure-scale compute (TP99 < 120 ms, 99.99% uptime)
- Architected global, multi-region orchestration systems with Kubernetes, Synapse, Spark, and ADF, cutting pipeline latency by 2s
- Built low-level telemetry & diagnostics for Maia 100 AI accelerators (Redfish API integration)
- Developed multiple MCP servers (see the sketch after this list) powering:
  - Automated PR generation
  - Repo-wide code intelligence
  - Contextual retrieval from CI/CD pipelines and logs
  - Issue tracking & GitHub tool integration
- Implemented deterministic workflows, tool-calling chains, and developer automation pipelines
- Designed Mosaic-style agent frameworks (planning, reasoning, orchestration)
- Created end-to-end log-based RAG for diagnostics/investigation workflows
- Built a full restaurant recommendation RAG system with LlamaIndex + Elasticsearch (vector search, hybrid retrieval, embeddings, caching)
- Developed semantic search tools for design docs (Azure AI Foundry + Semantic Kernel), improving retrieval efficiency by 60%
- Architected microservice-based ML pipelines and anomaly detection frameworks
- Built scalable ETL pipelines (Spark, DynamoDB, Kafka), integrated distributed monitoring/alerting
- Designed REST APIs, CI/CD workflows, and containerized services for cloud platforms
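
For a flavor of what those MCP servers look like, here is a minimal sketch using the MCP Python SDK's FastMCP helper. The server name, tool names, and tool logic are illustrative placeholders, not the production implementation.

```python
# Minimal MCP server sketch (illustrative only, not production code).
# Assumes the official `mcp` Python SDK; tool names and logic are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("repo-intel")  # hypothetical server name

@mcp.tool()
def summarize_diff(diff_text: str, max_lines: int = 50) -> str:
    """Return a truncated view of a diff for downstream PR-description generation."""
    lines = diff_text.splitlines()
    if len(lines) <= max_lines:
        return diff_text
    head = "\n".join(lines[:max_lines])
    return f"{head}\n... ({len(lines)} lines total)"

@mcp.tool()
def extract_failing_tests(ci_log: str) -> list[str]:
    """List failing test ids from a CI log (naive pattern match, sketch only)."""
    return [line.split()[-1] for line in ci_log.splitlines() if line.startswith("FAILED ")]

if __name__ == "__main__":
    mcp.run(transport="stdio")  # expose the tools to any MCP-compatible client
```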
Infrastructure & Distributed Systems
- Kubernetes • Docker • Spark/Flink • Synapse • Redfish API
- Control Plane Design • Autoscaling • Routing
AI / ML / LLM Systems
- Inference Pipelines • Vector Search • RAG • Embeddings • Observability
- Feature Engineering • GPU Telemetry
Agents & MCP
- Tool Calling • Multi-agent Orchestration • PR/Repo Automation • Deterministic Workflows • MCP Servers
Cloud Platforms
- Azure (AI Foundry, Functions, Compute, AI Search)
- AWS (SageMaker, DynamoDB, CloudFormation)
- GCP (Familiar)
Languages
- Python • Go • Java • C++ • Bash • JavaScript/Node • SQL/NoSQL
Selected projects:

A complete LlamaIndex + Elasticsearch-based RAG system with multi-source ingestion, hybrid retrieval, embeddings, and chat-style personalization.
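
A minimal sketch of how the retrieval core of a system like this can be wired up with LlamaIndex and Elasticsearch; exact import paths depend on the LlamaIndex version, and the index name, data directory, and query are placeholders rather than the real configuration.

```python
# Sketch only: LlamaIndex wired to an Elasticsearch vector store for retrieval.
# Assumes llama-index >= 0.10 with the llama-index-vector-stores-elasticsearch
# package installed, and an embedding/LLM configured via Settings (OpenAI by default).
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

# Placeholder index name and local Elasticsearch endpoint.
vector_store = ElasticsearchStore(index_name="restaurants", es_url="http://localhost:9200")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Multi-source ingestion is reduced here to a single local directory for brevity.
documents = SimpleDirectoryReader("./data/restaurants").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

query_engine = index.as_query_engine(similarity_top_k=5)
print(query_engine.query("quiet vegetarian-friendly spot for a weeknight dinner"))
```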
Multi-agent workflow automation for code review, PR generation, CI/CD understanding, and intelligent repo analysis.
Replicated Databricks Agent Bricks patterns: tool orchestration, structured reasoning, vector-based retrieval, agent messaging layers.
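
To make the tool-orchestration piece concrete, here is a deliberately framework-free sketch of a deterministic tool chain: a fixed registry, an ordered plan, and a shared context threaded through each step. The tools and data are hypothetical stand-ins, far simpler than the real agents.

```python
# Framework-free sketch of a deterministic tool-calling chain (hypothetical steps).
from typing import Any, Callable

ToolFn = Callable[[dict[str, Any]], dict[str, Any]]

def fetch_logs(ctx: dict[str, Any]) -> dict[str, Any]:
    ctx["logs"] = ["ERROR timeout in shard-3", "INFO retry succeeded"]  # stand-in data
    return ctx

def classify_failure(ctx: dict[str, Any]) -> dict[str, Any]:
    transient = any("retry succeeded" in line for line in ctx["logs"])
    ctx["failure_class"] = "transient" if transient else "persistent"
    return ctx

def draft_report(ctx: dict[str, Any]) -> dict[str, Any]:
    ctx["report"] = f"Failure classified as {ctx['failure_class']} from {len(ctx['logs'])} log lines."
    return ctx

TOOLS: dict[str, ToolFn] = {
    "fetch_logs": fetch_logs,
    "classify_failure": classify_failure,
    "draft_report": draft_report,
}
PLAN = ["fetch_logs", "classify_failure", "draft_report"]  # fixed order => reproducible runs

def run_chain(plan: list[str], ctx: dict[str, Any]) -> dict[str, Any]:
    for step in plan:
        ctx = TOOLS[step](ctx)  # each tool reads and writes the shared context
    return ctx

print(run_chain(PLAN, {})["report"])
```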
Right now, I'm focused on:
- Building distributed inference & scheduling systems
- Designing latency-aware routing, capacity planning, and control-plane components
- Creating MCP-enabled agent ecosystems for automation & reasoning
- Optimizing GPU utilization and system reliability at scale
- Scaling observability, health monitoring, and model versioning
- Architecting backend systems for mission-critical AI workloads
- Advancing agent orchestration with MCP
- Building deterministically reproducible agent workflows
- Improving inference through caching, batching, and routing
- Developing RAG systems grounded in operational logs & telemetry
- Exploring LLM safety, validation, and structured reasoning integrations
If you're working on high-performance AI infrastructure, next-gen inference, or agentic frameworks, let's connect!
I'm especially interested in collaborations where safety, reliability, and real-time performance are paramount.

