Infrastructure Archives - MERCHANT PROTOCOL

MERCHANT PROTOCOL · Issue 03 Category: Infrastructure 01

What Is Lightpanda? The Headless Browser Built for AI

By Jonathon Byrdziak · 12 min read · Mar 13, 2026

You keep hearing about AI agents that browse the web, scrape data, and automate workflows — but behind the scenes, most of them are dragging around a full copy of Chrome that eats 200+ MB of RAM per tab. One infrastructure team switched their scraping pipeline from headless Chrome to Lightpanda and cut their AWS bill from $480/month to under $60 — running 10x more concurrent sessions on the same hardware. In this article, I'll show you exactly what Lightpanda...


· · ·

What Is vLLM? Serve LLMs 4x Faster with PagedAttention

By Jonathon Byrdziak · 13 min read · Mar 09, 2026

You have a GPU, a model, and a dream of self-hosted inference, but every request eats 60-80% more memory than it should, and your throughput flatlines. A team at UC Berkeley built vLLM with PagedAttention, cutting KV cache waste to under 4% and pushing throughput 2-4x higher than anything else available. Anyscale reported serving thousands of concurrent users on the same hardware that previously choked at hundreds. In this article, you'll learn exactly how PagedAttention works, how vLLM compares...


· · ·

Ollama: Run AI Models Locally for Free

By Jonathon Byrdziak · 16 min read · Mar 06, 2026

You know AI APIs exist — ChatGPT, Claude, Gemini — but every token costs money, your data leaves your machine, and you have zero control over the model. Ollama changes that. One DevOps engineer used Ollama to self-host LLMs on a $3,000 GPU workstation and cut his annual AI spend from over $60,000 in API fees down to $2,400 a year. In this article, you will learn exactly how to install Ollama, run your first model, and build a private...


· · ·

DuckDB: Query Billions of Rows on Your Laptop

By Jonathon Byrdziak · 16 min read · Mar 02, 2026

DuckDB lets you run analytical queries on billions of rows using nothing but your laptop. No servers, no cloud bills, no waiting. A developer at Vantage switched from PostgreSQL to DuckDB for querying AWS cost data and turned a 7-minute query into a 4-second one — a 110x speedup on the same hardware. This article walks you through exactly what DuckDB is, when to use it, and how to start processing massive datasets locally in minutes. TL;DR: DuckDB is a free, open-source...


· · ·

llama.cpp: Run AI Models Locally for Free

By Jonathon Byrdziak · 16 min read · Feb 21, 2026

You know AI costs are eating your budget — but you assume running models yourself requires a server room and a PhD. With llama.cpp, a single developer can run powerful language models on a laptop, a desktop, or a cheap used workstation — with zero API fees, total data privacy, and performance that rivals cloud services. One engineer on Hacker News reported replacing three SaaS AI subscriptions after loading a quantized model onto an $800 refurbished workstation, cutting monthly costs...


· · ·

AI Terraform: English to Deployed in Minutes

By Jonathon Byrdziak · 14 min read · Feb 17, 2026

You know Terraform is powerful, but writing HCL from scratch feels like learning a second language just to spin up a server. What if you could describe your infrastructure in plain English and get production-ready code in minutes? A DevOps team at a financial services company did exactly that — cutting deployment time from days to under 30 minutes using AI-generated Terraform. In this article, Merchant Protocol walks you through the exact tools, prompts, and workflows to go from English...


· · ·

What Is ChromaDB? The Easiest Vector Database

By Jonathon Byrdziak · 15 min read · Feb 16, 2026

You know you need a vector database for your AI project, but the options are overwhelming and most of them demand infrastructure expertise you don't have. One indie developer wired ChromaDB into a customer support bot in under an hour and cut response latency by 60 percent — with four lines of Python. In this article, you'll learn exactly what ChromaDB is, how it compares to Pinecone and Weaviate, and how to go from pip install to a production-ready RAG pipeline. TL;DR: ChromaDB...
