Multi-Model Failover In Your AI Gateway
Think about two scenarios that are pretty common. 1) You hit a rate limit or run out of tokens, so you have to "downgrade" to a smaller, less powerful Model. 2)
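Scenario one can be sketched in a few lines: try Models in priority order and fall back to the next (smaller) one when a call fails. The `RateLimitError`, `call_model`, and Model names below are illustrative placeholders, not a real gateway API — a real implementation would wrap actual provider calls.

```python
# Minimal multi-model failover sketch. call_model stands in for a real
# provider call; here the primary model always raises a rate-limit error
# so the fallback path is exercised.

class RateLimitError(Exception):
    """Raised when a provider rejects a request for quota reasons."""

def call_model(model: str, prompt: str) -> str:
    # Placeholder provider call: pretend the large model is rate-limited.
    if model == "large-model":
        raise RateLimitError(f"{model}: rate limit exceeded")
    return f"{model} answered: {prompt}"

def generate_with_failover(models: list[str], prompt: str) -> str:
    """Try each model in priority order (most capable first)."""
    last_error: Exception | None = None
    for model in models:
        try:
            return call_model(model, prompt)
        except RateLimitError as err:
            last_error = err  # "downgrade" to the next, smaller model
    raise RuntimeError("all models exhausted") from last_error

print(generate_with_failover(["large-model", "small-model"], "hello"))
# -> small-model answered: hello
```

The same loop generalizes to timeouts or provider outages by widening the caught exception types.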
Managing an Agent's Uptime (Reliability Engineering for Agents)
"treat 'em like cattle, not pets".
This was, and continues to be, how many look at Kubernetes Pods and microservice-based architecture. It makes a lot of sense for objects like
Configuring Tool Traces In Your MCP Gateway
An Agent makes a call to an LLM. The LLM decides which MCP server tool should be used for a task. The Agent then makes a call to said tool. This can happen
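That Agent → LLM → tool loop can be sketched roughly as below. The `pick_tool` function and the `TOOLS` registry are stand-ins of my own: in a real setup the LLM performs tool selection and the tools live behind an MCP server.

```python
# Sketch of the loop described above: the "LLM" picks a tool for a task,
# the Agent invokes it, and the result comes back. Both the registry and
# the selection logic are illustrative placeholders.

TOOLS = {
    "get_time": lambda args: "12:00",
    "add": lambda args: str(args["a"] + args["b"]),
}

def pick_tool(task: str) -> tuple[str, dict]:
    # Stand-in for the LLM's tool-selection step.
    if "time" in task:
        return "get_time", {}
    return "add", {"a": 2, "b": 3}

def run_agent(task: str) -> str:
    tool_name, args = pick_tool(task)   # 1. LLM decides which tool to use
    result = TOOLS[tool_name](args)     # 2. Agent calls that tool
    return result                       # 3. Result is returned to the caller

print(run_agent("what time is it?"))  # -> 12:00
```

Tracing, in this picture, means recording each step of that loop (task, chosen tool, arguments, result) rather than just the final answer.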
Building Your Production-Grade SRE Agent
AI is only as good as the information you provide it. Aside from the hallucinations and wild outcomes we sometimes see from LLMs, the general gist of an Agent not performing as
Making Your Agent Model-Aware With Inference Extension, vLLM, & Routing
Your Agent has a "mind of its own" (well, it was programmed to act in a particular way). For example, Claude Code is known to downgrade your Model for particular tasks to