What it is
An LLM gateway (also called an "LLM proxy") sits between an application and the LLM providers it calls. The application sends a request to the gateway; the gateway authenticates the caller, applies policy (rate limits, cost ceilings, redaction, model allowlists), routes the request to the appropriate provider, captures the response with full telemetry (tokens, latency, cost, prompt, output), and returns it to the application. Vendors include Helicone, Portkey, LiteLLM, OpenRouter, and the gateway layers inside agent operations platforms like AgentPM. Gateways are typically OpenAI-API-compatible, so existing client SDKs work without code changes.
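Because the API surface is OpenAI-compatible, adopting a gateway is usually a configuration change rather than a rewrite: point the SDK's base URL at the gateway and swap in a gateway-issued key. A minimal sketch using the official openai Python SDK; the gateway URL and key here are hypothetical placeholders, not a specific vendor's endpoint:

```python
from openai import OpenAI

# Hypothetical gateway endpoint; only base_url and api_key change,
# the rest of the client code stays exactly as it was.
client = OpenAI(
    base_url="https://gateway.internal.example/v1",  # gateway, not api.openai.com
    api_key="GATEWAY_KEY",  # issued by the gateway, not by the provider
)

response = client.chat.completions.create(
    model="gpt-4o",  # the gateway maps this name to whichever provider it routes to
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```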
Why it matters
Without a gateway, every application calls every provider directly, which means every team reimplements rate limiting, cost capture, observability, and provider failover. Cost data lives in N vendor consoles and never aggregates. With a gateway, every LLM call across every team and every provider funnels through one capture point: a single bill view, a single audit log, a single place to enforce "no sensitive data leaves to consumer-tier endpoints." Gateways are the substrate that makes cross-vendor cost attribution and unified governance possible. Without one, you cannot answer the basic question "where is my AI bill going across all my vendors?"
Key components
- OpenAI-compatible API surface — drop-in replacement for direct provider clients
- Provider routing — by capability, cost, latency, or explicit model selection
- Telemetry capture — every call recorded with cost, tokens, latency, and outcome
- Policy enforcement — rate limits, redaction, allowlists, cost ceilings, residency rules
- Failover and retries — graceful handling when a provider is degraded or down (see the sketch after this list)
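To make these components concrete, here is a sketch of how a gateway's request path could compose them: an allowlist check (policy), an ordered provider list (routing), a try-next-on-failure loop (failover), and a per-call record (telemetry). Every name below is hypothetical and illustrative; real gateways are considerably more elaborate.

```python
import time

ALLOWED_MODELS = {"gpt-4o", "claude-sonnet"}  # policy: model allowlist
PROVIDERS = ["openai", "anthropic-fallback"]  # routing: preference order
telemetry_log = []                            # capture point for every call

def call_provider(provider: str, model: str, prompt: str) -> dict:
    """Stand-in for the real upstream call; raises if the provider fails."""
    return {"output": f"[{provider}:{model}] ...", "tokens": 42, "cost_usd": 0.0021}

def handle_request(model: str, prompt: str) -> dict:
    if model not in ALLOWED_MODELS:           # policy enforcement
        raise PermissionError(f"model {model!r} is not on the allowlist")
    last_error = None
    for provider in PROVIDERS:                # failover: try providers in order
        start = time.monotonic()
        try:
            result = call_provider(provider, model, prompt)
        except Exception as err:              # provider degraded or down
            last_error = err
            continue
        telemetry_log.append({                # telemetry: record every outcome
            "provider": provider,
            "model": model,
            "latency_s": time.monotonic() - start,
            "tokens": result["tokens"],
            "cost_usd": result["cost_usd"],
        })
        return result
    raise RuntimeError("all providers failed") from last_error
```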
Related terms
MCP (Model Context Protocol)
Anthropic's open standard for connecting AI models to external data sources and tools. Think of it as a universal adapter for AI.
BYOK (Bring Your Own Key)
A model where users provide their own API keys for AI services (such as OpenAI, Anthropic, or another LLM provider) instead of relying on AI access the platform bundles and bills for.
Agent Operations
The discipline of running AI agents in production — capturing what they do, attributing what it costs, evaluating what they produce, and intervening when something goes wrong. The operational layer above agent observability and orchestration.
Agent Infrastructure
The runtime, network, and tooling substrate that AI agents need to execute reliably — sandboxed compute, tool access, memory, gateways to LLM providers, and the orchestration plumbing that connects them. Closer to the metal than agent operations.
LLM Cost Attribution
The practice of tying every LLM call back to the task, agent, process, or skill that triggered it — across every vendor — so AI spend can be measured against outcomes, not just tokens.
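In gateway terms, attribution usually means tagging each request with caller metadata that the gateway stores alongside tokens and cost. A minimal sketch, assuming a gateway that accepts custom headers; the header names here are invented for illustration and are not any vendor's actual schema:

```python
from openai import OpenAI

# Hypothetical attribution tags sent as custom headers on every call;
# the gateway would persist them next to the per-call cost record.
client = OpenAI(
    base_url="https://gateway.internal.example/v1",  # same hypothetical gateway
    api_key="GATEWAY_KEY",
    default_headers={
        "X-Attr-Team": "growth",        # who owns the spend
        "X-Attr-Agent": "lead-scorer",  # which agent triggered the call
        "X-Attr-Task": "task-8841",     # which task or process it served
    },
)
```

With tags like these captured per call, spend can be grouped by team, agent, or task and measured against outcomes, instead of appearing as one undifferentiated line on each provider's bill.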