What it is
An LLM gateway (also called an "LLM proxy") sits between an application and the LLM providers it calls. The application sends a request to the gateway; the gateway authenticates the caller, applies policy (rate limits, cost ceilings, redaction, model allowlists), routes the request to the appropriate provider, captures the response with full telemetry (tokens, latency, cost, prompt, output), and returns it to the application. Vendors include Helicone, Portkey, LiteLLM, OpenRouter, and the gateway layers inside agent operations platforms like AgentPM. Gateways are typically OpenAI-API-compatible, so existing client SDKs work without code changes.
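Because the API surface is OpenAI-compatible, adopting a gateway is usually a configuration change rather than a rewrite: point the SDK's base URL at the gateway and swap in a gateway-issued key. A minimal sketch using the official openai Python SDK; the gateway URL and key here are hypothetical placeholders, not a specific vendor's endpoint:

```python
from openai import OpenAI

# Hypothetical gateway endpoint; only base_url and api_key change,
# the rest of the client code stays exactly as it was.
client = OpenAI(
    base_url="https://gateway.internal.example/v1",  # gateway, not api.openai.com
    api_key="GATEWAY_KEY",  # issued by the gateway, not by the provider
)

response = client.chat.completions.create(
    model="gpt-4o",  # the gateway maps this name to whichever provider it routes to
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```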
Why it matters
Without a gateway, every application calls every provider directly, which means every team reimplements rate limiting, cost capture, observability, and provider failover. Cost data lives in N vendor consoles and never aggregates. With a gateway, every LLM call across every team and every provider funnels through one capture point: a single bill view, a single audit log, a single place to enforce "no sensitive data leaves to consumer-tier endpoints." Gateways are the substrate that makes cross-vendor cost attribution and unified governance possible. Without one, you cannot answer the basic question "where is my AI bill going across all my vendors?"
Key components
- OpenAI-compatible API surface — drop-in replacement for direct provider clients
- Provider routing — by capability, cost, latency, or explicit model selection
- Telemetry capture — every call recorded with cost, tokens, latency, and outcome
- Policy enforcement — rate limits, redaction, allowlists, cost ceilings, residency rules
- Failover and retries — graceful handling when a provider is degraded or down (see the sketch after this list)
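To make these components concrete, here is a sketch of how a gateway's request path could compose them: an allowlist check (policy), an ordered provider list (routing), a try-next-on-failure loop (failover), and a per-call record (telemetry). Every name below is hypothetical and illustrative; real gateways are considerably more elaborate.

```python
import time

ALLOWED_MODELS = {"gpt-4o", "claude-sonnet"}  # policy: model allowlist
PROVIDERS = ["openai", "anthropic-fallback"]  # routing: preference order
telemetry_log = []                            # capture point for every call

def call_provider(provider: str, model: str, prompt: str) -> dict:
    """Stand-in for the real upstream call; raises if the provider fails."""
    return {"output": f"[{provider}:{model}] ...", "tokens": 42, "cost_usd": 0.0021}

def handle_request(model: str, prompt: str) -> dict:
    if model not in ALLOWED_MODELS:           # policy enforcement
        raise PermissionError(f"model {model!r} is not on the allowlist")
    last_error = None
    for provider in PROVIDERS:                # failover: try providers in order
        start = time.monotonic()
        try:
            result = call_provider(provider, model, prompt)
        except Exception as err:              # provider degraded or down
            last_error = err
            continue
        telemetry_log.append({                # telemetry: record every outcome
            "provider": provider,
            "model": model,
            "latency_s": time.monotonic() - start,
            "tokens": result["tokens"],
            "cost_usd": result["cost_usd"],
        })
        return result
    raise RuntimeError("all providers failed") from last_error
```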
Related terms
MCP (Model Context Protocol)
Anthropic's open standard for connecting AI models to external data sources and tools. Think of it as a universal adapter for AI.
BYOK (Bring Your Own Key)
A model where users provide their own API keys for AI services (such as OpenAI, Anthropic, or another LLM provider) instead of relying on AI access the platform bundles and bills for.
Agent Operations
The discipline of running AI agents in production — capturing what they do, attributing what it costs, evaluating what they produce, and intervening when something goes wrong. The operational layer above agent observability and orchestration.
Agent Infrastructure
The runtime, network, and tooling substrate that AI agents need to execute reliably — sandboxed compute, tool access, memory, gateways to LLM providers, and the orchestration plumbing that connects them. Closer to the metal than agent operations.
LLM Cost Attribution
The practice of tying every LLM call back to the task, agent, process, or skill that triggered it — across every vendor — so AI spend can be measured against outcomes, not just tokens.
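In gateway terms, attribution usually means tagging each request with caller metadata that the gateway stores alongside tokens and cost. A minimal sketch, assuming a gateway that accepts custom headers; the header names here are invented for illustration and are not any vendor's actual schema:

```python
from openai import OpenAI

# Hypothetical attribution tags sent as custom headers on every call;
# the gateway would persist them next to the per-call cost record.
client = OpenAI(
    base_url="https://gateway.internal.example/v1",  # same hypothetical gateway
    api_key="GATEWAY_KEY",
    default_headers={
        "X-Attr-Team": "growth",        # who owns the spend
        "X-Attr-Agent": "lead-scorer",  # which agent triggered the call
        "X-Attr-Task": "task-8841",     # which task or process it served
    },
)
```

With tags like these captured per call, spend can be grouped by team, agent, or task and measured against outcomes, instead of appearing as one undifferentiated line on each provider's bill.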