What it is
Foundation model is the term coined by Stanford's Center for Research on Foundation Models (CRFM) in 2021 for AI models trained on very broad data (text, often supplemented with images, audio, code, and video) at enormous scale, intended to serve as the substrate for many downstream uses rather than a single task. The defining characteristics: (1) trained once at high cost on a vast corpus; (2) general-purpose enough to be adapted (through prompting, fine-tuning, or RAG) to many specific applications; (3) typically built and operated by the relatively small number of vendors with the compute to train them. Examples include OpenAI's GPT family, Anthropic's Claude family, Google DeepMind's Gemini, xAI's Grok, Meta's Llama (open-weight), DeepSeek's V3 and R1 (open-weight), Mistral's family, and Qwen. The "foundation" framing emphasizes that other software builds on top, the way operating systems were the foundation for applications in the prior software era.
Why it matters
The foundation-model layer is the most consequential infrastructure shift of the decade. The relationship a business has with foundation models — whether as a buyer of API access, an operator of open-weight models on its own infrastructure, or a partner inside a hyperscaler relationship — shapes its AI economics, its data governance posture, and its strategic optionality. For agent operations specifically, the central architectural question is whether the agents are tightly coupled to one foundation model or can swap between several based on capability, cost, and policy (see vendor-neutral AI, capability registry). The vendor landscape consolidates and reshapes constantly — a model that's state-of-the-art today may be third-tier in eighteen months — so designs that bind to a single foundation model carry meaningful migration risk.
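The routing decision described above can be sketched in a few lines. This is a minimal illustration, not a real implementation: the model names, prices, capability labels, and the `allowed_for_pii` policy flag are all invented for the example.

```python
# Hypothetical sketch: pick a foundation model by capability, cost, and policy
# instead of hard-coding one vendor. All specs below are illustrative.
from dataclasses import dataclass

@dataclass
class ModelSpec:
    vendor: str
    name: str
    capabilities: set
    cost_per_mtok: float   # illustrative input price, USD per million tokens
    allowed_for_pii: bool  # example of a policy constraint

CATALOG = [
    ModelSpec("anthropic", "claude-large", {"reasoning", "tool_use", "vision"}, 3.00, True),
    ModelSpec("openai", "gpt-large", {"reasoning", "tool_use", "vision"}, 2.50, True),
    ModelSpec("local", "llama-open-weight", {"tool_use"}, 0.20, True),
]

def route(required: set, handles_pii: bool = False) -> ModelSpec:
    """Return the cheapest model that satisfies capability and policy needs."""
    eligible = [
        m for m in CATALOG
        if required <= m.capabilities and (not handles_pii or m.allowed_for_pii)
    ]
    if not eligible:
        raise LookupError(f"no model in the catalog serves {required}")
    return min(eligible, key=lambda m: m.cost_per_mtok)

choice = route({"reasoning", "tool_use"})
print(choice.vendor, choice.name)  # → openai gpt-large (cheapest eligible here)
```

Because agents call `route()` rather than a vendor SDK directly, swapping vendors when the frontier shifts means editing the catalog, not the agents.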
Key components
- Trained at scale — vast corpora, large parameter counts, multi-million-dollar training runs
- General-purpose substrate — adapted downstream via prompting, fine-tuning, or RAG
- Vendor landscape — Anthropic, OpenAI, Google, xAI, Meta (open-weight), DeepSeek, Mistral, Qwen
- Open-weight vs closed — Llama, DeepSeek, and Qwen weights are downloadable; the flagship GPT, Claude, Gemini, and Grok models are API-only
- Reshapes constantly — frontier capability shifts among vendors quarterly, making vendor-neutral architecture valuable
Related terms
LLM (Large Language Model)
The AI technology behind ChatGPT, Claude, and the intelligence in Agentforce. Trained on massive amounts of text to understand and generate human language.
AI Agent
An autonomous AI system that can perceive its environment, make decisions, and take actions to achieve specific goals, without constant human direction.
Vendor-Neutral AI
An architecture pattern where AI capabilities — skills, agents, evaluations — are defined separately from the LLM vendor that runs them, so the same capability can execute on Anthropic, OpenAI, xAI, Gemini, or local models without rewriting.
Capability Registry
A structured catalog that maps AI capabilities (reasoning, structured output, tool use, vision, long context) to the models that can serve them — the substrate that makes skills portable across LLM vendors.
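Such a catalog can be as simple as a dictionary from capability to the models that serve it. A minimal sketch, assuming invented capability labels and placeholder model identifiers:

```python
# Minimal capability-registry sketch: capability -> models that serve it.
# All model identifiers here are hypothetical placeholders.
REGISTRY = {
    "structured_output": ["vendor-a/model-x", "vendor-b/model-y", "local/model-z"],
    "long_context": ["vendor-a/model-x"],
    "vision": ["vendor-b/model-y"],
}

def models_for(capabilities):
    """Return models that serve every requested capability."""
    candidate_sets = [set(REGISTRY.get(c, [])) for c in capabilities]
    return sorted(set.intersection(*candidate_sets)) if candidate_sets else []

print(models_for(["structured_output", "long_context"]))  # → ['vendor-a/model-x']
```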
Reasoning Model
A class of large language model trained to spend hidden internal "thinking" tokens before producing a user-facing answer — often dramatically improving performance on math, code, science, and complex multi-step problems compared to non-reasoning models of similar size.