Reasoning Model

A class of large language models trained to spend hidden internal "thinking" tokens before producing a user-facing answer — often dramatically improving performance on math, code, science, and complex multi-step problems compared to non-reasoning models of similar size.

What it is

A reasoning model is an LLM that is post-trained (usually with reinforcement learning) to emit a structured internal reasoning trace before its final answer. The user typically does not see the trace; what they see is a higher-quality response that took longer to produce. Examples include OpenAI's o1 and o3 family, Anthropic's extended-thinking modes (in Claude Opus 4.x and Sonnet 4.x), DeepSeek R1, Qwen QwQ, Google Gemini Deep Think, and xAI's reasoning variants. Reasoning models are usually slower per query and more expensive per answer than non-reasoning models of similar parameter count, but they unlock measurably better performance on benchmarks involving multi-step deduction (AIME math, GPQA science, SWE-bench code, frontier ARC-AGI puzzles). They are the most prominent commercial application of the broader "test-time compute" principle.

Why it matters

Reasoning models overturned the assumption that "answer in one shot" was the only inference pattern. For complex work — debugging code across files, writing a multi-section research report, planning a multi-step agent workflow — they produce qualitatively different output than fast chat models. For routine work — extracting fields from a document, classifying support tickets, simple Q&A — they are slower and pricier without delivering any benefit. The operational decision in 2026 is no longer "which one model do we use" but "which workloads warrant a reasoning model and which don't." That decision is the routing layer's job; a substrate that captures cost-per-task and quality-per-task lets that layer learn rather than guess.
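The routing idea above can be sketched in a few lines. Everything here is hypothetical — the workload names, the two tiers, and the rule table are illustrative, not any vendor's API — but it shows the shape: a router that picks a tier per workload and records cost and quality per task so the rules can later be tuned from data rather than guessed.

```python
# Minimal workload-routing sketch. Workload names, tiers, and rules are
# assumptions for illustration; a production router would learn them
# from the logged cost-per-task and quality-per-task records.
from dataclasses import dataclass, field

# Static seed rules: which workloads start on the reasoning tier.
REASONING_WORKLOADS = {"multi_file_debugging", "research_report", "agent_planning"}


@dataclass
class Router:
    # Per (workload, tier): running (task_count, total_cost_usd, total_quality).
    stats: dict = field(default_factory=dict)

    def route(self, workload: str) -> str:
        """Pick a model tier for a workload; default to the cheap fast tier."""
        return "reasoning" if workload in REASONING_WORKLOADS else "fast"

    def record(self, workload: str, tier: str, cost_usd: float, quality: float) -> None:
        """Capture cost-per-task and quality-per-task so routing can learn."""
        n, cost, qual = self.stats.get((workload, tier), (0, 0.0, 0.0))
        self.stats[(workload, tier)] = (n + 1, cost + cost_usd, qual + quality)


router = Router()
tier = router.route("ticket_classification")   # routine work -> "fast"
router.record("ticket_classification", tier, cost_usd=0.002, quality=0.97)
```

The seed rule table is the "guess"; the `record` log is what lets a later version replace guessing with learned thresholds.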

Key components

  • Internal reasoning traces — structured thinking tokens emitted before the user-facing answer
  • RL post-training — the technique that produces reasoning behavior from a base model
  • Benchmark uplift — disproportionate gains on math, code, science, multi-step deduction
  • Cost and latency tradeoff — slower and pricier per answer, but higher quality on the right workloads
  • Routing implications — workload classification becomes a first-class operational concern
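The cost-and-latency tradeoff is easy to quantify. The arithmetic below uses made-up round numbers for prices and token counts (they are not any vendor's rates); the one structural point it illustrates is that providers typically bill hidden thinking tokens at the output-token rate, so a reasoning answer can cost many times more than a fast-model answer to the same prompt.

```python
# Illustrative per-answer cost arithmetic. All prices and token counts
# are assumed round numbers for the sake of the comparison.
def answer_cost(input_tokens: int, visible_output_tokens: int,
                thinking_tokens: int,
                usd_per_m_input: float, usd_per_m_output: float) -> float:
    # Hidden thinking tokens are typically billed as output tokens.
    billed_output = visible_output_tokens + thinking_tokens
    return (input_tokens * usd_per_m_input
            + billed_output * usd_per_m_output) / 1_000_000


# Fast model: no thinking tokens, cheap assumed rates.
fast = answer_cost(2_000, 500, 0, usd_per_m_input=0.50, usd_per_m_output=2.00)
# -> $0.002 per answer

# Reasoning model: same prompt and visible answer, plus 8,000 hidden
# thinking tokens, at pricier assumed rates.
reasoning = answer_cost(2_000, 500, 8_000, usd_per_m_input=3.00, usd_per_m_output=15.00)
# -> $0.1335 per answer, roughly 67x the fast model
```

That gap is why workload classification matters: on routine tasks the extra spend buys nothing, while on multi-step tasks it can be the difference between a usable answer and a wrong one.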
