# LLM vs. Agent: The Difference That Changes Everything
#ai #agent #llm #autonomous #chatgpt
@garagelab | 2026-05-01 07:49:46
Open ChatGPT and type a question. Within seconds, you get an answer. The model reads your input, generates a response, and stops. The interaction is over. There's no plan, no follow-through, no memory of what you asked yesterday. It's stateless, single-turn, reactive.

Now consider a different scenario. You tell an AI: *"Research the top five competitors of Notion, write a comparison table, search for their latest pricing pages, identify which one has the best free tier, and send me a summary by email."*

A plain LLM would do its best to answer from training data: probably outdated, definitely not visiting any websites, and definitely not sending any emails. An AI agent, by contrast, would *plan* the task into steps, *execute* each step using real tools, *observe* the results, *adjust* if something fails, and *complete* the goal with actions that have real-world effects.

The difference isn't raw intelligence. It's architecture.

## What an LLM Actually Is

A large language model is a function. Given a sequence of tokens, it predicts the most probable next token, over and over, until it decides to stop. That's the entirety of what happens during inference. The model doesn't "think" in the human sense. It maps input to output using billions of learned parameters, shaped by training on text from the internet, books, and code.

This is genuinely powerful. LLMs can write, reason, translate, summarize, and generate code that actually runs. But they have hard limits baked into this architecture:

- **No persistent state**: Each API call starts fresh. The model doesn't remember previous conversations unless the entire history is included in the current prompt. This becomes expensive and eventually impossible as history grows.
- **No ability to act**: An LLM can *describe* how to book a flight, but it can't actually click a button, call an API, or write to a file. Its outputs are text, nothing more.
- **Knowledge cutoff**: Training data has a cutoff date. Anything that happened after that date is unknown to the model unless provided in the prompt.
- **Single-pass reasoning**: Setting aside techniques like chain-of-thought prompting, a standard LLM commits to an answer in a single left-to-right generation. It can't go back, check its work against external reality, or iterate on failure.

## What an Agent Adds

An AI agent wraps an LLM inside a larger loop that adds three missing capabilities:

**1. Planning.** Instead of responding directly to user input, the agent first decomposes the goal into a sequence of subtasks. Modern agents use prompting strategies (most notably the ReAct framework) that interleave reasoning steps ("I should search for competitor pricing first") with action steps ("call the search tool with query: 'Notion competitor pricing 2025'").

**2. Tool use.** The agent has access to a set of functions it can call: web search, code execution, file read/write, calendar access, email, database queries, API calls. When the LLM decides a tool is needed, it outputs a structured call spec (typically in JSON), which the agent runtime executes in the real world, feeding the result back to the LLM.

**3. Observation and iteration.** After each action, the result becomes part of the LLM's context. If a web search returns nothing useful, the agent can try a different query. If code throws an error, the agent can read the traceback, reason about the bug, and rewrite the function. This observe-reason-act loop can repeat many times before the agent concludes it's done.
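To ground the distinction, compare the two shapes in code. First, the plain LLM: inference is nothing but repeated next-token prediction. This is a toy sketch rather than any vendor's API; `fake_next_token_logits` and its three-token vocabulary are hypothetical stand-ins for a real model's forward pass.

```python
import random

# Toy stand-in for an autoregressive LM: map the token sequence so far to a
# distribution over the next token. Real models do this with billions of
# parameters, but the shape of inference is the same.
def fake_next_token_logits(tokens: list[str]) -> dict[str, float]:
    return {"the": 0.5, "cat": 0.3, "<eos>": 0.2}

def generate(prompt: list[str], max_tokens: int = 20) -> list[str]:
    tokens = list(prompt)                      # the model sees only this sequence
    for _ in range(max_tokens):
        dist = fake_next_token_logits(tokens)  # one forward pass per token
        next_token = random.choices(list(dist), weights=list(dist.values()))[0]
        if next_token == "<eos>":              # the model decides to stop
            break
        tokens.append(next_token)
    return tokens  # text out, nothing else: no memory kept, no tools called

print(" ".join(generate(["a", "prompt"])))
```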
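And here is what an agent wraps around that same function: a minimal observe-reason-act loop in the spirit of ReAct, with a tiny tool registry and a scripted stand-in for the model so the sketch runs end to end. The tool names, the `call_llm` helper, and the JSON fields (`thought`, `action`, `args`) are all illustrative assumptions, not any specific framework's API; the scripted decisions show one possible shape for the structured call spec from point 2.

```python
import json

# --- Tool registry: real-world functions the agent may call. ---
# Canned stand-ins here; a real agent would hit a search API, SMTP, etc.
def web_search(query: str) -> str:
    return f"(pretend results for: {query!r})"

def send_email(to: str, body: str) -> str:
    return f"(pretend email sent to {to})"

TOOLS = {"web_search": web_search, "send_email": send_email}

# --- Hypothetical LLM. A real agent would call a model here; this one ---
# --- replays a scripted ReAct-style transcript so the sketch runs.    ---
SCRIPT = iter([
    {"thought": "I should look up competitor pricing first.",
     "action": "web_search", "args": {"query": "Notion competitor pricing"}},
    {"thought": "I have what I need; summarize and deliver.",
     "action": "send_email", "args": {"to": "me@example.com", "body": "summary..."}},
    {"thought": "Goal complete.", "action": "final", "answer": "Summary emailed."},
])

def call_llm(context: str) -> str:
    return json.dumps(next(SCRIPT))  # stand-in for a real model call

# --- The execution loop: observe, reason, act, repeat. ---
def run_agent(goal: str, max_steps: int = 10) -> str:
    context = f"Goal: {goal}\n"
    for _ in range(max_steps):
        decision = json.loads(call_llm(context))  # output parsing
        if decision["action"] == "final":         # the agent decides it's done
            return decision["answer"]
        observation = TOOLS[decision["action"]](**decision["args"])  # act
        context += (  # the observation becomes part of the next prompt
            f"Thought: {decision['thought']}\n"
            f"Action: {decision['action']}({decision['args']})\n"
            f"Observation: {observation}\n"
        )
    return "Stopped: step budget exhausted."

print(run_agent("Compare Notion competitors and email me a summary."))
```

Notice that most of the pieces named in the next section already appear here in miniature: a tool registry, an execution loop, and an output parser.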
## The Scaffolding Around the Model

The LLM itself hasn't changed. What's changed is everything *around* it. An agent framework like LangChain, AutoGen, or the emerging agent runtime built into models like Claude and Gemini 2.0 provides:

- A **system prompt** that defines the agent's role, available tools, and behavioral constraints
- A **tool registry** with schemas the LLM can read to understand what each tool does and what parameters it expects
- An **execution loop** that interprets LLM outputs, runs tool calls, handles errors, and re-prompts with results
- An **output parser** that extracts structured decisions from the LLM's text generation

The LLM becomes the reasoning engine; the framework becomes the body that lets it act.

## Why This Changes Everything

The gap between LLM and agent isn't just technical; it's economic. A model that can only answer questions is a better search engine. A model that can autonomously execute multi-step workflows is closer to an employee. The same underlying intelligence, directed differently, produces dramatically different utility.

This explains why every major AI lab is racing toward agentic capabilities. OpenAI's Operator, Anthropic's Computer Use API, Google's Project Mariner, and dozens of startups (Devin, Cursor, Lindy, Imbue) are all building variations on the same insight: the next step beyond "AI that talks" is "AI that does."

And doing turns out to be hard in ways that talking isn't. Which is exactly what the rest of this series is about.