"The ReAct Loop: How Agents Think Before They Act"
Tags: #ai #agent #react #reasoning #chain-of-thought

@garagelab | 2026-05-01 07:49:46
In 2022, a paper from Google and Princeton introduced a deceptively simple idea: what if language models alternated between *reasoning* and *acting*, rather than producing a single output? The paper called the approach **ReAct** — short for Reason + Act — and it became the architectural backbone of nearly every practical AI agent built since. Understanding ReAct means understanding how agents actually think.

## The Problem with Pure Reasoning

Chain-of-thought prompting was a major breakthrough. By asking models to "think step by step," researchers found that LLMs made significantly fewer errors on complex tasks. The model would reason through sub-problems before arriving at an answer. For math problems, logical puzzles, and multi-step QA, it helped dramatically.

But pure reasoning has a ceiling. The model can only reason about what's already in its context window. If the correct answer depends on information it wasn't trained on, or requires executing something in the real world, no amount of chain-of-thought will get it there. The reasoning happens in a vacuum.

The opposite extreme — "act first, reason later" — was also insufficient. Models given tool access but no structured reasoning tended to call tools randomly, fail to recover from errors, and produce outputs with no traceable logic. You couldn't debug them or trust them.

ReAct solved both problems by combining the two.

## The ReAct Trace

A ReAct agent generates its output in a structured format that alternates between three types of text:

**Thought**: The model's internal reasoning about what to do next. Not shown to the user — it's reasoning scratch space. Example: *"The user wants the current CEO of OpenAI. I should search for this rather than relying on training data, which may be outdated."*

**Action**: A structured tool call. The model outputs a specific action name and its parameters. Example: `Action: search("current CEO of OpenAI 2025")`

**Observation**: The result returned by the tool, injected back into the context by the agent runtime. Example: `Observation: Sam Altman is the CEO of OpenAI as of 2025.`

This Thought → Action → Observation sequence repeats until the agent reaches a final answer. Each iteration gives the model new information to reason about. Crucially, the model can *read its own previous reasoning* — the entire trace is part of the context — so it can adjust strategy mid-task.
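To make the runtime mechanics concrete, here is a minimal sketch of the loop in Python. It is illustrative, not a reference implementation: `llm_complete`, the `search` tool, the `TOOLS` registry, and the `Action: name("arg")` trace syntax are all assumptions for this sketch, not part of the original paper.

```python
import re

def llm_complete(trace: str) -> str:
    """Stand-in for a model call: returns the next Thought/Action step."""
    raise NotImplementedError("plug in your model API here")

def search(query: str) -> str:
    """Placeholder tool; a real agent would call a search API."""
    raise NotImplementedError("plug in a real search backend here")

TOOLS = {"search": search}
MAX_STEPS = 10  # hard cap, since agents can loop (see Limitations below)

def react_loop(task: str) -> str:
    trace = f"Task: {task}\n"
    for _ in range(MAX_STEPS):
        # The model conditions on the whole trace so far, including its
        # own earlier Thoughts -- that is what lets it adjust strategy.
        step = llm_complete(trace)
        trace += step + "\n"

        # Terminate when the model commits to a final answer.
        done = re.search(r"Final Answer:\s*(.+)", step, re.DOTALL)
        if done:
            return done.group(1).strip()

        # Otherwise parse and execute the Action, if any.
        match = re.search(r'Action:\s*(\w+)\("(.*)"\)', step)
        if match:
            name, arg = match.groups()
            try:
                result = TOOLS[name](arg)
            except Exception as exc:
                # Failures become Observations too -- this single line
                # is what enables the recovery behavior described below.
                result = f"Error: {exc}"
            trace += f"Observation: {result}\n"

    return "Gave up: step limit reached without a Final Answer."
```

Note that the runtime never interprets the Thought lines; they exist only so the model can read its own reasoning on the next iteration.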
## A Full Trace Example

Here's a condensed example of a ReAct trace for the task: *"What is the market cap of the company that makes the iPhone?"*

```
Thought: I need to find Apple's current market cap. Market caps change daily, so I should search for the latest figure.
Action: search("Apple Inc market cap 2025")
Observation: Apple Inc (AAPL) market cap is approximately $3.2 trillion as of April 2025.
Thought: I now have the information I need. Apple makes the iPhone, and its current market cap is ~$3.2 trillion.
Final Answer: Apple Inc, which makes the iPhone, has a market cap of approximately $3.2 trillion as of April 2025.
```

Short task, simple trace. But the same pattern scales to tasks with 30+ steps, multiple tool types, and branching recovery from errors.

## How Recovery Works

One of the most powerful aspects of the ReAct loop is its natural support for error handling. If a tool call fails, the observation contains the error message. The model reads it, reasons about what went wrong, and tries a different approach.

```
Action: get_webpage("https://example.com/api/data")
Observation: Error 403: Access forbidden.
Thought: Direct access is blocked. I should try a cached version or search for the data through a different source.
Action: search("example.com data site:web.archive.org")
```

A human debugger would recognize this behavior immediately — it's how good programmers work. The model doesn't give up on an error; it uses the error as information. This self-correcting behavior emerges from the loop structure, not from any special training for error handling.

## Limitations of ReAct

ReAct is powerful but not magic. Several failure modes appear consistently in practice:

**Reasoning hallucination.** The model's "Thought" steps can be confident and coherent but wrong. If the model reasons incorrectly about what tool to call or what the observation means, subsequent steps compound the error. Unlike human reasoning, there's no internal checkpoint that flags "this thought is factually suspect."

**Context window overflow.** Long tasks generate long traces. Eventually the trace length approaches the model's context limit. Older parts of the reasoning history get truncated, causing the model to repeat actions or lose track of what it already tried.

**Tool misspecification.** If the agent calls a tool with incorrect parameters — misinterpreting what format an argument expects — the tool fails. Models vary widely in how gracefully they recover from ambiguous tool schemas.

**Looping.** Without explicit termination conditions, agents occasionally enter loops — retrying the same failed action, searching the same query, or reasoning in circles. Production agent frameworks add loop detection and maximum step limits as safeguards; a minimal sketch of one such safeguard appears at the end of this chapter.

## What Came After ReAct

ReAct is still the dominant pattern, but newer approaches are layering on top of it. **Tree of Thoughts** explores multiple reasoning branches in parallel and picks the most promising one — better for problems with many possible solution paths. **Reflexion** adds a retrospective step where the agent evaluates its completed trace and stores lessons for future runs. **LATS** (Language Agent Tree Search) applies Monte Carlo tree search to agent planning.

These approaches all share ReAct's core insight: interleaving thought and action is more powerful than separating them. The variations are about how to search the space of possible plans, how to evaluate quality, and how to retain what was learned.

The reason ReAct has remained dominant despite these advances is practical: it's simple enough to implement reliably and debug when it goes wrong. Complexity has a cost in agent systems — every additional layer is another surface for failure.

In the next chapter, we look at what the "act" part of ReAct actually means: tool use, the mechanism by which agents reach into the real world.
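One last practical note before moving on: here is the loop-detection safeguard promised under Limitations, as a companion to the runtime sketch above. The repeat threshold and the abort-on-repeat policy are assumptions for this sketch; real frameworks tune both per tool, since some retries (transient network errors, for instance) are legitimate.

```python
from collections import Counter

class LoopDetector:
    """Flags a run when the exact same Action is attempted too often."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.counts = Counter()

    def check(self, action: str) -> None:
        """Call with the raw Action text before executing it."""
        self.counts[action] += 1
        if self.counts[action] > self.max_repeats:
            raise RuntimeError(
                f"Loop detected: {action!r} attempted "
                f"{self.counts[action]} times"
            )
```

Inside `react_loop` above, you would call `detector.check(...)` on each parsed Action before dispatching the tool, then either abort the run or feed the error back as a final observation.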