"What Exactly Is an AI Agent?"

Humans carry decades of accumulated knowledge and experience into every conversation. We remember what we discussed with someone last week. We know our own preferences, habits, and history. We have short-term working memory for the current context and long-term semantic memory for general knowledge.

LLMs, by default, have none of this. Every API call starts fresh. The model that helped you debug code yesterday has no memory of it today. For a chatbot, this is an inconvenience. For an agent operating over days or weeks on complex tasks, it's a fundamental architectural problem.

Memory is what separates a capable agent from a reliable one.

## The Four Types of Agent Memory

Agent memory researchers generally distinguish four categories, each with different storage mechanisms and access patterns:

### 1. In-Context Memory (Working Memory)

This is the simplest form: everything in the model's current context window. System prompt, conversation history, tool call results, reasoning traces — all of it is visible to the model simultaneously during a single run.

In-context memory is perfect for within-session continuity. The agent remembers what it reasoned two steps ago because that reasoning is literally present in the input. The limitation is size. Current top models handle 128K to 1M token context windows — impressive, but finite. A long task with many tool calls, large document reads, and extended reasoning can saturate even million-token contexts.

The practical implication: in-context memory must be actively managed. Long conversations need summarization. Redundant tool call results need pruning. Raw file contents need replacement with extracted key facts. These management decisions are often handled automatically by agent frameworks, but they introduce information loss — what gets pruned affects what the agent remembers.

### 2. External Retrieval Memory (Long-Term Semantic Memory)

For knowledge that doesn't fit in context, the standard solution is **Retrieval-Augmented Generation (RAG)**: storing information in a vector database and retrieving the most relevant pieces at query time.

The process works like this: documents are split into chunks, each chunk is converted to a dense vector embedding (a numerical representation of its semantic meaning), and stored in a vector index. When the agent needs information, it embeds the query, searches the index for the nearest vectors (most semantically similar chunks), and injects the retrieved text into context.

RAG enables agents to work with corpora far too large for any context window — internal company documents, research paper archives, codebase knowledge bases, customer history logs. A coding agent can retrieve relevant function definitions from a million-line codebase. A research agent can pull from thousands of paper abstracts.

The failure modes of RAG are specific and well-documented. **Retrieval quality** degrades when queries are vague, when relevant information is spread across many chunks, or when the embedding model doesn't capture domain-specific semantics well. **Chunk size** is a critical tuning parameter — too small and context is lost; too large and irrelevant information floods in. **Recency bias** is a problem for frequently updated information; vector indexes don't automatically update when source documents change.

### 3. Episodic Memory (Past Interaction History)

Episodic memory stores *what happened in past sessions* — not general knowledge, but specific events. "Last Tuesday, the user asked me to analyze their Q1 sales data. I found three anomalies. They asked me to investigate anomaly #2 further and set a reminder for next Monday."

Storing this requires a structured database, not just a vector index. Key fields typically include: session ID, timestamp, task description, outcome, user feedback, important observations. When a new session starts, the agent retrieves relevant past episodes (using semantic search over descriptions, or direct lookup by task type) and loads them into context.

Episodic memory is what allows an agent to be genuinely *personalized*. It's the difference between an assistant that needs to be told your preferences every time and one that already knows them. It's also the memory type most agents currently lack — most production agents have no episodic storage and start fresh each session.

### 4. Procedural Memory (Learned Behaviors)

Procedural memory is the most complex and least standardized: stored instructions, workflows, and policies that shape how the agent behaves independent of any specific task.

This can be as simple as a persistent system prompt that grows over time as the user adds preferences ("always use metric units," "prefer Python over JavaScript"). Or as complex as a structured workflow store where the agent retrieves multi-step procedures for recognized task types ("when asked to write a blog post, follow this outline sequence").

Some agent frameworks are experimenting with allowing agents to *update their own procedural memory* based on experience — a primitive form of learning. The agent notices it repeatedly makes the same mistake, writes a corrective rule, and stores it for future runs. This is promising but fragile; poorly designed self-updating memory can accumulate contradictory rules or loop into recursive self-modification.

## Memory Management in Practice

Real agent systems don't implement all four memory types equally. The choices depend on use case:

- A **customer support agent** needs episodic memory (customer history) and external retrieval (product documentation), but limited procedural memory.
- A **coding agent** needs strong retrieval (codebase knowledge), good working memory management (long files), and procedural memory (coding style guidelines). Episodic memory is less critical.
- A **personal assistant agent** benefits most from all four types — especially episodic memory and growing procedural preferences.

The current state of the art (mid-2025) is that external retrieval is mature and widely deployed, in-context management is well-understood, episodic memory is available in some products (ChatGPT's persistent memory, Claude's Projects feature), and procedural self-update remains largely research-stage.

Memory is what makes agents usable over time. But for many tasks — ones too complex for a single agent operating alone — the bottleneck isn't memory. It's scope. The answer to that is multi-agent systems: specialized AIs working in concert, which is exactly where we're headed next.

"Memory Architecture: How Agents Remember and Learn"

// COMMENTS

ON THIS PAGE