Agent Context Management

Agents are stateless by default. Every new session starts with a blank slate – no knowledge of previous conversations, past mistakes, or learned preferences. This is the fundamental problem of agent context management: how do you give an agent continuity without overwhelming its context window?

Types of Agent Memory

Agent memory falls into four categories:

  • Short-term memory: The current conversation. Lives in the context window, disappears when the session ends.
  • Long-term memory: Facts persisted across sessions. “The production cluster runs Kubernetes 1.29.” Must be explicitly stored and retrieved.
  • Episodic memory: Records of specific past events. “On Feb 15, we debugged a DNS failure caused by a misconfigured service name.” Useful for avoiding repeated mistakes.
  • Semantic memory: General knowledge distilled from episodes. “Bitnami charts name resources using the release name directly.”

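Most systems implement only short-term and long-term memory. Episodic and semantic memory require more infrastructure (event logging, summarization, retrieval), but they let an agent recall past incidents and generalize from them, so its results tend to improve the longer it works on a project.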

File-Based Memory: The MEMORY.md Pattern

The simplest memory system is a markdown file the agent reads at session start and updates as it learns. Claude Code uses this approach with MEMORY.md files stored in ~/.claude/ and project-level .claude/ directories.

# Memory

## Project: API Service
- Framework: FastAPI with SQLAlchemy
- Database: PostgreSQL 15 on RDS
- Deploy: ECS Fargate, Terraform-managed
- Tests: pytest, run with `make test`
- The health check endpoint is /api/v1/health, NOT /health

## Preferences
- Always use absolute imports
- Error responses follow RFC 7807 (Problem Details)
- Never add print statements; use structured logging with structlog

This pattern has real advantages. The memory is human-readable, easy to audit, and can be version-controlled alongside the code. The agent reads it at the start of each session and has immediate context about the project.

The downside is scale. A MEMORY.md file works for tens of facts. At hundreds, it becomes noisy. The agent spends context window tokens reading things that are not relevant to the current task.
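The read-at-start, update-as-you-learn loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular agent implements it; the helper names `load_memory` and `remember` are hypothetical, and the file layout is assumed to match the example above (`## Section` headers followed by `- ` bullets).

```python
from pathlib import Path


def load_memory(path: Path) -> str:
    """Return the memory file's text, or an empty string if it does not exist yet."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


def remember(path: Path, section: str, fact: str) -> None:
    """Add `fact` as a bullet under `## section`, creating the section if needed."""
    lines = load_memory(path).splitlines()
    header = f"## {section}"
    bullet = f"- {fact}"
    if bullet in lines:
        return  # already recorded; avoid duplicate bullets
    if header in lines:
        # Find the end of this section (the next "## " header or end of file).
        start = lines.index(header) + 1
        end = start
        while end < len(lines) and not lines[end].startswith("## "):
            end += 1
        # Step back over trailing blank lines so the bullet joins the list.
        while end > start and lines[end - 1].strip() == "":
            end -= 1
        lines.insert(end, bullet)
    else:
        if lines and lines[-1].strip():
            lines.append("")  # blank line before a new section
        lines += [header, bullet]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

At session start the agent would place the result of `load_memory(Path("MEMORY.md"))` in its context; during the session it would call `remember(...)` whenever it learns a durable fact.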

Key-Value Memory

For structured facts, a key-value store scales better than free-form text. Each fact gets a key for direct retrieval:

{
  "project.framework": "FastAPI",
  "project.database.type": "PostgreSQL",
  "project.database.version": "15",
  "project.deploy.platform": "ECS Fargate",
  "user.preference.imports": "absolute",
  "user.preference.error_format": "RFC 7807"
}

The agent queries for a specific key or a key prefix (for example, project.database.*) instead of reading everything. The tradeoff: you lose the narrative context that makes MEMORY.md easy for humans to maintain.
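A prefix query like this can be sketched with the standard library's `fnmatch` module. The `query` helper is a hypothetical name, and a real store would more likely be a database or key-value service than an in-memory dict.

```python
from fnmatch import fnmatchcase

memory = {
    "project.framework": "FastAPI",
    "project.database.type": "PostgreSQL",
    "project.database.version": "15",
    "project.deploy.platform": "ECS Fargate",
    "user.preference.imports": "absolute",
    "user.preference.error_format": "RFC 7807",
}


def query(store: dict[str, str], pattern: str) -> dict[str, str]:
    """Return every entry whose key matches a shell-style pattern."""
    return {k: v for k, v in store.items() if fnmatchcase(k, pattern)}


# Only the two database facts are loaded into context.
database_facts = query(memory, "project.database.*")
```

Only the matching entries are placed in the context window, so the cost of a lookup does not grow with the total number of stored facts.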

Vector/Embedding Memory

When the agent needs relevant past context without knowing the exact key, vector search is the better fit. Past interactions are converted to embedding vectors and stored in a vector database. At query time, the agent embeds the current task and retrieves the most similar entries:

Current task: "Fix the database connection timeout in production"

Retrieved memories (by similarity):
1. "Feb 15: Production DB connections exhausted — pool_size was 5, increased to 20"
2. "Jan 28: SQLAlchemy pool_pre_ping=True prevents stale connections after RDS maintenance"
3. "Connection string format: postgresql+asyncpg://user:pass@host:5432/dbname"

This is RAG applied to agent memory. It scales to thousands of entries but requires an embedding model and a vector store.
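The retrieval step can be illustrated without any external service. In the sketch below, `embed` is a deliberately crude stand-in (word counts) for a real embedding model, and `VectorMemory` is a hypothetical in-memory class standing in for a vector database; only the embed-then-rank-by-cosine-similarity shape carries over to a real system.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorMemory:
    """Hypothetical in-memory store; a real system would use a vector database."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._entries.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

Because the toy `embed` matches only exact words, it misses paraphrases ("DB" versus "database") that a learned embedding model would catch; that gap is the reason to pay for a real model.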

Context Window Management

Every agent has a finite context window. When the candidate context exceeds it, include content in this priority order:

  1. System instructions – always included, non-negotiable
  2. Current task context – the user’s request and referenced files
  3. Active working state – tool results, intermediate outputs from this session
  4. Retrieved long-term memory – most relevant persisted facts
  5. Recent session history – what happened earlier in this conversation
  6. Background knowledge – general project info that might be useful

Trim from the bottom up. Background knowledge goes first. Old conversation turns get summarized or dropped. Retrieved memories get capped at the top-k most relevant.
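The priority order above amounts to a greedy packing under a token budget. The sketch below assumes a rough four-characters-per-token estimate (a real system should use the model's tokenizer) and simply drops what does not fit rather than summarizing it; `ContextItem` and `assemble_context` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    priority: int  # 1 = system instructions ... 6 = background knowledge
    text: str


def estimate_tokens(text: str) -> int:
    """Crude assumption: about 4 characters per token."""
    return max(1, len(text) // 4)


def assemble_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Keep items in priority order until the token budget is spent."""
    kept: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        # Priority 1 (system instructions) is always included.
        if item.priority > 1 and used + cost > budget:
            continue  # too large; a smaller lower-priority item may still fit
        kept.append(item)
        used += cost
    return kept
```

A fuller version would summarize old conversation turns instead of dropping them and would cap retrieved memories at a fixed top-k before packing.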

Session Handoff

When one agent session ends and another begins – or when one agent hands off to a different agent – context must transfer. Three patterns work:

Structured summary: The outgoing agent writes a summary of what it did, what it learned, and what remains. The incoming agent reads this as its starting context.

## Session Summary (2026-02-21 14:30)
- Task: Debug production connection timeouts
- Root cause: Connection pool exhausted (pool_size=5, max_overflow=0)
- Fix applied: Updated pool_size to 20, max_overflow to 10 in terraform/modules/rds/variables.tf
- Remaining: Deploy change to production (terraform apply pending approval)

Shared state file: Both agents read and write to a common state file. Works for multi-agent systems where agents collaborate asynchronously.

Message passing: The outgoing agent sends a structured message with task state, decisions made, and open questions. This is the pattern used in multi-agent frameworks with explicit handoff protocols.
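A handoff message of this kind can be modeled as a small typed record that serializes to JSON. The field names below are assumptions chosen to mirror the summary example above, not a schema from any particular framework.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Handoff:
    task: str
    decisions: list[str] = field(default_factory=list)
    remaining: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def to_message(self) -> str:
        """Serialize for the incoming agent (or for a shared state file)."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_message(cls, raw: str) -> "Handoff":
        """Rebuild the handoff on the receiving side."""
        return cls(**json.loads(raw))


outgoing = Handoff(
    task="Debug production connection timeouts",
    decisions=["Raised pool_size to 20 and max_overflow to 10"],
    remaining=["terraform apply pending approval"],
)
incoming = Handoff.from_message(outgoing.to_message())
```

Using an explicit structure rather than free text lets the receiving agent check that required fields are present before it starts work.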

Memory Decay

Not everything should be remembered forever. A debugging session from six months ago is rarely relevant. Memory decay prevents noise from accumulating.

  • TTL (Time-to-Live): Memories expire after a set period. Episodic memories might expire after 30 days. Project facts persist indefinitely.
  • Relevance scoring: Track how often a memory is retrieved. Memories that are never accessed decay in priority.
  • Explicit pruning: Periodically review stored memories and remove outdated ones. “The staging cluster runs Kubernetes 1.27” is wrong if you upgraded to 1.29.
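TTL expiry and relevance scoring can be combined in one small structure. This is a sketch under stated assumptions: the `Memory` record and the scoring formula (retrieval count divided by age in days) are illustrative choices, not a standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Memory:
    text: str
    created: datetime
    ttl: Optional[timedelta] = None  # None means the memory never expires
    hits: int = 0                    # how many times it has been retrieved

    def expired(self, now: datetime) -> bool:
        return self.ttl is not None and now - self.created > self.ttl


def prune(memories: list[Memory], now: datetime) -> list[Memory]:
    """Drop memories whose TTL has elapsed."""
    return [m for m in memories if not m.expired(now)]


def rank(memories: list[Memory], now: datetime) -> list[Memory]:
    """Order by an assumed score: frequently used and recent memories first."""
    def score(m: Memory) -> float:
        age_days = (now - m.created).days
        return (m.hits + 1) / (1 + age_days)
    return sorted(memories, key=score, reverse=True)
```

Project facts would be stored with `ttl=None`, while episodic memories might get `ttl=timedelta(days=30)`; the retrieval code increments `hits` each time an entry is returned.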

Common Mistakes

Storing too much: Every conversation turn, every tool result, every file read. The memory fills with noise and retrieval degrades. Store conclusions and decisions, not raw data.

Storing too little: Only keeping what the user explicitly asks to remember. The agent misses learnable patterns – recurring errors, preferred approaches, project conventions.

No organization: Dumping everything into a flat list. Without categories or keys, retrieval becomes a search through noise. Structure your memory from the start, even if it is just section headers in a markdown file.

Ignoring privacy: Storing API keys or personal information in plain-text memory. Rules about what must never be persisted should be enforced in code, not left to convention.
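One way to enforce such a rule in code is a guard that every write must pass through. The patterns below are illustrative assumptions and far from exhaustive; a production system would more likely rely on a dedicated secret scanner. `safe_to_store` and `persist` are hypothetical names.

```python
import re

# Illustrative patterns only; real deployments need a much broader set.
BLOCKED_PATTERNS = [
    # key=value or key: value assignments for common secret names
    re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b\s*[:=]\s*\S+"),
    # strings shaped like an AWS access key ID
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # email addresses (personal information)
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
]


def safe_to_store(text: str) -> bool:
    """Return False if the text matches any blocked pattern."""
    return not any(p.search(text) for p in BLOCKED_PATTERNS)


def persist(store: list[str], text: str) -> bool:
    """Append to memory only if the text passes the guard."""
    if not safe_to_store(text):
        return False  # refused; nothing is written
    store.append(text)
    return True
```

Because `persist` is the only write path, a memory that contains a secret is rejected regardless of what the agent or the user asked to remember.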