Agent Context Management#
Agents are stateless by default. Every new session starts with a blank slate – no knowledge of previous conversations, past mistakes, or learned preferences. This is the fundamental problem of agent context management: how do you give an agent continuity without overwhelming its context window?
Types of Agent Memory#
Agent memory falls into four categories:
- Short-term memory: The current conversation. Lives in the context window, disappears when the session ends.
- Long-term memory: Facts persisted across sessions. “The production cluster runs Kubernetes 1.29.” Must be explicitly stored and retrieved.
- Episodic memory: Records of specific past events. “On Feb 15, we debugged a DNS failure caused by a misconfigured service name.” Useful for avoiding repeated mistakes.
- Semantic memory: General knowledge distilled from episodes. “Bitnami charts name resources using the release name directly.”
Most systems implement only short-term and long-term memory. Episodic and semantic memory require more infrastructure, but they let an agent learn from past events instead of only recalling stored facts.
File-Based Memory: The MEMORY.md Pattern#
The simplest memory system is a markdown file the agent reads at session start and updates as it learns. Claude Code uses this approach with MEMORY.md files stored in ~/.claude/ and project-level .claude/ directories.
# Memory
## Project: API Service
- Framework: FastAPI with SQLAlchemy
- Database: PostgreSQL 15 on RDS
- Deploy: ECS Fargate, Terraform-managed
- Tests: pytest, run with `make test`
- The health check endpoint is /api/v1/health, NOT /health
## Preferences
- Always use absolute imports
- Error responses follow RFC 7807 (Problem Details)
- Never add print statements; use structured logging with structlog

This pattern has real advantages. The memory is human-readable, version-controlled, and easy to audit. The agent reads it at the start of each session and has immediate context about the project.
The downside is scale. A MEMORY.md file works for tens of facts. At hundreds, it becomes noisy. The agent spends context window tokens reading things that are not relevant to the current task.
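As a rough illustration of the read-at-start, update-as-you-learn loop, here is a minimal Python sketch. The helper names `load_memory` and `add_fact` are hypothetical, and the section layout (`## ` headers followed by bullets) is assumed from the example file above; a real agent may manage its memory file differently.

```python
from pathlib import Path


def load_memory(path: Path) -> str:
    """Return the memory file's text, or an empty string if it does not exist yet."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


def add_fact(text: str, section: str, fact: str) -> str:
    """Add `fact` as a bullet at the end of `## section`, creating the section if needed."""
    lines = text.splitlines()
    header = f"## {section}"
    bullet = f"- {fact}"
    if bullet in lines:
        return text  # identical bullet already recorded somewhere in the file
    if header not in lines:
        # New section: separate it from existing content with a blank line.
        if lines and lines[-1].strip():
            lines.append("")
        lines += [header, bullet]
    else:
        i = lines.index(header) + 1
        # Advance to the next "## " header or the end of the file.
        while i < len(lines) and not lines[i].startswith("## "):
            i += 1
        # Step back over blank lines so the bullet joins the existing list.
        while i > 0 and not lines[i - 1].strip():
            i -= 1
        lines.insert(i, bullet)
    return "\n".join(lines) + "\n"
```

Because the result is plain text, the updated file can be written back with `Path.write_text` and committed like any other change, which keeps the audit trail the pattern depends on.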
Key-Value Memory#
For structured facts, a key-value store scales better than free-form text. Each fact gets a key for direct retrieval:
{
"project.framework": "FastAPI",
"project.database.type": "PostgreSQL",
"project.database.version": "15",
"project.deploy.platform": "ECS Fargate",
"user.preference.imports": "absolute",
"user.preference.error_format": "RFC 7807"
}

The agent queries for specific keys (project.database.*) instead of reading everything. The tradeoff: you lose the narrative context that makes MEMORY.md easy for humans to maintain.
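A wildcard lookup over such a store can be sketched in a few lines of Python. This is only an illustration: the in-memory dict stands in for whatever store is actually used, and the `query` helper is a hypothetical name. It relies on the standard library's `fnmatch.fnmatchcase` for shell-style patterns.

```python
from fnmatch import fnmatchcase

# In-memory stand-in for the key-value store shown above.
memory = {
    "project.framework": "FastAPI",
    "project.database.type": "PostgreSQL",
    "project.database.version": "15",
    "project.deploy.platform": "ECS Fargate",
    "user.preference.imports": "absolute",
    "user.preference.error_format": "RFC 7807",
}


def query(store: dict[str, str], pattern: str) -> dict[str, str]:
    """Return only the entries whose key matches a shell-style pattern."""
    return {k: v for k, v in store.items() if fnmatchcase(k, pattern)}


database_facts = query(memory, "project.database.*")
```

Only the two `project.database.*` entries reach the context window; the other four facts cost no tokens for this task.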
Vector/Embedding Memory#
When the agent needs relevant past context without knowing the exact key, vector search works. Past interactions are embedded and stored in a vector database. At query time, the agent embeds the current task and retrieves the most similar entries:
Current task: "Fix the database connection timeout in production"
Retrieved memories (by similarity):
1. "Feb 15: Production DB connections exhausted — pool_size was 5, increased to 20"
2. "Jan 28: SQLAlchemy pool_pre_ping=True prevents stale connections after RDS maintenance"
3. "Connection string format: postgresql+asyncpg://user:pass@host:5432/dbname"

This is RAG applied to agent memory. It scales to thousands of entries but requires an embedding model and a vector store.
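The retrieval step itself is simple once vectors exist. The sketch below shows the embed, score, and top-k flow using only the standard library. Note the loud assumption: `embed` here is a toy word-count vector, not a real embedding model, and a plain list stands in for the vector database. A real system would swap both out, but the ranking logic keeps the same shape.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(task: str, memories: list[str], k: int = 3) -> list[str]:
    """Return the k stored memories most similar to the current task."""
    query_vec = embed(task)
    ranked = sorted(memories, key=lambda m: cosine(query_vec, embed(m)), reverse=True)
    return ranked[:k]
```

Word overlap misses paraphrases ("DB" versus "database", for instance), which is exactly the gap a learned embedding model is meant to close.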
Context Window Management#
Every agent has a finite context window. When the available context exceeds it, prioritize in this order, highest priority first:
- System instructions – always included, non-negotiable
- Current task context – the user’s request and referenced files
- Active working state – tool results, intermediate outputs from this session
- Retrieved long-term memory – most relevant persisted facts
- Recent session history – what happened earlier in this conversation
- Background knowledge – general project info that might be useful
Trim from the bottom up. Background knowledge goes first. Old conversation turns get summarized or dropped. Retrieved memories get capped at the top-k most relevant.
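One way to sketch this trimming is a budgeted packer that walks the items in priority order and stops at the first one that no longer fits. Everything here is illustrative: `ContextItem`, `pack_context`, and the four-characters-per-token estimate are assumptions, and the sketch only drops content, whereas a fuller version might summarize old turns instead.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    priority: int  # 1 = system instructions ... 6 = background knowledge
    text: str


def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about four characters per token."""
    return max(1, len(text) // 4)


def pack_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Keep items in priority order until the token budget is exhausted."""
    kept: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda it: it.priority):
        cost = estimate_tokens(item.text)
        if used + cost > budget:
            break  # this item and everything below it in priority is trimmed
        kept.append(item)
        used += cost
    return kept
```

Stopping at the first item that does not fit keeps the trimming strictly bottom-up: a small piece of background knowledge never displaces a larger, higher-priority item.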
Session Handoff#
When one agent session ends and another begins – or when one agent hands off to a different agent – context must transfer. Three patterns work:
Structured summary: The outgoing agent writes a summary of what it did, what it learned, and what remains. The incoming agent reads this as its starting context.
## Session Summary (2026-02-21 14:30)
- Task: Debug production connection timeouts
- Root cause: Connection pool exhausted (pool_size=5, max_overflow=0)
- Fix applied: Updated pool_size to 20, max_overflow to 10 in terraform/modules/rds/variables.tf
- Remaining: Deploy change to production (terraform apply pending approval)

Shared state file: Both agents read and write to a common state file. Works for multi-agent systems where agents collaborate asynchronously.
Message passing: The outgoing agent sends a structured message with task state, decisions made, and open questions. This is the pattern used in multi-agent frameworks with explicit handoff protocols.
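A handoff message of this kind can be modeled as a small typed record that serializes to JSON. The sketch below is one possible shape, not the protocol of any particular framework; the `Handoff` class and its field names are assumptions chosen to mirror "task state, decisions made, and open questions".

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Handoff:
    task: str
    decisions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    remaining: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the handoff so another session or agent can load it."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, raw: str) -> "Handoff":
        """Rebuild a handoff from its JSON form."""
        return cls(**json.loads(raw))


outgoing = Handoff(
    task="Debug production connection timeouts",
    decisions=["Raise pool_size to 20 and max_overflow to 10"],
    remaining=["terraform apply pending approval"],
)
incoming = Handoff.from_json(outgoing.to_json())
```

A fixed schema lets the receiving agent validate the handoff before acting on it, which free-form summaries do not allow.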
Memory Decay#
Not everything should be remembered forever. A debugging session from six months ago is rarely relevant. Memory decay prevents noise from accumulating.
- TTL (Time-to-Live): Memories expire after a set period. Episodic memories might expire after 30 days. Project facts persist indefinitely.
- Relevance scoring: Track how often a memory is retrieved. Memories that are never accessed decay in priority.
- Explicit pruning: Periodically review stored memories and remove outdated ones. “The staging cluster runs Kubernetes 1.27” is wrong if you upgraded to 1.29.
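The first two mechanisms combine naturally: drop whatever has outlived its TTL, then rank what remains by how often it has been retrieved. The following is a minimal sketch under assumed names (`Memory`, `prune`, `rank`), with retrieval count as a deliberately simple relevance score.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Memory:
    text: str
    created: datetime
    ttl: Optional[timedelta] = None  # None means the memory never expires
    hits: int = 0  # how many times this memory has been retrieved


def is_expired(memory: Memory, now: datetime) -> bool:
    """A memory expires once its age exceeds its TTL."""
    return memory.ttl is not None and now - memory.created > memory.ttl


def prune(memories: list[Memory], now: datetime) -> list[Memory]:
    """Drop expired memories and keep everything else."""
    return [m for m in memories if not is_expired(m, now)]


def rank(memories: list[Memory]) -> list[Memory]:
    """Order memories so frequently retrieved ones come first."""
    return sorted(memories, key=lambda m: m.hits, reverse=True)
```

Giving episodic memories a 30-day `ttl` while leaving project facts at `None` reproduces the split described in the TTL bullet.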
Common Mistakes#
Storing too much: Every conversation turn, every tool result, every file read. The memory fills with noise and retrieval degrades. Store conclusions and decisions, not raw data.
Storing too little: Only keeping what the user explicitly asks to remember. The agent misses learnable patterns – recurring errors, preferred approaches, project conventions.
No organization: Dumping everything into a flat list. Without categories or keys, retrieval becomes a search through noise. Structure your memory from the start, even if it is just section headers in a markdown file.
Ignoring privacy: Storing API keys or personal information in plain-text memory. Rules about what must never be persisted should be enforced by programmatic checks, not by convention.
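A programmatic guard can be as simple as a redaction pass that runs before anything is written to memory. The patterns below are illustrative assumptions, not a complete secret scanner: one matches the shape of AWS access key IDs, one catches `name=value` style credentials, and one catches email addresses. The `redact` helper name is hypothetical.

```python
import re

# Illustrative patterns only; a production filter would need a broader, tested set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"(?i)\b(?:api[_-]?key|token|password|secret)\b\s*[:=]\s*\S+"),
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # email addresses
]


def redact(text: str) -> str:
    """Replace anything that looks like a secret or an email before persisting."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Calling `redact` inside the single function that writes to memory means no caller can bypass it, which is the difference between enforcement and convention.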