Agent Context Preservation for Long-Running Workflows#
The context window is the single most important constraint in agent-driven work. A single-turn task uses a fraction of it. A multi-hour project fills it, overflows it, and degrades the agent’s reasoning quality long before the task is complete. Agents that work effectively on ambitious projects are not smarter – they manage context better.
This article covers practical, battle-tested patterns for preserving context across long sessions, delegating to sub-agents without losing coherence, and avoiding context pollution – the gradual degradation that happens when irrelevant information accumulates in the working context.
The Context Problem at Scale#
A typical agent context window holds 100K-200K tokens. That sounds like a lot. Here is how fast it fills during a real project:
| Activity | Tokens Consumed |
|---|---|
| System prompt + instructions | 2K-5K |
| CLAUDE.md / project instructions | 1K-3K |
| Reading 5 source files | 10K-30K |
| 10 tool call results | 15K-40K |
| Reasoning traces (agent thinking) | 10K-20K per hour |
| Accumulated conversation history | 5K-15K per hour |
A 2-hour session generating code, reading files, and running tests easily consumes 100K+ tokens. At that point, automatic context compression kicks in – summarizing earlier parts of the conversation. Summaries lose detail. The agent forgets decisions made earlier. It re-reads files it already read. It contradicts plans it made an hour ago.
The solution is not a bigger context window. It is externalizing context to durable storage and loading only what is needed for the current step.
Pattern 1: Checkpoint Documents#
The most effective context preservation technique is writing structured checkpoint documents to the filesystem at key milestones. Each checkpoint captures the current state so that a new session – or a sub-agent – can resume without replaying the full history.
What a Checkpoint Contains#
```markdown
# Checkpoint: Database Schema Complete
**Date**: 2026-02-22 14:30
**Phase**: 2 of 5 (Schema → API → Tests → Deploy → Document)
## Decisions Made
- D1 (SQLite) with FTS5 for full-text search
- UUID primary keys (not auto-increment) for future distributed use
- JSON columns for flexible metadata (tags, skills, categories)
- Separate FTS virtual table with triggers to keep in sync
## Artifacts Produced
- `schema/0001-init.sql` — 6 tables, 2 indexes, 1 FTS virtual table, 3 triggers
- `docs/schema-decisions.md` — rationale for each design choice
## Open Questions
- [ ] Should we add a `version` column for optimistic locking?
- [ ] FTS5 tokenizer: default vs porter stemming?
## Next Steps
1. Build API routes (GET search, GET by ID, GET categories)
2. Add KV caching layer for search results
3. Add rate limiting via KV
```

When to Write Checkpoints#
Write a checkpoint whenever:
- A distinct phase of work completes (design done, implementation done, tests passing)
- You are about to start a different type of work (switching from coding to testing)
- The context window is getting large (you notice the agent summarizing earlier turns)
- Before delegating to a sub-agent (so the sub-agent has clean context)
- Before ending a session that will be resumed later
Checkpoint as Resumption Point#
When a new session starts, the agent reads the latest checkpoint instead of replaying history:
```text
Session start:
1. Read CLAUDE.md (project conventions)
2. Read PROGRESS.md or latest checkpoint (where we left off)
3. Read TODO.md (what remains)
4. Start working from current state -- no need to re-derive decisions
```

This costs 1K-3K tokens instead of the 50K+ tokens of replaying the original session.
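A minimal sketch of this bootstrap, assuming the file layout described in this article and plain Node `fs` calls rather than any agent-framework API:

```typescript
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Rebuild working context from durable files instead of replaying history.
// The file names follow this article's layout; they are not special to any tool.
function restoreSessionContext(projectRoot: string): string {
  const contextFiles = ["CLAUDE.md", "PROGRESS.md", "TODO.md"];
  const sections = contextFiles
    .filter((name) => existsSync(join(projectRoot, name)))
    .map((name) => `## ${name}\n\n${readFileSync(join(projectRoot, name), "utf8")}`);
  // Roughly 1K-3K tokens of restored state goes into the opening prompt.
  return sections.join("\n\n");
}
```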
Pattern 2: TODO Lists as State Machines#
A TODO list is not just a human convenience – it is a state machine that tracks workflow progress. Each item has a status (pending, in-progress, completed, blocked). The agent reads the list to know what to do next and updates it as work progresses.
```markdown
# TODO
## Phase 1: Foundation [COMPLETE]
- [x] Set up project structure
- [x] Configure wrangler.jsonc with D1 and KV bindings
- [x] Design database schema
- [x] Run initial migration
## Phase 2: API Implementation [IN PROGRESS]
- [x] Health check endpoint
- [x] Search endpoint with FTS5
- [ ] Get article by ID endpoint <-- CURRENT
- [ ] List categories endpoint
- [ ] Rate limiting middleware
## Phase 3: Testing [PENDING]
- [ ] Write integration tests for each endpoint
- [ ] Test rate limiting behavior
- [ ] Test FTS5 query edge cases
## Phase 4: Deploy [PENDING]
- [ ] Deploy to Cloudflare Workers
- [ ] Sync content to D1
- [ ] Verify endpoints on production
```

The agent checks the TODO at the start of each session, finds the first uncompleted item, and resumes there. When it finishes an item, it checks the box. When all items in a phase are complete, it moves to the next phase.
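The state machine is easy to make literal: a small script (run by the agent or a wrapper) can read the next pending checkbox and flip it when done. A minimal sketch, assuming the `- [ ]` / `- [x]` checkbox syntax shown above:

```typescript
import { readFileSync, writeFileSync } from "node:fs";

// The first unchecked box is the current state of the workflow.
function nextPendingItem(todoPath: string): string | null {
  const lines = readFileSync(todoPath, "utf8").split("\n");
  const pending = lines.find((line) => line.trimStart().startsWith("- [ ]"));
  return pending ? pending.replace("- [ ]", "").trim() : null;
}

// Completing an item is a one-character transition: [ ] becomes [x].
function completeItem(todoPath: string, itemText: string): void {
  const content = readFileSync(todoPath, "utf8");
  writeFileSync(todoPath, content.replace(`- [ ] ${itemText}`, `- [x] ${itemText}`));
}
```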
Why This Works Better Than Free-Form Plans#
- Survives context overflow. The TODO file persists on disk. If context compresses, the agent re-reads the file.
- Visible to humans. The user can see progress, reorder priorities, or add items at any time.
- Resumable by any agent. A sub-agent or a new session reads the same file and knows exactly where things stand.
- Prevents drift. The agent cannot silently change the plan. Any reprioritization is visible in the file diff.
Pattern 3: Sub-Agent Context Scoping with Spec Documents#
The biggest risk in multi-agent workflows is context pollution in the leader agent. If the leader reads every file, reviews every sub-agent’s output, and tracks every detail, its context window fills with information that is only relevant to one subtask. The leader’s reasoning about the big picture degrades.
The solution: give each sub-agent a scoped spec document that contains only what it needs, and have it return only a structured result.
The Spec Document Pattern#
```markdown
# Spec: Implement Search Endpoint
## Context
We are building an API on Cloudflare Workers + D1. The database schema
is in `schema/0001-init.sql`. The Worker entry point is `src/index.ts`.
## Requirements
- GET /api/v1/knowledge/search?q=<query>&limit=<n>
- Use FTS5 full-text search via content_fts table
- Cache results in KV for 5 minutes (key: search:<query>:<limit>)
- Return JSON: { query, count, results: [...] }
- Handle missing `q` parameter with 400 error
## Constraints
- Follow existing code patterns in src/index.ts (see json() helper, cached() wrapper)
- Do not modify the database schema
- Do not add new dependencies
## Deliverable
- Updated src/index.ts with the search endpoint
- Report: what was implemented, any decisions made, any issues found
```

The sub-agent receives this spec, reads only the referenced files, implements the feature, and returns a report. It never sees the full project plan, other sub-agents’ work, or the leader’s reasoning history. Its context is clean and focused.
Leader-Sub-Agent Communication#
```text
Leader context:
- Project plan (TODO.md)
- Latest checkpoint
- High-level architecture decisions
- Sub-agent specs (what was delegated)
- Sub-agent results (summaries, not raw output)
Sub-agent context:
- Its spec document (scoped requirements)
- Files it needs to read/modify
- Nothing else
```

The leader agent’s context stays small because it holds plans and summaries, not implementation details. The sub-agent’s context stays small because it holds only its scoped task. Neither pollutes the other.
What the Sub-Agent Returns#
The sub-agent should return a structured result, not a dump of everything it did:
```markdown
## Result: Search Endpoint
**Status**: Complete
**Files modified**: src/index.ts (added search route handler, lines 95-140)
**Decisions**: Used FTS5 MATCH syntax with rank ordering. Capped limit at 50.
**Issues**: None
**Tests needed**: Search with empty query, search with no results, limit validation
```

The leader reads this 100-token summary instead of the sub-agent’s 20K-token execution trace.
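One way to keep these reports consistent is to give them a shared shape. The sketch below is illustrative only: `SubAgentResult` mirrors the report above, and `runSubAgent` is a placeholder for whatever mechanism actually launches the sub-agent, not an existing API.

```typescript
// Mirrors the report format above: a summary, never the execution trace.
interface SubAgentResult {
  status: "complete" | "blocked" | "failed";
  filesModified: string[];
  decisions: string[];
  issues: string[];
  testsNeeded: string[];
}

// Placeholder for the actual sub-agent runner.
declare function runSubAgent(options: { specPath: string }): Promise<SubAgentResult>;

// The leader hands over only the spec path and keeps only the summary.
async function delegate(specPath: string): Promise<SubAgentResult> {
  const result = await runSubAgent({ specPath });
  return result; // ~100 tokens in the leader's context, not ~20K
}
```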
Pattern 4: Persistent Context Files#
Some context should survive not just across sessions but across the lifetime of a project. These are conventions, preferences, and architectural decisions that every agent session needs.
CLAUDE.md (Project Instructions)#
A CLAUDE.md file at the project root is loaded automatically by Claude Code at the start of every session. It holds project conventions that stay stable for the lifetime of the project:
```markdown
# Project: My API
## Stack
- Cloudflare Workers + D1 + KV
- TypeScript, single-file Worker pattern
- wrangler.jsonc for configuration
## Conventions
- Use prepared statements for all D1 queries
- Never store PII in request logs -- hash IPs with SHA-256
- All API responses use the json() helper for consistent CORS headers
- Rate limiting: 60 req/min per IP via KV
## Key Commands
- Build: `cd site && hugo`
- Deploy: `source ~/.claude/secrets/api.env && npx wrangler deploy`
- Sync DB: `npx tsx scripts/sync-content.ts > schema/content-sync.sql`
```

This costs 500-2K tokens of context and prevents the agent from rediscovering these facts every session.
MEMORY.md (Cross-Session Learning)#
A MEMORY.md file captures things the agent has learned that are not project conventions but are useful across sessions:
```markdown
# Memory
## Platform
- Mac Mini M4 Pro (ARM64) -- QEMU cannot run Go binaries, must use native images
- Minikube with Docker driver, containers run natively on ARM64
## Gotchas Discovered
- Bitnami Helm charts name resources using the release name directly
- PostgreSQL 15+ changed default permissions -- must GRANT on public schema
- Mattermost has no ARM64 Docker image -- built custom from their binary tarball
## User Preferences
- Prefers plan-first approach with checkpoint documents
- Uses TODO.md for tracking progress
- Prefers parallel sub-agents with scoped specs over sequential execution
```

Skills Files (Reusable Agent Procedures)#
For complex, repeatable procedures, a skill file captures the exact steps so the agent does not re-derive them each time:
```markdown
# Skill: Deploy Agent Zone
## Steps
1. Build Hugo site: `cd site && hugo`
2. Sync content: `cd api && npx tsx scripts/sync-content.ts > schema/content-sync.sql`
3. Execute sync: `source ~/.claude/secrets/agent-zone.env && npx wrangler d1 execute agent-zone-db --remote --file=schema/content-sync.sql`
4. Deploy Worker: `source ~/.claude/secrets/agent-zone.env && cd api && npx wrangler deploy`
5. Deploy Pages: `source ~/.claude/secrets/agent-zone.env && cd site && npx wrangler pages deploy public --project-name=agent-zone --commit-dirty=true`
6. Verify: `curl -s "https://api.agent-zone.ai/health" | jq .`
```

The agent invokes the skill instead of figuring out the deployment process from scratch. One file, loaded on demand, replaces thousands of tokens of re-derivation.
Pattern 5: Context Pollution Prevention#
Context pollution is the gradual accumulation of irrelevant information that degrades reasoning quality. It happens silently. The agent reads a 500-line file when it needs 10 lines. A tool returns 5K tokens of output when the agent needs one field. Research results from an earlier investigation sit in context long after they are useful.
Sources of Pollution#
| Source | Tokens Wasted | Prevention |
|---|---|---|
| Reading entire files when only a section is needed | 500-5,000 per file | Read with line offsets, or use grep to find the relevant section first |
| Tool results with verbose output | 1,000-10,000 per call | Summarize or extract the key information immediately after the tool call |
| Research tangents that did not pan out | 5,000-20,000 | Delegate research to a sub-agent; only import the conclusion |
| Accumulated reasoning traces from earlier steps | 10,000-50,000 | Write checkpoints and let context compression handle the rest |
| Re-reading files already read in this session | 500-5,000 per re-read | Summarize file contents on first read; reference the summary afterward |
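For the first row, "read with line offsets" can be as simple as keeping a window around a match instead of the whole file. A sketch in which the file path, pattern, and window size are arbitrary examples:

```typescript
import { readFileSync } from "node:fs";

// Return only the lines surrounding the first match, not the whole file.
function readAround(path: string, pattern: RegExp, window = 20): string {
  const lines = readFileSync(path, "utf8").split("\n");
  const hit = lines.findIndex((line) => pattern.test(line));
  if (hit === -1) return ""; // no match: bring nothing into context
  const start = Math.max(0, hit - window);
  return lines.slice(start, hit + window + 1).join("\n");
}

// ~40 lines around the search handler instead of the whole file.
const snippet = readAround("src/index.ts", /search/);
```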
The Research Delegation Pattern#
The single most effective anti-pollution technique: delegate research to a sub-agent and import only the answer.
```text
BAD (pollutes leader context):
Leader reads file A (2K tokens)
Leader reads file B (3K tokens)
Leader reads file C (1K tokens)
Leader reads file D (4K tokens)
Leader concludes: "The auth module uses JWT with RS256"
Total context cost: 10K tokens, of which 9.9K is now irrelevant
GOOD (clean leader context):
Leader spawns research agent: "How does the auth module work?"
Research agent reads files A, B, C, D (in its own context)
Research agent returns: "Auth uses JWT with RS256, tokens validated in middleware/auth.ts,
keys rotated via /api/rotate-keys endpoint"
Total context cost: 200 tokens in leader's context
```

The research agent’s 10K tokens of file reading are confined to its own context window and discarded when it finishes. The leader gets a clean, dense summary.
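In code terms, the contract is simple: the question goes out, a short conclusion comes back, and everything the research agent read is discarded with its context. A hypothetical sketch; the `runResearchAgent` runner is an assumption, not a real API:

```typescript
// Assumed runner: launches a research sub-agent in its own context window.
declare function runResearchAgent(prompt: string): Promise<string>;

async function askResearch(question: string): Promise<string> {
  const prompt =
    `${question}\n` +
    `Read whatever files you need, then reply with a 1-3 sentence conclusion only.`;
  const conclusion = await runResearchAgent(prompt);
  return conclusion; // a few hundred tokens reach the leader; the file reads never do
}
```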
The Summarize-Then-Discard Pattern#
When you must read a large file or get verbose tool output in your own context, summarize immediately:
```text
1. Read the 500-line configuration file
2. Immediately extract: "Relevant settings: pool_size=20, max_overflow=10, timeout=30s"
3. The full file contents will be compressed/discarded by context management
4. The summary persists because it is recent
```

This front-loads the useful information so it survives context compression.
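Applied to verbose tool output, the same move looks like the sketch below: parse the dump right after the call and keep one dense line. The field names match the example above and are illustrative, not a real schema.

```typescript
// Distill a large JSON config dump into the few settings that matter now.
function summarizeConfig(rawToolOutput: string): string {
  const config = JSON.parse(rawToolOutput);
  const relevant = {
    pool_size: config.database?.pool_size,
    max_overflow: config.database?.max_overflow,
    timeout: config.database?.timeout,
  };
  // Keep this one line in working context; let the raw dump be compressed away.
  return `Relevant settings: ${JSON.stringify(relevant)}`;
}
```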
Zero-Cost Context Storage#
None of these patterns require paid services. The filesystem and git provide everything:
| Storage Need | Zero-Cost Solution |
|---|---|
| Checkpoint documents | Markdown files in the project directory (PROGRESS.md, checkpoints/) |
| TODO / state tracking | TODO.md in the project root |
| Project conventions | CLAUDE.md or .claude/CLAUDE.md |
| Cross-session memory | MEMORY.md in ~/.claude/ or .claude/ |
| Reusable procedures | Skill files in .claude/skills/ |
| Sub-agent specs | Markdown files in a specs/ or tasks/ directory |
| Artifact storage | Git repository (committed files survive everything) |
| Version history | Git log (decisions are traceable through commit messages) |
Git is the ultimate context preservation tool. Every committed file survives any context overflow, session restart, or agent swap. Commit early and often during long-running workflows. The commit message is itself a checkpoint: “Completed database schema with FTS5 search.”
Putting It Together: Multi-Hour Workflow Architecture#
For a project that takes 3-6 hours across multiple sessions:
```text
Project root/
├── CLAUDE.md # Permanent project conventions (loaded every session)
├── TODO.md # State machine tracking overall progress
├── PROGRESS.md # Latest checkpoint (where we left off)
├── design/
│ ├── requirements.md # Original requirements (reference doc)
│ ├── architecture.md # Architectural decisions (reference doc)
│ └── decisions-log.md # Append-only log of key decisions with rationale
├── specs/
│ ├── search-endpoint.md # Scoped spec for sub-agent task
│ ├── auth-module.md # Scoped spec for sub-agent task
│ └── deploy-pipeline.md # Scoped spec for sub-agent task
├── src/ # Implementation (committed at each checkpoint)
└── .claude/
├── MEMORY.md # Cross-session learnings
└── skills/ # Reusable procedures
```

Session 1 (design):
- Read requirements, discuss with user
- Write `architecture.md` with key decisions
- Write `TODO.md` with phased plan
- Write `PROGRESS.md` checkpoint: “Design complete”
- Commit everything
Session 2 (implementation, parallel sub-agents):
- Read `CLAUDE.md`, `PROGRESS.md`, `TODO.md` – 3K tokens, full context restored
- Write specs for 3 sub-agent tasks
- Spawn sub-agents with scoped specs
- Collect results, review, integrate
- Update `TODO.md`, write new checkpoint
- Commit
Session 3 (testing and deploy):
- Read `CLAUDE.md`, `PROGRESS.md`, `TODO.md` – instant context restoration
- Write and run tests
- Deploy using the skill file
- Final checkpoint: “Complete”
Each session starts with 3K tokens of context instead of replaying hours of history. Sub-agents operate in clean, scoped contexts. Nothing is lost between sessions because everything is on the filesystem and in git.
Key Principles#
- Externalize aggressively. If information will be needed later, write it to a file. Do not rely on context window persistence.
- Scope ruthlessly. Sub-agents get only what they need. The leader holds plans and summaries, not implementation details.
- Commit as you go. Git commits are indestructible checkpoints. Commit at every meaningful milestone.
- Summarize immediately. After reading a file or getting tool output, extract the useful parts before the raw data gets compressed away.
- Design documents are cheap context. A 500-token spec file replaces 20K tokens of re-derivation. Write the spec.