Human-in-the-Loop Patterns#

The most common failure mode in agent-driven work is not a wrong answer – it is a correct action taken without permission. An agent that deletes a file to “clean up,” force-pushes a branch to “fix history,” or restarts a service to “apply changes” can cause more damage in one unauthorized action than a dozen wrong answers.

Human-in-the-loop design is not about limiting agent capability. It is about matching autonomy to risk. Safe, reversible actions should proceed without interruption. Dangerous, irreversible actions should require explicit approval. The challenge is building this classification into the workflow without turning every action into a confirmation dialog.

The Risk Classification Framework#

Every action an agent takes falls on two axes: reversibility (can you undo it?) and blast radius (how much does it affect?).

                    High Blast Radius
                         │
         INFORM          │         BLOCK
     (notify after)      │    (human does it)
                         │
  ───────────────────────┼───────────────────
                         │
       AUTONOMOUS        │         GATE
    (proceed freely)     │    (approve first)
                         │
                    Low Blast Radius

     Reversible ─────────┼──────── Irreversible
| Quadrant | Action | Examples |
| --- | --- | --- |
| Reversible + Low Blast | Proceed freely | Edit a file, run tests, read configs |
| Reversible + High Blast | Inform after | Create a branch, install a dev dependency |
| Irreversible + Low Blast | Approve before | Delete a file, drop an index |
| Irreversible + High Blast | Human executes | Force-push to main, drop a database, delete a production resource |
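
One way to make the quadrants operational is to encode the two axes as data and map them to a handling policy. This is a minimal sketch in TypeScript; the names (ActionRisk, classifyRisk) are illustrative and not taken from any particular framework.

```ts
// Minimal sketch: map the two risk axes onto a handling policy.
type Policy = "proceed" | "inform" | "gate" | "block";

interface ActionRisk {
  reversible: boolean;            // can the action be undone?
  blastRadius: "low" | "high";    // how much does it affect?
}

function classifyRisk(risk: ActionRisk): Policy {
  if (risk.reversible) {
    // Reversible: proceed freely, or inform after when the blast radius is high.
    return risk.blastRadius === "low" ? "proceed" : "inform";
  }
  // Irreversible: approve first, or hand the action to a human entirely.
  return risk.blastRadius === "low" ? "gate" : "block";
}

// Deleting a file is irreversible but local, so it lands in "gate".
const policy = classifyRisk({ reversible: false, blastRadius: "low" });
```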

Classification by Action Type#

| Action Category | Default Policy | Examples |
| --- | --- | --- |
| Read operations | Autonomous | Read files, query APIs, list resources, search logs |
| Local file writes | Autonomous | Edit code, create files, write configs (all reversible via git) |
| Local commands | Autonomous (with limits) | Run tests, build, lint, format. NOT: rm -rf, kill processes |
| Git operations (safe) | Autonomous | Commit, create branch, diff, log, status |
| Git operations (destructive) | Gate | Force-push, reset --hard, rebase published branches, delete branches |
| External communication | Gate | Post PR comments, send messages, create issues |
| Infrastructure changes | Gate | Deploy, apply Terraform, run migrations, modify DNS |
| Data deletion | Gate or Block | Drop tables, delete S3 objects, remove users |
| Security changes | Block | Modify IAM, change secrets, update firewall rules |
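
In a workflow, this table works better as a declarative policy map than as case-by-case judgment. A sketch, assuming the agent's tool calls can be tagged with one of these categories; the category strings and the denylist patterns are illustrative only.

```ts
// Sketch: the default-policy table as data rather than ad-hoc judgment.
type Policy = "autonomous" | "gate" | "block";

const defaultPolicies: Record<string, Policy> = {
  "read": "autonomous",
  "local-write": "autonomous",
  "local-command": "autonomous",          // still subject to the denylist below
  "git-safe": "autonomous",
  "git-destructive": "gate",
  "external-communication": "gate",
  "infrastructure": "gate",
  "data-deletion": "gate",                // or "block" for production data
  "security": "block",
};

// Local commands stay autonomous only if they avoid known-destructive patterns.
const commandDenylist = [/\brm\s+-rf\b/, /\bkill\b/];

function policyForLocalCommand(command: string): Policy {
  return commandDenylist.some((pattern) => pattern.test(command))
    ? "gate"
    : defaultPolicies["local-command"];
}
```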

Approval Gates#

An approval gate pauses execution and presents the proposed action to the human for explicit approval before proceeding.

What a Good Approval Request Contains#

┌─────────────────────────────────────────────────┐
│  Approval Required: Database Migration           │
│                                                  │
│  Action: Apply schema/0003-add-indexes.sql to    │
│          production D1 database                  │
│                                                  │
│  Changes:                                        │
│    - CREATE INDEX idx_articles_category           │
│    - CREATE INDEX idx_request_log_endpoint        │
│                                                  │
│  Risk: Low (additive only, no data changes)      │
│  Reversible: Yes (DROP INDEX to undo)            │
│                                                  │
│  [Approve]  [Reject]  [Show SQL]                 │
└─────────────────────────────────────────────────┘

A good approval request answers: What is the action? Why is it needed? What could go wrong? Can it be undone?

A bad approval request is: “Should I proceed? [Y/N]” – this tells the human nothing about what they are approving.
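
One way to guarantee those four answers is to make them required fields of the approval request itself. The shape below is an assumption, not a standard; the point is that a bare “Should I proceed?” cannot be expressed in it.

```ts
// Sketch: a structured approval request whose required fields force the
// four answers (what, why, what could go wrong, can it be undone).
interface ApprovalRequest {
  title: string;                     // e.g. "Approval Required: Database Migration"
  action: string;                    // exactly what will be executed
  rationale: string;                 // why it is needed
  risk: "low" | "medium" | "high";
  riskNotes: string;                 // what could go wrong
  reversible: boolean;
  rollback?: string;                 // how to undo it, if it can be undone
  details?: string;                  // full SQL, diff, or command for "Show SQL"-style expansion
}

// Example instance mirroring the migration request above.
const migrationApproval: ApprovalRequest = {
  title: "Approval Required: Database Migration",
  action: "Apply schema/0003-add-indexes.sql to production D1 database",
  rationale: "(why the indexes are needed goes here)",
  risk: "low",
  riskNotes: "Additive only, no data changes",
  reversible: true,
  rollback: "DROP INDEX to undo",
};
```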

Batching Approvals#

Asking for approval on every file edit kills momentum. Batch related actions into a single approval:

BAD (approval fatigue):
  "Can I edit src/routes.ts?"    [Approve]
  "Can I edit src/db.ts?"        [Approve]
  "Can I edit src/middleware.ts?" [Approve]
  "Can I run the tests?"         [Approve]

GOOD (batched):
  "I am going to implement the search endpoint by editing
   3 files (routes.ts, db.ts, middleware.ts) and then run tests.
   Here is my plan: [plan details]

   Should I proceed with the implementation?"  [Approve]

The human approves the plan once. The agent executes all the safe, reversible steps without interruption. It pauses again only if something unexpected happens or if the next step is in a higher risk category.
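
A batched workflow can be implemented by approving a plan object once and only pausing again for steps that fall into a higher risk category. A sketch, assuming the approval and execution callbacks are supplied by the surrounding harness:

```ts
// Sketch: approve a whole plan once, then execute steps without re-asking
// unless a step escalates into a higher risk category.
interface PlanStep {
  description: string;
  policy: "proceed" | "inform" | "gate" | "block";
}

interface Plan {
  goal: string;
  steps: PlanStep[];
}

async function executePlan(
  plan: Plan,
  approvePlan: (plan: Plan) => Promise<boolean>,
  approveStep: (step: PlanStep) => Promise<boolean>,
  run: (step: PlanStep) => Promise<void>,
): Promise<void> {
  // One up-front approval covers every "proceed" and "inform" step.
  if (!(await approvePlan(plan))) return;

  for (const step of plan.steps) {
    if (step.policy === "block") {
      throw new Error(`"${step.description}" must be executed by a human`);
    }
    if (step.policy === "gate" && !(await approveStep(step))) {
      return; // higher-risk step: pause for an individual approval
    }
    await run(step);
  }
}
```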

Escalation Triggers#

Escalation is when the agent recognizes it should not proceed and asks the human for guidance. This is different from an approval gate (which is a scheduled checkpoint) – escalation is triggered by unexpected conditions.

When to Escalate#

| Trigger | Example | Why Escalate |
| --- | --- | --- |
| Ambiguous requirements | “Make the API faster” – which endpoint? What is the target? | The agent cannot choose correctly without clarification |
| Conflicting information | README says port 3000, docker-compose says 8080 | Both sources seem authoritative – human knows which is current |
| Unexpected state | Uncommitted changes in the working directory | May be the human’s in-progress work – do not overwrite |
| Multiple valid approaches | Refactor with strategy pattern vs simple if/else | Both are defensible – human has a preference |
| Error after retries | API call failed 3 times with different errors | Something is fundamentally wrong; blind retrying will not help |
| Scope creep detected | Fixing bug X requires changing module Y which breaks Z | The task is bigger than expected – human should decide scope |
| Permission boundary | Task requires access the agent does not have | Do not try to work around permissions |
| Safety concern | Proposed change would expose secrets in logs | Stop and flag, even if the user asked for it |
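
Most of these triggers are judgment calls, but a few can be checked mechanically before the agent touches anything. A sketch of those mechanical checks; the context fields are assumptions about what the harness tracks:

```ts
// Sketch: escalation triggers that can be checked mechanically.
// Ambiguity, scope creep, and safety concerns remain judgment calls.
interface WorkContext {
  retryCount: number;           // consecutive failures of the current operation
  maxRetries: number;
  workingTreeDirty: boolean;    // uncommitted changes found before editing
  hasRequiredPermission: boolean;
}

function shouldEscalate(ctx: WorkContext): string | null {
  if (ctx.retryCount >= ctx.maxRetries) {
    return "Repeated failures: blind retrying will not help";
  }
  if (ctx.workingTreeDirty) {
    return "Uncommitted changes may be in-progress human work";
  }
  if (!ctx.hasRequiredPermission) {
    return "Task requires access the agent does not have";
  }
  return null; // no mechanical trigger fired; proceed or use judgment
}
```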

How to Escalate Well#

GOOD escalation:
  "I found uncommitted changes in src/auth.ts that look like
   in-progress work (a half-written function on line 47).

   Options:
   1. Stash the changes and proceed with my modifications
   2. Work around the uncommitted code
   3. Wait for you to finish and commit first

   Which would you prefer?"

BAD escalation:
  "I encountered an issue. What should I do?"

Good escalation: describes what was found, explains why it matters, offers concrete options. Bad escalation: vague, puts all the cognitive load on the human.
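
The same structure can be enforced in code by refusing to send an escalation that lacks findings, impact, or concrete options. A sketch; the field names are illustrative:

```ts
// Sketch: force escalations to carry findings, impact, and concrete options.
interface Escalation {
  found: string;        // what was observed, with specifics (file, line, value)
  whyItMatters: string;
  options: string[];    // at least two concrete ways forward
}

function formatEscalation(e: Escalation): string {
  if (e.options.length < 2) {
    throw new Error("An escalation must offer the human concrete options");
  }
  const options = e.options.map((o, i) => `  ${i + 1}. ${o}`).join("\n");
  return `${e.found}\n\n${e.whyItMatters}\n\nOptions:\n${options}\n\nWhich would you prefer?`;
}
```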

Progressive Autonomy#

The right level of autonomy depends on trust, which builds over time. A new agent on an unfamiliar codebase should ask more questions. An agent that has been working on the same project for weeks should need fewer approvals.

Trust Levels#

| Level | Name | Behavior |
| --- | --- | --- |
| 1 | Observer | Read-only. Reports findings, suggests actions, never executes. Good for initial codebase exploration |
| 2 | Proposer | Writes plans and diffs but does not apply them. Human reviews and applies. Good for unfamiliar or high-risk codebases |
| 3 | Gated Executor | Executes safe actions autonomously, gates on destructive/external actions. The default for most workflows |
| 4 | Autonomous | Executes everything in scope, escalates only on ambiguity or unexpected errors. For well-understood, repetitive tasks |
| 5 | Delegator | Plans, decomposes, and delegates to sub-agents. Manages workflows autonomously, escalates at phase boundaries |

Most agent workflows should operate at level 3 (gated executor) by default and earn level 4-5 through demonstrated competence on the specific project.
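
If the trust level is tracked explicitly, it can feed straight into the gating decision. A minimal sketch of the five levels above; the function is an assumption about how a harness might consult them:

```ts
// Sketch: consult an explicit trust level when deciding whether to execute.
enum TrustLevel {
  Observer = 1,        // read-only
  Proposer = 2,        // writes plans and diffs, never applies them
  GatedExecutor = 3,   // default: autonomous on safe actions, gated on the rest
  Autonomous = 4,      // escalates only on ambiguity or unexpected errors
  Delegator = 5,       // plans, decomposes, and delegates to sub-agents
}

function mayExecute(level: TrustLevel, destructive: boolean, inScope: boolean): boolean {
  if (level <= TrustLevel.Proposer) return false;   // propose only, never execute
  if (level === TrustLevel.GatedExecutor) {
    return inScope && !destructive;                 // gate destructive/external actions
  }
  return inScope;                                   // levels 4-5: anything within the approved scope
}
```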

Earning Autonomy#

Trust increases when the agent:

  • Completes tasks correctly without needing corrections
  • Asks good questions when it encounters ambiguity
  • Correctly identifies when to escalate vs proceed
  • Writes clear plans and checkpoint documents
  • Does not take unauthorized destructive actions

Trust decreases when the agent:

  • Makes incorrect assumptions and proceeds without checking
  • Overwrites uncommitted changes
  • Takes destructive actions without approval
  • Fails to flag risks or unexpected state
  • Produces work that requires significant correction
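
Teams that want this to be more than a gut feeling can record these signals and adjust the level deliberately. A rough sketch; the thresholds are arbitrary illustrations, not recommendations:

```ts
// Rough sketch: adjust the trust level from recorded signals.
interface TrustSignals {
  tasksCompletedCleanly: number;          // finished without corrections
  goodEscalations: number;                // asked when it should have
  unauthorizedDestructiveActions: number;
  overwroteHumanWork: number;
}

function adjustTrustLevel(current: number, s: TrustSignals): number {
  // Any serious violation drops the agent back to proposer at most.
  if (s.unauthorizedDestructiveActions > 0 || s.overwroteHumanWork > 0) {
    return Math.min(current, 2);
  }
  // Sustained clean work earns one level at a time, never more.
  if (s.tasksCompletedCleanly >= 10 && s.goodEscalations >= 2) {
    return Math.min(current + 1, 5);
  }
  return current;
}
```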

Designing Workflows with Human Checkpoints#

For multi-phase projects, plan human checkpoints at phase boundaries rather than at every step:

Phase 1: Research and Design     [AUTONOMOUS]
  Agent explores codebase, reads docs, proposes architecture

  ──── HUMAN CHECKPOINT: Review design ────
  Human reviews architecture proposal, gives feedback

Phase 2: Implementation          [GATED EXECUTOR]
  Agent implements with autonomy on file edits
  Gates on: new dependencies, config changes, schema changes

  ──── HUMAN CHECKPOINT: Review implementation ────
  Human reviews code, runs manual testing if needed

Phase 3: Testing                 [AUTONOMOUS]
  Agent writes and runs tests, fixes failures

  ──── HUMAN CHECKPOINT: Review test coverage ────
  Human verifies tests are meaningful

Phase 4: Deploy                  [HUMAN EXECUTES]
  Agent produces deployment plan and commands
  Human executes (or approves agent execution)

This structure gives the agent long stretches of autonomous work (where it is most productive) with human review at the points where it matters most (design decisions and deployment).
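
Expressed as configuration, the workflow is just a list of phases, each with an autonomy mode, optional gates, and a checkpoint. A sketch with invented names, mirroring the phases above:

```ts
// Sketch: a multi-phase workflow with human checkpoints at phase boundaries.
type Mode = "autonomous" | "gated-executor" | "human-executes";

interface Phase {
  name: string;
  mode: Mode;
  gates?: string[];      // action types that still require approval in this phase
  checkpoint: string;    // what the human reviews before the next phase starts
}

const workflow: Phase[] = [
  { name: "Research and Design", mode: "autonomous",
    checkpoint: "Review architecture proposal" },
  { name: "Implementation", mode: "gated-executor",
    gates: ["new dependencies", "config changes", "schema changes"],
    checkpoint: "Review code, manual testing if needed" },
  { name: "Testing", mode: "autonomous",
    checkpoint: "Verify tests are meaningful" },
  { name: "Deploy", mode: "human-executes",
    checkpoint: "Human runs (or approves) the deployment commands" },
];
```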

The “Measure Twice, Cut Once” Principle#

The cost of pausing to confirm is low – a few seconds of human attention. The cost of an unwanted destructive action is high – lost work, broken environments, unintended messages sent. When in doubt about whether an action needs approval, agents should default to asking.

This principle applies asymmetrically:

  • Reversible actions: proceed, inform later if notable
  • Irreversible actions: confirm first, always
  • Actions visible to others: confirm first (PR comments, Slack messages, emails)
  • Actions within a pre-approved scope: proceed (the human already approved the plan)
  • Actions outside the stated scope: escalate (the agent is going beyond what was asked)
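
The asymmetry above collapses into a short decision rule. A sketch; the action attributes are assumptions about what the agent can determine before acting:

```ts
// Sketch: the "measure twice, cut once" rule as a decision function.
interface PendingAction {
  reversible: boolean;
  visibleToOthers: boolean;     // PR comments, Slack messages, emails
  withinApprovedScope: boolean;
}

type Decision = "proceed" | "confirm-first" | "escalate";

function decide(action: PendingAction): Decision {
  if (!action.withinApprovedScope) return "escalate";   // beyond what was asked
  if (!action.reversible || action.visibleToOthers) return "confirm-first";
  return "proceed";                                      // reversible, in scope, private
}
```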

The goal is not to eliminate all risk – it is to ensure that every significant decision has a human who consciously accepted it. Agents should be powerful within their authorized scope and disciplined about its boundaries.