Human-in-the-Loop Patterns#

The most common failure mode in agent-driven work is not a wrong answer – it is a correct action taken without permission. An agent that deletes a file to “clean up,” force-pushes a branch to “fix history,” or restarts a service to “apply changes” can cause more damage in one unauthorized action than a dozen wrong answers.

Human-in-the-loop design is not about limiting agent capability. It is about matching autonomy to risk. Safe, reversible actions should proceed without interruption. Dangerous, irreversible actions should require explicit approval. The challenge is building this classification into the workflow without turning every action into a confirmation dialog.

The Risk Classification Framework#

Every action an agent takes falls on two axes: reversibility (can you undo it?) and blast radius (how much does it affect?).

                    High Blast Radius
                         │
         INFORM          │         BLOCK
     (notify after)      │    (human does it)
                         │
  ───────────────────────┼───────────────────
                         │
       AUTONOMOUS        │         GATE
    (proceed freely)     │    (approve first)
                         │
                    Low Blast Radius

     Reversible ─────────┼──────── Irreversible
| Quadrant | Action | Examples |
| --- | --- | --- |
| Reversible + Low Blast | Proceed freely | Edit a file, run tests, read configs |
| Reversible + High Blast | Inform after | Create a branch, install a dev dependency |
| Irreversible + Low Blast | Approve before | Delete a file, drop an index |
| Irreversible + High Blast | Human executes | Force-push to main, drop a database, delete a production resource |
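
One way to make the quadrants operational is to encode the two axes as data and map them to a handling policy. This is a minimal sketch in TypeScript; the names (ActionRisk, classifyRisk) are illustrative and not taken from any particular framework.

```ts
// Minimal sketch: map the two risk axes onto a handling policy.
type Policy = "proceed" | "inform" | "gate" | "block";

interface ActionRisk {
  reversible: boolean;            // can the action be undone?
  blastRadius: "low" | "high";    // how much does it affect?
}

function classifyRisk(risk: ActionRisk): Policy {
  if (risk.reversible) {
    // Reversible: proceed freely, or inform after when the blast radius is high.
    return risk.blastRadius === "low" ? "proceed" : "inform";
  }
  // Irreversible: approve first, or hand the action to a human entirely.
  return risk.blastRadius === "low" ? "gate" : "block";
}

// Deleting a file is irreversible but local, so it lands in "gate".
const policy = classifyRisk({ reversible: false, blastRadius: "low" });
```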

Classification by Action Type#

| Action Category | Default Policy | Examples |
| --- | --- | --- |
| Read operations | Autonomous | Read files, query APIs, list resources, search logs |
| Local file writes | Autonomous | Edit code, create files, write configs (all reversible via git) |
| Local commands | Autonomous (with limits) | Run tests, build, lint, format. NOT: rm -rf, kill processes |
| Git operations (safe) | Autonomous | Commit, create branch, diff, log, status |
| Git operations (destructive) | Gate | Force-push, reset --hard, rebase published branches, delete branches |
| External communication | Gate | Post PR comments, send messages, create issues |
| Infrastructure changes | Gate | Deploy, apply Terraform, run migrations, modify DNS |
| Data deletion | Gate or Block | Drop tables, delete S3 objects, remove users |
| Security changes | Block | Modify IAM, change secrets, update firewall rules |
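
In a workflow, this table works better as a declarative policy map than as case-by-case judgment. A sketch, assuming the agent's tool calls can be tagged with one of these categories; the category strings and the denylist patterns are illustrative only.

```ts
// Sketch: the default-policy table as data rather than ad-hoc judgment.
type Policy = "autonomous" | "gate" | "block";

const defaultPolicies: Record<string, Policy> = {
  "read": "autonomous",
  "local-write": "autonomous",
  "local-command": "autonomous",          // still subject to the denylist below
  "git-safe": "autonomous",
  "git-destructive": "gate",
  "external-communication": "gate",
  "infrastructure": "gate",
  "data-deletion": "gate",                // or "block" for production data
  "security": "block",
};

// Local commands stay autonomous only if they avoid known-destructive patterns.
const commandDenylist = [/\brm\s+-rf\b/, /\bkill\b/];

function policyForLocalCommand(command: string): Policy {
  return commandDenylist.some((pattern) => pattern.test(command))
    ? "gate"
    : defaultPolicies["local-command"];
}
```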

Approval Gates#

An approval gate pauses execution and presents the proposed action to the human for explicit approval before proceeding.

What a Good Approval Request Contains#

┌─────────────────────────────────────────────────┐
│  Approval Required: Database Migration           │
│                                                  │
│  Action: Apply schema/0003-add-indexes.sql to    │
│          production D1 database                  │
│                                                  │
│  Changes:                                        │
│    - CREATE INDEX idx_articles_category           │
│    - CREATE INDEX idx_request_log_endpoint        │
│                                                  │
│  Risk: Low (additive only, no data changes)      │
│  Reversible: Yes (DROP INDEX to undo)            │
│                                                  │
│  [Approve]  [Reject]  [Show SQL]                 │
└─────────────────────────────────────────────────┘

A good approval request answers: What is the action? Why is it needed? What could go wrong? Can it be undone?

A bad approval request is: “Should I proceed? [Y/N]” – this tells the human nothing about what they are approving.
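
One way to guarantee those four answers is to make them required fields of the approval request itself. The shape below is an assumption, not a standard; the point is that a bare “Should I proceed?” cannot be expressed in it.

```ts
// Sketch: a structured approval request whose required fields force the
// four answers (what, why, what could go wrong, can it be undone).
interface ApprovalRequest {
  title: string;                     // e.g. "Approval Required: Database Migration"
  action: string;                    // exactly what will be executed
  rationale: string;                 // why it is needed
  risk: "low" | "medium" | "high";
  riskNotes: string;                 // what could go wrong
  reversible: boolean;
  rollback?: string;                 // how to undo it, if it can be undone
  details?: string;                  // full SQL, diff, or command for "Show SQL"-style expansion
}

// Example instance mirroring the migration request above.
const migrationApproval: ApprovalRequest = {
  title: "Approval Required: Database Migration",
  action: "Apply schema/0003-add-indexes.sql to production D1 database",
  rationale: "(why the indexes are needed goes here)",
  risk: "low",
  riskNotes: "Additive only, no data changes",
  reversible: true,
  rollback: "DROP INDEX to undo",
};
```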

Batching Approvals#

Asking for approval on every file edit kills momentum. Batch related actions into a single approval:

BAD (approval fatigue):
  "Can I edit src/routes.ts?"    [Approve]
  "Can I edit src/db.ts?"        [Approve]
  "Can I edit src/middleware.ts?" [Approve]
  "Can I run the tests?"         [Approve]

GOOD (batched):
  "I am going to implement the search endpoint by editing
   3 files (routes.ts, db.ts, middleware.ts) and then run tests.
   Here is my plan: [plan details]

   Should I proceed with the implementation?"  [Approve]

The human approves the plan once. The agent executes all the safe, reversible steps without interruption. It pauses again only if something unexpected happens or if the next step is in a higher risk category.
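
A batched workflow can be implemented by approving a plan object once and only pausing again for steps that fall into a higher risk category. A sketch, assuming the approval and execution callbacks are supplied by the surrounding harness:

```ts
// Sketch: approve a whole plan once, then execute steps without re-asking
// unless a step escalates into a higher risk category.
interface PlanStep {
  description: string;
  policy: "proceed" | "inform" | "gate" | "block";
}

interface Plan {
  goal: string;
  steps: PlanStep[];
}

async function executePlan(
  plan: Plan,
  approvePlan: (plan: Plan) => Promise<boolean>,
  approveStep: (step: PlanStep) => Promise<boolean>,
  run: (step: PlanStep) => Promise<void>,
): Promise<void> {
  // One up-front approval covers every "proceed" and "inform" step.
  if (!(await approvePlan(plan))) return;

  for (const step of plan.steps) {
    if (step.policy === "block") {
      throw new Error(`"${step.description}" must be executed by a human`);
    }
    if (step.policy === "gate" && !(await approveStep(step))) {
      return; // higher-risk step: pause for an individual approval
    }
    await run(step);
  }
}
```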

Escalation Triggers#

Escalation is when the agent recognizes it should not proceed and asks the human for guidance. This is different from an approval gate (which is a scheduled checkpoint) – escalation is triggered by unexpected conditions.

When to Escalate#

| Trigger | Example | Why Escalate |
| --- | --- | --- |
| Ambiguous requirements | “Make the API faster” – which endpoint? What is the target? | The agent cannot choose correctly without clarification |
| Conflicting information | README says port 3000, docker-compose says 8080 | Both sources seem authoritative – human knows which is current |
| Unexpected state | Uncommitted changes in the working directory | May be the human’s in-progress work – do not overwrite |
| Multiple valid approaches | Refactor with strategy pattern vs simple if/else | Both are defensible – human has a preference |
| Error after retries | API call failed 3 times with different errors | Something is fundamentally wrong; blind retrying will not help |
| Scope creep detected | Fixing bug X requires changing module Y which breaks Z | The task is bigger than expected – human should decide scope |
| Permission boundary | Task requires access the agent does not have | Do not try to work around permissions |
| Safety concern | Proposed change would expose secrets in logs | Stop and flag, even if the user asked for it |
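
Most of these triggers are judgment calls, but a few can be checked mechanically before the agent touches anything. A sketch of those mechanical checks; the context fields are assumptions about what the harness tracks:

```ts
// Sketch: escalation triggers that can be checked mechanically.
// Ambiguity, scope creep, and safety concerns remain judgment calls.
interface WorkContext {
  retryCount: number;           // consecutive failures of the current operation
  maxRetries: number;
  workingTreeDirty: boolean;    // uncommitted changes found before editing
  hasRequiredPermission: boolean;
}

function shouldEscalate(ctx: WorkContext): string | null {
  if (ctx.retryCount >= ctx.maxRetries) {
    return "Repeated failures: blind retrying will not help";
  }
  if (ctx.workingTreeDirty) {
    return "Uncommitted changes may be in-progress human work";
  }
  if (!ctx.hasRequiredPermission) {
    return "Task requires access the agent does not have";
  }
  return null; // no mechanical trigger fired; proceed or use judgment
}
```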

How to Escalate Well#

GOOD escalation:
  "I found uncommitted changes in src/auth.ts that look like
   in-progress work (a half-written function on line 47).

   Options:
   1. Stash the changes and proceed with my modifications
   2. Work around the uncommitted code
   3. Wait for you to finish and commit first

   Which would you prefer?"

BAD escalation:
  "I encountered an issue. What should I do?"

Good escalation: describes what was found, explains why it matters, offers concrete options. Bad escalation: vague, puts all the cognitive load on the human.
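
The same structure can be enforced in code by refusing to send an escalation that lacks findings, impact, or concrete options. A sketch; the field names are illustrative:

```ts
// Sketch: force escalations to carry findings, impact, and concrete options.
interface Escalation {
  found: string;        // what was observed, with specifics (file, line, value)
  whyItMatters: string;
  options: string[];    // at least two concrete ways forward
}

function formatEscalation(e: Escalation): string {
  if (e.options.length < 2) {
    throw new Error("An escalation must offer the human concrete options");
  }
  const options = e.options.map((o, i) => `  ${i + 1}. ${o}`).join("\n");
  return `${e.found}\n\n${e.whyItMatters}\n\nOptions:\n${options}\n\nWhich would you prefer?`;
}
```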

Progressive Autonomy#

The right level of autonomy depends on trust, which builds over time. A new agent on an unfamiliar codebase should ask more questions. An agent that has been working on the same project for weeks should need fewer approvals.

Trust Levels#

| Level | Name | Behavior |
| --- | --- | --- |
| 1 | Observer | Read-only. Reports findings, suggests actions, never executes. Good for initial codebase exploration |
| 2 | Proposer | Writes plans and diffs but does not apply them. Human reviews and applies. Good for unfamiliar or high-risk codebases |
| 3 | Gated Executor | Executes safe actions autonomously, gates on destructive/external actions. The default for most workflows |
| 4 | Autonomous | Executes everything in scope, escalates only on ambiguity or unexpected errors. For well-understood, repetitive tasks |
| 5 | Delegator | Plans, decomposes, and delegates to sub-agents. Manages workflows autonomously, escalates at phase boundaries |

Most agent workflows should operate at level 3 (gated executor) by default and earn level 4-5 through demonstrated competence on the specific project.
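
If the trust level is tracked explicitly, it can feed straight into the gating decision. A minimal sketch of the five levels above; the function is an assumption about how a harness might consult them:

```ts
// Sketch: consult an explicit trust level when deciding whether to execute.
enum TrustLevel {
  Observer = 1,        // read-only
  Proposer = 2,        // writes plans and diffs, never applies them
  GatedExecutor = 3,   // default: autonomous on safe actions, gated on the rest
  Autonomous = 4,      // escalates only on ambiguity or unexpected errors
  Delegator = 5,       // plans, decomposes, and delegates to sub-agents
}

function mayExecute(level: TrustLevel, destructive: boolean, inScope: boolean): boolean {
  if (level <= TrustLevel.Proposer) return false;   // propose only, never execute
  if (level === TrustLevel.GatedExecutor) {
    return inScope && !destructive;                 // gate destructive/external actions
  }
  return inScope;                                   // levels 4-5: anything within the approved scope
}
```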

Earning Autonomy#

Trust increases when the agent:

  • Completes tasks correctly without needing corrections
  • Asks good questions when it encounters ambiguity
  • Correctly identifies when to escalate vs proceed
  • Writes clear plans and checkpoint documents
  • Does not take unauthorized destructive actions

Trust decreases when the agent:

  • Makes incorrect assumptions and proceeds without checking
  • Overwrites uncommitted changes
  • Takes destructive actions without approval
  • Fails to flag risks or unexpected state
  • Produces work that requires significant correction
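
Teams that want this to be more than a gut feeling can record these signals and adjust the level deliberately. A rough sketch; the thresholds are arbitrary illustrations, not recommendations:

```ts
// Rough sketch: adjust the trust level from recorded signals.
interface TrustSignals {
  tasksCompletedCleanly: number;          // finished without corrections
  goodEscalations: number;                // asked when it should have
  unauthorizedDestructiveActions: number;
  overwroteHumanWork: number;
}

function adjustTrustLevel(current: number, s: TrustSignals): number {
  // Any serious violation drops the agent back to proposer at most.
  if (s.unauthorizedDestructiveActions > 0 || s.overwroteHumanWork > 0) {
    return Math.min(current, 2);
  }
  // Sustained clean work earns one level at a time, never more.
  if (s.tasksCompletedCleanly >= 10 && s.goodEscalations >= 2) {
    return Math.min(current + 1, 5);
  }
  return current;
}
```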

Designing Workflows with Human Checkpoints#

For multi-phase projects, plan human checkpoints at phase boundaries rather than at every step:

Phase 1: Research and Design     [AUTONOMOUS]
  Agent explores codebase, reads docs, proposes architecture

  ──── HUMAN CHECKPOINT: Review design ────
  Human reviews architecture proposal, gives feedback

Phase 2: Implementation          [GATED EXECUTOR]
  Agent implements with autonomy on file edits
  Gates on: new dependencies, config changes, schema changes

  ──── HUMAN CHECKPOINT: Review implementation ────
  Human reviews code, runs manual testing if needed

Phase 3: Testing                 [AUTONOMOUS]
  Agent writes and runs tests, fixes failures

  ──── HUMAN CHECKPOINT: Review test coverage ────
  Human verifies tests are meaningful

Phase 4: Deploy                  [HUMAN EXECUTES]
  Agent produces deployment plan and commands
  Human executes (or approves agent execution)

This structure gives the agent long stretches of autonomous work (where it is most productive) with human review at the points where it matters most (design decisions and deployment).
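
Expressed as configuration, the workflow is just a list of phases, each with an autonomy mode, optional gates, and a checkpoint. A sketch with invented names, mirroring the phases above:

```ts
// Sketch: a multi-phase workflow with human checkpoints at phase boundaries.
type Mode = "autonomous" | "gated-executor" | "human-executes";

interface Phase {
  name: string;
  mode: Mode;
  gates?: string[];      // action types that still require approval in this phase
  checkpoint: string;    // what the human reviews before the next phase starts
}

const workflow: Phase[] = [
  { name: "Research and Design", mode: "autonomous",
    checkpoint: "Review architecture proposal" },
  { name: "Implementation", mode: "gated-executor",
    gates: ["new dependencies", "config changes", "schema changes"],
    checkpoint: "Review code, manual testing if needed" },
  { name: "Testing", mode: "autonomous",
    checkpoint: "Verify tests are meaningful" },
  { name: "Deploy", mode: "human-executes",
    checkpoint: "Human runs (or approves) the deployment commands" },
];
```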

The “Measure Twice, Cut Once” Principle#

The cost of pausing to confirm is low – a few seconds of human attention. The cost of an unwanted destructive action is high – lost work, broken environments, unintended messages sent. When in doubt about whether an action needs approval, agents should default to asking.

This principle applies asymmetrically:

  • Reversible actions: proceed, inform later if notable
  • Irreversible actions: confirm first, always
  • Actions visible to others: confirm first (PR comments, Slack messages, emails)
  • Actions within a pre-approved scope: proceed (the human already approved the plan)
  • Actions outside the stated scope: escalate (the agent is going beyond what was asked)
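
The asymmetry above collapses into a short decision rule. A sketch; the action attributes are assumptions about what the agent can determine before acting:

```ts
// Sketch: the "measure twice, cut once" rule as a decision function.
interface PendingAction {
  reversible: boolean;
  visibleToOthers: boolean;     // PR comments, Slack messages, emails
  withinApprovedScope: boolean;
}

type Decision = "proceed" | "confirm-first" | "escalate";

function decide(action: PendingAction): Decision {
  if (!action.withinApprovedScope) return "escalate";   // beyond what was asked
  if (!action.reversible || action.visibleToOthers) return "confirm-first";
  return "proceed";                                      // reversible, in scope, private
}
```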

The goal is not to eliminate all risk – it is to ensure that every significant decision has a human who consciously accepted it. Agents should be powerful within their authorized scope and disciplined about its boundaries.