Human-in-the-Loop Patterns#

The most common failure mode in agent-driven work is not a wrong answer – it is a correct action taken without permission. An agent that deletes a file to “clean up,” force-pushes a branch to “fix history,” or restarts a service to “apply changes” can cause more damage in one unauthorized action than a dozen wrong answers.

Human-in-the-loop design is not about limiting agent capability. It is about matching autonomy to risk. Safe, reversible actions should proceed without interruption. Dangerous, irreversible actions should require explicit approval. The challenge is building this classification into the workflow without turning every action into a confirmation dialog.

The Risk Classification Framework#

Every action an agent takes falls on two axes: reversibility (can you undo it?) and blast radius (how much does it affect?).

                    High Blast Radius
                         │
         INFORM           │         BLOCK
     (notify after)       │    (human does it)
                         │
  ───────────────────────┼───────────────────
                         │
         PROCEED          │         GATE
      (act freely)        │    (approve first)
                         │
                    Low Blast Radius

     Reversible ─────────┼──────── Irreversible

| Quadrant | Action | Examples |
| --- | --- | --- |
| Reversible + Low Blast | Proceed freely | Edit a file, run tests, read configs |
| Reversible + High Blast | Inform after | Create a branch, install a dev dependency |
| Irreversible + Low Blast | Approve before | Delete a file, drop an index |
| Irreversible + High Blast | Human executes | Force-push to main, drop a database, delete a production resource |
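
Taken together, the two axes reduce to a four-way policy lookup. A minimal sketch in Python (the `Policy` enum and `classify` helper are illustrative names, not from any particular framework):

```python
from enum import Enum

class Policy(Enum):
    PROCEED = "proceed freely"   # reversible + low blast
    INFORM = "inform after"      # reversible + high blast
    GATE = "approve before"      # irreversible + low blast
    BLOCK = "human executes"     # irreversible + high blast

def classify(reversible: bool, high_blast: bool) -> Policy:
    """Map the two risk axes onto a default policy quadrant."""
    if reversible:
        return Policy.INFORM if high_blast else Policy.PROCEED
    return Policy.BLOCK if high_blast else Policy.GATE
```

For example, deleting a file (irreversible, low blast) maps to `Policy.GATE`, while editing a git-tracked file maps to `Policy.PROCEED`.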

Classification by Action Type#

| Action Category | Default Policy | Examples |
| --- | --- | --- |
| Read operations | Autonomous | Read files, query APIs, list resources, search logs |
| Local file writes | Autonomous | Edit code, create files, write configs (all reversible via git) |
| Local commands | Autonomous (with limits) | Run tests, build, lint, format. NOT: `rm -rf`, kill processes |
| Git operations (safe) | Autonomous | Commit, create branch, diff, log, status |
| Git operations (destructive) | Gate | Force-push, `reset --hard`, rebase published branches, delete branch |
| External communication | Gate | Post PR comments, send messages, create issues |
| Infrastructure changes | Gate | Deploy, apply Terraform, run migrations, modify DNS |
| Data deletion | Gate or Block | Drop tables, delete S3 objects, remove users |
| Security changes | Block | Modify IAM, change secrets, update firewall rules |
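
This category table can be encoded as a default-policy map that falls back to the safest option for anything unrecognized. A sketch (the category strings are made up for illustration):

```python
# Hypothetical default-policy table keyed by action category.
DEFAULT_POLICY = {
    "read": "autonomous",
    "local_file_write": "autonomous",
    "local_command": "autonomous_with_limits",
    "git_safe": "autonomous",
    "git_destructive": "gate",
    "external_communication": "gate",
    "infrastructure_change": "gate",
    "data_deletion": "gate",      # or "block", depending on the system
    "security_change": "block",
}

def policy_for(category: str) -> str:
    # Unknown action categories default to the most restrictive policy.
    return DEFAULT_POLICY.get(category, "block")
```

Defaulting unknown categories to `"block"` means a new kind of action can never slip through as autonomous by accident.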

Approval Gates#

An approval gate pauses execution and presents the proposed action to the human for explicit approval before proceeding.

What a Good Approval Request Contains#

┌──────────────────────────────────────────────────┐
│  Approval Required: Database Migration           │
│                                                  │
│  Action: Apply schema/0003-add-indexes.sql to    │
│          production D1 database                  │
│                                                  │
│  Changes:                                        │
│    - CREATE INDEX idx_articles_category          │
│    - CREATE INDEX idx_request_log_endpoint       │
│                                                  │
│  Risk: Low (additive only, no data changes)      │
│  Reversible: Yes (DROP INDEX to undo)            │
│                                                  │
│  [Approve]  [Reject]  [Show SQL]                 │
└──────────────────────────────────────────────────┘

A good approval request answers: What is the action? Why is it needed? What could go wrong? Can it be undone?

A bad approval request is: “Should I proceed? [Y/N]” – this tells the human nothing about what they are approving.
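
The four questions map naturally onto a structured request object that every gate renders the same way. A sketch (field names are illustrative, not a prescribed API):

```python
from dataclasses import dataclass

@dataclass
class ApprovalRequest:
    """Answers: what is the action, why, what could go wrong, can it be undone."""
    title: str
    action: str
    changes: list[str]
    risk: str
    reversible: str

    def render(self) -> str:
        lines = [
            f"Approval Required: {self.title}",
            f"Action: {self.action}",
            "Changes:",
            *[f"  - {change}" for change in self.changes],
            f"Risk: {self.risk}",
            f"Reversible: {self.reversible}",
        ]
        return "\n".join(lines)
```

Because the fields are required, an agent cannot emit a bare "Should I proceed?" without the risk and reversibility context.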

Batching Approvals#

Asking for approval on every file edit kills momentum. Batch related actions into a single approval:

BAD (approval fatigue):
  "Can I edit src/routes.ts?"    [Approve]
  "Can I edit src/db.ts?"        [Approve]
  "Can I edit src/middleware.ts?" [Approve]
  "Can I run the tests?"         [Approve]

GOOD (batched):
  "I am going to implement the search endpoint by editing
   3 files (routes.ts, db.ts, middleware.ts) and then run tests.
   Here is my plan: [plan details]

   Should I proceed with the implementation?"  [Approve]

The human approves the plan once. The agent executes all the safe, reversible steps without interruption. It pauses again only if something unexpected happens or if the next step is in a higher risk category.
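
One way to implement batching: approve the plan as a whole up front, then pause again only for steps above the pre-approved risk level. A sketch (the callback-based interface is an assumption, not a prescribed design):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    description: str
    gated: bool = False   # True for steps above the pre-approved risk level

def run_plan(steps: list[Step],
             approve: Callable[[str], bool]) -> list[str]:
    """Ask once for the whole plan, then execute; re-ask only at gated steps."""
    plan = "; ".join(s.description for s in steps)
    if not approve(f"Plan: {plan}"):
        return []
    done = []
    for step in steps:
        if step.gated and not approve(f"Gated step: {step.description}"):
            break   # stop and hand control back to the human
        done.append(step.description)
    return done
```

The human sees one plan-level prompt for the safe steps and a second prompt only when the schema change (or any other gated step) comes up.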

Escalation Triggers#

Escalation is when the agent recognizes it should not proceed and asks the human for guidance. This is different from an approval gate (which is a scheduled checkpoint) – escalation is triggered by unexpected conditions.

When to Escalate#

| Trigger | Example | Why Escalate |
| --- | --- | --- |
| Ambiguous requirements | “Make the API faster” – which endpoint? What is the target? | The agent cannot choose correctly without clarification |
| Conflicting information | README says port 3000, docker-compose says 8080 | Both sources seem authoritative – human knows which is current |
| Unexpected state | Uncommitted changes in the working directory | May be the human’s in-progress work – do not overwrite |
| Multiple valid approaches | Refactor with strategy pattern vs simple if/else | Both are defensible – human has a preference |
| Error after retries | API call failed 3 times with different errors | Something is fundamentally wrong; blind retrying will not help |
| Scope creep detected | Fixing bug X requires changing module Y which breaks Z | The task is bigger than expected – human should decide scope |
| Permission boundary | Task requires access the agent does not have | Do not try to work around permissions |
| Safety concern | Proposed change would expose secrets in logs | Stop and flag, even if the user asked for it |

How to Escalate Well#

GOOD escalation:
  "I found uncommitted changes in src/auth.ts that look like
   in-progress work (a half-written function on line 47).

   Options:
   1. Stash the changes and proceed with my modifications
   2. Work around the uncommitted code
   3. Wait for you to finish and commit first

   Which would you prefer?"

BAD escalation:
  "I encountered an issue. What should I do?"

Good escalation: describes what was found, explains why it matters, offers concrete options. Bad escalation: vague, puts all the cognitive load on the human.
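
The good/bad contrast suggests a fixed shape for escalations: the finding, why it matters, and concrete options. A sketch (names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Escalation:
    finding: str          # what was found, specifically
    why_it_matters: str   # why the agent stopped instead of proceeding
    options: list[str]    # concrete choices, so the human picks rather than invents

    def render(self) -> str:
        lines = [self.finding,
                 f"Why this matters: {self.why_it_matters}",
                 "Options:"]
        lines += [f"  {i}. {option}" for i, option in enumerate(self.options, 1)]
        lines.append("Which would you prefer?")
        return "\n".join(lines)
```

Requiring at least one option keeps the cognitive load with the agent: the human answers a multiple-choice question instead of diagnosing from scratch.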

Progressive Autonomy#

The right level of autonomy depends on trust, which builds over time. A new agent on an unfamiliar codebase should ask more questions. An agent that has been working on the same project for weeks should need fewer approvals.

Trust Levels#

| Level | Name | Behavior |
| --- | --- | --- |
| 1 | Observer | Read-only. Reports findings, suggests actions, never executes. Good for initial codebase exploration |
| 2 | Proposer | Writes plans and diffs but does not apply them. Human reviews and applies. Good for unfamiliar or high-risk codebases |
| 3 | Gated Executor | Executes safe actions autonomously, gates on destructive/external actions. The default for most workflows |
| 4 | Autonomous | Executes everything in scope, escalates only on ambiguity or unexpected errors. For well-understood, repetitive tasks |
| 5 | Delegator | Plans, decomposes, and delegates to sub-agents. Manages workflows autonomously, escalates at phase boundaries |

Most agent workflows should operate at level 3 (gated executor) by default and earn level 4-5 through demonstrated competence on the specific project.
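
Trust levels can back a simple permission check that sits in front of every action. A sketch assuming three coarse action classes (read / safe write / destructive); the thresholds follow the table above:

```python
from enum import IntEnum

class TrustLevel(IntEnum):
    OBSERVER = 1
    PROPOSER = 2
    GATED_EXECUTOR = 3
    AUTONOMOUS = 4
    DELEGATOR = 5

def may_execute(level: TrustLevel, action_class: str) -> bool:
    """True if the agent may act without pausing for approval."""
    if action_class == "read":
        return True                              # every level may read
    if action_class == "safe_write":
        return level >= TrustLevel.GATED_EXECUTOR
    if action_class == "destructive":
        return level >= TrustLevel.AUTONOMOUS    # and only within an approved scope
    return False                                 # unknown classes always pause
```

Using `IntEnum` makes the levels ordered, so promoting an agent is a one-line change to the level it is assigned, not to the check itself.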

Earning Autonomy#

Trust increases when the agent:

  • Completes tasks correctly without needing corrections
  • Asks good questions when it encounters ambiguity
  • Correctly identifies when to escalate vs proceed
  • Writes clear plans and checkpoint documents
  • Does not take unauthorized destructive actions

Trust decreases when the agent:

  • Makes incorrect assumptions and proceeds without checking
  • Overwrites uncommitted changes
  • Takes destructive actions without approval
  • Fails to flag risks or unexpected state
  • Produces work that requires significant correction

Designing Workflows with Human Checkpoints#

For multi-phase projects, plan human checkpoints at phase boundaries rather than at every step:

Phase 1: Research and Design     [AUTONOMOUS]
  Agent explores codebase, reads docs, proposes architecture

  ──── HUMAN CHECKPOINT: Review design ────
  Human reviews architecture proposal, gives feedback

Phase 2: Implementation          [GATED EXECUTOR]
  Agent implements with autonomy on file edits
  Gates on: new dependencies, config changes, schema changes

  ──── HUMAN CHECKPOINT: Review implementation ────
  Human reviews code, runs manual testing if needed

Phase 3: Testing                 [AUTONOMOUS]
  Agent writes and runs tests, fixes failures

  ──── HUMAN CHECKPOINT: Review test coverage ────
  Human verifies tests are meaningful

Phase 4: Deploy                  [HUMAN EXECUTES]
  Agent produces deployment plan and commands
  Human executes (or approves agent execution)

This structure gives the agent long stretches of autonomous work (where it is most productive) with human review at the points where it matters most (design decisions and deployment).
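
The phase structure can be sketched as a loop that runs each phase to completion, then pauses for a human checkpoint before moving on (the callback interface is illustrative):

```python
from typing import Callable

def run_workflow(phases: list[tuple[str, Callable[[], None]]],
                 checkpoint: Callable[[str], bool]) -> list[str]:
    """Execute phases back-to-back; pause for a human checkpoint between
    phases rather than after every step."""
    completed = []
    for name, execute in phases:
        execute()
        completed.append(name)
        if not checkpoint(name):   # human rejects or redirects: stop here
            break
    return completed
```

The checkpoint callback is where the human review happens; returning `False` halts the workflow so feedback can be incorporated before the next phase starts.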

The “Measure Twice, Cut Once” Principle#

The cost of pausing to confirm is low – a few seconds of human attention. The cost of an unwanted destructive action is high – lost work, broken environments, unintended messages sent. When in doubt about whether an action needs approval, agents should default to asking.

This principle applies asymmetrically:

  • Reversible actions: proceed, inform later if notable
  • Irreversible actions: confirm first, always
  • Actions visible to others: confirm first (PR comments, Slack messages, emails)
  • Actions within a pre-approved scope: proceed (the human already approved the plan)
  • Actions outside the stated scope: escalate (the agent is going beyond what was asked)
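
These asymmetric defaults collapse into one decision rule, with "when in doubt, ask" built in as the conservative fallback. A sketch (note that irreversible actions still confirm even inside an approved plan, per "confirm first, always"):

```python
def default_action(reversible: bool, visible_to_others: bool,
                   in_approved_scope: bool) -> str:
    """Return "proceed", "confirm", or "escalate" for a proposed action."""
    if not in_approved_scope:
        return "escalate"    # going beyond what was asked
    if not reversible or visible_to_others:
        return "confirm"     # irreversible or externally visible: ask first
    return "proceed"         # reversible, private, and in scope
```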

The goal is not to eliminate all risk – it is to ensure that every significant decision has a human who consciously accepted it. Agents should be powerful within their authorized scope and disciplined about its boundaries.