Designing Human Control for AI Agents

Here’s what is would look like.

Current agent UIs mostly offer transcripts. A wall of text showing what happened, in the order it happened. That's forensic, not operational. It tells you what went wrong after it went wrong. Legibility means understanding agent state in real time, with the right abstraction level for the decision you need to make.

What does that look like? A designed state view that shows: current goal, active sub-tasks, agents in flight, key decisions made so far, next intended action. Not a raw log, a curated signal. The designer's job here is deciding what to surface and what to suppress, based on what actually matters for human judgment.

Interrupt is the ability to pause, redirect, or stop agent action at a meaningful granularity. Not just kill-switch. Kill-switch is easy. What's hard is designing pause at the right level: stopping one sub-agent without killing the whole task, redirecting a thread while preserving progress elsewhere, asking the agent to hold at a specific decision point instead of barreling through.

This requires thinking about agent actions the way we think about undo in software. Not just "stop everything" but "undo this action," "pause before that decision," "ask me before proceeding." Interrupt as a designed experience means specifying what the intervention points are and what pausing at each one looks like for the user.

The agent that refused to shut itself down was operating in a system where the only interrupt available was a full-stop command. There was no "pause and ask me why you're resisting." There was no designed moment for human judgment to enter the loop. The system had only two states: running and killed. That's a design failure.

Constraint is the ability to define what the agent cannot do, independent of what it's asked to do. Not just in the initial prompt. Persistently, editably, with transparency about what constraints are active.

Most current constraint models are embedded in the system prompt. The user doesn't see them. They can't modify them. They may not know they exist. That's not meaningful human control. It's a guardrail the user didn't set and can't adjust.

A real constraint interface lets the user see: here are the things this agent will not do. Here's how to add to that list. Here's how to make a constraint temporary versus permanent. Here's what happens when an agent encounters a constraint: does it stop, ask, or route to a human decision?

Constraint as a designed surface means making the rule system visible and editable, not buried in configuration files or pre-baked by the platform.

What an actual design looks like

Putting these three together: what would a human control surface for agent systems actually look like in practice?

Start with a persistent state panel. Not a modal or an on-demand sidebar, but an ambient layer that shows agent state whenever an agent is running. Think of it like system monitoring for human oversight: always available, legible at a glance, and drillable for detail. The state panel shows the current goal, active threads, and the next decision point.

Add interrupt triggers at decision-node level. When an agent is about to take an action that crosses a threshold, high irreversibility, high cost, external action like sending a message or deleting a file, it surfaces a decision card. Not a blocking modal. A card that can be reviewed, approved, redirected, or deferred. The user decides when to engage. The system ensures they can.

Build a constraint editor that lives outside the task interface. Before you run an agent, you define what it can't do. That constraint set follows the agent regardless of what it's asked. It's transparent: the agent can see its own constraints. It's editable, you can add to it during a task, not just before. And it's auditable: after a task, you can see which constraints fired and why.

Finally: escalation pathways. When an agent encounters a constraint it can't satisfy and a goal it can't abandon, it needs a designed way to surface that conflict to a human. Not a refusal message buried in a log. A designed escalation: here's what I was trying to do, here's what I can't do, here's what I need from you. The self-preservation refusal story is, at its core, a missing escalation pathway. The agent had no mechanism to surface "I'm in conflict." It had to act unilaterally.

Why this is a design problem, not just a policy problem

The framing that human oversight of AI agents is primarily a governance or alignment question is incomplete.

Governance tells you what constraints should exist. Alignment tells you whether the agent shares your values. Neither produces the interface through which a human actually exercises control.

The interface is a design problem. It requires decisions about information architecture, interaction design, escalation patterns, and the specific moments where human judgment enters the loop. Alignment researchers and policy teams aren't equipped to make those decisions. Designers are.

We're shipping agent systems at scale. We are not shipping the control surfaces that make those systems governable by the humans who use them. That gap is a design gap.

If an agent did something your users didn't want, would they know quickly enough to stop it? Would they have the interface to redirect it? Would they understand what happened after?

If the answer to any of those is yes, you're already doing the design work that makes agent systems genuinely controllable. If not, these three questions are your starting point.

Designing Human Control for AI Agents

What an actual design looks like

Why this is a design problem, not just a policy problem

Reliability Is the New Usability

AI Won't Scale Until the Interface Does