Part 2: Designing Systems, Not Screens

Part 1 made the case that the interface is becoming optional. This piece is about what design actually becomes when that's true — what the work looks like, what the artifacts are, and who in the organization is positioned to do it.


Over the next few years, the function of design shifts. Not as a tooling update, and not incrementally. The fundamental question changes, from "what should this screen look like" to "how should this system behave over time."

That changes what designers make, what researchers measure, how product managers think about value, and what engineers are responsible for. The whole cross-functional model reorganizes around a different definition of the deliverable.

The new design problem

When software can act on behalf of users, the design problem changes shape.

The interface used to mediate every interaction. A user wanted something, and the interface provided a path. The designer's job was to make that path clear, efficient, and hopefully satisfying. That job is becoming a smaller fraction of the work.

In the near term, agents will run multi-step workflows. Users will approve at critical junctures, not at every step. The design problem becomes: what should the system do without asking, and when must it stop? Where is the delegation boundary? What happens when the system hits an ambiguous situation? Who decides, and what does recovery look like when it decides wrong?

You're choosing the rules for what the system can do on its own versus what it must ask you before doing. For example, it can send a calendar invite automatically, but it can't email a client unless you approve it first. That rule is the real design, and the interface is what you build to show and control those rules.
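To make that concrete, here is a minimal sketch of the rule as data rather than as a screen. The action names and the `decide` helper are made up for illustration, and the ask-by-default fallback is one reasonable assumption, not a standard.

```python
from enum import Enum


class Decision(Enum):
    ACT = "act without asking"
    ASK = "ask for approval first"


# Hypothetical action names; the two rules mirror the example in the text.
DELEGATION_RULES = {
    "send_calendar_invite": Decision.ACT,
    "email_client": Decision.ASK,
}


def decide(action: str) -> Decision:
    """Return what the system may do for a given action.

    Assumption: anything not listed falls back to ASK, so a new
    capability never runs unattended by accident.
    """
    return DELEGATION_RULES.get(action, Decision.ASK)


for action in ("send_calendar_invite", "email_client", "delete_files"):
    print(f"{action}: {decide(action).value}")
```

The table is the design decision. Whatever interface gets built on top of it exists to show those rules to the user and let them change them.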

These are interaction design problems. They're just not screen design problems.

What UX design becomes

Designers become choreographers of human-agent handoffs.

The concrete change is that design artifacts start to look more like behavior specifications than screen designs. You define delegation boundaries: what the agent can do without approval, what requires confirmation, and what requires full human review. You design escalation paths: not just error states, but the full range of "this got complicated, here's how the system hands control back." You design for graceful failure so users can recover without losing trust.

In practice, a behavior specification is a document that says the agent can draft the email but cannot send it. It can flag the anomaly but cannot act on it. Those permission lines are the deliverable. That is what you are designing.

There's a new problem here: trust calibration. When a system makes a recommendation, how confident is it, and how do you communicate that accurately without undermining or inflating trust? HCI research is consistent: users who don't understand a system's capability boundaries either over-rely or abandon it. Both outcomes are failures. Designing for accurate mental models is no longer optional.

In a real product, a system that says "Here's a draft for your review" behaves differently than one that says "I've sent the recommended response." One preserves the human decision. The other has already made it.

The design system evolves alongside this. The primitives encoded there start to look different: capability contracts, confirmation patterns, fallback states, provenance cues. These are interaction policies, not just visual components. A provenance cue, for example, shows a user that a summary was generated from these three documents in their inbox, not the model's general training data. It's the citation. The timestamp. The thing that answers "where did this come from" before the user has to ask.
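As a rough sketch of a provenance cue treated as a design-system primitive: the `ProvenanceCue` type, its field names, and the label wording below are all hypothetical, and the sample titles and date are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceCue:
    """Hypothetical primitive: where a generated summary came from, and when."""

    source_titles: tuple[str, ...]
    generated_at: datetime

    def label(self) -> str:
        """Build the short line a user would see next to the summary."""
        count = len(self.source_titles)
        noun = "document" if count == 1 else "documents"
        stamp = self.generated_at.strftime("%Y-%m-%d %H:%M UTC")
        return f"Generated from {count} {noun} in your inbox ({stamp})"


# Invented sample data, for illustration only.
cue = ProvenanceCue(
    source_titles=("Q3 budget", "Vendor contract", "Kickoff notes"),
    generated_at=datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc),
)
print(cue.label())
```

The point of modeling it this way is that the citation and the timestamp travel with the content, so any surface that renders the summary can answer "where did this come from" without extra work.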

There's a longer conversation worth having about how the roles of UX Design and UX Research will evolve as AI matures. Today, these roles are distinct practices with distinct outputs. But as design artifacts start to look more like behavioral specifications, and research starts to look more like continuous evaluation, the line between them gets harder to draw. The skills are converging, even if the titles haven't caught up yet.

What UX research becomes

Research doesn't disappear. It becomes more important and harder to do well.

The shift is from study execution toward evaluation operations. That doesn't mean foundational research goes away — before a system exists, you still need to understand the people it will serve, the context it will operate in, and the failure modes that matter most. That front-end work doesn't compress just because the back-end has changed. What does shift is what happens once a system is live: you're building infrastructure for continuous measurement of system behavior, not just commissioning episodic studies. Scenario libraries. Test harnesses. Ongoing monitoring for trust signals and failure modes. Research becomes a standing capability alongside the discovery work, not a replacement for it.
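A scenario library and test harness can start very small. The sketch below is one possible shape, not a prescribed tool: the scenarios, the `system_under_test` stand-in, and the "act"/"ask" vocabulary are assumptions made for the example.

```python
from typing import Callable, NamedTuple


class Scenario(NamedTuple):
    name: str
    action: str
    expected: str  # "act" or "ask"


# Hypothetical scenario library; a real one would come out of research.
SCENARIOS = [
    Scenario("routine scheduling", "send_calendar_invite", "act"),
    Scenario("external client contact", "email_client", "ask"),
    Scenario("unrecognized request", "wire_transfer", "ask"),
]


def system_under_test(action: str) -> str:
    """Stand-in for the live system's decision; swap in a real call."""
    return "act" if action == "send_calendar_invite" else "ask"


def run_scenarios(
    decide: Callable[[str], str], scenarios: list[Scenario]
) -> list[str]:
    """Return the names of scenarios where behavior differs from expectation."""
    return [s.name for s in scenarios if decide(s.action) != s.expected]


failures = run_scenarios(system_under_test, SCENARIOS)
print(f"{len(SCENARIOS) - len(failures)}/{len(SCENARIOS)} scenarios passed")
```

Run on a schedule against the live system, a harness like this is what turns an episodic study into a standing capability: the scenarios stay fixed while the system underneath them changes.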

Once a system is in production, the team's day-to-day looks different. They're monitoring a live system and flagging when something changes — not just scoping the next study.

The metrics change too. Task success is still relevant, but it's not enough. Override frequency. Correction rate. Abandonment after an AI action. Whether users understood what the system just did and why. These aren't satisfaction metrics. They're signals for whether the system is actually working for the people using it.
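Here is a minimal sketch of how those signals might be computed from an event log. The log format and the outcome labels are hypothetical, and how a team defines "overridden," "corrected," or "abandoned" is itself a research decision this example only assumes.

```python
from collections import Counter

# Hypothetical log: one recorded outcome per AI action.
events = [
    {"action_id": 1, "outcome": "accepted"},
    {"action_id": 2, "outcome": "overridden"},
    {"action_id": 3, "outcome": "corrected"},
    {"action_id": 4, "outcome": "accepted"},
    {"action_id": 5, "outcome": "abandoned"},
]


def behavior_metrics(log: list[dict]) -> dict[str, float]:
    """Share of AI actions that users overrode, corrected, or walked away from."""
    names = ("overridden", "corrected", "abandoned")
    total = len(log)
    if total == 0:
        return {f"{name}_rate": 0.0 for name in names}
    counts = Counter(event["outcome"] for event in log)
    return {f"{name}_rate": counts[name] / total for name in names}


metrics = behavior_metrics(events)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

None of these numbers says whether users understood what the system did and why. That signal still needs direct research; the log only tells you where to look.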

Research also becomes the function that defines what "good" looks like. As AI generates more content, more recommendations, more decisions, someone has to hold the standard. That's a research function, and it requires the same rigor that good usability work has always required, applied to a messier, more dynamic target.

As with design, the trajectory here points toward something more unified. Researchers who can specify behavior and designers who can evaluate it start to occupy similar territory. Whether that produces new hybrid roles or just closer collaboration probably depends on the org. But the gap is closing either way.

What PM and engineering become

Product managers don't disappear, but the job changes in ways that existing job descriptions don't fully capture.

Right now, the most useful thing a PM can do is act as a capability translator: defining where AI genuinely helps and where it adds friction, owning the ROI story on automation decisions, and being honest about what should be automated at all. Over time, the work looks more like orchestration architecture — thinking in outcomes rather than features, deciding which parts of a workflow belong to the system and which require human judgment, and designing clean transitions between those states. Capable PMs were already supposed to think this way. AI just removes the excuse not to, because the cost of feature-level thinking in an agentic product becomes immediately visible.

For engineering, the shift is from building features to building systems that can act. That requires different quality models. Stability, recoverability, and behavioral consistency become as important as functionality. The delivery pipeline includes evaluation suites, behavior monitoring, and incident review. It's not enough for a feature to ship. It has to keep working under real-world conditions that nobody fully anticipated.

When moving fast loses the user

There's a failure mode in this transition that I keep coming back to.

When teams ship faster than they design, whether or not AI is involved, users pay for it. In agentic products the cost is higher, because the system keeps acting in the gap between what shipped and what was designed. Users are left with no idea what the system has done, no way to review it, and no clear path to correct it. When something goes wrong, they don't just distrust that action. They distrust everything the system has ever done quietly on their behalf.

Designing for delegation means designing for recovery. Every automated action should have a legible trail, an understandable reason, and a clear correction path. That's not optional infrastructure. It's the product working as advertised. Getting it wrong is a product failure, not a design detail that got deprioritized.
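One way to picture those three requirements is as fields on a record that every automated action must fill in. This is a sketch under assumptions: `ActionRecord`, `ActionTrail`, and the calendar example are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ActionRecord:
    description: str          # the legible trail: what the system did
    reason: str               # the understandable reason: why it did it
    undo: Callable[[], None]  # the correction path: how to reverse it
    undone: bool = False


@dataclass
class ActionTrail:
    records: list[ActionRecord] = field(default_factory=list)

    def log(
        self, description: str, reason: str, undo: Callable[[], None]
    ) -> ActionRecord:
        """Record an automated action; an action with no undo cannot be logged."""
        record = ActionRecord(description, reason, undo)
        self.records.append(record)
        return record

    def correct(self, record: ActionRecord) -> None:
        """Reverse an action once, and remember that it was reversed."""
        if not record.undone:
            record.undo()
            record.undone = True


# Invented example: the system adds a meeting, then the user reverses it.
calendar = ["Design review, Tue 10:00"]
trail = ActionTrail()
calendar.append("Sync with Sam, Wed 14:00")
record = trail.log(
    description="Added 'Sync with Sam, Wed 14:00' to your calendar",
    reason="Sam proposed this time in an email thread",
    undo=lambda: calendar.remove("Sync with Sam, Wed 14:00"),
)
trail.correct(record)
print(calendar)
```

Making `reason` and `undo` required is the design choice here. If an action can't supply both, that is a signal it probably belongs on the "ask first" side of the delegation boundary.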

What to build toward now

The teams that will navigate this well are already working on three things.

First, a practice of behavior specification: designers who can define system behavior, not just screen behavior. This is a skill that develops through deliberate practice, not by accident.

Second, evaluation infrastructure: the ability to continuously test whether the system is behaving as intended and whether users understand it. This takes time to build. Teams that start now will have a real advantage in three years.

Third, a shared language across functions: UX, PM, and engineering working from the same model of what "good" means for a system that acts. That alignment doesn't happen by accident. Someone has to build it. In most orgs right now, nobody has claimed that job.

Why UX is the right function for this

Look at what the governance role actually requires.

Someone has to define when a system should act and when it should wait. That's a judgment about human agency and context — not a technical specification, not a product roadmap line item. It requires understanding of how people form trust, how they recover from errors, and how they build mental models of systems they can't fully see.

Someone has to hold the standard for what the human experience of an agentic system should be. Not just whether it's fast or accurate, but whether users understand what it did, feel in control of what it does next, and can get back to solid ground when it gets something wrong.

Someone has to make behavior legible — translate system logic into interactions that feel coherent to a person who isn't thinking about the system at all.

These have been UX problems for as long as UX has existed. The surface they operate on is changing. The underlying discipline isn't. When you describe what governance of human-AI systems actually demands, you're describing interaction design applied to a larger problem space. The skills are the same. The stakes are higher. The scope is broader.

That doesn't mean every UX team is ready for it. Moving from interface execution to behavior governance requires broadening how the function defines its work. Designers need to be fluent in system behavior, not just screen behavior. Researchers need to build for ongoing evaluation, not just episodic studies. Leadership needs to define the function's scope expansively enough to match the problems that need solving.

But the foundation is there. The question for most organizations isn't whether UX is the right function to govern human-AI interaction. It's whether they've given the function room to do it.

Part 3 looks at what happens to the organizations that haven't.