Like many others, I’m enthralled by the breakout of large language models (or AI for short). It will probably be one of the most interesting, if not the most consequential, technology developments of the past decade.
In addition to the remarkable capabilities that these AI tools show, as a designer of user interfaces, I’m also fascinated by how these probabilistic systems challenge the way we understand and design user interfaces today.
To keep it short and sweet: GUIs over the last 30 years have been rooted in the same assumptions of deterministic, predictable inputs. If you read any HCI or human factors textbook, ideas postulated back then, like Fitts’s law, still hold true. I often think of user interfaces as deterministic state machines: the user clicks buttons and traverses a linear process to reach their desired output.
AI systems, however, break many of these underlying assumptions. For one, many AI systems are probabilistic. An image detection algorithm makes a prediction and provides a bounding box and a confidence level. Chat systems built on large language models can receive arbitrary input and produce reasonable-sounding outputs. There are no longer buttons to press or flows to traverse, or at least nothing that can be reasonably visualized in Figma.
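To make the contrast concrete, here is a minimal sketch in TypeScript. The type names are hypothetical; the point is that a deterministic UI maps an action to exactly one next state, while a probabilistic system hands the interface a set of guesses with confidence scores.

```ts
// A deterministic UI: each action maps the current state to exactly one next state.
type WizardState = "step1" | "step2" | "done";
type Action = { type: "NEXT" } | { type: "BACK" };

function nextState(state: WizardState, action: Action): WizardState {
  if (action.type === "NEXT") return state === "step1" ? "step2" : "done";
  return state === "done" ? "step2" : "step1";
}

// A probabilistic system: the output is a set of guesses, each with a confidence score.
interface Detection {
  label: string;                                        // e.g. "person", "pallet"
  box: { x: number; y: number; w: number; h: number };  // bounding box in pixels
  confidence: number;                                    // 0..1, never a guarantee
}

// The interface now has to decide what to do with uncertain results,
// for example only surfacing detections above some threshold.
function visibleDetections(detections: Detection[], threshold = 0.7): Detection[] {
  return detections.filter((d) => d.confidence >= threshold);
}
```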
Therefore, we must change the way we think about user interfaces. There will still be UI controls, but the way they are used will change. I also do not think the chat UI is the only viable way to interact with these systems, and here I propose three ways to think about them.
I. Constrained input; unconstrained output
In this first model, the user clicks a button and the software performs different functions depending on the application state. Adobe Photoshop’s Content-Aware Fill is a good example: the fill uses the surrounding data to generate reasonable approximations. Similarly, GitHub Copilot uses the surrounding code to generate autocomplete predictions.
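A rough sketch of this shape of interaction, assuming a hypothetical `generateFill` call: the input is just a button press plus the surrounding application state, while the output is open-ended generated content.

```ts
// Paradigm I: the input is constrained (one command plus the current application state),
// while the output is open-ended generated content.
interface Region { x: number; y: number; w: number; h: number }

interface FillRequest {
  canvasPixels: Uint8ClampedArray;  // current canvas contents (RGBA)
  selection: Region;                // the region the user asked to fill
}

interface FillResult {
  patch: Uint8ClampedArray;         // generated pixels for the selected region
}

// Hypothetical model call; the UI around it is still a familiar button.
declare function generateFill(request: FillRequest): Promise<FillResult>;

async function onFillButtonClicked(
  request: FillRequest,
  apply: (result: FillResult) => void
): Promise<void> {
  const result = await generateFill(request); // the model uses the surrounding data as context
  apply(result);                               // no prompt, no flow to traverse
}
```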
At a glance, this doesn’t look too different from what we have today. For designers, it can in fact mean less interaction design and shorter user flows! It has the gift of familiarity: users can do more within the interface paradigm they already know.
If done well, users will be able to achieve more without needing to master the user interface or a query language. If done poorly, users will be confused and feel like they are constantly fighting the software.
In the short term, this will become the dominant AI user interface paradigm as companies roll out AI integrations into their existing products. Tools will autocomplete content as the user starts filling out forms. Authoring tools will start predicting user intent and offering shortcuts. They might even try to preempt the user!
These will be incremental improvements, but they do not fully tap into what truly makes AI unique.
II. Unconstrained input; constrained output
A much more interesting class of applications is one where the requirements for the output are well defined but the input is unpredictable. For example: turning unstructured language into structured data (such as summarization), translating spoken language into drive-through orders, and using computer vision to detect and count objects in a fulfillment setting.
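One way to picture “constrained output” is as a fixed schema the system must fill in, no matter how messy the input is. A minimal sketch of the drive-through case, with a hypothetical `transcribeOrder` call:

```ts
// The output shape is fixed and well defined...
interface OrderItem {
  menuItem: string;      // must match a known menu entry
  quantity: number;
  modifiers: string[];   // e.g. ["no onions"]
}

interface Order {
  items: OrderItem[];
  confidence: number;    // how sure the system is about the order as a whole
}

// ...while the input is free-form speech. The model's job is to map one onto the other,
// and the UI's job is to let a human verify and correct the result.
declare function transcribeOrder(audio: ArrayBuffer): Promise<Order>;
```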
Many of these will be human-in-the-loop systems; the software works with large data streams that are automatically captured or provided. The bulk of the user interface will be constructed in service of two similar but distinct concerns: correcting errors and steering the AI to produce the desired output.
To make these use cases work effectively, the software will have to provide sufficient controls so that the user can shape the outcome they need. A useful case study is how autocomplete is received by people on iOS or Android phones: most people use it, some turn it off completely in favor of manually constructing every word, and others make ducking jokes about it.
There are likely going to be two sets of new ideas here. The first is to show how the output is derived from the input. In image detection use cases, that means correlating the output with bounding boxes in the original input footage. For language models, it can mean highlighting the locations in the source material that are being referenced. The user can then be presented with commands to start modifying the output: perhaps the AI “misheard” and made a translation mistake; the user should be able to correct it trivially. The design challenge here is to understand the failure modes and create tools to easily work with or around them.
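One way this could be modeled in the interface, sketched with hypothetical types: every output segment carries its provenance (a bounding box or a source span), and a correction keeps that link intact.

```ts
// Each piece of generated output keeps a pointer back to where it came from,
// so the UI can highlight the source and let the user correct mistakes in place.
type Provenance =
  | { kind: "boundingBox"; frame: number; x: number; y: number; w: number; h: number }
  | { kind: "sourceSpan"; documentId: string; start: number; end: number };

interface OutputSegment {
  text: string;           // what the model produced
  sources: Provenance[];  // where it claims that text came from
  confidence: number;
}

// A correction replaces the model's text but keeps the link to the source,
// so the provenance is preserved for later review.
function correctSegment(segment: OutputSegment, correctedText: string): OutputSegment {
  return { ...segment, text: correctedText, confidence: 1 };
}
```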
The second is to create controls for the user to modify the output to better suit their needs. Early work here, such as in generative AI tools, includes sliders that control the temperature of the inference, or a button to quickly vary the parameters and regenerate an image. As we dive deeper and better understand the use cases where systems like these are useful, the design task is to make these knobs much more relevant. If the generated summary “misses” a specific data point, the user should be able to quickly point that out and get a much more relevant output. Therein lies the work that needs to be done.
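As a rough sketch of what “more relevant knobs” could look like, assuming a hypothetical `generateSummary` call: the generic temperature slider sits next to a task-specific constraint, and pointing out a missed data point is just another input to the next generation.

```ts
// Knobs the user can turn, from generic (temperature) to task-specific
// ("this data point must be covered").
interface SummaryControls {
  temperature: number;    // generic creativity slider, 0..1
  mustInclude: string[];  // task-specific: data points the summary may not drop
  maxLength?: number;
}

declare function generateSummary(source: string, controls: SummaryControls): Promise<string>;

// "Regenerate" is just the same request with slightly varied controls;
// "point out a missed data point" is one more entry in mustInclude.
async function regenerateWithFix(
  source: string,
  controls: SummaryControls,
  missedDataPoint: string
): Promise<string> {
  return generateSummary(source, {
    ...controls,
    mustInclude: [...controls.mustInclude, missedDataPoint],
  });
}
```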
III. Unconstrained input and output
At last, we get to unbounded inputs and outputs. What would it mean for both the input and the output to be unconstrained? The closest analogy I can think of today is drawing apps. The user wants to make art. If the user is using a stylus, the input is constrained only by the physical limitations of the sensors. The software provides feedback and controls so that the user can achieve their intended effect. With the same stroke of the hand, for some it can look like a felt-tip marker pressed onto a piece of heavy paper; for others, it can be a mish-mash of patterns and colors.
With the advent of AI tools, this model may explode in availability and popularity. To take an AI agent like ChatGPT to its logical conclusion, a person may start with the idea of taking a vacation and end up with flights and hotel bookings in Buenos Aires. The AI agent does not initially present a button to book a flight. It’s the back-and-forth with the agent that works through the idea and produces an action that can then be acted upon.
Setting aside the integration and ecosystem questions, the main characteristic of this interaction model is that content is no longer separate from the user interface. In fact, the input and output are now part of the user interface. There might still be buttons presented to confirm the purchase of the ticket, but the UI encapsulates more of the user journey and can take part in more of the decision-making process.
We can model this user interface as a two-way interaction. The software accepts an input, processes it, and provides a result and follow-up actions. In response, the user can accept the result, take one of the actions, or pivot away from the current path. This repeats as often as necessary until the user reaches a satisfactory outcome. This can happen in the context of brainstorming, building a virtual world in 3D software, writing software, or any task that requires converting intent into a polished piece.
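A minimal sketch of that loop, with hypothetical `step` and `askUser` functions standing in for the AI and the person:

```ts
// One turn of the loop: the software returns a result plus follow-up actions,
// and the user's response decides where things go next.
interface Turn {
  result: string;
  followUps: { id: string; label: string }[];  // e.g. "Book this flight", "Show hotels instead"
}

type UserResponse =
  | { kind: "accept" }                          // satisfied, stop here
  | { kind: "takeAction"; actionId: string }    // pick one of the offered actions
  | { kind: "pivot"; newInput: string };        // change direction entirely

declare function step(input: string): Promise<Turn>;          // the AI side of the loop
declare function askUser(turn: Turn): Promise<UserResponse>;  // the human side of the loop

// The loop repeats until the user accepts an outcome.
async function interact(initialInput: string): Promise<string> {
  let input = initialInput;
  while (true) {
    const turn = await step(input);
    const response = await askUser(turn);
    if (response.kind === "accept") return turn.result;
    input = response.kind === "pivot" ? response.newInput : `action:${response.actionId}`;
  }
}
```

The interesting design work is in what the follow-up actions contain at each turn, and in how easily the user can pivot without starting over.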
Chat agents are not the only incarnation of this UI model. Take the naive example of a research notebook: as the user writes down ideas, the notebook can prompt them with buttons to pull in additional outside sources or link together similar ideas within the repository. If the user writes down some statistics, the AI system can generate a graph. Together with the AI system, the user can do a better job of preparing their notes in less time.
A cognitive shortcut we can take here is to imagine the task as it would be completed by two people. Historically, software has played the role of facilitator for collaboration and interaction. With AI systems, however, software can stop being neutral and start to participate, doing more of the heavy lifting and bringing forth the benefits of collaboration. Once we can do more with less, we can do much more with more.
This has all felt really new and refreshing. If even a fraction of what I think will happen happens, we will start to see dramatic shifts not just in how we use software, but in how software integrates into our lives.
I’ve long argued that interaction design today has not been serving its users and designers well. And I’m excited to see what this change will bring.