The last few years have been a spectacle of wonder and dismay. Wonder because I’m consistently awestruck by the velocity and voracity of AI progress. A technology that is profoundly sophisticated while being deceptively simple. Occasionally capable of doing things that blow my mere mortal mind, and then stumbling down the gentlest steps. And above all it is a fascinating piece of technology that is living up to its hype.
I have vivid memories of my intro to NLP class back in grad school 15 years ago. I did okay if I recall correctly; math was not my strong suit, and we spent the entire semester solving various mathematical flavors of finite state machines. Towards the end of the semester, the professor mentioned state-of-the-art statistical methods and how they gained roughly 10 points over the puny finite state machines we’d been prodding the entire time. Little did I know then that 15 years later, humongous NLP applications would be writing code better than the 75th percentile software engineer.
I bring that up to say that the one thing I failed to appreciate then was how much more language is than a “user interface”. I had always seen it as an alternative (or an addition) to the GUI model of windows, buttons, and clicks. When I saw the Humanized Enso launcher back in the day, I was convinced that this was the future of how we used computers. What language models have demonstrated, however, is how much language can be used to wield computers.
What’s dismaying is that it feels blatantly obvious in hindsight. There are things we stare at every day, staring back at us, that we fail to appreciate. I have long believed that the next big thing will come out of left field. I keep looking out for it, and yet I missed the gorilla walking through the frame. The NLP class was back in early 2011, and the AI labs formed circa 2015. A friend gave me a demo of GPT-2 around 2020, when it was all about text completion. I scoffed. I obviously lack the acumen for predicting the future.
Language as action
I’m here today to rectify that. I’m currently working on a project called Sunnyday, which is the culmination of several ideas I have been exploring over the past few years: agentic harnesses, reliable repeatable AI workflows, and the human interface to use them effectively.
My favorite thing about computer programming is that you can build whatever you can dream of. AI feels like a much more expansive and accessible version of that idea. Natural language is native to how we think, which makes it far more expressive than the rigidity and formality of traditional programming languages.
The deeper shift is that language is not just how we describe things, it is also how we direct action. We ask, instruct, negotiate, refine, and coordinate through language. What makes AI powerful is that these language models can manipulate language itself and combine it with vast knowledge and the ability to use software, allowing us to turn mere words into orchestrated processes and outcomes.
We already see the broad use of these systems in apps like ChatGPT and Claude. They take your words, infer your intent, and turn them into documents, software, presentations, and actions across a growing number of integrations. The gap between thought, instruction, and execution is rapidly shrinking.
Most of the popular AI agents today only work with a human in the loop. People who use these agents describe the experience as collaboration, and many of the product names reflect that: copilot, cowork, and so on. Sunnyday takes the opposite view. I’m interested in the ergonomics of an agentic AI system that runs in the background. The hard problems here are not about the AI; they are about building agents that can gather enough context to reliably complete a task, and about identifying the situations where handoffs between human and agent should happen.
A related problem is thinking through composition and control flow in these kinds of systems. Many of today’s systems are unbounded work streams: continuous series of tasks and objectives with no clear delineation, which want to consume more and more context. My thesis is that for a system like Sunnyday, there is a Goldilocks unit of work: small enough to be reliable, large enough to be useful, and composable into work that is larger, more sophisticated, and more alive.
The bet
All the work that I’m currently doing is at the harness level. The leverage points I’ve identified are in interaction design – setting users up to create successful agents, building good feedback loops, and clarifying how an agent gets invoked – and in control flow, where the questions are how we know a task succeeded and how we help the agent take the optimal action. In between those two layers are substantive design choices around prompt engineering, memory, and environment setup. The final layer is the models, but far better-resourced labs are working on those. Worth watching, not where I can contribute.
The bigger risk is finding large categories of work that Sunnyday is good at. It may feel intuitive that repeatable work is naturally in Sunnyday’s wheelhouse, but the shape of the work matters a lot. Repeatable, mundane tasks will run better and more reliably as well-designed software processes, which are now cheap to build. Tasks that require human checkpoints are a poor fit for a hands-off approach. And tasks that require computer use are premature at the moment and will remain adversarial, especially with threatened SaaS providers.
The bet I’m making is that horizontal capabilities matter. Frontier labs are building models and harnesses and focusing on big-dollar opportunities. Startups are taking them into verticals, the most obvious play at the moment. The horizontal layer for specifically shaped AI problems, such as a repeatable, reliable agent, is an opportunity, and one that plays to my strengths in both AI and UX.
My hope is that in a few years, hands-off agentic processes will be a first-class primitive that people reach for, and that the patterns for building them well will trace back to Sunnyday.
Closing thoughts
Lately, I keep thinking about the work that Terry Winograd and Fernando Flores did on the Language-Action Perspective in the previous artificial intelligence epoch. Their core insight, that all information is communication and that language is action, was right, but the computers of that era were limited in what they could do. In this era of language models, we have a much broader canvas for doing things with language. The challenge now is to create the harnesses, structures, and scaffolds that put these models to productive use.
This project is the culmination of multiple years of work. Sunnyday encapsulates Axle, my open-source agent library, which I started three years ago to explore AI automations.¹ Sunnyday is, more or less, a hosted version of Axle, with the capabilities needed to run it as a web service.
Both Sunnyday and Axle stand on the shoulders of DSPy and the Vercel AI SDK. DSPy changed how I think about giving structure to language, and about manipulating language in a formal, programmatic way. The Vercel AI SDK influenced how I thought about provider composition in Axle.
No work stands alone, and none can endure without the forge of the real world. I am immensely stoked about the work I’m doing with Sunnyday. If you’ve read this far, I’d love to talk. There is a lot of work to be done here, and I’d love for you to be part of it.
-
There is an interesting story here: originally, Axle was conceived as a pipeline-based LLM harness in the mold of GitHub Actions and other workflow runners. The shape never quite fit, however. As AI gained tool use and other agentic capabilities, it became a more and more awkward fit: when does the AI decide to hand off to the next stage versus completing the work within its own context? That question prompted a reflection on linear versus recursive composition, and here we are. ↩
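The linear-versus-recursive distinction can be sketched in a few lines of TypeScript. Everything here is illustrative (the names `runPipeline`, `runRecursive`, and the toy steps are mine, not Axle’s API): a pipeline fixes its stages and handoff points up front, while a recursive composition lets each node decide at runtime whether to finish the work itself or delegate to sub-nodes.

```typescript
type Step = (input: string) => string;

// Linear: the stages and every handoff between them are decided before
// anything runs. Each stage hands its output to the next.
function runPipeline(stages: Step[], input: string): string {
  return stages.reduce((acc, stage) => stage(acc), input);
}

// Recursive: a node of work either completes within its own context (a
// leaf) or breaks itself into children and combines their results.
type WorkNode = { work: string; children?: WorkNode[] };

function runRecursive(node: WorkNode): string {
  if (!node.children || node.children.length === 0) {
    return `done(${node.work})`; // leaf: finished within its own context
  }
  // branch: delegate to sub-nodes, then combine what they produce
  return node.children.map(runRecursive).join(" + ");
}

// Linear run: stages are fixed in advance.
const linear = runPipeline(
  [(s) => s + " drafted", (s) => s + " reviewed"],
  "post"
); // "post drafted reviewed"

// Recursive run: the decomposition is part of the data, not the harness.
const recursive = runRecursive({
  work: "report",
  children: [{ work: "research" }, { work: "write" }],
}); // "done(research) + done(write)"
```

In the linear version the harness owns the control flow; in the recursive version each node does, which is exactly the handoff question the footnote describes.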