The automation graph agents need

If you’ve been working with coding harnesses for a while, you’ve probably noticed that once the harness is done writing the code, it has to figure out how to close the loop on its own. To do that it needs some understanding of how the project is wired up and how the pieces connect to the file it just touched. The classic question is something like, “I changed this source file, what are the tests associated with it?” Sometimes the answer is right there in the naming, one test file per source file, and the map is straightforward. In a lot of codebases it isn’t, and the agent ends up guessing, running more than it needs to, or skipping things it shouldn’t.
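To make the gap concrete, here is a minimal sketch of answering that question from a dependency map instead of from file names. The file paths and the `IMPORTS` table are made up for illustration; in a real project this information would come from the build system or an indexer.

```python
from collections import defaultdict, deque

# Hypothetical project: each file lists the files it imports.
IMPORTS = {
    "app/cart.py": ["app/pricing.py"],
    "app/checkout.py": ["app/cart.py"],
    "tests/test_cart.py": ["app/cart.py"],
    "tests/test_checkout_flow.py": ["app/checkout.py"],
    "tests/test_pricing.py": ["app/pricing.py"],
}


def affected_tests(changed, imports):
    """Return every test file that depends on `changed`, directly or not."""
    # Invert the edges so we can walk from a file to the files that use it.
    dependents = defaultdict(set)
    for source, deps in imports.items():
        for dep in deps:
            dependents[dep].add(source)

    # Breadth-first walk over everything downstream of the changed file.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for user in dependents[current]:
            if user not in seen:
                seen.add(user)
                queue.append(user)

    return sorted(path for path in seen if path.startswith("tests/"))


print(affected_tests("app/pricing.py", IMPORTS))
```

A name-based guess for `app/pricing.py` would find only `tests/test_pricing.py`. Walking the map also reaches the cart and checkout tests, which is the part agents tend to miss.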

Underneath that little frustration there are two things you actually want. You want the agent to understand the dependencies well enough to close the loop fast, with less guessing and less looking around. And you want it to do that effectively, because if you run only a narrow handful of tests around your single change you might miss a lot, and if you run everything you’ve thrown away the speed that made the agent worth using in the first place. The way most teams paper over that today is by pushing changes to CI and letting CI be the place where everything happens. CI becomes this big container where you either run the full thing, linting, testing, compiling, formatting, deploying, or you try to be selective about a subset. There is something about that selection that doesn’t feel like it should be exclusive to a remote computer environment. It should be something an agent can reach for too, locally, in a sandbox, anywhere.

The question agents keep asking

This is where I think there is a real opportunity for an automation layer that agents can talk to. It’s the same question for them as for us, with the same easy cases and the same hard cases, and the surface around it has to be designed with the idea that an agent is going to be the one asking. The simple version of the question is the one I mentioned, “I’ve changed this file, what do I have to do?”, and I think that should be cheap to answer through a CLI command and an MCP server. But the interesting part is that the question doesn’t stop there. There are pieces of the answer that are not just commands to run.

When I think of Bazel, and of build systems in general, a lot of the design is built around the idea that you can codify the work in a graph. It’s true that the focus there is on the deterministic end-to-end automations, building, testing, linting, the things that can run without a human in the loop. What I find more interesting is the idea that you can also bring into the graph pieces that are not deterministic, and pieces where the right answer is actually for a human to participate. Take an instruction like “after you’ve done this, it’s important that someone takes a look at this critical part of the codebase.” That sentence belongs in the graph too, not in tribal memory or a checklist nobody reads.

So I find myself wanting a graph that goes beyond just building, testing, and linting and the classic things you already know. Something that, when you change a file, doesn’t necessarily automate anything, but does something quieter, like giving the agent additional context based on what was touched. A node in the graph might just be a piece of context, a decision record, an old discussion, an architectural note for that part of the system. I think this is the kind of graph that will become more and more important over time, and maybe in the future some of this knowledge will get codified into the model itself. But I keep being surprised by how bespoke large codebases get. The setup is rarely just “these are the conventions because it’s a big project”. It’s a particular team’s decisions, exceptions, scripts, ownership boundaries, and old workarounds. I would rather not ask the model to rediscover all of that. I would rather it could look at the graph and know what to do, and traverse it to figure out how to validate the work it just did.
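A rough sketch of what such a graph could look like, with nodes that are commands, human reviews, or plain context. Everything here is hypothetical: the node kinds, the field names, the glob triggers, the `make` commands, and the document path are illustrative assumptions, not a description of any existing tool’s schema.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase


class Kind(Enum):
    COMMAND = "command"   # something a machine can run
    REVIEW = "review"     # something a human has to do
    CONTEXT = "context"   # something the agent should read


@dataclass(frozen=True)
class Node:
    name: str
    kind: Kind
    triggers: tuple       # glob patterns matched against the changed path
    payload: str          # command line, review request, or document path
    needs: tuple = ()     # names of nodes that must come first


# Hypothetical graph for a project with a sensitive billing module.
GRAPH = [
    Node("lint", Kind.COMMAND, ("*.py",), "make lint"),
    Node("billing-tests", Kind.COMMAND, ("billing/*",), "make test-billing",
         needs=("lint",)),
    Node("billing-adr", Kind.CONTEXT, ("billing/*",),
         "docs/adr/0007-rounding.md"),
    Node("billing-review", Kind.REVIEW, ("billing/*",),
         "Ask someone from the payments team to look at this change",
         needs=("billing-tests",)),
]


def plan(changed_file, graph):
    """List the nodes relevant to a change, prerequisites first."""
    by_name = {node.name: node for node in graph}
    ordered = []

    def visit(node):
        for dep in node.needs:
            visit(by_name[dep])
        if node not in ordered:
            ordered.append(node)

    for node in graph:
        if any(fnmatchcase(changed_file, pattern) for pattern in node.triggers):
            visit(node)
    return ordered


for step in plan("billing/invoice.py", GRAPH):
    print(f"{step.kind.value:8} {step.name}: {step.payload}")
```

The point of the sketch is that the answer to “what do I do after this change?” is a mixed list: two things to run, one thing to read, and one thing to hand to a person.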

Once, and doing things once

This is roughly the exploration I’m doing with Once, and I really like the name because it captures the idea pretty cleanly. You try to do things just once, and you let the rest of the system reuse the result. This is where hashing and fingerprinting become extremely valuable, because if a piece of work happened against a precise state of the inputs, in a known environment, you should not have to repeat it. Not the build, not the test run, not the validation, not the “did a human review this”, not the context that was materialized for the agent. None of it.
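As a minimal sketch of that idea, assuming a result can be keyed by the action, the content of its inputs, and a description of the environment: the `ResultStore` class and its method names below are hypothetical and are not Once’s actual interface.

```python
import hashlib
import json


def fingerprint(action, inputs, environment):
    """Hash the action name, the input contents, and the environment."""
    digest = hashlib.sha256()
    digest.update(action.encode())
    for path in sorted(inputs):  # sorted so ordering never changes the key
        digest.update(b"\0" + path.encode() + b"\0")
        digest.update(hashlib.sha256(inputs[path]).digest())
    digest.update(json.dumps(environment, sort_keys=True).encode())
    return digest.hexdigest()


class ResultStore:
    """In-memory stand-in for a shared store of results keyed by fingerprint."""

    def __init__(self):
        self._results = {}
        self.executions = 0

    def run(self, action, inputs, environment, work):
        key = fingerprint(action, inputs, environment)
        if key not in self._results:
            # Only do the work when this exact state has never been seen.
            self.executions += 1
            self._results[key] = work()
        return self._results[key]


store = ResultStore()
env = {"os": "linux", "python": "3.12"}
files = {"billing/invoice.py": b"def total(): return 42\n"}

# The same call works for a test run or for a recorded human review.
store.run("test", files, env, lambda: "passed")
store.run("test", files, env, lambda: "passed")            # reused
store.run("review", files, env, lambda: "approved by a human")

files["billing/invoice.py"] = b"def total(): return 43\n"
store.run("test", files, env, lambda: "passed")            # inputs changed

print(store.executions)
```

A human review is treated exactly like a test run here: if nothing it depended on has changed, there is no reason to ask for it again.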

The reason I want to be deliberate about this is that the industry right now is very focused on compute as the differentiator. There are so many flavors of CI runners, sandboxes, ephemeral environments, and backends for cloud agents that the ecosystem feels saturated. I see very little innovation happening in that layer, and it doesn’t really make sense for us to step into it. It’s not that we don’t have the capability. We can run ephemeral environments, we can allocate cache inside them, the pieces are there. But if we take the infrastructure we already have and push ourselves two or three years into the future, I don’t think the interesting question is which of us runs the fastest box. I’d rather imagine what that future might look like and build backwards from there. That means accepting the direction infrastructure seems to be taking and being well positioned by the time that new reality arrives, instead of expanding the business one incremental step at a time and ending up in the crowded part of the market.

Co-location beyond the cache

A lot of what I’m looking at fits under co-location. We are already co-locating the caching servers and distributing cache across regions so that an artifact ends up near the place that wants to use it. That’s the obvious version, and it keeps mattering as agents do more of the work. But I think co-location goes further than that.

We are exploring what it would mean to co-locate working repositories with the automation layer, so an agent can be handed something closer to a materialized view of the code it needs, instead of having to pull and push the whole repository every time. If an agent is going to touch a small slice of a large codebase, why are we sending it the entire thing? Some companies are trying to solve that by making the machine stateful, by keeping the workspace warm and the clone alive, and that might be a perfectly valid option. I’m more interested in whether the automation layer can coordinate with the forge layer to pull just what is needed, which in many cases is a very small subset of the repo anyway. Once you know the graph, you know the subset, and once you know the subset you can be smart about what you fetch.
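Here is a small sketch of the “know the graph, know the subset” step, assuming each target in the graph maps to one directory. The target names, the `DEPS` and `SOURCES` tables, and the directory layout are invented for the example.

```python
# Hypothetical graph: each target lists the targets it depends on.
DEPS = {
    "checkout": ["cart"],
    "cart": ["pricing"],
    "pricing": [],
    "search": [],
    "admin": ["search"],
}

# Hypothetical mapping from a target to the directory that holds its sources.
SOURCES = {
    "checkout": "modules/checkout",
    "cart": "modules/cart",
    "pricing": "modules/pricing",
    "search": "modules/search",
    "admin": "modules/admin",
}


def paths_to_fetch(targets, deps, sources):
    """Collect the directories needed to work on `targets` and nothing else."""
    needed = set()
    stack = list(targets)
    while stack:
        target = stack.pop()
        if target in needed:
            continue
        needed.add(target)
        stack.extend(deps.get(target, ()))  # follow dependencies transitively
    return sorted(sources[target] for target in needed)


print(paths_to_fetch(["cart"], DEPS, SOURCES))
```

For a change in `cart`, only two of the five directories are needed. A list like this is the kind of input a sparse checkout or a partial fetch could be driven from, though how the automation layer and the forge would actually negotiate it is still an open question.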

I think there is an opportunity in there to fold the idea of “the machine” away a little, and to bet that the model where the workspace is the unit will quietly get less important than it is today. That’s a guess, not a certainty, and the stateful-machine direction might end up winning in many shops. But I’d like the option, and the graph is what makes it possible to have it.

The open layer

What I like about this whole direction is that none of it has to belong to one provider. The graph should be something a project can define and any tool can consume. Tuist can be one provider of the infrastructure that lights it up, with caching, compute, and the pieces that already make sense for us, and other tools can play other parts. The value comes from the graph being useful across environments, not from trapping the project inside a single one.

This is a lot of fun, and I really like how the pieces are starting to fit together. It leads naturally to an open-source-shaped technology that anyone can use with any provider, with Tuist as one of the providers that makes some of those automations real, like caching, compute, and the things that come next. I don’t have a finished thing to point at yet, but the shape of it keeps getting clearer the more I poke at it. And the more I look at what large codebases and coding agents are actually asking for, the more I think the next interesting unlock isn’t another place to run code. It’s a better way to know what should happen after the code changes.