What is an environment from a developer's perspective? A place where code is built and tested. We've evolved from having just local environments where we develop and test code, and production environments where we make our apps accessible through the public internet, to having CI environments where our code is integrated, and most recently, agentic coding environments where agents execute the tasks we assign them.

They all have one thing in common: they take source code and one or more tasks as input, then produce some output. This might sound abstract, but at a very high level, that's what we do in them. To do their job, they all have to provision the environment with the tools necessary to complete the work, and you'll likely expect them to do it fast and reliably.

Now, if they all have to do the same thing, it sounds plausible that we could reuse the provisioning and optimization logic across all those environments. Unfortunately, that's not the case unless you're using Bazel, which is sadly costly for many organizations to adopt. Bazel makes the whole graph hermetic, and in its build graph you can also codify the installation of tools necessary for other tasks. In other words, with Bazel, you just need Bazel. This is great, and there's a lot to learn from it, but there's a ceiling to the number of companies that will decide to go that route. Bazel, with all its benefits, also takes people away from their build systems and ecosystems, and that's a very high price to pay.

So if Bazel isn't an option, what can we do? You should start thinking of your environments and the interaction between them and your project as a layer that you want to be as thin as possible. The thinner it is, the more portable your project's automation and provisioning logic becomes, and the easier it is to run across the aforementioned environments and others that will emerge in the future.

Let's picture this: your project has some automation written in Ruby, which needs to be present in the system to run. At some point, you decided that CI pipelines would install Ruby using a step from their marketplace. Because of that, when you task an agent with a job, it misdetects the Ruby version, installs a different one, and comes back to you with a failure. Compare that with using Mise, of which we've been huge fans: the agent just detects Mise and runs mise install, which installs the right versions deterministically. Voilà! Every environment can just run mise install. It might seem subtle, but it's a change that makes your projects more environment-agnostic.
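To make that concrete, here's what such a setup might look like: a mise.toml at the project root pins the tool versions (the tools and version numbers below are illustrative), and every environment provisions itself from it the same way.

```toml
# mise.toml — pins the tools every environment needs
# (tools and versions here are illustrative)
[tools]
ruby = "3.3"
node = "22"
```

Local machines, CI pipelines, and coding agents all run the same command, mise install, and end up with identical toolchains.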

As I said, provisioning isn't the only thing you'll need. You'll also want those tasks to complete fast, and at the speed at which code is being written with AI, the volume of code we have to compile will grow faster than the processors we compile it with. In other words, we'll need very sophisticated caching systems that run close to those environments. Once again, you can couple your project to a solution from a company that provides those environments, like your CI provider, or you can adopt a solution that's environment-agnostic, so that your coding agents benefit from it too, not just your CI environments. This is why I strongly believe that Tuist's model of offering caching in an environment-agnostic manner is the right way to go in an agentic world.
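One property an environment-agnostic cache needs is that every environment derives the same cache key from the same inputs, so a build cached in CI can be reused by a local machine or an agent. A minimal sketch of that idea in shell (the file name is a stand-in; a real system would also hash toolchain versions and flags, and use a cryptographic hash such as SHA-256 rather than POSIX cksum, which is used here only to keep the sketch portable):

```shell
#!/bin/sh
# Sketch: a content-addressed cache key derived purely from a module's
# inputs, so local machines, CI, and agents all compute the same key.
printf 'let answer = 42\n' > Feature.swift   # stand-in source file
key=$(cksum < Feature.swift | tr ' \t' '--')
echo "cache key: $key"
```

Because the key is a pure function of the inputs, no environment-specific state leaks into it, which is exactly what makes the cache shareable across environments.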

The last piece about environments has to do with telemetry. How do we know the job was executed, and how do we debug issues when they arise? In a local or remote CI environment, that's usually through the standard output and error streams that your toolchains emit. This can go a long way, but in a world of agents, the lack of structure or visibility into internal tasks can hurt agents' ability to debug their own work. Take xcodebuild, for instance: its raw output is too verbose 95% of the time, but when issues happen, you (or the agent) wish you could go deeper. And it's not just when things go wrong; you want that depth when you notice slowness, too.
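The common workaround today is to filter that raw output down to the signal, for example by grepping xcodebuild's combined output for diagnostics (the scheme name below is hypothetical). It keeps logs readable, but it also illustrates the problem: the structured detail an agent would need to dig deeper is simply thrown away.

```shell
# Crude filtering of xcodebuild's verbose output down to diagnostics.
# Keeps failures visible, but discards the structured detail you'd
# want when investigating slowness or flaky steps.
xcodebuild -scheme MyApp build 2>&1 | grep -E '(error|warning):'
```

Structured telemetry inverts this trade-off: show a summary by default, and keep the depth available for when you need it.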

Telemetry is sadly not becoming as common as caching, but I think it will, and infrastructure will be needed for it. The only product I see moving in a good direction there is Dagger. There are many things I don't like about its concept, but the way it can live-stream your build as an HTML view that updates as the build runs, and lets you drill deeper to understand what happened, is a much better experience than scrolling through a pile of standard-output messages.

Will this take a year? Maybe two? Who knows... but the time will come, and the infrastructure will need to be there.