Building the Shipping Container for Developer Toolchains
I’ve been thinking a lot about automation in software projects lately, particularly where the opportunities for a better developer experience lie. I believe many current limitations stem from how we answer the what, where, and how of automation separately, with different companies attempting to profit from each question independently.
Let’s start with the what. We can reduce automation to three core tasks that most projects adopt: build, test, and check, where “check” encompasses any validation that doesn’t fall into the first two categories, such as static code analysis or linting.
Build and test are typically defined by the toolchain of the programming language, while checks are heavily influenced by the community. Our editors have evolved to accommodate these toolchains, ensuring a zero-config experience for developers. It just works. This is a question whose answer presents no business opportunity. In fact, attempting to monetize it would be a bad idea, as the community would likely react negatively.
However, for many toolchains, the answer falls short at scale. This prompted the emergence of generic build systems like Bazel, Google’s open-source build tool. But like any fork from the standard toolchain, it comes with a high cost that only a few organizations can bear. Bazel’s rule-based design captures this reality: you depend on a small community to keep the rules up to date, ensuring not just that workflows run locally, but that editors, debuggers, and everything else operates smoothly when developers interface with code through their tools.
This reminds me of the innovation adoption curve in web platform development. Abstractions emerge in frameworks until the platform evolves and pushes innovation down to the native layer, like how modern CSS features eliminated the need for many preprocessor capabilities. Personally, I think having no fork beats having one, but toolchains take time to evolve, so I understand forks as a necessary temporary solution.
Now the where: the environments these tasks run in. Initially, workflows ran only in developer environments, then in CI environments, and most recently, in agentic coding environments like GitHub Copilot Workspace or Claude Code. This is where business opportunities emerged: from complete CI solutions like GitHub Actions and CircleCI to companies providing isolated runtime environments.
These solutions align on some conventions, like using YAML as a declaration language, but fragment when attempting to create vendor lock-in to prevent organizations from leaving. They introduce proprietary features like marketplace ecosystems for reusable steps or tightly coupled solutions like caching, as if these needs didn’t exist outside CI environments.
This CI-centric mindset presents organizations with solutions but also creates new challenges. Consider the portability problem: developers cannot easily debug pipeline executions locally. Or the latency problem with Content-Addressable Storage (CAS) caches, a mechanism for storing information that can be retrieved based on its content rather than its location. These caches are optimized for CI where you can co-locate cache servers and CI environments on the same network. But what about developers working remotely? They don’t run their environments in the same network where CI environments are hosted, resulting in higher latency and degraded experience.
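To make the mechanism concrete, here’s a minimal sketch of a content-addressable store in Swift. The type and its API are hypothetical, meant only to show why content-keyed artifacts can be served from any replica, near or far:

```swift
import CryptoKit // Apple platforms; swift-crypto offers the same API elsewhere
import Foundation

// A minimal content-addressable store: artifacts are keyed by the SHA-256
// hash of their contents rather than by a path or name, so identical
// outputs deduplicate and any replica holding the same bytes can serve
// a request. The latency cost comes from where that replica lives.
struct ContentAddressableStore {
    let root: URL

    // The key is derived from the content itself.
    func key(for data: Data) -> String {
        SHA256.hash(data: data).map { String(format: "%02x", $0) }.joined()
    }

    func put(_ data: Data) throws -> String {
        let digest = key(for: data)
        try data.write(to: root.appendingPathComponent(digest))
        return digest
    }

    func get(_ digest: String) -> Data? {
        try? Data(contentsOf: root.appendingPathComponent(digest))
    }
}
```

A CI runner sitting next to the store resolves a `get` in milliseconds; a developer on a home connection pays a full round trip for every artifact, which is exactly the degraded experience described above.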
These examples illustrate how business opportunities in remote environments created CI-centric solutions that disregard the diversity of environments we have today, and that will likely continue to evolve. Local environments aren’t as monetizable as remote ones, but leaving them out of the design creates a developer experience that’s far from perfect. Any innovation or attempt to fix this feels like incremental patching rather than fundamental rethinking.
Finally, the how. Automation runs in environments, but there’s little to no visibility into how it performs. This visibility is crucial for understanding where opportunities for optimization lie.
This area hasn’t seen as much progress as I’d hoped, with one notable exception: test runs, where consensus built around the JUnit XML format. But what about compilers or build systems? Bazel proposed the Build Event Protocol (BEP) for streaming build-time information, but it hasn’t been broadly adopted. Apple is working on a new build system but hasn’t designed it to stream build telemetry to external services. Teams remain reliant on standard output and error messages to understand their build processes, a limitation that becomes increasingly apparent as builds grow in complexity.
I believe this is going to change, especially with agents, which treat standard pipelines as an implementation detail of the communication between the agent and the environment. Companies offering runtime environments provide telemetry about infrastructure metrics like CPU and memory usage. While engineers appreciate seeing these metrics, and they can be helpful at times, high CPU usage isn’t necessarily problematic. It likely means the CPU is busy compiling your work, which is fine.
What teams actually need to know is whether their build graph (a directed acyclic graph representing the dependencies between build tasks) is making optimal use of available resources. This question can’t be answered without information about the build graph structure and how it’s being processed. Lacking this data, many organizations optimize the environment before the workflows that run in it, when optimizing the workflows would yield a higher ROI.
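As a sketch of the kind of analysis this data unlocks, here’s a small Swift toy that computes a build graph’s critical path and compares it against total work; the targets and durations are invented. If the ratio is close to 1, adding cores or bigger machines won’t help; restructuring the graph will:

```swift
// Toy model of a build graph: tasks with durations and dependencies.
struct Task {
    let name: String
    let duration: Double // seconds
    let dependencies: [String]
}

// Longest dependency chain by duration (assumes the graph is acyclic).
func criticalPath(of tasks: [Task]) -> Double {
    let byName = Dictionary(uniqueKeysWithValues: tasks.map { ($0.name, $0) })
    var memo: [String: Double] = [:]
    func finish(_ name: String) -> Double {
        if let cached = memo[name] { return cached }
        let task = byName[name]!
        let result = (task.dependencies.map(finish).max() ?? 0) + task.duration
        memo[name] = result
        return result
    }
    return tasks.map { finish($0.name) }.max() ?? 0
}

let graph = [
    Task(name: "Core", duration: 30, dependencies: []),
    Task(name: "UI", duration: 20, dependencies: ["Core"]),
    Task(name: "Networking", duration: 25, dependencies: ["Core"]),
    Task(name: "App", duration: 10, dependencies: ["UI", "Networking"]),
]

let totalWork = graph.reduce(0) { $0 + $1.duration } // 85s of compute
let path = criticalPath(of: graph)                   // 65s: Core, Networking, App
// Max theoretical speedup over serial execution is ~1.3x here, no matter
// how many cores the environment provides.
print(totalWork / path)
```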
Where does this leave us? We need a more holistic approach to supporting build systems and test runners. One that’s not coupled to solving only for CI or CI runners. One that doesn’t anoint a single build system as the solution to all problems, but instead embraces the diversity of build systems and patterns across ecosystems.
I think we’re all coming to the same realization from different angles. At the end of the day, all these pieces are interconnected, and only by treating them as such can we provide the best experience. Otherwise, you end up with a highly indirect and convoluted setup like the one Reddit described in their engineering blog, where teams juggle several vendors and solutions while trying to create a cohesive developer experience.
Every company brings its own strengths to a solution, and this is something I’ve been thinking about deeply lately. Two things have been distinctive about how we’ve built Tuist, and they can play a crucial role in shaping infrastructure for modern build systems and test runners: community and open source.
Most solutions build proprietary technology and offer either hosting as a service or a license for self-hosting. But I believe there’s a better model that’s more likely to succeed long-term. If you draw a line between the technology and the service, treating the technology as an open-source commodity for the ecosystem, you become the best-positioned party to offer it as a service, and developers’ perceptions of you will be very different.
Would you pay Grafana (the company) for hosting an instance or pay someone else? Likely Grafana, right? This is the power of the open-core model, where core software is open source but additional features, services, or hosting are provided commercially. I see an opportunity to apply the same approach to build infrastructure, though I don’t yet know the full shape it will take.
I think the technology should act as a bridge between build systems and test runners to solve the MxN problem we’ll soon face: many build systems and test runners with similar capabilities but inconsistent contracts. This is a classic challenge in system design where connecting M systems to N systems results in M×N integration points, but a bridge or adapter pattern reduces this to M+N connections. We’re already seeing this fragmentation. It’s similar to the problem that had to be solved in global shipping with standardized shipping containers, a revolution that transformed logistics in the 1950s-60s by creating a universal interface regardless of what was being transported or who was transporting it.
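To make the arithmetic concrete: five build systems and four consumers (CI, editor, agent, telemetry) would require twenty bespoke integrations, but five adapters plus four consumers of a shared contract need only nine. Here’s a hypothetical Swift sketch of what such a bridge contract could look like; none of these names are a real Tuist API:

```swift
// A hypothetical shared contract. Each build system implements it once
// (M adapters); each consumer programs against it once (N consumers).
protocol BuildSystemAdapter {
    var name: String { get } // e.g. "gradle", "bazel", "cargo"
    func build(target: String) async throws -> BuildReport
    func test(target: String) async throws -> TestReport
}

struct BuildReport {
    let succeeded: Bool
    let duration: Duration
}

struct TestReport {
    let passed: Int
    let failed: Int
}

// A consumer written once against the contract works with every adapter,
// whether it's a CI backend, an editor integration, or an agent.
func runPipeline(using adapter: some BuildSystemAdapter) async throws {
    let build = try await adapter.build(target: "App")
    guard build.succeeded else { return }
    let tests = try await adapter.test(target: "AppTests")
    print("\(adapter.name): \(tests.passed) passed, \(tests.failed) failed")
}
```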
I don’t think this bridge has to be Tuist itself, but rather a technology developed and maintained by Tuist. Think of PostgreSQL, a technology whose ecosystem Supabase contributes to and that they host as a service. Solving this problem requires starting from the ecosystems and moving toward infrastructure, not the other way around. Starting from infrastructure naturally leads to a degraded and fragmented developer experience where you’re trying to make your infrastructure work for existing ecosystems rather than serving them.
This approach also requires accepting that some ecosystems might not be ready yet and contributing to help them get there, something many companies see as a waste of resources: first, because the contribution to revenue isn’t direct; second, because others could benefit from the investment. So many prefer to push proprietary solutions into ecosystems instead, something developers are consistently unhappy about.
Ecosystems are best understood as commons, shared resources that a community manages collectively (a concept from Elinor Ostrom’s work on commons governance). They’re something everyone is responsible for improving, not something companies should try to privatize. Yet privatizing is the approach I see many companies taking, and it’s the wrong one. The alternative approach is harder to secure investment for because getting there takes time, and most investors prefer quick returns over long-term ones. But there are investors who understand this game and the sustainable competitive advantages it creates.
I’m spending part of my time at Tuist trying to understand toolchains beyond Xcode: Gradle, Bazel, Cargo, Metro, Vite, Pants. I’m working to identify their capabilities and the common pieces that can be generalized and turned into a commodity, the shipping container of developer toolchains. The Kubernetes of development environments.
I see Tuist’s role becoming two-fold: making ecosystems evolve and bridging the gap between build systems and infrastructure. We’ll first drop “Apple” from the market we’re associated with and expand to mobile development broadly. And why stop there? We’ll call ourselves build and test infrastructure for modern development toolchains. Step by step, embracing ecosystems’ pace and developers’ expectations.
I’ll hopefully start sharing the first experiments from this work soon. In the meantime, we’re getting ready to extend our “what” to Gradle and to solve the “where” for all those organizations that have decided to trust us to scale their development.