The Case for Dynamic CI Pipelines
Something I've been thinking about lately is how static CI pipelines are, and how challenging that makes working in large codebases and monorepos. In those environments you can easily be constrained by the resources of a single machine, and you might want to spawn multiple CI environments and aggregate the results upon completion. What am I talking about, you might wonder?
Picture this: at some point in the process of scaling your organization, you might find yourself with a large codebase, for example a monorepo, with interdependencies between components and the need to be selective about what you run and where you run it. During my time at Shopify, they spun up a team dedicated to test infrastructure whose role was precisely to become more selective about these things. Eventually, you'll be constrained by running everything on a single machine, and you'll want to distribute the work across several pipelines as a kind of sharding. But you'll be very limited by the pipelines themselves, which are quite static. You can't simply say, "From this CI job, spawn X jobs, which I've calculated based on data from my build system, and then aggregate the results back." The only company I know that has gotten close to this is Buildkite; most others continue gravitating around YAML as the pipeline building block.
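To make that less abstract, here's a minimal sketch, in Swift, of what such a setup step could look like: a script decides how many follow-up jobs to spawn from build-system data and emits them as pipeline YAML, which in Buildkite's case a running job can pipe into `buildkite-agent pipeline upload`. The shard count, script paths, and queue name are placeholders, not a real pipeline.

```swift
// Hypothetical "setup" step: compute follow-up jobs at runtime and emit them
// as pipeline YAML. In Buildkite, this output would be piped into
// `buildkite-agent pipeline upload` from the running job.

let shardCount = 7  // decided at runtime, e.g. from the build graph and machine limits

var yaml = "steps:\n"
for shard in 0..<shardCount {
    yaml += """
      - label: "iOS tests (shard \(shard + 1)/\(shardCount))"
        command: "./scripts/run-tests.sh --shard \(shard) --total \(shardCount)"
        agents:
          queue: "macos"

    """
}
yaml += """
  - wait
  - label: "Aggregate results"
    command: "./scripts/aggregate-results.sh"
"""
print(yaml)
```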
Take iOS tests that require a simulator. There's a limit to how many simulators you can run per machine, and it's an OS-level limit: every simulator is just another process consuming system resources, and there's a ceiling on how many processes and file handles the system can sustain, a classic example of resource contention. On modern Apple Silicon, that number should be around 12, which means that in the context of running e2e tests (which are slow by nature), you can run at most 12 simultaneously. That's not bad, but what if you have 40 or 80 tests? This adds up very quickly. To fix that, you'll consider sharding, likely codified in your YAML with some runtime logic. However, the number of shards is fixed and not something you can control dynamically, so you'll take the list of all tests and distribute them across shards following some logic. Wouldn't it be amazing if the number of machines were something you could control dynamically? It turns out that environment providers like Namespace can scale elastically, so if you have the budget for it, you can throw money at the problem and find your sweet spot dynamically.
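As a rough illustration (not any particular test runner's API), this is the kind of arithmetic a dynamic setup step could do: derive the number of machines from the per-host simulator ceiling and deal the tests out across them. The test names and the 80-test figure are made up.

```swift
// Minimal sketch: derive the number of shards from the per-machine simulator
// ceiling mentioned above, then deal tests out round-robin.

struct Shard {
    var tests: [String] = []
}

func shard(tests: [String], simulatorsPerMachine: Int) -> [Shard] {
    // One machine can run `simulatorsPerMachine` simulators in parallel,
    // so that's the most tests a single shard should hold at a time.
    let shardCount = max(1, Int((Double(tests.count) / Double(simulatorsPerMachine)).rounded(.up)))
    var shards = Array(repeating: Shard(), count: shardCount)
    for (index, test) in tests.enumerated() {
        shards[index % shardCount].tests.append(test)
    }
    return shards
}

// 80 e2e tests and a ceiling of 12 simulators per host -> 7 machines,
// each running at most 12 tests in parallel.
let tests = (1...80).map { "CheckoutFlowTests/test_\($0)" }
let plan = shard(tests: tests, simulatorsPerMachine: 12)
print("Spawning \(plan.count) machines")
```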
Another interesting use case for dynamic pipelines is determining what needs to run based on changes and then distributing that work across machines with different specs. Let's say the build system determines, based on changes in the monorepo, that there are a few Ruby tests to run and some Rust ones; say the Ruby code has a native extension written in Rust for some CPU-bound business logic. The build system could spin up two machines with different specs, because the needs of those tasks are very different and the machines should be too. Note that this requires the build system and the CI provider to work in harmony, which is quite tricky: a coordination problem that exemplifies Conway's Law, where the structure of your toolchain reflects your organizational boundaries.
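Here's a hypothetical sketch of the glue this would require: the build system reports what changed, and something maps each kind of work to a machine spec. The spec names and numbers are invented, not identifiers from Namespace or any CI provider.

```swift
// Hypothetical glue between a build system's "what changed" answer and the
// CI provider's machine catalogue. The spec names are placeholders.

enum Task {
    case rubyTests(targets: [String])
    case rustTests(targets: [String])
}

struct MachineSpec {
    let name: String
    let cpus: Int
    let memoryGB: Int
}

func spec(for task: Task) -> MachineSpec {
    switch task {
    case .rubyTests:
        // Mostly I/O- and framework-bound; a modest machine is enough.
        return MachineSpec(name: "linux-small", cpus: 4, memoryGB: 8)
    case .rustTests:
        // CPU-bound native code benefits from more cores to compile and run.
        return MachineSpec(name: "linux-large", cpus: 16, memoryGB: 32)
    }
}

// The build system reports affected targets; each task gets a machine shaped to it.
let tasks: [Task] = [
    .rubyTests(targets: ["billing_spec", "orders_spec"]),
    .rustTests(targets: ["pricing_engine"]),
]
for task in tasks {
    print(spec(for: task))
}
```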
Bazel has all the build information necessary to do this, and there are companies that provide infrastructure for it. However, adopting Bazel means replacing your build system, which is very costly for organizations. On the other side, there are companies like Namespace that have the infrastructure and could plug into build systems and monorepo build tools. But none of those tools have been designed to contract with remote executors the way Docker does, where you can decide to build remotely and it still feels local. Imagine if the build tool you use had this capability built in: you wouldn't have to keep a YAML file up to date or push changes; you could interface with it directly from your local environment, simulating how the pipeline would be inferred and executed. Dagger is one of the closest to this model, but it builds on virtualization, which comes with its own set of tradeoffs and doesn't work well with workloads like iOS builds, where virtualization isn't cheap.
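To illustrate the shape of such a contract, here's an invented sketch of what a build tool could expose so that remote execution feels local. None of these types correspond to an existing API in Docker, Dagger, or Namespace; they only show the idea.

```swift
// Invented sketch of the contract a build tool could expose so that "run this
// remotely" feels like running it locally.

struct ExecutionRequest {
    let command: [String]          // e.g. ["swift", "test"]
    let inputs: [String]           // paths the executor must have available
    let environment: EnvironmentRequirements
}

struct EnvironmentRequirements {
    let cpus: Int
    let memoryGB: Int
    let os: String                 // "macos" for simulator-bound work, "linux" otherwise
}

struct ExecutionResult {
    let exitCode: Int
    let logs: String
    let outputs: [String]          // artifact paths to pull back
}

protocol Executor {
    func run(_ request: ExecutionRequest) async throws -> ExecutionResult
}

// A local executor would run the command in-process for fast feedback and for
// simulating how the pipeline would be inferred; a remote one would ship the
// same request to an elastic environment provider. The call site doesn't care.
func runPipeline(requests: [ExecutionRequest], on executor: some Executor) async throws {
    for request in requests {
        let result = try await executor.run(request)
        print("\(request.command.joined(separator: " ")) exited with \(result.exitCode)")
    }
}
```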
So I don't really know what the answer is, but it would be amazing if a protocol emerged between the companies that provide environments, like Namespace, and the build systems that might want to run things remotely. Perhaps there's an opportunity here to define a standard that would spare organizations from being forced into a one-stop-shop build system that's costly to adopt.
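If I had to guess at a minimal version of that standard, it might be as small as a way to request an environment and get a handle back. The payloads below are purely hypothetical, with made-up field names; no such standard exists today.

```swift
import Foundation

// Purely hypothetical payloads for the kind of standard described above:
// a build system asks any compliant environment provider for a machine and
// gets back a lease it can use to run work and collect results.

struct EnvironmentRequest: Codable {
    let os: String                 // "macos" | "linux"
    let cpus: Int
    let memoryGB: Int
    let ttlMinutes: Int            // how long the environment may live
}

struct EnvironmentLease: Codable {
    let id: String
    let sshEndpoint: String        // how the build system reaches the machine
    let expiresAt: Date
}

// Serializing the request is all a build tool would need to talk to any
// provider implementing the contract, regardless of vendor.
let request = EnvironmentRequest(os: "macos", cpus: 8, memoryGB: 16, ttlMinutes: 30)
let body = try! JSONEncoder().encode(request)
print(String(decoding: body, as: UTF8.self))
```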