
A Startup Complements Kubernetes, Docker and Wasm at the Edge

Expanso has created Bacalhau, an open source architecture that allows users to run compute jobs where the data is generated and stored.
May 4th, 2025 11:00am by
Featured image of 116 Pall Mall, site of Cloud Native Rejekts Europe 2025, by Alex Williams.

David Aronchick looked around the historic room at 116 Pall Mall in central London, the venue for Cloud Native Rejekts, a must-attend conference if you are going to KubeCon. What a space, BTW.

Aronchick is the CEO of Expanso, a startup that brings distributed computing to workloads. Instead of moving the data to the compute, the compute goes to the data itself, an approach that is increasingly relevant as enterprise customers look beyond the cloud for their computing needs.

He pointed out that all the devices in the room will someday have some rich computing architecture, and that GPUs will come to these devices. They, of course, won’t be honking massive, water-cooled GPUs. These tiny GPUs might cost $500, $100 or even much less.

Distributed-compute workloads are emerging in a real way. GPUs can process the highly complex data sets used in AI and machine learning environments.

Alice in Dataland

You start exploring the implications of these new data points, and it gets trippy. British author Lewis Carroll introduced us to the metaphor of looking down the rabbit hole. Down there in the land of distributed computing is an unknown world that may be more like Alice’s Wonderland than we care to realize.

But that’s the nature of computing everywhere: it’s like a wonderland. With trillions of devices, what will become of workloads? How will this trippy world get its compute?

Aronchick worked at Google, where he co-founded the Kubeflow project, which “was designed to help simplify the deployment of open source systems for [machine learning] applications on Kubernetes platforms at scale,” according to The New Stack’s 2018 post on the project.

For Aronchick, it’s about how we think of the different dimensions edge environments will require for developers to develop, deploy, and manage application architectures, whether standard software or AI models.

He and his team have developed an open source architecture they call Bacalhau. It’s a distributed compute-over-data model that complements Kubernetes, Docker and WebAssembly (Wasm). It allows users to run compute jobs where the data is generated and stored.

Aronchick said the ability to network data gets a bit dicey when working with more than a single zone in a cloud network with Kubernetes clusters. The cross-zone stuff, regional data movement and working across clouds — Kubernetes architectures need a better way to do this.

You can see it in the use cases that Aronchick proposes. Logs, for instance, can be processed at the edge. Machine learning (ML) inference can happen at the data’s source. With a scheduler, jobs run on local nodes where the data lives, without a centralized data lake.

It’s about using a remote server or an Internet of Things (IoT) device, whatever that may be, to process the data at the source before it is sent to a central location. The data may undergo some inference at the source. It could use Wasm to isolate and process the data at the source.
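To make the idea concrete, here is a minimal sketch of processing logs at the source before anything is sent to a central location. It is not Bacalhau code. The log format ("timestamp LEVEL message"), the function name summarize_logs and the sample lines are all assumptions made for illustration.

```python
import json
from collections import Counter


def summarize_logs(lines):
    """Reduce raw log lines to a compact summary on the edge node.

    Assumes each line looks like "<timestamp> <LEVEL> <message>";
    this format is hypothetical, not one taken from the article.
    """
    levels = Counter()
    errors = []
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) < 3:
            continue  # skip lines that do not match the assumed format
        _, level, message = parts
        levels[level] += 1
        if level == "ERROR":
            errors.append(message)
    return {"counts": dict(levels), "errors": errors}


# Sample data standing in for logs produced on a local device.
raw = [
    "2025-05-04T11:00:00Z INFO order received",
    "2025-05-04T11:00:01Z ERROR printer offline",
    "2025-05-04T11:00:02Z INFO order served",
]

summary = summarize_logs(raw)
payload = json.dumps(summary)  # only this summary would leave the device
print(payload)
```

Only the summary crosses the network in this sketch; the raw lines stay on the node that produced them.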

Bring On the Compute

Cloud development environments are the rule, not the exception. But with Kubernetes, Docker, and Wasm, the ability to use a server, let’s say, in a restaurant, becomes far more doable.

That still, however, leaves the challenge of getting compute to the data. According to its documentation, Bacalhau uses local storage, an S3 bucket or other storage providers, which reduces unnecessary data movement. A customer works with what Bacalhau calls jobs and executions. Jobs define the workflow, and executions run the jobs in parallel across the nodes.
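The relationship between jobs and executions can be sketched in a few lines. This is a toy model under stated assumptions, not Bacalhau's API: the Node and Job classes, the run_job function and the node names are hypothetical, and threads stand in for machines that would really be spread across a network.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """A hypothetical compute node holding its own local data."""
    name: str
    local_data: List[int]


@dataclass
class Job:
    """A hypothetical job: a named task to run against each node's data."""
    name: str
    task: Callable[[List[int]], dict]


def run_job(job, nodes):
    """Create one execution per node and run the executions in parallel.

    Each execution reads only the data already on its node, so only the
    small result, not the data, is returned to the caller.
    """
    def execute(node):
        return node.name, job.task(node.local_data)

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return dict(pool.map(execute, nodes))


nodes = [
    Node("london", [3, 5, 8]),
    Node("atlanta", [13, 21]),
    Node("tokyo", []),
]
job = Job("count-and-sum", lambda rows: {"rows": len(rows), "total": sum(rows)})

results = run_job(job, nodes)
print(results)
```

The job is defined once; the number of executions follows from the number of nodes that hold data.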

Gartner has estimated that around 10% of enterprise-generated data is created and processed outside a traditional centralized data center or cloud, and predicted that this figure would reach 75% by 2025.

Aronchick thinks of it this way: “I am an enterprise with 100 nodes spread around the world. I already own them, and they may already be in the cloud, but they’re not under a single controller like Databricks, Snowflake, Kubernetes, etc. You want to get a job to them — a data processing job, a new ML model, a new configuration, whatever — and can’t reliably do it today because you don’t have a single controller.”

That controller? That’s where Expanso could enter the scene. A customer can place one of Expanso’s agents on or near one of these nodes to give it the sense of a single control plane.

“We can run on a phone, sure!” Aronchick said. “But there are 100 million servers worldwide, and every time a business needs one, or a virtual machine, that isn’t in their central data center, they’re going to add to the challenge.”

For context, it’s a 49-millisecond ping time between Los Angeles and New York. The speed of light will never get faster. There’s nothing we can do about it.
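A back-of-the-envelope calculation shows where that floor sits. The distance (about 3,940 km as a great-circle estimate) and the fiber refractive index (about 1.47) are assumptions chosen for illustration, and real network routes are longer than a straight line.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBER_REFRACTIVE_INDEX = 1.47      # assumed typical value for optical fiber
LA_TO_NY_KM = 3_940                # assumed approximate great-circle distance


def round_trip_ms(distance_km, speed_km_s):
    """Round-trip time in milliseconds over a straight path at a given speed."""
    return 2 * distance_km / speed_km_s * 1000


vacuum_ms = round_trip_ms(LA_TO_NY_KM, SPEED_OF_LIGHT_KM_S)
fiber_ms = round_trip_ms(LA_TO_NY_KM, SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX)

print(f"vacuum round trip: {vacuum_ms:.1f} ms")
print(f"fiber round trip:  {fiber_ms:.1f} ms")
```

Under these assumptions, the round trip takes roughly 26 milliseconds in a vacuum and roughly 39 milliseconds in fiber, so most of a 49-millisecond ping is physics rather than overhead that better engineering could remove.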

Content delivery networks get this very well. Akamai now uses Fermyon’s Wasm technology to isolate data streams. Still, there’s the need to move the compute.

But moving the data to the compute comes at a massive cost, wrote Bogdan Kurnosov, now senior engineering manager at bunny.net, on The New Stack in 2024.

“Routing traffic in and out of data centers is a time and money drain,” he wrote. “Paired with the rising demand for increasingly personalized web experiences, we needed to explore a new approach to cloud computing. Fortunately, we’d already built a network of edge nodes with our content delivery network (CDN). Adding computation capabilities to our CDN nodes was the next logical step.”

Data will inevitably go to the edge, and that’s a shift from the super data center model. The compute will have to go to the edge, too.

Then, the question comes down to why a company like Expanso has a chance in this new world. For one, Bacalhau is an open source project. It can run on NVIDIA, Intel or AMD hardware. It’s hardware agnostic. Expanso technology can harness the computing capabilities of distributed GPUs.

It’s clear to me why open source offerings like Expanso’s have a path to becoming a core service as we accelerate toward a network of trillions of devices.

There you go, my take from Cloud Native Rejekts. See you in Atlanta.

TNS owner Insight Partners is an investor in: Docker, Fermyon, Databricks.