What engineering leaders get wrong about data stack consolidation
When IBM announced its $11 billion agreement to acquire Confluent (shortly after also absorbing DataStax), most of the commentary focused on what it means for Kafka’s roadmap, or whether IBM can execute on a cloud-native integration strategy. While reasonable questions, engineering leaders are probably asking them too narrowly.
What happens to engineering autonomy when the open source tools a team depends on stop being neutral?
We’ve seen this movie before
The pattern is familiar enough to probably deserve a name. An open-source technology emerges and gains widespread adoption because it’s genuinely neutral and highly useful. Nobody owns it, nor does anyone wholly control the roadmap. Your team’s expertise with the projects is transferable anywhere, and it becomes the industry standard. Then a large incumbent acquires the commercial entity behind it, and the technology is slowly (or not so slowly, often enough) pulled from community asset into platform feature.
If you’re of a certain age, you saw it happen with virtualization, and then more recently in the early cloud era. Now we’re watching it happen across streaming and database infrastructure simultaneously. The data tools your stack depends on are being absorbed into broader enterprise relationships, repositioned from best-of-breed choices into a bundled features.
To be clear, none of this is sinister. Consolidation makes business sense, and sometimes the engineering does genuinely get better with more resources behind it. But acquisition removes something that mattered even when a commercial vendor was building proprietary features on top of open source: the competitive pressure to keep customers able to leave. Before an acquisition, lock-in is a risk that a vendor manages very carefully. After one, it becomes a feature of the parent company’s platform strategy.
The mislabeled risk
When consolidation comes up in exec conversations, it usually lands as a procurement concern. Pricing leverage and contract terms matter, but they’re downstream of a more serious problem. What consolidation actually creates, over time and quietly, is what I’d call architectural debt. Devs talk about technical debt as shortcuts in code that need revisiting, but this is different. Architectural debt accumulates at the infrastructure layer, and it’s harder to see coming.
“What consolidation actually creates, over time and quietly, is what I’d call architectural debt.”
When a vendor integrates a tool like Apache Kafka into its proprietary cloud, it adds convenience layers. These can include custom APIs, purpose-built connectors, and security hooks that tie into their broader platform. Each one is reasonable in isolation, but taken together over 3-5 years, you end up with something that is no longer portable or standards-based infrastructure. You now have a custom implementation of that specific vendor’s platform, even if it was never designed to be.
This kind of debt won’t show up in code review or on your technical roadmap. It’ll reveal itself when you try to migrate or renegotiate, at which point the cost to unwind those dependencies can be staggering. Vendors, of course, understand this. A high switching cost is their moat, and it protects revenue regardless of whether they continue to innovate on the product you bought.
The dynamic exists in our market too, lest it seem like I’m picking on any one company. Any provider of managed services for data infrastructure that layers enough proprietary tooling on top of open-source technologies creates some version of this problem. The question for engineering leaders is whether the dependence is being built consciously or by default.
A cost your org chart can’t see
As architectural debt deepens, it pulls something else down with it.
In a neutral, open-source environment, your engineers learn the core technology. For example, they understand how Kafka works at the partition level, and how Cassandra handles consistency under load. It’s transferable knowledge that belongs to the engineer, not the platform.
Within a heavily managed, proprietary environment, though, teams stop learning streaming data and start learning Vendor X’s streaming service. With time, expertise in the underlying technology atrophies. The less capable a team is of operating the raw technology, the more dependent they are on the vendor’s managed layer, thereby further reducing exposure. The loop closes on itself. Your organization’s institutional knowledge is quietly migrating to your software provider’s support team. That risk rarely appears in any risk register, and it compounds over time.
What intentional neutrality actually looks like
This isn’t an argument against consolidation, or a case for every team running its own open-source infrastructure. Running Kafka at scale is genuinely hard, and managed services exist for good reasons.
What more leaders need to understand is that architectural neutrality used to come automatically with open source tools. That’s no longer true, and engineering leaders who treat it as a given are making a decision by not making one.
The distinction has real consequences. When your team is evaluating a managed data service, the question of whether the technology is good today is only part of the conversation. The more useful question is whether the integration points being built will still be portable in three years. One practical test is whether, if you had to leave this vendor in 18 months, your team could execute that migration without a multi-year effort. Significant hesitation is itself an answer.
Insisting on open standards over open APIs is part of this, as is the discipline to have your internal platforms define the interface to vendor tooling. If that relationship runs in the opposite direction, you’re building your architecture on someone else’s decisions.
A default, no more
The data stack will keep consolidating. That’s the nature of a maturing market, and engineering leaders will need to make decisions within that reality, not around it.
“Portability used to be an implicit feature of open source infrastructure. It isn’t anymore.”
Just understand there’s a difference between accepting consolidation as a fact and treating it as a design constraint you’ve actually thought through. Portability used to be an implicit feature of open source infrastructure. It isn’t anymore. The engineers and CTOs who act on that now will have options when the next round of acquisitions lands. The ones who don’t will be negotiating from a much smaller room.