What you'll take away
Zhamak Dehghani's 2019 data mesh paper identified a real problem and proposed a compelling solution. The centralised data team model — a single team of data engineers responsible for building and maintaining all data pipelines across the organisation — does not scale. As the organisation grows, the central team becomes a bottleneck: every domain that needs a new data product queues behind every other domain, and the central team spends most of its time servicing requests rather than building the data platform that would make those requests unnecessary.
The data mesh model addresses this by distributing data ownership to the domains that produce the data, establishing a federated governance model rather than a central one, and treating data as a product — something designed to be consumed, with SLAs, documentation, and a clear owner accountable for its quality.
The architecture is conceptually right. The implementation is where most organisations stumble.
Why Centralised Data Teams Stop Scaling
The centralised data team model has a structural ceiling. When there are three product domains and five data engineers, the central team can serve all of them effectively. When there are fifteen product domains and fifteen data engineers, the queue is permanent. Every domain waits weeks for a pipeline change that its own engineers could build in a day, if they had the platform access and the data infrastructure knowledge.
The pathologies are predictable: data engineers become experts in the organisational politics of prioritisation rather than in data engineering; domain teams build shadow data capabilities using spreadsheets, direct database queries, and CSV exports because working around the central team is faster than waiting for it; and the central team's backlog is permanently two to three times longer than its capacity to deliver.
The Four Data Mesh Principles
1. Domain Ownership
Data is owned and produced by the domain that creates it. The customer domain team owns customer data. The orders domain team owns order data. The product domain team owns product data. Each domain is responsible for building, maintaining, and guaranteeing the quality of the data assets it exposes to other domains — rather than handing raw data to a central team and delegating responsibility for quality and availability.
2. Data as a Product
Domain teams do not simply expose their data in whatever format is convenient for them. They treat data as a product: designed for consumption by other domains, documented for discoverability, tested for quality, versioned for stability, and served through a consistent interface. A data product has an owner, SLAs, a schema contract, and a consumer feedback mechanism — the same attributes that make a software API trustworthy.
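As a rough illustration of what "the same attributes that make a software API trustworthy" could look like in practice, here is a minimal Python sketch of a data product descriptor. The class name, field names, and the `missing_attributes` helper are all hypothetical; they are not part of any data mesh specification or tool, and a real platform would likely carry more metadata.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor; every field name here is an illustrative assumption."""
    name: str
    owner: str                           # accountable team or person
    schema: Dict[str, str]               # column name -> type: the schema contract
    version: str                         # version of the schema contract
    freshness_sla_hours: Optional[int]   # maximum data age promised to consumers
    docs_url: str                        # where consumers learn how to use the product
    feedback_channel: str                # where consumers report problems


def missing_attributes(product: DataProduct) -> List[str]:
    """Return the names of product attributes that are empty or unset."""
    missing = []
    for attr in ("owner", "schema", "version", "docs_url", "feedback_channel"):
        if not getattr(product, attr):
            missing.append(attr)
    if product.freshness_sla_hours is None:
        missing.append("freshness_sla_hours")
    return missing


# Usage: a product that has an owner and an SLA but no documentation yet.
orders = DataProduct(
    name="orders.daily",
    owner="orders-team",
    schema={"order_id": "string", "amount": "decimal"},
    version="1.0.0",
    freshness_sla_hours=24,
    docs_url="",
    feedback_channel="#orders-data",
)
print(missing_attributes(orders))  # ['docs_url']
```

Keeping the descriptor as plain data means the same object can feed a catalogue, a publication check, and consumer-facing documentation without each one redefining what a data product is.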
3. Self-Serve Data Infrastructure
Domain teams cannot own data products if they need a central data team to build and operate the underlying infrastructure. The data mesh model requires a self-serve data platform — a set of tools, templates, and infrastructure abstractions that allow domain teams to build, test, deploy, and monitor data products without deep data engineering expertise. This is the most technically demanding component of data mesh: building a platform sophisticated enough that domain teams with software engineering skills but limited data infrastructure knowledge can use it effectively.
4. Federated Computational Governance
Distributing data ownership does not mean leaving each domain to set its own governance rules. Federated governance establishes the standards, policies, and contracts that every data product must comply with, covering data quality SLAs, access control models, privacy and compliance requirements, and schema versioning standards. It enforces them computationally (through the platform) rather than organisationally (through a committee). The platform makes compliance the path of least resistance rather than bureaucratic overhead.
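One way to picture "computational" enforcement is policy as code: each standard becomes a small function the platform runs against a product's metadata. The sketch below is an assumption-laden illustration, not a reference implementation. The dictionary layout (`owner`, `columns`, `access_policy`, `version`, `previous`) and the three policies are hypothetical, and the versioning rule assumes `MAJOR.MINOR.PATCH` style version strings.

```python
from typing import Callable, List

# A policy inspects a product's metadata and returns a list of violation messages.
Policy = Callable[[dict], List[str]]


def require_owner(product: dict) -> List[str]:
    return [] if product.get("owner") else ["product has no accountable owner"]


def require_pii_access_policy(product: dict) -> List[str]:
    """Columns tagged as PII must come with an explicit access policy."""
    has_pii = any(col.get("pii") for col in product.get("columns", []))
    if has_pii and not product.get("access_policy"):
        return ["PII columns present but no access_policy declared"]
    return []


def require_major_bump_on_removed_columns(product: dict) -> List[str]:
    """Removing a previously published column requires a new major version."""
    previous = product.get("previous")
    if not previous:
        return []  # first publication: nothing to compare against
    old_cols = {c["name"] for c in previous["columns"]}
    new_cols = {c["name"] for c in product["columns"]}
    removed = sorted(old_cols - new_cols)
    old_major = int(previous["version"].split(".")[0])
    new_major = int(product["version"].split(".")[0])
    if removed and new_major <= old_major:
        return [f"columns removed without a major version bump: {removed}"]
    return []


# The federated governance group agrees the list; the platform runs it.
POLICIES: List[Policy] = [
    require_owner,
    require_pii_access_policy,
    require_major_bump_on_removed_columns,
]


def evaluate(product: dict) -> List[str]:
    """Run every registered policy and collect all violations."""
    violations: List[str] = []
    for policy in POLICIES:
        violations.extend(policy(product))
    return violations
```

The design choice worth noting is that the governance group owns the contents of `POLICIES` while domain teams never call the policies directly: the platform evaluates them on every publication, so compliance does not depend on anyone remembering a standards document.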
Thoughtworks' 2024 Technology Radar assessed data mesh as 'Adopt' for large organisations with multiple data domains — while noting that successful implementations require significant investment in self-serve data infrastructure and organisational change management, and that premature adoption in smaller organisations creates unnecessary complexity without solving the problem data mesh is designed for.
The Three Implementation Failure Modes
1. Distributed Without a Platform
The most common data mesh failure is distributing data ownership to domain teams without building the self-serve platform that enables them to execute. Domain teams that own data but lack the infrastructure abstractions to build data products efficiently will build them inconsistently — different tools, different testing approaches, different documentation formats, no cross-domain discoverability. The result is a distributed mess rather than a distributed mesh.
2. Governance Without Teeth
Federated governance that is documented in a standards document but not enforced by the platform is aspirational rather than operational. Domain teams that are responsible for data quality but have no automated quality gate preventing them from publishing a data product that fails quality standards will, under delivery pressure, skip the quality work. Governance must be computational — enforced by the platform at publication time — not organisational.
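A minimal sketch of what an automated quality gate at publication time might look like follows. It assumes rows arrive as plain dictionaries and that the platform knows the product's required columns, null-rate threshold, and freshness SLA. The function names, the thresholds, and the `QualityGateError` exception are hypothetical, and the final "publish" step is a placeholder for whatever a real platform does to register a product.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


class QualityGateError(Exception):
    """Raised when a data product fails its quality checks at publication time."""


def null_rate(rows: List[dict], column: str) -> float:
    """Fraction of rows where the column is missing or None."""
    if not rows:
        return 1.0  # an empty dataset is treated as entirely missing
    nulls = sum(1 for row in rows if row.get(column) is None)
    return nulls / len(rows)


def publish(
    rows: List[dict],
    required_columns: List[str],
    max_null_rate: float,
    last_updated: datetime,
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> str:
    """Publish only if every check passes; otherwise raise and publish nothing."""
    if now is None:
        now = datetime.now(timezone.utc)
    failures = []
    for column in required_columns:
        rate = null_rate(rows, column)
        if rate > max_null_rate:
            failures.append(
                f"{column}: null rate {rate:.0%} exceeds {max_null_rate:.0%}"
            )
    if now - last_updated > max_age:
        failures.append("data is older than the freshness SLA")
    if failures:
        raise QualityGateError("; ".join(failures))
    # Placeholder: a real platform would register the product in its catalogue here.
    return "published"
```

Because the gate raises rather than warns, a team under delivery pressure cannot skip it without changing the platform itself, which is the difference between operational and aspirational governance.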
3. Applying Data Mesh to the Wrong Organisation
Data mesh solves a specific scaling problem: the centralised data team bottleneck that emerges when an organisation has multiple distinct data-producing domains with different roadmaps and different consumers. It is not the right architecture for an organisation with a single product domain, a small data team, and straightforward reporting requirements. Applying data mesh to an organisation that does not have the scaling problem data mesh solves adds complexity without adding value — and this misapplication is responsible for a significant proportion of the failed implementations reported in practitioner communities.
A Realistic Adoption Path
1. Start with domain ownership for one high-value domain: identify the domain with the most critical data assets and the most friction with the central team. Establish domain ownership there first, build the data product patterns, and generate the evidence base for broader adoption.
2. Build the self-serve platform incrementally: start with templates and tooling that reduce the effort for domain teams to build, test, and document data products. The platform does not need to be complete before the first domain adopts ownership; it needs to be good enough to make domain ownership easier than waiting for the central team.
3. Establish data contracts before data products: data contracts (formal agreements between producers and consumers about schema, quality SLAs, and change notification processes) are the foundation of the data product model and can be established independently of the broader architectural transformation.
4. Measure the bottleneck before you solve it: if you cannot demonstrate that the central data team is actually a bottleneck, with data on queue time, request volume, and domain team satisfaction, you do not have the evidence base to justify the organisational and technical investment that data mesh requires.
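The last step above can start small. The sketch below computes request volume and queue time per domain from a request log. The log format (domain, date requested, date delivered) is a hypothetical assumption about what a ticketing export might contain, the function name is illustrative, and domain team satisfaction is not covered because it needs survey data rather than timestamps.

```python
from datetime import date
from statistics import median
from typing import Dict, List, Optional, Tuple

# Hypothetical log entry: (domain, date requested, date delivered or None if still open).
Request = Tuple[str, date, Optional[date]]


def queue_summary(requests: List[Request], as_of: date) -> Dict[str, dict]:
    """Per-domain request volume and wait in days; open requests are aged to `as_of`."""
    waits_by_domain: Dict[str, List[int]] = {}
    for domain, requested, delivered in requests:
        end = delivered if delivered is not None else as_of
        waits_by_domain.setdefault(domain, []).append((end - requested).days)
    return {
        domain: {
            "requests": len(waits),
            "median_wait_days": median(waits),
            "max_wait_days": max(waits),
        }
        for domain, waits in waits_by_domain.items()
    }


# Usage with made-up sample data (not real measurements).
log = [
    ("orders", date(2024, 1, 1), date(2024, 1, 15)),
    ("orders", date(2024, 1, 10), None),  # still waiting
    ("orders", date(2024, 1, 20), date(2024, 1, 25)),
    ("customer", date(2024, 1, 5), date(2024, 1, 8)),
]
summary = queue_summary(log, as_of=date(2024, 1, 31))
print(summary["orders"])
# {'requests': 3, 'median_wait_days': 14, 'max_wait_days': 21}
```

Counting open requests up to the reporting date matters: leaving them out would hide exactly the requests that have been stuck longest.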
GYSP's Data Engineering & Analytics practice has worked with organisations at both ends of the data mesh readiness spectrum — from those building the self-serve platform foundation to those piloting domain ownership for a single high-value domain. The most successful implementations start with the problem, not the architecture.
“Data mesh is not a technology decision. It is an organisational decision that requires technology to work. Organisations that treat it as a data architecture project and skip the organisational design work are building a technically sound foundation for a governance system that nobody will actually use.”
— Ankush, Chief Technology Officer — GYSP.tech
