What is a Data Mesh? History and Overview
Data Mesh is a sociotechnical approach to data architecture in which independent domain teams are responsible for managing their own data. Because local teams transform raw data into highly relevant analytical models, Data Mesh eliminates large, centralized data repositories and the complex pipelines that connect them to BI (business intelligence) users.
Data Mesh: History and background
Data Mesh was first conceived in 2019 by Zhamak Dehghani, a software engineer and industry consultant, who sought to help organizations manage ever-growing data stores and extract greater value from them.
In a Data Mesh design, each local domain's Product Team, made up of data producers, consumers, and IT, is responsible for transforming its data into interoperable data models known as Data Products. Treating the users of a Data Product as its customers gives the team an incentive to make its data as discoverable, usable, and reliable as possible. These Product Teams control not only their own data and data models but also the operational systems required to produce and share them.
These Data Products, created under a federated governance system, can be used quickly and easily by analytics generalists, specialists, and other BI units across an organization.
Data analytics are applied locally by each domain's Product Team, the team most familiar with the data. To develop Data Products for its customers, the domain team must ensure that its models have distinct, prescribed capabilities and that users are delighted with the product. This symbiotic relationship dramatically reduces lead times for using and experimenting with the data.
Data Mesh: 4 Key Principles
The structural elements of Data Mesh are founded on four fundamental principles: domain ownership of data, data as a product, a self-serve data platform, and federated governance.
Domain Ownership of Data
In a Data Mesh, data is decentralized and maintained by the individual domain Product Teams. The domain teams are responsible for producing analytical data models from their data and for the quality, usability, and curation of those models. They are also accountable for KPIs such as how heavily the models they create are used and how satisfied BI units are when using them.
Unlike teams in a centralized IT architecture, Product Teams are embedded within, and dedicated to, individual domains. Producing a Data Product, however, requires understanding how the data might be accessed and used in other areas of the organization.
Data Product Development
When designing Data Products, Product Teams must consider how their customers will use them. How will the data be used? What tools will customers want to use in accessing or consuming the Data Product?
This approach requires Product Teams to bring product and data engineering skills into the team so that their data meets customer expectations. A Data Product combines the data itself with the code that transforms, serves, and shares it. As a result, Product Teams deliver higher-level abstractions that benefit users across the organization.
Because these multimodal Data Products are designed to integrate easily with other Data Products, they produce the Data Mesh effect: a data user needs less time to find, understand, and experiment with the required data from one or many interconnected Data Products, leading to richer analytical insights.
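To make "the data plus the code that transforms, serves, and shares it" more concrete, here is a minimal Python sketch. It is illustrative only: the `DataProduct` class, the `daily_order_totals` transformation, the "sales" domain, and the schema check are all hypothetical names and choices, not part of any Data Mesh standard or product.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical record type: one row is a dict of column name -> value.
Row = dict


@dataclass
class DataProduct:
    """Sketch of a Data Product: the data bundled with the code that
    transforms it, validates it against a published schema, and serves it."""
    name: str
    domain: str
    owner: str
    schema: dict          # column name -> expected Python type
    transform: Callable[[list], list]
    _data: list = field(default_factory=list)

    def refresh(self, raw: list) -> None:
        # Apply the domain team's transformation, then check every row
        # against the published schema before it is served to consumers.
        rows = self.transform(raw)
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"{self.name}: columns {sorted(row)} do not match schema")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise TypeError(f"{self.name}: column {col!r} is not {typ.__name__}")
        self._data = rows

    def serve(self) -> list:
        # Hand consumers copies so they cannot mutate the product's state.
        return [dict(r) for r in self._data]


def daily_order_totals(raw: list) -> list:
    # Hypothetical transformation owned by a "sales" domain:
    # aggregate raw order lines into one revenue total per day.
    totals: dict = {}
    for line in raw:
        totals[line["date"]] = totals.get(line["date"], 0.0) + line["qty"] * line["price"]
    return [{"date": d, "total": t} for d, t in sorted(totals.items())]


orders = DataProduct(
    name="daily_order_totals",
    domain="sales",
    owner="sales-data-team",
    schema={"date": str, "total": float},
    transform=daily_order_totals,
)
orders.refresh([
    {"date": "2024-01-01", "qty": 2, "price": 5.0},
    {"date": "2024-01-01", "qty": 1, "price": 3.0},
    {"date": "2024-01-02", "qty": 4, "price": 2.5},
])
print(orders.serve())
# [{'date': '2024-01-01', 'total': 13.0}, {'date': '2024-01-02', 'total': 10.0}]
```

The design choice worth noting is that validation lives inside the product: consumers in other domains receive data that has already passed the owning team's checks, rather than each consumer re-verifying the same raw data.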
By focusing on the experience of data users, typically but not only data analysts and data scientists, Data Mesh marks a fundamental shift in data architecture theory.
In a Data Mesh architecture, the focus on producing relevant Data Products in prescribed formats enables a self-service approach for data users. They can access and use relevant Data Products quickly and efficiently, whenever and wherever they need them, with their own tools and processes. They can also expect the underlying data to be trustworthy, interoperable, and secure.
Collaborative Data Governance
Because each domain's Product Team owns its data, builds its own Data Product models, and is responsible for sharing them, strong governance standards are required.
A federated governance team with representation from each domain and other invested parties works collaboratively to design blueprints that all Data Product models and their users must follow. Establishing these standards ensures data interoperability, security, and compliance across the platform.
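One common way to make such shared blueprints enforceable is to express them as code that every domain runs before publishing a Data Product. The sketch below assumes a hypothetical `ProductDescriptor` metadata format and a handful of illustrative rules; an actual governance team would agree on its own fields and policies.

```python
from dataclasses import dataclass

# Illustrative values only; the federated governance team would define these.
ALLOWED_CLASSIFICATIONS = {"public", "internal", "restricted"}
MIN_DESCRIPTION_LENGTH = 20


@dataclass
class ProductDescriptor:
    # Hypothetical metadata that a domain team publishes about its Data Product.
    name: str
    domain: str
    owner: str
    description: str
    columns: dict        # column name -> declared type
    pii_columns: set     # columns flagged as personal data
    classification: str  # one of ALLOWED_CLASSIFICATIONS


def check_global_policies(p: ProductDescriptor) -> list:
    """Return the list of policy violations; an empty list means compliant."""
    violations = []
    if not p.owner:
        violations.append("every product must name an accountable owner")
    if len(p.description.strip()) < MIN_DESCRIPTION_LENGTH:
        violations.append("description is too short to support discoverability")
    if p.classification not in ALLOWED_CLASSIFICATIONS:
        violations.append(f"unknown classification {p.classification!r}")
    unknown_pii = p.pii_columns - set(p.columns)
    if unknown_pii:
        violations.append(f"PII flags reference unknown columns: {sorted(unknown_pii)}")
    if p.pii_columns and p.classification == "public":
        violations.append("products containing PII cannot be classified as public")
    return violations


customers = ProductDescriptor(
    name="customer_profiles",
    domain="crm",
    owner="crm-data-team",
    description="One row per active customer with contact preferences.",
    columns={"customer_id": "string", "email": "string", "opt_in": "bool"},
    pii_columns={"email"},
    classification="public",
)
print(check_global_policies(customers))
# ['products containing PII cannot be classified as public']
```

Because the rules are written once and shared, each domain keeps ownership of its product while the whole mesh applies the same interoperability, security, and compliance checks.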
In a Data Mesh, data is decentralized and owned by the Product Teams that best understand it. Combined with strong, federated governance, this decentralization means entirely new domains, with their own Product Teams and Data Products, can be added at any time. This scalability allows organizations to expand their operations quickly and efficiently as business needs grow or change.
The Role of a Data Catalog in the Data Mesh
The Data Catalog is integrated into the governance plane of a Data Mesh and provides access to all the Data Products produced by independent domain Product Teams. Supported by the organization's collaboratively designed governance, it gives consumers a user-friendly platform for finding and accessing data.
Solutions like DataGalaxy make this possible. Our Data Catalog brings your data knowledge assets together on a powerful, user-friendly platform. Through curation, classification, governance, and knowledge crowd-sourcing, companies can become data-centric, data-driven, and readily scalable.
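At its core, a catalog lets domain teams register their Data Products and lets consumers discover them. The toy in-memory sketch below shows only that register-and-search loop; the `DataCatalog` and `CatalogEntry` names and the substring search are hypothetical simplifications and do not reflect DataGalaxy's or any other product's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogEntry:
    # Hypothetical minimal metadata for one registered Data Product.
    name: str
    domain: str
    owner: str
    description: str
    tags: set = field(default_factory=set)


class DataCatalog:
    """Toy in-memory catalog; real catalogs also track lineage, access rights, etc."""

    def __init__(self) -> None:
        self._entries: dict = {}

    def register(self, entry: CatalogEntry) -> None:
        # In this sketch, product names must be unique across the whole mesh.
        if entry.name in self._entries:
            raise ValueError(f"{entry.name!r} is already registered")
        self._entries[entry.name] = entry

    def search(self, term: str, domain: Optional[str] = None) -> list:
        # Case-insensitive substring match on name, description, and tags,
        # optionally restricted to a single domain.
        term = term.lower()
        hits = []
        for e in self._entries.values():
            if domain is not None and e.domain != domain:
                continue
            text = " ".join([e.name, e.description, *e.tags]).lower()
            if term in text:
                hits.append(e.name)
        return sorted(hits)


catalog = DataCatalog()
catalog.register(CatalogEntry("daily_order_totals", "sales", "sales-data-team",
                              "Revenue per day from all order lines.", {"revenue", "orders"}))
catalog.register(CatalogEntry("customer_profiles", "crm", "crm-data-team",
                              "One row per active customer.", {"customers"}))
catalog.register(CatalogEntry("returns_by_day", "logistics", "logistics-team",
                              "Returned orders per day.", {"orders", "returns"}))

print(catalog.search("orders"))                  # ['daily_order_totals', 'returns_by_day']
print(catalog.search("orders", domain="sales"))  # ['daily_order_totals']
```

Each domain registers its own entries, so the catalog grows as new domains join without a central team having to curate every product.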
The Core Benefits of Data Mesh
There are several key benefits of a Data Mesh data architecture design.
Better data for more agile decision-making. Because independent domain teams create high-quality Data Models, BI units can access data faster and make nimbler, better-informed decisions.
Improved quality control. Data governance guidelines are designed collaboratively by all domain Product Teams and business units. These policies enable teams to produce and deliver high-quality data in a standardized, easy-to-access form.
Cross-functional collaboration. Data Mesh puts local domain experts and product owners in closer contact and cooperation with the teams they serve. With more eyes on the data, domains and users are encouraged to explore a wider range of data use cases. With clear, concise governance applied across the organization, scaling becomes faster and simpler, leading to greater competitiveness and nimbler operations.
Reduced data logjams. In a decentralized data architecture, demand for IT services is distributed across individual domains. Data teams become agile and independent, with an incentive to produce data efficiently and provide ready access to it.
Why Self-Service Data Is Important
Self-service data and analytics offer compelling advantages for an organization.
BI users can access the data and analytics they need whenever and wherever they need them. This immediate access avoids the delays associated with centralized repositories, software integration, and access approvals. Self-service data users also gain more reliable insights because the data they work with is clean and accurate.
Because domains know their data best, having them build Data Product models delivers efficiency and quality at scale. Instead of multiple BI units each formatting and verifying the same raw data, this work is done once for everyone's benefit.
Data Mesh vs. Data Fabric
Data Mesh is primarily an organizational approach to data architecture. It is a domain-centric, decentralized approach that allows business units to quickly access, understand, and use their organization's data. In competitive environments, this speed can prove critical.
Data Fabric is a technical design that places an integrated data layer over centralized data. Its network-based architecture allows an organization to create a layer of abstraction over its underlying data components.
Why use a Data Mesh?
Organizations using a Data Mesh architecture enjoy fast and reliable data-driven decision-making. Local domain teams produce data models in a prescribed fashion and build them to serve and satisfy BI users. Removing the barriers to data access and use that centralized architectures commonly present leads to greater efficiency and innovation.
Under a collaborative data governance system, quality, interoperability, and security are all enhanced. The organization can also respond more smoothly to changing compliance requirements.
A Data Mesh architecture avoids overloading centralized data teams by keeping data within its functional domains. Organizations can perform cross-domain analysis, extract more value from their data, scale operations quickly, and make the most of their information.
Data Mesh represents the next generation of data architecture strategy. A Data Mesh organizational approach may not make sense for every organization, but its benefits are clear. A more collaborative approach to data and analysis that eliminates complex pipelines enables agility and innovation that Data Warehouses and Data Lakes cannot.
Replacing a monolithic, highly centralized approach to data frees an organization from costly duplication of effort in processing and analyzing data.
Creating a symbiotic relationship between domain teams producing quality Data Products and highly satisfied BI units consuming the data leads to increased productivity, improved data flow, and swifter data consumption.
And with a robust and collaborative data governance system in place, Data Mesh ensures that the entire organization monitors and maintains data compliance, security, and interoperability.