Data mesh history & background
Data mesh was conceived in 2019 by Zhamak Dehghani, a software engineer and industry consultant, who sought to enable organizations to manage ever-growing data stores and extract greater value from them. In data mesh design, each local domain's product team, composed of data producers, consumers, and IT, is responsible for transforming its data into an interoperable data model, known as a data product. Casting the users of a data product as its customers gives the team an incentive to make its data as discoverable, usable, and reliable as possible. These product teams control not only their own data and data models but also the operational systems required to produce and share them. The data products, created under a federated governance system, can be used quickly and easily by analytics generalists, specialists, and other business intelligence units across an organization. Analytics are applied locally by the domain team most familiar with the data. Developing data products for their customers requires the domain team to ensure that its models have distinct, prescribed capabilities and that its users are satisfied with the product. This symbiotic relationship dramatically reduces lead times for using and experimenting with the data.
Data mesh: Four key principles
The structural elements of data mesh are founded on four fundamental principles.
Domain ownership of data
In a data mesh, data is decentralized and maintained by the individual domain product teams. The domain teams are responsible for producing analytical data models from their data and for the quality, usability, and curation of those models. They are also accountable for important KPIs covering how much the models they create are used and how satisfied BI units are when using them. Unlike in centralized IT architectures, product teams are embedded within and dedicated to the individual domains. Producing a data product, however, still requires understanding how the data might be accessed and used in other areas of the organization.
Data product development
When designing data products, product teams must consider how their customers will use them. What will the data be used for? What tools will customers want to use to access or consume the data product? This approach requires product teams to combine product thinking with data engineering so that their data meets customer expectations. A data product is an aggregation of the data itself and the code that transforms, serves, and shares it. As a result, the product teams deliver higher-level abstractions that benefit their users across the organization. These multimodal data products integrate easily with any other data product, which leads to the data mesh effect. This architecture reduces the lead time for a data user to find, understand, and experiment with the required data from one or many interconnected data products, leading to richer analytical insights. By focusing on the experience of its users, typically but not only data analysts and data scientists, data mesh marks a fundamental shift in data architecture theory.
Self-service data
In a data mesh architecture, focusing on producing relevant data products in prescribed formats engenders a self-service approach for data users. They can access and utilize relevant data products quickly and efficiently whenever and from wherever necessary, using their native tools and processes. Furthermore, they can confidently expect that the underlying data is trustworthy, interoperable, and secure.
Collaborative data governance
With each domain's product team owning its data, creating its data product models, and being responsible for sharing them, strong governance standards are required. A federated governance team, with representation from each domain and other invested parties, works collaboratively to design blueprints that all data product models and their users must follow. Establishing these standards ensures data interoperability, security, and compliance across the platform. In a data mesh approach, data is decentralized and owned by the product teams, the people who best understand it. Because this decentralization is combined with strong, federated governance, entirely new domains, with their own product teams and data models, can be added at any time. This scalability allows organizations to expand their operations quickly and efficiently as business needs grow or change.
The role of a data catalog in data mesh
Integrated into the governance plane of data mesh is the data catalog. The data catalog provides access to all the data products produced by independent domain product teams. Supported and enabled by the organization's collaboratively designed governance, the data catalog provides a consumer-friendly platform to access data. Solutions like DataGalaxy make this possible. Our Data Knowledge Catalog offers your data knowledge assets on a powerful and user-friendly platform. Companies can become data-centric, data-driven, and readily scalable through curation, classification, governance, and knowledge crowd-sourcing.
The core benefits of data mesh
A data mesh architecture offers several key benefits.
- Better data for more agile decision-making: With independent domain teams creating high-quality data models, BI units can access data faster and make nimble, better-informed decisions.
- Improved quality control: Data governance guidelines are designed collaboratively by all domain product teams and business units. These policies enable teams to produce and deliver high-quality data in an easy-to-access, standardized fashion.
- Cross-functional collaboration: Data mesh puts local domain experts and product owners in closer contact and cooperation with the teams they serve. With ever more eyes on the data, all domains and users are incentivized to explore every possible data-use case. With clear and concise governance established and employed across an organization, scalability becomes faster and simpler, leading to greater competitiveness and nimbler operations.
- Reduced data jams: In a decentralized data architecture, demand for IT services is distributed across individual domains. Data teams become agile, independent, and incentivized to produce data efficiently and provide ready access.
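To make the relationship between domain-owned data products, federated governance, and the data catalog more concrete, here is a minimal Python sketch. It is an illustration only, not how DataGalaxy or any particular catalog works: the `DataProduct` descriptor, the `REQUIRED_FIELDS` blueprint, and the in-memory `DataCatalog` class are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes for its data product."""
    name: str
    domain: str
    owner: str
    description: str
    schema: dict  # column name -> type name
    tags: tuple = ()


# Hypothetical blueprint agreed on by the federated governance team.
REQUIRED_FIELDS = ("name", "domain", "owner", "description")


def governance_violations(product):
    """Return the blueprint rules the product breaks (empty list = compliant)."""
    violations = [
        f"missing {field_name}"
        for field_name in REQUIRED_FIELDS
        if not getattr(product, field_name).strip()
    ]
    if not product.schema:
        violations.append("schema must declare at least one column")
    return violations


class DataCatalog:
    """In-memory stand-in for a data catalog (assumption: no persistence)."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        # Only products that satisfy the shared blueprint become discoverable.
        violations = governance_violations(product)
        if violations:
            label = product.name or "<unnamed>"
            raise ValueError(f"{label}: " + "; ".join(violations))
        self._products[(product.domain, product.name)] = product

    def search(self, keyword):
        # Self-service discovery across every domain's registered products.
        keyword = keyword.lower()
        return [
            p for p in self._products.values()
            if keyword in p.name.lower()
            or keyword in p.description.lower()
            or any(keyword in tag.lower() for tag in p.tags)
        ]


catalog = DataCatalog()
catalog.register(DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data@example.com",
    description="Daily order totals per region",
    schema={"region": "string", "total": "decimal"},
    tags=("orders", "revenue"),
))
print([p.name for p in catalog.search("revenue")])  # prints ['orders_daily']
```

In this sketch each domain team remains responsible for describing its own product, while the shared blueprint is the only thing enforced centrally. A real governance blueprint would likely cover far more, such as access policies, quality metrics, and compliance rules.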