Data Mesh: Understanding Decentralized Domain Ownership
Data and analytics professionals understand that semantics matter. We have semantic layers, semantic models, and semantic analytics. We know that communicating meaning effectively requires a shared understanding of words and phrases. If the words we use can have multiple meanings depending on the context, then misunderstandings can occur.
Data Mesh Definition of Domain
One of the four core principles of data mesh is decentralized domain ownership. The word “domain” in the context of data management typically refers to customers, products, suppliers, employees, and other domains of business entities. In the context of data governance, “domain” can refer to both business entities and policy domains like ESG (environmental, social, and governance), privacy, and financial reporting regulations.
The data mesh meaning of domain comes from the software practice of domain-driven design. In this context, domain refers to business capabilities and the activities and entities they contain. Each domain has a specific business function and a set of outcomes that it is optimizing for. Some examples include:
Marketing and Sales:
- Customer segmentation and targeting
- Digital marketing campaigns and promotions
- Sales forecasting and analysis
- Product catalog management
- Product pricing and discounting
- Product recommendations and personalization
- Order processing and tracking
- Inventory management and replenishment
- Order routing and fulfillment
- Customer inquiries and support
- Returns and refunds management
- Complaints and feedback management
Payments and Billing:
- Payment processing and fraud detection
- Billing and invoicing management
- Subscription management
Logistics and Shipping:
- Carrier and shipping method selection
- Shipping cost calculation and optimization
- Customs and regulatory compliance
This difference in the meaning of domain in the context of data mesh has caused some confusion among data and analytics professionals. Hopefully, the explanation above clears up the misunderstanding. On the positive side, in data mesh terminology, business entities (customers, products, suppliers, etc.) have the same meaning as they do in data management and governance terminology. So, a simple reorientation of phrasing from domain to business entities can improve communication and understanding. Now that we have a high-level definition of domain in data mesh, let’s have a closer look at domain boundaries and entity relationships.
Domain Boundaries and Cross-Domain Entity Relationships
Data mesh uses the term “bounded context,” which also comes from the software practice of domain-driven design. It is simply the boundary within which a particular domain model applies. If we use the Order Fulfillment domain as an example, the core domain function is focused on optimizing In-Full and On-Time order rates. Two domain subprocesses, Pick and Pack and Delivery, are required to execute order fulfillment. By mapping business entity attributes to the subprocesses and to the other domains that share those attributes, we might get a relationship diagram that looks something like the following.
[Figure: Order Fulfillment Domain Example]
We see that the Order Fulfillment domain requires business entities, like customers, products, third-party shippers, and vehicles, that are also used in other domains.
- Customer Management shares account number and ship-to address.
- Order Management shares purchase order number, products, and quantity.
- Warehouse Management shares product/item location, quantity on hand, and packing instructions.
- Fleet Management shares vehicle availability and route tracking.
- Third-Party Logistics Management shares region, mode of transportation, and customs/duty requirements.
That data is used to create a “Data Product” for monitoring and managing On-Time and In-Full Delivery rates. Data product, or data as a product, is another principle of data mesh. The easiest way to think about it is as an analytics component that encapsulates all the functionality required to solve an analytics problem. This could include data pipelines, curated data sets, machine learning algorithms, and visualizations.
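To make the idea of a data product for On-Time and In-Full delivery more concrete, here is a minimal Python sketch of the metric at its core. The article does not define the calculation, so the rule used here is an assumption: an order counts only if it arrived on or before its promised date and the full ordered quantity was delivered. The `DeliveredOrder` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DeliveredOrder:
    """Hypothetical curated record combining Order Management and delivery data."""
    purchase_order_number: str
    quantity_ordered: int
    quantity_delivered: int
    promised_date: date
    delivered_date: date


def otif_rate(orders: list[DeliveredOrder]) -> float:
    """Share of orders delivered both on time and in full (assumed definition)."""
    if not orders:
        # No deliveries yet: report 0.0 rather than dividing by zero.
        return 0.0
    hits = sum(
        1
        for o in orders
        if o.delivered_date <= o.promised_date
        and o.quantity_delivered >= o.quantity_ordered
    )
    return hits / len(orders)
```

A real data product would wrap a calculation like this with the pipelines that feed it and the visualizations that present it; the function is only the smallest testable piece.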
However, the attributes representing those entities differ based on domain needs. For example, the Fleet Management domain will need to represent many physical characteristics of vehicles, such as their maintenance history, mileage, age, model number, performance characteristics, and so on. But when it’s time to schedule a delivery, the Order Fulfillment domain only needs to know whether a vehicle is available and the estimated arrival time for pickup and delivery.
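The vehicle example can be sketched in Python as two separate models of the same business entity, one per bounded context, with a small translation function between them. All class and field names are hypothetical and chosen only to mirror the attributes mentioned above.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class FleetVehicle:
    """Vehicle as modeled inside the Fleet Management bounded context."""
    vehicle_id: str
    model_number: str
    mileage_km: int
    in_service_since: date
    maintenance_history: list[str]
    available: bool
    estimated_pickup_at: datetime
    estimated_delivery_at: datetime


@dataclass(frozen=True)
class DeliveryVehicle:
    """Vehicle as seen by the Order Fulfillment bounded context."""
    vehicle_id: str
    available: bool
    estimated_pickup_at: datetime
    estimated_delivery_at: datetime


def to_delivery_vehicle(vehicle: FleetVehicle) -> DeliveryVehicle:
    """Expose only the attributes Order Fulfillment needs for scheduling."""
    return DeliveryVehicle(
        vehicle_id=vehicle.vehicle_id,
        available=vehicle.available,
        estimated_pickup_at=vehicle.estimated_pickup_at,
        estimated_delivery_at=vehicle.estimated_delivery_at,
    )
```

Keeping the two models separate means Fleet Management can add or change maintenance attributes without affecting Order Fulfillment, as long as the translation function keeps producing the same narrow view.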
Decentralized Domain Ownership
Trying to model data and business logic gets progressively more complex as the modeling scope expands to more business domain areas. Part of the problem is that different domains use different vocabularies with different contexts and attributes for business entities. It would be unnecessarily complex if we tried to create a single domain model for Order Fulfillment, Customer Management, Order Management, Warehouse Management, Fleet Management, and Third-Party Logistics Management. As well as being impractical to design and implement, it would also be difficult to manage over time, because any changes must account for multiple domain needs.
With data mesh, responsibility and accountability for data modeling, management, and governance are distributed to domain teams that best understand the business needs and context. The model for a domain only needs to include the necessary business capabilities and activities for the domain. And each model only contains the relevant business entities and attributes within the domain context.
The word “decentralized” in the phrase “decentralized domain ownership” also causes misunderstandings: some people interpret it to mean that domain teams have the authority to do whatever they want within their domain, without considering what other domain teams are doing. While data mesh does give domains local autonomy, domain ownership includes ensuring interoperability across domains. This is achieved through another core principle of data mesh: federated computational governance.
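The article does not describe how federated computational governance is implemented. One common reading is that a small set of globally agreed rules is written as code and checked automatically against every domain's data products, while everything else is left to the domain team. The Python sketch below illustrates that reading only; the descriptor fields and both rules are hypothetical.

```python
# Hypothetical fields every data product descriptor must declare.
REQUIRED_FIELDS = ("name", "owner_domain", "output_schema")


def check_required_fields(descriptor: dict) -> list[str]:
    """Global rule: every data product declares its name, owner, and schema."""
    return [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if not descriptor.get(field)
    ]


def check_customer_identifier(descriptor: dict) -> list[str]:
    """Global rule (assumed): products exposing customers use account_number."""
    schema = descriptor.get("output_schema") or {}
    entities = descriptor.get("entities") or []
    if "customer" in entities and "account_number" not in schema:
        return ["customer data must expose account_number"]
    return []


# The federated part: a short list of rules shared by all domains.
GLOBAL_POLICIES = [check_required_fields, check_customer_identifier]


def validate(descriptor: dict) -> list[str]:
    """Run every global policy and collect the violations it reports."""
    violations: list[str] = []
    for policy in GLOBAL_POLICIES:
        violations.extend(policy(descriptor))
    return violations
```

An empty list from `validate` means the product meets the shared rules; how the domain team builds the product internally is outside the scope of these checks.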
Best Practices for Domain Design and Management
One of the biggest challenges of data mesh is designing the boundaries of individual domains. The general rule is that a domain should be designed around one business capability, but putting that rule into practice requires careful thought.
- Start by analyzing the analytical requirements for the business domain. What are the business metrics or business outcomes you are trying to optimize?
- Next, define the required entities and attributes, as well as aggregates and hierarchy needs. This will enable you to create the bounded context for the domain model.
- Then map the connections to other domains to create a relationship graph of shared entities and attributes.
During this phase, don’t get overly concerned with technologies or implementation details. Just note the places where domains will need to interoperate. It is also important to remember that this is an iterative, ongoing process. Domain boundaries aren’t set in stone. As your business evolves, you may expand or shrink the bounded context of domains.
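The mapping step does not require any particular tooling. As a minimal sketch, the shared attributes from the Order Fulfillment example can be recorded as plain Python data and turned into a list of edges; the function names here are hypothetical.

```python
# Attributes each domain shares with Order Fulfillment, from the example above.
SHARED_ATTRIBUTES = {
    "Customer Management": ["account number", "ship-to address"],
    "Order Management": ["purchase order number", "products", "quantity"],
    "Warehouse Management": [
        "product/item location",
        "quantity on hand",
        "packing instructions",
    ],
    "Fleet Management": ["vehicle availability", "route tracking"],
    "Third-Party Logistics Management": [
        "region",
        "mode of transportation",
        "customs/duty requirements",
    ],
}


def build_edges(consumer: str, shared: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Return one (provider, consumer, attribute) edge per shared attribute."""
    return [
        (provider, consumer, attribute)
        for provider, attributes in shared.items()
        for attribute in attributes
    ]


def providers_of(edges: list[tuple[str, str, str]], attribute: str) -> list[str]:
    """List the domains that share a given attribute, in alphabetical order."""
    return sorted({provider for provider, _, attr in edges if attr == attribute})


edges = build_edges("Order Fulfillment", SHARED_ATTRIBUTES)
```

Each edge marks a place where two domains will need to interoperate, which is the information this phase is meant to capture.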
Another major hurdle is managing the decentralized domain ownership model. Again, decentralized doesn’t mean domain teams have the authority to do whatever they want.
- Start by defining clear boundaries of what the central governance team and the individual domain teams are accountable for.
- Document the processes and workflows that will be used to coordinate activities between teams.
- Create a communication process to provide greater visibility across teams and trust between teams.
The goal is to create an effective and efficient operating model that increases the quality, consistency, and interoperability of data products created across domain teams.
The whole premise of data mesh is to help organizations make business decisions faster by shifting responsibility and accountability for data and analytics to the domain teams that best understand the business situation and context. Well-designed domain boundaries and decentralized operating models enable speed and flexibility at the domain level, and interoperability and reusability at the global level.