How to model a glossary?

Sep 30, 2021 | Data Governance, Data Catalog


Which data are available on our suppliers?

What’s the difference between a lead and a customer?

Is the formula of this indicator correct?

If, like me, you’ve heard this type of question more than once, chances are you are considering building a data glossary! You might even have asked yourself: “I hear people everywhere talk about the importance of data. If that’s so, why do I still not have a reference to understand it all?”

During my first attempts to create a glossary, I faced many challenges: listing and defining the objects I wanted to map, giving them context… But the hardest challenge was how to represent this data in a way that abstracted the technical reality while still making sense to users.

This article is intended for those who wish to start modelling a glossary: it lists and compares the advantages and disadvantages of two glossary modelling methods.

The vertical data approach

If the first question at the beginning of this article sounds familiar, this approach may be right for you! 

This approach involves defining the major objects managed by the organisation: employees, suppliers, items. Under each of these objects, we then attach different elements hierarchically, for example business terms such as email, phone, name… In this way, we can list everything we know about each major object and describe it at the granularity required by our use case.

To get a consolidated view, business terms do not necessarily have to be derived from a single system: we can mix information from different sources and even link business terms to different sources. Again, your use case is the only driver of your modelling.

Let’s take the example of the supplier. You may have a large amount of information about them spread across different systems: the legal form and type of company in your ERP, the accountant’s name and email in your treasury system, the delivery point in your WMS and TMS… The best way to organize all this information is to create clusters under the major object: data related to the company itself (type of company, legal information…), sales, accounting. It might look like this:
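The sketch below is one possible way to express this hierarchy in Python. It is only an illustration: the class names, the "Logistics" cluster and the exact term labels are assumptions, not the structure of any particular catalog tool.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessTerm:
    name: str
    # Systems where the term is found; a term can be linked to several sources.
    sources: list[str] = field(default_factory=list)


@dataclass
class Cluster:
    name: str
    terms: list[BusinessTerm] = field(default_factory=list)


@dataclass
class BusinessObject:
    name: str
    clusters: list[Cluster] = field(default_factory=list)

    def terms_by_source(self) -> dict[str, list[str]]:
        """Group business term names by source system."""
        grouped: dict[str, list[str]] = {}
        for cluster in self.clusters:
            for term in cluster.terms:
                for source in term.sources:
                    grouped.setdefault(source, []).append(term.name)
        return grouped

    def outline(self) -> list[str]:
        """Return the hierarchy as indented lines: object, clusters, terms."""
        lines = [self.name]
        for cluster in self.clusters:
            lines.append(f"  {cluster.name}")
            for term in cluster.terms:
                lines.append(f"    {term.name} ({', '.join(term.sources)})")
        return lines


supplier = BusinessObject(
    "Supplier",
    [
        Cluster("Company", [
            BusinessTerm("Legal form", ["ERP"]),
            BusinessTerm("Type of company", ["ERP"]),
        ]),
        Cluster("Accounting", [
            BusinessTerm("Accountant name", ["Treasury system"]),
            BusinessTerm("Accountant email", ["Treasury system"]),
        ]),
        # Assumption: the article does not say which cluster holds delivery data.
        Cluster("Logistics", [
            BusinessTerm("Delivery point", ["WMS", "TMS"]),
        ]),
    ],
)

print("\n".join(supplier.outline()))
```

Keeping the sources on the business term rather than on the cluster reflects the idea that a single term can be fed by several systems, while the cluster stays a purely business-facing grouping.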


This approach can have different levels of complexity. For example: should I create a separate “name” business term for each object? This might lead to a complex model that is hard to understand. Or should I create levels of specificity?

The biggest advantage of this approach is that you can consider your data independently of:

  • Management complexities: many systems can manage the same data.
  • Organisational boundaries: multiple organisational domains can work on the same data as producers, consumers, etc.

Nevertheless, this approach has two limits:

  • Related object management: how can we describe the link between different concepts or objects, for example between a lead and a customer?
  • Change management: people are used to representing their data in silos. Breaking this habit can be hard!

Urbanization-led approach

This approach is based on the urbanization guidelines built by data architects to model data.

Four main layers can be described:

  • The business layer, which deals with the business processes of the organisation
  • The functional layer, which represents the information system (IS) from a functional perspective
  • The application layer, which represents the various software components making up the IS service layer
  • The technical layer, which covers the technical elements required to make the application layer work and to allow data exchange. It’s the dictionary of databases, the catalog of processing jobs…


In this approach, your use case is still the key to the required granularity: you can choose to implement all the layers described or only some of them. To get started, the functional and technical layers seem sufficient to me. With the functional layer, we can describe the data from a business point of view.

Zones can be used to describe clusters:

  • Activity zones: operations, support, steering… These zones can be divided into smaller units (in descending order): neighborhood, block, data.
  • Referential zones: for stable and cross-functional data such as master data and nomenclatures. They can be used by the different activity zones.
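To make the descending view concrete, here is a minimal Python sketch of a zone broken down into neighborhoods, blocks and data. The `Node` class and the sample names ("Purchasing", "Purchase orders", "Order date"…) are hypothetical, and giving the referential zone the same sub-levels as an activity zone is an assumption made for the example.

```python
from dataclasses import dataclass, field

# Descending order of granularity, from the largest unit to the smallest.
LEVELS = ("zone", "neighborhood", "block", "data")


@dataclass
class Node:
    name: str
    level: str
    children: list["Node"] = field(default_factory=list)

    def add(self, name: str) -> "Node":
        """Add a child one level below this node and return it."""
        index = LEVELS.index(self.level)
        if index == len(LEVELS) - 1:
            raise ValueError(f"'{self.level}' is the lowest level; cannot add '{name}'")
        child = Node(name, LEVELS[index + 1])
        self.children.append(child)
        return child

    def paths(self, prefix: str = "") -> list[str]:
        """Return the full path of every leaf located under this node."""
        path = f"{prefix} > {self.name}" if prefix else self.name
        if not self.children:
            return [path]
        result: list[str] = []
        for child in self.children:
            result.extend(child.paths(path))
        return result


# An activity zone (hypothetical content).
operations = Node("Operations", "zone")
orders = operations.add("Purchasing").add("Purchase orders")
orders.add("Order date")
orders.add("Ordered quantity")

# A referential zone holding master data shared by the activity zones.
referential = Node("Referential", "zone")
referential.add("Master data").add("Supplier").add("Legal form")

for line in operations.paths() + referential.paths():
    print(line)
```

Fixing the order of levels in one tuple keeps every branch consistent: a block can only sit under a neighborhood, and nothing can be added below the data level.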



I can see two main benefits of this approach:

  • The data mapping sticks to a functional point of view while highlighting the main data silos.
  • Data mapping is easier and change management is limited, as we don’t challenge the traditional way of modeling data.

That being said, I think this approach might suffer from this last point. People who are used to this kind of representation tend to underestimate how difficult it is for newcomers to understand. You can have a representation that is good from an architect’s point of view but not for business people or governance evangelists.


The glossary is one of the main entry points to data for many users. That’s why its modeling is crucial to your cataloging and governance approach.

The glossary must be:

  • Easy to understand
  • Comprehensive yet not complex
  • Sufficiently independent from your dictionary so that it can bring more added value to end users.

Both approaches explained in this article allow quick adoption and a fairly accurate understanding by the company. These are two of the main goals of users when modelling data. Depending on your needs, other elements can be taken into consideration for your modelling, for example:

  • Maintenance: is it easy to make this model evolve?
  • Description: what elements do I need to describe my objects? A simple text description is a good start, but it quickly reaches its limits when it comes to analysis and evolution, not to mention how hard it makes harmonizing information.

Do not underestimate the context of your glossary: as we mentioned at the beginning of this article, the glossary must be linked to its environment, both business and technical. The main risk is to end up with a purely theoretical vision that cannot meet the needs of the business and cannot support changes. You need to have two kinds of links:

  • Internal links with other glossary objects, so that you can create and manage relations
  • Transversal links with objects outside the glossary: where the data is implemented, whether it’s linked to a business process, used in dashboards…
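As a rough illustration of these two kinds of links, the Python sketch below stores them in a single list and filters them by type. The `Link` class, the relation labels and the targets (`crm.customers`, "Sales dashboard") are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    source: str     # glossary object the link starts from
    relation: str   # relation label (hypothetical vocabulary)
    target: str     # another glossary object, or an asset outside the glossary
    internal: bool  # True when the target is itself a glossary object


links = [
    # Internal link: relates two glossary objects.
    Link("Lead", "becomes", "Customer", internal=True),
    # Transversal links: point to assets outside the glossary.
    Link("Customer", "implemented in", "crm.customers", internal=False),
    Link("Customer", "used in", "Sales dashboard", internal=False),
]


def links_for(term: str, all_links: list[Link],
              internal: Optional[bool] = None) -> list[Link]:
    """Return the links starting from `term`, optionally filtered by kind."""
    return [
        link for link in all_links
        if link.source == term and (internal is None or link.internal == internal)
    ]


for link in links_for("Customer", links, internal=False):
    print(f"{link.source} {link.relation} {link.target}")
```

Typing the relation, even with a small free-text vocabulary like this one, is also one way to address the lead/customer question raised for the vertical approach.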


To conclude, I have a preference for the urbanization-led approach: I think it’s more comprehensive and can be integrated into a broader approach that asks where the data is implemented, how it is transformed and how we create value from it.

But it is precisely this comprehensiveness and these links with enterprise architecture that make it rigid, and make it hard to escape from a preconstructed point of view, often the very one that led us to create a glossary in the first place. Taking this into account, the vertical data approach can be challenging to create and to get accepted, but it can also challenge, for the better, the way you represent your data and interact with it.