Active metadata management: What is it, and why is it key to a modern data stack?
Active metadata management is a new type of data management in which active metadata is used to give clear and understandable information to support business decisions.
Gartner defines active metadata management as the “continuous analysis of all user, system, and infrastructure reports and data governance that enable alignment and exception cases between data and their actual experiences.”
Active metadata enables an intelligent, always-on, action-oriented data ecosystem. In short, active metadata allows you to make the most of a modern data stack.
Active metadata example
Let’s look at Netflix for a good, everyday illustration of the intelligent use of metadata.
When you log into Netflix, the algorithm shows you recommendations based on what you’ve already watched. It uses the metadata (film/series/documentary? Thriller/romance/action? Virginie Efira/Ben Affleck? Year of release?) associated with each piece of video content to assign it a compatibility score (from 0% to 99%) based on your profile.
The score assigned to each piece of content varies from user to user.
Instead of simply categorizing its content in a static way (by chronological order, for example), Netflix uses the same set of metadata, along with artificial intelligence, to activate and produce new information to keep each and every subscriber happy through a one-of-a-kind, personalized feed.
Active metadata management benefits
Now essential for describing and managing large volumes of data, active metadata management is the basis for modern governance and management of collected information.
There are six ways to use this process:
#1: Purging outdated or unused data
Active metadata management can be used to systematically track when a data asset (a spreadsheet, a database, an autogenerated dashboard, and so on) was last used and how many people have used it. An asset can then be archived automatically if it has not been used in the last 60 days, and purged completely if no one has touched it in the last 90 or 120 days.
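As a rough illustration, the archive-then-purge rule could be sketched as follows. The asset names, the dates, and the `retention_action` helper are hypothetical, and the purge threshold is set to 90 days here even though 120 days would be an equally valid choice.

```python
from datetime import date

# Thresholds taken from the text above. 90 days is an assumption for the
# purge step; the text also mentions 120 days as an option.
ARCHIVE_AFTER_DAYS = 60
PURGE_AFTER_DAYS = 90


def retention_action(last_used: date, today: date) -> str:
    """Return 'keep', 'archive', or 'purge' based on days since last use."""
    idle_days = (today - last_used).days
    if idle_days > PURGE_AFTER_DAYS:
        return "purge"
    if idle_days > ARCHIVE_AFTER_DAYS:
        return "archive"
    return "keep"


# Hypothetical usage metadata: asset name -> date of last use.
last_used_by_asset = {
    "sales_dashboard": date(2024, 5, 28),
    "q1_export.xlsx": date(2024, 3, 20),
    "legacy_orders_table": date(2024, 1, 5),
}

today = date(2024, 6, 1)
actions = {
    name: retention_action(last_used, today)
    for name, last_used in last_used_by_asset.items()
}
print(actions)
```

In practice the last-used dates would come from query logs or access logs rather than a hard-coded dictionary.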
#2: Allocating data processing resources dynamically
Let’s suppose that 90% of users log into a business intelligence (BI) tool during the final week of a fiscal quarter. Active metadata management can use this usage pattern to automatically increase IT resources leading up to that week and then reduce them afterward.
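A minimal sketch of this idea, assuming login counts per week of the quarter are available as usage metadata. The numbers, the `USERS_PER_WORKER` capacity, and the `workers_needed` helper are all made up for illustration; a real setup would feed a plan like this into whatever autoscaling mechanism the platform provides.

```python
import math

# Hypothetical usage metadata: logins observed in each week of a past
# quarter (weeks 1 to 13), with a spike in the final week.
weekly_logins = [40, 35, 42, 38, 41, 37, 44, 39, 43, 40, 45, 60, 400]

USERS_PER_WORKER = 50  # assumed capacity of one compute worker
MIN_WORKERS = 1        # never scale down to zero


def workers_needed(expected_logins: int) -> int:
    """Size the worker pool to the expected number of logins."""
    return max(MIN_WORKERS, math.ceil(expected_logins / USERS_PER_WORKER))


# Capacity plan for the next quarter: week number -> worker count.
plan = {
    week: workers_needed(logins)
    for week, logins in enumerate(weekly_logins, start=1)
}
print(plan)
```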
#3: Enriching the user experience in BI tools
Instead of making users switch between a BI tool and a data catalog, active metadata management can bring context directly into dashboards. Relevant metadata (such as business terms, descriptions, owners, and history) can be integrated into the BI tool.
This means that when an end-user views a table, they can understand who it belongs to, where the data comes from, and more. This information can even be used as tags for automatically generated reports.
#4: Automatically classifying sensitive data for easier governance and compliance
Data can truly be democratized when users have visibility into all existing data. But this doesn’t mean that sensitive information should be exposed. Active metadata management allows you to automatically classify sensitive data, hide some of it, and make it visible only to authorized users.
Such a solution paves the way for automatic compliance with regulations by customizing access policies according to the company’s governance strategy.
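To make the classify-then-hide idea concrete, here is a small sketch that tags columns by name and masks them for unauthorized users. The patterns, tags, and helper names are assumptions chosen for illustration; a production classifier would typically also inspect the values themselves, not just column names.

```python
import re
from typing import Optional

# Hypothetical name patterns for sensitive columns.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "national_id": re.compile(r"ssn|national[-_]?id", re.IGNORECASE),
}


def classify_column(column_name: str) -> Optional[str]:
    """Return a sensitivity tag for the column, or None if it looks safe."""
    for tag, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(column_name):
            return tag
    return None


def visible_row(row: dict, authorized: bool) -> dict:
    """Mask values in sensitive columns unless the user is authorized."""
    return {
        col: value if authorized or classify_column(col) is None else "***"
        for col, value in row.items()
    }


row = {
    "customer_id": 42,
    "Email": "a@example.com",
    "phone_number": "555-0100",
    "country": "FR",
}
print(visible_row(row, authorized=False))
print(visible_row(row, authorized=True))
```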
#5: Identifying the most frequently used assets
Active metadata management can create a custom popularity score for each resource. This score can be based on usage information from sources such as query logs, data provenance, and BI dashboards.
The most popular and relevant resources should then appear more frequently in search results and be checked regularly for data quality issues.
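One possible way to compute such a score is a weighted sum of usage signals. The sketch below uses made-up query-log entries, dashboard counts, and weights; none of these represent a standard formula, and the right signals and weights would depend on your own stack.

```python
from collections import Counter

# Hypothetical query-log entries: (user, asset queried).
query_log = [
    ("ana", "orders"),
    ("ben", "orders"),
    ("ana", "orders"),
    ("ana", "customers"),
    ("cy", "inventory"),
]

# Hypothetical number of BI dashboards built on each asset.
dashboards_per_asset = {"orders": 3, "customers": 1, "inventory": 0}

# Illustrative weights only.
W_QUERIES, W_USERS, W_DASHBOARDS = 1.0, 2.0, 3.0

# Total queries per asset, and distinct users per asset.
query_counts = Counter(asset for _, asset in query_log)
distinct_users = Counter(asset for _, asset in set(query_log))


def popularity(asset: str) -> float:
    """Weighted sum of query volume, distinct users, and dashboard usage."""
    return (
        W_QUERIES * query_counts[asset]
        + W_USERS * distinct_users[asset]
        + W_DASHBOARDS * dashboards_per_asset.get(asset, 0)
    )


# Most popular assets first, e.g. to boost them in search results.
ranked = sorted(dashboards_per_asset, key=popularity, reverse=True)
for asset in ranked:
    print(asset, popularity(asset))
```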
#6: Alerting downstream end-users to resolve issues quickly
There is nothing worse than a CEO sending you a screenshot of a dysfunctional dashboard before your data team has even noticed. Use active metadata management to be directly notified when a database is modified and when a potential anomaly is detected.
For example, when you crawl a database, you can instantly compare the differences between new and old metadata. If there is a discrepancy (for example, an extra or missing column), you can quickly trace it back to the end-user who made the change and then notify them of the error or correct it yourself.
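The comparison between old and new metadata can be sketched as a simple schema diff. The two snapshots and the `diff_columns` helper below are hypothetical; a real crawler would read column names and types from the database's own catalog and route the result to a notification channel.

```python
def diff_columns(old: dict, new: dict) -> dict:
    """Compare two {column: type} snapshots of the same table."""
    old_cols, new_cols = set(old), set(new)
    return {
        "added": sorted(new_cols - old_cols),
        "removed": sorted(old_cols - new_cols),
        "type_changed": sorted(
            col for col in old_cols & new_cols if old[col] != new[col]
        ),
    }


# Hypothetical snapshots from two consecutive crawls.
previous = {"id": "int", "amount": "float", "created_at": "timestamp"}
current = {"id": "int", "amount": "text", "region": "text"}

changes = diff_columns(previous, current)
if any(changes.values()):
    # Stand-in for a real alert (email, chat message, ticket, ...).
    print("Schema drift detected:", changes)
```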
The role of metadata in driving business
Metadata gives you the context to find the information you need more easily and use it more effectively. This explains why many data-driven companies have moved from a data management strategy to a metadata management strategy that offers much broader and more precise data analysis possibilities.
There are two types of metadata:
Type #1: Passive metadata
Passive metadata is purely technical. This is the basic information about the data, such as the data profile or the data’s operational features (who accesses what, how often, etc.).
Passive metadata remains static, so it offers little visibility into your data pipeline and does little to help you organize your data catalog in a meaningful way.
Type #2: Active metadata
Active metadata allows data to flow quickly and easily across the information system (IS) by introducing richer contextual elements at all levels of the data stack. Active metadata is generally more complex than passive metadata, as it spans operational, business, and social metadata, as well as basic technical metadata.
When you use active metadata, you’ll have a better understanding of where your information is going in your data stack and how it is being used.
Active metadata makes your data more meaningful, allowing you to spotlight it (through data storytelling) to make the best possible decisions.
Turning metadata into new, actionable knowledge
Combining active metadata with passive metadata lets you reveal the story behind your information and go beyond its static profile.
Think of it as a dynamic metadata management mode that shows how and where data flows in a data infrastructure, including all modifications, data transformations, and calculations made up to that point.
The advent of modern data stacks has enabled the generation of business, operational, and social metadata. Today, thanks to artificial intelligence (AI) and machine learning algorithms, it is possible to automatically list, tag, and classify data and to trace its origin (the last of these is known as data lineage).
You’ll be able to use this information to uncover new patterns and identify blind spots in your data stack to fix them before they become a potential problem for your organization.
Active metadata management for the data-driven enterprise
The best way to implement an active metadata management strategy in your organization is to deploy a data catalog and ensure that it is well integrated with your data management processes.
A data catalog is an organized directory of all your data. It uses metadata about that data as fuel, helping the data team collect, organize, access, and enrich it.