Understand how data is classified, described, and connected to improve discoverability, lineage, and governance across your ecosystem.
Active metadata management goes beyond passive cataloging by continuously collecting, analyzing, and pushing metadata into workflows, automation tools, and data products — enabling real-time decision-making and system interoperability.
An asset inventory is a complete, centralized listing of all data assets — tables, reports, dashboards, pipelines, etc. It supports governance, discovery, and cataloging initiatives.
A business glossary is a centralized collection of standardized definitions for key business terms and concepts. It ensures all teams speak the same language and reduces ambiguity across reports, metrics, and data usage.
A data catalog is an organized inventory of data assets that helps users find, understand, and trust data. It includes metadata, lineage, and business context to break down silos, boost collaboration, and support faster, smarter decisions.
Data classification is the process of organizing data into categories based on its sensitivity, value, or regulatory requirements (e.g., public, internal, confidential). It’s critical for data security, compliance, and lifecycle management.
A data contract is a formal agreement between data producers and consumers that defines expectations around data structure, quality, and delivery. It helps reduce breakages and miscommunication in modern data pipelines.
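For illustration, the sketch below (in Python, with made-up field names and thresholds) shows how a contract might be captured as a small machine-readable spec that a consumer validates incoming rows against.

```python
# A minimal, hypothetical data-contract sketch: the producer publishes the
# expected schema, quality, and freshness, and the consumer validates rows.
from datetime import timedelta

orders_contract = {
    "dataset": "orders",                              # hypothetical dataset name
    "schema": {"order_id": str, "amount": float, "currency": str},
    "quality": {"max_null_rate": 0.01},               # at most 1% missing values
    "delivery": {"freshness": timedelta(hours=24)},   # refreshed at least daily
}

def validate_row(row: dict, contract: dict) -> bool:
    """Check that a row matches the contract's schema (field names and types)."""
    schema = contract["schema"]
    return set(row) == set(schema) and all(
        isinstance(row[col], col_type) for col, col_type in schema.items()
    )

print(validate_row({"order_id": "A1", "amount": 19.99, "currency": "USD"},
                   orders_contract))  # True
```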
A data dictionary provides detailed metadata for each field in a dataset — including definitions, data types, allowed values, and descriptions. It complements the business glossary.
Data discovery is the process of finding, exploring, and understanding data assets across an organization. It enables faster analysis, better governance, and improved trust in data.
Data ownership defines who is accountable for the quality, usage, and security of a specific dataset or asset. Clear ownership ensures data is trusted, maintained, and aligned with business goals.
Data processing includes all steps involved in collecting, transforming, validating, and storing data. It spans ETL/ELT workflows, pipeline orchestration, and real-time or batch execution.
Data versioning tracks changes to datasets over time — enabling rollback, reproducibility, and auditability. It’s especially valuable for analytics, machine learning, and collaborative environments.
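As a rough sketch of the idea, the snippet below identifies each snapshot of a dataset by a content hash; real versioning tools add storage, branching, and richer metadata.

```python
# Toy data-versioning sketch: identify each dataset snapshot by a content
# hash so any previous version can be looked up and restored.
import hashlib

versions: dict[str, bytes] = {}  # version id -> raw snapshot bytes

def commit(data: bytes) -> str:
    """Store a snapshot and return its content-addressed version id."""
    version_id = hashlib.sha256(data).hexdigest()[:12]
    versions[version_id] = data
    return version_id

v1 = commit(b"id,amount\n1,10\n")
v2 = commit(b"id,amount\n1,10\n2,20\n")
print(v1 != v2, versions[v1])  # two distinct, reproducible versions
```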
Diagrams are visual representations of data flows, schemas, lineage, or relationships. They help teams understand complex systems quickly and collaborate more effectively.
Enterprise metadata management is the strategic approach and set of systems enterprises use to collect, manage, and use metadata across business units, tools, and platforms — supporting governance, analytics, compliance, and AI readiness.
A golden record is the most accurate, complete, and trusted version of a data entity (like a customer or product). It resolves duplicates and inconsistencies across multiple sources.
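A minimal sketch of the merge step, assuming duplicates are already matched on email and that newer non-empty values survive:

```python
# Hypothetical golden-record sketch: merge duplicate customer records,
# preferring the most recently updated non-empty value for each field.
records = [
    {"email": "a@example.com", "name": "A. Smith", "phone": "", "updated": "2024-01-10"},
    {"email": "a@example.com", "name": "Alice Smith", "phone": "555-0100", "updated": "2024-06-01"},
]

def golden_record(dupes: list[dict]) -> dict:
    ordered = sorted(dupes, key=lambda r: r["updated"])  # oldest first
    merged: dict = {}
    for rec in ordered:                  # newer values win...
        for field, value in rec.items():
            if value:                    # ...but blanks never overwrite data
                merged[field] = value
    return merged

print(golden_record(records))  # one trusted customer record
```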
A knowledge graph represents data as entities and relationships — connecting concepts in a network. It enables semantic search, AI readiness, and deeper business context.
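A toy example, storing facts as (subject, relation, object) triples with made-up entities:

```python
# Minimal knowledge-graph sketch: facts as (subject, relation, object)
# triples, plus a simple relationship query. Entity names are invented.
triples = [
    ("Acme Corp", "has_customer", "Jane Doe"),
    ("Jane Doe", "purchased", "Product X"),
    ("Product X", "belongs_to", "Electronics"),
]

def related(entity: str, relation: str) -> list[str]:
    """Return all objects linked to an entity by a given relation."""
    return [obj for subj, rel, obj in triples if subj == entity and rel == relation]

print(related("Jane Doe", "purchased"))  # ['Product X']
```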
An ontology defines a shared vocabulary and relationships between concepts in a domain. It supports semantic consistency and advanced reasoning, especially in data and AI contexts.
The semantic layer sits between raw data and users — translating technical structures into consistent business terms and metrics. It enables clarity, reuse, and self-service analytics.
A taxonomy is a hierarchical classification of concepts, often used to group related terms or topics. It helps organize content, standardize language, and improve searchability.
Technical metadata refers to system-level details about data assets, such as schema, file size, storage location, lineage, and refresh frequency. It enables observability and root cause analysis.
Explore key terms that help ensure data is secure, trusted, and compliant with internal policies and external regulations.
Access control refers to the mechanisms used to regulate who can view or use resources in a system — based on roles, groups, or contexts. It’s essential for privacy, security, and compliance.
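A simple role-based sketch, with illustrative roles and permissions rather than a real policy model:

```python
# Role-based access control (RBAC) sketch; roles and permissions are made up.
role_permissions = {
    "analyst": {"read"},
    "steward": {"read", "edit"},
    "admin":   {"read", "edit", "manage"},
}

def can(user_role: str, action: str) -> bool:
    """Allow an action only if the user's role grants that permission."""
    return action in role_permissions.get(user_role, set())

print(can("analyst", "read"), can("analyst", "edit"))  # True False
```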
BCBS 239 is a set of principles issued by the Basel Committee to improve risk data aggregation and reporting in banks. Applicable to systemically important financial institutions, it aims to enhance governance, data architecture, accuracy, and timeliness of risk reporting for better decision-making and regulatory compliance.
A compliance framework is a structured set of controls, policies, and processes that help organizations meet legal, regulatory, and ethical standards (e.g., HIPAA, GDPR, SOX, ISO 27001).
The CPRA (California Privacy Rights Act) is a California state law that expands and amends the CCPA (California Consumer Privacy Act). Effective from January 2023, it enhances privacy rights for California residents, including the right to correct personal data, limit its use, and opt out of automated decision-making.
A data access policy defines who can view, edit, or manage specific datasets within an organization. It ensures the right people access the right data — and only that data — based on role, context, or compliance needs.
A data audit is a structured review of how data is collected, processed, accessed, and governed. It helps identify gaps, ensure compliance, and improve data quality and accountability.
Data governance ensures data is accurate, secure, and responsibly used by defining rules, roles, and processes. It includes setting policies, assigning ownership, and establishing standards for managing data throughout its lifecycle.
A data policy is a formal set of rules and guidelines that govern how data is managed, used, protected, and shared across an organization. It often includes standards around classification, retention, access, and compliance.
Data security refers to the practices, tools, and policies used to protect digital information from unauthorized access, corruption, or theft. It encompasses encryption, access controls, threat detection, and compliance with regulations to ensure the confidentiality, integrity, and availability of data.
FISMA (the Federal Information Security Management Act) is a U.S. federal law enacted in 2002 (and updated in 2014 as the Federal Information Security Modernization Act) that requires government agencies and their contractors to implement comprehensive information security programs. It aims to protect federal data and systems from cyber threats through risk management, continuous monitoring, and compliance with NIST standards.
The GDPR (General Data Protection Regulation) is a European Union regulation that governs the collection, processing, storage, and sharing of personal data. Enforced since May 2018, it aims to protect the privacy and rights of individuals within the EU and imposes strict requirements on organizations that handle EU residents’ personal data, including transparency, user consent, data minimization, and breach notification.
Data privacy regulations (like the GDPR in the EU and the CCPA in California) define how personal data must be collected, stored, and handled. Data compliance ensures that your practices align with these legal obligations to avoid penalties and protect user trust.
HIPAA (the Health Insurance Portability and Accountability Act) is a U.S. federal law enacted in 1996 that establishes national standards for protecting sensitive patient health information. It applies to healthcare providers, insurers, and their business associates, requiring safeguards for data privacy, security, and breach notification.
PII (personally identifiable information) refers to data that can identify an individual — such as a name, email address, ID number, or IP address. Protecting PII is central to privacy regulations and data security practices.
Shadow data is data created, copied, or used outside of sanctioned systems or governance processes — often without oversight. It poses risks to compliance, security, and decision-making.
A trust score is a rating that reflects how reliable, complete, and compliant a dataset is. It’s used to guide decisions about whether to use or share a given asset.
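One possible (hypothetical) way to compute such a score is a weighted blend of quality signals:

```python
# Illustrative trust-score sketch: combine a few 0-1 quality signals into a
# single 0-100 rating. The signals and weights are hypothetical.
def trust_score(completeness: float, freshness: float, has_owner: bool,
                passed_checks: float) -> int:
    score = (0.3 * completeness + 0.3 * freshness
             + 0.2 * (1.0 if has_owner else 0.0) + 0.2 * passed_checks)
    return round(score * 100)

print(trust_score(completeness=0.98, freshness=0.9, has_owner=True,
                  passed_checks=0.95))  # 95
```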
Risk management in data governance involves identifying, assessing, and mitigating threats to data security, quality, or compliance. It ensures that data practices align with business goals and legal requirements.
Solvency II is a European regulatory framework for insurance companies, in force since 2016. It sets out capital requirements and risk management standards to ensure insurers remain financially stable and can meet their obligations to policyholders, while also promoting market transparency and consumer protection.
Learn the foundational concepts that power machine learning models, from training data to algorithmic transparency and operationalization.
An AI audit trail is a complete record of model activity — from training data to decisions made in production. It helps teams trace outcomes, explain results, and comply with regulatory standards.
AI governance refers to the framework of policies, practices, and regulations that guide the responsible development and use of artificial intelligence. It ensures ethical compliance, data transparency, risk management, and accountability—critical for organizations seeking to scale AI securely and align with evolving regulatory standards.
AI risk management involves identifying and mitigating risks introduced by machine learning models — including bias, drift, compliance breaches, and reputational harm. It’s essential for safe and scalable AI adoption.
ML metadata refers to the data that describes machine learning artifacts — including training datasets, model parameters, evaluation metrics, and deployment details. Managing this metadata is key to reproducibility and operational visibility.
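A minimal sketch of what a training-run record might capture; the field names and paths are illustrative:

```python
# Minimal ML-metadata sketch: record what went into a training run so it
# can be reproduced and audited later.
import json
from datetime import datetime, timezone

run_metadata = {
    "model_name": "churn_classifier",                       # hypothetical model
    "training_data": {"path": "s3://bucket/churn.parquet", "rows": 120_000},
    "parameters": {"algorithm": "gradient_boosting", "max_depth": 6},
    "metrics": {"auc": 0.91, "accuracy": 0.87},
    "trained_at": datetime.now(timezone.utc).isoformat(),
}

# Persist this alongside the model artifact so the run stays reproducible.
print(json.dumps(run_metadata, indent=2))
```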
Model governance is the framework of processes, policies, and tools used to manage and oversee machine learning models. It ensures models are accountable, explainable, compliant, and aligned with business goals.
Model lineage tracks the full lifecycle of a model — from data sources and training steps to deployments and updates. It enables auditability, reproducibility, and trust in model-driven decisions.
A model registry is a centralized system for managing versions of machine learning models, including metadata, approval stages, and deployment status. It ensures traceability, collaboration, and lifecycle control.
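A toy registry sketch, with hypothetical model names, URIs, and stages:

```python
# Toy model-registry sketch: versioned models with stage transitions
# (e.g. staging -> production). Names and stages are illustrative.
registry: dict[str, list[dict]] = {}

def register(name: str, artifact_uri: str, metrics: dict) -> int:
    versions = registry.setdefault(name, [])
    versions.append({"version": len(versions) + 1, "uri": artifact_uri,
                     "metrics": metrics, "stage": "staging"})
    return versions[-1]["version"]

def promote(name: str, version: int, stage: str) -> None:
    registry[name][version - 1]["stage"] = stage

v = register("churn_classifier", "s3://models/churn/v1", {"auc": 0.91})
promote("churn_classifier", v, "production")
print(registry["churn_classifier"][0]["stage"])  # production
```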
Responsible AI refers to the practice of building and deploying AI systems that are ethical, transparent, inclusive, and aligned with societal values. It spans fairness, bias mitigation, privacy, and accountability.
Dive into the vocabulary of data reliability — covering accuracy, completeness, freshness, and how to monitor data at scale.
Augmented data quality leverages AI and machine learning to automate data profiling, anomaly detection, cleansing, and rule enforcement — improving accuracy and reliability at scale.
Data accuracy reflects how closely data values align with real-world facts. Inaccurate data can lead to faulty reports, bad decisions, and lost trust.
Data completeness measures whether all required data is present — with no missing values, rows, or fields. Incomplete data often leads to blind spots or broken workflows.
Data consistency ensures that data values are uniform across systems — for example, avoiding mismatches like “USD” vs “US Dollar” or conflicting duplicate entries in customer tables. Consistent data prevents confusion and mistrust.
Data lineage traces data’s journey—its origin, movement, and transformations—across systems. It helps track errors, ensure accuracy, and support compliance by providing transparency. This boosts trust, speeds up troubleshooting, and strengthens governance.
Data observability is the ability to monitor the health of your data pipelines using metrics like freshness, volume, schema changes, and lineage. It helps detect issues early and maintain trust in your data.
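A bare-bones sketch of two such checks (freshness and volume), with made-up thresholds:

```python
# Minimal observability sketch: alert when a table's freshness or row-count
# volume drifts outside expected bounds. Thresholds are hypothetical.
from datetime import datetime, timedelta, timezone

def check_table(last_loaded_at: datetime, row_count: int,
                expected_rows: int, max_age: timedelta) -> list[str]:
    alerts = []
    if datetime.now(timezone.utc) - last_loaded_at > max_age:
        alerts.append("freshness: table has not been updated in time")
    if row_count < 0.5 * expected_rows:   # sudden volume drop
        alerts.append("volume: row count is far below the expected level")
    return alerts

print(check_table(datetime.now(timezone.utc) - timedelta(hours=30),
                  row_count=40_000, expected_rows=100_000,
                  max_age=timedelta(hours=24)))
```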
Data profiling analyzes the structure, content, and quality of a dataset — such as value distributions, null rates, and pattern mismatches — to uncover issues and assess usability.
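A quick profiling pass can be done with plain pandas; the small dataset below is made up:

```python
# Basic profiling sketch with pandas: null rates, distinct counts, and a
# numeric summary for a small, invented dataset.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4, None],
    "country":     ["US", "US", "FR", None, "FR"],
    "amount":      [10.0, 25.5, 25.5, 7.0, 90.0],
})

print(df.isna().mean())         # null rate per column
print(df.nunique())             # distinct values per column
print(df["amount"].describe())  # distribution of a numeric field
```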
Data quality refers to how well data meets the needs of its users — based on dimensions like accuracy, completeness, consistency, and timeliness. It’s essential for analytics, compliance, and decision-making.
Data readiness measures how prepared your data is to support AI and analytics — across completeness, structure, quality, documentation, and business meaning.
Data stewardship is the practice of managing data assets responsibly. Stewards ensure data is documented, high quality, and used appropriately across teams.
Data timeliness refers to how up-to-date data is relative to when it’s needed. Timely data enables responsive decision-making and real-time use cases.
Data validation involves checking whether data meets defined standards, rules, or constraints. It’s often applied during ingestion to catch issues early.
A quality rules engine is a tool or framework that applies logic to automatically evaluate datasets against quality standards — like detecting duplicates, nulls, or schema mismatches.
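A tiny sketch of the pattern: rules are plain functions that report violations, and the engine simply runs each registered rule.

```python
# Toy rules-engine sketch over an invented dataset.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "b@example.com"},
]

def no_null_email(data):
    return ["null email" for r in data if r["email"] is None]

def unique_ids(data):
    ids = [r["id"] for r in data]
    return ["duplicate id"] if len(ids) != len(set(ids)) else []

rules = [no_null_email, unique_ids]          # register quality rules
violations = [v for rule in rules for v in rule(rows)]
print(violations)  # ['null email', 'duplicate id']
```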
Get familiar with the systems, layers, and tooling that support enterprise-scale data operations — from pipelines to cloud platforms.
A cloud data platform is a suite of cloud-native tools for storing, processing, and analyzing data at scale — often combining storage (e.g., lakehouse), compute, and governance layers.
Data management platforms provide integrated tools and frameworks to manage data governance, data quality, metadata, policies, and the analytics lifecycle — helping enterprises implement scalable governance strategies.
A data fabric is an architectural approach that connects data across disparate systems through a unified metadata and governance layer — enabling seamless access, integration, and observability.
A data lake is a centralized repository for storing large volumes of structured and unstructured data in its raw format. It supports flexible analytics, AI/ML, and big data workloads.
A data mart is a subject-specific subset of a data warehouse, tailored to a particular business line or team — such as finance, marketing, or HR — to improve access and performance.
Data mesh decentralizes data ownership to domain teams, letting them manage and serve data as products. It fosters collaboration and accountability, supported by shared standards, self-serve tools, and governance to ensure data is interoperable and trustworthy across the organization.
Data orchestration coordinates the execution of data workflows across different systems, ensuring tasks run in the right order and data dependencies are respected.
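At its core this is dependency resolution; the sketch below orders hypothetical tasks with a topological sort.

```python
# Orchestration sketch: resolve task dependencies so steps run in a valid
# order. Task names are illustrative.
from graphlib import TopologicalSorter

dag = {
    "load_raw": set(),                 # no upstream dependencies
    "clean":    {"load_raw"},
    "join":     {"clean"},
    "publish":  {"join", "clean"},
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # e.g. ['load_raw', 'clean', 'join', 'publish']
```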
A data pipeline is the set of processes that move, transform, and load data from source to destination. It’s the backbone of any data integration or analytics strategy.
A data product is a well-defined data asset — such as a dashboard, dataset, or API — that delivers value to end users. It has clear ownership, SLAs, documentation, and is treated like a product, not a project.
Data product governance ensures that data assets treated as products — with defined owners, SLAs, and quality standards — are discoverable, trusted, and aligned with business outcomes.
A data stack is the set of tools and technologies that power data collection, processing, storage, and analysis — from ingestion tools to warehouses and BI platforms.
A data warehouse is a centralized, structured repository optimized for querying and reporting. It integrates data from multiple sources to support BI and analytics.
ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are two common approaches to data integration. ETL transforms data before loading, while ELT transforms it within the destination system — often used in modern cloud platforms.
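The sketch below contrasts the two orderings with stand-in functions rather than a real pipeline:

```python
# ETL vs ELT sketch: the same steps in a different order. In ETL the
# transform happens in flight; in ELT raw data is loaded first and
# transformed inside the destination. All functions are stand-ins.
def extract():           return [{"amount": "19.99"}, {"amount": "5.00"}]
def transform(rows):     return [{"amount": float(r["amount"])} for r in rows]
def load(rows, target):  target.extend(rows); return target

warehouse_etl, warehouse_elt = [], []

# ETL: transform, then load the cleaned rows.
load(transform(extract()), warehouse_etl)

# ELT: load raw rows first, then transform inside the destination.
load(extract(), warehouse_elt)
warehouse_elt[:] = transform(warehouse_elt)

print(warehouse_etl == warehouse_elt)  # True: same result, different order
```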
MDM (master data management) is the process of defining and managing core business entities (e.g., customers, products) to ensure consistency across systems. It supports data quality, reporting, and operational efficiency.
Reference data includes standardized, non-transactional values used across systems — like country codes, currency formats, or product categories. It ensures semantic consistency.
Unpack the organizational, cultural, and strategic dimensions of data — including ownership models, literacy, and change management.
AI-driven FinOps is the use of AI tools and techniques to optimize financial operations in cloud computing environments — including resource allocation, spend forecasting, and performance tracking across data infrastructure.
Data democratization means giving everyone in an organization — not just technical users — access to data they can understand and use. It supports self-service, collaboration, and faster decision-making.
Data enablement ensures that users have the right tools, training, and access to use data effectively. It connects data strategy with daily operations.
Data project prioritization involves ranking data projects based on impact, feasibility, and alignment with business goals — to ensure resources are focused on what matters most.
Data literacy is the ability to read, understand, question, and communicate with data. It’s essential for creating a data-driven culture across all levels of an organization.
Data portfolio management applies portfolio thinking to data products and initiatives — balancing investments, risks, and value across multiple data assets or programs.
A data product portfolio is the full collection of data products (like dashboards, APIs, certified datasets) managed as strategic assets with owners, SLAs, and business goals.
A data strategy defines how an organization will manage and use data to achieve business objectives. It aligns people, processes, and platforms with measurable outcomes.
Digital transformation is the broader business shift toward using digital tools and data-driven processes to improve operations, customer experience, and innovation.
Investment alignment ensures that data spending is targeted at initiatives with measurable and strategic return — rather than isolated, tech-driven projects.
Outcome-driven governance focuses governance efforts on measurable business outcomes — rather than just compliance or control — enabling agility and strategic relevance.
Stakeholder alignment means ensuring all key parties — from executives to data teams — are aligned on goals, expectations, and definitions around data initiatives.
Value governance is the practice of ensuring data and AI initiatives are aligned with strategic business objectives and deliver measurable outcomes. It connects governance efforts to ROI by prioritizing investments, tracking value realization, and enabling data-driven decision-making at scale.
Value management in data refers to systematically planning, measuring, and optimizing the business impact of data initiatives. It shifts the conversation from technical delivery to business outcomes.
Value tracking monitors the business outcomes associated with data use — such as revenue growth, operational efficiency, or risk reduction — to demonstrate ROI.
Understand how data-driven and AI-powered products are built, governed, and evolved to deliver business value at scale.
An AI product is a software solution powered by AI or machine learning — such as a recommendation engine, chatbot, or fraud detection system — that solves specific business problems and evolves over time.
The AI product lifecycle includes problem framing, data collection, model training, deployment, monitoring, and iteration. Managing this lifecycle is key to responsible and successful AI adoption.
Data product management is the discipline of applying product management principles to the lifecycle of data and analytics products — including design, governance, value delivery, and continuous iteration across data teams.
“Data as a product” is the mindset of managing data with the same rigor as customer-facing products — with clear purpose, usability, trust, and business ownership.
A data product is a well-defined data asset — such as a dashboard, dataset, or API — that delivers value to end users. It has clear ownership, SLAs, documentation, and is treated like a product, not a project.
The data product lifecycle describes the stages a data product goes through: from design and development to launch, maintenance, and retirement. Governance ensures quality and traceability at every step.
In a data context, a marketplace is a centralized hub where users can browse, access, and request certified data products — often integrated with governance workflows, access controls, and business metadata.