AI Policy
Preamble
This page complements our Privacy Policy and outlines DataGalaxy’s use of Artificial Intelligence (“AI”) systems that visitors and customers may encounter on our websites and services.
As a signatory of the EU AI Pact pledges, DataGalaxy makes every effort to comply with applicable AI regulations.
DataGalaxy reserves the right to update or modify this page in accordance with product evolution, legal requirements, and regulatory changes.
Security and Confidentiality
At DataGalaxy, we are fully committed to ensuring the confidentiality and security of your data, in compliance with all applicable regulations.
Any processing of personal data (if applicable) will be governed by our Privacy Policy.
Mission of AI at DataGalaxy
Our mission is to make data accessible and actionable, optimizing business processes for our clients. Our focus extends beyond data catalogs—we strive to enhance all data-related workflows within an organization, starting with the DataGalaxy platform and extending to every domain where high-quality data plays a critical role.
We prioritize AI implementation not for its novelty but to drive real efficiency, save time, and improve business effectiveness. Our approach is centered on strategically applying AI to deliver meaningful enhancements to data processes and user workflows.
Our Models
We employ both general-purpose and narrow AI systems, leveraging narrow AI for specific tasks where it outperforms general-purpose AI. None of our models qualify as high-risk under Chapter III of Regulation (EU) 2024/1689 on Artificial Intelligence (the EU AI Act).
Key Principles:
- We do not share any client data with third parties.
- We strive for high accuracy in our AI-powered features.
- Our models are continuously improved without disrupting client operations.
- A risk mitigation plan is in place for all AI models.
GenAI-Based Capabilities
We utilize the following models:
- Llama 3.1 70B – for multilingual data catalogs and automated description generation.
- Qwen 72B – for conversational AI (chatbot and natural language search).
Note: This list is subject to updates as AI technology evolves.
We follow a self-hosted approach to LLMs, ensuring that no customer data is shared with third parties.
Multilingual Data Catalog
DataGalaxy’s multilingual feature breaks language barriers, enabling team members to access data catalog information in their native language.
- Security-first: Customer data is never sent to an external AI provider.
- High accuracy: Industry-leading models achieve 95% translation accuracy, with user-editable translations.
- Instant updates: New objects are translated within 1-2 seconds. Historical data translations are completed in 6 hours on average, with a maximum of 48 hours for large catalogs.
- Customizable lexicons: Clients can enhance translation quality by adding custom terminology.
- Currently available in 7 languages, with additional languages available upon request.
- Admin-controlled feature: Must be enabled by an administrator.
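As a rough illustration of how a customizable lexicon can refine machine translation, the sketch below applies client-approved terminology on top of a generic translation. The function name and the lexicon are hypothetical; DataGalaxy's actual translation pipeline is not described at this level of detail.

```python
# Hypothetical sketch: override generic machine-translation output with a
# client's custom lexicon so that domain terms are rendered consistently.
# apply_lexicon() and the example lexicon are illustrative, not the
# actual DataGalaxy implementation.

def apply_lexicon(translated: str, lexicon: dict) -> str:
    """Replace generic translations with client-approved terminology."""
    for generic, preferred in lexicon.items():
        translated = translated.replace(generic, preferred)
    return translated
```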
Automated Description Generation
This feature enhances catalog completeness by automatically generating descriptions for objects.
- Supports multiple languages.
- Categorizes descriptions as business or technical, based on module and object type.
- Two modes of generation:
- Manual validation – Users can review, accept, dismiss, or regenerate AI-generated suggestions.
- Fully automated generation – AI generates descriptions automatically when sufficient related information is available.
Conversational AI
To make data knowledge easily accessible, DataGalaxy integrates conversational AI in its chatbot and natural language search.
- Multi-agent system: Detects user intent, classifies it, dispatches it to the appropriate search/analysis agent, and delivers relevant information.
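The intent-detection and dispatch flow described above can be sketched as follows. This is a minimal illustration only: the keyword-based classifier and the agent names are hypothetical placeholders, whereas a production system would use an LLM or a trained intent model for routing.

```python
# Hypothetical sketch of an intent-dispatch loop for a multi-agent
# conversational system. The keyword rules and agent names are
# illustrative, not DataGalaxy's actual routing logic.

def classify_intent(query: str) -> str:
    """Very rough keyword-based intent classification."""
    q = query.lower()
    if any(w in q for w in ("find", "search", "where is")):
        return "search"
    if any(w in q for w in ("why", "explain", "how")):
        return "analysis"
    return "fallback"

AGENTS = {
    "search":   lambda q: f"[search agent] results for: {q}",
    "analysis": lambda q: f"[analysis agent] explanation for: {q}",
    "fallback": lambda q: "Sorry, I could not route this question.",
}

def dispatch(query: str) -> str:
    """Route the query to the agent matching its detected intent."""
    return AGENTS[classify_intent(query)](query)
```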
AI Guards at DataGalaxy
Ethical AI is at the core of our approach. DataGalaxy implements a three-layered protection system comprising Prompt Guard, NeMo Guardrails, and Llama Guard, ensuring a secure and ethical AI experience.
Prompt Guard: Securing and Structuring Inputs
Purpose: Ensures that user inputs are safe, compliant, and structured before reaching the LLM.
- Prevents attacks: Detects and blocks prompt injection attempts.
- Reduces bias: Cleans inputs to remove unintended bias and errors.
- Improves accuracy: Reformats and structures queries for better responses.
When a user submits a query, Prompt Guard analyzes the text, reformats it if necessary, and ensures it meets security and compliance standards before passing it to the LLM.
NeMo Guardrails: Managing Responses and Conversation Flow
Purpose: Enforces strict behavioral rules to ensure that the LLM’s responses remain accurate, relevant, and safe.
- Regulates topics: Controls allowed and restricted subject areas.
- Prevents harmful content: Filters out hallucinations, toxic responses, and misinformation.
- Redirects off-topic responses: Ensures that AI-generated replies stay relevant and within scope.
After the LLM generates a response, NeMo Guardrails reviews and modifies it as needed, ensuring that replies follow predefined ethical and business rules.
Llama Guard: Validating and Filtering Outputs
Purpose: Acts as the final layer of security, ensuring that all AI-generated content meets ethical and compliance standards.
- Removes inappropriate content: Filters out violence, discrimination, and misinformation.
- Eliminates hallucinations: Ensures that responses are factually accurate.
- Enforces security policies: Applies necessary compliance checks before presenting the final output to the user.
Once NeMo Guardrails refines the response, Llama Guard performs a final security and accuracy check before delivering the information to the user.
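The three-layer flow above can be summarized as a simple pipeline: input guard, LLM call, response rails, then a final output check. The sketch below is a structural illustration only; each guard function is a stand-in for the corresponding model (Prompt Guard, NeMo Guardrails, Llama Guard), and the toy keyword check does not reflect how those models actually detect threats.

```python
# Hypothetical sketch of the three-layer guard pipeline. The guard
# functions are stand-ins for Prompt Guard, NeMo Guardrails, and
# Llama Guard; the keyword check is a toy example, not the real method.

BLOCKED_PATTERNS = ("ignore previous instructions",)  # toy injection list

def prompt_guard(user_input: str) -> str:
    """Layer 1: block injection attempts, normalize the query."""
    if any(p in user_input.lower() for p in BLOCKED_PATTERNS):
        raise ValueError("Blocked: possible prompt injection")
    return user_input.strip()

def call_llm(prompt: str) -> str:
    """Stand-in for the self-hosted LLM call."""
    return f"Answer to: {prompt}"

def nemo_guardrails(response: str) -> str:
    """Layer 2: keep the response on allowed topics."""
    return response  # a real rail would rewrite or redirect off-topic text

def llama_guard(response: str) -> str:
    """Layer 3: final safety/compliance check before delivery."""
    return response  # a real check would filter unsafe content

def answer(user_input: str) -> str:
    """Chain all three layers around the model call."""
    return llama_guard(nemo_guardrails(call_llm(prompt_guard(user_input))))
```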
Narrow AI-Based Capabilities
Automated Link Generation
One of the key aspects of governance is making data assets comprehensible and accessible. DataGalaxy uses proprietary machine learning models to detect lexical similarities and suggest meaningful links between objects.
- Implementation links: Between glossary objects and dictionary objects.
- Usage links: Between glossary objects and usage objects.
- Available to users with editing rights in relevant sections.
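As a rough illustration of lexical-similarity link suggestion, the sketch below scores label pairs with a simple string-similarity ratio and proposes a link when the score exceeds a threshold. This is an assumption-laden toy: DataGalaxy's models are proprietary and considerably more sophisticated than a raw string comparison.

```python
# Illustrative sketch of lexical-similarity link suggestion using a
# plain string-similarity score. The threshold and scoring choice are
# assumptions; the actual DataGalaxy models are proprietary ML models.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive lexical similarity between two labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def suggest_links(glossary_terms, dictionary_objects, threshold=0.7):
    """Suggest implementation links when labels are lexically close."""
    suggestions = []
    for term in glossary_terms:
        for obj in dictionary_objects:
            score = similarity(term, obj)
            if score >= threshold:
                suggestions.append((term, obj, round(score, 2)))
    return suggestions
```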
Data Classification: PII Detection
Our AI helps classify Personally Identifiable Information (PII) based on an object’s metadata (technical label, description, summary, tags, and keywords).
- AI categorizes objects into four PII groups.
- Users can accept or reject classifications, helping refine model accuracy.
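The metadata-driven classification can be pictured as below. Note that the group names and keyword rules here are hypothetical placeholders: the document does not name the four PII groups, and the actual feature uses a trained model rather than keyword matching.

```python
# Hypothetical sketch of metadata-based PII classification. The group
# names and keyword rules are illustrative placeholders, not
# DataGalaxy's actual taxonomy or model (which is a trained classifier).

PII_RULES = {
    "direct_identifier":   ("email", "ssn", "passport", "full name"),
    "contact_information": ("phone", "address", "zip"),
    "sensitive":           ("health", "religion", "ethnicity"),
}

def classify_pii(metadata: dict) -> str:
    """Assign a PII group from an object's metadata fields."""
    text = " ".join(
        str(metadata.get(field, ""))
        for field in ("technical_label", "description", "summary",
                      "tags", "keywords")
    ).lower()
    for group, keywords in PII_RULES.items():
        if any(k in text for k in keywords):
            return group
    return "not_pii"  # fourth group: no PII detected
```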
Tag Suggestions
Based on an object’s metadata, AI automatically generates tags for business classification.
Automatic Glossary Generation
AI-driven automation simplifies the process of creating and managing glossary entries.
- Detects and adds objects from data dictionaries and usage applications.
- Establishes links between glossary terms and relevant objects.
- Identifies duplicate objects and suggests merges for consistency.
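The duplicate-detection step can be sketched as grouping terms whose normalized labels collide, as below. This is an assumption for illustration only; the actual feature relies on proprietary similarity models rather than exact normalized matching.

```python
# Illustrative sketch of duplicate detection for glossary entries,
# assuming duplicates are terms whose normalized labels collide. The
# real feature uses proprietary similarity models.

def normalize(label: str) -> str:
    """Lowercase and strip non-alphanumeric characters from a label."""
    return "".join(ch for ch in label.lower() if ch.isalnum())

def find_duplicates(terms):
    """Group glossary terms that normalize to the same key."""
    groups = {}
    for term in terms:
        groups.setdefault(normalize(term), []).append(term)
    return [g for g in groups.values() if len(g) > 1]
```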
AI Capabilities Disclaimer
All AI functionalities are designed as automated assistance tools to support users by providing information based on:
- A client’s data catalog
- Other client-approved information
- DataGalaxy’s documentation
- Best practices and open-source knowledge
While we strive for accuracy, we cannot guarantee that all AI-generated responses are complete, precise, or up-to-date. Users should treat AI suggestions as assistance tools rather than definitive answers.
Bias Mitigation and AI Governance
At DataGalaxy, we are committed to ensuring the reliable and fair use of artificial intelligence. Our systems are designed to inherently minimize bias and are supported by continuous monitoring and automated correction mechanisms.
Our key measures include:
- Ongoing Model Monitoring: We leverage specialized tools to continuously track LLM performance, detect potential drifts, and ensure consistent, high-quality responses.
- Automated Correction via AI Guards: Our protection systems (Prompt Guard, NeMo Guardrails, Llama Guard) actively filter and adjust model outputs to mitigate unintended biases and enhance response accuracy.
- Monitoring of General AI Models: As we do not train LLMs ourselves, we closely monitor vendor updates and release notes to identify and anticipate any potential bias or non-compliant behavior.
- Periodic Retraining for Narrow AI: For our domain-specific AI systems (classification, PII detection, link suggestions, and glossary management), we conduct periodic retraining sessions to resample data and prevent bias accumulation due to evolving business contexts.
These mechanisms are embedded into our system design, ensuring a trustworthy AI aligned with best practices and compliant with regulatory requirements for fairness and transparency.
Explainability for AI Decisions
At DataGalaxy, we prioritize transparency in AI-driven processes and provide users with the means to:
- Access AI Decision Rationales: We explain how each model works and which criteria are taken into account.
- Request Human Review: If an automated decision affects a critical business process, users can request a review by a DataGalaxy expert. Please contact our support team to be redirected to an expert.
- Stay Informed on Model Updates: Any major changes to AI models and their potential impact will be communicated to affected users.
User Responsibility
Users are solely responsible for any actions taken based on AI-generated suggestions. It is essential to interpret recommendations with caution and sound judgment.
Contact Us
For questions, comments, or requests regarding our AI tools, data processing, or privacy, you can contact our Data Protection Officer:
- By email: dpo@datagalaxy.com
- By mail: DataGalaxy, Data Protection Officer, 47 rue Vivienne, 75002 Paris, France
As per applicable regulations, proof of identity may be required. Please include your full name, email address, and the nature of your request. Responses will be provided within one (1) month, extendable to two (2) months for complex inquiries.