Version: 95.2

Unified Data Discovery & Governance with OpenMetadata

This stack deploys OpenMetadata, an all-in-one open-source platform for data discovery, data lineage, data quality, observability, and governance. It provides a centralized metadata store, enabling teams to get a comprehensive view of their data assets. This entire environment is deployed under the omt profile.

📌 Description

This architecture is designed to create a single source of truth for all your data. OpenMetadata pulls metadata from your data sources through its connectors (databases, data warehouses, BI tools, and so on) to build an interconnected map of your data landscape.

Unlike systems that focus purely on lineage, OpenMetadata is a comprehensive data workspace. It allows users to not only see how data is created and used (lineage) but also to search for data across the entire organization (discovery), understand its meaning through a business glossary (governance), and verify its trustworthiness through data quality tests. It's the central hub for collaboration and understanding around data.


🔑 Key Components

This stack is organized under a single profile. To launch it, run:

    docker compose --profile omt -f ./compose-metadata.yml up -d
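For scripted setups, the launch command above can be wrapped in a small helper. The following is a minimal Python sketch, assuming Docker Compose v2 is installed and the script runs from the directory that contains compose-metadata.yml. The helper names compose_command and launch are illustrative and not part of the stack.

```python
import shlex
import subprocess

COMPOSE_FILE = "./compose-metadata.yml"  # file name taken from the launch command
PROFILE = "omt"                          # profile name taken from the launch command


def compose_command(*args: str) -> list[str]:
    """Build a `docker compose` invocation scoped to the omt profile."""
    return ["docker", "compose", "--profile", PROFILE, "-f", COMPOSE_FILE, *args]


def launch(execute: bool = False) -> list[str]:
    """Start the stack detached, then list its containers.

    With execute=False (the default) nothing is run and the commands are
    only returned as strings, so the helper is safe to call without Docker.
    """
    steps = [compose_command("up", "-d"), compose_command("ps")]
    if execute:
        for step in steps:
            subprocess.run(step, check=True)  # raises CalledProcessError on failure
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for line in launch(execute=False):
        print(line)
```

Calling launch(execute=True) runs the same two commands for real; the dry-run default makes it easy to review what will be executed first.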

🔵 OpenMetadata Platform (--profile omt)

  • omt-server (openmetadata/server): The heart of the platform. This service hosts the OpenMetadata UI, the central metadata API, and all the core logic for managing data assets.
    • The UI is exposed on port 8585.
  • omt-ingestion (openmetadata/ingestion): An Airflow-based service dedicated to running metadata ingestion workflows. You configure and trigger these workflows from the OpenMetadata UI to connect to your data sources (such as databases, dashboards, or messaging systems) and pull their metadata into the platform.
    • The Airflow UI (useful for debugging ingestion runs) is exposed on port 8080.
  • omt-db (postgresql): The primary metadata store. This PostgreSQL database persists all the metadata for your data assets, including schemas, descriptions, tags, ownership, lineage information, and data quality test results.
  • omt-es (elasticsearch): The search engine that powers OpenMetadata's discovery features. It indexes all metadata to provide a fast, powerful search experience, allowing users to quickly find relevant data assets.
  • omt-migrate (openmetadata/server): An initialization container that runs once upon the first startup. Its job is to prepare and migrate the PostgreSQL database schema to the version required by the omt-server, ensuring the application starts correctly.
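Once the containers are up, a plain TCP probe is a quick way to confirm that the two ports listed above accept connections. The following is a minimal Python sketch, assuming the ports are published on localhost; the function names are illustrative. An open port only shows that something is listening, not that the service has finished starting.

```python
import socket

# Ports as listed in the component descriptions; the host is an assumption.
SERVICES = {
    "omt-server (OpenMetadata UI and API)": 8585,
    "omt-ingestion (Airflow UI)": 8080,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Probe every service in SERVICES and report which ones are reachable."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```

omt-db and omt-es are left out because this page does not state which ports, if any, they publish.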

🧰 Use Cases

Centralized Data Discovery

  • Empower data consumers (analysts, data scientists, and others) to find relevant and trusted data assets through a single search interface, regardless of where the data is physically located.

End-to-End Data Lineage

  • Automatically ingest and visualize lineage from a wide range of sources (e.g., Flink, Spark, dbt, BI tools). This allows you to understand data dependencies, perform impact analysis for changes, and debug data issues from source to destination.
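To make the impact-analysis idea concrete: lineage can be viewed as a directed graph from upstream assets to the assets derived from them, and the impact of a change is everything reachable downstream. The following is a minimal, self-contained Python sketch of that traversal. It does not call OpenMetadata, and the asset names and the downstream_of helper are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets built directly from it.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.daily_revenue", "marts.customer_ltv"],
    "marts.daily_revenue": ["dashboard.revenue"],
    "marts.customer_ltv": [],
    "dashboard.revenue": [],
}


def downstream_of(asset: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every asset reachable downstream of `asset`, in breadth-first order."""
    seen = {asset}  # marking the start node also guards against cycles
    order = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # Everything that could be affected by a change to staging.orders.
    print(downstream_of("staging.orders", LINEAGE))
```

Breadth-first order lists the closest dependents first, which is usually the order in which a change would be reviewed.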

Data Governance and Collaboration

  • Establish a Business Glossary with standardized definitions for key business terms.
  • Assign clear ownership to data assets to promote accountability.
  • Use Tags and Classifications to categorize data, identify sensitive information (PII), and enforce access policies.

Data Quality Monitoring

  • Define and schedule data quality tests directly within the OpenMetadata UI. Monitor the health and reliability of your most critical datasets over time and build trust in your data.

Enhanced Data Documentation

  • Create a collaborative environment where users can enrich metadata with descriptions, comments, and documentation, turning tribal knowledge into a shared, durable asset for the entire organization.