Version: 95.2

Data Lineage & Observability with OpenLineage and Prometheus

This Docker Compose stack provides two independent but complementary environments: a data lineage service powered by Marquez, and a telemetry and observability suite built on Prometheus, Grafana, and Alertmanager. Docker Compose profiles let you launch either stack on its own, so you run only the services you need.

📌 Description

This architecture is designed for data professionals who need to understand their data's journey (lineage) and monitor the health of their systems (telemetry). It separates these concerns into two distinct profiles:

  • The lineage profile deploys Marquez, the reference implementation of the OpenLineage standard. It collects metadata to build a living map of how datasets are produced and consumed, which is invaluable for impact analysis, data governance, and debugging complex data pipelines.
  • The telemetry profile deploys the widely used Prometheus/Grafana stack for metrics collection, visualization, and alerting, so you can track the reliability and performance of your platform.

🔑 Key Components

This stack is organized into two independent profiles. To launch a specific stack, use the --profile flag.

🧬 Data Lineage Stack (--profile lineage)

Launch this stack with docker compose --profile lineage -f ./compose-obsv.yml up -d.

  • marquez-api (marquezproject/marquez:0.51.1): The core Marquez backend service. It provides a RESTful API compliant with the OpenLineage standard, so it can receive lineage metadata from integrated tools such as Flink, Spark, and Airflow.
  • marquez-web (marquezproject/marquez-web:0.51.1): The web interface for Marquez, which visualizes the collected OpenLineage data. It allows users to explore interactive data lineage graphs and trace the journey of their data.
    • UI is exposed on port: 3003
  • marquez-db (postgres:14): A PostgreSQL database that serves as the backend for Marquez, storing all metadata on jobs, datasets, historical runs, and their relationships.
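
As a rough illustration of how a tool might report metadata to marquez-api, the sketch below builds a minimal OpenLineage run event and posts it using only Python's standard library. Several things here are assumptions rather than facts about this stack: the base URL uses port 5000 (Marquez's usual default API port; check compose-obsv.yml for the port actually published), and the namespace, job, dataset, and producer values are hypothetical placeholders.

```python
import json
from datetime import datetime, timezone
from urllib import request

# Assumption: Marquez's usual default API port. This stack may publish another one.
MARQUEZ_URL = "http://localhost:5000/api/v1/lineage"


def build_run_event(event_type, run_id, job_name, inputs=(), outputs=(), namespace="demo"):
    """Return a minimal OpenLineage run event as a plain dict."""

    def dataset(name):
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},  # should be a UUID shared by all events of one run
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [dataset(n) for n in inputs],
        "outputs": [dataset(n) for n in outputs],
        "producer": "https://example.com/hypothetical-producer",  # placeholder
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent",
    }


def send_event(event, url=MARQUEZ_URL):
    """POST the event to Marquez and return the HTTP status code."""
    req = request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=10) as resp:
        return resp.status
```

With the lineage profile running, sending a START event followed by a COMPLETE event that share one runId (for example one generated with uuid.uuid4()) should be enough for the job and its datasets to appear in the Marquez UI, assuming the API port above matches your setup.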

📊 Telemetry & Observability Stack (--profile telemetry)

Launch this stack with docker compose --profile telemetry -f ./compose-obsv.yml up -d.

  • prometheus (prom/prometheus:v3.5.0): A time-series database that scrapes metrics from configured endpoints and stores them. In this stack it is set up to scrape other services in the ecosystem (such as Kpow or Flink).
    • UI is exposed on port: 19090
  • grafana (grafana/grafana:12.1.1): A visualization platform for building dashboards from the metrics stored in Prometheus. This service is preconfigured to provision datasources and dashboards automatically.
    • UI is exposed on port: 3004 (login: admin/admin)
  • alertmanager (prom/alertmanager:v0.28.1): Manages alerts sent by Prometheus. It deduplicates, groups, and routes alerts to notification channels such as email or Slack.
    • UI is exposed on port: 19093
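
To check that Prometheus is actually scraping its targets, you can query its HTTP API on the exposed port. The following is a minimal sketch using Python's standard library; it assumes the stack is reachable on localhost and that the query returns an instant vector (as the built-in `up` metric does). The helper names are illustrative only.

```python
import json
from urllib import parse, request

PROMETHEUS_URL = "http://localhost:19090"  # port exposed by this stack; host is assumed


def instant_query_url(expr, base_url=PROMETHEUS_URL):
    """Build the URL for a Prometheus instant query."""
    return f"{base_url}/api/v1/query?{parse.urlencode({'query': expr})}"


def parse_vector(body):
    """Return (labels, value) pairs from an instant-vector query response."""
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error', 'unknown error')}")
    # Each sample is [timestamp, "value-as-string"]; convert the value to float.
    return [
        (series["metric"], float(series["value"][1]))
        for series in body["data"]["result"]
    ]


def query(expr, base_url=PROMETHEUS_URL):
    """Run an instant query against Prometheus and return (labels, value) pairs."""
    with request.urlopen(instant_query_url(expr, base_url), timeout=10) as resp:
        return parse_vector(json.load(resp))
```

For example, `query("up")` returns one pair per scrape target, with a value of 1.0 for targets that were reachable on the last scrape and 0.0 for those that were not.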

💡 Note: You can launch both stacks using:

docker compose --profile lineage --profile telemetry -f ./compose-obsv.yml up -d

🧰 Use Cases

For the Data Lineage Stack

  • Automated Data Lineage & Provenance Tracking: Use OpenLineage integrations to automatically capture lineage metadata from your data pipelines. Visualize the origin, movement, and transformations of data across your ecosystem in the Marquez UI.
  • Impact and Root Cause Analysis: When a pipeline fails or data quality issues arise, use the lineage graph in Marquez to quickly identify the upstream root cause and assess the downstream impact.
  • Data Governance and Compliance: Maintain a detailed, historical record of dataset versions, schema changes, and job execution history, which is essential for auditing and understanding the data lifecycle.
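
Downstream impact analysis boils down to a graph traversal: starting from a dataset, follow the edges to every job and dataset that depends on it. The sketch below shows the idea with a breadth-first search over a plain edge list. The edge list is hypothetical, hand-written sample data, not output from Marquez; in practice you would derive the edges from the lineage graph that Marquez serves.

```python
from collections import deque


def downstream(edges, start):
    """Return every node reachable from `start`, in breadth-first order."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)

    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # guard against revisiting shared descendants
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: datasets and jobs alternate along each path.
EDGES = [
    ("dataset:raw_orders", "job:clean_orders"),
    ("job:clean_orders", "dataset:orders"),
    ("dataset:orders", "job:daily_revenue"),
    ("dataset:orders", "job:customer_ltv"),
    ("job:daily_revenue", "dataset:revenue_report"),
]

print(downstream(EDGES, "dataset:orders"))
# ['job:daily_revenue', 'job:customer_ltv', 'dataset:revenue_report']
```

Reversing each edge before calling `downstream` gives the upstream view, which is the direction to walk when looking for a root cause.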

For the Telemetry Stack (Prometheus & Grafana)

  • Centralized System Health Monitoring: Use Prometheus and Grafana to monitor the performance and health of your data services. Create dashboards to track API latency, database connections, and resource utilization.
  • Proactive Alerting on System Issues: Configure alerts in Prometheus and Alertmanager to be notified of potential problems, such as high CPU usage, low disk space, or failed service health checks, before they impact your data consumers.
  • Performance Analysis: Analyze time-series metrics to understand the performance characteristics of your data platform, identify bottlenecks, and optimize resource allocation.
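
Before relying on alerting, it can help to confirm that Alertmanager routes a notification the way you expect. One way to do that is to push a synthetic alert directly to its v2 API, as in the minimal sketch below. It assumes Alertmanager is reachable on localhost at the exposed port; the alert name, severity label, and summary text are hypothetical and should be adapted to whatever your routing rules match on.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib import request

ALERTMANAGER_URL = "http://localhost:19093/api/v2/alerts"  # port exposed by this stack


def build_test_alert(name, severity="warning", minutes=5, now=None):
    """Return a one-element alert list in the shape the v2 alerts endpoint accepts."""
    now = now or datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": name, "severity": severity},  # hypothetical labels
        "annotations": {"summary": "Synthetic alert for testing notification routing"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),  # alert resolves after this
    }]


def post_alerts(alerts, url=ALERTMANAGER_URL):
    """POST the alerts to Alertmanager and return the HTTP status code."""
    req = request.Request(
        url,
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=10) as resp:
        return resp.status
```

After calling `post_alerts(build_test_alert("SyntheticRoutingTest"))`, the alert should be visible in the Alertmanager UI until its end time passes, and any receiver whose route matches the labels should be notified.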