Version: 95.1

Factor House Local Overview

Factor House Local is a collection of pre-configured, modular Docker Compose environments that demonstrate modern data platform architectures. Each environment is built around a specific use case and bundles the open-source technologies that use case needs.

This includes event streaming with Kafka, data processing with Flink and Spark, a unified data lakehouse built on Iceberg, and a choice of real-time data stores: Pinot, ClickHouse, or StarRocks. Governance and observability tools round out the platform: Marquez (the OpenLineage reference implementation) for data lineage, OpenMetadata for data discovery, and the Prometheus stack for system telemetry.

These environments also include two enterprise-grade tools from Factor House: Kpow, for Kafka management and control, and Flex, for Flink management.


Kafka Development & Monitoring with Kpow

Build and manage event-driven systems with a full-featured Kafka environment that runs without ZooKeeper. This stack includes a three-node KRaft-based cluster, Schema Registry, and Kafka Connect. Kpow complements the setup with deep visibility, streamlined management, and advanced monitoring for the entire Kafka ecosystem.

Unified Analytics Platform with Flex

This integrated environment combines an Apache Flink cluster for real-time stream processing with an Apache Spark engine for large-scale batch computation. Both operate on a unified data lakehouse built with Apache Iceberg and MinIO object storage. A central Hive Metastore, backed by PostgreSQL, acts as a shared catalog, so both engines see the same tables and metadata. Flex adds enterprise-grade management to the Flink environment, making this stack well suited to building end-to-end pipelines on a single, ACID-compliant platform.
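
To make the shared-catalog idea more concrete, the following Python sketch assembles the Spark settings that typically point an Iceberg catalog at a Hive Metastore and an S3-compatible store such as MinIO. The configuration keys are standard Iceberg and Hadoop S3A properties, but the catalog name `demo`, the `hive-metastore:9083` and `minio:9000` endpoints, and the `s3a://warehouse/` path are illustrative assumptions, not values taken from Factor House Local; check the compose files for the actual ones.

```python
def iceberg_hive_catalog_conf(
    catalog: str = "demo",                                # hypothetical catalog name
    metastore_uri: str = "thrift://hive-metastore:9083",  # assumed service host/port
    warehouse: str = "s3a://warehouse/",                  # assumed bucket path
    s3_endpoint: str = "http://minio:9000",               # assumed MinIO endpoint
) -> dict:
    """Return Spark properties for an Iceberg catalog backed by a Hive Metastore."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg-specific SQL extensions in the Spark session.
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog and tells Iceberg to use the Hive Metastore.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hive",
        f"{prefix}.uri": metastore_uri,
        f"{prefix}.warehouse": warehouse,
        # S3A settings commonly needed for MinIO (path-style bucket access).
        "spark.hadoop.fs.s3a.endpoint": s3_endpoint,
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def as_spark_submit_args(conf: dict) -> list:
    """Flatten the properties into repeated `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    for arg in as_spark_submit_args(iceberg_hive_catalog_conf()):
        print(arg)
```

Because Flink can be pointed at the same metastore URI and warehouse, a table written by one engine becomes visible to the other through the shared catalog.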

Real-time Data Stores

Deploy a high-performance, real-time data store tailored to your needs. This stack offers a choice of three OLAP engines, each launched through its own Docker Compose profile: Apache Pinot for user-facing, low-latency analytics; ClickHouse for general-purpose, high-speed BI and log analysis; and StarRocks for high-concurrency data warehousing. Each provides a foundation for building fast, interactive data applications.
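
As a rough illustration of how profile-based selection works, the sketch below builds the `docker compose` command line for one or more profiles. The `--profile`, `-f`, `up`, and `-d` options are standard Docker Compose CLI features; the compose file name `compose-store.yml` and the profile name `pinot` are assumptions made for this example and may differ from the names used in the repository.

```python
import shlex


def compose_up_command(compose_file: str, profiles: list) -> list:
    """Build an argv list that starts only the services in the given profiles."""
    if not profiles:
        raise ValueError("select at least one profile")
    cmd = ["docker", "compose"]
    for profile in profiles:
        # --profile may be repeated to enable several profiles at once.
        cmd += ["--profile", profile]
    cmd += ["-f", compose_file, "up", "-d"]
    return cmd


if __name__ == "__main__":
    # Hypothetical file and profile names, for illustration only.
    argv = compose_up_command("compose-store.yml", ["pinot"])
    print(shlex.join(argv))
    # prints: docker compose --profile pinot -f compose-store.yml up -d
```

Returning an argv list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.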

Observability & Data Lineage

Establish a robust foundation for understanding your data and systems. This stack is divided into two selectable profiles: Data Lineage (lineage) featuring Marquez, the OpenLineage reference implementation, to track data provenance end-to-end; and Telemetry (telemetry) featuring the Prometheus, Grafana, and Alertmanager suite for comprehensive system metrics, dashboards, and alerting.
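
To show what "tracking data provenance" looks like on the wire, here is a minimal sketch of an OpenLineage run event of the kind Marquez ingests. The field names follow the OpenLineage `RunEvent` structure; the job name, the `demo` namespace, the producer URI, the spec version in `schemaURL`, and the `localhost:5000` address are assumptions for illustration and should be checked against the running stack.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumed address: Marquez conventionally serves its lineage API on port 5000.
MARQUEZ_LINEAGE_URL = "http://localhost:5000/api/v1/lineage"


def build_run_event(job_name: str, namespace: str = "demo",
                    event_type: str = "START") -> dict:
    """Build a minimal OpenLineage run event as a plain dictionary."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [],   # datasets read by the job (empty in this sketch)
        "outputs": [],  # datasets written by the job (empty in this sketch)
        # Hypothetical producer URI identifying the emitting code.
        "producer": "https://example.com/factorhouse-local-sketch",
        # Spec version is an assumption; use the one your client targets.
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json"
                     "#/definitions/RunEvent",
    }


if __name__ == "__main__":
    event = build_run_event("orders_etl")  # hypothetical job name
    print(f"POST {MARQUEZ_LINEAGE_URL}")
    print(json.dumps(event, indent=2))
```

The sketch only prints the payload. Submitting it would mean sending the JSON in an HTTP POST with a `Content-Type: application/json` header, which in practice the OpenLineage integrations for Spark and Flink handle for you.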

Data Discovery & Governance with OpenMetadata

Create a single source of truth for all your data assets with OpenMetadata. This all-in-one platform provides a centralized hub for data discovery, end-to-end lineage, data quality monitoring, and collaborative governance. By ingesting metadata from across your data ecosystem, it empowers your teams to find, understand, and trust your data.