Real-time Data Stores: Pinot, ClickHouse & StarRocks
This stack provides a development environment with a choice of three high-performance analytical data stores: Apache Pinot, ClickHouse, and StarRocks. It uses Docker Compose profiles so that you can select and launch only the system you need for your real-time analytics workload.
📌 Description
This architecture is designed for developers and engineers to explore and build applications on top of leading data technologies. The name compose-store.yml reflects its purpose as a catalog of specialized storage systems for enabling low-latency analytical queries on large datasets.
- Apache Pinot is purpose-built for user-facing, real-time analytics requiring sub-second query latency.
- ClickHouse is a general-purpose, high-performance OLAP database, deployed here with the HyperDX stack, which provides a web UI and an ingestion endpoint to accelerate development.
- StarRocks delivers a modern, high-concurrency OLAP experience for a wide range of analytical workloads.
A standalone ZooKeeper service is included to provide the necessary coordination for the Apache Pinot cluster.
🔑 Key Components
This stack is organized into three independent profiles. To launch a specific system, use the --profile flag (e.g., docker compose --profile pinot -f ./compose-store.yml up).
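As a rough sketch, the profile selection described above could be wrapped in a small Python helper. The profile names and the compose file path come from this document; the function names compose_up_command and launch, and the choice to add the -d (detached) flag, are illustrative assumptions and not part of the stack.

```python
import subprocess

# Profile names as listed in this document.
PROFILES = ("pinot", "clickhouse", "starrocks")


def compose_up_command(profile, compose_file="./compose-store.yml", detach=True):
    """Build the `docker compose ... up` argument list for one profile."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    cmd = ["docker", "compose", "--profile", profile, "-f", compose_file, "up"]
    if detach:
        cmd.append("-d")  # run the containers in the background
    return cmd


def launch(profile):
    """Start the selected profile; raises CalledProcessError if Docker fails."""
    subprocess.run(compose_up_command(profile), check=True)
```

Calling launch("pinot") would then start only the Pinot services and ZooKeeper, leaving the other two systems down.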
🔵 Apache Pinot (--profile pinot)
A distributed, real-time OLAP data store designed for ultra-low-latency analytics at scale.
- ZooKeeper (zookeeper): Manages cluster state and coordination for Pinot.
- Pinot Controller (pinot-controller): The brain of the cluster, handling administration and schema management.
  - Admin UI/API: http://localhost:19000
- Pinot Broker (pinot-broker): The query gateway for clients.
  - Query Endpoint: http://localhost:18099
- Pinot Server (pinot-server): Hosts data segments and executes query fragments.
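Since the broker is the query gateway, a client can send SQL to it over HTTP. The following is a minimal sketch, assuming the broker exposes Pinot's POST /query/sql endpoint on the port listed above. The helper names are hypothetical, and any table you query must already exist in your cluster.

```python
import json
import urllib.request

# Broker query endpoint as listed in this document.
PINOT_BROKER_URL = "http://localhost:18099"


def build_pinot_query(sql, broker_url=PINOT_BROKER_URL):
    """Prepare a POST to the broker's /query/sql endpoint without sending it."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{broker_url}/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_pinot_query(sql, timeout=10):
    """Send the query to the broker and return the decoded JSON response."""
    with urllib.request.urlopen(build_pinot_query(sql), timeout=timeout) as resp:
        return json.load(resp)
```

With the pinot profile running, run_pinot_query("SELECT COUNT(*) FROM myTable") would return the broker's JSON response, where myTable is a placeholder for a table you have created.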
🟡 ClickHouse for Real-Time Analytics (--profile clickhouse)
Deploys the powerful ClickHouse OLAP engine with a convenient UI and ingestion layer for building analytics applications.
- ClickHouse Server (ch-server): The core of this profile. A high-performance, column-oriented DBMS designed for real-time analytical queries on massive datasets.
  - HTTP API: http://localhost:8123
- HyperDX App (ch-app): Provides a web UI and exploration tool for interacting with ClickHouse. Use it to run SQL queries, visualize results, and manage your analytics environment.
- HyperDX OTEL Collector (ch-otel-collector): An example of a high-performance data ingestion endpoint (OpenTelemetry Collector) capable of feeding real-time data into ClickHouse.
- MongoDB (ch-db): A necessary dependency that acts as a metadata store for the HyperDX UI.
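The ClickHouse HTTP API can also be called directly, without going through the HyperDX UI. The sketch below assumes the server on port 8123 accepts unauthenticated requests from the default user, which may not hold for this deployment; if credentials are required, they can be supplied through ClickHouse's X-ClickHouse-User and X-ClickHouse-Key headers. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request

# HTTP API endpoint as listed in this document.
CLICKHOUSE_URL = "http://localhost:8123"


def build_clickhouse_query(sql, url=CLICKHOUSE_URL, fmt="JSONEachRow"):
    """Prepare a POST with the SQL in the body; default_format sets the output format."""
    params = urllib.parse.urlencode({"default_format": fmt})
    return urllib.request.Request(
        url=f"{url}/?{params}",
        data=sql.encode("utf-8"),
        method="POST",
    )


def run_clickhouse_query(sql, timeout=10):
    """Send the query and return the raw response text."""
    with urllib.request.urlopen(build_clickhouse_query(sql), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the clickhouse profile running, run_clickhouse_query("SELECT version()") is a quick way to confirm that the server is reachable.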
🟢 StarRocks (--profile starrocks)
A next-generation, high-performance analytical data store designed for a wide range of low-latency OLAP scenarios.
- StarRocks Frontend (FE) (starrocks-fe): Manages metadata, query planning, and client connections.
  - Admin UI: http://localhost:8030
  - MySQL-compatible SQL Endpoint: localhost:9030
- StarRocks Backend (BE) (starrocks-be): Responsible for storing data and executing the physical query plans with high performance.
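Because the FE speaks the MySQL protocol, any MySQL client library should be able to connect to it. The sketch below uses the third-party PyMySQL package (an assumption; it is not part of this stack) and assumes the StarRocks default of a root user with an empty password, which this deployment may override. The helper names are hypothetical.

```python
def starrocks_connect_kwargs(host="127.0.0.1", port=9030, user="root", password=""):
    """Connection settings for the FE's MySQL-compatible endpoint."""
    return {"host": host, "port": port, "user": user, "password": password}


def run_starrocks_query(sql):
    """Run one statement against the FE and return all result rows."""
    import pymysql  # third-party driver, assumed installed: pip install pymysql

    conn = pymysql.connect(**starrocks_connect_kwargs())
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()  # always release the connection, even on errors
```

With the starrocks profile running, run_starrocks_query("SHOW DATABASES") would list the databases known to the FE.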
🧰 Use Cases
For Apache Pinot
- Real-Time Dashboards: Power interactive dashboards requiring millisecond query latency on constantly updating datasets.
- User-Facing Analytics: Embed analytics directly into applications where users can explore data with immediate feedback.
- Anomaly & Threat Detection: Query streaming event data in near real-time to identify patterns and outliers quickly.
For ClickHouse
- Real-Time Business Intelligence (BI): Power interactive dashboards and reports directly on raw, large-scale data without pre-aggregation.
- Interactive Big Data Analytics: Run complex, ad-hoc analytical queries on terabytes of data (e.g., web clickstreams, IoT sensor data, financial transactions) and get results in seconds.
- Log and Event Data Analysis: Build powerful search and aggregation systems on machine-generated data for operational intelligence or security analytics.
- Streaming Data Analytics: Ingest data from real-time sources like Kafka and make it available for immediate querying.
For StarRocks
- Modern Data Warehousing: Serve as the high-performance core of a modern data warehouse, enabling fast BI reporting and low-latency dashboards.
- Ad-Hoc & Interactive Analytics: Empower analysts to run complex, exploratory queries on large datasets without long wait times.
- Unified Analytics: Query data from various sources, including data lakes, from a single, fast engine.