Version: 95.2

Real-time Data Stores: Pinot, ClickHouse & StarRocks

This stack provides a development environment with a choice of three high-performance analytical data stores: Apache Pinot, ClickHouse, and StarRocks. It uses Docker Compose profiles so that you can select and launch only the system you need for your real-time analytics workload.

📌 Description

This architecture is designed for developers and engineers to explore and build applications on top of leading data technologies. The name compose-store.yml reflects its purpose as a catalog of specialized storage systems for enabling low-latency analytical queries on large datasets.

  • Apache Pinot is purpose-built for user-facing, real-time analytics requiring sub-second query latency.
  • ClickHouse is a general-purpose, high-performance OLAP database, deployed here with the HyperDX stack, which provides a web UI and an ingestion endpoint to accelerate development.
  • StarRocks delivers a modern, high-concurrency OLAP experience for a wide range of analytical workloads.

A standalone ZooKeeper service is included to provide the necessary coordination for the Apache Pinot cluster.


🔑 Key Components

This stack is organized into three independent profiles. To launch a specific system, use the --profile flag (e.g., docker compose --profile pinot -f ./compose-store.yml up).
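
As a minimal sketch, the helper below assembles the command shown above for any of the three profiles so that it can be scripted. The profile names and the compose file path come from this page; the function name `compose_up_command` and the `detach` option are illustrative assumptions, not part of the stack.

```python
# Profile names as documented on this page.
PROFILES = ("pinot", "clickhouse", "starrocks")


def compose_up_command(profile, compose_file="./compose-store.yml", detach=False):
    """Build the `docker compose ... up` argument list for one profile.

    Hypothetical helper: it only builds the argument list and does not
    run Docker itself.
    """
    if profile not in PROFILES:
        raise ValueError(f"unknown profile {profile!r}; expected one of {PROFILES}")
    cmd = ["docker", "compose", "--profile", profile, "-f", compose_file, "up"]
    if detach:
        cmd.append("-d")  # run the containers in the background
    return cmd
```

To actually start a profile, the list can be passed to `subprocess.run(compose_up_command("pinot", detach=True), check=True)`, which requires Docker and the compose file to be present.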

🔵 Apache Pinot (--profile pinot)

A distributed, real-time OLAP data store designed for ultra-low-latency analytics at scale.

  • ZooKeeper (zookeeper): Manages cluster state and coordination for Pinot.
  • Pinot Controller (pinot-controller): Coordinates the cluster, handling administration and schema management.
  • Pinot Broker (pinot-broker): The query gateway for clients.
    • Query Endpoint: http://localhost:18099
  • Pinot Server (pinot-server): Hosts data segments and executes query fragments.
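
The broker endpoint above can be exercised with a short script. The sketch below uses only the Python standard library and posts a SQL statement to the broker's `/query/sql` endpoint as a JSON body. The function names are illustrative, and the table name in the usage note is hypothetical; it assumes a table has already been created in your cluster.

```python
import json
import urllib.request

# Broker query endpoint as listed above.
PINOT_BROKER = "http://localhost:18099"


def build_pinot_request(sql, broker=PINOT_BROKER):
    """Prepare a POST to the broker's /query/sql endpoint with a JSON body."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{broker}/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_pinot_query(sql, broker=PINOT_BROKER, timeout=10):
    """Send the query and return the decoded JSON response."""
    request = build_pinot_request(sql, broker)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With the pinot profile running, `run_pinot_query("SELECT COUNT(*) FROM myTable")` would return the broker's JSON response, where `myTable` stands in for a table you have defined. Result rows are normally found under the `resultTable` key.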

🟡 ClickHouse for Real-Time Analytics (--profile clickhouse)

Deploys the powerful ClickHouse OLAP engine with a convenient UI and ingestion layer for building analytics applications.

  • ClickHouse Server (ch-server): The core of this profile. A high-performance, column-oriented DBMS designed for real-time analytical queries on massive datasets.
    • HTTP API: http://localhost:8123
  • HyperDX App (ch-app): Provides a web UI and exploration tool for interacting with ClickHouse. Use it to run SQL queries, visualize results, and manage your analytics environment.
  • HyperDX OTEL Collector (ch-otel-collector): An example of a high-performance data ingestion endpoint (OpenTelemetry Collector) capable of feeding real-time data into ClickHouse.
  • MongoDB (ch-db): A necessary dependency that acts as a metadata store for the HyperDX UI.
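
As a quick check of the HTTP API listed above, the following sketch sends a SQL statement in the body of a POST request and asks for newline-delimited JSON through the `default_format` URL parameter. The function names are illustrative. The sketch also assumes the server accepts unauthenticated requests from localhost, which may not hold for this deployment; if credentials are configured, they would need to be added to the request.

```python
import urllib.parse
import urllib.request

# ClickHouse HTTP API endpoint as listed above.
CLICKHOUSE_HTTP = "http://localhost:8123"


def build_clickhouse_request(sql, base_url=CLICKHOUSE_HTTP, output_format="JSONEachRow"):
    """Prepare a POST whose body is the SQL text; the format goes in the URL."""
    params = urllib.parse.urlencode({"default_format": output_format})
    return urllib.request.Request(
        f"{base_url}/?{params}",
        data=sql.encode("utf-8"),
        method="POST",
    )


def run_clickhouse_query(sql, base_url=CLICKHOUSE_HTTP, timeout=10):
    """Send the query and return the raw response text."""
    request = build_clickhouse_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the clickhouse profile running, `run_clickhouse_query("SELECT 1")` is a simple way to confirm that the server is reachable.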

🟢 StarRocks (--profile starrocks)

A next-generation, high-performance analytical data store designed for a wide range of low-latency OLAP scenarios.

  • StarRocks Frontend (FE) (starrocks-fe): Manages metadata, query planning, and client connections.
  • StarRocks Backend (BE) (starrocks-be): Stores data and executes the physical query plans produced by the FE.
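
Because the FE handles client connections over the MySQL protocol, a standard `mysql` client can be used to reach it. This page does not list the port that the compose file publishes, so the sketch below assumes StarRocks' default FE query port (9030) and the default `root` user. Both are assumptions to verify against compose-store.yml, and the helper name is illustrative.

```python
import shlex


def starrocks_client_command(host="127.0.0.1", port=9030, user="root"):
    """Build the argument list for connecting with the `mysql` CLI.

    The port and user are assumed StarRocks defaults, not values taken
    from this stack's compose file.
    """
    return ["mysql", "-h", host, "-P", str(port), "-u", user]


def starrocks_client_command_line(**kwargs):
    """Return the same command as a single shell-quoted string."""
    return shlex.join(starrocks_client_command(**kwargs))
```

Calling `starrocks_client_command_line()` yields a command line you can paste into a terminal once the starrocks profile is up. Pass `port=` if the compose file maps the FE to a different host port.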

🧰 Use Cases

For Apache Pinot

  • Real-Time Dashboards: Power interactive dashboards requiring sub-second query latency on constantly updating datasets.
  • User-Facing Analytics: Embed analytics directly into applications where users can explore data with immediate feedback.
  • Anomaly & Threat Detection: Query streaming event data in near real-time to identify patterns and outliers quickly.

For ClickHouse

  • Real-Time Business Intelligence (BI): Power interactive dashboards and reports directly on raw, large-scale data without pre-aggregation.
  • Interactive Big Data Analytics: Run complex, ad-hoc analytical queries on terabytes of data (e.g., web clickstreams, IoT sensor data, financial transactions) and get results in seconds.
  • Log and Event Data Analysis: Build powerful search and aggregation systems on machine-generated data for operational intelligence or security analytics.
  • Streaming Data Analytics: Ingest data from real-time sources like Kafka and make it available for immediate querying.

For StarRocks

  • Modern Data Warehousing: Serve as the high-performance core of a modern data warehouse, enabling fast BI reporting and low-latency dashboards.
  • Ad-Hoc & Interactive Analytics: Empower analysts to run complex, exploratory queries on large datasets without long wait times.
  • Unified Analytics: Query data from various sources, including data lakes, from a single, fast engine.