Prerequisites

Install Docker

The local cluster runs with Docker Compose, so you will need to install Docker.
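Before continuing, it can be worth confirming that both the Docker CLI and the Compose plugin are available. The following is a minimal sketch; the helper name require_cmd is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of the repository): succeeds only if the
# named command can be found on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Typical usage: check the Docker CLI first, then ask the Compose plugin
# for its version.
#   require_cmd docker && docker compose version
```

If the second command fails while the first succeeds, Docker is installed but the Compose plugin is probably missing.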

Once Docker is installed, clone this repository and run the following commands from the base path.

Clone this repository

git clone git@github.com:factorhouse/factorhouse-local.git

Change into the repository directory

cd factorhouse-local

Downloading dependencies

Core services like Flink, Spark, and Kafka Connect are designed to be modular and do not come bundled with the specific connectors and libraries needed to communicate with other systems like the Hive Metastore, Apache Iceberg, or S3.

The setup-env.sh script automates the process of downloading all the required dependencies and organizing them into a local deps directory. When the services are started with Docker Compose, this directory is mounted as a volume, injecting the libraries directly into each container's classpath.

View all downloaded dependencies

Kafka connectors

  • Confluent S3 Connector: For streaming data from Kafka to S3.
  • Debezium PostgreSQL Connector: For capturing row-level changes from a PostgreSQL database.
  • Amazon MSK Data Generator: A tool for generating sample data for testing.
  • Iceberg Kafka Connect: For sinking Kafka records into Apache Iceberg tables.

Flink connectors

  • Kafka SQL Connector: Enables Flink to read from and write to Kafka topics using SQL.
  • Avro Confluent Registry: Allows Flink to work with Avro schemas stored in Confluent Schema Registry.
  • Flink Faker: A connector for generating fake data streams within Flink, useful for development and testing.

Flink Hive dependencies

  • Hive SQL Connector: Allows Flink to connect to a Hive Metastore and query Hive tables.
  • Supporting Libraries: Includes Hive Exec, Antlr, and Thrift libraries necessary for the Hive integration to function.

Flink Iceberg/Parquet dependencies

  • Iceberg Flink Runtime: The core library for Flink to read from and write to Apache Iceberg tables.
  • Iceberg AWS Bundle: Provides AWS-specific integrations for Iceberg, like S3 file I/O.
  • Parquet SQL Formatter: Enables Flink to handle the Parquet file format.

Hadoop/Hive Metastore dependencies

  • Hadoop Libraries: A collection of core Hadoop modules (hadoop-common, hadoop-aws, hadoop-auth) and their dependencies (aws-java-sdk-bundle, guava, etc.) required for interacting with HDFS-compatible file systems like S3.
  • PostgreSQL JDBC Driver: Required for the Hive Metastore to communicate with its PostgreSQL backend database.

Spark Iceberg dependencies

  • Iceberg Spark Runtime: The core library for Spark to read from and write to Apache Iceberg tables.
  • Iceberg AWS Bundle: Provides AWS-specific integrations for Spark, enabling it to work with Iceberg tables on S3.

./resources/setup-env.sh
▶️ Downloading Kafka connectors...
⏳ Progress : [##################################################] 100%
✅ Download complete in 0m 27s!

▶️ Downloading Flink connectors...
⏳ Progress : [##################################################] 100%
✅ Download complete in 0m 4s!

▶️ Downloading Flink Hive dependencies...
⏳ Progress : [##################################################] 100%
✅ Download complete in 0m 11s!

▶️ Downloading Flink Iceberg/Parquet dependencies...
⏳ Progress : [##################################################] 100%
✅ Download complete in 0m 12s!

▶️ Downloading Hadoop/Hive Metastore dependencies...
⏳ Progress : [##################################################] 100%
✅ Download complete in 0m 36s!

▶️ Downloading Spark Iceberg dependencies...
⏳ Progress : [##################################################] 100%
✅ Download complete in 0m 11s!
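
Once the script finishes, the deps directory should contain the downloaded files. A quick sanity check can be sketched as follows; count_files is a hypothetical helper, and nothing is assumed here about the layout inside deps.

```shell
# Hypothetical helper: print the number of regular files under a directory,
# or fail if the directory does not exist.
count_files() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "directory not found: $dir" >&2
    return 1
  fi
  # tr strips the padding that some wc implementations add.
  find "$dir" -type f | wc -l | tr -d ' '
}

# Typical usage, from the repository root:
#   count_files deps
```

A count of zero, or a "directory not found" message, suggests the download did not complete.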

Update Kpow and Flex licenses

Both Kpow and Flex require valid licenses to run. You can get started in one of two ways:

To manage Kpow and Flex licenses effectively, it's strongly recommended to store the license files outside your main configuration and version control system (such as Git). This prevents accidental exposure of sensitive license details and makes updating or swapping licenses much simpler.

The Docker Compose files support this by letting you specify the path to each license file through environment variables set on your host machine before launching the services. They read these variables and pass the resulting paths to the env_file directive. If an environment variable is not set, a default path (usually within the resources directory) is used as a fallback.

Regardless of the edition, only a single license file is expected for each of Kpow and Flex.

  • KPOW_LICENSE: Specifies the path to the Kpow license file.
  • FLEX_LICENSE: Specifies the path to the Flex license file.
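
The fallback described above relies on the ${VARIABLE:-default} form, which Docker Compose interpolation shares with POSIX shells: the default applies when the variable is unset or empty. The sketch below reproduces that rule in plain shell so the effective path can be previewed. The function name resolve_license_path is hypothetical, and the default path in the usage comment is a placeholder rather than the repository's actual default.

```shell
# Hypothetical helper: print the first argument when it is non-empty,
# otherwise the second. This mirrors the ${VAR:-default} rule.
resolve_license_path() {
  echo "${1:-$2}"
}

# Typical usage (the default shown is a placeholder, not the real path):
#   resolve_license_path "${KPOW_LICENSE:-}" "resources/<default-license>.env"
```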

Example usage:

Imagine your Kpow license is stored at /home/<username>/.factorhouse/kpow-license.env. To instruct Docker Compose to use this specific file, you would set the environment variable on your host before running the compose command:

# Set the environment variable (syntax may vary slightly depending on your shell)
export KPOW_LICENSE=/home/<username>/.factorhouse/kpow-license.env

# Now run Docker Compose - it will use the path set above
docker compose -p kpow -f compose-kpow.yml up -d

By default, the Compose file is configured to deploy the Enterprise edition. See Running the Platform for instructions on how to configure it to run the Community edition instead.
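
Before launching the services, it may help to confirm that the license file exists and defines the expected keys. The following is a minimal sketch, assuming the file follows the format in the license file example below; check_license_file is a hypothetical helper, not something shipped with this repository.

```shell
# Hypothetical pre-flight check: verify that a license env file exists and
# defines every key shown in the license file example.
check_license_file() {
  file=$1
  if [ ! -f "$file" ]; then
    echo "license file not found: $file" >&2
    return 1
  fi
  for key in LICENSE_ID LICENSE_CODE LICENSEE LICENSE_EXPIRY LICENSE_SIGNATURE; do
    # Each key must start a line and be followed by '='.
    if ! grep -q "^${key}=" "$file"; then
      echo "missing key: $key" >&2
      return 1
    fi
  done
  echo "ok: $file"
}

# Typical usage:
#   check_license_file "$KPOW_LICENSE" && docker compose -p kpow -f compose-kpow.yml up -d
```

The check only confirms that the keys are present; it does not validate their values.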

License file example

LICENSE_ID=<license-id>
LICENSE_CODE=<license-code>
LICENSEE=<licensee>
LICENSE_EXPIRY=<license-expiry>
LICENSE_SIGNATURE=<license-signature>

License mapping details

# compose-kpow-trial.yml
services:
  kpow:
    ...
    env_file:
      - resources/kpow/config/trial.env
      - ${KPOW_TRIAL_LICENSE:-resources/kpow/config/trial-license.env}

# compose-kpow-community.yml
services:
  kpow:
    ...
    env_file:
      - resources/kpow/config/community.env
      - ${KPOW_COMMUNITY_LICENSE:-resources/kpow/config/community-license.env}

# compose-flex-trial.yml
services:
  flex:
    ...
    env_file:
      - resources/flex/config/trial.env
      - ${FLEX_TRIAL_LICENSE:-resources/flex/config/trial-license.env}

# compose-flex-community.yml
services:
  flex:
    ...
    env_file:
      - resources/flex/config/local-community.env
      - ${KPOW_COMMUNITY_LICENSE:-resources/flex/config/community-license.env}