Prerequisites
Install Docker
The local cluster runs with Docker Compose, so you will need to install Docker.
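Before continuing, it can help to confirm that both the Docker CLI and the Compose plugin are reachable from your shell. The following is a minimal sketch rather than part of this repository; the have_tool helper is a hypothetical name introduced here for illustration.

```shell
# have_tool is a hypothetical helper (not part of this repository):
# it succeeds when the named command can be found on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

if have_tool docker; then
  docker --version
  # "docker compose version" is provided by the Compose v2 plugin.
  docker compose version || echo "Docker found, but the Compose plugin is not available"
else
  echo "Docker is not installed or not on PATH"
fi
```

If either message appears, install or repair Docker before running the remaining steps.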
Once Docker is installed, clone this repository and run the following commands from the base path.
Clone this repository
git clone git@github.com:factorhouse/factorhouse-local.git
Change into the repository directory
cd factorhouse-local
Downloading dependencies
Core services like Flink, Spark, and Kafka Connect are designed to be modular and do not come bundled with the specific connectors and libraries needed to communicate with other systems like the Hive Metastore, Apache Iceberg, or S3.
setup-env.sh automates the process of downloading all the required dependencies and organizing them into a local deps directory. When the services are started with Docker Compose, this directory is mounted as a volume, injecting the libraries directly into each container's classpath.
View all downloaded dependencies
Kafka connectors
- Confluent S3 Connector: For streaming data from Kafka to S3.
- Debezium PostgreSQL Connector: For capturing row-level changes from a PostgreSQL database.
- Amazon MSK Data Generator: A tool for generating sample data for testing.
- Iceberg Kafka Connect: For sinking Kafka records into Apache Iceberg tables.
Flink connectors
- Kafka SQL Connector: Enables Flink to read from and write to Kafka topics using SQL.
- Avro Confluent Registry: Allows Flink to work with Avro schemas stored in Confluent Schema Registry.
- Flink Faker: A connector for generating fake data streams within Flink, useful for development and testing.
Flink Hive dependencies
- Hive SQL Connector: Allows Flink to connect to a Hive Metastore and query Hive tables.
- Supporting Libraries: Includes Hive Exec, Antlr, and Thrift libraries necessary for the Hive integration to function.
Flink Iceberg/Parquet dependencies
- Iceberg Flink Runtime: The core library for Flink to read from and write to Apache Iceberg tables.
- Iceberg AWS Bundle: Provides AWS-specific integrations for Iceberg, like S3 file I/O.
- Parquet SQL Formatter: Enables Flink to handle the Parquet file format.
Hadoop/Hive Metastore dependencies
- Hadoop Libraries: A collection of core Hadoop modules (hadoop-common, hadoop-aws, hadoop-auth) and their dependencies (aws-java-sdk-bundle, guava, etc.) required for interacting with HDFS-compatible file systems like S3.
- PostgreSQL JDBC Driver: Required for the Hive Metastore to communicate with its PostgreSQL backend database.
Spark Iceberg dependencies
- Iceberg Spark Runtime: The core library for Spark to read from and write to Apache Iceberg tables.
- Iceberg AWS Bundle: Provides AWS-specific integrations for Spark, enabling it to work with Iceberg tables on S3.
./resources/setup-env.sh
▶️ Downloading Kafka connectors...
⏳ Progress : [##################################################] 100%
✅ Download complete in 0m 27s!
▶️ Downloading Flink connectors...
⏳ Progress : [##################################################] 100%
✅ Download complete in 0m 4s!
▶️ Downloading Flink Hive dependencies...
⏳ Progress : [##################################################] 100%
✅ Download complete in 0m 11s!
▶️ Downloading Flink Iceberg/Parquet dependencies...
⏳ Progress : [##################################################] 100%
✅ Download complete in 0m 12s!
▶️ Downloading Hadoop/Hive Metastore dependencies...
⏳ Progress : [##################################################] 100%
✅ Download complete in 0m 36s!
▶️ Downloading Spark Iceberg dependencies...
⏳ Progress : [##################################################] 100%
✅ Download complete in 0m 11s!
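Once the script finishes, one quick way to see what was downloaded is to count the files in each subdirectory of the deps directory. This is a sketch under assumptions: summarize_deps is a hypothetical helper, and the location ./deps (overridable through a hypothetical DEPS_DIR variable) is a guess at where the directory sits relative to your current path.

```shell
# summarize_deps is a hypothetical helper: it prints the number of regular
# files found under each immediate subdirectory of the given directory.
summarize_deps() {
  deps_dir="$1"
  if [ ! -d "$deps_dir" ]; then
    echo "No such directory: $deps_dir" >&2
    return 1
  fi
  for sub in "$deps_dir"/*/; do
    # Skip the unexpanded glob when there are no subdirectories.
    [ -d "$sub" ] || continue
    count=$(find "$sub" -type f | wc -l | tr -d ' ')
    echo "$(basename "$sub"): $count file(s)"
  done
}

# Assumption: the deps directory is ./deps; set DEPS_DIR if it lives elsewhere.
summarize_deps "${DEPS_DIR:-deps}" || echo "Run ./resources/setup-env.sh first, or set DEPS_DIR"
```

An empty or missing subdirectory usually means one of the download steps did not complete and the script should be run again.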
Update Kpow and Flex licenses
Both Kpow and Flex require valid licenses to run. You can get started in one of two ways:
- Request a free Community License for non-commercial use.
- Or request a 30-day Trial License for commercial evaluation. This license unlocks all enterprise features.
For managing Kpow and Flex licenses effectively, it's strongly recommended to store the license files externally from your main configuration or version control system (like Git). This approach prevents accidental exposure of sensitive license details and makes updating or swapping licenses much simpler.
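One way to follow this recommendation is to copy the license file into a private directory under your home directory and restrict its permissions. The sketch below assumes a POSIX shell; store_license is a hypothetical helper, and the paths in the usage comment are illustrative only.

```shell
# store_license is a hypothetical helper: it copies a license file into a
# private directory and restricts access to the current user.
store_license() {
  src="$1"
  dest_dir="$2"
  mkdir -p "$dest_dir" || return 1
  chmod 700 "$dest_dir" || return 1
  cp "$src" "$dest_dir/" || return 1
  # Owner read/write only, so other local users cannot read the license.
  chmod 600 "$dest_dir/$(basename "$src")"
}

# Example (paths are illustrative):
# store_license ./kpow-license.env "$HOME/.factorhouse"
```

Keeping the file outside the repository also means a stray git add cannot pick it up.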
The Docker Compose files facilitate this by allowing you to specify the path to your license file using environment variables on your host machine before launching the services. Specifically, they look for these variables and use their values to locate the appropriate license file via the env_file directive. If an environment variable is not set, a default path (usually within the resources directory) is used as a fallback.
Regardless of the edition, only a single license file is expected for each of Kpow and Flex.
- KPOW_LICENSE: Specifies the path to the Kpow license file.
- FLEX_LICENSE: Specifies the path to the Flex license file.
Example usage:
Imagine your Kpow license is stored at /home/<username>/.factorhouse/kpow-license.env. To instruct Docker Compose to use this specific file, you would set the environment variable on your host before running the compose command:
# Set the environment variable (syntax may vary slightly depending on your shell)
export KPOW_LICENSE=/home/<username>/.factorhouse/kpow-license.env
# Now run Docker Compose - it will use the path set above
docker compose -p kpow -f compose-kpow.yml up -d
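The fallback behaviour uses the same ${VAR:-default} form that a POSIX shell supports, so you can preview which file would be picked up before starting anything. This is a sketch only: resolve_license is a hypothetical helper, and the default path below is an illustrative placeholder, not necessarily the one declared in the Compose file.

```shell
# Hypothetical default path, for illustration only; the real default is
# whatever the Compose file declares after ":-".
default_license="resources/kpow/config/license.env"

# resolve_license is a hypothetical helper that mirrors ${VAR:-default}:
# it prints KPOW_LICENSE when set and non-empty, otherwise the default.
resolve_license() {
  echo "${KPOW_LICENSE:-$default_license}"
}

echo "Kpow license file: $(resolve_license)"
if [ ! -f "$(resolve_license)" ]; then
  echo "Warning: that file does not exist yet"
fi
```

Note that an empty value counts as unset with this form, so export KPOW_LICENSE= still falls back to the default.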
By default, the platform is configured to deploy the Enterprise edition. See Running the Platform for instructions on how to run the Community edition instead.
License file example
LICENSE_ID=<license-id>
LICENSE_CODE=<license-code>
LICENSEE=<licensee>
LICENSE_EXPIRY=<license-expiry>
LICENSE_SIGNATURE=<license-signature>
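A quick sanity check before starting the services is to confirm that your license file defines each of the five keys shown in the example. The following is a sketch, not a tool shipped with this repository; check_license_file is a hypothetical helper and it only checks that the keys are present, not that their values are valid.

```shell
# check_license_file is a hypothetical helper: it reports any of the five
# keys from the license file example that are missing from the given file.
check_license_file() {
  file="$1"
  missing=0
  for key in LICENSE_ID LICENSE_CODE LICENSEE LICENSE_EXPIRY LICENSE_SIGNATURE; do
    # Each key must start a line and be followed by "=".
    if ! grep -q "^${key}=" "$file"; then
      echo "Missing key: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (path is illustrative):
# check_license_file "$HOME/.factorhouse/kpow-license.env" && echo "License file looks complete"
```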
License mapping details
# compose-kpow-trial.yml
services:
  kpow:
    ...
    env_file:
      - resources/kpow/config/trial.env
      - ${KPOW_TRIAL_LICENSE:-resources/kpow/config/trial-license.env}
# compose-kpow-community.yml
services:
  kpow:
    ...
    env_file:
      - resources/kpow/config/community.env
      - ${KPOW_COMMUNITY_LICENSE:-resources/kpow/config/community-license.env}
# compose-flex-trial.yml
services:
  flex:
    ...
    env_file:
      - resources/flex/config/trial.env
      - ${FLEX_TRIAL_LICENSE:-resources/flex/config/trial-license.env}
# compose-flex-community.yml
services:
  flex:
    ...
    env_file:
      - resources/flex/config/local-community.env
      - ${KPOW_COMMUNITY_LICENSE:-resources/flex/config/community-license.env}