## Further configuration

### Custom dependency loading
The Flink services (`jobmanager`, `taskmanager`, and `sql-gateway`) do not have their dependencies baked into the Docker image. Instead, all required JAR files are loaded dynamically at runtime through a combination of Docker volume mounts and a Flink classpath discovery mechanism.
Dependencies for Hadoop, Hive, Iceberg, and Parquet are first downloaded to the local `./resources/deps` directory on the host machine. These directories are then mounted as volumes directly into the Flink containers at specific paths (e.g., `/tmp/hadoop`, `/tmp/iceberg`).
The Flink runtime is then instructed to scan these directories via the `CUSTOM_JARS_DIRS` environment variable. This variable contains a semicolon-separated list of paths (`"/tmp/hadoop;/tmp/hive;/tmp/iceberg;/tmp/parquet"`) that Flink automatically searches for JARs to add to its classpath on startup.
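How the scan is implemented depends on the image's entrypoint script. Purely as an illustration, a hypothetical entrypoint snippet could split the variable on `;` and link every JAR it finds into Flink's `lib` directory; this is a sketch, not the image's actual script:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: split CUSTOM_JARS_DIRS on ';' and link each JAR
# into /opt/flink/lib so Flink picks it up on its classpath at startup.
IFS=';' read -ra jar_dirs <<< "${CUSTOM_JARS_DIRS:-}"
for dir in "${jar_dirs[@]}"; do
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] && ln -sf "$jar" /opt/flink/lib/
  done
done
```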
However, the SQL Gateway has an additional requirement. Besides the standard dependencies, it needs access to specific Flink SQL connectors (Kafka, Avro, etc.), which are mounted into the `/tmp/connector` directory. Since the SQL Gateway does not support custom JAR loading through its own configuration, the `CUSTOM_JARS_DIRS` environment variable for this particular service must be updated to include that path. This ensures the Gateway can load the connectors required to execute SQL queries against external systems.
```yaml
# ...
x-common-environment: &flink_common_env_vars
  AWS_REGION: us-east-1
  HADOOP_CONF_DIR: /opt/flink/conf
  HIVE_CONF_DIR: /opt/flink/conf
  CUSTOM_JARS_DIRS: "/tmp/hadoop;/tmp/hive;/tmp/iceberg;/tmp/parquet" # <-- Add ;/tmp/connector for SQL Gateway

x-common-flink-volumes: &flink_common_volumes
  # ...
  - ./resources/deps/hadoop:/tmp/hadoop
  - ./resources/deps/flink/hive/flink-sql-connector-hive-3.1.3_2.12-1.20.1.jar:/tmp/hive/flink-sql-connector-hive-3.1.3_2.12-1.20.1.jar
  - ./resources/deps/flink/hive/antlr-runtime-3.5.2.jar:/tmp/hive/antlr-runtime-3.5.2.jar
  - ./resources/deps/flink/iceberg:/tmp/iceberg
  - ./resources/deps/flink/parquet:/tmp/parquet
  - ./resources/deps/flink/connector:/tmp/connector

services:
  # ...
  jobmanager:
    <<: *flink_image_pull_policy_config
    container_name: jobmanager
    command: jobmanager
    # ...
    environment:
      <<: *flink_common_env_vars
    volumes: *flink_common_volumes
```
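For the SQL Gateway specifically, the override amounts to redefining the variable after merging the shared anchor (keys set directly in a mapping take precedence over merged keys). The snippet below is a sketch only; the real service definition is abbreviated and may differ in shape:

```yaml
  sql-gateway:
    <<: *flink_image_pull_policy_config
    # ...
    environment:
      <<: *flink_common_env_vars
      # Override the merged value to append the connector directory:
      CUSTOM_JARS_DIRS: "/tmp/hadoop;/tmp/hive;/tmp/iceberg;/tmp/parquet;/tmp/connector"
    volumes: *flink_common_volumes
```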
### PyFlink support

PyFlink is supported by setting the `FLINK_SUFFIX` environment variable before launching the services.
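In practice that looks like the following (the `-d` flag is an optional addition here that just runs the services in the background):

```bash
# Select the PyFlink variant of the Flink image, then start the stack
export FLINK_SUFFIX="-py"
docker compose up -d
```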
By running `export FLINK_SUFFIX="-py"`, you instruct Docker Compose to modify its build process for the Flink services: it uses the `resources/flink/Dockerfile-py` file instead of the default one. During the image build, this Dockerfile extends the base Flink image by installing Python, `pip`, and the `apache-flink` Python package.
As a result, the `jobmanager` and `taskmanager` containers will be fully equipped with the necessary environment to develop and execute PyFlink jobs. You can inspect `Dockerfile-py` for the exact commands used.
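For orientation, such a Dockerfile typically looks like the sketch below; the package names, version pins, and flags are assumptions, so defer to the actual file in `resources/flink`:

```dockerfile
# Hypothetical sketch of resources/flink/Dockerfile-py -- inspect the
# real file for the exact commands and versions.
FROM flink:1.20.1

# Install Python 3 and pip, and make `python` resolve to python3.
RUN apt-get update -y && \
    apt-get install -y python3 python3-pip && \
    ln -sf /usr/bin/python3 /usr/bin/python && \
    rm -rf /var/lib/apt/lists/*

# Install the PyFlink package matching the Flink version.
RUN pip3 install apache-flink==1.20.1
```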
The relevant Compose configuration shows how the suffix selects both the image tag and the Dockerfile:
```yaml
x-common-flink-config: &flink_image_pull_policy_config
  image: fh-flink-1.20.1${FLINK_SUFFIX:-} # ${FLINK_SUFFIX} is either unset (blank) or -py
  build:
    context: ./resources/flink/
    dockerfile: Dockerfile${FLINK_SUFFIX:-}
  pull_policy: never
# ...
services:
  # ...
  jobmanager:
    <<: *flink_image_pull_policy_config
    container_name: jobmanager
    command: jobmanager
    # ...
    environment:
      <<: *flink_common_env_vars
    volumes: *flink_common_volumes
```
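To double-check which image tag and Dockerfile Compose has resolved for the current `FLINK_SUFFIX`, you can print the rendered configuration; the `grep` pattern here is just one way to narrow the output:

```bash
# Render the compose file with environment variables substituted
docker compose config | grep -B1 -A3 'fh-flink'
```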
Alternatively, you can build the image manually before starting the services. This gives you more control and can speed up subsequent launches. To do this, run the `docker build` command from your terminal, making sure the tag (`-t`) exactly matches the image name specified in the `docker-compose.yml` file.
To build the PyFlink image:

```bash
# Set the suffix for the tag and Dockerfile name
export FLINK_SUFFIX="-py"

# Build the image
docker build \
  -t fh-flink-1.20.1${FLINK_SUFFIX} \
  -f ./resources/flink/Dockerfile${FLINK_SUFFIX} \
  ./resources/flink
```
Because the image is now built and tagged locally, `docker compose up` will use it directly instead of attempting to build it again.
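You can confirm that the tag exists locally before starting the services:

```bash
# List the locally built Flink images (both variants, if present)
docker image ls fh-flink-1.20.1
docker image ls fh-flink-1.20.1-py
```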