Overview
Kpow's Prometheus egress endpoints follow the OpenMetrics standard.
This allows you to integrate Kpow with your favorite observability tools, such as Prometheus, New Relic or Grafana, for long-term reporting and alerting.
To get started, see our how-to blog post on alerting and monitoring with Kpow, Prometheus, and Alertmanager.
Configuration
To enable the Prometheus endpoints, set the following environment variable:
PROMETHEUS_EGRESS=true
Once enabled, Kpow will log the available metric endpoints at startup:
* GET /metrics/v1 - all metrics
* GET /metrics/v1/cluster/:cluster-id - metrics for a specific cluster-id
* GET /metrics/v1/connect/:connect-id - metrics for a specific connect-id
* GET /metrics/v1/schema/:schema-id - metrics for a specific schema-id
* GET /metrics/v1/ksqldb/:ksqldb-id - metrics for a specific ksqldb-id
* GET /group-offsets/v1 - all group offset metrics
* GET /offsets/v1 - all topic offset metrics
* GET /streams/v1 - all Kafka streams metrics
* GET /streams/v1/state - all Kafka streams state metrics
The endpoints are served on the same hostname and port that Kpow's user interface is configured to use.
See Endpoints for more detailed documentation on each metric endpoint available.
Authentication
Prometheus endpoints are not secured by default.
To secure all metric endpoints you can configure basic authentication:
PROMETHEUS_USERNAME=foo
PROMETHEUS_PASSWORD=bar
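Once basic authentication is enabled, any client that scrapes the endpoints must send the credentials. The Python sketch below shows one way to do that with only the standard library. It assumes the endpoints accept standard HTTP basic authentication; the base URL http://localhost:3000 and the helper names build_request and fetch_metrics are hypothetical placeholders, not part of Kpow.

```python
import base64
import urllib.request

# Placeholder: Kpow serves the metric endpoints on the same host and port as its UI.
KPOW_BASE_URL = "http://localhost:3000"


def build_request(base_url: str, path: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request that carries an HTTP basic authentication header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


def fetch_metrics(base_url: str, path: str, username: str, password: str) -> str:
    """Fetch an endpoint and return its text body (needs a reachable Kpow instance)."""
    request = build_request(base_url, path, username, password)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

For example, fetch_metrics(KPOW_BASE_URL, "/metrics/v1", "foo", "bar") would use the credentials configured above.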
Metric names
The Prometheus metric name and label format allows only characters matching [a-zA-Z_][a-zA-Z0-9_]*. Where Kafka resource names (e.g. groups, topics) contain characters outside that range, Kpow converts the non-matching characters to _.
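When writing queries or dashboards, it can help to predict how a resource name will appear in the metrics output. The Python sketch below approximates the conversion rule described above. sanitize_name is a hypothetical helper, not part of Kpow, and replacing a leading digit is our reading of the first-character class [a-zA-Z_]; Kpow's actual behavior for that case may differ.

```python
import re

# Any character outside [a-zA-Z0-9_] is invalid anywhere in a name.
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def sanitize_name(name: str) -> str:
    """Replace characters outside the Prometheus name/label range with '_'."""
    if not name:
        return name
    sanitized = _INVALID_CHARS.sub("_", name)
    # Assumption: a leading digit is also replaced, since it falls outside [a-zA-Z_].
    if sanitized[0].isdigit():
        sanitized = "_" + sanitized[1:]
    return sanitized
```

Under this rule, a topic named orders.v1-dlq would appear as orders_v1_dlq.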
Metric types
Each metric in the metrics glossary has a corresponding type. The sections below explain how to work with each type.
Gauge
A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.
Examples include broker_bytes_disk and group_count.
Metrics marked as a meter in the glossary are also exposed as gauges in the metrics endpoints.
Histogram
A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.
Examples include group_offset_delta, broker_offset_lag and simple_broker_offset_delta.
Note: histograms are often used to represent aggregate metrics (such as group_offset_lag), where each topic partition is treated as a bucket. In such cases, the histogram values can be used as follows:
* group_offset_lag_sum - the actual aggregate lag of the consumer group
* group_offset_lag_count - the number of topic partitions used to calculate the sum
* group_offset_lag - represents the percentiles (e.g. quantile="0.95" in the metadata) and can be used to show how lag is distributed across topic partitions. In most cases you will want group_offset_lag_sum rather than this value.
Note: the bucket-as-partition pattern applies to all examples listed above. Histograms are used here to reduce the overall cardinality of aggregate metrics while still providing useful statistics about individual topic partitions.
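Because group_offset_lag_sum and group_offset_lag_count carry the same labels, dividing one by the other gives the mean lag per topic partition (in PromQL, group_offset_lag_sum / group_offset_lag_count). The Python sketch below does the same against raw text returned by an endpoint such as /metrics/v1. It is a simplified parser written for illustration: it does not handle label values containing braces or other edge cases of the exposition format, and the function names are hypothetical.

```python
import re
from typing import Dict

# Simplified matcher for "<name>{<labels>} <value>" or "<name> <value>" lines.
# Comment lines (# HELP, # TYPE, # EOF) never match because a name cannot start with '#'.
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_samples(text: str, metric: str) -> Dict[str, float]:
    """Return {label_set: value} for every sample of one metric name."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        match = _SAMPLE.match(line.strip())
        if match and match.group(1) == metric:
            samples[match.group(2) or ""] = float(match.group(3))
    return samples


def mean_partition_lag(text: str, base: str = "group_offset_lag") -> Dict[str, float]:
    """Divide <base>_sum by <base>_count per label set, i.e. mean lag per topic partition."""
    sums = parse_samples(text, base + "_sum")
    counts = parse_samples(text, base + "_count")
    return {
        labels: sums[labels] / counts[labels]
        for labels in sums
        if counts.get(labels)  # skip label sets with a missing or zero count
    }
```

The same functions work for the other histogram-backed metrics listed above by passing a different base name.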
Endpoints
Kpow provides Prometheus endpoints for all metrics, topic and group offsets, and streams.
Base metrics
The base /metrics/v1 endpoint, without an added path, returns all metrics found in the metrics glossary for all Kafka clusters and resources.
https://HOSTNAME:PORT/metrics/v1
To return base metrics for a single Kafka cluster or resource only, append one of the following to the path:
https://HOSTNAME:PORT/metrics/v1/cluster/CLUSTER_ID
https://HOSTNAME:PORT/metrics/v1/schema/SCHEMA_ID
https://HOSTNAME:PORT/metrics/v1/ksqldb/KSQLDB_ID
https://HOSTNAME:PORT/metrics/v1/connect/CONNECT_ID
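When a script or dashboard needs to target these scoped paths programmatically, the URL can be assembled as in the Python sketch below. metrics_url is a hypothetical helper; percent-encoding the ID is a defensive assumption, since the format of Kpow resource IDs is not specified here.

```python
from typing import Optional
from urllib.parse import quote

# Resource types accepted under /metrics/v1, as listed above.
RESOURCE_TYPES = frozenset({"cluster", "schema", "ksqldb", "connect"})


def metrics_url(base_url: str, resource: Optional[str] = None, resource_id: Optional[str] = None) -> str:
    """Build a /metrics/v1 URL, optionally scoped to a single cluster or resource."""
    url = base_url.rstrip("/") + "/metrics/v1"
    if resource is None:
        return url  # all metrics for all clusters and resources
    if resource not in RESOURCE_TYPES:
        raise ValueError(f"unknown resource type: {resource!r}")
    if not resource_id:
        raise ValueError("resource_id is required when resource is given")
    # Percent-encode the ID defensively in case it contains characters unsafe in a path.
    return f"{url}/{resource}/{quote(resource_id, safe='')}"
```

Calling metrics_url with only a base URL returns the unscoped /metrics/v1 endpoint.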
Topic offset metrics
The /offsets/v1 endpoint returns topic offset information at a topic partition level.
https://HOSTNAME:PORT/offsets/v1
Available metrics (topic partition granularity):
* partition_start
* partition_end
* topic_end_sum
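As an illustration of working with partition-level offsets, the Python sketch below pairs partition_start and partition_end samples that share a label set and subtracts them. It assumes, without confirmation from this page, that the two metrics report each partition's earliest and latest offsets with identical labels; under that assumption the difference approximates the offset range currently held by the partition, not an exact record count (compaction and transaction markers leave gaps). The parser is simplified and the function names are hypothetical.

```python
import re
from typing import Dict

# Simplified matcher for "<name>{<labels>} <value>" lines; label values containing '}' are not handled.
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def samples_by_labels(text: str, metric: str) -> Dict[str, float]:
    """Collect {label_set: value} for one metric name from exposition text."""
    out: Dict[str, float] = {}
    for line in text.splitlines():
        match = _SAMPLE.match(line.strip())
        if match and match.group(1) == metric:
            out[match.group(2) or ""] = float(match.group(3))
    return out


def offset_spans(text: str) -> Dict[str, float]:
    """Assumed semantics: partition_end minus partition_start for each matching label set."""
    starts = samples_by_labels(text, "partition_start")
    ends = samples_by_labels(text, "partition_end")
    return {labels: ends[labels] - starts[labels] for labels in ends if labels in starts}
```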
Group offset metrics
The /group-offsets/v1 endpoint returns group offset information for assigned topic partitions.
https://HOSTNAME:PORT/group-offsets/v1
Available metrics (group assignment granularity):
* group_assignment_delta
* group_assignment_first_observed
* group_assignment_last_read
* group_assignment_offset
Kafka Streams metrics
Note: these endpoints collect and expose metrics from all running Kpow streams agent clients.
Base metrics
The /streams/v1 endpoint returns all Kafka Streams metrics for all configured Kafka Streams agents.
Note: only metrics permitted by the agent's configured io.factorhouse.kpow.MetricFilter will appear in the Prometheus endpoint.
https://HOSTNAME:PORT/streams/v1
State metrics
The /streams/v1/state endpoint returns Kafka Streams state information for all configured Kafka Streams agents.
This maps to the KafkaStreams.State enum.
https://HOSTNAME:PORT/streams/v1/state
Sample scraper configuration
A sample Prometheus scrape configuration that we use to test Kpow:
scrape_configs:
  - job_name: 'kpow'
    metrics_path: '/metrics/v1'
    static_configs:
      - targets: ['host.docker.internal:3000']
  - job_name: 'kpow_streams'
    metrics_path: '/streams/v1'
    static_configs:
      - targets: ['host.docker.internal:3000']
  - job_name: 'kpow_offsets'
    metrics_path: '/offsets/v1'
    static_configs:
      - targets: ['host.docker.internal:3000']