Running Kafka with Autoscaling (HPA) and Prometheus Exporters

Hi all,

I set up CP-Kafka on Kubernetes using the Helm chart. When I enable the JMX and Kafka Exporter containers for Prometheus, my HPA shows the current value of the memory and CPU metrics as UNKNOWN.

If I remove both containers the HPA works, but then I don't get any metrics in Prometheus.
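My current theory: with autoscaling/v2beta2, a Resource metric with a Utilization target is computed from the resource requests of every container in the pod, so a sidecar running with no requests makes the calculation impossible and the metric goes UNKNOWN. Both exporter containers in my values ship with `resources: {}`. A minimal sketch of what I think they would need (the request sizes below are placeholders I made up):

```yaml
prometheus:
  jmx:
    enabled: true
    resources:
      requests:          # placeholder sizes; the point is that requests exist
        cpu: 50m
        memory: 128Mi
  kafka:
    enabled: true
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
```

Can anyone confirm that this is the cause, or is something else going on?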

My values.yaml:

```yaml
# Default values for cp-kafka.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

## ------------------------------------------------------
## Kafka
## ------------------------------------------------------

## Number of Kafka replicas
replicas: 3
maxReplicas: 10

## Image Info
image: confluentinc/cp-kafka
imageTag: 5.1.0

## Specify an imagePullPolicy
imagePullPolicy: IfNotPresent

## Specify an array of imagePullSecrets.
## Secrets must be manually created in the namespace.
imagePullSecrets:

## StatefulSet Config
## Start and stop pods in Parallel or OrderedReady (one-by-one).
podManagementPolicy: OrderedReady

## The StatefulSet Update Strategy which Kafka will use when changes are applied:
## OnDelete or RollingUpdate
updateStrategy: RollingUpdate

## Kafka Server properties
configurationOverrides:
  "offsets.topic.replication.factor": "3"
  "default.replication.factor": 3
  "min.insync.replicas": 2
  "auto.create.topics.enable": false

  ## Options required for external access via NodePort.
  ## Advertised listeners will use the firstListenerPort value as its default
  ## unless overridden here.
  ## Setting "advertised.listeners" here appends to "PLAINTEXT://${POD_IP}:9092,"
  "advertised.listeners": |-
    EXTERNAL://${HOST_IP}:$((31090 + ${KAFKA_BROKER_ID}))
  # "advertised.listeners": |-
  #   PLAINTEXT://${POD_IP}:9092,EXTERNAL://${HOST_IP}:$((31090 + ${KAFKA_BROKER_ID##*-}))
  "listener.security.protocol.map": |-
    PLAINTEXT:PLAINTEXT,EXTERNAL:PLAINTEXT

## Additional env variables
customEnv: {}

persistence:
  enabled: true
  storageClass: "nfs-client"

  ## The size of the PersistentVolume to allocate to each Kafka Pod in the
  ## StatefulSet. For production servers this number should likely be much larger.
  size: 1Gi

  ## Kafka data Persistent Volume Storage Class
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ## set, choosing the default provisioner. (gp2 on AWS, standard on
  ## GKE, AWS & OpenStack)
  # storageClass: ""

  disksPerBroker: 1

## Kafka JVM Heap Option
heapOptions: "-Xms512M -Xmx512M"

## We usually recommend not to specify default resources and to leave this as a conscious
## choice for the user. This also increases chances charts run on environments with little
## resources, such as Minikube. If you do want to specify resources, uncomment the following
## lines, adjust them as necessary, and remove the curly braces after 'resources:'.
resources:
  limits:
    cpu: "400m"
    memory: "800Mi"
  requests:
    cpu: "200m"
    memory: "600Mi"

## Custom pod annotations
podAnnotations: {}

## Node labels for pod assignment
nodeSelector: {}

## Taints to tolerate on node assignment:
tolerations: {}

## Monitoring
## Kafka JMX Settings
jmx:
  port: 5555

## Prometheus Exporter Configuration
prometheus:
  ## JMX Exporter Configuration
  jmx:
    enabled: true
    # image: solsson/kafka-prometheus-jmx-exporter
    # imageTag: 1.0.0
    image: solsson/kafka-prometheus-jmx-exporter
    imageTag: latest
    port: 5556
    ## Resources configuration for the JMX exporter container.
    ## See the resources documentation above for details.
    resources: {}

  ## Prometheus Kafka Exporter: exposes complementary metrics to the JMX Exporter
  kafka:
    enabled: false
    ## The image to use for the metrics collector
    image: danielqsj/kafka-exporter
    ## The image tag to use for the metrics collector
    imageTag: v1.2.0
    ## Interval at which Prometheus scrapes metrics; note: only used by Prometheus Operator
    interval: 10s
    ## Port kafka-exporter exposes for Prometheus to scrape metrics
    port: 9308
    ## Resource limits
    resources: {}
    # limits:
    #   cpu: 200m
    #   memory: 1Gi
    # requests:
    #   cpu: 100m
    #   memory: 100Mi

nodeport:
  enabled: true
  servicePort: 19092
  firstListenerPort: 31090

topics:
  - name: error
    partitions: 10
    replicationFactor: 2
    # defaultConfig: "segment.bytes,segment.ms"
    config: "cleanup.policy=compact,delete.retention.ms=86400000"
  - name: source
    partitions: 10
    replicationFactor: 2
    # defaultConfig: "segment.bytes,segment.ms"
    config: "cleanup.policy=compact,delete.retention.ms=86400000"

## ------------------------------------------------------
## Zookeeper
## ------------------------------------------------------

cp-zookeeper:
  ## If true, install the cp-zookeeper chart alongside cp-kafka
  ## ref: ../cp-zookeeper
  enabled: true
  servers: 3
  persistence:
    enabled: false
    storageClass: "nfs-client"
    dataDirSize: 5Gi
    dataLogDirSize: 5Gi

  ## If the Zookeeper Chart is disabled, a URL and port are required to connect
  url: ""
```
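For what it's worth, the NodePort arithmetic in `advertised.listeners` works out as follows: each broker adds its broker ID to `firstListenerPort` (31090), so broker 0 advertises port 31090, broker 1 advertises 31091, and so on. With made-up pod and node IPs, broker 1 would advertise `PLAINTEXT://10.42.0.7:9092,EXTERNAL://192.168.1.10:31091`.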

My HPA.yaml:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: kafka-{{ template "cp-kafka.name" . }}
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: kafka-{{ template "cp-kafka.name" . }}
  minReplicas: {{ .Values.replicas }}
  maxReplicas: {{ .Values.maxReplicas }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
```
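Since the metrics show as UNKNOWN, `kubectl describe hpa` should name the exact reason under Conditions/Events (typically a FailedGetResourceMetric event), which would confirm whether the sidecars' missing requests are the problem. If the cluster is 1.20 or newer, there is also the ContainerResource metric type (alpha at first, behind the HPAContainerMetrics feature gate), which scopes the calculation to a single container so the sidecars would not need requests at all. A sketch, assuming the broker container is named cp-kafka-broker as in the stock chart:

```yaml
metrics:
  - type: ContainerResource        # Kubernetes >= 1.20 only
    containerResource:
      name: cpu
      container: cp-kafka-broker   # assumed container name from the chart
      target:
        type: Utilization
        averageUtilization: 75
```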

My JMX-exporter.yaml:

```yaml
{{- if and .Values.prometheus.jmx.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ template "cp-kafka.fullname" . }}-jmx-configmap
  labels:
    app: {{ template "cp-kafka.name" . }}
    chart: {{ template "cp-kafka.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
data:
  jmx-kafka-prometheus.yml: |+
    # jmxUrl: service:jmx:rmi:///jndi/rmi://localhost:{{ .Values.jmx.port }}/jmxrmi
    jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:{{ .Values.jmx.port }}/jmxrmi
    lowercaseOutputName: true
    lowercaseOutputLabelNames: true
    ssl: false
    rules:
      - pattern: kafka.server<type=ReplicaManager, name=(.+)><>(Value|OneMinuteRate)
        name: "cp_kafka_server_replicamanager_$1"
      - pattern: kafka.controller<type=KafkaController, name=(.+)><>Value
        name: "cp_kafka_controller_kafkacontroller_$1"
      - pattern: kafka.server<type=BrokerTopicMetrics, name=(.+)><>OneMinuteRate
        name: "cp_kafka_server_brokertopicmetrics_$1"
      - pattern: kafka.network<type=RequestMetrics, name=RequestsPerSec, request=(.+)><>OneMinuteRate
        name: "cp_kafka_network_requestmetrics_requestspersec_$1"
      - pattern: kafka.network<type=SocketServer, name=NetworkProcessorAvgIdlePercent><>Value
        name: "cp_kafka_network_socketserver_networkprocessoravgidlepercent"
      - pattern: kafka.server<type=ReplicaFetcherManager, name=MaxLag, clientId=(.+)><>Value
        name: "cp_kafka_server_replicafetchermanager_maxlag_$1"
      - pattern: kafka.server<type=KafkaRequestHandlerPool, name=RequestHandlerAvgIdlePercent><>OneMinuteRate
        name: "cp_kafka_kafkarequesthandlerpool_requesthandleravgidlepercent"
      - pattern: kafka.controller<type=ControllerStats, name=(.+)><>OneMinuteRate
        name: "cp_kafka_controller_controllerstats_$1"
      - pattern: kafka.server<type=SessionExpireListener, name=(.+)><>OneMinuteRate
        name: "cp_kafka_server_sessionexpirelistener_$1"
{{- end }}
```
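One more thing worth stating: Prometheus only discovers these pods if it runs annotation-based service discovery; as far as I can tell the chart puts `prometheus.io/scrape` and `prometheus.io/port` annotations on the broker pods when the JMX exporter is enabled, just like on the exporter Deployment below. In case your Prometheus config differs, this is the conventional annotation-driven scrape job I am assuming (job name and relabeling are the stock ones):

```yaml
- job_name: kubernetes-pods
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Keep only pods annotated with prometheus.io/scrape: "true"
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: "true"
    # Scrape the port named in prometheus.io/port instead of the default
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: '([^:]+)(?::\d+)?;(\d+)'
      replacement: '$1:$2'
      target_label: __address__
```

(With the Prometheus Operator, a PodMonitor would take the place of this job, and the `interval: 10s` value from values.yaml would apply.)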

My Kafka-exporter.yaml:

```yaml
{{- if .Values.prometheus.kafka.enabled }}
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: {{ template "cp-kafka.fullname" . }}-exporter
  labels:
    app: {{ template "cp-kafka.name" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
    chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ template "cp-kafka.name" . }}-exporter
      release: {{ .Release.Name }}
  template:
    metadata:
      annotations:
      {{- if and .Values.prometheus.kafka.enabled }}
        prometheus.io/scrape: "true"
        prometheus.io/port: {{ .Values.prometheus.kafka.port | quote }}
      {{- end }}
      labels:
        app: {{ template "cp-kafka.name" . }}-exporter
        release: {{ .Release.Name }}
    spec:
      containers:
      - image: "{{ .Values.prometheus.kafka.image }}:{{ .Values.prometheus.kafka.imageTag }}"
        name: kafka-exporter
        args:
        - --kafka.server={{ template "cp-kafka.fullname" . }}:9092
        - --web.listen-address=:{{ .Values.prometheus.kafka.port }}
        ports:
        - containerPort: {{ .Values.prometheus.kafka.port }}
        resources:
{{ toYaml .Values.prometheus.kafka.resources | indent 10 }}
      {{- if .Values.prometheus.kafka.tolerations }}
      tolerations:
{{ toYaml .Values.prometheus.kafka.tolerations | indent 8 }}
      {{- end }}
      {{- if .Values.prometheus.kafka.affinity }}
      affinity:
{{ toYaml .Values.prometheus.kafka.affinity | indent 8 }}
      {{- end }}
      {{- if .Values.prometheus.kafka.nodeSelector }}
      nodeSelector:
{{ toYaml .Values.prometheus.kafka.nodeSelector | indent 8 }}
      {{- end }}
{{- end }}
```
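A side note I am not sure matters for the HPA issue: extensions/v1beta1 Deployments were removed in Kubernetes 1.16, so on a current cluster this template needs the apps/v1 apiVersion (the matchLabels selector it already declares is mandatory there):

```yaml
apiVersion: apps/v1
kind: Deployment
```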