Running Kafka with Autoscaling (HPA) and Prometheus Exporters

Hi all,

I set up CP-Kafka on Kubernetes using the Helm chart. When I enable the JMX and Kafka Exporter containers for Prometheus, my HPA shows the current value of the memory and CPU metrics as UNKNOWN.

If I remove both containers the HPA works, but then I don't get any metrics in Prometheus.
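My current theory: with autoscaling/v2beta2, a Resource metric with a Utilization target is computed from the resource requests of every container in the pod, so a sidecar running with no requests makes the calculation impossible and the metric goes UNKNOWN. Both exporter containers in my values ship with `resources: {}`. A minimal sketch of what I think they would need (the request sizes below are placeholders I made up):

```yaml
prometheus:
  jmx:
    enabled: true
    resources:
      requests:          # placeholder sizes; the point is that requests exist
        cpu: 50m
        memory: 128Mi
  kafka:
    enabled: true
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
```

Can anyone confirm that this is the cause, or is something else going on?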

My values.yaml:

```yaml
# Default values for cp-kafka.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

## ------------------------------------------------------
## Kafka
## ------------------------------------------------------

## Number of Kafka replicas
replicas: 3
maxReplicas: 10

## Image Info
image: confluentinc/cp-kafka
imageTag: 5.1.0

## Specify an imagePullPolicy
imagePullPolicy: IfNotPresent

## Specify an array of imagePullSecrets.
## Secrets must be manually created in the namespace.
imagePullSecrets:

## StatefulSet Config
## Start and stop pods in Parallel or OrderedReady (one-by-one).
podManagementPolicy: OrderedReady

## The StatefulSet Update Strategy which Kafka will use when changes are applied:
## OnDelete or RollingUpdate
updateStrategy: RollingUpdate

## Kafka Server properties
configurationOverrides:
  "offsets.topic.replication.factor": "3"
  "default.replication.factor": 3
  "min.insync.replicas": 2
  "auto.create.topics.enable": false

  ## Options required for external access via NodePort.
  ## Advertised listeners will use the firstListenerPort value as its default
  ## unless overridden here.
  ## Setting "advertised.listeners" here appends to "PLAINTEXT://${POD_IP}:9092,"
  "advertised.listeners": |-
    EXTERNAL://${HOST_IP}:$((31090 + ${KAFKA_BROKER_ID}))
  # "advertised.listeners": |-
  #   PLAINTEXT://${POD_IP}:9092,EXTERNAL://${HOST_IP}:$((31090 + ${KAFKA_BROKER_ID##*-}))
  "listener.security.protocol.map": |-
    PLAINTEXT:PLAINTEXT,EXTERNAL:PLAINTEXT

## Additional env variables
customEnv: {}

persistence:
  enabled: true
  storageClass: "nfs-client"

  ## The size of the PersistentVolume to allocate to each Kafka Pod in the
  ## StatefulSet. For production servers this number should likely be much larger.
  size: 1Gi

  ## Kafka data Persistent Volume Storage Class
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ## set, choosing the default provisioner. (gp2 on AWS, standard on
  ## GKE, AWS & OpenStack)
  # storageClass: ""

  disksPerBroker: 1

## Kafka JVM Heap Option
heapOptions: "-Xms512M -Xmx512M"

## We usually recommend not to specify default resources and to leave this as a conscious
## choice for the user. This also increases chances charts run on environments with little
## resources, such as Minikube. If you do want to specify resources, uncomment the following
## lines, adjust them as necessary, and remove the curly braces after 'resources:'.
resources:
  limits:
    cpu: "400m"
    memory: "800Mi"
  requests:
    cpu: "200m"
    memory: "600Mi"

## Custom pod annotations
podAnnotations: {}

## Node labels for pod assignment
nodeSelector: {}

## Taints to tolerate on node assignment:
tolerations: {}

## Monitoring
## Kafka JMX Settings
jmx:
  port: 5555

## Prometheus Exporter Configuration
prometheus:
  ## JMX Exporter Configuration
  jmx:
    enabled: true
    # image: solsson/kafka-prometheus-jmx-exporter
    # imageTag: 1.0.0
    image: solsson/kafka-prometheus-jmx-exporter
    imageTag: latest
    port: 5556
    ## Resources configuration for the JMX exporter container.
    ## See the resources documentation above for details.
    resources: {}

  ## Prometheus Kafka Exporter: exposes complementary metrics to the JMX Exporter
  kafka:
    enabled: false
    ## The image to use for the metrics collector
    image: danielqsj/kafka-exporter
    ## The image tag to use for the metrics collector
    imageTag: v1.2.0
    ## Interval at which Prometheus scrapes metrics; note: only used by Prometheus Operator
    interval: 10s
    ## Port kafka-exporter exposes for Prometheus to scrape metrics
    port: 9308
    ## Resource limits
    resources: {}
    # limits:
    #   cpu: 200m
    #   memory: 1Gi
    # requests:
    #   cpu: 100m
    #   memory: 100Mi

nodeport:
  enabled: true
  servicePort: 19092
  firstListenerPort: 31090

topics:
  - name: error
    partitions: 10
    replicationFactor: 2
    # defaultConfig: "segment.bytes,segment.ms"
    config: "cleanup.policy=compact,delete.retention.ms=86400000"
  - name: source
    partitions: 10
    replicationFactor: 2
    # defaultConfig: "segment.bytes,segment.ms"
    config: "cleanup.policy=compact,delete.retention.ms=86400000"

## ------------------------------------------------------
## Zookeeper
## ------------------------------------------------------

cp-zookeeper:
  ## If true, install the cp-zookeeper chart alongside cp-kafka
  ## ref: ../cp-zookeeper
  enabled: true
  servers: 3
  persistence:
    enabled: false
    storageClass: "nfs-client"
    dataDirSize: 5Gi
    dataLogDirSize: 5Gi

  ## If the Zookeeper Chart is disabled, a URL and port are required to connect
  url: ""
```
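For what it's worth, the NodePort arithmetic in `advertised.listeners` works out as follows: each broker adds its broker ID to `firstListenerPort` (31090), so broker 0 advertises port 31090, broker 1 advertises 31091, and so on. With made-up pod and node IPs, broker 1 would advertise `PLAINTEXT://10.42.0.7:9092,EXTERNAL://192.168.1.10:31091`.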

My HPA.yaml:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: kafka-{{ template "cp-kafka.name" . }}
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: kafka-{{ template "cp-kafka.name" . }}
  minReplicas: {{ .Values.replicas }}
  maxReplicas: {{ .Values.maxReplicas }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
```
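Since the metrics show as UNKNOWN, `kubectl describe hpa` should name the exact reason under Conditions/Events (typically a FailedGetResourceMetric event), which would confirm whether the sidecars' missing requests are the problem. If the cluster is 1.20 or newer, there is also the ContainerResource metric type (alpha at first, behind the HPAContainerMetrics feature gate), which scopes the calculation to a single container so the sidecars would not need requests at all. A sketch, assuming the broker container is named cp-kafka-broker as in the stock chart:

```yaml
metrics:
  - type: ContainerResource        # Kubernetes >= 1.20 only
    containerResource:
      name: cpu
      container: cp-kafka-broker   # assumed container name from the chart
      target:
        type: Utilization
        averageUtilization: 75
```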

My JMX-exporter.yaml:

```yaml
{{- if and .Values.prometheus.jmx.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ template "cp-kafka.fullname" . }}-jmx-configmap
  labels:
    app: {{ template "cp-kafka.name" . }}
    chart: {{ template "cp-kafka.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
data:
  jmx-kafka-prometheus.yml: |+
    # jmxUrl: service:jmx:rmi:///jndi/rmi://localhost:{{ .Values.jmx.port }}/jmxrmi
    jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:{{ .Values.jmx.port }}/jmxrmi
    lowercaseOutputName: true
    lowercaseOutputLabelNames: true
    ssl: false
    rules:
      - pattern: kafka.server<type=ReplicaManager, name=(.+)><>(Value|OneMinuteRate)
        name: "cp_kafka_server_replicamanager_$1"
      - pattern: kafka.controller<type=KafkaController, name=(.+)><>Value
        name: "cp_kafka_controller_kafkacontroller_$1"
      - pattern: kafka.server<type=BrokerTopicMetrics, name=(.+)><>OneMinuteRate
        name: "cp_kafka_server_brokertopicmetrics_$1"
      - pattern: kafka.network<type=RequestMetrics, name=RequestsPerSec, request=(.+)><>OneMinuteRate
        name: "cp_kafka_network_requestmetrics_requestspersec_$1"
      - pattern: kafka.network<type=SocketServer, name=NetworkProcessorAvgIdlePercent><>Value
        name: "cp_kafka_network_socketserver_networkprocessoravgidlepercent"
      - pattern: kafka.server<type=ReplicaFetcherManager, name=MaxLag, clientId=(.+)><>Value
        name: "cp_kafka_server_replicafetchermanager_maxlag_$1"
      - pattern: kafka.server<type=KafkaRequestHandlerPool, name=RequestHandlerAvgIdlePercent><>OneMinuteRate
        name: "cp_kafka_kafkarequesthandlerpool_requesthandleravgidlepercent"
      - pattern: kafka.controller<type=ControllerStats, name=(.+)><>OneMinuteRate
        name: "cp_kafka_controller_controllerstats_$1"
      - pattern: kafka.server<type=SessionExpireListener, name=(.+)><>OneMinuteRate
        name: "cp_kafka_server_sessionexpirelistener_$1"
{{- end }}
```
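One more thing worth stating: Prometheus only discovers these pods if it runs annotation-based service discovery; as far as I can tell the chart puts `prometheus.io/scrape` and `prometheus.io/port` annotations on the broker pods when the JMX exporter is enabled, just like on the exporter Deployment below. In case your Prometheus config differs, this is the conventional annotation-driven scrape job I am assuming (job name and relabeling are the stock ones):

```yaml
- job_name: kubernetes-pods
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Keep only pods annotated with prometheus.io/scrape: "true"
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: "true"
    # Scrape the port named in prometheus.io/port instead of the default
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: '([^:]+)(?::\d+)?;(\d+)'
      replacement: '$1:$2'
      target_label: __address__
```

(With the Prometheus Operator, a PodMonitor would take the place of this job, and the `interval: 10s` value from values.yaml would apply.)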

My Kafka-exporter.yaml:

```yaml
{{- if .Values.prometheus.kafka.enabled }}
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: {{ template "cp-kafka.fullname" . }}-exporter
  labels:
    app: {{ template "cp-kafka.name" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
    chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ template "cp-kafka.name" . }}-exporter
      release: {{ .Release.Name }}
  template:
    metadata:
      annotations:
      {{- if and .Values.prometheus.kafka.enabled }}
        prometheus.io/scrape: "true"
        prometheus.io/port: {{ .Values.prometheus.kafka.port | quote }}
      {{- end }}
      labels:
        app: {{ template "cp-kafka.name" . }}-exporter
        release: {{ .Release.Name }}
    spec:
      containers:
      - image: "{{ .Values.prometheus.kafka.image }}:{{ .Values.prometheus.kafka.imageTag }}"
        name: kafka-exporter
        args:
        - --kafka.server={{ template "cp-kafka.fullname" . }}:9092
        - --web.listen-address=:{{ .Values.prometheus.kafka.port }}
        ports:
        - containerPort: {{ .Values.prometheus.kafka.port }}
        resources:
{{ toYaml .Values.prometheus.kafka.resources | indent 10 }}
      {{- if .Values.prometheus.kafka.tolerations }}
      tolerations:
{{ toYaml .Values.prometheus.kafka.tolerations | indent 8 }}
      {{- end }}
      {{- if .Values.prometheus.kafka.affinity }}
      affinity:
{{ toYaml .Values.prometheus.kafka.affinity | indent 8 }}
      {{- end }}
      {{- if .Values.prometheus.kafka.nodeSelector }}
      nodeSelector:
{{ toYaml .Values.prometheus.kafka.nodeSelector | indent 8 }}
      {{- end }}
{{- end }}
```
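A side note I am not sure matters for the HPA issue: extensions/v1beta1 Deployments were removed in Kubernetes 1.16, so on a current cluster this template needs the apps/v1 apiVersion (the matchLabels selector it already declares is mandatory there):

```yaml
apiVersion: apps/v1
kind: Deployment
```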