Hello everyone,
I’m trying to set up node-problem-detector (npd) in my cluster so that it reports on node status and pod status. I’m following the Kubernetes article "Monitor Node Health". Since it mentions creating a ConfigMap, I used the config folder from https://github.com/kubernetes/node-problem-detector.
However, I’m facing a few issues:
- There are no logs in any of the npd pods. I tried deleting pods from the node and even restarting the node, but none of these events were logged.
- There was a mention of using the Kubernetes exporter, but I don't have much information about how to set it up.
Cluster information:
Kubernetes version: 1.21
Cloud being used: public cloud (IBM Cloud)
npd.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-problem-detector
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: npd-binding
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node-problem-detector
subjects:
- kind: ServiceAccount
  name: node-problem-detector
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: npd-v0.8.9
  namespace: kube-system
  labels:
    k8s-app: node-problem-detector
    version: v0.8.9
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      k8s-app: node-problem-detector
      version: v0.8.9
  template:
    metadata:
      labels:
        k8s-app: node-problem-detector
        version: v0.8.9
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: node-problem-detector
        image: k8s.gcr.io/node-problem-detector/node-problem-detector:v0.8.9
        command:
        - "/bin/sh"
        - "-c"
        - "exec /node-problem-detector --logtostderr --config.system-log-monitor=/config/kernel-monitor.json,/config/docker-monitor.json,/config/systemd-monitor.json --config.custom-plugin-monitor=/config/kernel-monitor-counter.json,/config/systemd-monitor-counter.json --config.system-stats-monitor=/config/system-stats-monitor.json >>/var/log/node-problem-detector.log 2>&1"
        securityContext:
          privileged: true
        resources:
          limits:
            cpu: "200m"
            memory: "100Mi"
          requests:
            cpu: "20m"
            memory: "20Mi"
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        volumeMounts:
        - name: log
          mountPath: /var/log
        - name: config
          mountPath: /config
          readOnly: true
        - name: localtime
          mountPath: /etc/localtime
          readOnly: true
      volumes:
      - name: log
        hostPath:
          path: /var/log/
      - name: config
        configMap:
          name: node-problem-detector-config
      - name: localtime
        hostPath:
          path: /etc/localtime
          type: "FileOrCreate"
      serviceAccountName: node-problem-detector
      tolerations:
      - operator: "Exists"
        effect: "NoExecute"
      - key: "CriticalAddonsOnly"
        operator: "Exists"