K8s in-place resize fails with an error

Operating system information

Physical machine, CentOS 7.5, 4 CPU cores / 8 GB RAM

Linux 172.30.94.201 6.3.4-1.el7.elrepo.x86_64 #1 SMP PREEMPT_DYNAMIC Tue May 23 18:41:06 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes version information

WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.

Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.2", GitCommit:"7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647", GitTreeState:"clean", BuildDate:"2023-05-17T14:20:07Z", GoVersion:"go1.20.4", Compiler:"gc", Platform:"linux/amd64"}

Kustomize Version: v5.0.1

Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.2", GitCommit:"7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647", GitTreeState:"clean", BuildDate:"2023-05-17T14:13:28Z", GoVersion:"go1.20.4", Compiler:"gc", Platform:"linux/amd64"}

Container runtime


Client: Docker Engine - Community
 Version:           24.0.1
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        6802122
 Built:             Fri May 19 18:06:42 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.1
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       463850e
  Built:            Fri May 19 18:05:43 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

What is the problem

Following the official documentation, https://kubernetes.io/docs/tasks/configure-pod-container/resize-container-resources/, I tested the in-place resize feature. After creating a pod, I modified its CPU requests and limits with:

kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"cpu":"800m"}, "limits":{"cpu":"800m"}}}]}}'

The pod status then changed to RunContainerError.
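For anyone reproducing this, the patch body can also be generated programmatically, which avoids shell-quoting mistakes. The following is only a minimal Python sketch: the helper names build_resize_patch and build_kubectl_command are hypothetical, and the namespace, pod, container, and CPU value are the ones from the kubectl command in this post.

```python
import json
import shlex


def build_resize_patch(container, cpu):
    """Return a JSON patch body that sets CPU requests and limits to the same value."""
    patch = {
        "spec": {
            "containers": [
                {
                    "name": container,
                    "resources": {
                        "requests": {"cpu": cpu},
                        "limits": {"cpu": cpu},
                    },
                }
            ]
        }
    }
    return json.dumps(patch)


def build_kubectl_command(namespace, pod, container, cpu):
    """Return the kubectl argv list for the patch; this does not run kubectl."""
    return [
        "kubectl", "-n", namespace,
        "patch", "pod", pod,
        "--patch", build_resize_patch(container, cpu),
    ]


if __name__ == "__main__":
    argv = build_kubectl_command("qos-example", "qos-demo-5", "qos-demo-ctr-5", "800m")
    # shlex.join quotes the JSON argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

The argv list could be handed to subprocess.run directly, in which case no shell quoting is involved at all.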

Error log

Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  2m41s               default-scheduler  Successfully assigned qos-example/qos-demo-5 to 172.30.94.201
  Normal   Pulled     2m38s               kubelet            Successfully pulled image "nginx" in 2.468850418s (2.468866698s including waiting)
  Normal   Started    2m38s               kubelet            Started container qos-demo-ctr-5
  Normal   Killing    27s                 kubelet            Container qos-demo-ctr-5 definition changed, will be restarted
  Normal   Pulled     25s                 kubelet            Successfully pulled image "nginx" in 2.412743103s (2.412766469s including waiting)
  Normal   Pulled     22s                 kubelet            Successfully pulled image "nginx" in 2.484909047s (2.484925388s including waiting)
  Normal   Pulling    9s (x4 over 2m41s)  kubelet            Pulling image "nginx"
  Normal   Created    7s (x4 over 2m38s)  kubelet            Created container qos-demo-ctr-5
  Warning  Failed     7s (x3 over 25s)    kubelet            Error: failed to start container "qos-demo-ctr-5": Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: failed to write "80000": write /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-poda24bc212_b6c2_4fd4_af47_111b6b937b10.slice/qos-demo-ctr-5/cpu.cfs_quota_us: invalid argument: unknown
  Normal   Pulled     7s                  kubelet            Successfully pulled image "nginx" in 2.314366627s (2.314379589s including waiting)
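
As a sanity check on the "80000" in the Failed event: assuming the kubelet's default CFS period of 100000 microseconds (not confirmed for this node), a CPU limit of 800m works out to a quota of 80000 microseconds, which is the value runc tried to write to cpu.cfs_quota_us. A minimal Python sketch of that arithmetic, with hypothetical helper names:

```python
# Assumption: the default CFS period of 100 ms; the node in this post may differ.
CFS_PERIOD_US = 100_000


def parse_cpu_millis(quantity):
    """Convert a Kubernetes CPU quantity such as "800m", "2" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def cpu_limit_to_cfs_quota(quantity, period_us=CFS_PERIOD_US):
    """Return the CFS quota in microseconds for a CPU limit: millicores * period / 1000."""
    return parse_cpu_millis(quantity) * period_us // 1000


if __name__ == "__main__":
    print(cpu_limit_to_cfs_quota("800m"))
```

So the number itself is consistent with the limit requested in the patch. The sketch says nothing about why the kernel rejected the write.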

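My analysis

  1. The error message says that writing the cgroup setting failed with an invalid argument. My first thought was a path problem: after checking /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-poda24bc212_b6c2_4fd4_af47_111b6b937b10.slice, I found that there is indeed no qos-demo-ctr-5 directory under it;
  2. Meanwhile, the cgroup information for this pod's container exists at /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-pod228d1abb_f434_469a_bedd_07e80085d20a.slice/5767c96db6ec24d60d59c07ac27fa77d69933a07b9cd69bceb8594ccc8ddab2d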
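
To compare the two paths more mechanically, the pod UID and the last path component can be pulled out of each one. This is a rough Python sketch with hypothetical helper names. It only does string parsing on the two paths quoted in this post and assumes the systemd-style kubepods-pod<uid>.slice naming seen in them; it does not touch /sys/fs/cgroup.

```python
import re

# Paths copied from this post; everything else is illustrative.
FAILING_DIR = (
    "/sys/fs/cgroup/cpu,cpuacct/kubepods.slice/"
    "kubepods-poda24bc212_b6c2_4fd4_af47_111b6b937b10.slice/qos-demo-ctr-5"
)
EXISTING_DIR = (
    "/sys/fs/cgroup/cpu,cpuacct/kubepods.slice/"
    "kubepods-pod228d1abb_f434_469a_bedd_07e80085d20a.slice/"
    "5767c96db6ec24d60d59c07ac27fa77d69933a07b9cd69bceb8594ccc8ddab2d"
)

POD_SLICE_RE = re.compile(r"-pod([0-9a-f_]+)\.slice")
CONTAINER_ID_RE = re.compile(r"[0-9a-f]{64}")


def pod_uid_from_cgroup_path(path):
    """Return the pod UID embedded in a kubepods-pod<uid>.slice component, or None."""
    match = POD_SLICE_RE.search(path)
    if match is None:
        return None
    # The slice name uses underscores where the UID has dashes.
    return match.group(1).replace("_", "-")


def last_component(path):
    """Return the final path component, ignoring a trailing slash."""
    return path.rstrip("/").rsplit("/", 1)[-1]


def looks_like_container_id(name):
    """True if the name is a 64-character lowercase hex string."""
    return CONTAINER_ID_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for label, path in (("failing", FAILING_DIR), ("existing", EXISTING_DIR)):
        leaf = last_component(path)
        print(label, pod_uid_from_cgroup_path(path), leaf, looks_like_container_id(leaf))
```

On these two paths the parsing shows two differences: the pod UIDs are not the same, and the failing path ends in the container name while the existing one ends in a 64-character hex ID. Whether either difference explains the error cannot be determined from the paths alone.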