Hi,
I have a Kubernetes cluster on the OVH cloud.
Cluster information:
Kubernetes version: 1.24.16-0
Today, nginx suddenly started responding with a 503 error when the website was called.
I then checked the cluster with kubectl get pods
and saw that all pods using a particular volume were no longer ready.
All of these pods show FailedAttachVolume and FailedMount errors in their events.
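For context, this is roughly how I inspected the pods and pulled the events shown below (the pod name is just a placeholder):

# list all pods; the ones using the volume were not Ready
kubectl get pods
# the event log below comes from describing one of the affected pods
kubectl describe pod <affected-pod>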
As an example, here is the event log of one of the pods:
Warning FailedAttachVolume 15m attachdetach-controller AttachVolume.Attach failed for volume "example-managed-kubernetes-mrx2n8-pvc-ca435065-1111-aaaa-0123-543465516bb2" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 0655d400-3333-2222-1111-6fbcc2b62f94 volume to 5d51a18b-abcd-w2re-wfe2-a94d2b4ca988 compute: Bad request with: [POST https://compute.de1.cloud.example.net/v2.1/e2b2680af21e4q9n3e8hfoc39rpgowpd/servers/5d51a18b-abcd-w2re-wfe2-a94d2b4ca988/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid input received: Invalid volume: Volume 0655d400-3333-2222-1111-6fbcc2b62f94 status must be available or downloading to reserve, but the current status is in-use. (HTTP 400) (Request-ID: req-b1820e9f-935b-442e-b68e-efe7de0feb35)"}}
Warning FailedAttachVolume 12m (x2 over 14m) attachdetach-controller AttachVolume.Attach failed for volume "example-managed-kubernetes-mrx2n8-pvc-ca435065-1111-aaaa-0123-543465516bb2" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 0655d400-3333-2222-1111-6fbcc2b62f94 volume to 5d51a18b-abcd-w2re-wfe2-a94d2b4ca988 compute: Bad request with: [POST https://compute.de1.cloud.example.net/v2.1/e2b2680af21e4q9n3e8hfoc39rpgowpd/servers/5d51a18b-abcd-w2re-wfe2-a94d2b4ca988/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid input received: Invalid volume: Volume 0655d400-3333-2222-1111-6fbcc2b62f94 status must be available or downloading to reserve, but the current status is in-use. (HTTP 400) (Request-ID: req-c6e51d31-8646-44a2-ba75-7069e3ed87fa)"}}
Warning FailedAttachVolume 10m attachdetach-controller AttachVolume.Attach failed for volume "example-managed-kubernetes-mrx2n8-pvc-ca435065-1111-aaaa-0123-543465516bb2" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 0655d400-3333-2222-1111-6fbcc2b62f94 volume to 5d51a18b-abcd-w2re-wfe2-a94d2b4ca988 compute: Bad request with: [POST https://compute.de1.cloud.example.net/v2.1/e2b2680af21e4q9n3e8hfoc39rpgowpd/servers/5d51a18b-abcd-w2re-wfe2-a94d2b4ca988/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid input received: Invalid volume: Volume 0655d400-3333-2222-1111-6fbcc2b62f94 status must be available or downloading to reserve, but the current status is in-use. (HTTP 400) (Request-ID: req-a5c9c89e-5578-4b6c-8722-acf583dea1a8)"}}
Warning FailedMount 3m18s (x4 over 12m) kubelet Unable to attach or mount volumes: unmounted volumes=[files-volume], unattached volumes=[kube-api-access-q9pjc files-volume]: timed out waiting for the condition
Warning FailedMount 63s (x3 over 14m) kubelet Unable to attach or mount volumes: unmounted volumes=[files-volume], unattached volumes=[files-volume kube-api-access-q9pjc]: timed out waiting for the condition
Warning FailedAttachVolume 11s (x5 over 8m21s) attachdetach-controller (combined from similar events): AttachVolume.Attach failed for volume "example-managed-kubernetes-mrx2n8-pvc-ca435065-1111-aaaa-0123-543465516bb2" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 0655d400-3333-2222-1111-6fbcc2b62f94 volume to 5d51a18b-abcd-w2re-wfe2-a94d2b4ca988 compute: Bad request with: [POST https://compute.de1.cloud.example.net/v2.1/e2b2680af21e4q9n3e8hfoc39rpgowpd/servers/5d51a18b-abcd-w2re-wfe2-a94d2b4ca988/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid input received: Invalid volume: Volume 0655d400-3333-2222-1111-6fbcc2b62f94 status must be available or downloading to reserve, but the current status is in-use. (HTTP 400) (Request-ID: req-39554882-3f5c-40e9-aad2-482ff427c632)"}}
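Judging by the error message, the Cinder volume is still marked as in-use on the OpenStack side while Kubernetes keeps trying to attach it to the node. This is a sketch of how the attachment state could be checked (the volume ID is taken from the events above; the openstack commands assume access to the underlying OpenStack project, which may not be available with a managed cluster):

# which node does Kubernetes think the volume is attached to?
kubectl get volumeattachments
# find the PV that backs the claim
kubectl get pv | grep pvc-ca435065
# on the OpenStack side: volume status and the server it is attached to
openstack volume show 0655d400-3333-2222-1111-6fbcc2b62f94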
The PVC is referenced in all deployments as follows:
spec:
  ...
  template:
    spec:
      ...
      containers:
        - name: ...
          ...
          volumeMounts:
            - name: files-volume
              mountPath: /files
      ...
      volumes:
        - name: files-volume
          persistentVolumeClaim:
            claimName: pv-files-claim
The PVC looks like this:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-files-claim
spec:
  storageClassName: csi-cinder-high-speed
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
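If it helps, this is how the binding status of the claim and of the PV behind it can be checked (the PV name is a placeholder that would come from the PVC output):

# is the claim bound, and to which PV?
kubectl get pvc pv-files-claim
# inspect that PV, including the access modes it was provisioned with
kubectl describe pv <pv-name-from-the-output-above>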
How can this error occur, how do I fix it, and how can I prevent it in the future?
In the meantime, all pods except one have reconnected to the volume on their own; for that last pod, however, it still does not seem to work.