Cluster information:
Kubernetes version: 1.26
Cloud being used: AWS
Installation method: EKS
Host OS:
CNI and version: Calico
CRI and version: CRI-O
EBS volumes aren’t supported across multiple availability zones, which means that if a Pod in zone A crashes, it may get recreated in zone B and then be unable to bind to its PVC and start. The EBS storage driver does not handle this by itself (only the Karpenter add-on does). For example, to schedule two Redis Pods on two nodes in zone A and prevent them from running on the same node, you could label two nodes with “zone=a”, increase the number of replicas to 2, add a nodeSelector for the zone label, and add pod anti-affinity:
spec:
  replicas: 2
  template:
    spec:
      nodeSelector:
        zone: a
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: kubernetes.io/hostname   # never co-locate matching pods on the same node
              labelSelector:
                matchLabels:
                  app: redis                        # assumes the pod template is labelled app: redis
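For the `nodeSelector` above to match, the two target nodes would first need the label applied, for example with `kubectl label nodes <node-name> zone=a` (run once per node; `<node-name>` is a placeholder). Alternatively, the well-known `topology.kubernetes.io/zone` label that the cloud provider already sets on each node could be used instead of a custom `zone` label.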
This is similar to what’s mentioned here:
# EKS Data Plane
To operate highly available and resilient applications, you need a highly available and resilient data plane. An elastic data plane ensures that Kubernetes can scale and heal your applications automatically. A resilient data plane consists of two or more worker nodes, can grow and shrink with the workload, and can automatically recover from failures.
You have two choices for worker nodes with EKS: [EC2 instances](https://docs.aws.amazon.com/eks/latest/userguide/worker.html) and [Fargate](https://docs.aws.amazon.com/eks/latest/userguide/fargate.html). If you choose EC2 instances, you can manage the worker nodes yourself or use [EKS managed node groups](https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html). You can have a cluster that mixes managed node groups, self-managed worker nodes, and Fargate.
EKS on Fargate offers the easiest path to a resilient data plane. Fargate runs each Pod in an isolated compute environment. Each Pod running on Fargate gets its own worker node. Fargate automatically scales the data plane as Kubernetes scales pods. You can scale both the data plane and your workload by using the [horizontal pod autoscaler](https://docs.aws.amazon.com/eks/latest/userguide/horizontal-pod-autoscaler.html).
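(Not part of the quoted guide, just a rough sketch of what scaling the workload with the Horizontal Pod Autoscaler looks like; the Deployment name `web` and the thresholds are illustrative assumptions:)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above ~70% average CPU
```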
The preferred way to scale EC2 worker nodes is by using [Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md), [EC2 Auto Scaling groups](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html), or community projects like [Atlassian’s Escalator](https://github.com/atlassian/escalator).
## Recommendations
### Use EC2 Auto Scaling Groups to create worker nodes
It is a best practice to create worker nodes using EC2 Auto Scaling groups instead of creating individual EC2 instances and joining them to the cluster. Auto Scaling groups will automatically replace any terminated or failed nodes, ensuring that the cluster always has the capacity to run your workload.
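(As a rough sketch, not from the guide: an eksctl `ClusterConfig` that creates a managed node group backed by an EC2 Auto Scaling group. The cluster name, region, instance type, and sizes are illustrative assumptions:)

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster       # placeholder cluster name
  region: us-east-1      # placeholder region
managedNodeGroups:
  - name: workers
    instanceType: m5.large
    minSize: 2           # the backing ASG keeps at least this many nodes
    maxSize: 6
    desiredCapacity: 3
```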
### Use Kubernetes Cluster Autoscaler to scale nodes
Cluster Autoscaler adjusts the size of the data plane when there are pods that cannot be scheduled because the cluster has insufficient resources and adding another worker node would help. Cluster Autoscaler is a reactive process: it waits until pods are in the *Pending* state due to insufficient capacity in the cluster, and when that happens it adds EC2 instances to the cluster. Whenever the cluster runs out of capacity, new replicas (or new pods) will remain unavailable, in the *Pending* state, until worker nodes are added. This delay may impact your applications' reliability if the data plane cannot scale fast enough to meet the demands of the workload. If a worker node is consistently underutilized and all of its pods can be scheduled on other worker nodes, Cluster Autoscaler terminates it.
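(Again only a sketch, not from the guide: the relevant part of a Cluster Autoscaler Deployment on AWS, assuming the Auto Scaling groups are tagged with `k8s.io/cluster-autoscaler/enabled` and `k8s.io/cluster-autoscaler/<cluster-name>` for auto-discovery; `my-cluster` and the image tag are placeholders:)

```yaml
# Container section of the cluster-autoscaler Deployment (abbreviated)
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.2  # pick the release matching your cluster's minor version
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --balance-similar-node-groups
      - --skip-nodes-with-system-pods=false
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
```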
…but what if you have a StatefulSet with three replicas, one in each AZ, each binding to a respective PV in its zone? How do we make sure each replica stays in the zone it belongs to if its node crashes? Would this be where topology constraints come into the picture?
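For reference, the kind of setup in question looks roughly like this (a sketch only: names, labels, and sizes are illustrative, and it assumes a StorageClass named `ebs-sc` with `volumeBindingMode: WaitForFirstConsumer`):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # spread replicas one per AZ
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: redis
      containers:
        - name: redis
          image: redis:7
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ebs-sc                     # assumed WaitForFirstConsumer StorageClass
        resources:
          requests:
            storage: 10Gi
```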
Thanks in advance for any advice!
Looks like the EBS CSI driver handles this automatically if the volume binding mode is `WaitForFirstConsumer`.
As per the following:
# CreateVolume (`StorageClass`) Parameters
## Supported Parameters
There are several optional parameters that may be passed into the `CreateVolumeRequest.parameters` map. These parameters can be configured in a StorageClass; see [example](../examples/kubernetes/storageclass). Unless explicitly noted, all parameters are case insensitive (e.g. "kmsKeyId", "kmskeyid", and any other combination of upper/lowercase characters can be used).
The AWS EBS CSI Driver supports [tagging](tagging.md) through `StorageClass.parameters` (in v1.6.0 and later).
| Parameters | Values | Default | Description |
|------------------------------|----------------------------------------------------|---------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| "csi.storage.k8s.io/fstype" | xfs, ext2, ext3, ext4 | ext4 | File system type that will be formatted during volume creation. This parameter is case sensitive! |
| "type" | io1, io2, gp2, gp3, sc1, st1, standard, sbp1, sbg1 | gp3* | EBS volume type. |
| "iopsPerGB" | | | I/O operations per second per GiB. Can be specified for IO1, IO2, and GP3 volumes. |
| "allowAutoIOPSPerGBIncrease" | true, false | false | When `"true"`, the CSI driver increases IOPS for a volume when `iopsPerGB * <volume size>` is too low to fit into IOPS range supported by AWS. This allows dynamic provisioning to always succeed, even when user specifies too small PVC capacity or `iopsPerGB` value. On the other hand, it may introduce additional costs, as such volumes have higher IOPS than requested in `iopsPerGB`. |
| "iops" | | | I/O operations per second. Can be specified for IO1, IO2, and GP3 volumes. |
| "throughput" | | 125 | Throughput in MiB/s. Only effective when gp3 volume type is specified. If empty, it will set to 125MiB/s as documented [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html). |
| "encrypted" | true, false | false | Whether the volume should be encrypted or not. Valid values are "true" or "false". |
| "blockExpress" | true, false | false | Enables the creation of [io2 B
“The EBS CSI Driver supports the WaitForFirstConsumer volume binding mode in Kubernetes. When using WaitForFirstConsumer binding mode the volume will automatically be created in the appropriate Availability Zone and with the appropriate topology. The WaitForFirstConsumer binding mode is recommended whenever possible for dynamic provisioning”
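For completeness, a minimal sketch of such a StorageClass for the EBS CSI driver (the name and parameters are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com             # AWS EBS CSI driver
volumeBindingMode: WaitForFirstConsumer  # volume is created in the zone of the scheduled pod
parameters:
  type: gp3
  csi.storage.k8s.io/fstype: ext4
```

As far as I understand, each dynamically provisioned PV then carries node affinity for the zone it was created in, so if a replica has to be rescheduled, the scheduler will only place it on nodes in that same zone.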