Cluster information:
Kubernetes version: 1.26
Cloud being used: AWS
Installation method: EKS
Host OS:
CNI and version: Calico
CRI and version: CRI-O
EBS volumes aren’t supported across multiple availability zones, which means that if a Pod in zone A crashes, it may get recreated in zone B and then be unable to bind to its PVC and start. The EBS storage driver does not handle this by itself (only the Karpenter add-on does). For example, to schedule two Redis Pods on two nodes in zone A and prevent them from running on the same node, you could label two nodes with “zone=a”, increase the number of replicas to 2, add a nodeSelector for the zone label, and add pod anti-affinity:
spec:
  replicas: 2
  template:
    spec:
      nodeSelector:
        zone: a
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: kubernetes.io/hostname   # never co-locate matching pods on the same node
              labelSelector:
                matchLabels:
                  app: redis                        # assumes the pod template is labelled app: redis
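For the `nodeSelector` above to match, the two target nodes would first need the label applied, for example with `kubectl label nodes <node-name> zone=a` (run once per node; `<node-name>` is a placeholder). Alternatively, the well-known `topology.kubernetes.io/zone` label that the cloud provider already sets on each node could be used instead of a custom `zone` label.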
This is similar to what’s mentioned here:
# EKS Data Plane
To operate highly available and resilient applications, you need a highly available and resilient data plane. An elastic data plane ensures that Kubernetes can scale and heal your applications automatically. A resilient data plane consists of two or more worker nodes, can grow and shrink with the workload, and can automatically recover from failures.
You have two choices for worker nodes with EKS: [EC2 instances](https://docs.aws.amazon.com/eks/latest/userguide/worker.html) and [Fargate](https://docs.aws.amazon.com/eks/latest/userguide/fargate.html). If you choose EC2 instances, you can manage the worker nodes yourself or use [EKS managed node groups](https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html). You can have a cluster that mixes managed node groups, self-managed worker nodes, and Fargate.
EKS on Fargate offers the easiest path to a resilient data plane. Fargate runs each Pod in an isolated compute environment. Each Pod running on Fargate gets its own worker node. Fargate automatically scales the data plane as Kubernetes scales pods. You can scale both the data plane and your workload by using the [horizontal pod autoscaler](https://docs.aws.amazon.com/eks/latest/userguide/horizontal-pod-autoscaler.html).
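(Not part of the quoted guide, just a rough sketch of what scaling the workload with the Horizontal Pod Autoscaler looks like; the Deployment name `web` and the thresholds are illustrative assumptions:)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above ~70% average CPU
```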
The preferred way to scale EC2 worker nodes is by using [Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md), [EC2 Auto Scaling groups](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html), or community projects like [Atlassian’s Escalator](https://github.com/atlassian/escalator).
## Recommendations
### Use EC2 Auto Scaling Groups to create worker nodes
It is a best practice to create worker nodes using EC2 Auto Scaling groups instead of creating individual EC2 instances and joining them to the cluster. Auto Scaling groups will automatically replace any terminated or failed nodes, ensuring that the cluster always has the capacity to run your workload.
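(As a rough sketch, not from the guide: an eksctl `ClusterConfig` that creates a managed node group backed by an EC2 Auto Scaling group. The cluster name, region, instance type, and sizes are illustrative assumptions:)

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster       # placeholder cluster name
  region: us-east-1      # placeholder region
managedNodeGroups:
  - name: workers
    instanceType: m5.large
    minSize: 2           # the backing ASG keeps at least this many nodes
    maxSize: 6
    desiredCapacity: 3
```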
### Use Kubernetes Cluster Autoscaler to scale nodes
Cluster Autoscaler adjusts the size of the data plane when there are pods that cannot be scheduled because the cluster has insufficient resources and adding another worker node would help. Cluster Autoscaler is a reactive process: it waits until pods are in the *Pending* state due to insufficient capacity in the cluster, and when that happens it adds EC2 instances to the cluster. Whenever the cluster runs out of capacity, new replicas (or new pods) will remain unavailable, in the *Pending* state, until worker nodes are added. This delay may impact your applications' reliability if the data plane cannot scale fast enough to meet the demands of the workload. If a worker node is consistently underutilized and all of its pods can be scheduled on other worker nodes, Cluster Autoscaler terminates it.
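(Again only a sketch, not from the guide: the relevant part of a Cluster Autoscaler Deployment on AWS, assuming the Auto Scaling groups are tagged with `k8s.io/cluster-autoscaler/enabled` and `k8s.io/cluster-autoscaler/<cluster-name>` for auto-discovery; `my-cluster` and the image tag are placeholders:)

```yaml
# Container section of the cluster-autoscaler Deployment (abbreviated)
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.2  # pick the release matching your cluster's minor version
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --balance-similar-node-groups
      - --skip-nodes-with-system-pods=false
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
```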
…but what if you have a StatefulSet with three replicas, one in each AZ, each binding to a respective PV in its zone? How do we make sure each replica stays in the zone it belongs to if its node crashes? Would this be where topology constraints come into the picture?
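For reference, the kind of setup in question looks roughly like this (a sketch only: names, labels, and sizes are illustrative, and it assumes a StorageClass named `ebs-sc` with `volumeBindingMode: WaitForFirstConsumer`):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # spread replicas one per AZ
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: redis
      containers:
        - name: redis
          image: redis:7
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ebs-sc                     # assumed WaitForFirstConsumer StorageClass
        resources:
          requests:
            storage: 10Gi
```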
Thanks in advance for any advice!
Looks like the EBS CSI driver handles this automatically if the volume binding mode is `WaitForFirstConsumer`.
As per the following:
# CreateVolume (`StorageClass`) Parameters
## Supported Parameters
There are several optional parameters that may be passed into the `CreateVolumeRequest.parameters` map. These parameters can be configured in a StorageClass; see [example](../examples/kubernetes/storageclass). Unless explicitly noted, all parameters are case insensitive (e.g. "kmsKeyId", "kmskeyid", and any other combination of upper/lowercase characters can be used).
The AWS EBS CSI Driver supports [tagging](tagging.md) through `StorageClass.parameters` (in v1.6.0 and later).
| Parameters | Values | Default | Description |
|------------------------------|----------------------------------------------------|---------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| "csi.storage.k8s.io/fstype" | xfs, ext2, ext3, ext4 | ext4 | File system type that will be formatted during volume creation. This parameter is case sensitive! |
| "type" | io1, io2, gp2, gp3, sc1, st1, standard, sbp1, sbg1 | gp3* | EBS volume type. |
| "iopsPerGB" | | | I/O operations per second per GiB. Can be specified for IO1, IO2, and GP3 volumes. |
| "allowAutoIOPSPerGBIncrease" | true, false | false | When `"true"`, the CSI driver increases IOPS for a volume when `iopsPerGB * <volume size>` is too low to fit into IOPS range supported by AWS. This allows dynamic provisioning to always succeed, even when user specifies too small PVC capacity or `iopsPerGB` value. On the other hand, it may introduce additional costs, as such volumes have higher IOPS than requested in `iopsPerGB`. |
| "iops" | | | I/O operations per second. Can be specified for IO1, IO2, and GP3 volumes. |
| "throughput" | | 125 | Throughput in MiB/s. Only effective when gp3 volume type is specified. If empty, it will set to 125MiB/s as documented [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html). |
| "encrypted" | true, false | false | Whether the volume should be encrypted or not. Valid values are "true" or "false". |
| "blockExpress" | true, false | false | Enables the creation of [io2 B
“The EBS CSI Driver supports the WaitForFirstConsumer volume binding mode in Kubernetes. When using WaitForFirstConsumer binding mode the volume will automatically be created in the appropriate Availability Zone and with the appropriate topology. The WaitForFirstConsumer binding mode is recommended whenever possible for dynamic provisioning”
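For completeness, a minimal sketch of such a StorageClass for the EBS CSI driver (the name and parameters are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com             # AWS EBS CSI driver
volumeBindingMode: WaitForFirstConsumer  # volume is created in the zone of the scheduled pod
parameters:
  type: gp3
  csi.storage.k8s.io/fstype: ext4
```

As far as I understand, each dynamically provisioned PV then carries node affinity for the zone it was created in, so if a replica has to be rescheduled, the scheduler will only place it on nodes in that same zone.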