
Overview
vSphere 8 introduced zonal supervisor cluster deployments to improve Tanzu workload resiliency by enabling TKG cluster deployments across three vSphere clusters (each cluster mapped to a zone), providing a wider fault domain than the single vSphere cluster that was the limit in vSphere 7. With vSphere 8 HA zones, the supervisor cluster is stretched across the three clusters and TKG clusters are then deployed on top of it. A stretched supervisor cluster lets users control the zone placement of their workloads to achieve the best possible availability for their distributed applications, mimicking the multi-zone experience of the public cloud.
In a previous blog post (HERE) I enabled a multi-zone (zonal) supervisor cluster deployment with NSX-T as the networking layer. In this post I will deploy a multi-zone, highly available TKG cluster and then run and test a distributed Kubernetes application on top of it.
Lab Inventory
For software versions I used the following:
- VMware ESXi 8.0 IA
- vCenter server version 8.0 IA
- VMware NSX 4.0.1.1
- TrueNAS 12.0-U7 used to provision NFS data stores to ESXi hosts.
- VyOS 1.4 used as lab backbone router and DHCP server.
- Ubuntu 20.04.2 LTS as DNS and internet gateway.
- Windows Server 2012 R2 Datacenter as management host for UI access.
For virtual hosts and appliances sizing I used the following specs:
- 7 x ESXi hosts each with 12 vCPUs, 2 x NICs and 128 GB RAM.
- vCenter server appliance with 2 vCPU and 24 GB RAM.
Deploy Zonal TKG cluster
Verifying Supervisor Cluster Zones
Before we dive into the configuration steps, let’s check the current supervisor cluster zonal deployment. From the vCenter UI you can see that we have 3 availability zones defined across three vSphere clusters:
In addition, if you click on your vCenter server name (top left pane) and then choose Configure > vSphere Zones you can list the available zones:
I also created my first namespace (pindakaas), in which I will be deploying my guest Tanzu Kubernetes Cluster (TKC).
Now, from our bootstrap/jumpbox Linux machine I will log in to the supervisor cluster and inspect the running supervisor nodes.
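For reference, the login and verification sequence looks roughly like the following, assuming the supervisor control plane VIP 172.10.200.2 that is used later in this post (kubectl-vsphere creates a kubeconfig context named after that VIP):

kubectl-vsphere login --server=https://172.10.200.2 --insecure-skip-tls-verify -u administrator@vsphere.local
kubectl config use-context 172.10.200.2   # supervisor context, named after its VIP
kubectl get nodes -o wide                 # the three supervisor control plane nodes should be Ready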
The supervisor cluster is up, running, and in a stable state. The next step is to deploy a guest Tanzu Kubernetes Cluster (TKC) across our three availability zones.
Creating TKC YAML deployment file
The overall concept of deploying guest Tanzu Kubernetes Clusters (TKCs) on top of a vSphere supervisor cluster is unchanged: you still create a deployment YAML that defines the following parameters:
- Defining the Tanzu release to be used.
- Defining the node size.
- Defining the storage class.
- Setting CNI parameters (optional).
However, the new v1alpha3 API adds a lot of new specs (for zonal deployment, for example), a new TKR image name format, and more. Below is the sample deployment YAML I used to deploy my multi-zone guest TKC:
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: multizone-tkc01
  namespace: pindakaas
  annotations:
    run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
spec:
  topology:
    controlPlane:
      replicas: 3
      vmClass: best-effort-medium
      storageClass: pindakaas-storagepolicy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable
    nodePools:
    - name: workers-pool-1
      replicas: 1
      failureDomain: tonychocoloney
      vmClass: best-effort-medium
      storageClass: pindakaas-storagepolicy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable
    - name: workers-pool-2
      replicas: 1
      failureDomain: stroopwaffels
      vmClass: best-effort-medium
      storageClass: pindakaas-storagepolicy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable
    - name: workers-pool-3
      replicas: 1
      failureDomain: pindas
      vmClass: best-effort-medium
      storageClass: pindakaas-storagepolicy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable
  settings:
    network:
      cni:
        name: antrea
      services:
        cidrBlocks: ["198.53.100.0/16"]
      pods:
        cidrBlocks: ["192.0.5.0/16"]
      serviceDomain: cluster.local
If you are familiar with vSphere with Tanzu TKC YAML, you will notice that vSphere 8 introduces a new Tanzu API called TanzuKubernetesCluster v1alpha3. This API adds new capabilities to the TKC deployment, such as:
- Annotations to provision a non-default OS for the node VMs.
- Topology, which describes the number, purpose, and organisation of the nodes, and the resources allocated to each.
- Nodes are grouped into pools based on their purpose: `controlPlane` is a special kind of node pool, while `nodePools` holds the groups of worker nodes.
- For the v1alpha3 API, the only supported TKRs (Tanzu Kubernetes releases) are v1.23.8+vmware.1-tkg.2-zshippable and v1.23.8+vmware.2-tkg.2-zshippable.
- failureDomain is the name of a vSphere Zone and is required on a multi-zoned Supervisor; with three zones, you define three node pools, each referencing a different failureDomain zone name.
The rest of the YAML is pretty much the same as the v1alpha2 API; if you need more details about the Tanzu APIs, refer to the VMware documentation. In my setup, the storage class pindakaas-storagepolicy is a zonal storage policy that was created in my previous blog post HERE.
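Before applying the YAML, a quick sanity check of what the namespace offers can save a failed rollout. A minimal sketch, assuming you are logged in to the supervisor cluster and switch to the pindakaas namespace context (the resources below are the standard vSphere with Tanzu CRDs):

kubectl config use-context pindakaas
kubectl get tanzukubernetesreleases   # short name tkr; lists the available Tanzu Kubernetes releases
kubectl get virtualmachineclasses     # available VM classes, e.g. best-effort-medium
kubectl get storageclasses            # should include pindakaas-storagepolicy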
Verifying the deployed multi-zone TKC
After applying the above YAML to the supervisor cluster (kubectl apply -f yaml-filename.yaml), the zonal TKC deployment kicks in, and in about 10 to 15 minutes (depending on the size of your nodes and configuration) you should see the guest TKC created across our three zones:
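For completeness, here is the apply plus a couple of commands I use to watch the rollout from the supervisor context (multizone-tkc01.yaml is just an example file name for the YAML above):

kubectl apply -f multizone-tkc01.yaml
kubectl get tanzukubernetescluster multizone-tkc01 -n pindakaas   # short name tkc; shows the rollout status
kubectl get virtualmachines -n pindakaas                          # control plane and worker node VMs being provisioned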
From our bootstrap Linux host, let’s log in to the newly created multi-zone guest Tanzu cluster and verify that the newly configured control plane and worker nodes are in Ready state:
kubectl-vsphere login --server=https://172.10.200.2 --insecure-skip-tls-verify -u administrator@vsphere.local --tanzu-kubernetes-cluster-namespace pindakaas --tanzu-kubernetes-cluster-name multizone-tkc01
kubectl config use-context multizone-tkc01
kubectl get nodes -o wide
Now, let us inspect how the supervisor cluster allocated control plane and worker nodes to zones. This is done by labelling the control plane and worker nodes with special Tanzu topology labels; each label maps to one of our three availability zones and is then used to place the node in the desired zone.
For a better view of the node labels I will use jq to filter the JSON output of my cluster nodes; on Ubuntu you can install it with "sudo apt install jq -y".
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.labels}{end}' | jq '.'
This will result in a “prettified” output of the nodes and their assigned labels
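Alternatively, if you only care about the zone assignment, the label-columns flag gives a more compact view without jq:

kubectl get nodes -L topology.kubernetes.io/zone   # prints the zone label as an extra column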
Deploying a test distributed application
Our multi-zone guest cluster is now deployed. As a last step, we will deploy a test distributed Kubernetes application on top of it and verify that its pods are created across the zones.
For this step, I will use an app called yelb, which was developed by a former VMware colleague. It is a simple application but has a nice UI for testing microservices.
First, we need to create a namespace and a cluster role binding for the default Pod Security Policy (PSP) so that we can schedule pods on top of TKC guest clusters.
kubectl create clusterrolebinding default-tkg-admin-privileged-binding --clusterrole=psp:vmware-system-privileged --group=system:authenticated
kubectl create ns yelb
Then, copy the below YAML into a file and apply it to your cluster using "kubectl apply -f yelb-deployment-filename.yaml":
apiVersion: v1
kind: Service
metadata:
  name: redis-server
  labels:
    app: redis-server
    tier: cache
  namespace: yelb
spec:
  type: ClusterIP
  ports:
  - port: 6379
  selector:
    app: redis-server
    tier: cache
---
apiVersion: v1
kind: Service
metadata:
  name: yelb-db
  labels:
    app: yelb-db
    tier: backenddb
  namespace: yelb
spec:
  type: ClusterIP
  ports:
  - port: 5432
  selector:
    app: yelb-db
    tier: backenddb
---
apiVersion: v1
kind: Service
metadata:
  name: yelb-appserver
  labels:
    app: yelb-appserver
    tier: middletier
  namespace: yelb
spec:
  type: ClusterIP
  ports:
  - port: 4567
  selector:
    app: yelb-appserver
    tier: middletier
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yelb-ui
  namespace: yelb
spec:
  replicas: 3
  selector:
    matchLabels:
      app: yelb-ui
      tier: frontend
      secgroup: web
  template:
    metadata:
      labels:
        app: yelb-ui
        tier: frontend
        secgroup: web
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - pindas
                - stroopwaffels
                - tonychocoloney
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - yelb-ui
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: yelb-ui
        image: harbor-repo.vmware.com/dockerhub-proxy-cache/mreferre/yelb-ui@sha256:9df5e2611d6cf7cbc304104c18bb93ab3b185ae68ad25f75b655be1106cdd1b2
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-server
  namespace: yelb
spec:
  replicas: 3
  selector:
    matchLabels:
      app: redis-server
      tier: cache
      secgroup: cache
  template:
    metadata:
      labels:
        app: redis-server
        tier: cache
        secgroup: cache
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - pindas
                - stroopwaffels
                - tonychocoloney
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - redis-server
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: redis-server
        image: harbor-repo.vmware.com/challagandlp/mreferre/redis@sha256:3c07847e5aa6911cf5d9441642769d3b6cd0bf6b8576773ae3a0742056b9dd47
        ports:
        - containerPort: 6379
#        volumeMounts:
#        - name: redis-slave-data
#          mountPath: /data
#      volumes:
#      - name: redis-slave-data
#        persistantVolumeClaim:
#          claimName: redis-slave-claim
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yelb-db
  namespace: yelb
spec:
  replicas: 3
  selector:
    matchLabels:
      app: yelb-db
      tier: backenddb
      secgroup: db
  template:
    metadata:
      labels:
        app: yelb-db
        tier: backenddb
        secgroup: db
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - pindas
                - stroopwaffels
                - tonychocoloney
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - yelb-db
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: yelb-db
        image: harbor-repo.vmware.com/challagandlp/mreferre/yelb-db@sha256:6412d2fe96ee71ca701932d47675c549fe0428dede6a7975d39d9a581dc46c0c
        ports:
        - containerPort: 5432
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yelb-appserver
  namespace: yelb
spec:
  replicas: 3
  selector:
    matchLabels:
      app: yelb-appserver
      tier: middletier
      secgroup: app
  template:
    metadata:
      labels:
        app: yelb-appserver
        tier: middletier
        secgroup: app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - pindas
                - stroopwaffels
                - tonychocoloney
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - yelb-appserver
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: yelb-appserver
        image: harbor-repo.vmware.com/challagandlp/mreferre/yelb-appserver@sha256:db367946dc02cf38752ad925e0b0fbff0f5c6f9186ca481fb8541530879d9c8d
        ports:
        - containerPort: 4567
---
apiVersion: v1
kind: Service
metadata:
  name: yelb-ui
  labels:
    app: yelb-ui
    tier: frontend
  namespace: yelb
spec:
  type: LoadBalancer
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: yelb-ui
    tier: frontend
The important and new section in the above YAML is the following:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - pindas
          - stroopwaffels
          - tonychocoloney
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - yelb-appserver
      topologyKey: topology.kubernetes.io/zone
The nodeAffinity section tells the kube-scheduler on which nodes the deployment’s pods may be scheduled; above we match on our 3 zones. However, we also need to make sure that no two pods from the same deployment coexist on the same node or zone (to ensure maximum HA), and that is why we need the podAntiAffinity rule, which instructs the kube-scheduler not to schedule pods with the same label together in the same zone.
After the deployment is created, all the pods (I configured 3 pod replicas per deployment, i.e. 1 pod per zone) should be in Running state.
Let’s now inspect the 3 pods of the yelb-ui deployment and check on which node each pod is running. We should see a single pod per worker node, i.e. per zone.
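A quick way to check this, using the labels from the deployment YAML above:

kubectl get pods -n yelb -l app=yelb-ui -o wide   # the NODE column should show a different worker (zone) per pod
kubectl get pods -n yelb -o wide                  # same check across all yelb tiers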
I also created a LoadBalancer service (see the deployment YAML above) which exposes the UI pods on port 80. This service is assigned a VIP from the underlying NSX infrastructure, or, if you are using NSX ALB or HA-Proxy, from the usable IP pool for virtual services.
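To find the assigned VIP, query the service; in my lab the EXTERNAL-IP column shows 172.10.200.5:

kubectl get svc yelb-ui -n yelb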
From a web browser, navigate to http://172.10.200.5 and you should see the yelb app homepage.