Overview

vSphere 8 introduced zonal supervisor cluster deployments to improve Tanzu workload resiliency by enabling TKG cluster deployments across three vSphere clusters (each cluster mapped to a zone), providing wider failure domains than the single vSphere cluster that was the limit in vSphere 7. With vSphere 8 HA zones, the supervisor cluster is stretched across the three clusters and TKG clusters are then deployed on top of it. A stretched supervisor cluster lets users control the zone placement of their workloads to achieve the best possible availability for their distributed applications, mimicking the same multi-zone experience as in the public cloud.

In a previous blog post (HERE) I enabled a multi-zone (zonal) supervisor cluster deployment using NSX-T as the networking layer. In this post I will deploy a multi-zone, highly available TKG cluster and then run and test a distributed Kubernetes application on top of it.

Lab Inventory

For software versions I used the following:

    • VMware ESXi 8.0 IA
    • vCenter server version 8.0 IA
    • VMware NSX 4.0.1.1
    • TrueNAS 12.0-U7 used to provision NFS data stores to ESXi hosts.
    • VyOS 1.4 used as lab backbone router and DHCP server.
    • Ubuntu 20.04.2 LTS as DNS and internet gateway.
    • Windows Server 2012 R2 Datacenter as management host for UI access.

For virtual hosts and appliances sizing I used the following specs:

    • 7 x ESXi hosts each with 12 vCPUs, 2 x NICs and 128 GB RAM.
    • vCenter server appliance with 2 vCPU and 24 GB RAM.

Deploy Zonal TKG cluster

Verifying Supervisor Cluster Zones

Before we dive into the configuration steps, let’s check the current supervisor cluster zonal deployment. From the vCenter UI you can see that we have three availability zones defined, one per vSphere cluster:

In addition, if you click your vCenter server name (top-left pane) and then choose Configure > vSphere Zones, you can list the available zones:

I also created my first namespace (pindakaas), in which I will be deploying my guest Tanzu Kubernetes Cluster (TKC).

Now, from our bootstrap/jumpbox Linux machine, I will log in to my supervisor cluster and inspect the running supervisor nodes.
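If you want to follow along from the CLI, the login and node check look roughly like the following (assuming the supervisor control plane address 172.10.200.2 that appears later in this post; the supervisor context created by the login is typically named after that address):

kubectl-vsphere login --server=https://172.10.200.2 --insecure-skip-tls-verify -u administrator@vsphere.local

kubectl config use-context 172.10.200.2

kubectl get nodes -o wide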

The supervisor cluster is up and running and in a stable state. The next step is to deploy a guest Tanzu Kubernetes Cluster (TKC) across our 3 availability zones.

Creating TKC YAML deployment file

The concept of deploying guest Tanzu Kubernetes Clusters (TKCs) on top of a vSphere supervisor cluster stays the same: you create a deployment YAML that defines the following parameters:

  • Defining the Tanzu release (TKR) to be used.
  • Defining the node size (VM class).
  • Defining the storage class.
  • Setting CNI parameters (optional).

However, the new v1alpha3 API adds many new specs (for zonal deployment, for example), a new TKR image naming format, and more. Below is the sample deployment YAML I used to deploy my multi-zone guest TKC:

apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: multizone-tkc01
  namespace: pindakaas
  annotations:
    run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
spec:
  topology:
    controlPlane:
      replicas: 3
      vmClass: best-effort-medium
      storageClass: pindakaas-storagepolicy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable
    nodePools:
    - name: workers-pool-1
      replicas: 1
      failureDomain: tonychocoloney
      vmClass: best-effort-medium
      storageClass: pindakaas-storagepolicy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable
    - name: workers-pool-2
      replicas: 1
      failureDomain: stroopwaffels
      vmClass: best-effort-medium
      storageClass: pindakaas-storagepolicy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable
    - name: workers-pool-3
      replicas: 1
      failureDomain: pindas
      vmClass: best-effort-medium
      storageClass: pindakaas-storagepolicy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable
  settings:
    network:
      cni:
        name: antrea
      services:
        cidrBlocks: ["198.53.100.0/16"]
      pods:
        cidrBlocks: ["192.0.5.0/16"]
      serviceDomain: cluster.local

If you are familiar with vSphere with Tanzu TKC YAML, it will be obvious to you that in vSphere 8 VMware has introduced a new Tanzu API called TanzuKubernetesCluster v1alpha3. This API adds new capabilities to the TKC deployment, such as:

  • Annotations to provision a non-default OS for the node VMs.
  • A topology section that describes the number, purpose and organisation of the nodes and the resources allocated to each.
  • Nodes are grouped into pools based on their purpose: `controlPlane` is a special kind of node pool, while `nodePools` holds the groups of worker nodes.
  • For the v1alpha3 API, the only supported TKR (Tanzu Kubernetes release) is v1.23.8+vmware.1-tkg.2-zshippable / v1.23.8+vmware.2-tkg.2-zshippable (you can list the TKRs available in your environment with the command shown after this list).
  • failureDomain is the name of a vSphere Zone and is required on a multi-zoned Supervisor; with three zones you define three node pools, each referencing a different failureDomain zone name.
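To check which TKR versions are actually available and compatible in your environment (as mentioned in the TKR bullet above), you can list them from the supervisor cluster context; tkr is the registered short name for the resource:

kubectl get tanzukubernetesreleases

kubectl get tkr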

The rest of the YAML is pretty much the same as the v1alpha2 API; if you need more details about the Tanzu APIs you can reference the VMware documentation. In my setup, the storage class pindakaas-storagepolicy is a zonal storage policy that was created in my previous blog post HERE.
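Before applying the YAML, you can do a quick sanity check from the supervisor context to confirm that the storage policy is exposed as a storage class and assigned to the namespace (names as in my lab):

kubectl get storageclass

kubectl describe namespace pindakaas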

Verifying the deployed multi-zone TKC 

After I applied the above YAML to my supervisor cluster (kubectl apply -f yaml-filename.yaml), the zonal TKC deployment kicks in, and after about 10 to 15 minutes (depending on the size of your nodes and configuration) you should see the TKC guest cluster created across the 3 zones we have:
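You can also follow the provisioning progress from the supervisor cluster context, for example (namespace and cluster names as used above):

kubectl get tanzukubernetescluster -n pindakaas

kubectl get machines -n pindakaas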

From our bootstrap Linux host, let’s log in to the newly created multi-zone guest Tanzu cluster and verify that the newly configured control plane and worker nodes are in Ready state:

kubectl-vsphere login --server=https://172.10.200.2 --insecure-skip-tls-verify -u administrator@vsphere.local --tanzu-kubernetes-cluster-namespace pindakaas --tanzu-kubernetes-cluster-name multizone-tkc01

kubectl config use-context multizone-tkc01

kubectl get nodes -o wide

Now, let us inspect how the supervisor cluster has allocated the control plane and worker nodes to zones. This is done by labelling the control plane and worker nodes with special Tanzu labels; each label maps to one of our 3 availability zones and is then used to place the nodes in the desired availability zone.

For a better view of the node labels I will use jq to filter the JSON output of my cluster nodes; on Ubuntu you can install it with “sudo apt install jq -y“.

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.labels}{"\n"}{end}' | jq '.'

This results in a “prettified” output of the nodes and their assigned labels:
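Alternatively, kubectl can print a given label as an extra column, which gives a more compact view of the node-to-zone mapping:

kubectl get nodes -L topology.kubernetes.io/zone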

Deploying a test distributed application

Our multi-zone guest cluster is now deployed, and as a last step we will deploy a test distributed Kubernetes application on top of it and verify that its pods are created across the zones.

For this step, I will use an app called yelb, which was developed by a former VMware colleague. It is a simple application but has a cool UI for testing microservices.

First, we need to create a namespace and a cluster role binding to the default privileged Pod Security Policy (PSP) so that we can schedule pods on the TKC guest cluster.

kubectl create clusterrolebinding default-tkg-admin-privileged-binding --clusterrole=psp:vmware-system-privileged --group=system:authenticated

kubectl create ns yelb

Then, copy and paste the below YAML file and apply it to your cluster using “kubectl apply -f yelb-deployment-filename.yaml”.

apiVersion: v1
kind: Service
metadata:
  name: redis-server
  labels:
    app: redis-server
    tier: cache
  namespace: yelb
spec:
  type: ClusterIP
  ports:
  - port: 6379
  selector:
    app: redis-server
    tier: cache
---
apiVersion: v1
kind: Service
metadata:
  name: yelb-db
  labels:
    app: yelb-db
    tier: backenddb
  namespace: yelb
spec:
  type: ClusterIP
  ports:
  - port: 5432
  selector:
    app: yelb-db
    tier: backenddb
---
apiVersion: v1
kind: Service
metadata:
  name: yelb-appserver
  labels:
    app: yelb-appserver
    tier: middletier
  namespace: yelb
spec:
  type: ClusterIP
  ports:
  - port: 4567
  selector:
    app: yelb-appserver
    tier: middletier
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yelb-ui
  namespace: yelb
spec:
  replicas: 3
  selector:
    matchLabels:
      app: yelb-ui
      tier: frontend
      secgroup: web
  template:
    metadata:
      labels:
        app: yelb-ui
        tier: frontend
        secgroup: web
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - pindas
                - stroopwaffels
                - tonychocoloney
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - yelb-ui
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: yelb-ui
        image: harbor-repo.vmware.com/dockerhub-proxy-cache/mreferre/yelb-ui@sha256:9df5e2611d6cf7cbc304104c18bb93ab3b185ae68ad25f75b655be1106cdd1b2
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-server
  namespace: yelb
spec:
  replicas: 3
  selector:
    matchLabels:
      app: redis-server
      tier: cache
      secgroup: cache
  template:
    metadata:
      labels:
        app: redis-server
        tier: cache
        secgroup: cache
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - pindas
                - stroopwaffels
                - tonychocoloney
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - redis-server
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: redis-server
        image: harbor-repo.vmware.com/challagandlp/mreferre/redis@sha256:3c07847e5aa6911cf5d9441642769d3b6cd0bf6b8576773ae3a0742056b9dd47
        ports:
          - containerPort: 6379
      #   volumeMounts:
      #     - name: redis-slave-data
      #       mountPath: /data
      # volumes:
      # - name: redis-slave-data
      #   persistentVolumeClaim:
      #     claimName: redis-slave-claim
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yelb-db
  namespace: yelb
spec:
  replicas: 3
  selector:
    matchLabels:
      app: yelb-db
      tier: backenddb
      secgroup: db
  template:
    metadata:
      labels:
        app: yelb-db
        tier: backenddb
        secgroup: db
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - pindas
                - stroopwaffels
                - tonychocoloney
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - yelb-db
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: yelb-db
        image: harbor-repo.vmware.com/challagandlp/mreferre/yelb-db@sha256:6412d2fe96ee71ca701932d47675c549fe0428dede6a7975d39d9a581dc46c0c
        ports:
        - containerPort: 5432
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yelb-appserver
  namespace: yelb
spec:
  replicas: 3
  selector:
    matchLabels:
      app: yelb-appserver
      tier: middletier
      secgroup: app
  template:
    metadata:
      labels:
        app: yelb-appserver
        tier: middletier
        secgroup: app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - pindas
                - stroopwaffels
                - tonychocoloney
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - yelb-appserver
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: yelb-appserver
        image: harbor-repo.vmware.com/challagandlp/mreferre/yelb-appserver@sha256:db367946dc02cf38752ad925e0b0fbff0f5c6f9186ca481fb8541530879d9c8d
        ports:
        - containerPort: 4567
---
apiVersion: v1
kind: Service
metadata:
  name: yelb-ui
  labels:
    app: yelb-ui
    tier: frontend
  namespace: yelb
spec:
  type: LoadBalancer
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: yelb-ui
    tier: frontend

The important new section in the above YAML is the affinity block:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - pindas
          - stroopwaffels
          - tonychocoloney
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - yelb-appserver
      topologyKey: topology.kubernetes.io/zone

The nodeAffinity section tells the kube-scheduler on which nodes the deployment's pods may be scheduled, and in the above we match on our 3 zones. However, we also need to make sure that no two pods from the same deployment end up on the same node or zone (to ensure maximum HA), and that is why we need the podAntiAffinity section, which instructs the kube-scheduler not to schedule pods with the same label together in the same zone.
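As a side note, the same one-pod-per-zone spread can also be expressed with the standard Kubernetes topologySpreadConstraints field instead of the nodeAffinity/podAntiAffinity pair. Below is a minimal sketch for the yelb-ui pod template, shown only as an alternative and not what I used above:

topologySpreadConstraints:
- maxSkew: 1                                # allow at most one pod of difference between zones
  topologyKey: topology.kubernetes.io/zone  # spread across the vSphere Zones
  whenUnsatisfiable: DoNotSchedule          # leave the pod Pending rather than violating the spread
  labelSelector:
    matchLabels:
      app: yelb-ui                          # count only the yelb-ui pods when computing the spread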

After the deployments are created, all the pods (I configured 3 pod replicas per deployment, i.e. 1 pod per zone) should be in Running state.

Let’s now inspect the 3 pods of the yelb-ui deployment and check on which node each pod is running. We should see a single pod per worker node, i.e. per zone.
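A single command against the yelb namespace shows both the pod status and the node (and hence the zone) each pod landed on:

kubectl get pods -n yelb -o wide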


I also created a LoadBalancer service (see the deployment YAML above) which exposes the UI pods on port 80. This service is assigned a VIP from the underlying NSX infrastructure, or, if you are using NSX ALB or HA-Proxy, from the usable IP pool for virtual services.
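The assigned external IP can be read directly from the service:

kubectl get svc yelb-ui -n yelb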

From a web browser, navigate to http://172.10.200.5 and you should see the yelb app homepage.