Overview

I am just back from VMware Explore in Barcelona, where I presented a session on securing containers with Antrea and NSX. I am very excited to see how our Tanzu portfolio has evolved and keeps evolving, and this gave me a push to revisit a topic which is fairly common, yet will become even more interesting as the TKGs offering develops. TKGs, also known as vSphere with Tanzu, offers a vSphere-integrated management control plane for Kubernetes workloads. It leverages a wide spectrum of vSphere services, such as embedded node lifecycle management, vSphere cluster HA zones (in vSphere 8), deploying guest TKCs (Tanzu Kubernetes Clusters) by applying a simple YAML, NSX security and load balancing capabilities for containerised workloads, and much more.

In this blog post I will be revisiting the topic of enabling workload management on vSphere (TKGs) on top of VDS networking, using NSX ALB (Avi vintage) as the load balancer for the supervisor cluster and the guest TKC clusters. I say “revisit” since I have already covered the same topic once with NSX as networking provider and load balancer (HERE) and once using VDS as networking provider and HA-Proxy as load balancer (HERE).

Lab Inventory

For software versions I used the following:

    • VMware ESXi 7.0U3d
    • vCenter server version 7.0U3g
    • NSX Advanced Load Balancer version 22.1.2
    • TrueNAS 12.0-U7 used to provision NFS data stores to ESXi hosts.
    • VyOS 1.4 used as lab backbone router and DHCP server.
    • Ubuntu 20.04.2 LTS as DNS and internet gateway.
    • Windows Server 2012 R2 Datacenter as management host for UI access.

For virtual hosts and appliances sizing I used the following specs:

    • 3 x ESXi hosts each with 12 vCPUs, 2 x NICs and 128 GB RAM.
    • vCenter server appliance with 2 vCPU and 24 GB RAM.

Network Topology and addressing requirements for vSphere with Tanzu and NSX ALB

There are specific networking parameters that need to be planned before we proceed with the Avi deployment and workload management, including which DVS port groups need to be created for Avi and for the TKC workload clusters. The NSX ALB controller is deployed with a single management interface which handles UI and API requests. Service Engines are also deployed with one management interface, through which they communicate with the ALB controller, plus data interfaces through which they connect to the server pools where traffic needs to be balanced. The data interfaces are also where VIPs will be placed.

For vSphere with Tanzu, 3 supervisor VMs are deployed and every VM is assigned a management IP from the management network. This is the same management port group used for the Avi controller and SE management interfaces. See the diagram and table below for more clarification:

Note: Tanzu Kubernetes Control Plane nodes (supervisor cluster VMs) are deployed dual-homed, one vNIC connected to same workload network as TKC nodes and one vNIC connected to management network of our NSX ALB controller and Service Engines.

Deploying and Configuring NSX ALB

Step 1: Deploy and connect NSX ALB controller to vCenter

After you download the NSX ALB OVA from VMware Customer Connect (which redirects you to the Avi Networks download page), you need to log in to your vCenter and deploy the NSX ALB controller as follows:

  1. Right click the cluster which is going to host the NSX ALB controller and choose Deploy OVF Template.
  2. Follow the Deploy OVF Template wizard instructions:
    • Choose Thick or Thin for disk format.
    • Choose a port group for Destination Networks in Network Mapping. This port group will be used by the Avi Controller to communicate with vCenter.
    • Specify the management IP address and default gateway. In the case of DHCP, leave this field empty.

 

Choose Local file if you have downloaded the Avi controller ova locally to your machine, click on upload files and then choose the ova file to be uploaded.

Choose a name for Avi controller VM

Choose where to deploy Avi controller, Resource Pool needs to be created beforehand if you want to set resource usage for Avi.

Review ova details

Avi recommends Thick Provision Lazy Zeroed for the controller VM, but in my setup I will use Thin provisioning to optimise storage usage.

Choose the DVS port group to which the Avi management interface will be connected.

If below is left blank then DHCP will be used to assign IP addresses and rest of network parameters for controller VM.

Review the template configuration before deploying and if all is good then click FINISH

  • Power on the VM.

Once the NSX ALB controller is powered up and booted, open a web browser to the management address/FQDN of your NSX ALB controller and perform the initial setup steps as described under performing the Avi controller initial setup in the Avi documentation.

In my lab I just accepted all the default settings for the initial setup wizard.

Step 2: Create a controller CA certificate and configure vCenter cloud

Before we configure NSX ALB connection to vCenter, we need to create a custom controller CA certificate which vCenter will use while enabling workload management in order to connect to NSX ALB controller and create load balancers required by supervisor cluster and the later to be deployed guest TKC workload clusters.

The steps for creating a controller CA certificate can be found in my previous blog post (HERE) under step 4. Once you have created the self-signed controller certificate, log out and log back in to the UI, navigate to Templates > Security > SSL/TLS Certificates and click on the small export icon to the right of your custom generated certificate.

Click on Copy to clipboard button to export your certificate, paste the contents in a text file as we will need this later while we are enabling workload management in vCenter
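
Before the certificate gets pasted into the workload management wizard later on, it can help to confirm that the text file holds a complete PEM block and not a truncated copy. Below is a minimal Python sketch using only the standard library. The function name is hypothetical and not part of Avi or vCenter, and the check only covers the BEGIN/END framing and the base64 body, not the validity of the certificate itself:

```python
import ssl

def looks_like_pem_certificate(text: str) -> bool:
    """Return True if text is framed as a PEM certificate with a decodable body."""
    candidate = text.strip()
    try:
        # Raises ValueError when the BEGIN/END CERTIFICATE framing is missing
        # or the base64 body cannot be decoded.
        der = ssl.PEM_cert_to_DER_cert(candidate)
    except ValueError:
        return False
    return len(der) > 0
```

You could read the saved text file and pass its contents to this function before moving on to the vCenter side.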

Next we need to setup our vCenter cloud in NSX ALB infrastructure configuration, navigate to Infrastructure > Clouds and then you can choose either to edit the Default-Cloud and change its type to vCenter or create a new cloud. In my setup I will edit the existing Default-Cloud as follows:

General section is where you define the name of your cloud provider, type and choose which address management scheme you want to use, in my setup I will use static assignment so I will leave the DHCP options unchecked

One important point is the highlighted option to prefer static routes over directly connected ones for VS placement. This is needed since my workload network, which will host my TKC guest clusters, will be in a different subnet than the data network which Avi will use to deploy the Service Engines. By default, Service Engines consider only connected subnets when placing VS VIPs, which means that workloads need to be placed in the same subnet as the data network of the Service Engines. If, however, your workload network is in a different subnet, then you need to configure a static route on the SEs pointing to it and check the above option to instruct the SEs to prefer static routes over directly connected ones.

To get a better understanding of networking in vSphere with Tanzu using NSX ALB check VMware documentation

After that you need to configure vCenter/vSphere parameters

Make sure that the cloud is added with Write credentials so that the Avi controller can provision and configure service engines. There is also a newly added feature in Avi 22.1.2 which allows the user to specify a content library from which the Avi controller can pull, or to which it can push, the OVA template for service engines.

After that, we need to configure the IP address assignment scheme for the management network. The management network is where the Avi controller will attach the management interface of the service engines once they are deployed. This maps to a configured port group on a DVS of the cluster which will host the service engines; revisit the first section of this blog post to understand more about the networking requirements for workload management and NSX ALB. Since I am not using DHCP in this setup, I need to specify an IP address pool from which the Avi controller will assign IP addresses to SE management interfaces

The last step is to create and choose an IPAM profile. This is needed to tell Avi on which DVS port groups virtual service VIPs will be provisioned (usable ports)

This is what my IPAM configuration looks like

You can ignore DNS state registration if you do not want Avi to add/remove VIP DNS entries when VIPs are created or deleted.

Click Save and ensure that the Status circle is green for your Default-Cloud instance after the above configuration

Step 3: Finalise Avi networking setup 

The last step in the NSX ALB setup is to configure the service engine data network subnet and the IP pool range from which VIPs will be assigned. In addition, we still need to define a static route for the service engines via their data interfaces so they can reach our workload network (workload_mgmt), on which the Supervisor VMs and Tanzu cluster nodes will be running (refer to the network topology requirement diagram in the beginning of this post).

From Avi UI, navigate to Infrastructure > Cloud Resources > Networks you should see all DVS port groups that are part of your vSphere cloud

The highlighted network is where the service engine data interfaces will be connected and where VIPs will be placed. Since I will not use DHCP for that purpose (as it requires a DHCP option that is not supported by my DHCP server), I have to define a static IP pool within subnet 172.10.42.0/24 from which SEs and VIPs can consume addresses. If you click on the small pen icon to the right you can add subnets and define your IP pool range, see below:
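
A quick way to sanity-check a static pool before typing it into the Avi UI is to confirm that both ends sit inside the subnet and to count how many addresses it offers to SEs and VIPs. The sketch below uses Python's standard ipaddress module. The subnet is the one from this post, while the pool range is a made-up example and not the range used in my lab:

```python
import ipaddress

def pool_size_in_subnet(subnet: str, first: str, last: str) -> int:
    """Return the number of addresses in [first, last]; raise if the range is invalid."""
    network = ipaddress.ip_network(subnet)
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    if start not in network or end not in network:
        raise ValueError(f"{first}-{last} is not fully inside {subnet}")
    if end < start:
        raise ValueError("pool end is below pool start")
    return int(end) - int(start) + 1

# Subnet is from this post; the range is a hypothetical example.
print(pool_size_in_subnet("172.10.42.0/24", "172.10.42.100", "172.10.42.200"))
```

Keep in mind that every service engine data interface and every VIP (supervisor control plane, each guest cluster control plane and each LoadBalancer service) consumes an address from this pool.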

Since the service engine and VIP data network is 172.10.42.0/24 (NSXALB-SE-Data) while workloads will be deployed in 172.10.60.0/24 (workload_mgmt), we need to add a static route for the service engines to be able to reach the workload network. To do this, navigate to Infrastructure > Cloud Resources > VRF Context, make sure that you are under Default-Cloud (in case you have multiple clouds defined) and then edit the global VRF by clicking on the pen icon to the right

Click on ADD to add a static route to your workload network, for simplicity I added a default route for my service engines to reach any subnet outside of their own 172.10.42.0/24
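
The reasoning behind the static route can be sketched in a few lines of Python with the standard ipaddress module. The two subnets are the ones from this post and the helper names are purely illustrative. The workload subnet is not part of the SE data subnet, so it is off-link for the service engines, and either a default route or a specific route to 172.10.60.0/24 would cover it:

```python
import ipaddress

SE_DATA = ipaddress.ip_network("172.10.42.0/24")    # NSXALB-SE-Data
WORKLOAD = ipaddress.ip_network("172.10.60.0/24")   # workload_mgmt

def needs_static_route(connected, destination) -> bool:
    """True when destination is not covered by the directly connected subnet."""
    return not destination.subnet_of(connected)

def route_covers(route_prefix: str, destination) -> bool:
    """True when a static route prefix includes the whole destination subnet."""
    return destination.subnet_of(ipaddress.ip_network(route_prefix))

print(needs_static_route(SE_DATA, WORKLOAD))      # workload is off-link for the SEs
print(route_covers("0.0.0.0/0", WORKLOAD))        # the default route reaches it
print(route_covers("172.10.60.0/24", WORKLOAD))   # a specific route would too
```

A default route is the simpler choice in a lab. In a larger environment a specific route per workload network keeps the SE routing table more explicit.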

With this we have concluded our NSX ALB (Avi) setup and preparation. In the next section we will start enabling workload management in vSphere (vSphere with Tanzu)

Enable workload management in vSphere 

Step 1: Create a TKR content library and storage policy

Open a browser window to your vCenter server. Before we start with enabling vSphere with Tanzu, we need to create a content library from which TKGs can pull Tanzu OS images to provision guest TKC clusters. VMware publishes a library for that purpose, which can be accessed by creating a subscribed content library pointing to (https://wp-content.vmware.com/v2/latest/lib.json), so I created a content library called Tanzu KubernetesRelease with the below settings

Next we need to create a storage policy which vSphere with Tanzu will apply to Tanzu cluster nodes when creating them. From vCenter, navigate to Policies and Profiles, then VM Storage Policies, and then click on CREATE to create a new VM storage policy

I created a VM storage policy called General-Storage with the details shown below, but of course you can create any policy with matching requirements for your setup

Step 2: Enable Workload Management on VDS Networking

To enable workload management, you just need to click on GET STARTED in the workload management pane to kick start a step by step wizard to guide you through the process

The first step is to choose your networking stack/provider for TKGs. Since I used a vCenter which does not have NSX, the only option I have is to deploy workload management on standard DVS networking and, as the alert shows, we must use an external load balancer for this setup (NSX ALB or HA-Proxy)

Click NEXT and then choose the cluster on which the supervisor cluster will be deployed, in my setup I have only 1 cluster called Toasti

Step 3 is to choose a VM storage policy which workload management process will use to deploy Tanzu cluster nodes VMs

Step 4 is to fill in the load balancer connection parameters which will be used while creating workloads on this supervisor cluster. We need to paste the controller certificate contents which we exported as part of the NSX ALB preparation (refer back to step 2 of the NSX ALB section in this post).

Step 5 is about configuring the supervisor cluster DVS port group and IP addressing scheme. I chose static and assigned the below configuration

In step 6, I filled in the workload network parameters, this is where Tanzu cluster nodes will be deployed and their IP addresses will be assigned from

Step 7 is choosing a content library with OS image templates for Tanzu nodes (which we created earlier)

 

Last step is to choose control plane VM size (supervisor cluster VMs) based on your setup and then click FINISH

During the workload management deployment process, you will notice that Avi starts to deploy the service engines (load balancers) after the supervisor cluster VMs are deployed and configured

The whole process took about 10 minutes and after a successful deployment you should see a cluster IP assigned from the VIP range we configured in Avi earlier in this post (from subnet 172.10.42.0/24)

Step 3: Create a test namespace and login to supervisor cluster

Click on Namespaces and then on CREATE NAMESPACE to create a test namespace

You need to set a DNS-compliant name for your namespace and choose under which cluster this namespace will be created
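
As a rough illustration of what "DNS-compliant" means here, the Python sketch below checks a name against the RFC 1123 label rule that Kubernetes applies to namespace names: lowercase letters, digits and hyphens, starting and ending with an alphanumeric, and at most 63 characters. The helper name is hypothetical, and I am assuming the vSphere wizard follows the same rule. It may apply extra restrictions (for example reserved names) that this check does not cover:

```python
import re

# RFC 1123 label: lowercase letters, digits and hyphens,
# starting and ending with an alphanumeric, at most 63 characters.
_DNS_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")

def is_dns_compliant(name: str) -> bool:
    """Return True if name is a valid RFC 1123 DNS label."""
    return _DNS_LABEL.fullmatch(name) is not None

print(is_dns_compliant("sambalbij"))   # the namespace used later in this post
print(is_dns_compliant("Test_NS"))     # uppercase and underscore are not allowed
```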

Click on CREATE to create the namespace, complete the rest of namespace configuration as shown below

If you click on Manage VM classes you will find a list of predefined VM classes which we can choose from and use to create our guest Tanzu clusters on supervisor clusters.

Next, we will log in to our newly created supervisor cluster and deploy a test guest TKC on top of it. Before we proceed, make sure to download the vSphere kubectl tools to your jumpbox as described in one of my previous blog posts (HERE). Log in to your Linux jumpbox and run the following command to log in to the supervisor cluster:

kubectl vsphere login --server=https://<ip address assigned by Avi to supervisor cluster> -u administrator@vsphere.local --insecure-skip-tls-verify

You need of course to adjust the values in the above command to match your environment

Below is a list of some resources I will need to use while deploying my test cluster (and yes I changed my terminal software in the middle of the blog post 🙂 )

The deployment YAML for my guest TKC is pasted below. I am using the v1alpha2 API and customised the storage of my control plane and worker nodes, in addition to the pod networks that will be assigned later by Antrea to pods created in this TKC cluster:

apiVersion: run.tanzu.vmware.com/v1alpha2
kind: TanzuKubernetesCluster
metadata:
  name: uchimata
  namespace: sambalbij
spec:
  topology:
    controlPlane:
      replicas: 1
      vmClass: best-effort-large
      storageClass: general-storage
      volumes:
        - name: etcd
          mountPath: /var/lib/etcd
          capacity:
            storage: 50Gi
      tkr:
        reference:
          name: v1.22.9---vmware.1-tkg.1.cc71bc8
    nodePools:
    - name: worker-pool01
      replicas: 3
      vmClass: best-effort-xlarge
      storageClass: general-storage
      volumes:
        - name: containerd
          mountPath: /var/lib/containerd
          capacity:
            storage: 200Gi
      tkr:
        reference:
          name: v1.22.9---vmware.1-tkg.1.cc71bc8
  settings:
    storage:
      defaultClass: general-storage
    network:
      cni:
        name: antrea
      services:
        cidrBlocks: ["198.53.100.0/16"]
      pods:
        cidrBlocks: ["192.0.5.0/16"]
      serviceDomain: cluster.local
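
Note that both cidrBlocks values above carry host bits inside their /16 prefix. As a small side check (a sketch, not something the supervisor cluster requires), Python's standard ipaddress module with strict=False shows which network each block normalises to, and confirms that neither range collides with the lab networks used in this post:

```python
import ipaddress

# Values copied from the manifest and from the lab networks in this post.
services = ipaddress.ip_network("198.53.100.0/16", strict=False)
pods = ipaddress.ip_network("192.0.5.0/16", strict=False)
lab_networks = [
    ipaddress.ip_network("172.10.42.0/24"),  # NSXALB-SE-Data
    ipaddress.ip_network("172.10.60.0/24"),  # workload_mgmt
]

# strict=False masks off the host bits, showing the range each /16 really spans.
print(services, pods)

def overlapping(candidates, others):
    """Return (a, b) pairs of networks that share addresses."""
    return [(a, b) for a in candidates for b in others if a.overlaps(b)]

print(overlapping([services, pods], lab_networks))
print(overlapping([services], [pods]))
```

The same check is worth repeating against any other subnet the pods need to reach, since traffic to an overlapping range would stay inside the cluster.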

After I applied the above YAML, the supervisor cluster began deploying my guest TKC, and eventually it looks like the below. Also notice that the created nodes are assigned an IP address from the 172.10.60.0/24 subnet which we specified as the workload network in the workload management wizard

If I log in to this newly created guest TKC using kubectl vsphere login and inspect the control plane and worker nodes from the CLI, I get the below output:
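
For reference, logging in to a guest cluster goes through the same supervisor endpoint, with the cluster name and its namespace added. The Python sketch below only assembles and prints the command line so it can be reviewed before running. The helper name is hypothetical and the server value is a placeholder, while the cluster and namespace names are the ones from the YAML in this post:

```python
import shlex

def guest_login_command(server: str, user: str, cluster: str, namespace: str) -> list:
    """Build the argv for logging in to a guest cluster via the supervisor endpoint."""
    return [
        "kubectl", "vsphere", "login",
        f"--server={server}",
        "-u", user,
        "--tanzu-kubernetes-cluster-name", cluster,
        "--tanzu-kubernetes-cluster-namespace", namespace,
        "--insecure-skip-tls-verify",
    ]

argv = guest_login_command(
    "https://<ip address assigned by Avi to supervisor cluster>",  # placeholder
    "administrator@vsphere.local",
    "uchimata",
    "sambalbij",
)
# shlex.join quotes the placeholder so the printed line is safe to paste and edit.
print(shlex.join(argv))
```

After a successful login, kubectl get nodes against the new context lists the control plane and worker nodes.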

Also from NSX ALB UI we can see the provisioned VIPs for supervisor cluster control plane and for TKC cluster control plane as well

Hope you have found this post helpful.