Insight into positioning NSX ALB (Avi) with VMware Tanzu Offerings

Overview

VMware NSX Advanced Load Balancer (Avi Networks) offers rich capabilities for L4-L7 load balancing across different clouds and for different workloads, in addition to Global Site Load Balancing (GSLB) functionality, which allows an organisation to run multiple sites in either Active-Active (load balancing and DR) or Active-Standby (DR) fashion. For load balancing containerised workloads in Tanzu/Kubernetes clusters, traditional load balancer solutions, which are appliance-based hardware or virtual editions, were not designed for such workloads, nor are they optimised for multi-cluster, multi-site or multi-cloud environments. Their underlying infrastructure lacks the elasticity, scalability, observability and automation required to support Kubernetes environments. Those challenges are exactly what NSX ALB (Avi Networks) is designed to solve. NSX ALB AKO (Avi Kubernetes Operator) is a container-based component which integrates with the NSX ALB controller to offer local-site Layer 4 (ingress) and Layer 7 (Ingress) load balancing for containerised workloads, in addition to AMKO (Avi Multi-Cluster Kubernetes Operator), which allows organisations to deploy Active-Active or Active-Standby Tanzu/Kubernetes clusters leveraging NSX ALB (Avi) Global Site Load Balancing capabilities. All of this comes on top of a state-of-the-art Web Application Firewall (WAF) which ensures the delivery of reliable and secure application services, essential to application availability and delivery.

VMware Tanzu offers enterprise-ready Kubernetes along with an ecosystem which makes containerised workload management (security, life cycle, visibility, etc.) easier and more efficient. VMware Tanzu offerings are all based on Kubernetes packages verified, signed and tested by VMware, which ensures a reliable, fast and secure application delivery experience. The VMware Tanzu portfolio has grown and changed rapidly over the past couple of years, and this makes it challenging for customers to keep up with the differences between the offered Tanzu platforms and what NSX ALB (Avi Networks) can offer with each of them.

In this blog post, I will try to cover some of the differences between the current VMware Tanzu offerings, how NSX Advanced Load Balancer (Avi Networks) can be integrated with each solution, and the gotchas and aha moments that you might come across when you try to integrate NSX ALB with Tanzu.

The Solutions Covered in this Post

  • vSphere with Tanzu and Tanzu Kubernetes Grid Services (TKGS) with NSX ALB
  • Tanzu Kubernetes Grid Services (TKGS) with NSX-T Networking and NSX ALB
  • VMware Tanzu Kubernetes Grid (TKGm) with NSX ALB
  • VMware Tanzu Kubernetes Grid with NSX ALB and Cloud Director (Kubernetes as a Service)
  • VMware Advanced Load Balancer GSLB (Global Site Load Balancing) and Federated AMKO

vSphere with Tanzu and Tanzu Kubernetes Grid Services (TKGS) with NSX ALB

I am starting with this offering because I often see a misconception in the field when speaking with my customers about Tanzu on vSphere. vSphere with Tanzu is used to describe the ability to run Kubernetes workloads directly on the ESXi hypervisor by creating a Kubernetes control plane (called the Supervisor Cluster) at the hypervisor layer. Workloads then run in upstream Kubernetes clusters created within dedicated resource pools, called guest TKCs (Tanzu Kubernetes Clusters). Another option with vSphere with Tanzu is what is called vSphere Pods, which is basically running containers/Pods directly on the ESXi hypervisor without the need to deploy guest TKC clusters; however, this is only possible if Workload Management is enabled with NSX-T networking.

Tanzu Kubernetes Clusters need to be created under a vSphere Namespace, which defines the resources (storage, VM classes, RBAC) that will be available to users/developers when they deploy their Tanzu clusters. Tanzu Kubernetes Clusters run the upstream Kubernetes orchestrator, in this case built from verified, tested and supported VMware packages. Guest TKCs are deployed on the Supervisor Cluster, which is created when users enable Workload Management on a vSphere cluster; see the screenshot below:

In the above screenshot, I enabled Workload Management on the vSphere cluster Beans and afterwards created a Namespace called homelab which hosts a guest TKC called beanscluster1. As a result of enabling Workload Management on the Beans cluster, TKGS has created the SupervisorControlPlaneVMs shown above, which act as the management cluster for all subsequently created TKCs.

How is NSX ALB positioned with TKGS?

NSX ALB can be used with TKGS to provide Layer 4 ingress (layer 4 load balancing for kube-api) to access Supervisor and guest cluster endpoints, and can also be used to provide Ingress (Layer 7 load balancing) for services running inside guest TKCs.

In order to use NSX ALB as the endpoint Layer 4 ingress provider for Supervisor and guest clusters, you need to configure NSX ALB while enabling Workload Management on your vSphere cluster (Beans in my case). I wrote many blog posts about how to enable Workload Management using NSX ALB; you can reference this HERE. When you use NSX ALB as the endpoint Layer 4 LB provider for the Supervisor Cluster, AKO is deployed among the system pods running inside the Supervisor Cluster, in a namespace called vmware-system-ako.
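A quick way to confirm this, once you are logged in to the Supervisor Cluster context, is to list the pods in that namespace (a minimal sketch; the actual pod name carries a generated suffix):

kubectl get pods -n vmware-system-ako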

For any subsequently created guest Tanzu Kubernetes Clusters, the AKO instance running in the Supervisor Cluster creates a corresponding Avi virtual service for every L4 load balancing service required in the guest TKC clusters. See the output below from the guest TKC cluster pods: although AKO is not present there, a virtual service is still created in the Avi UI for that workload cluster.

From the Avi UI, however, you can see that an L4 virtual service has been created for cluster beanscluster1.

For Layer 7 Ingress, AKO needs to be manually deployed in guest TKC clusters using Helm. I discussed this topic in detail, with step-by-step instructions, in previous blog posts (HERE and HERE).
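If you have not deployed AKO with Helm before, the flow is roughly as follows (a sketch only; the chart version placeholder and the contents of values.yaml, such as controller IP, cloud name, cluster name and VIP network, must be adapted to your environment):

helm repo add ako https://projects.registry.vmware.com/chartrepo/ako
helm repo update
kubectl create namespace avi-system
helm install ako ako/ako --version <ako-version> -n avi-system -f values.yaml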

After AKO is manually deployed, you will see the AKO pod running, and any L4 ingress or L7 Ingress rules you create in your guest TKC will be handled by AKO, i.e. realised as Avi virtual services. The screenshot below shows the AKO pod running inside my guest TKC, and a deployed application whose load balancer service has acquired its address from Avi via AKO.

This can also be verified from the Avi UI.
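For reference, a minimal LoadBalancer Service and Ingress of the kind AKO picks up in a guest TKC might look like the sketch below; the names, selector and hostname are illustrative, and avi-lb is the IngressClass AKO typically registers:

apiVersion: v1
kind: Service
metadata:
  name: demo-app
spec:
  type: LoadBalancer          # L4: AKO creates an Avi virtual service and allocates a VIP
  selector:
    app: demo-app
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-app
spec:
  ingressClassName: avi-lb    # L7: handled by AKO, i.e. realised as an Avi virtual service
  rules:
    - host: demo.example.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo-app
                port:
                  number: 80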

TKGS with AKO on DVS Networking (without NSX-T) Reference Architecture

Below is a reference architecture showing how TKGS and AKO interact.

In summary, TKGS with NSX ALB as the endpoint provider on DVS networking can make use of both the L4 and L7 capabilities of NSX ALB (Avi), in addition to Avi's analytics capabilities.

Tanzu Kubernetes Grid Services (TKGS) with NSX-T Networking and NSX ALB

This setup is a bit tricky, due to the fact that NSX-T natively and by default offers Layer 4 ingress and Layer 7 Ingress capabilities to both Supervisor and TKC clusters. If you want to make use of NSX ALB AKO in this setup, it is limited to Layer 7 Ingress only, with no Layer 4 load balancing, since the latter will always be handled by NSX-T.

In this setup, AKO is deployed only inside guest TKC clusters and not inside the Supervisor Cluster. NSX-T Manager needs to be added as an NSX-T cloud in NSX ALB (Avi), and routing and DFW rules need to be configured properly to allow the Avi service engines to reach the guest TKC nodes. I have written a detailed blog post on how to build and operate this model; you can reference it HERE.
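Since AKO handles only Layer 7 in this model, the key knob is the layer7Only flag in the AKO Helm values file. Below is a minimal sketch, assuming an NSX-T cloud is already configured in Avi; the controller address, cloud name and cluster name are placeholders, and field names can vary slightly between AKO versions:

AKOSettings:
  clusterName: beanscluster1         # must be unique per guest TKC
  layer7Only: true                   # AKO programs only Ingress (L7) virtual services
  disableStaticRouteSync: true       # pod routing is handled via NSX-T, not AKO static routes
ControllerSettings:
  controllerHost: "avi-controller.example.local"   # placeholder
  cloudName: "nsxt-cloud"                          # the NSX-T cloud object created in Avi
L7Settings:
  serviceType: ClusterIP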

TKGS with AKO and NSX-T Networking Reference Architecture

VMware Tanzu Kubernetes Grid (TKGm) with NSX Advanced Load Balancer

TKGm is used to refer to Tanzu Kubernetes Grid multi-cloud; the name is actually derived from the fact that TKG can be deployed on any cloud provider (vSphere, AWS, Azure, GCP …) while providing the same Tanzu user experience to the end user. It basically offers a unified, streamlined Tanzu Kubernetes runtime across multiple clouds.

Tanzu Kubernetes Grid has been associated with what is called a standalone management cluster deployment, which refers to a TKG management cluster instance that needs to be deployed on a cloud provider first, before actual Tanzu workload clusters can be provisioned. The management cluster is responsible for lifecycle management operations on workload clusters (upgrading, scaling and so on).

Tanzu Kubernetes Grid is composed of the following building blocks:

  • Tanzu CLI: needs to be installed on a bootstrap host (Windows, Linux or Mac); this is the command line tool used to manage and deploy TKG clusters. The Tanzu CLI version must match the version of the standalone management cluster.
  • TKG standalone Management Cluster: a management cluster is a Kubernetes cluster that deploys and manages other Kubernetes clusters, called workload clusters, which host containerised apps. It is never recommended to deploy user containerised workloads on the management cluster.
  • Tanzu Workload Cluster: workload clusters are the clusters which actually serve user workloads. They are deployed by the management cluster and run on the same private or public cloud infrastructure. Workload clusters can run different Kubernetes image versions and, consequently, different package versions as well. The default CNI for workload clusters is Antrea, but it can be set to Calico in the deployment YAML (see the sketch after this list).
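For illustration, switching the CNI is a one-line change in the workload cluster configuration file; the sketch below uses placeholder names and sizing:

CLUSTER_NAME: workload-cluster-1       # placeholder
CLUSTER_PLAN: dev
CNI: calico                            # default is antrea; set to calico to switch CNI
CONTROL_PLANE_MACHINE_COUNT: 1
WORKER_MACHINE_COUNT: 3

The cluster would then be created with tanzu cluster create --file workload-cluster-1.yaml.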

TKG and NSX Advanced Load Balancer

NSX Advanced Load Balancer (Avi) AKO can be fully utilised with TKGm, i.e. for Layer 4 ingress as well as Layer 7 Ingress. For that, NSX ALB needs to be configured as the Layer 4 endpoint provider during the management cluster deployment workflow; see the sample screenshot below from the TKG standalone deployment wizard.

Step-by-step instructions on how to enable NSX ALB on TKGm can be referenced in one of my previous blog posts HERE. Once NSX ALB is enabled during TKG standalone management cluster deployment, the AKO pod will be running.
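If you drive the deployment from a cluster configuration file rather than the UI wizard, the NSX ALB section of that file looks roughly like the sketch below; the controller address, credentials and network names are placeholders to replace with your own values:

AVI_ENABLE: "true"
AVI_CONTROLLER: avi-controller.example.local      # placeholder
AVI_USERNAME: admin
AVI_PASSWORD: <password>
AVI_CA_DATA_B64: <controller-certificate-base64>
AVI_CLOUD_NAME: Default-Cloud
AVI_SERVICE_ENGINE_GROUP: Default-Group
AVI_DATA_NETWORK: vip-network                     # placeholder
AVI_DATA_NETWORK_CIDR: 192.168.100.0/24           # placeholder
AVI_CONTROL_PLANE_HA_PROVIDER: "true"             # use Avi VIPs for cluster endpoints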

AKO will also be automatically deployed (using AKOO) in every workload cluster that is subsequently deployed, and by default it will handle both Layer 4 and Layer 7 services. It is important to mention that AKOO is the AKO Operator; it orchestrates AKO and does not translate any Kubernetes APIs itself. Using AKOO, the TKG management cluster automates the creation and installation of AKO pods in every workload cluster it creates.
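AKOO decides which workload clusters receive AKO (and with which settings) via AKODeploymentConfig objects in the management cluster. A quick way to inspect them, assuming your kubectl context points at the management cluster, is sketched below; install-ako-for-all is the name TKG typically gives to the default config applied to workload clusters:

kubectl get akodeploymentconfig
kubectl describe akodeploymentconfig install-ako-for-all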

TKGm with AKO Reference Architecture

 

VMware Tanzu Kubernetes Grid with NSX ALB and Cloud Director

Cloud Director is VMware's platform for service providers, offering a wide variety of multi-tenant services including marketplace services, cloud native applications, DRaaS, LBaaS and much more. Cloud Director also gives service providers the ability to offer Kubernetes as a Service to tenants based on TKGS and TKGm. In the case of TKGS, the vCenters backing the provider VDC need to have Workload Management enabled; one of the advantages of offering Kubernetes as a Service backed by TKGS is that it is natively integrated into Cloud Director 10.4 and does not require the provider to install any extra Cloud Director plugins.

For service providers using TKGS to offer KaaS (Kubernetes as a Service) to their tenants, tenants will have access to their guest TKC clusters, and for Layer 4 ingress / Layer 7 Ingress they will inherit the same method as the one enabled and used on the Supervisor Cluster.

Service providers who want to offer their tenants Kubernetes clusters based on TKGm need to install, configure and manage an extra Cloud Director plugin called Container Service Extension (CSE), which adds the capability to present tenants with a portal option for creating TKG clusters based on TKGm. However, with this model there is no TKG management cluster that tenants need to deploy; the workflow for provisioning TKG workload clusters, for both provider and tenant, is summarised below:

  • Provider Workflow:
    • The provider deploys and integrates CSE with Cloud Director.
    • Kubernetes roles and capabilities are enabled and published to tenants.
    • Supported TKG images are uploaded to a shared content library.
  • Tenant Workflow:
    • From the tenant portal, the tenant starts the Kubernetes deployment wizard.
    • The tenant selects which TKG template should be used.
    • Once deployment starts, CSE provisions a temporary bootstrap VM (ephemeral VM) running a Docker KinD cluster, which is used as a temporary management cluster to spin up the TKG workload cluster for the tenant.
    • Once the TKG workload cluster is provisioned, CSE (which runs in Cloud Director) deletes the ephemeral VM.

If you would like to learn more about Cloud Director and TKGS deployment you can reference my previous blog posts HERE and HERE.

If you would like to learn more about Cloud Director and TKGm deployment, you can reference my previous blog posts HERE and HERE.

TKGm and Cloud Director Reference Architecture

Where does NSX ALB (Avi) fit in the whole Cloud Director and KaaS story?

NSX Advanced Load Balancer (Avi) is a requirement for service providers to be able to offer TKGm workload clusters to tenants, which means that AKO will be responsible for Layer 4 ingress and TKG endpoint access. However, when it comes to Ingress services (Layer 7) inside workload clusters, AKO cannot really be used: for AKO to offer Layer 7 it needs connectivity to the NSX ALB controller, which in this case is a provider-managed entity, and no service provider will expose a provider-level component to tenants.

How to offer Layer 7 Ingress to tenants in the case of TKGm and Cloud Director?

At the moment, and until further notice from VMware, the conventional method to provide Layer 7 Ingress services for tenants running TKGm clusters offered from Cloud Director is to use Contour.
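For reference, one straightforward way for a tenant to get Contour running in a CSE-provisioned cluster is the upstream quickstart manifest (a sketch; where the Tanzu CLI is available, Contour can alternatively be installed as a Tanzu package):

kubectl apply -f https://projects.contour.io/quickstart/contour.yaml
kubectl get pods -n projectcontour          # Contour and Envoy pods
kubectl get svc envoy -n projectcontour     # service fronting Envoy for tenant Ingress traffic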

VMware Advanced Load Balancer GSLB (Global Site Load Balancing) and Federated AMKO

NSX Advanced Load Balancer offers Global Site Load Balancing functionality, which balances an application's load across instances of the application that have been deployed to multiple locations (typically multiple data centers, public/private clouds or stretched clusters). On NSX ALB (Avi), an application is a virtual service which maps to backend server pools holding the actual application instances that you want to balance load across. In a Tanzu/Kubernetes environment, however, the application is containerised and AKO is responsible for creating the virtual services for that application on Avi. AKO has no concept of multi-cluster/multi-site, and that is why AMKO is needed in this scenario.

AMKO is the Avi Multi-Cluster Kubernetes Operator, which allows the AKO configuration from one cluster to be replicated to other clusters (in other locations) in either Active-Active or Active-Standby fashion. In this model, AKO's interaction with the deployed TKGS or TKGm cluster is exactly as explained above; the extra step is installing AMKO on every TKGS or TKGm cluster that you would like to have as part of the Active-Active or Active-Standby sites.

The term "federated AMKO" means that AMKO uses federation to replicate its configuration to a set of member clusters, which ensures seamless recovery of the AMKO configuration during disasters.

GSLB and Federated AMKO Reference Architecture

As mentioned, every Tanzu cluster that will be a member of the GSLB setup needs to have both AKO and AMKO deployed and configured.
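AMKO is typically installed from the same Helm repository as AKO, with a values file pointing it at the GSLB leader controller and listing the member clusters. The sketch below uses placeholder values to adapt to your environment:

helm repo add ako https://projects.registry.vmware.com/chartrepo/ako
helm install amko ako/amko --version <amko-version> -n avi-system -f amko-values.yaml

In amko-values.yaml, the key settings are typically configs.gslbLeaderController (the Avi controller acting as GSLB leader), gslbLeaderCredentials, and configs.memberClusters, which lists the kubeconfig contexts of each cluster participating in the GSLB setup.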

In my previous blog post HERE I discussed in detail how you can set up the above reference architecture and utilise AKO/AMKO for multi-site Active-Active Ingress services.

Hope you have found this blog post helpful!

Bassem Rezkalla
