
Overview
NSX ALB (Avi) offers rich capabilities for L4-L7 load balancing across different clouds and workload types, in addition to Global Server Load Balancing (GSLB) functionality, which allows an organisation to run multiple sites in either Active-Active (load balancing and DR) or Active-Standby (DR) fashion. For load balancing containerised workloads in Tanzu/Kubernetes clusters, Avi offers AKO (Avi Kubernetes Operator) for local-site L4-L7 load balancing. To load balance container workloads across multiple sites, or for DR, AMKO (Avi Multi-Cluster Kubernetes Operator) needs to be deployed along with multiple Avi controller clusters configured for GSLB.
How Avi GSLB and AMKO work
Avi GSLB provides optimal application access to users in geographically distributed areas. It also offers resiliency against the loss of a site or a network connection, and allows non-disruptive operations in one site while applications remain reachable through another site. Workloads/applications can be either standard server workloads or containerised applications running in a Tanzu/Kubernetes cluster. For the latter, organisations need to use AKO for local L4-L7 load balancing and AMKO for site-wide load balancing capabilities. The Avi Multi-Cluster Kubernetes Operator (AMKO) is an operator for Tanzu/Kubernetes that facilitates application delivery across multiple clusters. AMKO runs as a pod in one of the clusters and provides DNS-based Global Server Load Balancing for applications deployed across multiple clusters. AMKO automates the GSLB configuration and operations on the Avi Vantage platform. Together with the Avi Kubernetes Operator (AKO) in each cluster, AKO and AMKO provide application deployment and operations through familiar Kubernetes objects and Avi CRDs.
Lab Inventory
For software versions I used the following:
- VMware ESXi 7.0U3g
- vCenter server version 7.0U3k
- TrueNAS 12.0-U7 used to provision NFS data stores to ESXi hosts.
- VyOS 1.4 used as lab backbone router, DHCP server, NTP and Internet GW.
- Ubuntu 20.04 LTS as bootstrap machine.
- Windows 10 pro as RDP Jumpbox.
- Windows 2019 Server as DNS
- Tanzu Kubernetes Grid 2.1
- NSX ALB controller 22.1.2
- AKO version 1.9.2
- AMKO version 1.9.1
For virtual hosts and appliances sizing I used the following specs:
- 3 x virtualised ESXi hosts each with 12 vCPUs, 2 x NICs and 128 GB RAM.
- vCenter server appliance with 2 vCPU and 24 GB RAM.
Reference Architecture
Deployment Workflow
- Deploy TKG management clusters for site01 and site02
- Deploy 2 workload clusters in site01 and site02
- Configure GSLB sites on Avi controllers (leader and follower).
- Deploy and configure AKO on site01 & site02 workload clusters and deploy demo app on both clusters.
- Deploy and configure AMKO in both site01 and site02.
- Create Ingress service in both clusters.
- Configure zone delegation on organisation DNS.
- Verify AMKO and GSLB deployment
Deploy TKG management and workload clusters in Site01 and Site02
My setup is based on two sites, each with its own Avi controller and TKG deployment. Each TKG deployment is composed of a management cluster and a workload cluster (see reference architecture). Each TKG management cluster uses NSX ALB (Avi) as the load balancer and control plane endpoint provider; for more details on how to deploy TKG management clusters with NSX ALB (Avi) as load balancer, you can refer to one of my previous blog posts HERE.
My TKG management clusters are shown in the below screenshots:
Site01 TKG Management Cluster
Site02 TKG Management Cluster
For workload clusters, I deployed one workload cluster per site, and since I am using TKG 2.1, I used the class-based cluster YAML instead of the legacy YAML format used up to TKG 1.6.1. A sample deployment YAML for the site01 TKG workload cluster is shown below:
apiVersion: v1
kind: Secret
metadata:
  name: nsxbaas-tkg-wld-site01
  namespace: default
stringData:
  password: <insert vSphere password>
  username: administrator@vsphere.local
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  annotations:
    osInfo: photon,3,amd64
    tkg/plan: dev
  labels:
    tkg.tanzu.vmware.com/cluster-name: nsxbaas-tkg-wld-site01
  name: nsxbaas-tkg-wld-site01
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 100.86.0.0/16
    services:
      cidrBlocks:
      - 100.66.0.0/16
  topology:
    class: tkg-vsphere-default-v1.0.0
    controlPlane:
      metadata:
        annotations:
          run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=photon
      replicas: 1
    variables:
    - name: controlPlaneCertificateRotation
      value:
        activate: true
        daysBefore: 90
    - name: auditLogging
      value:
        enabled: true
    - name: podSecurityStandard
      value:
        audit: baseline
        deactivated: false
        warn: baseline
    - name: aviAPIServerHAProvider
      value: true
    - name: vcenter
      value:
        cloneMode: fullClone
        datacenter: /Homelab
        datastore: /Homelab/datastore/DS01
        folder: /Homelab/vm/TKG
        network: /Homelab/network/TKG-MGMT-WLD-NET
        resourcePool: /Homelab/host/Kitkat/Resources/TKG
        server: vc-l-01a.nsxbaas.homelab
        storagePolicyID: ""
        template: /Homelab/vm/photon-3-kube-v1.24.9+vmware.1
        tlsThumbprint: ""
    - name: user
      value:
        sshAuthorizedKeys:
        - none
    - name: controlPlane
      value:
        machine:
          diskGiB: 40
          memoryMiB: 8192
          numCPUs: 2
    - name: worker
      value:
        count: 1
        machine:
          diskGiB: 40
          memoryMiB: 4096
          numCPUs: 2
    version: v1.24.9+vmware.1
    workers:
      machineDeployments:
      - class: tkg-worker
        metadata:
          annotations:
            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=photon
        name: md-0
        replicas: 2
You then need to apply the above file in both sites after modifying the per-site parameters (a sample Tanzu CLI command is shown after the screenshots below). In my setup I have two running TKG workload clusters, one in each site, as shown in the below screenshots:
Site01 TKG Workload Cluster
Site02 TKG Workload Cluster
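For reference, a class-based cluster definition like the one above is applied with the Tanzu CLI from the bootstrap machine. A minimal sketch (the file name is an assumption from my setup; repeat on site02 with its own file):

# create the site01 workload cluster from its class-based definition
tanzu cluster create -f nsxbaas-tkg-wld-site01.yaml
# verify that the workload clusters are running
tanzu cluster list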
Configure GSLB sites on Avi controllers (leader and follower)
In my setup I will set up the Avi controller in site01 as GSLB leader and the controller in site02 as follower, but both will work as Active-Active sites in handling global DNS requests. As a best practice, it is recommended to create a dedicated SE group on both sites to host the special DNS virtual service that will serve GSLB. For this I will create a dedicated SE group on site01 as shown below and a similar one in site02 (the names on both sites do not have to match).
From the Avi UI, navigate to Infrastructure > Cloud Resources > Service Engine Group and click on CREATE.
Click on Advanced if you want to configure specific parameters for the Service Engines that will host the DNS VS (for example, to place them on specific hosts/clusters).
Click on SAVE, then navigate to Applications > Virtual Services to create the global DNS virtual service, choosing the Advanced Setup option.
Assign a name to the DNS VS (g-dns-vs-site01), ensure that the application profile is System-DNS, and click on Create VS VIP to assign a VIP address to the DNS VS.
In the CREATE VS VIP window, choose the network on which the DNS VS VIPs will be placed and assigned.
Under VIPs, click on ADD to choose where the DNS VS VIP will be placed (placement network) and from which subnet the VIP address will be allocated.
Click SAVE, then under Step 4 (Advanced) choose the dedicated SE group we created earlier to host the DNS virtual service.
Click NEXT and save the VS creation; you should then see the GSLB service engines being deployed in vCenter.
Repeat the above steps in site02; you should then see your g-dns VS created in site02 as well.
Configure GSLB Sites in Avi
After we have defined the DNS virtual services on both sites, we need to set up the GSLB sites. From the site01 Avi UI, navigate to Infrastructure > GSLB > Site Configuration and click on the pencil icon on the right.
Define your site parameters, which include the IP address of the Avi controller/cluster and the subdomain that the GSLB DNS service will handle. In my setup this is avi.nsxbaas.homelab, and eventually we will need to configure a zone delegation on our external DNS for avi.nsxbaas.homelab pointing to the previously created DNS virtual service VIPs.
Click on Save and Set DNS Virtual Services; this is where we select the DNS VS we created earlier for site01.
Click on SAVE and then click on Add New Site in order to set up site02 as a follower.
Configure the same parameters, but this time for site02.
Once site02 is added, you should see the below GSLB site configuration, and site02 must be in sync with the leader (site01).
Deploy and configure AKO on site01 & site02 workload clusters and deploy demo app on both clusters
AKO deployment is the same for both workload clusters; you only need to modify the values.yaml to match the configuration of each site. In a previous post I discussed the step-by-step details of installing AKO on TKG workload clusters, so here I only summarise the steps and highlight the important sections you need to modify in the values file before you deploy AKO.
To deploy AKO using helm, make sure to install helm on your bootstrap machine and then execute the following commands:
helm repo add ako https://projects.registry.vmware.com/chartrepo/ako
helm search repo | grep ako
helm show values ako/ako --version 1.9.2 > site01-ako-values.yaml
Below are the important sections you need to modify in your values file:
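As an illustration, the key fields in site01-ako-values.yaml look roughly like this; the cluster name and controller FQDN come from my setup, while the cloud name, SE group, and VIP network values are assumptions and will differ in your environment:

AKOSettings:
  clusterName: nsxbaas-tkg-wld-site01            # must be unique per workload cluster
ControllerSettings:
  controllerHost: 'avi-l-01a.nsxbaas.homelab'    # site01 Avi controller
  controllerVersion: '22.1.2'
  cloudName: 'Default-Cloud'                     # assumption: name of the vCenter cloud in Avi
  serviceEngineGroupName: 'Default-Group'        # assumption: SE group used for workload virtual services
NetworkSettings:
  vipNetworkList:
  - networkName: 'TKG-VIP-NET'                   # assumption: port group used for VIP placement
    cidr: '192.168.12.0/24'                      # assumption: site01 VIP subnet
L7Settings:
  serviceType: ClusterIP                         # AKO deployment mode for L7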
Save and exit the values file, then create a namespace called avi-system and install the AKO chart to create the AKO pod under that namespace:
kubectl create namespace avi-system
helm install ako/ako --generate-name --version 1.9.2 -f site01-ako-values.yaml --set avicredentials.username=admin --set avicredentials.password=<password> --namespace=avi-system
If the deployment is successful, you should see the AKO pod running.
Repeat the same for site02.
Deploy a demo app to test ingress
I will use a demo app to test the multi-site ingress functionality of Avi GSLB. For this I used a demo app called Online Boutique, whose manifest you can download from https://github.com/GoogleCloudPlatform/microservices-demo. If you receive an ImagePullBackOff error due to a pull rate limit on any of the containers being deployed, make sure to create a registry secret from your Docker config JSON file and attach it to the namespace where you are deploying the demo app. In my case I used the below:
kubectl create ns microservices-app
docker login
kubectl create secret generic regcred --from-file=.dockerconfigjson=/home/nsxbaas/.docker/config.json --type=kubernetes.io/dockerconfigjson -n microservices-app
kubectl edit sa default -n microservices-app
Then add the highlighted line to the default service account.
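For clarity, the edited default service account ends up looking roughly like this (a sketch; the imagePullSecrets entry referencing the regcred secret is the addition):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: microservices-app
imagePullSecrets:
- name: regcred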
This is how my microservices app deployment looks:
The highlighted value above is the LoadBalancer service created as part of the demo app deployment; it has successfully acquired an IP address from the Avi SE VIP network.
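The same can be verified from the CLI; the frontend-external LoadBalancer service should show an external IP from the Avi VIP network:

kubectl get svc frontend-external -n microservices-app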
Deploy and configure AMKO in both site01 and site02
Before we deploy the actual AMKO pod and start configuring site-wide virtual services, we need to prepare a GSLB members file. This is basically a merged kubeconfig file from both workload clusters and is needed to give AMKO read access to the services and ingress/route objects of all member clusters, since AMKO reads those objects and updates the Avi leader controller.
To create the gslb-members file, follow these steps:
Step 1: Generate kubeconfig for both workload clusters
tanzu cluster kubeconfig get nsxbaas-tkg-wld-site01 --admin --export-file nsxbaas-tkg-wld-site01-kubeconfig-admin
tanzu cluster kubeconfig get nsxbaas-tkg-wld-site02 --admin --export-file nsxbaas-tkg-wld-site02-kubeconfig
Step 2: Merge both kubeconfig files
export KUBECONFIG=nsxbaas-tkg-wld-site01-kubeconfig-admin:nsxbaas-tkg-wld-site02-kubeconfig
kubectl config view --flatten > gslb-members.yaml
Step 3: Verify that gslb-members file is working
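A quick sanity check is to list the contexts in the merged file and run a command against each of them, for example:

kubectl --kubeconfig=gslb-members.yaml config get-contexts
kubectl --kubeconfig=gslb-members.yaml --context=nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01 get nodes
kubectl --kubeconfig=gslb-members.yaml --context=site02-nsxbaas-tkg-wld-admin@site02-nsxbaas-tkg-wld get nodes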
Step 4: Create a generic secret to be used by AMKO to authenticate to workload clusters
You will need to repeat this step for both workload clusters on both sites
kubectl config use-context nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01
kubectl create secret generic gslb-config-secret --from-file gslb-members -n avi-system
Note: I had to rename the file to gslb-members (not gslb-members.yaml), since the AMKO deployment was failing when the secret was created from a file with any other name. Also, the namespace to which AMKO is deployed must be named avi-system.
Step 5: Modify AMKO values files
The AMKO values file is in the same Helm repo as AKO, so we can simply pull it using the same command we used earlier to pull the AKO values YAML:
helm show values ako/amko --version 1.9.1 > site01-amko-values.yaml
My AMKO values file for site01 is shown below:
# Default values for amko.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 1

image:
  repository: projects.registry.vmware.com/ako/amko
  pullPolicy: IfNotPresent

# Configs related to AMKO Federator
federation:
  # image repository
  image:
    repository: projects.registry.vmware.com/ako/amko-federator
    pullPolicy: IfNotPresent
  # cluster context where AMKO is going to be deployed
  currentCluster: 'nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01'
  # Set to true if AMKO on this cluster is the leader
  currentClusterIsLeader: true
  # member clusters to federate the GSLBConfig and GDP objects on, if the
  # current cluster context is part of this list, the federator will ignore it
  memberClusters:
  - nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01
  - site02-nsxbaas-tkg-wld-admin@site02-nsxbaas-tkg-wld

# Configs related to AMKO Service discovery
serviceDiscovery:
  # image repository
  # image:
  #   repository: projects.registry.vmware.com/ako/amko-service-discovery
  #   pullPolicy: IfNotPresent

# Configs related to Multi-cluster ingress. Note: MultiClusterIngress is a tech preview.
multiClusterIngress:
  enable: false

configs:
  gslbLeaderController: 'avi-l-01a.nsxbaas.homelab'
  controllerVersion: 22.1.2
  memberClusters:
  - clusterContext: nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01
  - clusterContext: site02-nsxbaas-tkg-wld-admin@site02-nsxbaas-tkg-wld
  refreshInterval: 1800
  logLevel: INFO
  # Set the below flag to true if a different GSLB Service fqdn is desired than the ingress/route's
  # local fqdns. Note that, this field will use AKO's HostRule objects' to find out the local to global
  # fqdn mapping. To configure a mapping between the local to global fqdn, configure the hostrule
  # object as:
  # [...]
  # spec:
  #   virtualhost:
  #     fqdn: foo.avi.com
  #     gslb:
  #       fqdn: gs-foo.avi.com
  useCustomGlobalFqdn: false

gslbLeaderCredentials:
  username: 'admin'
  password: 'password'

globalDeploymentPolicy:
  # appSelector takes the form of:
  appSelector:
    label:
      app: gslb
  # Uncomment below and add the required ingress/route/service label
  # appSelector:

  # namespaceSelector takes the form of:
  # namespaceSelector:
  #   label:
  #     ns: gslb   <example label key-value for namespace>
  # Uncomment below and add the required namespace label
  # namespaceSelector:

  # list of all clusters that the GDP object will be applied to, can take any/all values
  # from .configs.memberClusters
  matchClusters:
  - cluster: nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01
  - cluster: site02-nsxbaas-tkg-wld-admin@site02-nsxbaas-tkg-wld

  # list of all clusters and their traffic weights, if unspecified, default weights will be
  # given (optional). Uncomment below to add the required trafficSplit.
  # trafficSplit:
  #   - cluster: "cluster1-admin"
  #     weight: 8
  #   - cluster: "cluster2-admin"
  #     weight: 2

  # Uncomment below to specify a ttl value in seconds. By default, the value is inherited from
  # Avi's DNS VS.
  # ttl: 10

  # Uncomment below to specify custom health monitor refs. By default, HTTP/HTTPS path based health
  # monitors are applied on the GSs.
  # healthMonitorRefs:
  # - hmref1
  # - hmref2

  # Uncomment below to specify custom health monitor template. Either healthMonitorRefs or healthMonitorTemplate
  # is allowed.
  # healthMonitorTemplate: hmTemplate1

  # Uncomment below to specify a Site Persistence profile ref. By default, Site Persistence is disabled.
  # Also, note that, Site Persistence is only applicable on secure ingresses/routes and ignored
  # for all other cases. Follow https://avinetworks.com/docs/20.1/gslb-site-cookie-persistence/ to create
  # a Site persistence profile.
  # sitePersistenceRef: gap-1

  # Uncomment below to specify gslb service pool algorithm settings for all gslb services. Applicable
  # values for lbAlgorithm:
  # 1. GSLB_ALGORITHM_CONSISTENT_HASH (needs a hashMask field to be set too)
  # 2. GSLB_ALGORITHM_GEO (needs geoFallback settings to be used for this field)
  # 3. GSLB_ALGORITHM_ROUND_ROBIN (default)
  # 4. GSLB_ALGORITHM_TOPOLOGY
  #
  # poolAlgorithmSettings:
  #   lbAlgorithm:
  #   hashMask:           # required only for lbAlgorithm == GSLB_ALGORITHM_CONSISTENT_HASH
  #   geoFallback:        # fallback settings required only for lbAlgorithm == GSLB_ALGORITHM_GEO
  #     lbAlgorithm:      # can only have either GSLB_ALGORITHM_ROUND_ROBIN or GSLB_ALGORITHM_CONSISTENT_HASH
  #     hashMask:         # required only for fallback lbAlgorithm as GSLB_ALGORITHM_CONSISTENT_HASH

  # Uncomment below to specify gslb service down response settings for all gslb services.
  # Applicable values for type are:
  # 1. GSLB_SERVICE_DOWN_RESPONSE_NONE (default)
  # 2. GSLB_SERVICE_DOWN_RESPONSE_ALL_RECORDS
  # 3. GSLB_SERVICE_DOWN_RESPONSE_FALLBACK_IP (needs fallbackIP to be set too)
  # 4. GSLB_SERVICE_DOWN_RESPONSE_EMPTY
  #
  # downResponse:
  #   type:
  #   fallbackIP:   # required only for type == GSLB_SERVICE_DOWN_RESPONSE_FALLBACK_IP

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name:

resources:
  limits:
    cpu: 250m
    memory: 300Mi
  requests:
    cpu: 100m
    memory: 200Mi

service:
  type: ClusterIP
  port: 80

rbac:
  # creates the pod security policy if set to true
  pspEnable: false

persistentVolumeClaim: ''
mountPath: /log
logFile: amko.log
federatorLogFile: amko-federator.log
I then installed AMKO using the following command:
helm install ako/amko --generate-name --version 1.9.1 -f site01-amko-values.yaml --namespace=avi-system
If the deployment is successful, the AMKO StatefulSet pods should all be in Running state in both clusters.
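A quick check from each cluster context, for example:

# the AMKO StatefulSet pod (amko-0) and the AKO pod should both be Running
kubectl get statefulset,pods -n avi-system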
Deploy Ingress for demo app and verify GSLB service creation
The whole idea of using AMKO is to update the Avi GSLB configuration with ingress objects/services as they are created in any of the workload clusters; this ensures that ingress rules developers create anywhere in a multi-site TKG deployment are published as GSLB services and resolvable across sites. The ingress I used is shown below:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: onlineshop-ingress
  labels:
    app: gslb
spec:
  rules:
  - host: onlineshop.avi.nsxbaas.homelab
    http:
      paths:
      - pathType: Prefix
        path: /
        backend:
          service:
            name: frontend-external
            port:
              number: 80
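I then applied this manifest in the demo app namespace on each workload cluster (the file name here is an assumption):

kubectl apply -f onlineshop-ingress.yaml -n microservices-app
kubectl get ingress -n microservices-app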
Note the label app: gslb which I added to the ingress YAML; this instructs AMKO to handle this ingress and update the Avi GSLB sites with its status accordingly. Once the ingress is successfully deployed, we can see that AMKO has created a GSLB service in the Avi UI mapping to that ingress.
If you click on the pencil icon to the right of the GSLB service, you can see more information, such as the load balancing algorithm (Round Robin by default).
Verify that both AMKO instances are in sync and that the GSLB config is being replicated to both sites; from our jumpbox run the following command:
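The output is shown in the screenshot; a check along these lines can be run against each cluster context to confirm that the AMKO objects were federated (the GSLBConfig object name gc-1 is the chart default and an assumption here):

kubectl config use-context nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01
# list the GSLBConfig and GlobalDeploymentPolicy objects created by the AMKO chart
kubectl get gslbconfig,globaldeploymentpolicy -n avi-system
# inspect the GSLBConfig status to confirm the GSLB configuration was accepted
kubectl get gslbconfig gc-1 -n avi-system -o yaml
# repeat with the site02 context to confirm the objects were federated there as well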
Configure zone delegation on organisation DNS.
In order for GSLB to handle incoming requests to our ingress (onlineshop.avi.nsxbaas.homelab), we need to instruct our external (organisation) DNS to forward any name resolution requests ending with avi.nsxbaas.homelab to the GSLB DNS VS VIPs we created earlier. To do this we need to configure zone delegation on our DNS server. I am using Windows Server 2019 as my home lab DNS, and below is how to configure zone delegation:
Step 1: Create DNS A records pointing to the GSLB DNS VS VIPs
We need to note down the IPs of both g-dns virtual services we created earlier and create DNS A records pointing to the DNS VS VIPs on Site01 and Site02.
Next, right click on the forward lookup zone and choose New Delegation
Add the DNS VS FQDNs we created above as delegated DNS servers for zone avi.nsxbaas.homelab. Note: if the wizard is unable to validate the DNS VS as a DNS server, just ignore it and move to the next step.
Click on Next and finalise the zone delegation setup
Once you click on Finish you should see both DNS VIPs are added as DNS servers for zone avi.nsxbaas.homelab
Verify AMKO and GSLB deployment
To verify that AMKO and GSLB are load balancing HTTP sessions to our demo app, I will use a simple dig command from my jumpbox. I added a sleep 20 command (a 20-second pause between dig executions) since GSLB will not load balance every single query in round robin across two geographically distant sites; the pause makes the alternation of the DNS VS responses visible. The IPs shown below are the ingress VS VIPs assigned by Avi to the ingress we created on site01 (192.168.12.8) and site02 (192.168.17.8).
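The check I ran was along these lines:

# query the GSLB FQDN repeatedly; responses should alternate between the
# site01 (192.168.12.8) and site02 (192.168.17.8) ingress VIPs
while true; do
  dig onlineshop.avi.nsxbaas.homelab +short
  sleep 20
done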
Note: If you are unable to see GSLB load balancing traffic across the two sites we configured, then from the leader site navigate to Applications > GSLB Services and click on the pencil icon next to the created GSLB service to edit it. In the Edit GSLB Service window, under GSLB Pool, click on the pencil icon and ensure that under Pool Members both ingresses (from both Tanzu workload clusters) are listed.
From a web browser, open an HTTP session to onlineshop.avi.nsxbaas.homelab
Hope you have found this blog post helpful!