Overview

NSX ALB (Avi) offers rich capabilities for L4-L7 load balancing across different clouds and workload types, in addition to Global Server Load Balancing (GSLB) functionality, which allows an organisation to run multiple sites in either Active-Active (load balancing and DR) or Active-Standby (DR) fashion. For load balancing containerised workloads in Tanzu/Kubernetes clusters, Avi offers AKO (Avi Kubernetes Operator) for local-site L4-L7 load balancing, while organisations that want to load balance container workloads across multiple sites, or for DR, need to deploy AMKO (Avi Multi-Cluster Kubernetes Operator) along with multiple Avi controller clusters configured for GSLB.

How Avi GSLB and AMKO work

Avi GSLB provides optimal application access to users in geographically distributed areas, in addition to offering resiliency against the loss of a site or a network connection and the ability to perform non-disruptive operations in one site while maintaining access from another. Workloads/applications can be either standard server workloads or containerised applications running in a Tanzu/Kubernetes cluster. For the latter, an organisation needs AKO for local L4-L7 load balancing and AMKO for site-wide load balancing capabilities. The Avi Multi-Cluster Kubernetes Operator (AMKO) is an operator for Tanzu/Kubernetes that facilitates application delivery across multiple clusters. AMKO runs as a pod in one of the clusters and provides DNS-based global server load balancing for applications deployed across multiple clusters, automating the GSLB configuration and operations on the Avi Vantage platform. Together with the Avi Kubernetes Operator (AKO) running in each cluster, AKO and AMKO provide application deployment and operations through familiar Kubernetes objects and Avi CRDs.

Lab Inventory

For software versions I used the following:

  • VMware ESXi 7.0U3g
  • vCenter server version 7.0U3k
  • TrueNAS 12.0-U7 used to provision NFS data stores to ESXi hosts.
  • VyOS 1.4 used as lab backbone router, DHCP server, NTP and Internet GW.
  • Ubuntu 20.04 LTS as bootstrap machine.
  • Windows 10 Pro as RDP jumpbox.
  • Windows Server 2019 as DNS server
  • Tanzu Kubernetes Grid 2.1
  • NSX ALB controller 22.1.2
  • AKO version 1.9.2
  • AMKO version 1.9.1

For virtual hosts and appliances sizing I used the following specs:

  • 3 x virtualised ESXi hosts each with 12 vCPUs, 2 x NICs and 128 GB RAM.
  • vCenter server appliance with 2 vCPU and 24 GB RAM.

Reference Architecture

 

Deployment Workflow

  • Deploy TKG management clusters for site01 and site02
  • Deploy 2 workload clusters in site01 and site02
  • Configure GSLB sites on Avi controllers (leader and follower).
  • Deploy and configure AKO on site01 & site02 workload clusters and deploy demo app on both clusters.
  • Deploy and configure AMKO in both site01 and site02.
  • Create Ingress service in both clusters.
  • Configure zone delegation on organisation DNS.
  • Verify AMKO and GSLB deployment

Deploy TKG management and workload clusters in Site01 and Site02

My setup is based on two sites, each with its own Avi controller and TKG deployment. Each TKG deployment is composed of a management cluster and a workload cluster (see the reference architecture). Each TKG management cluster uses NSX ALB (Avi) as its load balancer endpoint provider; for more details on how to deploy TKG management clusters with NSX ALB (Avi) as load balancer, you can refer to one of my previous blog posts HERE.

My TKG management clusters are shown in the below screenshots:

Site01 TKG Management Cluster

Site02 TKG Management Cluster

For workload clusters, I deployed one workload cluster per site, and since I am using TKG 2.1 I used the class-based cluster YAML (ClusterClass) instead of the legacy cluster configuration YAML that was used up to TKG version 1.6.1. A sample of the deployment YAML for the site01 TKG workload cluster is shown below:

apiVersion: v1
kind: Secret
metadata:
  name: nsxbaas-tkg-wld-site01
  namespace: default
stringData:
  password: insert vsphere password
  username: administrator@vsphere.local
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  annotations:
    osInfo: photon,3,amd64
    tkg/plan: dev
  labels:
    tkg.tanzu.vmware.com/cluster-name: nsxbaas-tkg-wld-site01
  name: nsxbaas-tkg-wld-site01
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 100.86.0.0/16
    services:
      cidrBlocks:
      - 100.66.0.0/16
  topology:
    class: tkg-vsphere-default-v1.0.0
    controlPlane:
      metadata:
        annotations:
          run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=photon
      replicas: 1
    variables:
    - name: controlPlaneCertificateRotation
      value:
        activate: true
        daysBefore: 90
    - name: auditLogging
      value:
        enabled: true
    - name: podSecurityStandard
      value:
        audit: baseline
        deactivated: false
        warn: baseline
    - name: aviAPIServerHAProvider
      value: true
    - name: vcenter
      value:
        cloneMode: fullClone
        datacenter: /Homelab
        datastore: /Homelab/datastore/DS01
        folder: /Homelab/vm/TKG
        network: /Homelab/network/TKG-MGMT-WLD-NET
        resourcePool: /Homelab/host/Kitkat/Resources/TKG
        server: vc-l-01a.nsxbaas.homelab
        storagePolicyID: ""
        template: /Homelab/vm/photon-3-kube-v1.24.9+vmware.1
        tlsThumbprint: ""
    - name: user
      value:
        sshAuthorizedKeys:
        - none
    - name: controlPlane
      value:
        machine:
          diskGiB: 40
          memoryMiB: 8192
          numCPUs: 2
    - name: worker
      value:
        count: 1
        machine:
          diskGiB: 40
          memoryMiB: 4096
          numCPUs: 2
    version: v1.24.9+vmware.1
    workers:
      machineDeployments:
      - class: tkg-worker
        metadata:
          annotations:
            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=photon
        name: md-0
        replicas: 2

You then need to create the clusters from the above spec on both sites after adjusting the per-site parameters. In my setup I have two running TKG workload clusters, one in each site, as shown in the screenshots below.
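
For reference, a minimal sketch of creating a class-based workload cluster from this spec with the Tanzu CLI (the file name below is an assumption based on my cluster name):

# create the class-based workload cluster from the spec above (file name assumed)
tanzu cluster create --file nsxbaas-tkg-wld-site01.yaml
# confirm the cluster reaches the running state
tanzu cluster list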

Site01 TKG Workload Cluster

Site02 TKG Workload Cluster

Configure GSLB sites on Avi controllers (leader and follower)

In my setup I will configure the Avi controller in site01 as the GSLB leader and the controller in site02 as a follower, but both will operate as Active-Active sites in handling global DNS requests. As a best practice, it is recommended to create a dedicated SE group on both sites to host the special DNS virtual service that serves GSLB. For this I will create a dedicated SE group on site01 as shown below, and a similar one on site02 (the names on both sites do not have to match).

From Avi UI navigate to Infrastructure > Cloud Resources > Service Engine Group and click on CREATE 

Click on Advanced if you want to configure specific parameters for the Service Engines that will host the DNS VS (for example, to place them on specific hosts/clusters).

Click on SAVE, then navigate to Applications > Virtual Services to create the global DNS virtual service, choosing the Advanced Setup option.

Assign a name to the DNS VS (g-dns-vs-site01), ensure that the application profile is System-DNS, and click on Create VS VIP to assign a VIP address to that DNS VS.

In the CREATE VS VIP window, choose the network on which the DNS VS VIPs will be placed and assigned.

Under VIPs, click on ADD to choose where the DNS VS VIP will be placed (placement network) and from which subnet the VIP address will be allocated.

 

 

Click SAVE, then under Step 4 (Advanced) choose the dedicated SE group we created earlier to host the DNS virtual service.

Click NEXT and save the VS creation; you should then see the GSLB service engines being deployed in vCenter.

Repeat the above steps in site02; you should then see your g-dns VS created in site02 as well.

Configure GSLB Sites in Avi

After we have defined the DNS virtual services on both sites, we need to set up the GSLB sites. From the site01 Avi UI, navigate to Infrastructure > GSLB > Site Configuration and click on the pencil icon on the right.

Define your site parameters, which include the IP address of the Avi controller/cluster and the subdomain that the GSLB DNS service will handle. In my setup this is avi.nsxbaas.homelab, and eventually, on our external DNS, we will need to configure a zone delegation for avi.nsxbaas.homelab pointing to the previously created DNS virtual service VIPs.

Click on Save and Set DNS Virtual Services; this is where we select the DNS VS we created earlier for site01.

Click on SAVE and then click on ADD New Site in order to add site02 as a follower.

Configure the same parameters but this time for site02

Once site02 is added, you should see the below GSLB site configuration, and site02 must show as In Sync with the leader (site01).

 

Deploy and configure AKO on site01 & site02 workload clusters and deploy demo app on both clusters

AKO deployment is the same for both workload clusters; you only need to modify the values.yaml to match the configuration of each site. In a previous post I discussed the step-by-step details of installing AKO on TKG workload clusters, so here I summarise the required steps and show the important sections you need to modify in the values file before deploying AKO.

To deploy AKO using helm, make sure to install helm on your bootstrap machine and then execute the following commands:

helm repo add ako https://projects.registry.vmware.com/chartrepo/ako
helm search repo |grep ako
helm show values ako/ako --version 1.9.2 > site01-ako-values.yaml

Below are the important sections you need to modify in your values file:
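
As an indicative sketch, these are the sections I adjust (key names are from the AKO 1.9.x chart; the cluster name, VIP network, CIDR, cloud name and SE group values below are placeholders for my site01 environment):

AKOSettings:
  clusterName: nsxbaas-tkg-wld-site01        # must be unique per workload cluster
  cniPlugin: antrea
NetworkSettings:
  vipNetworkList:                            # network/subnet Avi uses for VIP allocation
    - networkName: TKG-MGMT-WLD-NET          # placeholder, use your Avi VIP network
      cidr: 192.168.12.0/24                  # placeholder subnet
ControllerSettings:
  controllerHost: avi-l-01a.nsxbaas.homelab  # site01 Avi controller
  controllerVersion: "22.1.2"
  cloudName: Default-Cloud                   # placeholder, use your vCenter cloud name
  serviceEngineGroupName: Default-Group      # placeholder SE group for workload VSs
L7Settings:
  serviceType: ClusterIP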

Save and exit the values file, then create a namespace called avi-system and install the chart with that values file to create the AKO pod in that namespace:

kubectl create namespace avi-system
helm install ako/ako --generate-name --version 1.9.2 -f site01-ako-values.yaml --set avicredentials.username=admin --set avicredentials.password=<password> --namespace=avi-system

If the deployment is successful, you should see the AKO pod running.
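
A quick way to check (AKO runs as a StatefulSet, so the pod is typically named ako-0):

kubectl get pods -n avi-system
# expect the ako-0 pod to be in Running state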

Repeat the same for site02.

Deploy a demo app to test ingress

I will use a demo app to test the multi-site ingress functionality of Avi GSLB. For this I used a demo app called Online Boutique, whose manifests you can download from https://github.com/GoogleCloudPlatform/microservices-demo. If you receive an ImagePullBackOff error due to the registry pull limit on any of the containers being deployed, make sure to add your Docker config JSON as a pull secret in the namespace where you are deploying the demo app. In my case I used the below:

kubectl create ns microservices-app 
docker login
kubectl create secret generic regcred --from-file=.dockerconfigjson=/home/nsxbaas/.docker/config.json  --type=kubernetes.io/dockerconfigjson -n microservices-app 
kubectl edit sa default -n microservices-app 

Then add the highlighted line to the default service account; this is the imagePullSecrets reference to the regcred secret we just created.
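
The edited default service account ends up looking roughly like this (a sketch; namespace and secret name as used above):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: microservices-app
imagePullSecrets:
- name: regcred        # registry credentials secret created above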

This is how my microservices app deployment looks:

The highlighted value above is the LoadBalancer service created as part of the demo app deployment, which has successfully acquired an IP address from the Avi SE VIP network.
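
The equivalent CLI check (service name comes from the Online Boutique manifests, namespace from my setup):

kubectl get svc frontend-external -n microservices-app
# the EXTERNAL-IP column should show a VIP allocated by Avi from the SE VIP network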

Deploy and configure AMKO in both site01 and site02

Before we deploy the actual AMKO pod and start configuring site-wide virtual services, we need to prepare a GSLB members file. This is simply a merged kubeconfig file from both workload clusters, and it is needed to give AMKO read access to the services and ingress/route objects of all the member clusters, since AMKO reads those objects and updates the Avi leader controller.

To create the gslb-members file, follow these steps:

Step 1: Generate kubeconfig for both workload clusters

tanzu cluster kubeconfig get nsxbaas-tkg-wld-site01 --admin --export-file nsxbaas-tkg-wld-site01-kubeconfig-admin
tanzu cluster kubeconfig get nsxbaas-tkg-wld-site02 --admin --export-file nsxbaas-tkg-wld-site02-kubeconfig

Step 2: Merge both kubeconfig files

export KUBECONFIG=nsxbaas-tkg-wld-site01-kubeconfig-admin:nsxbaas-tkg-wld-site02-kubeconfig
kubectl config view --flatten > gslb-members.yaml

Step 3: Verify that the gslb-members file is working
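
One quick way to do this is to point kubectl at the merged file and confirm both cluster contexts respond (context names as generated by the tanzu CLI in my setup):

export KUBECONFIG=gslb-members.yaml
kubectl config get-contexts
kubectl get nodes --context nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01
kubectl get nodes --context site02-nsxbaas-tkg-wld-admin@site02-nsxbaas-tkg-wld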

Step 4: Create a generic secret to be used by AMKO to authenticate to workload clusters

You will need to repeat this step on the workload cluster of each site.

kubectl config use-context nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01
kubectl create secret generic gslb-config-secret --from-file gslb-members -n avi-system

Note: I had to rename the file to gslb-members (without the .yaml extension), since the AMKO deployment was failing when the secret was created from a file with any other name. Also, the namespace to which AMKO is deployed must be named avi-system.

Step 5: Modify the AMKO values file

The AMKO values file is in the same Helm repo as AKO, so we can simply pull it using the same command we used earlier to pull the AKO values YAML:

helm show values ako/amko --version 1.9.1 > site01-amko-values.yaml

My AMKO values file for site01 looks like the below:

# Default values for amko.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 1

image:
  repository: projects.registry.vmware.com/ako/amko
  pullPolicy: IfNotPresent

# Configs related to AMKO Federator
federation:
  # image repository
  image:
    repository: projects.registry.vmware.com/ako/amko-federator
    pullPolicy: IfNotPresent
  # cluster context where AMKO is going to be deployed
  currentCluster: 'nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01'
  # Set to true if AMKO on this cluster is the leader
  currentClusterIsLeader: true
  # member clusters to federate the GSLBConfig and GDP objects on, if the
  # current cluster context is part of this list, the federator will ignore it
  memberClusters:
  - nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01
  - site02-nsxbaas-tkg-wld-admin@site02-nsxbaas-tkg-wld

# Configs related to AMKO Service discovery
serviceDiscovery:
  # image repository
  # image:
  #   repository: projects.registry.vmware.com/ako/amko-service-discovery
  #   pullPolicy: IfNotPresent

# Configs related to Multi-cluster ingress. Note: MultiClusterIngress is a tech preview.
multiClusterIngress:
  enable: false

configs:
  gslbLeaderController: 'avi-l-01a.nsxbaas.homelab'
  controllerVersion: 22.1.2
  memberClusters:
  - clusterContext: nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01
  - clusterContext: site02-nsxbaas-tkg-wld-admin@site02-nsxbaas-tkg-wld
  refreshInterval: 1800
  logLevel: INFO
  # Set the below flag to true if a different GSLB Service fqdn is desired than the ingress/route's
  # local fqdns. Note that, this field will use AKO's HostRule objects' to find out the local to global
  # fqdn mapping. To configure a mapping between the local to global fqdn, configure the hostrule
  # object as:
  # [...]
  # spec:
  #  virtualhost:
  #    fqdn: foo.avi.com
  #    gslb:
  #      fqdn: gs-foo.avi.com
  useCustomGlobalFqdn: false

gslbLeaderCredentials:
  username: 'admin'
  password: 'password'

globalDeploymentPolicy:
  # appSelector takes the form of:
  appSelector:
    label:
      app: gslb
  # Uncomment below and add the required ingress/route/service label
  # appSelector:

  # namespaceSelector takes the form of:
  # namespaceSelector:
  #   label:
  #     ns: gslb   <example label key-value for namespace>
  # Uncomment below and add the required namespace label
  # namespaceSelector:

  # list of all clusters that the GDP object will be applied to, can take any/all values
  # from .configs.memberClusters
  matchClusters:
  - cluster: nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01
  - cluster: site02-nsxbaas-tkg-wld-admin@site02-nsxbaas-tkg-wld

  # list of all clusters and their traffic weights, if unspecified, default weights will be
  # given (optional). Uncomment below to add the required trafficSplit.
  # trafficSplit:
  #   - cluster: "cluster1-admin"
  #     weight: 8
  #   - cluster: "cluster2-admin"
  #     weight: 2

  # Uncomment below to specify a ttl value in seconds. By default, the value is inherited from
  # Avi's DNS VS.
  # ttl: 10

  # Uncomment below to specify custom health monitor refs. By default, HTTP/HTTPS path based health
  # monitors are applied on the GSs.
  # healthMonitorRefs:
  # - hmref1
  # - hmref2

  # Uncomment below to specify custom health monitor template. Either healthMonitorRefs or healthMonitorTemplate
  # is allowed.
  # healthMonitorTemplate: hmTemplate1

  # Uncomment below to specify a Site Persistence profile ref. By default, Site Persistence is disabled.
  # Also, note that, Site Persistence is only applicable on secure ingresses/routes and ignored
  # for all other cases. Follow https://avinetworks.com/docs/20.1/gslb-site-cookie-persistence/ to create
  # a Site persistence profile.
  # sitePersistenceRef: gap-1

  # Uncomment below to specify gslb service pool algorithm settings for all gslb services. Applicable
  # values for lbAlgorithm:
  # 1. GSLB_ALGORITHM_CONSISTENT_HASH (needs a hashMask field to be set too)
  # 2. GSLB_ALGORITHM_GEO (needs geoFallback settings to be used for this field)
  # 3. GSLB_ALGORITHM_ROUND_ROBIN (default)
  # 4. GSLB_ALGORITHM_TOPOLOGY
  #
  # poolAlgorithmSettings:
  #   lbAlgorithm:
  #   hashMask:           # required only for lbAlgorithm == GSLB_ALGORITHM_CONSISTENT_HASH
  #   geoFallback:        # fallback settings required only for lbAlgorithm == GSLB_ALGORITHM_GEO
  #     lbAlgorithm:      # can only have either GSLB_ALGORITHM_ROUND_ROBIN or GSLB_ALGORITHM_CONSISTENT_HASH
  #     hashMask:         # required only for fallback lbAlgorithm as GSLB_ALGORITHM_CONSISTENT_HASH

  # Uncomment below to specify gslb service down response settings for all gslb services.
  # Applicable values for type are:
  # 1. GSLB_SERVICE_DOWN_RESPONSE_NONE (default)
  # 2. GSLB_SERVICE_DOWN_RESPONSE_ALL_RECORDS
  # 3. GSLB_SERVICE_DOWN_RESPONSE_FALLBACK_IP (needs fallbackIP to be set too)
  # 4. GSLB_SERVICE_DOWN_RESPONSE_EMPTY
  #
  # downResponse:
  #   type:
  #   fallbackIP:         # required only for type == GSLB_SERVICE_DOWN_RESPONSE_FALLBACK_IP

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name:

resources:
  limits:
    cpu: 250m
    memory: 300Mi
  requests:
    cpu: 100m
    memory: 200Mi

service:
  type: ClusterIP
  port: 80

rbac:
  # creates the pod security policy if set to true
  pspEnable: false

persistentVolumeClaim: ''
mountPath: /log
logFile: amko.log

federatorLogFile: amko-federator.log

 

I then installed AMKO using the command:

helm install ako/amko --generate-name --version 1.9.1 -f site01-amko-values.yaml --namespace=avi-system 

If the deployment is successful, the AMKO StatefulSet pods should all be in Running state in both clusters.
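
A quick check from the jumpbox (AMKO is a StatefulSet, so the pod is typically named amko-0; context names as in my merged kubeconfig):

kubectl get pods -n avi-system --context nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01
kubectl get pods -n avi-system --context site02-nsxbaas-tkg-wld-admin@site02-nsxbaas-tkg-wld
# expect the amko-0 pod in Running state on each cluster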

Deploy Ingress for demo app and verify GSLB service creation

The whole idea of using AMKO is to update the Avi GSLB service with ingress objects/services as they are created in any of the workload clusters; this ensures that any ingress rules developers create in a multi-site TKG deployment are automatically published as GSLB services. The ingress I used is shown below:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: onlineshop-ingress
  labels:
    app: gslb
spec:
  rules:
  - host: onlineshop.avi.nsxbaas.homelab
    http:
      paths:
      - pathType: Prefix
        path: /
        backend:
          service:
            name: frontend-external
            port:
              number: 80
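
I applied this manifest to the namespace where the demo app runs on each workload cluster (the file name and namespace are assumptions for my setup):

kubectl apply -f onlineshop-ingress.yaml -n microservices-app --context nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01
kubectl apply -f onlineshop-ingress.yaml -n microservices-app --context site02-nsxbaas-tkg-wld-admin@site02-nsxbaas-tkg-wld
kubectl get ingress -n microservices-app --context nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01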

Note the label app: gslb which I added to the ingress YAML; this instructs AMKO to handle this ingress (it matches the appSelector we defined in the AMKO values file) and update the Avi GSLB sites with its status. Once the Ingress is successfully deployed, we can see in the Avi UI that AMKO has created a GSLB service mapping to that ingress.

If you click on the pencil icon to the right of the GSLB service, you can see more information, such as the load balancing algorithm (Round Robin by default).

Verify that both AMKO instances are in sync and the GSLB config is being replicated to both sites; from our jumpbox run the following commands:
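
A minimal sketch of the check, assuming the default object names created by the AMKO chart (GSLBConfig and GlobalDeploymentPolicy CRDs in the avi-system namespace):

kubectl get gslbconfig -n avi-system --context nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01
kubectl get gslbconfig -n avi-system --context site02-nsxbaas-tkg-wld-admin@site02-nsxbaas-tkg-wld
kubectl get gdp -n avi-system --context nsxbaas-tkg-wld-site01-admin@nsxbaas-tkg-wld-site01
# both clusters should show the federated GSLBConfig/GDP objects with a successful sync status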

Configure zone delegation on organisation DNS.

In order for GSLB to handle incoming requests to our ingress (onlineshop.avi.nsxbaas.homelab), we need to instruct our external (organisation) DNS to forward any name resolution requests ending with avi.nsxbaas.homelab to the GSLB DNS VS VIPs we created earlier. To do this we need to configure zone delegation on our DNS server. I am using Windows Server 2019 as my home lab DNS, and below is how to configure zone delegation:

Step 1: Create DNS A records pointing to the GSLB DNS VS VIPs

We need to note down the IPs of both g-dns virtual services we created earlier and create DNS A records pointing to the DNS VS VIPs on site01 and site02.

Next, right click on the forward lookup zone and choose New Delegation

Add the DNS VS FQDNs we created above as delegated DNS servers for the zone avi.nsxbaas.homelab. Note: if the wizard is unable to validate the DNS VS as a DNS server, just ignore it and move to the next step.

Click on Next and finalise the zone delegation setup

Once you click on Finish, you should see both DNS VS VIPs added as DNS servers for the zone avi.nsxbaas.homelab.

Verify AMKO and GSLB deployment

To verify that AMKO and GSLB are load balancing HTTP sessions to our demo app, I will use a simple dig command from my jumpbox. I added a sleep 20 command (a 20-second pause between dig executions) because GSLB will not load balance every single query in strict round robin across two geographically distant sites, so the pause makes the alternation of the DNS responses easier to see. The IPs shown below are the ingress VS VIPs assigned by Avi to the ingress we created on site01 (192.168.12.8) and site02 (192.168.17.8).
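
The loop I used looks roughly like this (run from the jumpbox, which resolves via the lab DNS that delegates avi.nsxbaas.homelab to the GSLB DNS virtual services):

while true; do
  dig +short onlineshop.avi.nsxbaas.homelab
  sleep 20
done
# responses should alternate between 192.168.12.8 (site01) and 192.168.17.8 (site02)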

Note: If you cannot see GSLB load balancing traffic across the two sites we configured, then from the leader site navigate to Applications > GSLB Services and click on the pencil icon next to the created GSLB service to edit it. In the Edit GSLB Service window, under GSLB Pool, click on the pencil icon and ensure that both ingresses (from both Tanzu workload clusters) are listed under Pool Members.

From a web browser, open an HTTP session to onlineshop.avi.nsxbaas.homelab.

Hope you have found this blog post helpful!