Overview

In a previous blog post (HERE) I deployed NSX Application Platform on top of a TKGs Kubernetes cluster. However, after several discussions with both customers and partners, I decided to write another blog post covering NSX Application Platform (NAPP) deployment on an upstream (aka native) Kubernetes cluster, since not all NSX customers have Tanzu. The challenge I will be working on in this post is preparing an upstream K8s cluster with the requirements needed to run NAPP pods, which is not an out-of-the-box scenario. If you run TKGs or TKGm, your Tanzu workload clusters get rolled out with all the components and services required to run NAPP, which is obviously not the case if you plan to deploy NAPP on upstream Kubernetes.

Lab Inventory

For software versions I used the following:

    • VMware ESXi 8.0 IA
    • vCenter server version 8.0 IA
    • VMware NSX-T 3.2.1.2 large form factor
    • TrueNAS 12.0-U7 used to provision NFS data stores to ESXi hosts.
    • VyOS 1.4 used as lab backbone router and DHCP server.
    • Ubuntu 20.04.2 LTS as DNS and internet gateway.
    • Windows Server 2012 R2 Datacenter as management host for UI access.

For virtual hosts and appliances sizing I used the following specs:

    • 3 x ESXi hosts each with 12 vCPUs, 2 x NICs and 128 GB RAM.
    • vCenter server appliance with 2 vCPU and 24 GB RAM.
    • 4 x Ubuntu 18.04 server VMs for the upstream Kubernetes cluster.

Upstream Kubernetes cluster requirements to run NAPP

NSX Application Platform has strict requirements that must be met to ensure a successful deployment. Below is a list of those requirements:

  • Licensing requirements, which can be found on the VMware site HERE.
  • NSX Application Platform system requirements (HERE). In my lab I am deploying the standard form factor, and my Kubernetes nodes are sized as follows:

    • One control node and 3 worker nodes, each with 8 vCPUs, 32 GB of RAM and 300 GB of storage. I just had some resources to spare in my lab, which is why my nodes are sized above the resources mentioned in the VMware documentation for the NAPP standard form factor.
  • Your upstream Kubernetes cluster must have the following in place before we start deploying NAPP on it:
    • A Kubernetes version matching the supported Kubernetes tools that ship with your NSX version (I am using NSX-T 3.2.1.2, whose Kubernetes tools are version 1.20, hence my Kubernetes version).
    • A CNI installed in your K8s cluster; for this lab I am using Antrea (make sure to use an Antrea version matching your Kubernetes version).
    • A load balancer service provider such as NSX ALB or MetalLB. I am using MetalLB in this lab, but I deployed NSX ALB AKO before in one of my previous posts (HERE), so feel free to check it if you want to learn how to deploy NSX ALB AKO as the load balancer provider for your Kubernetes clusters.
    • A Container Storage Interface (CSI) driver in order to provision dynamic volumes; you must have an available storage class in the Kubernetes cluster you are using, and it must support resizing.

Preparing upstream Kubernetes cluster to deploy NAPP

Deploy Kubernetes v1.20 on control and worker nodes

There are many guides on the internet that include instructions on how to deploy an upstream Kubernetes cluster, and I also covered this as part of one of my previous blog posts (HERE), but for the sake of completeness I will include the steps here once again.

Step 1: Node preparation and installing CRI runtime

Login to your controller node and run the following commands:

sudo swapoff -a

cat << EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf 
net.bridge.bridge-nf-call-iptables = 1 
net.ipv4.ip_forward = 1 
net.bridge.bridge-nf-call-ip6tables = 1 
EOF
sudo sysctl --system

The above commands disable swap (make sure to also comment out the swap line in /etc/fstab so it stays off after a reboot) and enable the kernel modules and system settings required for running containerd. You should see output similar to the screenshot below:
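If you want to double-check without relying on the screenshot, the following quick checks (standard Linux tooling, nothing specific to this setup) confirm the modules and sysctl settings are active:

lsmod | grep -E 'overlay|br_netfilter'
sudo sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward net.bridge.bridge-nf-call-ip6tables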

Next, let's go ahead and install containerd. Currently containerd is distributed as part of Docker Engine, so you can follow the steps mentioned in the Docker Engine installation documentation to deploy it. In the screenshot below I highlighted the command sequence and output for reference.
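In plain commands, the Docker Engine install on Ubuntu boils down to roughly the following (based on Docker's install guide at the time of writing; treat it as a sketch and double-check the current documentation before copying):

sudo apt update && sudo apt install -y ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io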

Do not forget to add your current user to the docker group so you can run docker and containerd commands without needing to modify the docker and containerd system file permissions:

sudo usermod -aG docker $USER

Then reboot your system. After that, create the containerd configuration directory and generate a default config file in it (in case you want to use containerd instead of docker):

sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd

Step 2: Install Kubeadm, Kubectl and Kubelet version 1.20

As mentioned earlier in the requirements section, we need to make sure that our Kubernetes version matches the Kubernetes tools supported by the NSX-T version we are using. In my setup I am using NSX-T 3.2.1.2, which ships Kubernetes tools 1.20. You can check this in the NSX Application Platform download section under NSX downloads on the VMware Customer Connect product download page.

The apt-transport-https package is also required and needs to be installed:

sudo apt update && sudo apt install -y apt-transport-https curl

Then you need to add Google's GPG key for the Kubernetes package repository:

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

Next we need to set up the Kubernetes repository entry in the Debian package manager:

cat << EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF

Next, copy and paste the below command to install the required kubelet, kubectl and kubeadm v1.20 packages and lock them to prevent any apt upgrades from updating them in the future:

sudo apt update && sudo apt install -y kubelet=1.20.0-00 kubectl=1.20.0-00 kubeadm=1.20.0-00 && sudo apt-mark hold kubelet kubeadm kubectl
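Before moving on, it is worth confirming that the pinned 1.20 versions actually landed:

kubeadm version
kubectl version --client
kubelet --version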

Step 3: Configure worker nodes for Kubernetes v1.20

Go ahead and repeat steps 1 and 2 on your worker nodes.

Step 4: Initialising the k8s cluster (this step needs to be completed only on the controller node)

In this step we will initialise our Kubernetes cluster and assign the pod CIDR from which our CNI (Antrea) will assign IPs to pods. The next command needs to be run on the controller node only:

sudo kubeadm init --pod-network-cidr 10.20.0.0/16 --kubernetes-version 1.20.0

Your command output should be similar to the below:

Finalise the init process by running the following commands to create the .kube config directory:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Now, take the kubeadm join command generated in the above output and run it as superuser on the worker nodes to get them to join our Kubernetes cluster. In my setup, this is the command I used:

kubeadm join 172.10.42.1:6443 --token hcohfk.ivohknawkdeytamn --discovery-token-ca-cert-hash sha256:0edd6e9d66530efd1d795c8cb00b0cc89000b23a2cfe35a80b7eb2fd6d41a48f
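If you did not save the init output, or the bootstrap token has expired (they are short-lived by default), you can regenerate a fresh join command on the controller node:

kubeadm token create --print-join-command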

Once all your worker nodes have joined the cluster, you can verify that they joined successfully by running the below command from your controller node:

kubectl get nodes -o wide

The status of all nodes (including the controller) is NotReady because we have not installed a CNI (Antrea) yet, so in the next step we will do exactly that to get our cluster into the Ready state.

Deploy Antrea CNI and verify cluster state

The Container Network Interface (CNI) is the Kubernetes cluster component responsible for providing network connectivity between pods and services across the cluster, as well as handling traffic entering or leaving pods. A CNI is a crucial component for Kubernetes cluster operation; without one installed, a Kubernetes cluster will never enter the Ready state or be able to host pods. For this task you can use any CNI you prefer, but I will stick to VMware Antrea in this blog post (open-source Antrea will work just as well).

Before we deploy the Antrea CNI, we need to check which VMware Antrea version matches our Kubernetes v1.20 cluster. Checking the VMware Antrea release notes, we can see that VMware Container Networking with Antrea 1.3.2 is based on the Antrea v1.2.4 open-source release, which supports Kubernetes versions up to v1.21, so this is what we need to download and use in our Kubernetes cluster.

Step 1: Download VMware Container Networking Antrea 1.3.2 and upload it to controller node

Before you deploy Antrea you need to install the Open vSwitch packages on all your Kubernetes nodes; this is a requirement for Antrea.

sudo apt -y install openvswitch-switch

Login to VMware Customer Connect, navigate to Products > All Products, search for VMware Antrea and then choose the version matching your Kubernetes cluster. For more details on how to install Antrea you can check my blog post HERE.

Once you apply the Antrea deployment YAML, make sure that all your Antrea agent pods are in the Running state and that your Kubernetes cluster nodes have moved to the Ready state.

To have everything in one place, I placed my Antrea deployment YAML file here for convenience; just copy and paste it into a YAML file and apply the configuration using "kubectl apply -f yaml_filename.yaml". The file can be viewed/downloaded from HERE.
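If you would rather use the open-source Antrea build, the upstream project publishes its deployment manifest as a release asset, so applying it directly should also work (URL assumed from the Antrea GitHub release layout, verify it against the v1.2.4 release page):

kubectl apply -f https://github.com/antrea-io/antrea/releases/download/v1.2.4/antrea.yml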

Step 2: Verify Antrea pods deployment and cluster status

If the Antrea pods are deployed properly you should see similar output to the below:
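The checks I use for this (assuming the manifest's default kube-system namespace and its app=antrea label) are:

kubectl get pods -n kube-system -l app=antrea
kubectl get nodes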

Deploy MetalLB as cluster load balancer 

NSX Application Platform pods require an entry point through which NSX configures and interacts with them, so we need to assign an IP address from an "ingress" pool for NSX to use. This is called the Service Name or Interface Service Name and is used as the HTTPS endpoint to connect to the NSX Application Platform. In this post I will deploy MetalLB as the load balancer in the cluster and assign an address from the load balancer block (along with a DNS entry for that address, so we have an FQDN) to be used as the Interface Service Name.

Step 1: Deploy MetalLB from manifest

Login to your Kubernetes controller node and run the following command:

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.7/config/manifests/metallb-native.yaml

If MetalLB pods are deployed successfully you should see all the pods under metallb-system in Running state:
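You can check this from the controller node with:

kubectl get pods -n metallb-system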

Step 2: Configure IP pool for MetalLB and create a FQDN entry for Interface Service Name to be used later by NAPP

To configure the IP address pool from which MetalLB will assign addresses, create a YAML file, paste the following entries into it, and then apply it to your Kubernetes cluster using kubectl apply -f:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: napp-pool
  namespace: metallb-system
spec:
  addresses:
  - 172.10.42.100-172.10.42.200

Obviously you need to enter an address range that matches your setup. Be aware that if you plan to use an IP pool for load balancer assignments that sits in a different subnet than the one your Kubernetes cluster nodes are in, you need to use MetalLB in L3 mode, which advertises the load balancer subnet to your gateway by means of BGP. To simplify my lab setup, I deployed the MetalLB IP range in the same subnet as my Kubernetes nodes, so I do not need to run BGP between MetalLB and my core router.
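One caveat with MetalLB 0.13.x: the IPAddressPool on its own does not announce anything. Since I am running in layer 2 mode, I also applied an L2Advertisement referencing the pool (the resource name napp-l2 is just my choice):

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: napp-l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - napp-pool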

Next we need to add an entry in our DNS for the Interface Service Name; in my setup I assigned the IP 172.10.42.190 to the FQDN napp-service.nsxbaas.homelab.
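Before moving on, a quick sanity check is to expose a throwaway deployment as a LoadBalancer service and confirm it picks up an external IP from the MetalLB pool (nginx-test is just an arbitrary name for the test):

kubectl create deployment nginx-test --image=nginx
kubectl expose deployment nginx-test --type=LoadBalancer --port=80
kubectl get svc nginx-test
kubectl delete deployment,svc nginx-test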

Deploy Container Storage Interface

A Container Storage Interface driver is a plugin which allows Kubernetes to provision volumes on external storage providers; this could be vSphere storage or a public cloud volume provider. For this setup I am going to use the vSphere Container Storage Plug-in, also called the upstream vSphere CSI driver, which is a volume plugin that runs in a native Kubernetes cluster deployed on vSphere and is responsible for provisioning persistent volumes on vSphere storage.

It is very important to make sure that the version of the CSI plugin you deploy matches the Kubernetes version you are running. In my lab I am using K8s v1.20, which is compatible with CSI 2.4.3; you can find the compatibility list in the VMware documentation.

Step 1: Create VMware CSI namespace for vSphere storage plugin deployment

All the deployment manifests for CSI 2.4.3 can be found HERE. Login to your Kubernetes controller node and clone the CSI repository:

git clone https://github.com/kubernetes-sigs/vsphere-csi-driver/
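Since the repository's main branch keeps moving, I would check out the tag matching the release before applying any manifests (assuming the repo's usual vX.Y.Z tag naming):

cd vsphere-csi-driver
git checkout v2.4.3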

To create the vmware-system-csi namespace, navigate to the manifests/vanilla directory under the cloned vsphere-csi-driver repository and run the command:

kubectl apply -f namespace.yaml

Before installing the vSphere Container Storage Plug-in in your generic Kubernetes environment, make sure that you taint the control plane node with the node-role.kubernetes.io/control-plane=:NoSchedule parameter.

kubectl taint nodes k8s-napp-controller node-role.kubernetes.io/control-plane=:NoSchedule

After that, we need to create a Kubernetes secret which will contain the configuration details to connect to vSphere. My configuration looks like this:

[Global]
cluster-id = "f3cdcccf-331b-49f6-a67b-341cfaeb6082"

[NetPermissions "A"]
ips = "*"
permissions = "READ_WRITE"
rootsquash = false

[VirtualCenter "vc-l-01a.nsxbaas.homelab"]
insecure-flag = "true"
user = "administrator@vsphere.local"
password = "Omitted"
port = "443"
datacenters = "nsxbaas"

To get the cluster-id (which is used as a unique identity for your Kubernetes cluster) you can use the following command on your controller node:

kubectl get namespace kube-system --output jsonpath={.metadata.uid}

This returns the UUID of your kube-system namespace, which is unique across your Kubernetes cluster, and this is the value you need to fill in the above config file. Also, the above config is for file-based storage; if you use block storage (VMFS), there is no need to add the NetPermissions section.

Copy and paste the contents of the above into a file called csi-vsphere.conf and then create the configuration secret using the following command:

kubectl create secret generic vsphere-config-secret --from-file=csi-vsphere.conf --namespace=vmware-system-csi
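To confirm the secret landed in the right namespace (and, since csi-vsphere.conf holds the vCenter password in clear text, it is worth removing the local copy afterwards):

kubectl get secret vsphere-config-secret --namespace=vmware-system-csi
rm csi-vsphere.conf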

The last step is to deploy the vSphere CSI plugin deployment and daemonset and then verify that the vSphere plugin has been successfully registered with your Kubernetes cluster. From the same manifests/vanilla directory in the cloned csi-driver repository, run the following command:

kubectl apply -f vsphere-csi-driver.yaml

This should result in similar output to the below:

Step 2: Verify that the vSphere Container Storage Plug-in has been successfully deployed

You can use the following set of commands to verify that the csi driver has successfully been deployed:

kubectl get deployment --namespace=vmware-system-csi

kubectl describe csidrivers

kubectl get CSINode
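One last piece on the storage side, tying back to the requirements section: NAPP needs a storage class that supports volume resizing. Below is a sketch of the storage class I would create on top of the vSphere CSI driver; the storagepolicyname value is specific to my environment, so substitute your own vSphere storage policy (or drop the parameters block entirely):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: napp-storage
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.vsphere.vmware.com
allowVolumeExpansion: true
parameters:
  storagepolicyname: "vSAN Default Storage Policy"

Save it to a file and apply it with kubectl apply -f as usual.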

What is Next?

This concludes the first part of this blog post. In the second part I will start the NSX Application Platform deployment on the upstream Kubernetes cluster we prepared here. I hope you have found this useful.