NodePortLocal (NPL) is a feature of the Antrea Agent through which a backend Pod can be reached from the external network using a port of the Node on which the Pod is running. By default, Kubernetes offers the NodePort Service type to expose Pod traffic to external networks; however, with a NodePort Service, Kubernetes assigns a specific port for the backend Pod and opens (listens on) that port on ALL worker nodes in the cluster, regardless of whether a node hosts the Pod or not. This is a security concern, since worker nodes end up listening on ports they do not actually serve. NodePortLocal, on the other hand, enables better security and integration with external load balancers that can take advantage of the feature: instead of relying on NodePort Services implemented by kube-proxy, external load balancers can consume the NPL port mappings published by Antrea to load-balance Service traffic directly to backend Pods.

Below is a screenshot from an L7 Ingress which uses a backend load balancer service based on the default Cluster/NodePort Kubernetes Service.

From the cluster command line we can check on which node the Pod is actually running. In the screenshot below, our load balancer service maps to a backend Service on port TCP 30307, which has an endpoint Pod listening on port 8080.
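The same mapping can be checked from the CLI; the Service name frontend used here is an assumption based on the demo app used later in this post:

```shell
# Show the Service's node port mapping
kubectl get svc frontend
# Show the endpoint behind it (Pod IP:port)
kubectl get endpoints frontend
```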

Let's now verify on which worker node our Pod is actually running.
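A quick way to do this from the CLI:

```shell
# The NODE column shows where each Pod is scheduled
kubectl get pods -o wide
```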

As you can see from the screenshot above, our Pod is running on a single worker node, but as the first screenshot from our load balancer shows, the default Kubernetes NodePort Service exposed TCP port 30307 on ALL worker nodes, not only the node which hosts the Pod. This is exactly what NodePortLocal solves.

Lab Inventory

For software versions I used the following:

  • VMware ESXi 8.0U1
  • vCenter server version 8.0U1a
  • Multi-Zone TKGS Cluster 
  • VMware NSX ALB 22.1.3 for L4 and L7 Tanzu Cluster Load Balancing.
  • TrueNAS 12.0-U7 as backend storage system.
  • VyOS 1.3 used as lab backbone router, NTP and DHCP server.
  • Ubuntu 20.04 LTS as Linux jumpbox.
  • Windows Server 2019 R2 Standard as DNS server.
  • Windows 10 Pro as UI jump box.

For virtual hosts and appliances sizing I used the following specs:

  • 6 x virtualised ESXi hosts each with 12 vCPUs, 2x NICs and 128 GB RAM.
  • vCenter server appliance with 2 vCPU and 24 GB RAM.

Enabling NodePortLocal on an Existing Multi-Zonal Cluster

Note: Although I am using a multi-zone vSphere TKG cluster in this blog post, the steps mentioned are applicable to single-zone vSphere clusters as well.

Enabling features in Antrea is controlled by means of FeatureGates, which is simply a list of the available Antrea features, their status (enabled/disabled) and specific configurations related to each feature. In order to enable the NodePortLocal feature in vSphere with Tanzu TKG 2.0 (the TKGS release that ships with vSphere 8.0U1 and uses Antrea 1.6 as the default CNI for guest TKC clusters), we first need to understand something called the AntreaConfig CRD.

Step 1: Verifying AntreaConfig CRD and NodePortLocal Status

In my current setup, I have the following multi-zone vSphere Tanzu cluster deployed. For the deployment steps of a vSphere multi-zone cluster, you can refer to one of my previous blog posts HERE.

The guest TKC cluster above is deployed using the v1beta1 API and a cluster class YAML, which allows us to review and modify the AntreaConfig CRD directly from the Supervisor cluster so that we can enable/disable Antrea feature gates on existing clusters. To view the current AntreaConfig CRD for our cluster, you need to log out from the TKC cluster and log in to the Supervisor cluster as shown below.

Note: the AntreaConfig CRD must be named <clustername>-antrea-package, where <clustername> is the name you specified for your guest TKC cluster (in my setup it is called zonal-cluster01). This is important if you are creating your own AntreaConfig CRD: you need to follow the exact same naming convention.

The command to list the available AntreaConfig CRDs is:

kubectl get antreaconfigs.cni.tanzu.vmware.com

This command needs to be executed from the Supervisor cluster and not from the guest TKC. If you then use the command

kubectl describe antreaconfigs.cni.tanzu.vmware.com zonal-cluster01-antrea-package

At the bottom of the output, you will see a list of the available feature gates and the status of each of them. As you can see, NodePortLocal is set to false (disabled) by default.
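If you only want that one value rather than the full describe output, a jsonpath query works too (the field path below assumes the TKG AntreaConfig schema, where feature gates live under spec.antrea.config.featureGates):

```shell
# Query the NodePortLocal feature gate directly (run on the Supervisor cluster)
kubectl get antreaconfigs.cni.tanzu.vmware.com zonal-cluster01-antrea-package \
  -o jsonpath='{.spec.antrea.config.featureGates.NodePortLocal}'
```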

Step 2: Enable NodePortLocal in AntreaConfig CRD

For this step, we need to log in to the Supervisor cluster and not the guest TKC cluster, since the AntreaConfig CRD is created by the Supervisor cluster for every guest TKC cluster. We can edit the AntreaConfig CRD using the following command, setting the NodePortLocal spec to true:

kubectl edit antreaconfigs.cni.tanzu.vmware.com zonal-cluster01-antrea-package
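Inside the editor, the relevant part of the spec looks like the fragment below (field nesting per the TKG AntreaConfig schema; only the NodePortLocal line needs to change):

```yaml
spec:
  antrea:
    config:
      featureGates:
        # other feature gates left at their defaults
        NodePortLocal: true   # was false
```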

Save and exit the above file. You then need to restart the Antrea agents inside the guest TKC cluster, so we log in to our destination guest TKC cluster zonal-cluster01:

kubectl vsphere login --vsphere-username administrator@vsphere.local --server= --insecure-skip-tls-verify --tanzu-kubernetes-cluster-namespace stretched-homelab --tanzu-kubernetes-cluster-name zonal-cluster01

Then perform a rollout restart of the Antrea agents using the command:

kubectl rollout restart ds/antrea-agent -n kube-system
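You can optionally confirm the restart finished and that an agent is running on every node (the app=antrea label is the one Antrea applies to its own pods):

```shell
# Wait for the DaemonSet restart to complete
kubectl rollout status ds/antrea-agent -n kube-system
# One antrea-agent pod per node, all Running
kubectl get pods -n kube-system -l app=antrea -o wide
```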

Step 3: Adjust NSX ALB AKO configmap in guest cluster & verify NPL

This step is not mandatory if you are not using AKO or have not already deployed AKO in your guest TKC. In my case, I have already deployed a guest cluster (zonal-cluster01) with AKO set to NodePort in order to provide L4 and L7 load balancing for my cluster, hence the need to modify the AKO parameters in the AKO ConfigMap to enable NodePortLocal. You can edit the AKO ConfigMap as shown below.

This will open your default file editor; in the ConfigMap, make sure to set the highlighted parameters as shown below.

Note: You have to explicitly add antrea as the cniPlugin, otherwise AKO will not pick up the NodePortLocal configuration.
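For reference, the two keys that matter are sketched below; the ConfigMap name (avi-k8s-config in the avi-system namespace) and key names are as used by AKO 1.x, so verify against your AKO version:

```yaml
# Fragment of the AKO ConfigMap data
serviceType: NodePortLocal   # switch from NodePort
cniPlugin: antrea            # required for AKO to consume NPL annotations
```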

Save and exit the file, then delete the AKO pod ako-0 so that the changes can take effect:

kubectl delete pod ako-0 -n avi-system

Once the AKO pod has restarted, you should see any layer 4 or layer 7 virtual services pointing only to the node(s) which actually host(s) the exposed pod(s). Notice the node port change as well (61003); this port is assigned from the default NPL port range defined in the Antrea ConfigMap.

From the command line we can also verify that our frontend Pod is running on that node and has been automatically annotated by Antrea with the NodePortLocal annotation.
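The annotation can also be read directly with jsonpath; the pod name below is a placeholder, and the output shape shown in the comment is illustrative of the Antrea NPL annotation format rather than copied from my lab:

```shell
# Print the NPL annotation on the frontend pod (replace <frontend-pod> with the real name)
kubectl get pod <frontend-pod> \
  -o jsonpath='{.metadata.annotations.nodeportlocal\.antrea\.io}'
# The value is a JSON list of mappings, along the lines of
# [{"podPort":8080,"nodeIP":"<node-ip>","nodePort":61003}]
```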

Enabling NodePortLocal on a Newly Deployed Multi-Zonal Cluster

Step 1: Deploy a new multi-zone guest TKC with NPL enabled

For a new cluster deployment, NodePortLocal needs to be enabled beforehand in an AntreaConfig CRD on the Supervisor cluster. If you recall, the AntreaConfig CRD is the custom resource which the Supervisor cluster creates by default for every guest TKC cluster, with default Antrea settings (NPL disabled). Since we want our cluster deployed with Antrea as CNI and NPL enabled, we need to log in to the Supervisor cluster and apply the following AntreaConfig YAML:

apiVersion: cni.tanzu.vmware.com/v1alpha1
kind: AntreaConfig
metadata:
  name: zonal-cluster02-antrea-package
spec:
  antrea:
    config:
      defaultMTU: ""
      disableUdpTunnelOffload: false
      featureGates:
        AntreaPolicy: true
        AntreaProxy: true
        AntreaTraceflow: true
        Egress: false
        EndpointSlice: true
        FlowExporter: false
        NetworkPolicyStats: false
        NodePortLocal: true
      noSNAT: false
      trafficEncapMode: encap

Please note that the name field under metadata must contain the exact name of the to-be-deployed cluster followed by -antrea-package. Save the above file and apply it to the Supervisor cluster; you should then see the newly created AntreaConfig CRD listed in the Supervisor cluster.
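The apply and verification steps look like this (the filename is just what I saved the manifest as):

```shell
# Apply the AntreaConfig before creating the cluster
kubectl apply -f zonal-cluster02-antrea-config.yaml
# Confirm it is listed on the Supervisor cluster
kubectl get antreaconfigs.cni.tanzu.vmware.com
```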

Now we need to deploy cluster zonal-cluster02; for that purpose I will use the YAML below:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: zonal-cluster02
  namespace: stretched-homelab
spec:
  clusterNetwork:
    pods:
      cidrBlocks: [""]
    services:
      cidrBlocks: [""]
    serviceDomain: "cluster.local"
  topology:
    class: tanzukubernetescluster
    version: v1.23.8+vmware.2-tkg.2-zshippable
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
        - class: node-pool
          name: node-pool-1
          replicas: 2
          #failure domain the machines will be created in
          #maps to a vSphere Zone; name must match exactly
          failureDomain: zone01
        - class: node-pool
          name: node-pool-2
          replicas: 2
          failureDomain: zone02
        - class: node-pool
          name: node-pool-3
          replicas: 2
          failureDomain: zone03
    variables:
      - name: vmClass
        value: best-effort-medium
      - name: storageClass
        value: zonal-sp

Here is my newly deployed zonal cluster zonal-cluster02

Once the cluster is deployed, you can verify from the Supervisor cluster that the AntreaConfig CRD has been assigned to it successfully.

Step 2: Verify NPL setting and functionality 

For this step I am deploying a test microservices app along with Avi AKO as the L4/L7 load balancer. If you are also using NSX ALB (Avi) and want to use NodePortLocal, make sure that the AKO values deployment YAML has the following highlighted parameters correctly set:
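As a sketch, the two settings in question sit in the AKO Helm values file; the key names below follow the AKO 1.9.x values.yaml layout, so check them against your chart version:

```yaml
# Illustrative fragment of the AKO Helm values file
AKOSettings:
  cniPlugin: antrea          # required so AKO consumes NPL annotations
L7Settings:
  serviceType: NodePortLocal # instead of ClusterIP/NodePort
```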

I deployed AKO using the command:

helm install ako/ako --generate-name --version 1.9.3 -f ako-zonal-cluster02-values.yaml --namespace=avi-system

In my demo app, I have a ClusterIP Service called frontend exposing a webpage on port 80, which maps to a backend Pod called frontend exposing a webpage on port 8080.

I then deployed the following L7 Ingress to expose that ClusterIP Service using AKO:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: onlineshop-ingress
  labels:
    app: frontend
spec:
  rules:
  - host: onlineshop.zonal.nsxbaas.homelab
    http:
      paths:
      - pathType: Prefix
        path: /
        backend:
          service:
            name: frontend
            port:
              number: 80

Once I applied the above Ingress, I got a virtual service created on Avi pointing to one node only (the node hosting the frontend Pod).

We can also verify the Pod/node allocation from the CLI.

Hope you have found this post useful.