In a previous blog post series (part one and part two) I covered how service providers can offer Tanzu as a Service (TaaS) to their tenants on top of vSphere clusters enabled with vSphere with Tanzu. That approach is a native, out-of-the-box capability of Cloud Director and vSphere, and it delivers Tanzu clusters to tenants without the need for CSE (Container Service Extension for Cloud Director). It offers fast and efficient Tanzu service delivery, but it requires Workload Management to be enabled on the backing vSphere clusters, which might not be feasible for all service providers. Hence I decided to write a new blog post series covering how service providers can offer Tanzu/Kubernetes as a service in Cloud Director using CSE 4.0.
Container Service Extension is a Cloud Director plugin which adds the capability of provisioning Tanzu clusters for Cloud Director tenants. The CSE plugin is responsible for preparing and deploying a self-managed TKG cluster based on trusted VMware TKG templates, without the need to provision a management TKG cluster for every tenant. With the release of Cloud Director Container Service Extension version 4.0, VMware removed the cumbersome workflow of deploying a CSE server and CSE client and configuring vcd-cli before TKG clusters could be provisioned in Cloud Director. If you are familiar with CSE versions prior to 4.0, you will appreciate this major enhancement: CSE 4.0 integrates the whole CSE workflow (server, client and API requests to Cloud Director) in a single vApp.
In this two-part blog post series I will go through the prerequisites for CSE 4.0 along with a basic reference architecture of my setup, and then walk step by step through infrastructure preparation and the deployment workflow of a tenant TKG workload cluster.
Bill of Materials
For software versions I used the following:
- VMware ESXi 7.0U3g
- vCenter server version 7.0U3g
- VMware NSX-T 3.2.2
- VMware Cloud Director 10.3.3
- VMware NSX Advanced Load Balancer 21.1.3
- TrueNAS 12.0-U7 used to provision NFS data stores to ESXi hosts.
- VyOS 1.4 used as lab backbone router and DHCP server.
- Ubuntu 20.04.2 LTS as DNS and internet gateway.
- Ubuntu 18.04 LTS as Jumpbox and running kubectl to manage Tanzu clusters.
- Windows 10 pro as Jumpbox with UI.
For virtual hosts and appliances sizing I used the following specs:
- 3 x ESXi hosts each with 12 vCPUs, 2 x NICs and 128 GB RAM.
- vCenter server appliance with 2 vCPU and 24 GB RAM.
- NSX-T Manager medium deployment.
- NSX-T Edges with large form factor VM.
- Cloud Director large primary cell.
Prerequisites and preparations
Before we begin, make sure that you have the following in place:
- CSE 4.0 OVA, can be downloaded from VMware Customer Connect.
- A supported TKG template (also from VMware Customer Connect). Currently CSE 4.0 supports only TKG 1.5.4 and 1.4.3, and only Ubuntu non-FIPS templates; Photon OS is not supported at the moment.
- Cloud Director installed (you can review my previous blog post HERE to learn more how to deploy and initialise Cloud Director).
- NSX-T networking must be used for Cloud Director for
- VMware Cloud Director provides load balancing services by leveraging the capabilities of VMware NSX Advanced Load Balancer through the tenant portal.
- TKG clusters require load balancing functionality, and for Cloud Director and CSE integration only NSX ALB (Avi) is supported. So you need to have NSX ALB deployed and your NSX-T added as a cloud provider in Avi (details will follow in this blog post series).
- The Tenant Org VDC to which TKG clusters will be deployed requires access to the Internet and the Cloud Director API address.
- Make sure that your Cloud Director instance has a vCenter Server and an NSX-T Manager added (step-by-step details on Cloud Director and NSX-T networking basics can be reviewed in this blog post). Basically, you need a fully functional VMware Cloud Director environment which is capable of onboarding tenants and serving the required functionality such as creating Org VDCs, Tier-1 Edge Gateways, routed networks, etc.
A very high-level view of the Tanzu as a service offering with Cloud Director and CSE 4.0 is summarised in the diagram below. The idea is to deploy a CSE 4.0 vApp which interacts with the Cloud Director APIs to provide the capability to provision Tanzu guest clusters (TKGm) per tenant. The Cloud Director provider supplies the underlying networking and load balancing, either dedicated per tenant or shared across tenants, while every tenant is responsible for its own TKGm cluster deployment. Those TKGm clusters are bootstrapped by means of a temporary (ephemeral) VM which is created by CSE and deleted automatically once the Tanzu cluster is up and running. There is no need to deploy a Tanzu management cluster per tenant.
Container Service Extension server will be deployed in its own provider managed Org and will need to have connectivity to Cloud Director only (no internet connectivity required). A detailed reference architecture which I used in my home lab and will be referenced in this blog post can be seen below:
A couple of remarks regarding the above architectural design:
- NSX ALB controller(s) are omitted from the above diagram since they are deployed in my home lab management network. There must of course be reachability between them and the Avi SE mgmt T1 via the Toasti T0.
- Avi SE mgmt T1 is a dedicated T1 gateway connected to a dedicated T0 gateway only for Avi service engines.
- Default Service Subnet (192.168.255.0/25) is a system-provisioned subnet to which the Avi service engine data interfaces will be connected. This subnet will also be connected to every tenant Edge gateway (in case of dedicated tenant edges), or shared among all tenants if they all share the same edge gateway.
- Tenant Edge gateway (for example Pindakaas-EDGE, HARIBO-EDGE and CSE-EDGE) is a T1 router that Cloud Director creates on the NSX-T instance backing the infrastructure for every tenant. This tenant edge (T1) acts as the gateway for the tenant's routed networks.
- External Networks (184.108.40.206/24 pool): an external network is an uplink from an edge gateway (T1) to the provider gateway (NSXBaaS-GW). You will need to assign a group of IPs from that pool to every tenant edge gateway, and add a NAT rule on every edge gateway that translates internal networks to an address in the external pool, so that VDC networks can reach external resources (the Cloud Director address and the internet). More about external networks can be found in the VMware documentation.
- The CSE Server Org is a provider-managed org in which the CSE vApp needs to be created. The CSE 4.0 server vApp requires access to the Cloud Director API (Cloud Director IP) and does not require internet access. The CSE server vApp is connected to an internal routed VDC network, 220.127.116.11/24, which is NATed through the CSE-EDGE gateway to an address from the EXT network pool (see the green EXT network pool).
- TKG-Nodes-NET (10.210.0.0/24 for Pindakaas and 10.200.0.0/24 for Haribo) is an internal routed network that needs to be created in every tenant; this is where the Tanzu nodes and the ephemeral VM will be connected. This network needs internet access and access to the Cloud Director IP address. Important note: currently this network supports only static IP pool allocation, not DHCP.
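With several tenant and service subnets in play, a quick sanity check of the addressing plan can save some troubleshooting later. The snippet below is only a minimal sketch using Python's standard ipaddress module: it takes the three subnets named in the remarks above and reports any pair that overlaps. The dictionary keys and the find_overlaps helper are names I made up for illustration, and whether overlapping tenant subnets are acceptable in your design is something you need to decide yourself.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def find_overlaps(subnets: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of subnet names whose address ranges overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(parsed.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Subnets taken from the reference architecture; the labels are hypothetical.
lab_subnets = {
    "avi-default-service": "192.168.255.0/25",
    "pindakaas-tkg-nodes": "10.210.0.0/24",
    "haribo-tkg-nodes": "10.200.0.0/24",
}

print(find_overlaps(lab_subnets))  # prints [] because none of the three overlap
```

An empty list means the plan is clean; any returned pair names the two subnets that collide.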
The deployment workflow is broken down into four major steps:
- Onboarding a test tenant (Pindakaas) with relevant networking configuration (Covered in Part I of this blog post series).
- CSE 4.0 server preparations and deployment. (Covered in Part I of this blog post series).
- VMware NSX Advanced Load Balancer integration with Cloud Director (Covered in Part II of this blog post series).
- Enabling Tenants to deploy Tanzu Kubernetes Cluster on Cloud Director (Covered in Part II of this blog post series).
Onboarding a test tenant
Step 1: Create a tenant Org and OrgVDC
Login to Cloud Director provider portal, navigate to Organizations and then click on NEW
Create a new Org and give it a name, mine is called Pindakaas
Once it is created, you should see your Org listed without any Org VDCs assigned to it yet
From the left pane click on Organizations VDCs and click on NEW, this will open the New Organization VDC creation window
Click NEXT and then choose the Org under which you want to create this VDC
Click NEXT and then choose a provider VDC (for more details, see my previous blog posts part one and part two). The External Networks selection is empty because I assign the external network to my tenant via an imported provider gateway T0 (NSXBaaS-GW); those steps follow later (to learn more about how to create External Networks in Cloud Director you can reference the VMware docs). For now just choose the provider VDC and click NEXT
Choose Flex as the allocation model and then click on NEXT
Set whatever resources limitations or guarantees you want (in my lab I set it all to 0 to avoid any resources limitation issues when I power on VMs) then click NEXT
In step 6, choose which storage policies you want to get assigned to VMs created under this VDC (such policies are pulled from vCenter)
Click NEXT and then select a network pool to serve network traffic to and from the Org VDC VMs. This is a GENEVE pool (backed by an overlay transport zone in NSX-T) which we added in a previous blog post.
Click NEXT, review the Org VDC configuration and if all is good hit FINISH
Your Org VDC should be listed
Step 2: Create tenant Edge Gateway and configure tenant networking
Now that we have created our tenant Org and Org VDC, we need to create an Edge Gateway and assign it to the Org VDC. An edge gateway, which is backed by an NSX-T Data Center tier-1 gateway, provides a routed organization VDC network with services such as load balancing, network address translation and firewall, so that VMs hosted under that tenant can have connectivity to external networks and the Internet. To create an Edge Gateway, from the Cloud Director provider portal UI click on Edge Gateways in the left pane and then NEW
Select to which OrgVDC you want to add an Edge Gateway, in my lab I will select PindaVDC01
Click NEXT and then give your Edge GW a name; as per my reference architecture diagram I will name it Pindakaas-EDGE. I will not enable a dedicated T0, since the Edge gateways of all my tenants share a single provider gateway. It is shown in the above reference architecture with the name NSXBaaS-GW and is simply a VRF created on a T0 gateway in the NSX-T instance backing this Cloud Director deployment.
Click NEXT and then choose which provider gateway to connect this Edge gateway to; in my setup my provider T0 is called NSXBaaS-GW
Click NEXT and then specify on which NSX-T Edge cluster this newly created edge gateway (T1 GW) will run; in my setup I will use the same edge cluster on which my T0 is running.
Click NEXT. The next step is of crucial importance in understanding the concept of External Networks I spoke about earlier: in this step we assign IPs from a T0 allocated range (18.104.22.168/24) to the newly created edge. The range 22.214.171.124/24 was already configured and defined when I created my provider gateway (NSXBaaS-GW). I have already allocated some IP addresses from this range to the Edge gateways of my existing tenants, so for this newly created Edge I will assign the following IPs:
126.96.36.199 will be the primary address for edge Pindakaas-EDGE, which you can consider as an IP address on a transit link between this edge gateway and the provider gateway (NSXBaaS-GW).
188.8.131.52 – 184.108.40.206 is a free range of the external network. It is needed for NSX ALB to assign VIPs on the service engines that will be created to load balance traffic across the Tanzu nodes that we will eventually create under this tenant (Pindakaas).
Click NEXT, review edge configuration and then hit FINISH to create that Edge gateway
You should see your Edge created successfully under Edge Gateways.
Now that we have created our tenant and tenant edge gateway, we need to finalise the tenant networking setup. Navigate to Organizations and click on the small box with an arrow next to our Pindakaas Org in order to log in to the tenant portal:
From tenant portal navigate to Networking and then click on NEW to create a tenant org VDC network, this is where we are going to connect our Tanzu cluster nodes to
In step 1 of Org VDC network creation, we need to choose under which VDC we want to create this network. Choose PindaVDC01
Then for the network type choose Routed and click NEXT
In step 3 you need to choose which Edge gateway this network will be connected to, we have only one edge gateway which we created in the previous step
Click NEXT and then assign a name for this Org VDC network (see the reference architecture diagram)
Assign a static IP pool from which the Tanzu nodes (VMs) will get their IP address configuration (see the reference architecture diagram). This subnet will need to be NATed (later in this blog post) through the edge gateway so that the Tanzu cluster nodes can reach the Cloud Director address and the Internet to download packages.
Click NEXT and then configure DNS parameters
Click NEXT and choose a segment profile if you have one configured (I do not)
Click NEXT, review the network deployment parameters and then click FINISH
You should see your network created successfully and listed as an Org VDC network
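Since TKG-Nodes-NET only supports static IP pools, it is worth double-checking the pool before typing it into the wizard. The following is a rough sketch with Python's ipaddress module; the gateway address and the pool boundaries used in the comments and tests are hypothetical values I picked for illustration, they are not taken from my lab screenshots. The function checks that the pool lies inside the subnet, avoids the network and broadcast addresses, and does not include the gateway.

```python
import ipaddress


def pool_is_valid(cidr: str, gateway: str, first: str, last: str) -> bool:
    """Check that a static IP pool [first, last] is usable inside the subnet."""
    net = ipaddress.ip_network(cidr)
    gw = ipaddress.ip_address(gateway)
    lo = ipaddress.ip_address(first)
    hi = ipaddress.ip_address(last)

    # Both ends must be inside the subnet and in the right order.
    inside = lo in net and hi in net and lo <= hi
    # The pool must not start at the network address or end at the broadcast address.
    usable = lo != net.network_address and hi != net.broadcast_address
    # The gateway address must stay out of the pool.
    gateway_free = not (lo <= gw <= hi)
    return inside and usable and gateway_free


# Hypothetical example: gateway 10.210.0.1 and a pool of 10.210.0.100 to 10.210.0.200
# inside the Pindakaas subnet 10.210.0.0/24.
print(pool_is_valid("10.210.0.0/24", "10.210.0.1", "10.210.0.100", "10.210.0.200"))
```

If the function returns False, one of the three conditions is violated and the pool should be adjusted before the network is created.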
Before we move to the next deployment workflow which is setting up Container Service Extension, we need to configure two things on our Edge gateway of tenant Pindakaas
- NAT rule for TKG-Nodes-NET (10.210.0.0/24) for external network connectivity.
- Modify the default DROP any edge firewall rule, which blocks all traffic to and from Org VDC networks.
From your tenant portal under Networking > Edge Gateways > Services > NAT add an SNAT rule to allow TKG nodes to access external networks and Internet.
The last step in this section is to modify the DROP any firewall rule. From the same window, under Services click on Firewall and modify the DROP any rule (for simplicity I modified it to ALLOW any).
CSE 4.0 server preparations and deployment
The second step in the four-step deployment workflow for Tanzu on Cloud Director is to deploy and configure Container Service Extension Server 4.0. CSE 4.0 is quite different from its predecessors in the sense that we no longer need to deploy server and client components, nor use vcd-cli to configure the CSE server side or to import TKG templates into Cloud Director. CSE 4.0 is deployed as a vApp with specific configuration parameters that let it connect to the Cloud Director APIs, and it then sets everything up automatically for you in the background (job well done VMware on CSE 4.0).
Step 1: Download CSE 4.0 Server OVA and Kubernetes Container plugin UI
Download CSE server 4.0 OVA from HERE and Kubernetes Container Clusters Plugin from HERE. From Cloud Director provider portal navigate to More > Customize Portal
Click on UPLOAD and then choose the Kubernetes container clusters plugin ZIP file you downloaded from VMware Customer Connect and then click NEXT and make sure to publish the plugin to all tenants
Review & finish upload plugin
You should now see the Kubernetes Container Clusters UI Plugin for CSE status as Enabled and Published, make sure to disable any previous containers plugins.
Step 2: Configure CSE 4.0 Server deployment parameters
If the Kubernetes plugin we uploaded in the previous step is enabled successfully in Cloud Director, you should see the CSE Management tab when you navigate to More > Kubernetes Container Clusters from the provider portal
The first step in the CSE server workflow is to download the OVAs we already talked about in step 1; the next task is to upload those OVAs to shared catalogs in a provider-managed Org. In my setup I have an Org called CSE under which I created two shared catalogs, one with the CSE 4.0 OVA and the other with TKG templates, see below screenshot:
Make sure that both Catalogs are shared with the rest of Organizations configured, and when you upload CSE server 4.0 OVA and TKG template OVA make sure to upload them to Catalog as vApp Templates
For more information on how to create shared catalogs you can follow VMware docs. Once you have created two shared catalogs for CSE OVA and TKG template, navigate back to cloud director provider portal then to CSE Management and click on START in CSE configuration workflow to start step 3
In the above step, CSE has registered with Cloud Director and created a new role, “CSE Admin Role”, which we need to assign to a system user (which we will create later); this user will be used by the CSE server to talk to the Cloud Director API. In addition, the above workflow created a “Kubernetes Cluster Author Role”, which needs to be assigned to the tenant users who will be responsible for creating Tanzu clusters, and finally VM sizing policies for the Tanzu nodes that will be created under tenants. If you click NEXT, you will have the option to set some system configuration parameters if needed; in my lab setup I just used the defaults
Before we start the CSE server vApp deployment we need to create a system user with the CSE Admin role and create an API token for it, as this user will be used by CSE to interface with the Cloud Director API. From the Cloud Director provider portal, navigate to Administration > Users and under Users click on NEW and create a new user:
Next we need to create an API token for our cse-svc user. Log out from the provider portal, log in again as cse-svc and go to User preferences
Scroll down in the page till you find API Tokens, click on NEW
Generate and copy API token for cse-svc account, we will need to add this token while we are setting up CSE server vApp.
The generated token cannot be displayed again, so you have to copy it and store it in an external file.
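If you want to verify that the token works before handing it to the CSE vApp, you can exchange it for a short-lived access token yourself. The sketch below is my own illustration and not something CSE requires. It assumes the provider-level token endpoint /oauth/provider/token with a refresh_token grant as described in the Cloud Director API token documentation, so please confirm the endpoint against the docs for your Cloud Director version. The host name is a placeholder, and in a lab with self-signed certificates the TLS verification done by urlopen will fail unless the certificate is trusted.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(vcd_host: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) the request that trades an API token for an access token."""
    # Assumption: provider (system org) users use the /oauth/provider/token endpoint.
    url = f"https://{vcd_host}/oauth/provider/token"
    body = urllib.parse.urlencode(
        {"grant_type": "refresh_token", "refresh_token": api_token}
    ).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def fetch_access_token(vcd_host: str, api_token: str) -> str:
    """Send the request and return the access token from the JSON response."""
    request = build_token_request(vcd_host, api_token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["access_token"]
```

Calling fetch_access_token("vcd.example.com", "<token copied from the UI>") from a machine that can reach Cloud Director should return a bearer token; an HTTP error at this point usually means the token was copied incorrectly or belongs to a different user.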
Now we are ready to start deploying our CSE server vApp.
Step 3: Deploy CSE 4.0 Server vApp
Open the tenant portal for our CSE Org and under Applications choose Add vApp from Catalog, then choose the Org VDC to which you want to deploy the vApp. The CSE VDC below is an Org VDC which I created previously.
Click NEXT and then from a list of available vApp templates choose the CSE server OVA which we uploaded earlier
Click NEXT and Accept the EULA and then click NEXT
Give your vApp a name and set runtime and storage leases to Never Expires
Next specify storage resources parameters for the CSE vApp
Click NEXT and then assign compute resources
Customize vApp hardware if needed (I accepted defaults)
Step 7 is connecting the vApp to a VDC network. This is a routed network similar to the one I created under the Pindakaas tenant; in the reference architecture diagram it has subnet 220.127.116.11/24. Remember that the CSE vApp requires reachability to Cloud Director, so make sure to add the required NAT and firewall rules to the Edge gateway serving the CSE Org.
Below we need to specify the CSE user we created earlier (cse-svc), which the CSE vApp will use to connect to the Cloud Director APIs. The org of this user account must be set to system, while the CSE server vApp Org is the name of the Org in which we deploy the CSE server vApp; in my lab this is called CSE as well.
In step 9, review the configuration and when all is good press FINISH
Once the CSE vApp is created, power it on. If you want to log in to the appliance, use the username root; the password is generated automatically by the deployment process and you can find it in the Guest OS Customization section of the CSE VM deployed under the CSE vApp.
To monitor CSE activity against Cloud Director, open a console to the CSE VM and check a file called cse.log; this is where CSE stores all logging info.
With this we have come to the end of part one of this blog post series, having finalised the first two major phases of the Kubernetes/Tanzu as a Service workflow on Cloud Director. In part two I will finalise the NSX ALB integration with Cloud Director and deploy a TKGm cluster under the Pindakaas tenant we created in part one. Stay tuned.
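When the log grows, scanning it by eye gets tedious. As a small sketch, the helper below filters log lines for a few keywords. I am not assuming any particular cse.log format here: the keywords "error" and "fail" are simply my guess at what is worth looking for, and the function name is made up for this example.

```python
from typing import Iterable, Iterator, Sequence


def problem_lines(
    lines: Iterable[str], keywords: Sequence[str] = ("error", "fail")
) -> Iterator[str]:
    """Yield the lines that contain any of the keywords, ignoring case."""
    for line in lines:
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            yield line.rstrip("\n")


# Made-up sample lines, only to show the filtering behaviour.
sample = [
    "INFO started\n",
    "ERROR cannot reach Cloud Director\n",
    "cluster create failed\n",
    "INFO ok\n",
]
for hit in problem_lines(sample):
    print(hit)
```

To use it on the appliance you would pass an open file object instead of the sample list, for example the result of open() on wherever cse.log lives on your CSE VM.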
Thank you for the very enlightening blog series. We have just rolled out a similar environment. We noticed that the supported TKG templates are several months old and contain outdated software. To keep our environment secure, we would like to update these templates. Is this possible and is it supported by VMware? Can we just run apt update && apt upgrade to update the software, or will things break?
TKG templates compatible with CSE are indeed lagging a bit behind, due to the time it takes VMware to certify and qualify packages and release compatibility. CSE 4.0.1 is out, so you might want to check the supported TKG templates there. Regarding the other part of your question, the answer is no: if you change the installed packages in the templates, you are running an untested, unverified and thus unsupported configuration.
Hope this helps!