# Deploy a Workers Cluster using the cnvrg Operator

cnvrg is designed to be extremely flexible and supports connecting multiple clusters to a single cnvrg instance.

This guide will explain how to use Helm to turn a Kubernetes cluster into a workers cluster for cnvrg.

# Requirements

Before you start the installation, install the following tools on your local machine and have the following resources ready:

  • kubectl
  • Helm 3.X
  • A Kubernetes cluster with ports 80, 443 and 6443 open
  • The kubeconfig for that cluster


Haven't created a cluster yet? Check our guides for setting up an AWS EKS, GCP GKE, Azure AKS or Minikube cluster.

# Choose the Domain for the Kubernetes Cluster

As part of the installation, you need to choose a domain for the workers cluster. It can be any subdomain of your cnvrg installation's domain, that is, subdomain.your_cnvrg_domain.

For example, if your cnvrg control plane is hosted at app.my-org.cnvrg.io, you could use:

  • workers.my-org.cnvrg.io
  • workers.app.my-org.cnvrg.io
  • or any subdomain *.my-org.cnvrg.io
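A minimal sketch for checking a candidate domain before you install. It assumes lowercase domain names, and the helper name `is_cnvrg_subdomain` is hypothetical. It succeeds only when the cluster domain has at least one label in front of the cnvrg domain:

```shell
# Hypothetical helper: succeed if $1 is a subdomain of $2 (both lowercase).
# Usage: is_cnvrg_subdomain <cluster_domain> <cnvrg_domain>
is_cnvrg_subdomain() {
  cluster_domain=$1
  cnvrg_domain=$2
  case "$cluster_domain" in
    # At least one character, then a dot, then the literal cnvrg domain
    ?*."$cnvrg_domain") return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_cnvrg_subdomain workers.my-org.cnvrg.io my-org.cnvrg.io` succeeds. `is_cnvrg_subdomain my-org.cnvrg.io my-org.cnvrg.io` fails because the domain itself is not a subdomain.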

After you have completed the next steps, you will need to create a DNS record for this subdomain with your DNS or cloud provider.

# Install and Update the Helm repo

Run the following commands to add the cnvrg Helm repository, update it, and list the available chart versions:

```shell
helm repo add cnvrgv3 https://charts.v3.cnvrg.io
helm repo update
helm search repo cnvrgv3/cnvrg -l
```


Note: the Helm chart version used for the workers cluster installation must match the chart version used to deploy the primary cnvrg platform (the control plane).
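A minimal sketch for checking the version requirement. It assumes the chart is named `cnvrg` (as in `cnvrgv3/cnvrg`) and that the control plane was installed into the `cnvrg` namespace. In that case, `helm list -n cnvrg` on the control-plane cluster shows the deployed chart in its CHART column, for example `cnvrg-3.6.0`. That version is a hypothetical example, and so are the helper names below:

```shell
# Hypothetical helper: turn a CHART value such as "cnvrg-3.6.0" into "3.6.0"
chart_version() {
  printf '%s\n' "${1#cnvrg-}"
}

# Hypothetical helper: succeed only if the version in $1 is listed in the repo.
# `helm search repo ... -l` prints a header row, then one row per chart version
# with the version in the second column.
version_available() {
  helm search repo cnvrgv3/cnvrg -l | awk 'NR > 1 { print $2 }' | grep -qxF "$1"
}
```

For example, `version_available "$(chart_version cnvrg-3.6.0)"` succeeds only if that version can be installed from the repository. You could then pin it by adding `--version 3.6.0` to the `helm install` command.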

# Run the Helm Chart

Run the helm install command below. It installs all of the required services on your cluster automatically. The process can take up to 15 minutes.

```shell
# Replace <subdomain.your_cnvrg_domain> with the domain you chose above
helm install cnvrg cnvrgv3/cnvrg --create-namespace -n cnvrg --timeout 1500s \
  --set clusterDomain=<subdomain.your_cnvrg_domain> \
  --set controlPlane.webapp.enabled=false \
  --set controlPlane.sidekiq.enabled=false \
  --set controlPlane.searchkiq.enabled=false \
  --set controlPlane.systemkiq.enabled=false \
  --set controlPlane.hyper.enabled=false \
  --set logging.elastalert.enabled=false \
  --set dbs.minio.enabled=false
```
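You may prefer to keep these settings in a file, for example to reuse them or keep them in version control. Helm accepts the same values through its `-f` flag. The sketch below writes a values file equivalent to the `--set` flags above. The function name `write_worker_values` and the file name `worker-values.yaml` are hypothetical:

```shell
# Hypothetical helper: write a values file matching the --set flags above.
# $1: cluster domain (subdomain.your_cnvrg_domain), $2: output file path
write_worker_values() {
  cat > "$2" <<EOF
clusterDomain: $1
controlPlane:
  webapp:
    enabled: false
  sidekiq:
    enabled: false
  searchkiq:
    enabled: false
  systemkiq:
    enabled: false
  hyper:
    enabled: false
logging:
  elastalert:
    enabled: false
dbs:
  minio:
    enabled: false
EOF
}
```

For example, run `write_worker_values workers.my-org.cnvrg.io worker-values.yaml`, then `helm install cnvrg cnvrgv3/cnvrg --create-namespace -n cnvrg --timeout 1500s -f worker-values.yaml`.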

Once the command completes, you will need to set up the DNS routing.

# Completing the Setup

When the deployment completes, you can add the new cluster as a compute resource inside your organization. The helm command prints a confirmation message, including the URL of the deployment, similar to the following:

```text
🚀 Thank you for installing cnvrg.io!

Your installation of cnvrg.io is now available, and can be reached via:

Talk to our team via email at hi@cnvrg.io
```

# Monitoring the deployment

You can monitor and validate the deployment process by running the following command:

```shell
kubectl -n cnvrg get pods
```

When the status of every pod is Running or Completed, cnvrg has been deployed successfully. The output should look similar to the following example:

```text
NAME                                         READY   STATUS     RESTARTS   AGE
capsule-6cbcf5c55c-jfhl5                     1/1     Running    0          6m57s
cnvrg-fluentbit-lpctq                        1/1     Running    0          5m52s
cnvrg-fluentbit-vbzf6                        1/1     Running    0          5m48s
cnvrg-ingressgateway-885b47d5d-fhh9k         1/1     Running    0          6m37s
cnvrg-operator-669748dcb5-58865              1/1     Running    0          7m8s
cnvrg-prometheus-operator-7745b9f576-xc2f5   2/2     Running    0          6m58s
config-reloader-58dbff9878-947b7             1/1     Running    0          7m1s
elasticsearch-0                              1/1     Running    0          5m24s
grafana-5475f4fdb5-4fngj                     1/1     Running    0          6m59s
istio-operator-6d685ccbdb-ld7wr              1/1     Running    0          6m59s
istiod-68fdcfb685-4ph7p                      1/1     Running    0          6m43s
kibana-84584dcbdb-dqt9w                      1/1     Running    0          6m58s
kube-state-metrics-595675c49d-gn9x7          3/3     Running    0          6m58s
mpi-operator-7ddd6974fc-mkxp8                1/1     Running    0          6m58s
node-exporter-mx24s                          2/2     Running    0          6m58s
node-exporter-shgvz                          2/2     Running    0          6m58s
postgres-bc9b68649-92tg5                     1/1     Running    0          6m59s
prometheus-cnvrg-infra-prometheus-0          3/3     Running    1          6m25s
redis-56d9cb6d76-zgpdx                       1/1     Running    0          7m1s
```


The exact list of pods may differ, because it depends on the flags you passed to the helm install command. As long as every status is Running or Completed, the deployment has succeeded.
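Rather than scanning the table by eye, you can filter the command's output. This minimal sketch assumes the default column layout shown above (NAME, READY, STATUS, ...), and the function name is hypothetical. It also flags Running pods whose containers are not all ready, which is slightly stricter than checking STATUS alone:

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print pods
# that are not done yet. Empty output means every pod is finished.
not_ready_pods() {
  awk 'NR > 1 {
    if ($3 == "Completed") next        # finished jobs report READY 0/1, skip them
    split($2, ready, "/")               # READY column, e.g. "2/2"
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}
```

For example, `kubectl -n cnvrg get pods | not_ready_pods` prints nothing once the deployment is complete. Otherwise it prints the name, READY and STATUS of each pod that is still pending or failing.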

# Set up the DNS Routing

After the helm installation completes, configure your DNS to route the subdomain you have chosen to the cluster. Create an A record (for an IP address) or a CNAME record (for a hostname) for *.<subdomain.your_cnvrg_domain> that points to the external address of the cluster's ingress load balancer. Make sure you include the wildcard (*). The domain is the same one you set as clusterDomain in the helm command.

To get the external address of the cluster's ingress, run the following command after cnvrg has been deployed:

```shell
kubectl -n cnvrg get svc | grep ingress | awk '{print $4}'
```

It may take a few minutes before the DNS resolves correctly.
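The command prints the EXTERNAL-IP column of the ingress service. Depending on the cloud provider, this is either an IP address or a hostname; AWS load balancers, for example, return a hostname. That value decides between an A record and a CNAME record. The sketch below prints the matching wildcard record in zone-file syntax. The function name and the 300-second TTL are illustrative assumptions. Most DNS providers let you enter the same values in their console instead:

```shell
# Hypothetical helper: print a wildcard DNS record for the workers cluster.
# $1: cluster domain, $2: external address printed by the command above
dns_record() {
  domain=$1
  target=$2
  case "$target" in
    ''|'<pending>'|'<none>')
      echo "load balancer address not assigned yet" >&2
      return 1 ;;
    *[!0-9.]*) rtype=CNAME; target="$target." ;;  # hostname: CNAME to the FQDN
    *) rtype=A ;;                                  # digits and dots only: IPv4
  esac
  # TTL of 300 seconds is an arbitrary example value
  printf '*.%s. 300 IN %s %s\n' "$domain" "$rtype" "$target"
}
```

For example, `dns_record workers.my-org.cnvrg.io 203.0.113.10` prints `*.workers.my-org.cnvrg.io. 300 IN A 203.0.113.10`.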

# Creating a Kubeconfig

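cnvrg requires a ServiceAccount in the worker cluster, together with a kubeconfig that authenticates as that ServiceAccount, so that cnvrg can control the worker cluster. Run the following script against the worker cluster to build the kubeconfig: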

```shell
# Run against the worker cluster (the current kubectl context).
# SERVICE_ACCOUNT_NAME and NAMESPACE match the ClusterRoleBinding below.
# NEW_CONTEXT and KUBECONFIG_FILE are example values; change them as needed.
# If the ServiceAccount does not exist yet, create it first:
#   kubectl -n cnvrg create serviceaccount cnvrg-job
SERVICE_ACCOUNT_NAME=cnvrg-job
NAMESPACE=cnvrg
NEW_CONTEXT=cnvrg-worker
KUBECONFIG_FILE=cnvrg-worker-kubeconfig

CONTEXT=$(kubectl config current-context)

# Look up the ServiceAccount's token Secret and decode the token
SECRET_NAME=$(kubectl get serviceaccount "${SERVICE_ACCOUNT_NAME}" \
  --context "${CONTEXT}" \
  --namespace "${NAMESPACE}" \
  -o jsonpath='{.secrets[0].name}')
TOKEN_DATA=$(kubectl get secret "${SECRET_NAME}" \
  --context "${CONTEXT}" \
  --namespace "${NAMESPACE}" \
  -o jsonpath='{.data.token}')

TOKEN=$(echo "${TOKEN_DATA}" | base64 -d)

# Create dedicated kubeconfig
# Create a full copy
kubectl config view --raw > "${KUBECONFIG_FILE}.full.tmp"
# Switch working context to correct context
kubectl --kubeconfig "${KUBECONFIG_FILE}.full.tmp" config use-context "${CONTEXT}"
# Minify
kubectl --kubeconfig "${KUBECONFIG_FILE}.full.tmp" \
  config view --flatten --minify > "${KUBECONFIG_FILE}.tmp"
# Rename context
kubectl config --kubeconfig "${KUBECONFIG_FILE}.tmp" \
  rename-context "${CONTEXT}" "${NEW_CONTEXT}"
# Create token user
kubectl config --kubeconfig "${KUBECONFIG_FILE}.tmp" \
  set-credentials "${CONTEXT}-${NAMESPACE}-token-user" \
  --token "${TOKEN}"
# Set context to use token user
kubectl config --kubeconfig "${KUBECONFIG_FILE}.tmp" \
  set-context "${NEW_CONTEXT}" --user "${CONTEXT}-${NAMESPACE}-token-user"
# Set context to correct namespace
kubectl config --kubeconfig "${KUBECONFIG_FILE}.tmp" \
  set-context "${NEW_CONTEXT}" --namespace "${NAMESPACE}"
# Flatten/minify kubeconfig
kubectl config --kubeconfig "${KUBECONFIG_FILE}.tmp" \
  view --flatten --minify > "${KUBECONFIG_FILE}"
# Remove both temporary files
rm "${KUBECONFIG_FILE}.full.tmp" "${KUBECONFIG_FILE}.tmp"
```
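On Kubernetes 1.24 and later, a ServiceAccount no longer gets a token Secret automatically. As a result, the `.secrets[0].name` lookup in the script above can return an empty value. In that case, you can create the Secret manually: give it type `kubernetes.io/service-account-token` and annotate it with the ServiceAccount name, and Kubernetes fills in the token. The sketch below works under that assumption. The function name and the `<serviceaccount>-token` Secret name are hypothetical:

```shell
# Hypothetical helper: create a long-lived token Secret for a ServiceAccount.
# $1: ServiceAccount name, $2: namespace. The Secret is named "<name>-token".
create_sa_token_secret() {
  sa_name=$1
  sa_namespace=$2
  # The token controller populates .data.token once this Secret exists
  kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${sa_name}-token
  namespace: ${sa_namespace}
  annotations:
    kubernetes.io/service-account.name: ${sa_name}
type: kubernetes.io/service-account-token
EOF
}
```

To use it:

1. Run `create_sa_token_secret cnvrg-job cnvrg`.
2. Wait a few seconds for the token to be populated.
3. Set `SECRET_NAME=cnvrg-job-token` in place of the `kubectl get serviceaccount` lookup.
4. Run the rest of the script.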

With the ServiceAccount and kubeconfig in place, grant the ServiceAccount the permissions it needs to communicate with the Kubernetes API and to create workspaces and experiments in the worker cluster. Create and apply the following ClusterRole and ClusterRoleBinding:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cnvrg-job
rules:
- apiGroups:
  - ""
  - "networking.istio.io"
  - "apps"
  resources: ["*"]
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cnvrg-job-binding
roleRef:
  kind: ClusterRole
  name: cnvrg-job
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: cnvrg-job
  namespace: cnvrg
  apiGroup: ""
```
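After saving the two manifests above to a file and applying it with `kubectl apply -f <file>`, you can confirm the permissions took effect. Impersonate the ServiceAccount with `kubectl auth can-i`, which requires impersonation rights such as those of a cluster admin. This minimal sketch checks each verb from the ClusterRole against pods only, not the `apps` or `networking.istio.io` resources. The function name is hypothetical:

```shell
# Hypothetical helper: verify that the cnvrg-job ServiceAccount in the cnvrg
# namespace may perform every ClusterRole verb on pods.
# `kubectl auth can-i` exits 0 when the action is allowed.
check_cnvrg_job_rbac() {
  failed=0
  for verb in get list watch create update patch delete; do
    if ! kubectl auth can-i "$verb" pods \
        --as=system:serviceaccount:cnvrg:cnvrg-job >/dev/null; then
      echo "missing permission: $verb pods" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

`check_cnvrg_job_rbac` returns success when every verb is allowed and prints any missing permissions to stderr.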

# Add the Cluster as a Compute Resource

After you have completed the DNS routing and the DNS resolves correctly, you can go ahead and add the cluster as a Compute Resource in your control plane.

You will need the domain of the workers cluster and the full contents of the kubeconfig you created.
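Before adding the cluster, a quick check can confirm both prerequisites: a name under the wildcard record resolves, and the generated kubeconfig can reach the API server. This sketch assumes a Linux machine with `getent` available. The function name and the `dns-check` label are hypothetical; any label works because the record is a wildcard:

```shell
# Hypothetical helper: pre-flight checks for the workers cluster.
# $1: cluster domain, $2: path to the generated kubeconfig
preflight_worker_cluster() {
  domain=$1
  kubeconfig=$2
  # Any name under the wildcard record should resolve
  if ! getent hosts "dns-check.$domain" >/dev/null; then
    echo "DNS for *.$domain does not resolve yet" >&2
    return 1
  fi
  # The kubeconfig's context namespace is used, so this lists pods there
  if ! kubectl --kubeconfig "$kubeconfig" get pods >/dev/null; then
    echo "kubeconfig $kubeconfig cannot list pods" >&2
    return 1
  fi
  return 0
}
```

For example, `preflight_worker_cluster workers.my-org.cnvrg.io "$KUBECONFIG_FILE"` succeeds only when both checks pass.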

# Conclusion

After following the above steps, the workers cluster is configured, added to your organization, and ready for jobs. You can now configure your Compute Templates as required and start running ML workloads.

Last Updated: 11/10/2021, 8:12:07 AM