# Deploy a Workers Cluster using the cnvrg Operator
cnvrg is designed to be extremely flexible and supports using multiple clusters with your instance of cnvrg.
This guide will explain how to use Helm to turn a Kubernetes cluster into a workers cluster for cnvrg.
# Requirements
Before you can complete the installation, you must install and prepare the following dependencies on your local machine (a quick way to verify them is shown after the note below):
- kubectl
- Helm 3.X
- Kubernetes cluster with ports 80, 443 and 6443 open
- The Kubeconfig of the above cluster
NOTE
Haven't created a cluster yet? Check our guides for setting up an AWS EKS, GCP GKE, Azure AKS or Minikube cluster.
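You can quickly confirm that the required tools are installed and that your kubeconfig points at the intended cluster. This is only a sanity check, not part of the installation itself:
```bash
# Confirm kubectl and Helm 3.x are installed
kubectl version --client
helm version
# Confirm the kubeconfig points at the intended cluster
kubectl cluster-info
```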
# Choose the Domain for the Kubernetes Cluster
As part of the installation, you will need to use a subdomain of your cnvrg installation's domain. You can choose any subdomain, as long as it is a subdomain of the installed instance of cnvrg, i.e. `subdomain.your_cnvrg_domain`.
For example, if your cnvrg control plane is hosted at `app.my-org.cnvrg.io`, you could use:
- `workers.my-org.cnvrg.io`
- `workers.app.my-org.cnvrg.io`
- or any other subdomain matching `*.my-org.cnvrg.io`
After you have completed the next steps, you will need to set up the DNS record for this subdomain in your cloud provider's settings.
# Install and Update the Helm repo
Run the following command to download the most updated cnvrg helm charts:
```bash
helm repo add cnvrgv3 https://charts.v3.cnvrg.io
helm repo update
helm search repo cnvrgv3/cnvrg -l
```
WARNING
Please note that the version of the Helm chart used for the worker cluster installation must match the version of the Helm chart used to deploy the primary cnvrg platform.
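For example, you can look up the chart version used by the primary deployment and pin the same version when installing the worker cluster. The release and namespace names below assume the defaults used in this guide:
```bash
# On the primary cnvrg cluster: note the CHART version of the existing release
helm list -n cnvrg
# When installing on the worker cluster, pin the same chart version by appending:
#   --version <matching-chart-version>
```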
# Run the Helm Chart
Now all you need to do is run the `helm install` command and all of the services and systems will be automatically installed on your cluster. The process can take up to 15 minutes.
```bash
helm install cnvrg cnvrgv3/cnvrg --create-namespace -n cnvrg --timeout 1500s \
  --set clusterDomain=<subdomain.your_cnvrg_domain (as explained above)> \
  --set controlPlane.webapp.enabled=false \
  --set controlPlane.sidekiq.enabled=false \
  --set controlPlane.searchkiq.enabled=false \
  --set controlPlane.systemkiq.enabled=false \
  --set controlPlane.hyper.enabled=false \
  --set logging.elastalert.enabled=false \
  --set dbs.minio.enabled=false
```
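If you prefer to manage these overrides in a file, the same settings can be expressed as a Helm values file (the file name `worker-values.yaml` is just an example) and passed with `-f`:
```yaml
# worker-values.yaml -- equivalent to the --set flags above
clusterDomain: <subdomain.your_cnvrg_domain>
controlPlane:
  webapp:
    enabled: false
  sidekiq:
    enabled: false
  searchkiq:
    enabled: false
  systemkiq:
    enabled: false
  hyper:
    enabled: false
logging:
  elastalert:
    enabled: false
dbs:
  minio:
    enabled: false
```
```bash
helm install cnvrg cnvrgv3/cnvrg --create-namespace -n cnvrg --timeout 1500s -f worker-values.yaml
```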
Once the command completes, you will need to set up the DNS routing.
# Completing the Setup
The `helm install` command can take up to 15 minutes. When the deployment completes, you can go to the URL of your newly deployed cnvrg or add the new cluster as a resource inside your organization. The `helm` command will inform you of the correct URL:
```
🚀 Thank you for installing cnvrg.io!
Your installation of cnvrg.io is now available, and can be reached via:
http://app.mydomain.com
Talk to our team via email at hi@cnvrg.io
```
# Monitoring the deployment
You can monitor and validate the deployment process by running the following command:
```bash
kubectl -n cnvrg get pods
```
When the status of all the containers is `Running` or `Completed`, cnvrg will have been successfully deployed.
It should look similar to the below output example:
```
NAME                                         READY   STATUS    RESTARTS   AGE
capsule-6cbcf5c55c-jfhl5                     1/1     Running   0          6m57s
cnvrg-fluentbit-lpctq                        1/1     Running   0          5m52s
cnvrg-fluentbit-vbzf6                        1/1     Running   0          5m48s
cnvrg-ingressgateway-885b47d5d-fhh9k         1/1     Running   0          6m37s
cnvrg-operator-669748dcb5-58865              1/1     Running   0          7m8s
cnvrg-prometheus-operator-7745b9f576-xc2f5   2/2     Running   0          6m58s
config-reloader-58dbff9878-947b7             1/1     Running   0          7m1s
elasticsearch-0                              1/1     Running   0          5m24s
grafana-5475f4fdb5-4fngj                     1/1     Running   0          6m59s
istio-operator-6d685ccbdb-ld7wr              1/1     Running   0          6m59s
istiod-68fdcfb685-4ph7p                      1/1     Running   0          6m43s
kibana-84584dcbdb-dqt9w                      1/1     Running   0          6m58s
kube-state-metrics-595675c49d-gn9x7          3/3     Running   0          6m58s
mpi-operator-7ddd6974fc-mkxp8                1/1     Running   0          6m58s
node-exporter-mx24s                          2/2     Running   0          6m58s
node-exporter-shgvz                          2/2     Running   0          6m58s
postgres-bc9b68649-92tg5                     1/1     Running   0          6m59s
prometheus-cnvrg-infra-prometheus-0          3/3     Running   1          6m25s
redis-56d9cb6d76-zgpdx                       1/1     Running   0          7m1s
```
NOTE
The exact list of pods may be different, as it depends on the flags that you used with the `helm install` command. As long as the statuses are `Running` or `Completed`, the deployment will have been successful.
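To follow the rollout continuously instead of re-running the command, you can watch the pod list (press Ctrl+C to stop):
```bash
kubectl -n cnvrg get pods --watch
```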
# Set up the DNS Routing
After the Helm installation completes, you will need to set up the routing between the IP of the cluster and the subdomain you have chosen, within your DNS. Create a `CNAME`/`A` record for `*.<subdomain.your_cnvrg_domain>` pointing at the external IP address (or hostname) of the cluster's ingress gateway. Make sure you include the wildcard `*`. The domain is the same one you entered as `clusterDomain` in the `helm` command.
To get the external IP address (or hostname) of the cluster's ingress service, run the following command after cnvrg has been deployed:
```bash
kubectl -n cnvrg get svc | grep ingress | awk '{print $4}'
```
It may take a few minutes before the DNS resolves correctly.
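Once the record has been created, you can check that the wildcard resolves to the ingress address. The hostname below is only an illustration based on the earlier example subdomain:
```bash
# Any host under the wildcard should resolve to the ingress IP/hostname
nslookup test.workers.my-org.cnvrg.io
```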
# Creating a Kubeconfig
cnvrg requires a ServiceAccount with a kubeconfig in the worker cluster in order to grant cnvrg control over the worker cluster.
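The script below assumes a ServiceAccount named `cnvrg-job` already exists in the `cnvrg` namespace. If it does not exist in your cluster, create it first:
```bash
kubectl -n cnvrg create serviceaccount cnvrg-job
```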
```bash
# Variables for the ServiceAccount, namespace and output kubeconfig
SERVICE_ACCOUNT_NAME=cnvrg-job
CONTEXT=$(kubectl config current-context)
NAMESPACE=cnvrg
NEW_CONTEXT=cnvrg-job
KUBECONFIG_FILE="kubeconfig-cnvrg-job"

# Read the ServiceAccount token from its secret
SECRET_NAME=$(kubectl get serviceaccount ${SERVICE_ACCOUNT_NAME} \
  --context ${CONTEXT} \
  --namespace ${NAMESPACE} \
  -o jsonpath='{.secrets[0].name}')
TOKEN_DATA=$(kubectl get secret ${SECRET_NAME} \
  --context ${CONTEXT} \
  --namespace ${NAMESPACE} \
  -o jsonpath='{.data.token}')
TOKEN=$(echo ${TOKEN_DATA} | base64 -d)

# Create dedicated kubeconfig
# Create a full copy
kubectl config view --raw > ${KUBECONFIG_FILE}.full.tmp
# Switch working context to correct context
kubectl --kubeconfig ${KUBECONFIG_FILE}.full.tmp config use-context ${CONTEXT}
# Minify
kubectl --kubeconfig ${KUBECONFIG_FILE}.full.tmp \
  config view --flatten --minify > ${KUBECONFIG_FILE}.tmp
# Rename context
kubectl config --kubeconfig ${KUBECONFIG_FILE}.tmp \
  rename-context ${CONTEXT} ${NEW_CONTEXT}
# Create token user
kubectl config --kubeconfig ${KUBECONFIG_FILE}.tmp \
  set-credentials ${CONTEXT}-${NAMESPACE}-token-user \
  --token ${TOKEN}
# Set context to use token user
kubectl config --kubeconfig ${KUBECONFIG_FILE}.tmp \
  set-context ${NEW_CONTEXT} --user ${CONTEXT}-${NAMESPACE}-token-user
# Set context to correct namespace
kubectl config --kubeconfig ${KUBECONFIG_FILE}.tmp \
  set-context ${NEW_CONTEXT} --namespace ${NAMESPACE}
# Flatten/minify kubeconfig
kubectl config --kubeconfig ${KUBECONFIG_FILE}.tmp \
  view --flatten --minify > ${KUBECONFIG_FILE}
# Remove tmp files
rm ${KUBECONFIG_FILE}.full.tmp
rm ${KUBECONFIG_FILE}.tmp
```
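Note: on Kubernetes 1.24 and later, token Secrets are no longer created automatically for ServiceAccounts, so `.secrets[0].name` may come back empty. In that case you can create a long-lived token Secret yourself (a minimal sketch, with `cnvrg-job-token` as an example name) and set `SECRET_NAME` to it before running the rest of the script:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cnvrg-job-token
  namespace: cnvrg
  annotations:
    kubernetes.io/service-account.name: cnvrg-job
type: kubernetes.io/service-account-token
```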
Now that we have a ServiceAccount with a kubeconfig, we need to grant it the permissions required to communicate with the Kubernetes API and create workspaces and experiments in the worker cluster. Create and apply the following ClusterRole and ClusterRoleBinding:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cnvrg-job
rules:
- apiGroups:
  - ""
  - "networking.istio.io"
  - "apps"
  resources: ["*"]
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cnvrg-job-binding
roleRef:
  kind: ClusterRole
  name: cnvrg-job
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: cnvrg-job
  namespace: cnvrg
  apiGroup: ""
```
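Save the manifest (for example as `cnvrg-job-rbac.yaml`, the file name is arbitrary) and apply it:
```bash
kubectl apply -f cnvrg-job-rbac.yaml
```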
# Add the Cluster as a Compute Resource
After you have completed the DNS routing and the DNS resolves correctly, you can go ahead and add the cluster as a Compute Resource in your control plane.
You will need the domain of the workers cluster and the full Kubeconfig.
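The kubeconfig generated earlier can be printed so you can copy its contents into the control plane:
```bash
cat kubeconfig-cnvrg-job
```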
# Conclusion
After following the above steps, the workers cluster will be configured, added to your organization, and ready for jobs. Feel free to configure your Compute Templates as required and start running ML workloads.