# Deploy cnvrg CORE on AWS with native support for Habana Gaudi

In this tutorial, you will learn how to install cnvrg.io on your private Amazon EKS cluster with native support for Habana Gaudi devices. We will walk through setting up the cluster using DL1 EC2 instances, and then installing cnvrg.io using the cnvrg.io Operator and Helm.

# Requirements

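Before you can complete the installation, you must install and prepare the following tools on your local machine, all of which are used in the steps below:

  • AWS CLI (aws), configured with credentials for your AWS account
  • eksctl
  • kubectl
  • Helm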
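
If you want to confirm that these tools are installed before you start, a small shell check such as the following can help. It is a minimal sketch: the function name require_tools is hypothetical, and it only verifies that each command is on your PATH, not its version or configuration.

```shell
# Hypothetical helper: report every command in the argument list that is not
# on PATH, and exit non-zero if at least one is missing.
# The ( ) body runs in a subshell, so the loop variables do not leak.
require_tools() (
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  exit "$missing"
)

# Usage, with the four CLIs this tutorial calls:
# require_tools aws eksctl kubectl helm
```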

# Prepare an EKS cluster

We will use the eksctl CLI to deploy an EKS cluster with two nodegroups: the first for the cnvrg control plane, and the second for Habana Gaudi workloads. Before deploying Gaudi workloads, we need to choose the correct AWS AMI for the region and Kubernetes version, and to find an availability zone in which the dl1.24xlarge instance type is offered.

us-west-2:

| Kubernetes version | AMI ID |
| --- | --- |
| 1.18 | ami-012dcf16d655c757f |
| 1.19 | ami-0f093da0acaaf4a92 |
| 1.20 | ami-095898da639cbbbbe |
| 1.21 | ami-0a6edcea4417c9e9c |

us-east-1:

| Kubernetes version | AMI ID |
| --- | --- |
| 1.18 | ami-077ca8fa36be9b6d3 |
| 1.19 | ami-0786b638b54209f05 |
| 1.20 | ami-0feac853dfbaaaefc |
| 1.21 | ami-0e30d8dea91e2d389 |
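
If you script the cluster creation, the AMI tables above can be encoded as a lookup. This is a sketch that assumes the West-2 and East-1 tables refer to us-west-2 and us-east-1, and that the version column lists Kubernetes 1.18 to 1.21. The function name dl1_ami_for is hypothetical, and the IDs are copied from the tables, so they go stale whenever new AMIs are published.

```shell
# Hypothetical helper: print the DL1 worker AMI ID listed in the tables for a
# given region and Kubernetes minor version. Exits non-zero for combinations
# the tables do not cover.
dl1_ami_for() {
  case "$1/$2" in
    us-west-2/1.18) echo ami-012dcf16d655c757f ;;
    us-west-2/1.19) echo ami-0f093da0acaaf4a92 ;;
    us-west-2/1.20) echo ami-095898da639cbbbbe ;;
    us-west-2/1.21) echo ami-0a6edcea4417c9e9c ;;
    us-east-1/1.18) echo ami-077ca8fa36be9b6d3 ;;
    us-east-1/1.19) echo ami-0786b638b54209f05 ;;
    us-east-1/1.20) echo ami-0feac853dfbaaaefc ;;
    us-east-1/1.21) echo ami-0e30d8dea91e2d389 ;;
    *) echo "no DL1 AMI listed for region '$1' and Kubernetes '$2'" >&2; return 1 ;;
  esac
}

# Usage:
# dl1_ami_for us-east-1 1.21
```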

Next, find the availability zones in which the dl1.24xlarge instance type is offered:

```shell
aws ec2 describe-instance-type-offerings \
    --filters Name=instance-type,Values=dl1.24xlarge \
    --location-type availability-zone --region us-east-1
```
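
To turn that output directly into the availabilityZones list used in the cluster configuration, you could narrow the response with the AWS CLI's --query option and reformat it. A sketch, with hypothetical function names dl1_zones and to_yaml_list:

```shell
# dl1_zones REGION: print the zones offering dl1.24xlarge, whitespace-separated.
# Requires the AWS CLI; --query keeps only the Location field of each offering.
dl1_zones() {
  aws ec2 describe-instance-type-offerings \
    --filters Name=instance-type,Values=dl1.24xlarge \
    --location-type availability-zone --region "$1" \
    --query 'InstanceTypeOfferings[].Location' --output text
}

# to_yaml_list: read whitespace-separated words on stdin and print them,
# sorted, as an inline YAML list such as ['us-east-1a','us-east-1d'].
to_yaml_list() {
  tr -s ' \t\n' '\n' | sed '/^$/d' | sort | sed "s/.*/'&'/" | paste -sd, - | sed 's/.*/[&]/'
}

# Usage:
# dl1_zones us-east-1 | to_yaml_list
```

Sorting keeps the result stable between runs; the helper does not check that at least one zone was returned.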

Create a cluster configuration file named eks-cluster.yaml with the following content:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: habana-eks-cluster
  region: us-east-1
  version: "1.21"
# eksctl creates subnets only in these zones, so list every zone a nodegroup uses.
availabilityZones: ['us-east-1a','us-east-1b','us-east-1d']
nodeGroups:
  - name: cnvrg-control-plan-01
    instanceType: m5a.2xlarge # minimum
    volumeSize: 100
    minSize: 2
    maxSize: 3
    desiredCapacity: 2
    privateNetworking: true
    iam:
      attachPolicyARNs:
      - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
      - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
      - arn:aws:iam::aws:policy/AmazonS3FullAccess # or a policy scoped to your cnvrg bucket
      withAddonPolicies:
        autoScaler: true
        imageBuilder: true
    tags:
      k8s.io/cluster-autoscaler/enabled: 'true'
    availabilityZones: ['us-east-1a']
  - name: dl1-ng-1d2
    instanceType: dl1.24xlarge
    instancePrefix: dl1-ng-1d-worker
    volumeSize: 200
    minSize: 0
    maxSize: 4
    desiredCapacity: 1
    privateNetworking: true
    ami: ami-0c385d0d99fce057d # replace with the DL1 AMI for your region and Kubernetes version
    iam:
      withAddonPolicies:
        imageBuilder: true
        autoScaler: true
        ebs: true
        fsx: true
      attachPolicyARNs:
      - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
      - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
      - arn:aws:iam::aws:policy/AmazonS3FullAccess # or a policy scoped to your cnvrg bucket
    tags:
      k8s.io/cluster-autoscaler/enabled: 'true'
      k8s.io/cluster-autoscaler/node-template/resources/habana.ai/gaudi: "8"
      k8s.io/cluster-autoscaler/node-template/resources/hugepages-2Mi: "30000Mi"
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh habana-eks-cluster
    availabilityZones: ['us-east-1d']
```

Create the cluster with eksctl, using the cluster configuration file:

```shell
eksctl create cluster -f eks-cluster.yaml
```
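
After eksctl finishes, you can check that the nodes joined the cluster. With the desired capacities in eks-cluster.yaml you would expect three Ready nodes: two for the control plane and one dl1.24xlarge. A minimal sketch that counts Ready nodes from kubectl output (the function name count_ready_nodes is hypothetical):

```shell
# Count nodes whose STATUS column is exactly "Ready" in the output of
# `kubectl get nodes --no-headers` (columns: NAME STATUS ROLES AGE VERSION).
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage, against the cluster created above:
# kubectl get nodes --no-headers | count_ready_nodes
# eksctl get nodegroup --cluster habana-eks-cluster
```

Because the match is on the exact status Ready, cordoned nodes reported as Ready,SchedulingDisabled are deliberately not counted.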

Once the EKS cluster is ready, use Helm to deploy the Kubernetes Cluster Autoscaler, which adds or removes nodes as workloads require. First, add the autoscaler Helm repository:

```shell
helm repo add autoscaler https://kubernetes.github.io/autoscaler
```

Install the chart with helm install, making sure to set:

  • autoDiscovery.clusterName: the cluster name defined in ClusterConfig
  • awsRegion: the region of the cluster
  • image.tag: a Cluster Autoscaler release that matches your Kubernetes version (for example, v1.21.0 for Kubernetes 1.21)

```shell
helm install cluster-autoscaler autoscaler/cluster-autoscaler -n kube-system \
  --set autoDiscovery.clusterName=habana-eks-cluster \
  --set awsRegion=us-east-1 \
  --set image.tag=v1.21.0 \
  --set replicaCount=1 \
  --set extraArgs.skip-nodes-with-system-pods=false,extraArgs.skip-nodes-with-local-storage=false,extraArgs.cloud-provider=aws
```
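
Rather than typing the image tag by hand, you could derive it from the cluster's Kubernetes version. This sketch assumes that a .0 patch release of the Cluster Autoscaler exists for your minor version, as v1.21.0 does for Kubernetes 1.21; if it does not, pick a published patch release instead. The function name autoscaler_tag_for is hypothetical.

```shell
# Map a Kubernetes "major.minor[.patch]" string to an autoscaler tag vMAJOR.MINOR.0.
autoscaler_tag_for() {
  case "$1" in
    [0-9]*.[0-9]*) echo "v$(echo "$1" | cut -d. -f1,2).0" ;;
    *) echo "unexpected Kubernetes version: '$1'" >&2; return 1 ;;
  esac
}

# Usage: read the cluster version from EKS, then use the result as image.tag.
# K8S_VERSION=$(aws eks describe-cluster --name habana-eks-cluster \
#   --region us-east-1 --query cluster.version --output text)
# autoscaler_tag_for "$K8S_VERSION"
```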

Verify that the autoscaler pod is running:

```shell
kubectl get pods -n kube-system | grep autoscaler
```

# Install cnvrg

# Install and Update the Helm repo

Run the following commands to add the cnvrg Helm repository and download the latest charts:

```shell
helm repo add cnvrg https://charts.cnvrg.io
helm repo update
```

# Deploy cnvrg

The following is the minimal Helm command, with the parameters cnvrg requires to run on the Habana Gaudi AMI. It installs cnvrg with an Istio ingress controller and uses S3 as the object storage backend.

Before deploying cnvrg, you will need:

  • A wildcard DNS record for cnvrg
  • The IP address of your cluster
  • The details of the S3 bucket (name, region, access key, and secret key)
  • The kubeconfig file of the EKS cluster

Optional:

  • cnvrg premium user credentials, for deploying cnvrg premium
```shell
helm install cnvrg cnvrg/cnvrg --timeout 1500s --wait --namespace=cnvrg --create-namespace \
  --set clusterDomain=YOUR-DOMAIN \
  --set registry.user=cnvrghelm \
  --set controlPlane.image=cnvrg/core:3.10.1 \
  --set controlPlane.baseConfig.featureFlags.HABANA_ENABLED="true" \
  --set gpu.habanaDp.enabled=true \
  --set monitoring.habanaExporter.enabled=true \
  --set controlPlane.objectStorage.bucket=YOUR-S3-BUCKET \
  --set controlPlane.objectStorage.accessKey=YOUR-S3-BUCKET-ACCESSKEY \
  --set controlPlane.objectStorage.secretKey=YOUR-S3-BUCKET-SECRETKEY \
  --set controlPlane.objectStorage.region=YOUR-S3-BUCKET-REGION
```
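
The value of controlPlane.objectStorage.region must be the region the bucket actually lives in. One way to look it up, sketched below with hypothetical function names, is aws s3api get-bucket-location. That call reports a null location constraint for buckets in us-east-1 (printed as None with --output text) and EU for some older eu-west-1 buckets, so the helper normalizes both cases.

```shell
# Map a raw LocationConstraint value to a region name.
normalize_bucket_region() {
  case "$1" in
    ""|None|null) echo us-east-1 ;;   # us-east-1 buckets have no constraint
    EU) echo eu-west-1 ;;             # legacy value for eu-west-1
    *) echo "$1" ;;
  esac
}

# bucket_region BUCKET: query S3 (requires the AWS CLI) and normalize the answer.
bucket_region() {
  normalize_bucket_region "$(aws s3api get-bucket-location --bucket "$1" \
    --query LocationConstraint --output text)"
}

# Usage:
# --set controlPlane.objectStorage.region="$(bucket_region YOUR-S3-BUCKET)"
```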

# Advanced Helm Options

There are numerous ways to customize the installation to fit your infrastructure and requirements, including disk sizes, memory settings, versions, and so on. For the full list of customizable flags, see the cnvrg Helm chart documentation, or run helm show values cnvrg/cnvrg to print the chart's default values.

# Completing the Setup

The helm install command can take up to 10 minutes. When the deployment completes, you can open the URL of your new cnvrg deployment, or add the new cluster as a resource in your organization. The helm command prints the correct URL, for example:

```
🚀 Thank you for installing cnvrg.io!

Your installation of cnvrg.io is now available, and can be reached via:
http://app.mydomain.com

Talk to our team via email at hi@cnvrg.io
```

# Monitoring the deployment

You can monitor and validate the deployment process by running the following command:

```shell
kubectl -n cnvrg get pods
```

When every pod's status is Running or Completed, cnvrg has been deployed successfully. The output should look similar to the following:

```
NAME                                    READY   STATUS      RESTARTS   AGE
cnvrg-app-69fbb9df98-6xrgf              1/1     Running     0          2m
cnvrg-sidekiq-b9d54d889-5x4fc           1/1     Running     0          2m
controller-65895b47d4-s96v6             1/1     Running     0          2m
init-app-vs-config-wv9c4                0/1     Completed   0          9m
init-gateway-vs-config-2zbpp            0/1     Completed   0          9m
init-minio-vs-config-cd2rg              0/1     Completed   0          9m
istio-citadel-c58d68844-bcwv7           1/1     Running     0          2m
istio-galley-67dfcd65c5-vb2jf           1/1     Running     0          2m
istio-ingressgateway-6d48767f5b-mw4q8   1/1     Running     0          2m
istio-pilot-7bb78bbfb9-dpq6q            2/2     Running     0          2m
minio-0                                 1/1     Running     0          2m
postgres-0                              1/1     Running     0          2m
redis-695c49c986-kcbt9                  1/1     Running     0          2m
seeder-wh655                            0/1     Completed   0          2m
speaker-5sghr                           1/1     Running     0          2m
```

NOTE

The exact list of pods may differ, as it depends on the flags you passed to the helm install command. As long as every status is Running or Completed, the deployment succeeded.
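
To wait for the deployment from a script instead of re-running kubectl by hand, you can apply the same Running-or-Completed rule to the pod list. A minimal sketch; the function name pods_settled is hypothetical.

```shell
# Succeed only when at least one pod is listed and every pod in
# `kubectl get pods --no-headers` output (columns: NAME READY STATUS ...)
# has STATUS Running or Completed. Offending pods are printed to stderr.
pods_settled() {
  awk '
    { total++ }
    $3 != "Running" && $3 != "Completed" {
      print "not settled: " $1 " (" $3 ")" > "/dev/stderr"
      bad++
    }
    END { exit (total == 0 || bad > 0) }
  '
}

# Usage: poll every 30 seconds until the cnvrg namespace settles.
# until kubectl -n cnvrg get pods --no-headers | pods_settled; do sleep 30; done
```

kubectl wait --for=condition=Ready on every pod is not a good fit here, because pods that ran to completion (such as the init and seeder pods above) never report Ready, so the wait would time out.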

# Monitoring your cluster using Kibana and Grafana

Now that cnvrg has been deployed, you can access the Kibana and Grafana dashboards of your cluster.

Grafana is useful for monitoring the health of your cluster, and Kibana for searching and analyzing its logs.

To access Kibana, go to: kibana.<your_domain>

To access Grafana, go to: grafana.<your_domain>

Here, <your_domain> is the value you passed as clusterDomain to helm install.
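
To check all three endpoints at once, a sketch like the following prints each URL and the HTTP status it returns. It assumes the app, Kibana, and Grafana hosts all sit directly under the clusterDomain you passed to helm and are served over plain HTTP, as in the http://app.mydomain.com example above; the function names are hypothetical. A status of 000 means curl could not connect at all.

```shell
# cnvrg_urls DOMAIN: print the app, Kibana and Grafana URLs, one per line.
cnvrg_urls() {
  for host in app kibana grafana; do
    echo "http://$host.$1"
  done
}

# check_urls DOMAIN: print each URL with its HTTP status code (requires curl).
check_urls() {
  for url in $(cnvrg_urls "$1"); do
    # -o /dev/null discards the body; -w prints only the status code.
    printf '%s %s\n' "$url" "$(curl -s -o /dev/null -w '%{http_code}' "$url")"
  done
}

# Usage:
# check_urls mydomain.com
```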

# Delete cnvrg CORE

To delete the cnvrg deployment with Helm, run the following command:

```shell
helm uninstall cnvrg -n cnvrg
```

# Upgrade a cnvrg Installation

To upgrade an existing Helm installation, run the following command, adding any other settings your installation requires:

```shell
helm upgrade cnvrg cnvrg/cnvrg --reuse-values \
  --set controlPlane.image=<image>
```

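
Since --reuse-values keeps your other settings, one easy mistake during an upgrade is passing an older or mistyped image tag. As a sketch, you could refuse to run the upgrade unless the new cnvrg/core tag is newer than the current one. The function name version_gt is hypothetical, and it relies on sort -V (version sort), which GNU coreutils provides.

```shell
# version_gt A B: succeed when version A sorts strictly after version B.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Usage: compare the new cnvrg/core tag with the one used at install time,
# and pass it through controlPlane.image, the key set by helm install above.
# CURRENT=3.10.1
# NEW=3.10.2
# version_gt "$NEW" "$CURRENT" && helm upgrade cnvrg cnvrg/cnvrg --reuse-values \
#   --set controlPlane.image="cnvrg/core:$NEW"
```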
Last Updated: 12/7/2021, 1:19:00 PM