# Add Node Pools to a GCP GKE Cluster

Different ML workloads need different compute resources. Sometimes 2 CPUs are enough; other times you need 2 GPUs. With Kubernetes, you can run multiple node pools, each containing a different type of instance/machine. With auto-scaling enabled, you can make sure nodes are only live while they are being used.

This is especially useful when setting up a Workers cluster for running your cnvrg jobs. This guide explains how to add extra node pools to an existing GCP GKE cluster.

In this guide, you will learn how to:

  • Create new node pools in your existing GKE cluster using the gcloud command-line tool.

# Prerequisites: prepare your local environment

Before you can complete the installation, you must install and prepare the following dependencies on your local machine:

  • gcloud (the Google Cloud SDK command-line tool), installed and authenticated with access to the project that contains your GKE cluster.
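
For example, a quick sanity check that gcloud is installed and pointed at the right project might look like the following sketch (the project ID is a placeholder):

```bash
# Confirm the gcloud CLI is installed and show the active account
gcloud --version
gcloud auth list

# Authenticate and select your project, if you have not already done so
gcloud auth login
gcloud config set project <project-id>
```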

# Create the Node Pools

cnvrg can leverage many different machine types, both CPU and GPU. Below are templates for CPU and GPU node pools. In each of the commands, you can customize:

  • the node pool name (the positional argument after create)
  • --machine-type
  • --num-nodes
  • --min-nodes
  • --max-nodes

Each command may take a few minutes to finish running; afterwards, the node pool will have been added to your cluster.
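
Once a command completes, you can verify that the pool was created. The following sketch assumes the same PROJECT, CLUSTER_NAME, and REGION values used in the templates below:

```bash
# List the node pools attached to the cluster to confirm the new pool exists
gcloud --project ${PROJECT} container node-pools list \
    --cluster ${CLUSTER_NAME} \
    --region ${REGION}
```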

# Create a node pool with CPU nodes

Use the following command to create the new CPU node pool:

```bash
PROJECT=<project-id>
CLUSTER_NAME=<cluster-name>
REGION=<region>

gcloud --project ${PROJECT} container node-pools create cpu \
    --cluster ${CLUSTER_NAME} \
    --machine-type n1-standard-2 \
    --enable-autoscaling \
    --num-nodes 0 \
    --min-nodes 0 \
    --max-nodes 5 \
    --region ${REGION} \
    --scopes=storage-rw \
    --no-enable-autoupgrade
```

TIP

You can change n1-standard-2 to any CPU machine available in your chosen region.
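
If you are not sure which machine types your region offers, you can query them with gcloud. This is just an illustrative check; us-east1-b is a placeholder zone:

```bash
# List the CPU machine types available in a given zone (replace the zone as needed)
gcloud compute machine-types list --filter="zone:us-east1-b"
```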

# Create a node pool with GPU nodes

Use the following command to create the new GPU node pool:

```bash
PROJECT=<project-id>
CLUSTER_NAME=<cluster-name>
REGION=<region>

gcloud --project ${PROJECT} container node-pools create gpu \
    --cluster ${CLUSTER_NAME} \
    --machine-type custom-32-204800 \
    --enable-autoscaling \
    --num-nodes 0 \
    --min-nodes 0 \
    --max-nodes 10 \
    --region ${REGION} \
    --metadata disable-legacy-endpoints=true \
    --image-type=ubuntu \
    --accelerator="type=nvidia-tesla-v100,count=4" \
    --node-taints nvidia.com/gpu=present:NoSchedule \
    --scopes=storage-rw \
    --no-enable-autoupgrade
```

TIP

You can change custom-32-204800 to any machine type available in your chosen region, and adjust --accelerator to the GPU type and count you need.
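
Similarly, you can query which GPU accelerator types a zone offers before choosing the --accelerator value. Again, us-east1-b is only a placeholder zone:

```bash
# List the GPU accelerator types available in a given zone (replace the zone as needed)
gcloud compute accelerator-types list --filter="zone:us-east1-b"
```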

# Conclusion

The node pools will now have been added to your cluster. You can add more by repeating the relevant commands. If you have already deployed cnvrg to the cluster, or added the cluster as a compute resource inside cnvrg, no further setup is required: the node pools will immediately be usable by cnvrg.
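
If you want to confirm this from the Kubernetes side, the sketch below assumes kubectl is installed on your machine; note that with --num-nodes 0 the pools will show no nodes until the autoscaler scales them up for a job:

```bash
# Point kubectl at the cluster (if it is not already configured)
gcloud --project ${PROJECT} container clusters get-credentials ${CLUSTER_NAME} --region ${REGION}

# Show each node together with the node pool it belongs to
kubectl get nodes --label-columns=cloud.google.com/gke-nodepool
```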
