# Containers

The cnvrg platform uses Docker containers to create environments that ensure reproducibility throughout machine learning (ML) pipelines. A Docker container image is one of the main building blocks that cnvrg requires to execute a job. A cnvrg job can be a workspace, experiment, flow, endpoint, or web app.

The cnvrg software gives users the flexibility to set up their environment exactly as required. Users can:

  • Manage and utilize a versatile set of computes that meet a range of requirements, as documented here.
  • Add and use custom Docker images to run experiments, launch workspaces, build flows, and configure endpoints and apps, as described here.

Every team and user has full control to build or use custom Docker images. In some cases, a user may want to use a custom library without building a Docker image. To do so, use requirements.txt and prerun.sh files.

# Environment Variables

The environment for every cnvrg job contains a selection of useful environment variables that users can access as part of their scripts and experiments.

The full list of cnvrg environment variables is as follows:

CNVRG_COMPUTE_CPU
CNVRG_COMPUTE_MEMORY
CNVRG_COMPUTE_GPU

CNVRG_COMPUTE_TEMPLATE
CNVRG_COMPUTE_CLUSTER

CNVRG_JOB_ID
CNVRG_JOB_URL
CNVRG_JOB_NAME
CNVRG_JOB_TYPE
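
As a minimal sketch of how a script might read these variables, the following Python snippet collects them with `os.environ.get`. The helper name `cnvrg_job_info` and the `None` fallback for unset variables are illustrative assumptions; the values that cnvrg populates depend on the job.

```python
import os

# Variable names taken from the list above.
CNVRG_VARIABLES = [
    "CNVRG_COMPUTE_CPU",
    "CNVRG_COMPUTE_MEMORY",
    "CNVRG_COMPUTE_GPU",
    "CNVRG_COMPUTE_TEMPLATE",
    "CNVRG_COMPUTE_CLUSTER",
    "CNVRG_JOB_ID",
    "CNVRG_JOB_URL",
    "CNVRG_JOB_NAME",
    "CNVRG_JOB_TYPE",
]


def cnvrg_job_info(environ=None):
    """Return a dict of the cnvrg variables; unset ones map to None."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in CNVRG_VARIABLES}


if __name__ == "__main__":
    # Print whatever the current job exposes, for example at the top of a training script.
    for name, value in cnvrg_job_info().items():
        print(f"{name}={value}")
```

Outside a cnvrg job, every value is expected to be `None`, which makes the helper safe to call in local runs as well.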

# Use Cases

Some use cases require only an additional library or a setup step rather than a new custom Docker image. The following sections describe two ways to handle them.

# requirements.txt (Python)

There are some cases when a user wants to use a different version of a library or a package that isn't installed on the machine or included in the Docker image.

To support this use case, cnvrg offers an easy solution. If a requirements.txt file exists in the project root directory, cnvrg installs the packages listed in the file before the workspace or experiment starts.

Complete the following steps to use a requirements.txt file:

  1. Create a file named requirements.txt and save it in the project root directory (the main project directory).
  2. Specify the Python libraries to use.

WARNING

If the file is created in a local environment, make sure to sync it to cnvrg or push it to the Git repository.

The following provides an example requirements.txt file:

docutils==0.11
Jinja2==2.7.2
MarkupSafe==0.19
Pygments==1.6
Sphinx==1.2.2
numpy
tensorflow

To generate a requirements.txt file in a similar format from the libraries installed in the current environment, run:

pip freeze > requirements.txt
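
The example file above mixes pinned entries (such as `Sphinx==1.2.2`) with unpinned ones (`numpy`, `tensorflow`), whereas `pip freeze` writes pinned entries. As a rough sketch, the following Python helper lists the entries that are not pinned with `==`, which can be useful when reproducibility matters. The function name `unpinned_requirements` is hypothetical, and the parsing is deliberately simple: it ignores comments, blank lines, and pip option lines, and does not handle every requirement syntax.

```python
from pathlib import Path


def unpinned_requirements(text):
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options such as -r
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    path = Path("requirements.txt")
    if path.exists():
        for name in unpinned_requirements(path.read_text()):
            print(f"not pinned: {name}")
```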

# prerun.sh Script

When using cnvrg to run experiments or to start a notebook, there are some cases when a user wants to run a bash script before starting the experiment or notebook.

To support this use case, cnvrg offers an easy solution. When the job starts, it searches for a prerun.sh file and automatically runs it before the experiment or notebook starts.

Complete the following steps to use a prerun.sh script:

  1. Create a file named prerun.sh and save it in the project root directory (the main project directory). Include all the commands to execute before the experiment starts.
  2. Sync or push the file before starting an experiment or a notebook.

# Container Registries

There are two types of container registries in cnvrg:

  • Public registries such as cnvrg, Docker Hub, and NVIDIA. These registries are marked with a globe. Images can only be pulled from public registries in cnvrg; users cannot push new images to them.
  • Private registries such as ACR, ECR, GCR, NVIDIA private, and Docker Hub private. Images can be both pulled from and pushed to private registries.

The cnvrg platform includes several default Docker container registries. Users can also connect to other container registries beyond those cnvrg provides by default. This allows users to pull from these public registries and add their new images to cnvrg. In the case of a private registry, users can also build and push new images to it.

# Access default registries in cnvrg

Click the Containers tab of your organization to display the cnvrg-included default registries, described in the following sections.

# cnvrg

An organization's account is automatically connected to cnvrg's Docker container registries, which contain the default images the cnvrg team builds and maintains.

# NVIDIA NGC

NVIDIA's NGC platform is integrated with cnvrg. Without any further setup, a user can pull all of NVIDIA's Docker images.

# Docker Hub

The cnvrg platform is connected to the official Docker Hub container registry. Users can pull images from any public Docker Hub repository by simply adding the details to cnvrg. Follow the instructions here.

TIP

To pull an image from a public registry, the best practice is to connect to it, but this is not a requirement.

# Add a registry to cnvrg

Complete the following steps to add a private or public registry to cnvrg:

  1. Click the Containers tab of your organization.

  2. Click Add registry.

  3. From the list of registries, select one to connect to.

  4. In the Registry URL field, enter the full registry URL.

  5. Enter a Title for the new registry.

    NOTE

    If authentication is required for the new registry, click Authentication and provide the credentials (username and password).

    Authentication is optional, because most cloud providers can be connected directly to the Kubernetes cluster, without additional authentication needed. It may also be a public registry.

  6. Click Save.

# Default Docker Images

The cnvrg platform includes a set of Docker images that meet most ML requirements of data scientists. Using ready-made images saves them the often complex task of building Docker images that cnvrg supports.

Users can also add custom Docker images to use in cnvrg.

To view the images and registries within cnvrg, navigate to the Containers tab of your organization.

# Registries section

On the Containers page, the top Registries section displays the registries cnvrg has been configured to connect to.

A container registry is a storage and content delivery system, essentially a holding repository, which contains the Docker images available in different tagged versions.

Users can connect cnvrg to any additional registry to which they have access.

Click an icon to display the container registry's page. From there, edit or delete the registry as well as manage images (build and pull).

# Images section

The Containers page's bottom Images section displays a table listing the Docker images within cnvrg, including both the cnvrg-provided default images (residing in the cnvrg registry) as well as user-added images.

For each image, the table displays the following information:

  • Status
  • Repository name
  • Tag
  • Registry name
  • Creation date
  • Author name

Users can sort and filter the data in the table.

Click an entry in the table to display a summary of the image and its readme.md file. From there, edit or delete the image.

# Custom Docker Images

While cnvrg supplies capable, ready-to-use Docker images by default, users can add their own images, as required.

NOTE

The working directory of any images used in cnvrg is /cnvrg, regardless of the Dockerfile setting. To learn the requirements for custom images, see the next Custom Image Requirements section.

The cnvrg software platform enables users to:

  • Build a Docker image: Users can upload a custom Dockerfile; cnvrg builds the Docker image and pushes it to the selected private registry.

    While cnvrg builds the image, it streams the build log live. After the image is ready, cnvrg adds it to the Images table. Then, cnvrg sends an email notification about the success or failure of the image build.

    NOTE

    Building container images is currently only supported for clusters using Docker Engine as the container runtime.

    NOTE

    Users can only build and push an image after connecting cnvrg to a private registry.

  • Pull a Docker image: Users specify the information of an existing Docker image.

    After successfully adding a Docker image, a user can select it when running a job such as a workspace or an experiment. When the image is selected, cnvrg pulls it from the Kubernetes node and uses it to run that specific job. If the image is already available locally, it is not pulled again but simply loaded.

    For AWS, cnvrg creates a new AMI. For on-premise machines, the Docker image must be available locally on the machine.

    NOTE

    Users can pull images from public registries without connecting to them.

    Several default registries come preconfigured in cnvrg.

    TIP

    Users can also access the Build Image and Pull Image panes from within the page of the relevant registry.

# Custom image requirements

The cnvrg platform is designed to simplify the use of custom Docker images. However, the following important prerequisites are required:

  • Install the desired programming language, such as Python or R.
  • Install R Studio or R Shiny, as desired.
  • Ensure pip is installed (version > 10.0).
  • Install tar to use custom images with a Visual Studio Code workspace.
  • Install git and ssh to use Git commands (such as git push). Note: When starting a job, cnvrg clones the Git repository (when a project is linked with Git).
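
One way to sanity-check a custom image against the list above is to run a small script inside the container. The following is only a sketch: the helper names `missing_tools` and `pip_major_version` are hypothetical, the tool list covers only tar, git, and ssh, and the pip check reports the major version rather than enforcing the exact "> 10.0" rule.

```python
import shutil
from importlib import metadata

# Command-line tools named in the prerequisites above.
REQUIRED_TOOLS = ["tar", "git", "ssh"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def pip_major_version():
    """Return pip's major version as an int, or None if pip is not installed."""
    try:
        return int(metadata.version("pip").split(".")[0])
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
    major = pip_major_version()
    if major is None:
        print("pip is not installed")
    else:
        print("pip major version:", major)
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the defaults are sufficient.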

TIP

When using a container in cnvrg, the project is downloaded or cloned into /cnvrg. It does not need to be set as the WORKDIR in the Dockerfile.

If the Install Job Dependencies Automatically organization setting is enabled, a custom Docker image automatically works with all of the cnvrg jobs like workspaces, experiments, and endpoints. This includes the cnvrg SDK. The toggle for this feature can be found in an organization's settings.

# Custom Docker image actions

In cnvrg, users can build and pull their custom Docker images.

# Build a custom Docker image

Complete the following steps to build a custom Docker image:

  1. Click the Containers tab of your organization.
  2. Click the Add Image button.
  3. In the displayed dialog, click Build Image to display the Build Image pane.
  4. In the drop-down list, select the Registry where the image is to be pushed.
  5. Enter the Repository name and Tag of the image to be created.
  6. In the Compute drop-down list, select one or more compute engines (cnvrg will attempt to run the job on the first available compute engine selected).
  7. Paste in the custom Dockerfile. Each time a user edits the Dockerfile and clicks Add, cnvrg pushes it again to the relevant repository.
  8. Enter a readme.md (optional).
  9. Click Change Logo and select an image from the list provided (optional).
  10. Click Add.

Cnvrg builds the image, pushes it to the relevant repository, and adds it to the list of available Docker images.

# Pull a custom Docker image

Complete the following steps to pull a custom Docker image:

  1. Click the Containers tab of your organization.
  2. Click the Add Image button.
  3. In the displayed dialog, click Pull Image to display the Pull Image pane.
  4. Complete one of the following two steps:
    • Enter the full Docker URL.
    • Select the registry in the drop-down list and enter the repository name and the tag.
  5. Enter a readme.md (optional).
  6. Click Change Logo and select an image from the list provided (optional).
  7. Click Add.

Cnvrg adds the image to the list of available Docker images.

# Custom images with on-premise compute

If using on-premise hardware with cnvrg, there are a few differences in the Docker image process.

# Build a Docker image (on-premises)

If building an image on an on-premise compute, users have the option to not push the image to a registry. In this case, the image is built and available locally for use on the on-premise hardware.

NOTE

Users can still push the image, as desired.

# Pull a Docker image (on-premises)

To use a specific Docker image on on-premise hardware, users must first ensure it is available locally on the machine. Then, add the Docker image details and select the Pull from existing repository option.

NOTE

Users cannot pull a Docker image to on-premise hardware.

# Docker Image Usage in Jobs

There are three ways to select a Docker image for a cnvrg job:

NOTE

Cnvrg always pulls the latest version of the Docker image and tag as selected for the job.

# Using the web UI image selector

Users can use the cnvrg UI image selector to choose a Docker image when starting a workspace, preparing a flow task, creating a new experiment, or deploying an endpoint from the web UI.

Complete the following steps to choose an image using the image selector:

  1. Click Start Workspace, New Experiment, Task, or Publish, depending on the cnvrg job being run.
  2. Provide the other relevant details in the displayed pane.
  3. Click the Image drop-down list to select an image.
  4. Click the image repository toggle list and then select the desired tag. The repository and tag are now selected in the Image drop-down list.
  5. Click Start Workspace, Run, Save Changes, or Deploy Endpoint, depending on the cnvrg job.

The selected image will be used as the virtual environment.

TIP

The image selector is located in the Environment section for experiments and in the Advanced tab for flow tasks.

# Using the cnvrg CLI

To use a new image through the CLI, add the --image flag in the run command. For example:

cnvrg run --image="tensorflow:19.07" python3 mnist.py

# Using the Python SDK

In the run() SDK call, pass the name of the image. For example:

from cnvrg import Experiment

e = Experiment()
# Pass the command as a string and name the image to run it in
e.run("python3 train.py",
      image="tensorflow:19.07")

# Organization Setting: Install Job Dependencies Automatically

There are cnvrg recommendations and required packages depending on whether the organization setting Install Job Dependencies Automatically is enabled or disabled.

# Recommendations when Install Job Dependencies Automatically is enabled

While not required, the cnvrg team recommends users install the cnvrg CLI in their Docker image so they can use all of the CLI commands. Otherwise, the CLI can be used only in the folders /cnvrg (the workdir) and /data (where datasets are mounted).

To use Git easily inside a JupyterLab workspace, the team also recommends users install the JupyterLab Git extension.

# Required packages when Install Job Dependencies Automatically is disabled

If the Install Job Dependencies Automatically organization setting is disabled, users must install the required packages themselves. They also cannot use the cnvrg CLI and SDK unless they install these themselves.

The following table lists the required packages to install for each feature to use with a custom Docker image.

Package Experiments JupyterLab Workspace R Studio Workspace Shiny App Dash App Voila App Tensorboard Compare Serving
tensorboard
butterfly
jupyterlab
jupyterlab-git
RStudio
dash
dash-daq
voila
pygments
gunicorn
flask
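
To see which of the Python packages named in the table are already present in an image, a short check can be run inside the container. This is a sketch under assumptions: the helper name `installed_versions` is hypothetical, the lookup uses the standard-library `importlib.metadata`, and it reports only whether each package is installed, not which feature requires it. RStudio is omitted because it is not a Python package.

```python
from importlib import metadata

# Python packages named in the table above.
PYTHON_PACKAGES = [
    "tensorboard",
    "butterfly",
    "jupyterlab",
    "jupyterlab-git",
    "dash",
    "dash-daq",
    "voila",
    "pygments",
    "gunicorn",
    "flask",
]


def installed_versions(packages=PYTHON_PACKAGES):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```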

# Packages installation guides

WARNING

Creating a Dockerfile and building a working custom image can be a technically challenging process. Each situation differs and requires research while building a Dockerfile.

The following code snippets are examples, which may not work in a particular Dockerfile.

NOTE

Use your terminal to run the following example commands, if applicable and required.

# cnvrg CLI

RUN apt-get update && apt-get install -y ruby ruby-dev
RUN gem install cnvrg --no-document

# pip (version > 10)

RUN pip install --upgrade pip

# tar

RUN apt-get update && apt-get install -y tar

# cnvrg SDK

RUN pip install cnvrg

# TensorBoard

RUN pip3 install tensorboard

# Butterfly

RUN pip3 install butterfly

# JupyterLab

RUN pip3 install jupyterlab

# JupyterLab-git

RUN jupyter labextension install @jupyterlab/git
RUN pip3 install jupyterlab-git && jupyter serverextension enable --py jupyterlab_git

# Dash

RUN pip3 install dash

# Dash DAQ

RUN pip3 install dash-daq

# Voila

RUN pip3 install voila

# Pygments

RUN pip3 install Pygments

# Gunicorn

RUN pip3 install gunicorn

# Flask

RUN pip3 install flask

# R Studio

The simplest method is to build an image based on a Rocker image.

FROM rocker/rstudio:latest

# R Shiny

The simplest method is to build an image based on the official R Shiny Rocker image.

docker pull rocker/shiny

# Other Environment Configurations

# Custom Python Modules

Users can set up their own custom Python modules within cnvrg. To avoid any issues, apply the following best practices when setting up custom modules:

  1. Create and add a file named __init__.py to the modules folder in the file structure.
  2. Add the following code snippet to the beginning of the code being executed:

import os, sys
from os.path import dirname, join, abspath
# Put the parent of this script's directory on the import path
sys.path.insert(0, abspath(join(dirname(__file__), '..')))
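
The following self-contained sketch illustrates why the snippet works. It builds a throwaway project in a temporary directory and runs a script from a subfolder that imports from the modules folder one level up. The file names `helpers.py` and `scripts/train.py` and the function `greet` are hypothetical and chosen only for the demonstration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# A script that lives in a subfolder and uses the path snippet from step 2.
SCRIPT = '''\
import sys
from os.path import dirname, join, abspath
sys.path.insert(0, abspath(join(dirname(__file__), '..')))

from modules.helpers import greet
print(greet())
'''


def run_demo():
    """Create project/modules and project/scripts, run the script, return its output."""
    with tempfile.TemporaryDirectory() as root:
        project = Path(root)
        (project / "modules").mkdir()
        (project / "modules" / "__init__.py").write_text("")  # step 1: mark as a package
        (project / "modules" / "helpers.py").write_text(
            "def greet():\n    return 'hello from modules'\n"
        )
        (project / "scripts").mkdir()
        script = project / "scripts" / "train.py"
        script.write_text(SCRIPT)
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_demo())
```

Without the `sys.path.insert` line, Python would search only the script's own folder (plus the standard locations), so the import from `modules` would fail.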

# Git Submodules

If a project is connected to a Git repository that contains submodules, a prerun.sh file containing the commands below must be added to the repository or to the project's Files. An OAuth Git token must also be set for the user account.

If the project does not already have a prerun.sh file, create one, and then add the following commands to it:

git config --global url."https://x-access-token:${GIT_REPO_CLONE_TOKEN}@github".insteadOf https://github
git submodule init
git submodule update

TIP

A prerun.sh script allows users to run commands to further customize their environment at the start of every job.

# Automate cnvrg Authentication

Cnvrg authentication enables users to easily automate cnvrg CLI commands using a script.

First, use a machine with the cnvrg CLI installed. Locate the password/token from the .netrc file using the terminal and the cat command:

cat ~/.netrc

In the script being created, add the following lines:

export CNVRG_USER="username"
export CNVRG_EMAIL="email@address.com"
export CNVRG_TOKEN="password_from_previous_step"
export CNVRG_OWNER="organization_name"
export CNVRG_API="your_API_URL"
cnvrg auth

Thus, any cnvrg CLI commands now work as part of the script.

Then, use this authentication to automate commands such as the following:

mkdir data2
cd data2 && cnvrg data init --title="test_data" && cnvrg data sync
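
As a hedged sketch of the same flow in Python, the standard-library `netrc` module can read the token instead of copying it by hand from the `cat` output. The helper names `token_from_netrc` and `cnvrg_auth_env` are hypothetical. The assumption that the token is stored in the password field of the .netrc entry follows the `password_from_previous_step` placeholder above. The user, email, organization, and API URL values still need to be supplied.

```python
import netrc
import os


def token_from_netrc(path=None, host=None):
    """Return (host, login, token) from a .netrc file, or None if not found.

    If host is None, the first entry that stores a password is used
    (an assumption; pass the cnvrg host explicitly when there are several).
    """
    entries = netrc.netrc(path or os.path.expanduser("~/.netrc"))
    candidates = [host] if host else list(entries.hosts)
    for name in candidates:
        auth = entries.authenticators(name)  # (login, account, password) or None
        if auth and auth[2]:
            return name, auth[0], auth[2]
    return None


def cnvrg_auth_env(user, email, token, owner, api_url, base_env=None):
    """Build the environment that the script above exports before `cnvrg auth`."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "CNVRG_USER": user,
        "CNVRG_EMAIL": email,
        "CNVRG_TOKEN": token,
        "CNVRG_OWNER": owner,
        "CNVRG_API": api_url,
    })
    return env


# Example usage (not executed here; requires the cnvrg CLI on PATH):
#   import subprocess
#   host, login, token = token_from_netrc()
#   env = cnvrg_auth_env("username", "email@address.com", token,
#                        "organization_name", "your_API_URL")
#   subprocess.run(["cnvrg", "auth"], env=env, check=True)
```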

TIP

If unsure of the API URL, refer to the CLI documentation.

Last Updated: 10/19/2023, 2:42:32 PM