# Workspaces

A cnvrg workspace is an interactive environment for developing and running code.

The cnvrg platform supports several ways of working interactively, from Python scripts to Jupyter Notebooks. Each environment comes preconfigured with its dependencies installed, and its files and data are preserved across workspace instance restarts. Workspaces also provide automatic version control and scalable compute for your data science research.

To run on remote computes, cnvrg has built-in support for JupyterLab, JupyterLab on Spark, RStudio, and Visual Studio Code.

# Supported Workspaces

The cnvrg platform supports the following workspace types:

# JupyterLab

JupyterLab is a next-generation web-based user interface (UI) for Project Jupyter. JupyterLab enables you to work with documents and activities such as Jupyter Notebooks, text editors, terminals, and custom components (such as Python environments) in a flexible, integrated, and extensible manner.

Learn more about JupyterLab.

To switch your JupyterLab Notebook to dark (night) mode, click Settings inside the workspace. Then, click JupyterLab Theme and JupyterLab Dark.

# JupyterLab on Spark

JupyterLab can also be run on Spark in cnvrg. Select this option with a compatible Spark Compute template to use JupyterLab backed by Spark.

# RStudio

RStudio is an integrated development environment (IDE) for R. It includes a console, a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, viewing history, debugging, and managing workspaces.

Learn more about RStudio.

# Visual Studio Code

Visual Studio Code is a lightweight but powerful source code editor, which runs on your desktop and is available for Windows, macOS, and Linux. It includes built-in support for JavaScript, TypeScript, and Node.js. It has a rich ecosystem of extensions for other languages (such as C++, C#, Java, Python, PHP, and Go) and runtimes (such as .NET and Unity).

Learn more about Visual Studio Code.

You can add a Visual Studio Code settings file to your user account. This is used to configure any Visual Studio Code workspaces you start.

# Workspace Environment Overview

Similar to the execution of any other cnvrg machine learning (ML) job, when a workspace is created without an existing volume, cnvrg constructs its environment as outlined in the following steps:

  1. Pulls and executes the selected container.
  2. Clones the selected Git branch and commit (if the project is connected to Git) into /cnvrg.
  3. Clones the latest versions of your files from your project's Files tab (or the chosen cnvrg commit) into /cnvrg.
  4. Clones or attaches the selected datasets into /data.
  5. Installs packages from the requirements.txt file (if it exists).
  6. Executes the prerun.sh script (if it exists).

When cnvrg completes these steps, the environment is fully constructed and the workspace is loaded.
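The ordering of the steps above can be sketched in a few lines of Python. This is only an illustration of the documented sequence, not cnvrg's implementation: `WorkspaceSpec`, `setup_steps`, and the image name in the usage note are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WorkspaceSpec:
    """Hypothetical description of a workspace request (not a cnvrg type)."""
    image: str
    git_connected: bool = False
    datasets: Tuple[str, ...] = ()
    has_requirements_txt: bool = False
    has_prerun_sh: bool = False


def setup_steps(spec: WorkspaceSpec) -> List[str]:
    """Return the setup steps in the order the documentation lists them."""
    steps = [f"pull and run container {spec.image}"]
    if spec.git_connected:
        steps.append("clone selected Git branch and commit into /cnvrg")
    steps.append("clone project files into /cnvrg")
    for name in spec.datasets:
        steps.append(f"clone or attach dataset into /data/{name}")
    # The last two steps only happen when the corresponding file exists.
    if spec.has_requirements_txt:
        steps.append("install packages from requirements.txt")
    if spec.has_prerun_sh:
        steps.append("run prerun.sh")
    return steps
```

For example, `setup_steps(WorkspaceSpec(image="my-image:latest", datasets=("hotdogs",)))` lists the container step first, then the project files, then the dataset. Package installation always comes before prerun.sh when both files exist.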

For more information, see Environment.

# Workspace Usage

# Starting a workspace

Complete the following steps to start a new workspace:

  1. Open a project and then click the Workspaces tab.
  2. Click Start a Workspace.
  3. Click Change type to select the notebook type: JupyterLab, JupyterLab on Spark, RStudio, or VSCode.
  4. Set the Title for the workspace.
  5. Select one or more Compute templates for your workspace.
  6. Select Datasets to add to your workspace. (Optional)
  7. Select a Docker Image to use for your workspace.
  8. Choose an existing Volume or create a new one. (Optional)
  9. Click Start workspace.

# Stopping a workspace

A cnvrg workspace stays active until a user stops it or it reaches the max duration.

There are two ways to stop a running workspace:

  • From the Workspaces table
  • From inside a workspace

Complete the following steps to stop a workspace that is loading or running:

  1. Click the Workspaces tab.
  2. Complete one of the following two steps:
    • In the Workspaces table, click the menu at the right end of the workspace to stop, then click Stop.
    • Click the workspace name to stop, then click the Stop button next to its name at the top of the displayed pane.
  3. Confirm the stop action.

# Rerunning a workspace

There are two ways to rerun or reopen a stopped workspace:

  • From the Workspaces table
  • From inside the workspace

Complete the following steps to rerun a stopped workspace:

  1. Click the Workspaces tab.
  2. Complete one of the following two steps:
    • In the Workspaces table, click the menu at the right end of the workspace to resume, then click Start.
    • Click the workspace name to resume, then click the Start button next to its name at the top of the displayed pane.
  3. Confirm the run action.

# Deleting a workspace

Complete the following steps to delete a workspace:

  1. Click the Workspaces tab.
  2. In the Workspaces table, click the menu at the right end of the workspace to delete, then click Delete.
  3. Confirm the deletion.

# Workspace Assets

# Using Compute

Select one or more Compute templates when starting the workspace. If the first compute is unavailable, the job attempts to run on the next compute you have selected, and so on. If you only select one compute template, the job waits until it is available to run or until the wait time reaches the max duration.
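The fallback behavior described above amounts to "take the first selected template that is available". A minimal sketch, assuming a hypothetical `is_available` callback (cnvrg does not expose this function; it stands in for the platform's own scheduling check):

```python
from typing import Callable, Optional, Sequence


def pick_compute(
    templates: Sequence[str],
    is_available: Callable[[str], bool],
) -> Optional[str]:
    """Return the first available template in selection order.

    Returns None when nothing is available yet, in which case the job
    keeps waiting (per the text, until the max duration is reached).
    """
    for name in templates:
        if is_available(name):
            return name
    return None
```

With a selection of `["large", "medium"]` where only `medium` is free, the sketch returns `"medium"`. With a single unavailable template it returns `None`, which corresponds to the waiting case.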

# Using Datasets

When starting a workspace, cnvrg provides the option to include one or more Datasets. You can also choose a specific query or commit of the dataset at this point.

When you clone or attach a Dataset, it is mounted in the remote compute and is accessible from the absolute path: /data/name_of_dataset/.

For example, when running an experiment, if you include the dataset hotdogs, the dataset and all its files can be found in the /data/hotdogs/ directory.
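From code running in the workspace, a mounted dataset is an ordinary directory. The following sketch uses only the standard library. The helper names `dataset_dir` and `list_dataset_files` are illustrative and are not part of the cnvrg SDK.

```python
from pathlib import Path
from typing import List

DATA_ROOT = Path("/data")  # mount point stated in the documentation


def dataset_dir(name: str, root: Path = DATA_ROOT) -> Path:
    """Return the directory where the named dataset is mounted."""
    return root / name


def list_dataset_files(name: str, root: Path = DATA_ROOT) -> List[Path]:
    """List every file in the dataset, or [] if it is not mounted."""
    base = dataset_dir(name, root)
    if not base.is_dir():
        return []
    return sorted(p for p in base.rglob("*") if p.is_file())
```

Calling `list_dataset_files("hotdogs")` walks /data/hotdogs/ recursively. An empty list may simply mean the dataset was not attached or is still cloning.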

Datasets can also be added to a live workspace.

Refer to the Datasets section for more detailed information on uploading your datasets to cnvrg.

# Using Images

Select a Docker Image to serve as the basis for a virtual environment of the workspace.

The cnvrg default Docker images update regularly and support most standard use cases. Learn more about using your own custom Docker image to support your specific use case here.

# Workspace Volumes (Kubernetes Only)

Workspace Volumes allow you to create a persistent file system for your workspace. This enables you to persist the exact same files and datasets, even after shutting down workspaces and restarting them.

When a workspace is started without a volume, a fresh environment is created, requiring user reconfiguration. With a volume, cnvrg loads the exact previously configured setup. This treats a workspace as if it is your own laptop, with its configuration being the same at startup as at the previous shutdown.

Volumes are project-specific, so they can be used only by the project in which they were created. A volume has a title and a size (in GB), and can only be used by one workspace at a time.

When a volume is created, the environment is created according to the regular process. When a volume is reused, it is simply reattached at startup without the required cloning of cnvrg, data, or Git.

Workspace Volumes are supported natively in cloud Kubernetes clusters. Extra configuration is required to enable this functionality in on-premise clusters. Please contact cnvrg support to set up cnvrg Volume functionality. This feature is not supported with on-premise machines.

# Persist or attach a volume's files

A volume is attached to /cnvrg (the working directory) and /data. This means the volume persists files only in these two locations.

If you are working with a cached dataset commit, the commit does not persist in the volume and thus needs to be reattached.

Python packages and environment setup are not persisted either.
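As a quick check of whether a file will survive a restart, the rule above can be written as a path test. This is a sketch under the stated rule only: `is_persisted` is a hypothetical helper, it expects a normalized absolute path, and it does not account for cached dataset commits, which do not persist even though they live under /data.

```python
from pathlib import PurePosixPath

# The two locations the documentation says a volume is attached to.
PERSISTED_ROOTS = (PurePosixPath("/cnvrg"), PurePosixPath("/data"))


def is_persisted(path: str) -> bool:
    """Return True if the path is one of, or lies under, the volume-backed roots."""
    p = PurePosixPath(path)
    return any(p == root or root in p.parents for root in PERSISTED_ROOTS)
```

For instance, `/cnvrg/notebooks/eda.ipynb` is under a persisted root, while a pip cache under `/root` is not, which matches the note that Python packages are not persisted.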

# Create a volume

Complete the following steps to start a workspace and create its volume:

  1. On a project's dashboard or from the Workspaces page, click Start a Workspace.
  2. In the displayed dialog, complete the regular workspace form: Title, Compute, Datasets, and Image.
  3. Next to Volume, click + Create New. The Volume field displays.
  4. Enter a Volume Title and Size (GB) for the new volume.
  5. Click Start Workspace. The workspace environment is constructed and the volume is created and attached.

# Use an existing volume

When restarting a closed workspace previously run with a volume, cnvrg automatically reattaches the volume. Ensure the volume is not currently in use before restarting the workspace.

Complete the following steps to create a new workspace with an existing volume:

  1. On a project's dashboard or from the Workspaces page, click Start a Workspace.
  2. In the displayed dialog, complete the regular workspace form: Title, Compute, and Image. Note: Using a volume overrides the Datasets selector, eliminating the need to select a dataset.
  3. Select the desired volume using the Volume selector. Only volumes not currently in use can be selected.
  4. Click Start Workspace. The workspace environment is constructed and the volume is attached.

# Save and reload VScode extensions automatically

When using VScode, users often want to save their favorite extensions so that their workspace loads them automatically upon startup. To accomplish this, complete the following steps:

  1. Start a new VScode workspace and create or attach a volume to it. The volume can be any size that meets requirements.
  2. Once the workspace is running, use the Extensions tab to install the desired extensions.
  3. Continue working with your VScode workspace.
  4. Once finished, click the Stop button (or wait for cnvrg to stop it after idle time).

When next using a VScode workspace, either start the same workspace or create a new one and attach the volume created in step 1. Once the workspace is running, the VScode extensions load automatically.

For example, when a new workspace (Workspace 3) is started with the volume attached, the extensions install onto it automatically.

# Workspace Images

The cnvrg platform uses Docker containers to ensure reproducibility throughout a machine learning (ML) pipeline. You can create Docker images from within a workspace containing its exact environment.

When using this feature, a new Docker image is constructed and any environment changes are captured. The Docker image is then pushed to the selected registry and repository with the specified tag.

Set the image to build immediately (Build Now) or when the workspace closes (Build On Stop). The image is built inside the compute the workspace is using.

This is useful in many cases. For example, after launching a workspace, you can pip install a Python package, update a framework's version, or apt-get install an Ubuntu package. In this case, the image you originally started the workspace with has changed, and can no longer be safely relied upon to handle your updated code for a new workspace or a remote experiment. This feature allows you to build a new image with an updated environment in one click. You can then use the updated image when running new workspaces, experiments, services, or other ML workloads.

# Requirements

Using this feature requires adding a private Docker registry to your organization.

# Capture image contents

Using this feature means all changes to the environment and related files are captured inside the image, including:

  • PyPI packages and version (pip)
  • Ubuntu packages and versions (apt-get)
  • Environment variables

The image does not include the files in /cnvrg and /data. To persist these files, see Workspace Volumes.
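Before building an image, it can help to see which pip packages have changed since the workspace started. The sketch below is for inspection only and is not part of the cnvrg image build. It uses the standard-library `importlib.metadata` module, and the helper names are illustrative.

```python
from importlib import metadata
from typing import Dict


def installed_packages() -> Dict[str, str]:
    """Map each installed distribution name to its version."""
    packages: Dict[str, str] = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:  # skip installs with missing metadata
            continue
        name, version = meta.get("Name"), meta.get("Version")
        if name and version:
            packages[name] = version
    return packages


def changed_packages(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, str]:
    """Return packages that are new or whose version differs in `after`."""
    return {name: ver for name, ver in after.items() if before.get(name) != ver}
```

A typical use is to call `installed_packages()` once at startup and once before building, then pass both snapshots to `changed_packages`. Removed packages are not reported by this sketch.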

When building an image from a workspace, you can optionally enable Link to workspace. If enabled, the workspace's image metadata is updated to the new one.

This is useful for providing at-a-glance insights into a workspace's environments and, once it closes, restarting the workspace with the correct image.

# Build an image

    # Workspace Controls

    Inside a workspace, its environment displays in the center, where you can access all the functionality of JupyterLab, RStudio, and Visual Studio Code.

    However, you also have access to other controls designed to make interfacing with cnvrg from a workspace as seamless as possible.

    # Workspace titlebar

    The following controls display above the workspace:

    • Workspace Name: Click to edit the workspace title, if desired.
    • Stop: Click to stop a running workspace and release its compute resources.
    • Sync: Click to sync the workspace's working directory with cnvrg. The workspace also periodically syncs according to the Sync time set in the project Settings > Environment tab.
    • More: Click to access the Upload (force) control, which runs the command cnvrg upload --force inside the workspace.

    # Workspace sidebar

    The workspace sidebar controls enable users to add datasets, monitor running experiments, use AI libraries, view information and logs, and use tools like SparkUI and TensorBoard.

    The following workspace icons display in the right sidebar:

    # Syncing

    Because the workspace runs on a remote compute, the workspace must be synced to commit file changes back to cnvrg.

    Enable auto-sync in your project settings to automatically sync the workspace after a specified interval has passed.

    To sync on demand, click the Sync button that displays above the workspace.

    # Datasets on a live workspace

    Click the Datasets icon in the right sidebar to control the datasets attached to your workspaces or attach new ones.

    The displayed page contains a list of the datasets currently attached. For each dataset, cnvrg displays the name, size, and file count.

    You can also choose a specific query or commit of the datasets.

    When you choose to include a dataset within a workspace (whether at start-up or on-the-fly using the sidebar), the dataset is mounted in the remote compute and is accessible from the absolute path: /data/<name_of_dataset>/.

    For example, if you include a dataset hotdogs in the workspace, the dataset and all its files can be found in the /data/hotdogs/ directory.

    If the dataset is still cloning, the cloning symbol displays next to the dataset name.

    # Attach new datasets to the workspace

    1. Click Select Datasets To Attach.
    2. For each dataset to add to the workspace, select the commit or query and the portion of the dataset to clone.
    3. Click the dataset to add it to the list of datasets for the workspace.
      You can remove a dataset from the list by clicking the X next to its name.
    4. Click Attach.

    The selected datasets begin cloning to the workspace. You can track their statuses from the datasets panel where you attached them.

    TIP

    Any datasets added on-the-fly are located at the absolute path: /data/name_of_dataset/.

    # Sync a dataset (create a new commit)

    To sync a dataset connected to the workspace, click the Sync button next to the dataset's name in the datasets tab.

    You can optionally include a commit message. Clicking Sync uploads any changes made to the dataset in the workspace to the remote dataset as a new commit.

    # AI Library components in a workspace

    You can use all of your available AI Library components from within a workspace.

    Complete the following steps to access and use an AI Library component:

    1. Click the Workspaces tab.
    2. In the right sidebar, click the Libraries icon.
    3. Select the library to use.
    4. Copy the code.
    5. Paste it in your workspace and run it.

    # Active experiments viewing from a workspace

    Complete the following steps to view experiments running from the workspace:

    1. Click the Workspaces tab.
    2. In the right sidebar, click the Experiments icon.

    Experiments currently running from the workspace are displayed here, where you can view their status and click through to their pages.

    # Apps and Dashboards publication from a workspace

    Complete the following steps to publish an app from a workspace:

    1. Click the Workspaces tab.
    2. In the right sidebar, click the Apps icon.
    3. Click Publish Apps.
    4. Follow the instructions here.

    # Workspace Info

    Click the Info icon to access the Workspace Info page and view a workspace's details. Accessing a closed workspace also shows the Workspace Info details.

    Along the top of the page are summary details of the workspace session:

    • Start time
    • End time
    • Duration
    • Start commit
    • Compute
    • Image
    • Volume

    Below these details are the workspace logs and underneath the logs is a list of associated commits for the workspace.

    # Remote SSH

    You can use SSH to connect with a cnvrg workspace and code locally from your machine and IDE. Most major IDEs support this remote SSH feature, including Visual Studio Code and PyCharm (excluding community edition).

    To open a remote SSH session, run the cnvrg ssh start command on your local workstation. The command connects to the selected running workspace and keeps the SSH session live until you actively interrupt it. It also identifies the host, port, username, and password required to add the workspace as an SSH host in your IDE.

    # Requirements

    The following is required to remotely SSH to your cnvrg workspace:

    TIP

    ssh-server does not need to be installed in the Docker image, as cnvrg configures this automatically.

    # Remote SSH instructions

    Complete the following steps to remotely SSH to your cnvrg workspace:

    1. Click your cnvrg project's Workspaces tab and launch any workspace type. Wait for the setup to complete.
    2. On your local machine, run the command cnvrg ssh start workspace_id in a terminal session. You can also configure this command as required.
    3. Wait for ssh to start. When it has started, the command identifies the host, port, username, and password for the SSH session.
    4. Add the SSH details as a host within your selected IDE. The default host is: ssh root@localhost -p 2222. Be sure to include the port.
    5. Connect to the newly added SSH host from your IDE. Enter the password when prompted.

    You are now connected to the workspace remotely and can code on your local machine while leveraging your workspace's remote compute. You can access all the files and contents from the remote container. When you are finished working, stop the session by ending the cnvrg ssh start command.

    NOTE

    To disable the host key checking for ssh by default, add the following lines in your ssh config file (usually located at ~/.ssh/config):

    Host localhost
      StrictHostKeyChecking no
    

    To disable it for all hosts, change localhost to *, such that:

    Host *
      StrictHostKeyChecking no
    

    TIP

    Check the cnvrg tutorials for step-by-step guides for Visual Studio Code and PyCharm.
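Many IDEs read SSH hosts from ~/.ssh/config, so the default connection details listed above (root@localhost on port 2222) can be written as a named host entry. The following sketch only generates the text of such an entry. The alias `cnvrg-workspace` and the function name are assumptions made for illustration, while `HostName`, `Port`, `User`, and `StrictHostKeyChecking` are standard OpenSSH configuration keywords. Use the values that cnvrg ssh start actually reports.

```python
def ssh_config_entry(
    alias: str,
    host: str = "localhost",
    port: int = 2222,
    user: str = "root",
    strict_host_key_checking: bool = False,
) -> str:
    """Build an ssh_config Host block for a workspace SSH session."""
    lines = [
        f"Host {alias}",
        f"  HostName {host}",
        f"  Port {port}",
        f"  User {user}",
    ]
    if not strict_host_key_checking:
        # Same setting as in the note on disabling host key checking.
        lines.append("  StrictHostKeyChecking no")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the entry so it can be reviewed before adding it to ~/.ssh/config.
    print(ssh_config_entry("cnvrg-workspace"))
```

Printing rather than writing the file keeps the sketch side-effect free. Review the output before appending it to your own configuration.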

    # Workspace experiment tracking

    A workspace provides the ability to run experiments and train models on-the-fly. You can run your scripts inside your workspace, but still access the cnvrg experiment tracking and visualization features.

    To run an experiment locally through the SDK in a remote workspace with cnvrg tracking, run the experiment using Experiment.run(). The experiment is recorded in the Experiments workspace sidebar. Track its progress either in the workspace or from the cnvrg Experiments page.

    Use the Experiment.run() method as follows:

    from cnvrg import Experiment
    e=Experiment.run(command_or_function, title="Local Workspace Experiment")
    

    For example, to run a Python 3 training script:

    from cnvrg import Experiment
    e=Experiment.run('python3 train.py')
    

    Or to run a function:

    from cnvrg import Experiment
    e=Experiment.run(train(input))
    

    NOTE

    SDK commands can also be used to run experiments outside of a cnvrg workspace.

    Learn more about using the Python SDK in the SDK docs.

    # Git Integration

    Workspaces make it easy to modify code, run experiments, and review their results. While rapid experimentation is important, saving the code from a specific cnvrg job is essential for trackable and repeatable experiments. Although you can push to Git before each experiment, you can avoid this step by using the git_diff parameter in your Experiment.run() command.

    When you set git_diff=True, cnvrg also syncs the Git difference and saves the changes relative to your experiment's Git index. Then, after modifying your code and running experiments, you can simply compare the experiments, identify the most successful one, clone the cnvrg commit, and then push back the changes to your Git repository.

    The following provides example experiment runs:

    • Run an experiment on remote compute using a script:

      from cnvrg import Experiment
      e=Experiment.run('python3 train.py', compute='medium', git_diff=True)
      
    • Run an experiment directly in your cnvrg workspace or local IDE (local Visual Studio Code/PyCharm):

      from cnvrg import Experiment
      e=Experiment.run(train(), git_diff=True)
      
    • Run an experiment with the CLI:

      cnvrg run --git_diff=True python3 train.py
      

    # Idle Workspaces

    When working in cnvrg Workspaces, users may leave them online for an extended time period without use, for example, overnight. For this reason, cnvrg includes the option to automatically designate a workspace as idle, and thereafter, start a shutdown process. An idle workspace is defined by the following two conditions:

    • No changes in the file system under the folders: /cnvrg and /data
    • Usages of CPU, memory, and GPU (if one exists) are each lower than 20%

    After these conditions hold for more than the set number of minutes (X), cnvrg automatically starts to shut down the workspace.

    NOTE

    The above two conditions must be met before a workspace enters an idle state.
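The two idle conditions combine with a logical AND, which the following sketch makes explicit. It is an illustration of the rule as documented, not cnvrg's monitoring code: `is_idle` and its parameters are hypothetical, and usage values are percentages.

```python
from typing import Optional

IDLE_USAGE_THRESHOLD = 20.0  # percent, as stated in the documentation


def is_idle(
    files_changed: bool,
    cpu_pct: float,
    memory_pct: float,
    gpu_pct: Optional[float] = None,
    threshold: float = IDLE_USAGE_THRESHOLD,
) -> bool:
    """Return True only when both idle conditions hold.

    files_changed: whether anything under /cnvrg or /data changed.
    gpu_pct: None when the workspace has no GPU.
    """
    usages = [cpu_pct, memory_pct]
    if gpu_pct is not None:
        usages.append(gpu_pct)
    return (not files_changed) and all(u < threshold for u in usages)
```

Note that a single resource at or above the threshold, or any file change, is enough to keep the workspace from being treated as idle.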

    # Idle workspace settings

    The number of elapsed minutes before cnvrg designates a workspace as idle is defined in the organization settings. By default, it is set to 60 minutes. To enable idle workspace functionality, an administrator enables the Idle flag in the organization settings and sets the number of minutes for the idle time check. To change the idle time wait period for a project, go to the project Settings > Environment tab.

    NOTE

    A user can only decrease the Idle value that is set within organization settings. If a user needs to increase this value, an administrator can set it in the project settings.

    # Shutdown process

    Once a workspace is defined as idle, cnvrg notifies the user by sending an email stating the workspace entered an idle state. The user can still access the workspace for the next 15 minutes. After the 15-minute time period elapses, cnvrg performs the following actions to start the shutdown process:

    • Syncs any changes in the /cnvrg directory
    • Builds an image (if enabled previously)
    • Retains the volume attached to the workspace and eventually releases the resource behind it
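The timing of the shutdown process can be worked out with simple date arithmetic. The sketch below assumes one reading of the text: the notification email is sent once the idle conditions have held for the configured number of minutes (60 by default), and the shutdown starts 15 minutes after that. The function name and this interpretation are assumptions, not confirmed cnvrg behavior.

```python
from datetime import datetime, timedelta
from typing import Tuple

DEFAULT_IDLE_MINUTES = 60  # organization default stated in the documentation
GRACE_MINUTES = 15         # access window after the idle notification


def shutdown_schedule(
    idle_since: datetime,
    idle_minutes: int = DEFAULT_IDLE_MINUTES,
) -> Tuple[datetime, datetime]:
    """Return (notification time, shutdown start time) for an idle workspace."""
    notified_at = idle_since + timedelta(minutes=idle_minutes)
    shutdown_at = notified_at + timedelta(minutes=GRACE_MINUTES)
    return notified_at, shutdown_at
```

Under these assumptions, a workspace that goes quiet at 20:00 with the default setting would be flagged at 21:00 and would begin shutting down at 21:15.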
    Last Updated: 6/21/2022, 9:55:41 PM