# Experiments

cnvrg allows you to run experiments.

An experiment can be any executable, written in any language: Python (for example, python neuralnet.py), R, Java, Scala and more. It can also be an existing Jupyter notebook.

You can run an experiment on any remote compute (cloud or on-premises) simply and efficiently by using the cnvrg web UI, cnvrg CLI and cnvrg SDK, or through API calls.

When running an experiment, cnvrg automatically takes a snapshot of your code, launches a worker, installs all dependencies (docker-based) and runs your experiment. cnvrg frees up resources for other jobs once the experiment is over.

cnvrg also provides the option to stop, delete and rerun experiments. When running an experiment in the cloud, cnvrg automatically launches an instance, runs the experiment and shuts down the instance when it's over.

When an experiment is running, cnvrg lets you track live exactly what occurs during the session. All standard output is visible to you, and resource usage (GPU, CPU, RAM, Disk IO) is periodically checked and reported to you.

# Running Experiments

There are several ways of running experiments:

  • Through the web UI
  • Using the cnvrg CLI
  • Using the cnvrg SDK

# Through the web UI

To run an experiment via the web UI, go to your project, click the Experiments tab and click New Experiment.

Run Experiment

The run page is divided into sections grouped by category. You can open each section to fill in the details you require.
You can also click Open All to expand all the sections.

When you have filled in the details, click Run and cnvrg starts running the experiment according to the choices you have made.

Set a Default Template

You can also save a run template for each project. First fill in the details on the run page that you want to set as default. Afterwards, click Save as default template and confirm your decision. The details you have selected will now be loaded automatically whenever you go to run a new experiment.

# General

General

  • Command to Execute: Write the command the remote compute will run for the experiment (for example, python3 train.py).
  • Working Directory: You can customize the working directory in which the command will be executed by clicking Change. For example, if the script you wish to execute is in a sub-folder named scripts/, you can set /cnvrg/scripts/ as the working directory. The default can be set in project settings.
  • Title: You can provide your experiment with a title. By default, the experiments are named Experiment # (# is the number of experiments run before, plus 1).

TIP

You can dynamically set the title of the experiment using the cnvrg Research Assistant.

# Parameters

Parameters

Here, you can add the parameters you want to use when running a grid search. Add all the parameters you want and cnvrg will dynamically create experiments for all the permutations of parameters.

  1. Click Add to add a parameter.

  2. For each parameter, enter a key and provide values in the fields, as required:

    There are four types of parameters:

    • Float: Scale between float values. Specify a minimum and maximum value and number of steps, in addition to selecting the desired scale (Linear, Log 2, or Log 10). For example, say the key you chose is learning_rate, values might be: Min=0, Max=1, Scale=Linear and Steps=10.
    • Discrete: Comma-separated integers. This type is useful for designating discrete values. For example, say the key you chose is epochs, values might be: 10,20,100.
    • Categorical: Comma-separated strings. You specify a list of values (for example, 1,5,9,13) and cnvrg wraps each value in quotation marks. For example, if the key you chose is optimizer, the values might be: Adam,sgd.
    • Integer: Scale between integer values. Specify a minimum and maximum value and number of steps, in addition to selecting the desired scale (Linear, Log 2, or Log 10). For example, say the key you chose is folds, values might be: Min=1, Max=1000, Scale=Log10 and Steps=3.

cnvrg displays the number of experiments to be run in the top right corner of the panel.
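
The way permutations multiply can be sketched in plain Python. This is only an illustration and not cnvrg's own code: the `grid` dictionary and the `permutations` helper are hypothetical names, and the values mirror the Discrete and Categorical examples above.

```python
from itertools import product

# Hypothetical grid; keys and values mirror the examples above.
grid = {
    "epochs": [10, 20, 100],       # Discrete
    "optimizer": ["Adam", "sgd"],  # Categorical
}

def permutations(grid):
    """Return one dict per combination of parameter values."""
    keys = list(grid)
    value_lists = [grid[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]

combos = permutations(grid)
print(len(combos))  # 3 values * 2 values = 6 experiments
```

Each additional parameter multiplies the total, so a third parameter with 10 steps would turn these 6 experiments into 60.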

TIP

Information on how to handle the inputs from this feature can be found here.

# Datasets

Datasets

In this menu, you can select one or more datasets to be mounted on the remote compute for this experiment. You can also select specific commits or queries on any of your datasets.

The dataset will be accessible at the absolute path: /data/name_of_dataset/.

TIP

More info on using datasets can be found here.

# Git integration

NOTE

The following section is relevant only if you have integrated with Git.

Git

When your project is integrated with a Git repo, you can choose the git branch and git commit of your choice.

  • Git Branch: The name of the git branch to clone (default is master).
  • Git Commit: The name of the specific commit you want to clone (default is latest).
  • cnvrg.io Commit: You can load an output directory of a previous job and your new experiment will start at that point and have access to the files from that cnvrg commit (that is, its artifacts).
  • Output Folder: Even when connected to Git, cnvrg will handle the versioning of experiment artifacts. In order for cnvrg to locate files produced by the experiment, you must set the correct directory or path here.

WARNING

Make sure the Output Folder matches the location where your code saves its changes and files. Otherwise, they will not be synced and versioned by cnvrg.

# Environment

Environment

Use this menu to choose the settings that will create the environment for the running of the experiment.

  • Compute - Choose one or more compute templates you would like your experiment to run on. If the first compute is unavailable, the experiment attempts to run on the next compute you have selected. If you only select one compute template, the experiment waits until it is available to run.
  • Image - Choose the docker image to serve as the basis for your experiment.

TIP

More info on setting up your environment can be found here.

# Scheduling

Scheduling

Here you schedule your experiment to run at a specific time or on a recurring basis.

There are three options:

  • Now
  • Schedule: Run once at a specific date and time of your choice.
  • Recurring: Run this experiment repeatedly based on a timing rule.

# More

More

In this final menu, there are a few extra options you can set:

  • AutoSync: When turned on, cnvrg will commit the experiment's artifacts every 60 minutes. (Note: the time can be configured in your project's settings.)

  • Restart when idle: When turned on, your experiment will automatically restart if logs are not printed for more than 60 minutes.

  • Email Notifications: There are two options for email notifications when running experiments:

    • On Success: If you enable this option, you will receive an email when the experiment ends successfully.
    • On Error: If you enable this option, you will receive an email if an error occurs while your experiment is running.

    You can also enable email notifications with the SDK and CLI. See the next sections.

    TIP

    You can set a default behavior for email notifications in your project settings. Go to Settings > Environment > Email notifications and toggle on notifications for success or error.

# Using the cnvrg CLI

Experiments can be run locally on your machine, or on one of the cnvrg remote machines.

To run an experiment with the cnvrg CLI, go to your project directory, create a project and simply add the cnvrg run prefix to your command as follows:

cnvrg run python myscript.py

You will immediately receive a link to track the progress of your experiment via the web.

TIP

If you wish to run the experiment locally first, use the --local flag (or simply, -l).

cnvrg uses your default organization or user settings for experiments. You can override these on a per-experiment basis by using the different flags.

To enable email notifications with the CLI, use the --notify_on_success and --notify_on_error flags.

cnvrg run --notify_on_success --notify_on_error python3 train.py

There is a large list of flags to control your experiment's properties. You can choose any combination of flags for your needs. If you don't use any specific flags, cnvrg will use the default settings.

For example:

cnvrg run  --datasets='[{id:d3-new,commit:dd7c1a3d9d5627da9aea5415e3d07202bfb5925e}, {id:d4,commit:3028f51407d83338f72f994bc283572452a877de}]' --git_branch=master --git_commit=0a787d11e1e2f01b15ac63599ebed3777f21dab8 --output_dir=output --machine="gpu" --sync_before=false python3 train.py

For more information about the flags available with the run command, see Experiments.

# Using the cnvrg SDK

You can run experiments in cnvrg using the cnvrg SDK. To run a new experiment, use the Experiment.run() method:

from cnvrg import Experiment
e=Experiment.run('python3 train.py', title="SDK Experiment", compute='medium')

The code will start running an experiment on a remote compute and sync the status with cnvrg.

To enable email notifications when running an experiment, use the notify_on_success and notify_on_error parameters in Experiment.run().

e = Experiment.run('python3 train.py', notify_on_success=True, notify_on_error=True)

There are many different arguments you can use in the command. The cnvrg SDK documentation has information on all of the options.

# Use Cases

As a cnvrg experiment can be any form of code, there are many different types of experiments that can be run within cnvrg.

# Preprocessing datasets

You can run processing code as an experiment in cnvrg. Ensure that you select the relevant dataset to attach to the experiment and set the command to execute (for example, Rscript preprocess.R).

If you want to update the cnvrg dataset with the processed data, ensure that you run the cnvrg data sync CLI command as part of your code, in the dataset directory. Information on where your dataset is located in the remote compute is here.
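
If your processing code is written in Python, one way to run the sync step from inside the script is sketched below. The `sync_dataset` helper and its `runner` parameter are hypothetical names introduced for this sketch; only the `cnvrg data sync` command itself comes from the text above, and the example path assumes a dataset named hotdogs.

```python
import subprocess

def sync_dataset(dataset_dir, runner=subprocess.run):
    """Run `cnvrg data sync` from inside the dataset directory.

    `runner` defaults to subprocess.run and can be swapped for a stub
    when the cnvrg CLI is not installed (for example, in tests).
    """
    # check=True raises CalledProcessError if the CLI exits with an error.
    return runner(["cnvrg", "data", "sync"], cwd=dataset_dir, check=True)

# Example call at the end of a preprocessing script (assumed path):
# sync_dataset("/data/hotdogs")
```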

# Training machine learning models

You can train a machine learning model using a cnvrg experiment. cnvrg is completely code and package agnostic, so if you want to train an NLP Tensorflow model using Python, train a Recurrent Network based on Keras in R, or train your own custom modeling code, you can!

To train your model set your command to execute exactly as you would on your machine (for example, python3 train.py).

cnvrg's default Docker images support most standard use cases and are updated regularly. Learn more about using your own custom Docker image to support your specific use case here.

TIP

Don't forget to use grid searches and experiment visualization to make the most of training models within cnvrg experiments.

# Running Jupyter notebooks as experiments

You can run existing Jupyter notebooks as experiments by using the jupyter nbconvert command. The command automatically runs the code inside the notebook.

Use the following syntax:

jupyter nbconvert --to notebook --execute "notebook_path/some_notebook.ipynb"

NOTE

The output will be rendered inside an updated Jupyter notebook file.

# Using Datasets

When starting an experiment you have the option of including one or more datasets. Choose the datasets you want to use when starting the experiment. You can also choose a specific query or commit of the dataset at this point.

When you choose to include a dataset with an experiment, the dataset will be mounted in the remote compute and accessible from the absolute path: /data/name_of_dataset/.

For example, if you include your dataset hotdogs, when running an experiment, the dataset can be found at /data/hotdogs/ and all of the dataset files will be found in that directory.

TIP

It can be useful to parameterize the dataset slug and make use of the hyperparameter parsing features of cnvrg. Doing so can help avoid hard-coding the dataset's slug and is helpful in case you want to run on a different dataset or multiple datasets.
More information about parameters is here.
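
A minimal sketch of that idea follows. The `--dataset` flag, the `dataset_dir` helper and the default slug are assumptions made for illustration; the only detail taken from this page is the /data/name_of_dataset/ mount path.

```python
import argparse
import os
from pathlib import PurePosixPath

def dataset_dir(slug, root="/data"):
    """Return the mount path for a dataset slug, e.g. /data/hotdogs."""
    return str(PurePosixPath(root) / slug)

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="dataset-aware experiment")
    # --dataset is a hypothetical flag name chosen for this sketch.
    parser.add_argument("--dataset", default="hotdogs")
    return parser.parse_args(argv)

# In a real script, call parse_args() with no arguments to read sys.argv.
args = parse_args(["--dataset", "hotdogs"])
data_dir = dataset_dir(args.dataset)
print(data_dir)  # /data/hotdogs

# List the files only if the dataset is actually mounted on this machine.
if os.path.isdir(data_dir):
    for name in sorted(os.listdir(data_dir)):
        print(name)
```

With this in place, switching datasets only requires changing the value passed to the flag, not the code.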

# Tracking and Visualizing

cnvrg is designed to make tracking and visualizing your experiments as intuitive as possible. Tracked metrics and visualizations let you sort, search and compare experiments easily, enabling deep comparison between experiments and models.

The cnvrg Research Assistant automatically extracts parameters, metadata and model performance, creating tags and beautiful charts from experiments. This happens automatically with no activation needed and the Research Assistant knows how to work with scikit-learn, Keras, TensorFlow, Caffe, and more!

You can customize the tracking of your experiments in two ways:

  • You can use print statements to Standard Output to track any other parameters of your choice, or create your own custom charts.
  • You can completely customize your tracking using the Python SDK.

# Log a metric or parameter as a tag


Example Metrics

You can take any piece of information and extract that as a metric. This means that it will become a tag for your experiment and can then be used for sorting, searching and comparing.

  • Log Parsing

    Print to Standard Output: "cnvrg_tag_key: value"

    For example, if you printed cnvrg_tag_precision: 0.9, the Research Assistant automatically transforms it to a tag named precision with a value of 0.9.

  • Python SDK

    Use the log_param method to track custom metrics and parameters.

    from cnvrg import Experiment
    e = Experiment()
    e.log_param("key", "value")
    

    For example, e.log_param("precision", "0.9") would create a tag named precision with value of 0.9.
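
For the log parsing route, a small helper can keep the printed format consistent. This is a sketch: `tag_line` is a hypothetical helper name, and it only reproduces the "cnvrg_tag_key: value" format described above.

```python
def tag_line(key, value):
    """Format a key/value pair in the cnvrg_tag_ log format."""
    return "cnvrg_tag_{}: {}".format(key, value)

# Printing the line to standard output is what makes it visible in the logs.
print(tag_line("precision", 0.9))  # prints: cnvrg_tag_precision: 0.9
```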

# Create a line chart


Example Line Chart

You can create a single or multi-line line chart to track any variable as it changes throughout the running of your experiment. This can be really useful for tracking metrics like accuracy or loss as it develops in the creation of your model.

  • Log Parsing

    There are 2 formats for creating line charts. Using any of the print statements will either create a new graph or add to an already existing graph with the same name.

    1. Name, Value:

      Print to Standard Output: "cnvrg_linechart_Name value: Value"

      For example, if you printed "cnvrg_linechart_computingPie value: 3.14", you would create a graph titled "computingPie" (or add to the graph if it already exists), and add a data point with value "3.14".

    2. Name, Group, Value:

      Print to Standard Output: "cnvrg_linechart_Name group: "Group_name" value: Value"

      For example, if you printed "cnvrg_linechart_computingPie group: "pi_val" value: 3.14", you would create a graph titled "computingPie" (or add to the graph if it already exists), and add a data point with value "3.14" and a line labelled "pi_val". You can use this format to add multiple lines to one graph.

    TIP

    You can use this format to quickly create the correct format in Python using variables:

    print("cnvrg_linechart_{} value: '{}'\n".format(chart_name, value))
    
  • Python SDK

    Use the log_metric() method to create single line and multi-line charts.

    from cnvrg import Experiment
    e = Experiment()
    e.log_metric("chart_title", Ys=[val_1,val_2,val_3])
    

    For example, e.log_metric("accuracy", Ys=[0.6,0.64,0.69]) would create a chart named "accuracy" with 3 data points: (0,0.6), (1,0.64) and (2,0.69).

    For more information on how to create line charts or multi-line charts with the Python SDK, consult the full SDK documentation.
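
The two log parsing formats can be wrapped in one helper and called from a training loop. This is a sketch under assumptions: `linechart_line` is a hypothetical helper name, the loss values are made up, and the output follows the unquoted-value formats listed above.

```python
def linechart_line(name, value, group=None):
    """Format a data point in the cnvrg_linechart_ log format.

    Without a group, the Name/Value format is used; with a group,
    the Name/Group/Value format is used.
    """
    if group is None:
        return "cnvrg_linechart_{} value: {}".format(name, value)
    return 'cnvrg_linechart_{} group: "{}" value: {}'.format(name, group, value)

# Made-up loss values standing in for a real training loop.
losses = [0.9, 0.5, 0.3]
for loss in losses:
    print(linechart_line("loss", loss, group="train"))
```

Calling the helper again with a different group (for example, "validation") would add a second line to the same chart.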

# Create a confusion matrix/heatmap


Example heat map

You can create confusion matrices and heatmaps easily using the Python SDK.

  • Python SDK

    from cnvrg import Experiment
    from cnvrg.charts import MatrixHeatmap
    e = Experiment()
    e.log_chart(key="chart_key", title="chart_title", group=None, step=None, x_ticks=['x', 'y', 'z'], y_ticks=['a', 'b', 'c', 'd'],
        data=MatrixHeatmap(matrix=[(0,0,0), (9,9,9), (1,0,1), (1,1,1)],
                           color_stops=[[0,'#000000'],[1, '#7EB4EB']],
                           min_val=0,
                           max_val=10))
    

    Typing information: x_ticks and y_ticks must be lists, and matrix is a list of tuples in the form (x,y,z). color_stops is optional and is a list of lists with size 2, where the nested first value is a float 0 <= X <= 1, and the second value is the hex value for the color to represent matrix values at that point of the scale. min_val and max_val are optional and should be numbers corresponding to the minimum and maximum values for the key (scaling is done automatically when these values are not submitted).

    Each struct corresponds to a row in the matrix and to a label from the y_ticks list. The matrix is built from the bottom up, with the first struct and y_tick at the bottom edge. Each value inside the struct corresponds to each x_tick.

    Steps and groups:

    Using steps and groups allows you to submit heatmaps across different steps and visualize them in a single chart with a slider to easily move between the steps. step should be an integer and group should be a string. Multiple steps should be grouped with a single group.

    Animated Heatmap

    for i in range(10):
      e.log_chart(key="MyChart_" + str(i), group="group-1", step=i, title="MyChart_" + str(i),
      x_ticks=['x', 'y', 'z'], 
      y_ticks=['a', 'b', 'c', 'd'],
      data=MatrixHeatmap(matrix=[(0,0,0), (9,9,i), (i,0,1), (1,i,1)],
      min_val=0, max_val=10))
    

    TIP

    When using the group parameter, make sure the chart's key is unique across the different steps.

    For more information on how to create confusion matrices with the Python SDK, consult the full SDK documentation.
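
The matrix argument for a confusion matrix can be prepared in plain Python before it is passed to MatrixHeatmap. The sketch below is an illustration under assumptions: `confusion_rows` is a hypothetical helper, the labels and predictions are made up, and it produces one tuple per actual label (one per y_tick) with one count per predicted label (one per x_tick).

```python
from collections import Counter

def confusion_rows(actual, predicted, labels):
    """Return one tuple per actual label, holding counts per predicted label."""
    counts = Counter(zip(actual, predicted))
    # Counter returns 0 for pairs that never occurred.
    return [tuple(counts[(a, p)] for p in labels) for a in labels]

# Made-up labels and predictions for illustration.
labels = ["cat", "dog"]
actual = ["cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "dog", "dog", "dog", "cat"]

matrix = confusion_rows(actual, predicted, labels)
print(matrix)  # [(1, 1), (1, 2)]
```

The result could then be passed as matrix=matrix with x_ticks=labels and y_ticks=labels. As described above, the first tuple is drawn at the bottom edge of the chart.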

# Create a bar graph


Example bar chart

You can create single bar and multi-bar graphs using the Python SDK.

  • Python SDK

    from cnvrg import Experiment
    from cnvrg.charts import Bar
    e = Experiment()
    x_value=["bar1","bar2","bar3","bar4","bar5"]
    y_value1=[1,2,3,4,5]
    y_value2=[5,4,3,2,1]    
    e.log_chart("chart_key", title="chart_title", group=None, step=None,
        data=[Bar(x=x_value, y=y_value1, name="y_value1", min_val=0, max_val=10),
              Bar(x=x_value, y=y_value2, name="y_value2")])
    

    Typing information: x must be a List and y must be an Array, np.ndarray, pd.array or pd.Series.

    The x list will populate the labels for the bars, and the corresponding y value will dictate the value of the bar for that category. The name of the y array will be the name of the set/category in the graph. min_val and max_val are optional for each Bar and should be numbers corresponding to the minimum and maximum values for the key (scaling is done automatically when these values are not submitted).

    Steps and groups:

    Using steps and groups allows you to submit bar charts across different steps and visualize them in a single chart with a slider to easily move between the steps. step should be an integer and group should be a string. Multiple steps should be grouped with a single group.

    Animated Barchart

    for i in range(10):
      e.log_chart(key="MyChart" + str(i), group="group-1", step=i, title="MyChart",
          data=[Bar(x=["cat1", "cat2", "cat3"], y=[1**1, 2/(i+1), 3*i], name="Bar1"),
                Bar(x=["cat1", "cat2", "cat3"], y=[2**1, 3/(i+1), 4*i], name="Bar2")])
    

    TIP

    When using the group parameter, make sure the chart's key is unique across the different steps.

    For more information on how to create bar graphs with the Python SDK, consult the full SDK documentation.

# Create a scatter plot


Example scatter plot

You can create scatter plots using the Python SDK.

  • Python SDK

    from cnvrg import Experiment
    from cnvrg.charts import Scatterplot
    e=Experiment()
    x1_values=[1,2,3,4,5]
    x2_values=[1,2,3,4,5]
    y1_values=[5,4,3,2,1]  
    y2_values=[1,2,3,4,5] 
    e.log_chart("chart_key", title="chart_title",
    data=[Scatterplot(x=x1_values, y=y1_values, name="name"),
          Scatterplot(x=x2_values, y=y2_values, name="name2")])
    

    Typing information: x and y must be an Array, np.ndarray, pd.array or pd.Series. x is the list of x values and y is the list of y values.

    For more information on how to create scatter graphs with the Python SDK, consult the full SDK documentation.

# Change the title of an experiment


You can easily change the name of an experiment from within the experiment. You can include variables or any string in the title of your experiment. One good example of how to use this feature is when running a grid search: you can use the parameters to name the experiment. Doing so allows you to easily mark which parameters were tested in the specific experiment.

  • Log Parsing

    Print to Standard Output: "cnvrg_experiment_title: experiment_title"

    For example, if you printed cnvrg_experiment_title: "New Name", the experiment will be renamed to "New Name".

  • Python SDK

    from cnvrg import Experiment
    e = Experiment()
    e.title = "experiment_title"
    

    For example, e.title = "New Name" would change the name of the experiment to "New Name"
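
For the grid search case, the title can be assembled from the parameter values and printed in the log parsing format. This is a sketch: `title_line` is a hypothetical helper, the parameter values are made up, and the title is wrapped in double quotes to mirror the "New Name" example above.

```python
def title_line(params):
    """Build a cnvrg_experiment_title line from a dict of parameters."""
    # Sort the keys so the same parameters always give the same title.
    parts = ["{}={}".format(key, params[key]) for key in sorted(params)]
    return 'cnvrg_experiment_title: "{}"'.format(" ".join(parts))

print(title_line({"epochs": 10, "learning_rate": 0.01}))
# prints: cnvrg_experiment_title: "epochs=10 learning_rate=0.01"
```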

# Hyperparameter Optimization and Grid Searches

A grid search is the process of choosing a set of parameters for a learning algorithm, usually with the goal of optimizing a measure of the algorithm's performance on an independent dataset.

With cnvrg you can fire up multiple experiments in parallel with a single command. This feature is very useful for grid searches and hyperparameter optimization.

There are two ways of triggering grid searches in cnvrg:

  • Using the UI
  • Using the CLI and a YAML file

# Using the UI

To run a grid search using the UI, navigate to the Experiments tab of your project and click New Experiment.

On the next page you will have access to the Parameters menu. Click it to expand the panel. In this box you can add the parameters you would like to tune for.

Parameters UI

Add another parameter by clicking Add.

TIP

More info on using this menu can be found here.

When you click Run, cnvrg will automatically spin up an experiment for each permutation of the parameters you have included.

# Using the CLI

To run a grid search using the CLI, you first need to create a YAML file that contains information about the parameters and their possible values.

In the YAML file, each parameter has a param_name that should match the argument that is fed to the experiment.

To trigger the grid search, use this command:

cnvrg run --grid=src/hyper.yaml python myscript.py

Below is an example YAML file, hyper.yaml:

parameters:
    - param_name: "learning_rate"
      type: "discrete"
      values: [0.1, 0.01 ,0.001]

    - param_name: "kernel"
      type: "categorical"
      values: ["linear", "rbf"]

    - param_name: "epochs"
      type: "integer"
      min: 10 # inclusive
      max: 200 # not inclusive
      scale: "linear"
      steps: 10 # The number of linear steps to produce.

    - param_name: "fractions"
      type: "float"
      min: 0.1 # inclusive
      max: 0.2 # not inclusive
      scale: "log2"
      steps: 10 # The number of linear steps to produce.

TIP

The scale can be linear, log2 or log10.

# Using this feature in your code

When you run a grid search, the parameters are included as flags on the command line argument that runs the code in the remote compute.

For example, when tuning your epochs, the command executed on the remote compute is: python3 myscript.py --epochs 10
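
How one parameter combination turns into such a command line can be sketched as follows. This is an illustration only, not cnvrg's implementation: `build_command` is a hypothetical helper, and it assumes each parameter is passed as `--name value`, as in the example above.

```python
import shlex

def build_command(base_command, params):
    """Append each parameter as a `--name value` pair to the base command."""
    args = shlex.split(base_command)
    for name, value in params.items():
        args += ["--" + name, str(value)]
    # Quote each argument so values containing spaces survive the shell.
    return " ".join(shlex.quote(arg) for arg in args)

print(build_command("python3 myscript.py", {"epochs": 10}))
# prints: python3 myscript.py --epochs 10
```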

In your code, use an open source library to parse this input and use it as a variable. We recommend:

  • argparse for Python scripts.
  • optparse for R code.

Here are example implementations of both:

Python Implementation Example

Below you can find a Python snippet that uses argparse library to parse input values from command line:

import argparse

parser = argparse.ArgumentParser(description='set input arguments')

parser.add_argument('--epochs', action="store", dest='epochs', type=int, default=10)
parser.add_argument('--learning_rate', action="store", dest='learning_rate', type=float, default=0.0001)

args = parser.parse_args()

epochs        = args.epochs
learning_rate = args.learning_rate

R Implementation Example

Below you can find an R snippet that uses optparse library to parse input values from command line:

 install.packages('optparse', dependencies = TRUE)


 library("optparse")

 option_list = list(
         make_option(c("--epochs"), type="integer",     default=10,help="number of epochs to perform",metavar="number"),
         make_option(c("--learning_rate"), type="double", default=0.001,help="learning rate values",metavar="number"));

 opt_parser = OptionParser(option_list=option_list);
 opt = parse_args(opt_parser);

 #sample printing value

 sprintf("Number of epochs is: ( %s )", opt$epochs)

# Experiment Artifacts

Many machine learning experiments, jobs or even notebook sessions produce artifacts such as models, images, plots and more.

cnvrg makes it really easy for data scientists and developers to explore an experiment's artifacts, and have them versioned, stored and accessible via the job's artifacts section.

  • Artifacts are synced in one of the following ways:
    • on manual sync
    • autosync
    • when experiment finishes

To view an experiment's artifacts, scroll down below the log, and you'll see the following section:

experiment artifacts

As you can see, in this case cnvrg periodically syncs checkpoints from the experiment and adds them to the experiment page with different commits.

This feature allows you to manage all your files easily as they are all associated with a specific experiment. The experiment's code is also versioned and so is the dataset. This means that you can easily reproduce the dataset and code that produced the artifacts, allowing you to always keep track of your work and manage your models effectively.

TIP

You can merge experiment artifacts to the master branch by clicking Merge.

# Comparing and Organizing

Understanding your experiments both at-a-glance and in-depth is essential in any data science workflow. cnvrg has features that help you get insights into your experiments quickly and easily.

From the Experiments tab of your project, you can:

  • Perform quick operations on experiments.
  • Customize and sort the experiments table.
  • Search through your experiments.
  • Compare your experiments' cnvrg visualizations and metadata.
  • Compare experiments using TensorBoard.

# Perform quick operations on experiments

From the Experiments tab you can perform quick operations on all your experiments. You can do this on a single experiment or on a group of experiments.

# Operations on a single experiment

Single experiment menu

Each row of the experiment table has a Menu drop-down list on the right end of the entry.

From the drop-down list you can choose the following options:

  • (When running) Stop: Stops the experiment.
  • (When running) Sync: Syncs the current artifacts and status of the experiment with cnvrg.
  • (When running) Stop & Sync: Syncs the experiment then stops the experiment.
  • Export: Emails the experiment's metadata as a CSV.
  • Rerun: Pre-fills an experiment form with all the details from the chosen experiment.
  • Tag: Adds a key-value pair to the metadata of the experiment.
  • Delete: Deletes the experiment.

# Operations on multiple experiments

Multiple experiments menu

To complete actions on multiple experiments at once:

  1. Check the boxes on the left of the table for each experiment to include in the action.

    TIP

    You can use the search bar to filter your experiments and quickly find the experiments you want to select.

  2. Click the Actions drop-down list at the top of the experiments table.

    In the drop-down list you can choose the following options:

    • (When running) Stop: Stops the experiment(s).
    • (When running) Sync: Syncs the current artifacts and status of the experiment(s) with cnvrg.
    • Tag: Adds a key-value pair to the metadata of the experiment(s).
    • Export: Emails the experiments' metadata as a CSV.
    • Delete: Deletes the experiment(s).

TIP

You can also use this menu when only one experiment is selected.

# Customize and sort the experiment table

When you are in the Experiments tab of your project, there are a number of ways you can arrange your experiments table for your own convenience.

# Display certain columns

Choose columns

You can add or remove columns from your experiment by clicking the Table drop-down menu.

Inside the menu, you can click on the available columns to either add them to the table (checked) or remove them (unchecked).
The table will update immediately.

The list of available columns is populated by your experiment metadata. Any tag or parameter, as well as any of the tracked metadata (start time, commit, grid search id and so on) can be used as a column.

# Sort by a column

You can sort your experiments table according to any of the columns displayed.

Click the column title to sort by the values in the chosen column. Clicking once sorts in ascending order. Clicking twice sorts in descending order. Clicking three times stops sorting by that column.

# Change the number of rows per page

You can change the number of viewable rows in the experiments table.

Click the drop-down list below the table. It shows the number of rows that are currently visible. In the drop-down list, you can select to view 10, 25, 50 or 100 experiments at once.

You can also view the number of experiments being shown, alongside the total number of experiments in the same place.

# Move between pages of the experiment table

On the bottom right of the table you can jump from page to page of the experiment table.

Either click the arrows to move one page in either direction or click the number of the specific page you are looking for.

# Search through your experiments

Search experiments

You can easily search through your experiments using the search bar at the top of the experiments table. Click it to start constructing your search filter.

A search filter is constructed from a category key, category relation, and value:

  • A category key can be any of your columns.
  • A category relation can be any of the operators: = (equals), > (greater than) or < (less than).
  • A value can be any string or integer.

You can also use a category connect to build more complex filters. A category connect can be either AND or OR.

Press Enter or click Search to submit your search criteria and load the matching experiments.

Click the Refresh button in the search bar to reset the experiment table and remove your filter.

For example: status = Success AND Accuracy > 0.85 would return only experiments that have a status of 'success' and an accuracy greater than 0.85.
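
The meaning of that filter can be expressed in plain Python. This is only an illustration of the filter's logic, not how cnvrg evaluates searches: the `experiments` list and its values are made up, and `matches` is a hypothetical helper.

```python
# Made-up experiment metadata standing in for rows of the experiments table.
experiments = [
    {"title": "Experiment 1", "status": "Success", "Accuracy": 0.91},
    {"title": "Experiment 2", "status": "Success", "Accuracy": 0.80},
    {"title": "Experiment 3", "status": "Error", "Accuracy": 0.95},
]

def matches(exp):
    """Equivalent of: status = Success AND Accuracy > 0.85."""
    return exp["status"] == "Success" and exp["Accuracy"] > 0.85

selected = [exp["title"] for exp in experiments if matches(exp)]
print(selected)  # ['Experiment 1']
```

Replacing AND with OR in the filter corresponds to replacing `and` with `or` in the helper, which would match all three rows here.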

TIP

cnvrg auto-suggests filter terms to help you construct your filter.

# Compare your experiments' cnvrg visualizations and metadata

Compare Experiments

cnvrg provides a solution to compare and analyze all experiments and models within a project. Harnessing the cnvrg Research Assistant and all the experiment visualization features of experiments (tags and SDK tracking), cnvrg can automatically generate comparisons for all your experiments and models.

To compare experiments:

  1. Navigate to the Experiments table of the desired project.
  2. Check the boxes on the left hand side of each experiment you wish to include in the comparison.
  3. Click Compare at the top of the experiments table.

A new page opens. The page displays all of your experiments' metadata and charts alongside each other. Any graphs with matching keys from different experiments are merged into a single graph for easy comparison.

You can dynamically add additional charts using your metadata to the comparison view by clicking Add Chart.

All the comparison metadata can also be exported by clicking Download CSV.

You can remove experiments from the comparison along the top of the page. Click the X next to the experiment you wish to remove from the comparison. The page automatically updates, reflecting the change.

NOTE

Confusion matrices, scatter maps, and bar graphs are not currently included in this comparison.

# Compare using TensorBoard

TensorBoard- drop-down

You can also compare multiple experiments using TensorBoard. You can use this feature with any finished or currently running experiments. On the Experiments tab, use the TensorBoard drop-down menu to access currently running TensorBoard comparisons, stop them, or create new ones.

To create a new TensorBoard comparison, click + Start New TensorBoard session. In the popup, choose your experiments, compute, and refresh frequency. The refresh frequency determines how often cnvrg pulls new logs from running experiments. For experiments that are no longer running, cnvrg fetches their end commit.

TIP

You can also mix and match - compare ongoing and completed experiments on the same TensorBoard dashboard.

Finally, click Start Session and cnvrg starts running the TensorBoard comparison.

TensorBoard-create

TIP

You can also preselect the experiments you want to compare in the Experiments table and they will automatically populate the experiments field when creating the comparison.

When your TensorBoard comparison is running, you can use the menu in the top bar to:

  • Stop the TensorBoard.
  • Check which experiments are being compared.
  • Share the comparison.
  • Display TensorBoard in full-screen mode.

TensorBoard-menu

NOTE

The TensorBoard comparison feature is currently only supported when running on Kubernetes.

# Experiment Error Codes

cnvrg helps you run your experiments quickly and easily. However, as with all code, your experiment might fail due to one of many possible errors. If an error is encountered, you can use the debug mode to quickly fix issues and restart an experiment. Alternatively, you can fix your code however you prefer, sync the updates to cnvrg or push them to your git repository, and then rerun the experiment.

Whenever an experiment fails, the command that was run returns an error code. This code can provide important information for debugging your code. Here is a breakdown of the typical error codes you may encounter:

| Error Code | Explanation |
|------------|-------------|
| 1 | General errors in your code |
| 2 | Misused shell builtin |
| 126 | A command was invoked that could not be executed |
| 127 | A command was invoked that could not be found |
| 128 | An invalid argument was passed to exit |
| 139 | Invalid memory was accessed (segmentation fault) |
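As an illustration of how these codes can be read programmatically, the following Python sketch runs a command and looks up its exit code. The `EXIT_CODE_EXPLANATIONS` mapping and the `explain` and `run_and_explain` helpers are hypothetical names introduced here and are not part of cnvrg.

```python
import subprocess
import sys
from typing import List, Tuple

# Explanations for the exit codes listed in the table above.
EXIT_CODE_EXPLANATIONS = {
    1: "General errors in your code",
    2: "Misused shell builtin",
    126: "A command was invoked that could not be executed",
    127: "A command was invoked that could not be found",
    128: "An invalid argument was passed to exit",
    139: "Invalid memory was accessed (segmentation fault)",
}


def explain(code: int) -> str:
    """Return a human-readable explanation for an exit code."""
    if code == 0:
        return "Success"
    return EXIT_CODE_EXPLANATIONS.get(code, f"Unrecognized exit code {code}")


def run_and_explain(argv: List[str]) -> Tuple[int, str]:
    """Run a command and pair its exit code with an explanation."""
    completed = subprocess.run(argv)
    return completed.returncode, explain(completed.returncode)


# Example: a child Python process that exits with status 1.
code, meaning = run_and_explain([sys.executable, "-c", "import sys; sys.exit(1)"])
print(code, meaning)
```

Note that a shell reports a process killed by a signal as 128 plus the signal number (139 for signal 11, SIGSEGV), whereas Python's `subprocess` reports the same event as a negative return code when it launches the process directly.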

# Experiment Debugger

Debug Experiment

Sometimes things go wrong and experiments fail due to errors in the code, errors in the environment, or just because the configuration is incorrect. cnvrg helps you resolve these situations quickly without needing to rerun the experiment from scratch (saving time and resources).

Once an experiment encounters an error, cnvrg sends you a notification via Slack or email that the experiment has failed and has entered Debug mode. The experiment remains live for debugging purposes. When you first access the experiment's page, a notification appears with the option to Add 15 minutes or to decline extra time by clicking That's ok, I'm done.

By default, the experiment remains running for 30 minutes. You can configure this default value in the Organization settings. A countdown displays the time remaining, and when it expires, the experiment is stopped. You can add 15 minutes to the timer by clicking the Add 15 minutes button.

The experiment, compute and environment can be accessed via a terminal session on the experiment page. Use the terminal to debug your experiment and fix the error.

When you have finished using the terminal to rectify issues in your code and the environment, you can click the Rerun button to start the experiment from the beginning without restarting the compute. When clicking rerun you can also enable the following:

  • Sync my experiment before: Sync changes to the code back to cnvrg before rerunning.
  • Rerun with prerun script: Re-execute the prerun.sh script before running the command.
  • Rerun with requirements file: Reinstall the packages from the requirements.txt file before running the command.

Click Yes, Rerun to confirm your choices and restart the experiment.

TIP

If Slack integration is enabled, you will be notified in real time when your experiment enters debug mode. Information on how to set this up can be found here.


# TensorBoard and Terminal

In all live running experiments, cnvrg allows you to access and interact with the experiment using a fully secured terminal or track live metrics and debug your models using TensorBoard.

  • TensorBoard: TensorBoard is an open-source application that makes it easier to understand, debug, and optimize TensorFlow programs.

    To access the TensorBoard of an Experiment, click Menu at the top right of the Experiments page and select Open TensorBoard. A new page with the TensorBoard opens.
    To stop the TensorBoard from running, click Menu at the top right of the Experiments page and select Stop TensorBoard from the dropdown menu.
    If you have stopped the TensorBoard, you can start it again from Menu > Start TensorBoard.

TIP

In the Project's settings, you can change the default behavior for whether or not TensorBoard runs for each experiment. If it is set to off, you can still start it from the Experiment's menu while the experiment runs.

  • Terminal: To access the terminal of the machine, instance, or pod that the Experiment is running on, click Menu at the top right of the Experiments page and select Open Terminal. A new page with the terminal opens.

TensorBoard / Terminal

# The Files in an Experiment

Similar to the execution of any other machine learning workload in cnvrg, when an experiment is run, it constructs its environment as follows:

  1. Pull and execute the chosen container.
  2. Clone the chosen git branch and commit (if the project is connected to git) into /cnvrg.
  3. Clone the latest version of the files from your project's Files tab (or the chosen cnvrg.io commit) into /cnvrg.
  4. Clone or attach the chosen datasets into /data.
  5. Install packages from the requirements.txt file (if it exists).
  6. Execute the prerun.sh script (if it exists).

When these steps have completed, the environment will be fully constructed and the command will be executed.
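The ordering of these steps, including which ones are conditional, can be summarized in a small Python sketch. It only builds a list of step descriptions and does not run anything. The `RunConfig` fields, the `plan_environment` function, and the image and dataset names are hypothetical and serve only to illustrate the sequence described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RunConfig:
    image: str                        # the chosen container image
    git_connected: bool = False       # is the project connected to git?
    datasets: Tuple[str, ...] = ()    # the chosen datasets
    has_requirements: bool = False    # does requirements.txt exist?
    has_prerun: bool = False          # does prerun.sh exist?


def plan_environment(cfg: RunConfig) -> List[str]:
    """List the environment construction steps in the order described above."""
    steps = [f"pull and execute container {cfg.image}"]
    if cfg.git_connected:
        steps.append("clone chosen git branch and commit into /cnvrg")
    steps.append("clone project files into /cnvrg")
    for dataset in cfg.datasets:
        steps.append(f"clone or attach dataset {dataset} into /data")
    if cfg.has_requirements:
        steps.append("install packages from requirements.txt")
    if cfg.has_prerun:
        steps.append("execute prerun.sh")
    # The command runs only after the environment is fully constructed.
    steps.append("execute the experiment command")
    return steps


# Hypothetical configuration used only for illustration.
plan = plan_environment(
    RunConfig(
        image="my-image:latest",
        git_connected=True,
        datasets=("my-dataset",),
        has_requirements=True,
        has_prerun=True,
    )
)
for number, step in enumerate(plan, start=1):
    print(number, step)
```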

For more information see Environment.

# Rerun Experiments

Reproducible code and environments are key elements of cnvrg.

To quickly and easily rerun an experiment using all the same settings (command, compute, datasets, docker image and so on), select an experiment, click Menu and then select Rerun from the dropdown menu.

You will be taken to the run page and all of the details will be pre-selected. You can then check and change anything before finally running this new instance of the experiment.

# More Info

# Export your experiments table

To export your experiments table, first navigate to the Experiments tab.

From there you have three options:

  1. To export all experiments, select Export all from the Actions drop-down menu.
  2. To export only some experiments, select them and then export them from the Actions drop-down menu.
  3. To export a single experiment, click Menu on the right-hand side of its row and click Export.

# View experiments using the CLI

You can also use the CLI to interact with your experiments, including viewing all your experiments and deep-diving into one of them.

To view all experiments, in your project directory run:

cnvrg experiments

To view a specific experiment and check its status and metadata, run:

cnvrg experiments --id ID_OF_EXPERIMENT # e.g. Mja3xxhDMVi7AMVEexNs
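If you want to script these two commands, a minimal Python sketch might build the argument list as follows. The `experiments_command` helper is hypothetical, and it uses only the `cnvrg experiments` command and `--id` option shown above. The sketch prints the commands rather than running them.

```python
import shlex
from typing import List, Optional


def experiments_command(experiment_id: Optional[str] = None) -> List[str]:
    """Build the argument list for listing all experiments or viewing one."""
    argv = ["cnvrg", "experiments"]
    if experiment_id is not None:
        argv += ["--id", experiment_id]
    return argv


# Print the commands instead of running them.
print(shlex.join(experiments_command()))
print(shlex.join(experiments_command("Mja3xxhDMVi7AMVEexNs")))
```

To actually run a command, you could pass the list to `subprocess.run` with `cwd` set to your project directory, assuming the cnvrg CLI is installed and you are logged in.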

# Experiment status

An experiment can have one of the following statuses:

  • Pending
  • Initializing
  • Running
  • Success
  • Aborted
  • Error
  • Debug
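When processing experiment data in your own scripts, the statuses above can be modeled as an enumeration. The sketch below is a hypothetical helper, not part of the cnvrg SDK. Treating Success, Aborted, and Error as finished statuses is an assumption made for illustration.

```python
from enum import Enum


class ExperimentStatus(Enum):
    PENDING = "Pending"
    INITIALIZING = "Initializing"
    RUNNING = "Running"
    SUCCESS = "Success"
    ABORTED = "Aborted"
    ERROR = "Error"
    DEBUG = "Debug"


# Assumption: these statuses mean the experiment is no longer live.
FINISHED = {ExperimentStatus.SUCCESS, ExperimentStatus.ABORTED, ExperimentStatus.ERROR}


def is_finished(status: ExperimentStatus) -> bool:
    """Return True if the status is one of the assumed finished statuses."""
    return status in FINISHED


# Parse a status string as it appears in the list above.
status = ExperimentStatus("Debug")
print(status.name, is_finished(status))
```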

# Slack integration

You can set up a Slack channel to receive notifications on the status changes for any experiment. The guide for setting up the integration is here.

Last Updated: 6/21/2020, 10:26:02 AM