# cnvrg SDK V2
# Getting Started
The cnvrg SDK is written in Python and is designed to help data scientists interact with cnvrg from their code, experiments and models. Through the SDK, you can create experiments, manage models, automate your machine learning pipeline and more.
The topics in this page:
- Prerequisites
- Download and Install the cnvrg SDK
- Authenticating the cnvrg SDK
- User Operations
- Resource Operations
- Project Operations
- Templates Operations
- Workspaces operations:
- Experiment Operations
- Create a new remote Experiment
- Initialize an empty experiment
- Create a local experiment
- Experiment slug
- Get an existing experiment
- Delete Experiment
- Stop a running Experiment
- Get Experiment's system utilization
- Track an Experiment manually
- Examples
- Metadata operations on experiments
- Create a Tag
- Charts
- Upload artifacts
- Download the Experiment's artifacts
- Flow Operations
- Endpoint Operations
- Webapps Operations
- Dataset Operations
# Prerequisites
To run the pip commands, Python (version 3.6 or later) must be installed on the system.
# Download and Install the cnvrg SDK
To install, open up your terminal/command prompt and run the following command:
pip3 install cnvrgv2
# Install options
When on a self-hosted cnvrg environment, you can specify an install option for cnvrgv2 to fit the object storage you intend to work with in your cnvrg environment.
For Metacloud, use the default installation without any option.
Add the options to the install command as needed; you can add multiple options by separating them with commas:
pip install "cnvrgv2[options]"
The available options are:
- azure - Install packages relevant for the Azure storage client
- google - Install packages relevant for the GCP storage client
- python3.6 - Install specific dependencies for Python version 3.6
# SDK Operations
# Authenticating the cnvrg SDK
# Inside a cnvrg job scope
The cnvrg SDK will already be initialized and authenticated with cnvrg using the account that is logged in. You can start using cnvrg SDK functions immediately by running:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
# Authenticate using a local configuration file
You can authenticate to the cnvrg SDK by creating a configuration file in your working directory:
- In your working directory, create a directory called .cnvrg. You can create it using the following command: mkdir .cnvrg
- Inside the .cnvrg directory, create a configuration file named cnvrg.config
- Edit the file and insert the following:
check_certificate: <false/true>
domain: <cnvrg_full_domain>
keep_duration_days: null
organization: <organization_name>
token: <user_access_token>
user: <user_email>
version: null
- Once you finish editing, save the file. Now you can simply run the following in your code and it will log you in automatically:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
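If you generate this configuration programmatically (for example, from a setup script), the file layout above can be produced with a small helper. This is an illustrative sketch, not part of the SDK; the field names mirror the example above and all values are placeholders you must replace:

```python
import os

def write_cnvrg_config(workdir, domain, token, user, organization,
                       check_certificate=True):
    """Create .cnvrg/cnvrg.config under workdir with the fields shown above.

    All argument values are placeholders supplied by the caller; this helper
    only reproduces the documented file layout.
    """
    config_dir = os.path.join(workdir, ".cnvrg")
    os.makedirs(config_dir, exist_ok=True)
    content = "\n".join([
        f"check_certificate: {str(check_certificate).lower()}",
        f"domain: {domain}",
        "keep_duration_days: null",
        f"organization: {organization}",
        f"token: {token}",
        f"user: {user}",
        "version: null",
    ]) + "\n"
    path = os.path.join(config_dir, "cnvrg.config")
    with open(path, "w") as f:
        f.write(content)
    return path
```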
# Authenticate using environment variables
You can authenticate to the cnvrg SDK by setting the following environment variables:
- CNVRG_JWT_TOKEN: Your API token (you can find it in your user settings page)
- CNVRG_URL: The cnvrg URL that you use to view cnvrg through the browser, for example: https://app.prod.cnvrg.io
- CNVRG_USER: The email that you use to log in to cnvrg
- CNVRG_ORGANIZATION: The organization name you use
Once you set those environment variables, you can simply run the following and it will log you in automatically:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
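As a sketch, the environment variables can also be set from Python itself before instantiating the client. The values below are placeholders, not real credentials:

```python
import os

# Placeholder values -- replace with your own credentials before use
os.environ["CNVRG_JWT_TOKEN"] = "<user_access_token>"
os.environ["CNVRG_URL"] = "https://app.prod.cnvrg.io"
os.environ["CNVRG_USER"] = "user@example.com"
os.environ["CNVRG_ORGANIZATION"] = "my-org"

# With the variables set, the client picks them up automatically:
# from cnvrgv2 import Cnvrg
# cnvrg = Cnvrg()
```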
You can also pass the credentials as parameters:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg(domain="https://app.cnvrg.io",
email="Johndoe@acme.com",
password="123123",
)
If you are on a cnvrg Metacloud environment, you need to use your API key, which can be found in your Account page:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg(domain="https://app.domain.metacloud.cnvrg.io",
email="Johndoe@acme.com",
token="YOUR API KEY")
NOTE
As a security measure, please do not put your credentials into your code.
NOTE
The following documentation assumes you have successfully logged in to the SDK and loaded the cnvrg object.
# User Operations
# Get the logged in user object
To get the logged in user object you can simply run:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
user = cnvrg.me()
Once you have the user object, you can access the user fields such as: email, username, organizations, git_access_token, name, time_zone. For example:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
user = cnvrg.me()
email = user.email
# Set the default organization
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
cnvrg.set_organization("my-org")
# Resource Operations
# Connect your existing Kubernetes cluster
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
mycluster = cnvrg.clusters.create(resource_name="kubernetes_cluster",
kube_config_yaml_path="kube_config.yaml",
domain="https://app.cnvrg.io")
List of optional parameters:
Parameter | type | description | required | default |
---|---|---|---|---|
scheduler | string | supported schedulers to deploy cnvrg jobs | No | cnvrg_scheduler |
namespace | string | the namespace to use inside the cluster | No | cnvrg |
https_scheme | bool | resource supports HTTP/S urls when accessing jobs from the browser | No | False |
persistent_volumes | bool | resource can dynamically create PVCs when running jobs | No | False |
gaudi_enabled | bool | whether the cluster supports HPU devices | No | False |
# Create managed EKS cluster
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
mycluster = cnvrg.clusters.create(build_yaml_path="aws.yaml", provider_name="aws")
List of optional parameters:
Parameter | type | description | required | default |
---|---|---|---|---|
network | string | if left blank cnvrg will automatically provision the network for your cluster | No | istio |
Yaml example:
name: mycluster
version: '1.21'
roleARN: arn:aws:iam::123456789101:role/cnvrg_role
region: us-west-2
vpc: null
publicSubnets:
- ''
privateSubnets:
- ''
securityGroup: ''
nodeGroups:
- availabilityZones:
- us-west-2a
- us-west-2b
- us-west-2d
- us-west-2c
autoScaling: false
instanceType: m5.metal
desiredCapacity: 2
minSize: 0
maxSize: 2
spotInstances: false
volumeSize: 100
privateNetwork: true
securityGroups:
- ''
tags:
- key: ''
value: ''
taints:
- key: ''
value: ''
labels:
- key: ''
value: ''
attachPolicies:
- ''
addonPolicies:
- key: ''
value: ''
# Create partner resource
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
mycluster = cnvrg.clusters.create(resource_name="mypartner", provider_name="aibuilders")
# Get an existing resource
You can get the resource object by using the resource's slug:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
mycluster = cnvrg.clusters.get(slug="cluster-slug")
You can also get all of the resources in the organization:
clusters = [c for c in cnvrg.clusters.list()]
# Update an existing resource
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
mycluster = cnvrg.clusters.get(slug="cluster-slug")
mycluster.update(resource_name="new-name")
List of optional parameters:
Parameter | type | description | required |
---|---|---|---|
scheduler | string | supported schedulers to deploy cnvrg jobs | No |
namespace | string | the namespace to use inside the cluster | No |
https_scheme | bool | resource supports HTTP/S urls when accessing jobs from the browser | No |
persistent_volumes | bool | resource can dynamically create PVCs when running jobs | No |
gaudi_enabled | bool | whether the cluster supports HPU devices | No |
# Delete a resource
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
mycluster = cnvrg.clusters.delete(slug="cluster-slug")
or
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
mycluster = cnvrg.clusters.get(slug="cluster-slug")
mycluster.delete()
# Project Operations
# Create a new project
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.create("myproject")
# Get the project's object:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
Once you have the project object you can get the project fields like: title, slug, git_url, git_branch, last_commit For example:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
title = myproj.title
NOTE
You can also reference the current project from within a job scope:
from cnvrgv2 import Project
myproj = Project()
# List all the projects in the organization:
- List all projects that the current user is allowed to view
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.list()
To order the projects list by created_at run the following:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.list(sort="-created_at")
TIP
Sort the list by key: -key -> DESC | key -> ASC
# Delete a project:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
myproj.delete()
# Project File Operations
# Upload files to a project
myproj.put_files(paths=['/files_dir/file1.txt', '/files_dir/file2.txt'],
pattern='*')
Available parameters:
Parameter | type | description | required | default |
---|---|---|---|---|
paths | List | The list of file paths that will be uploaded to the project | Yes | |
pattern | string | String defining the filename pattern | No | "*" |
message | string | The commit message | No | "" |
override | bool | Whether or not to re-upload even if the file already exists | No | False |
force | bool | Should the new commit copy files from its parent | No | False |
NOTE
If a folder is given, all the relevant files in that folder (those that match the pattern) will be uploaded.
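To get a feel for how a glob-style pattern narrows down a folder upload, here is an illustrative helper (not the SDK's internal implementation) using Python's standard fnmatch module:

```python
import fnmatch

def select_files(filenames, pattern="*"):
    """Return only the filenames that match a glob-style pattern,
    mirroring how the `pattern` parameter filters a folder upload."""
    return [name for name in filenames if fnmatch.fnmatch(name, pattern)]

# Only the .txt files would be selected for upload with pattern='*.txt'
selected = select_files(["model.h5", "train.txt", "eval.txt"], "*.txt")
```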
# Remove files from a project
You can remove files from the project:
myproj.remove_files(paths='*',
message='This will delete everything!')
NOTE
When deleting files from a project, the paths parameter can be either a list of file paths or a string pattern like '*'
# List the project's content
You can list all of the files and folders that are in the project:
myproj.list_files()
myproj.list_folders(commit_sha1='xxxxxxxxx')
Available Parameters
Parameter | type | description | required | default |
---|---|---|---|---|
commit_sha1 | string | Sha1 string of the commit to list the files from | No | None |
query | string | Query slug to list files from | No | None |
query_raw | string | Raw query to list files according to Query language syntax | No | None |
sort | string | Key to sort the list by (-key -> DESC / key -> ASC) | No | "-id" |
# Clone the project to the current working directory
myproj.clone()
# Download the project files
myproj.download(commit_sha1='xxxxxxxxx')
WARNING
The Project must be cloned first
# Sync the local project with the remote one
myproj.sync_local()
# Update the project's settings
You can change any of the project's settings by passing them as keyword arguments:
myproj.settings.update(title='NewProjectTitle',
privacy='private')
Available Settings:
Parameter | type | description |
---|---|---|
title | string | The name of the project |
default_image | string | The name of the image to set to be the project's default image |
default_computes | List | The list of the project's default compute template names |
privacy | string | The project's privacy set to either 'private' or 'public' |
mount_folders | List | Paths to be mounted to the docker container |
env_variables | List | KEY=VALUE pairs to be exported as environment variables to each job |
check_stuckiness | bool | Whether to stop or restart experiments that have not printed new logs and have resource utilization below 20% |
max_restarts | int | When "check_stuckiness" is True this sets how many times to repeatedly restart a single experiment each time it idles |
stuck_time | int | The duration (in minutes) that an experiment must be idle for before it is stopped or restarted |
autosync | bool | Whether or not to perform periodic automatic sync |
sync_time | int | The interval (in minutes) between each automatic sync of jobs |
collaborators | List | The list of users that are collaborators on the project |
command_to_execute | string | The project's default command to execute when starting a new job |
run_tensorboard_by_default | bool | Whether or not to run Tensorboard by default with each launched experiment |
run_jupyter_by_default | bool | Whether or not to run Jupyter by default with each launched experiment |
requirements_path | string | The default path to the requirements.txt file that will run with every job |
is_git | bool | Whether the project is linked to a git repo |
git_repo | string | The address of the git repo |
git_branch | string | The default branch |
private_repo | bool | Whether the repo is private or not |
output_dir | string | The default path for jobs output directory |
email_on_success | bool | If email should be sent when the experiment finishes successfully |
email_on_error | bool | If email should be sent when the experiment finishes with an error |
# Setup Git Integrations in project settings
For a public git repository:
myproj.settings.update(is_git=True, git_repo="MyGitRepo", git_branch="MyBranch")
For a private git repository using an OAuth token, first make sure that the git OAuth token is saved in your profile and then run:
myproj.settings.update(is_git=True, git_repo="PrivateGitRepo", git_branch="MyBranch", private_repo=True)
To disable git integrations:
myproj.settings.update(is_git=False)
# Templates Operations
# Create a new template
# Get an existing template
You can get the template object by using the template's slug:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
cluster = cnvrg.clusters.get("cluster_slug")
template = cluster.templates.get("template_slug")
# List all existing templates
List all templates that the current user is allowed to view
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
cluster = cnvrg.clusters.get("cluster_slug")
templates = cluster.templates.list()
for template in templates:
print("Template Details: title: {} , slug: {} , cpu: {} , memory: {} "
.format(template.title, template.slug, template.cpu, template.memory))
# Update an existing template
You can update the existing template attributes:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
cluster = cnvrg.clusters.get("cluster_slug")
template = cluster.templates.get("template_slug")
template.update(title="new title",cpu=3)
# Delete an existing template
You can delete an existing template:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
cluster = cnvrg.clusters.get("cluster_slug")
template = cluster.templates.get("template_slug")
template.delete()
# Workspaces operations:
# Create a new workspace and run it:
from cnvrgv2 import Cnvrg
from cnvrgv2.modules.workflows.workspace.workspace import NotebookType
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
ws = myproj.workspaces.create(title="My Workspace",
templates=["small","medium"],
notebook_type=NotebookType.JUPYTER_LAB)
If no parameters are provided, the default values are used. To further customize the created workspace, you can pass the following parameters:
Parameter | type | description | required | default |
---|---|---|---|---|
title | string | The name of the workspace | No | None |
templates | list | A list containing the names of the desired compute templates | No | None |
notebook_type | string | The notebook type (currently available: "jupyterlab", "r_studio", "vscode") | No | NotebookType.JUPYTER_LAB |
volume | Volume | The volume that will be attached to the workspace | No | None |
datasets | list | A list of datasets to be connected and used in the workspace | No | None |
image | Image | The image to be used for the workspace environment | No | default organization image |
# Fetch the workspace object
Once the workspace is created you can fetch it by its slug:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
ws = myproj.workspaces.get("workspace-slug")
NOTE
You can also reference the current running workspace from within its job scope:
from cnvrgv2 import Workspace
ws = Workspace()
# Access workspace attributes
You can access the workspace attributes by using regular dot notation:
ws_slug = ws.slug
ws_title = ws.title
ws_datasets = ws.datasets
ws_notebook = ws.notebook_type
# Sync the workspace
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
ws = myproj.workspaces.get("workspace-slug")
ws.sync()
Sync multiple workspaces at once by providing a list containing their slugs:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
myproj.workspaces.sync(["workspace-slug"])
# Stop a running workspace
Stop a running workspace and sync it (the default is sync=False):
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
ws = myproj.workspaces.get("workspace-slug")
ws.stop(sync=True)
Stop multiple workspaces at once:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
myproj.workspaces.stop(["workspace-slug"],sync=True)
# Start a stopped workspace
Start a stopped workspace:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
ws = myproj.workspaces.get("workspace-slug")
ws.start()
# List all of the workspaces
You can list all the workspaces in the current project, as well as sort them by a key in ASC or DESC order:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
exps = myproj.workspaces.list(sort="-created_at")
TIP
Sort the list by key: -key -> DESC | key -> ASC
# Delete workspaces
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
ws = myproj.workspaces.get("workspace-slug")
ws.delete()
Delete multiple workspaces by listing their slugs:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
myproj.workspaces.delete(['workspace-slug1','workspace-slug2'])
# Operate a Tensorboard
Start a Tensorboard session for a running workspace:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
ws = myproj.workspaces.get("workspace-slug")
ws.start_tensorboard()
Get the Tensorboard url:
ws.tensorboard_url
To stop the Tensorboard session:
ws.stop_tensorboard()
# Experiment Operations
# Create a new remote Experiment
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
e = myproj.experiments.create(title="my new exp",
template_names=["medium", "small"],
command="python3 test.py")
List of optional parameters:
Parameter | type | description | required | default |
---|---|---|---|---|
title | string | The name of the experiment | No | None |
templates | List | list of the compute templates to be used in the experiment (if the cluster will not be able to allocate the first template, then it will try the one after and so on..) | No | None |
local | bool | whether or not to run the experiment locally | No | False |
command | string | the starting command for the experiment (example: command='python3 train.py' ) | No | False |
datasets | List[Dataset] | A list of dataset objects to use in the experiment | No | None |
volume | Volume | Volume to be attached to this experiment | No | None |
sync_before | bool | Whether or not to sync the environment before running the experiment | No | True |
sync_after | bool | Whether or not to sync the environment after the experiment has finished | No | True |
image | object | The image to run on (example: image=cnvrg.images.get(name="cnvrg", tag="v5.0")) | No | project's default image |
git_branch | string | The branch to pull files from for the experiment, in case project is git project | No | None |
git_commit | string | The specific commit to pull files from for the experiment, in case project is git project | No | None |
# Initialize an empty experiment
You may create an empty experiment that will not be run automatically (by default: local=True, sync_after=False, sync_before=False):
e = myproj.experiments.init(title="my new exp")
# Create a local experiment
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
e = myproj.experiments.create(title="my new exp",
                              local=True,
                              command="python3 test.py",
                              local_arguments={"epochs": 20, "batch_size": 12})
Parameter | type | description | required | default |
---|---|---|---|---|
local_arguments | dict | If local experiment and command is a function, local_arguments is a dictionary of the arguments to pass to the experiment's function | No | None |
# Experiment slug
In many commands, you will need to use an experiment slug. The experiment slug can be found in the URL for the experiment.
For example, if you have an experiment that lives at https://app.cnvrg.io/my_org/projects/my_project/experiments/kxdjsuvfdcpqkjma5ppq, the experiment slug is kxdjsuvfdcpqkjma5ppq.
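Since the slug is simply the last path segment of the experiment URL, you can extract it with a small helper. This function is illustrative, not part of the SDK:

```python
from urllib.parse import urlparse

def slug_from_url(url):
    """Return the last path segment of a cnvrg job URL, i.e. its slug."""
    return urlparse(url).path.rstrip("/").split("/")[-1]
```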
# Get an existing experiment
You can get the experiment object by using the experiment's slug:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
e = myproj.experiments.get("exp-slug")
You can also get all of the experiments in the project:
experiments = [e for e in myproj.experiments.list()]
NOTE
You can also reference the current running experiment from within its job scope:
from cnvrgv2 import Experiment
e = Experiment()
# Delete Experiment
You can delete an Experiment from a project by its slug value:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
e = myproj.experiments.get("experiment-slug")
e.delete()
# Do bulk delete on multiple experiments
myproj.experiments.delete(['experiment-slug1','experiment-slug2'])
# Stop a running Experiment
Stop a running Experiment by passing its slug value (the Experiment must be running):
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
e = myproj.experiments.get("experiment-slug")
e.stop()
# Do bulk stop on multiple experiments
myproj.experiments.stop(['experiment-slug1','experiment-slug2'])
Parameter | type | description | required | default |
---|---|---|---|---|
sync | bool | sync the experiment's data or not | No | True |
# Get Experiment's system utilization
You can access the experiment's system resource usage data. For example, let's get the last 5 records for memory utilization percentage:
>>> utilization = e.get_utilization()
>>> utilization.attributes['memory']['series'][0]['data'][-5:]
[[1626601529000, 7.7], [1626601559000, 19.85], [1626601589000, 48.05], [1626601620000, 49.26]]
NOTE
The data syntax is [unix_timestamp, metric]
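The unix timestamps are in milliseconds, so a point can be converted into a human-readable form like this (an illustrative helper, not an SDK function):

```python
from datetime import datetime, timezone

def utilization_point(point):
    """Convert a [unix_timestamp_ms, metric] pair into (UTC datetime, metric)."""
    ts_ms, metric = point
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc), metric

# Example using a point from the output above
when, memory_pct = utilization_point([1626601529000, 7.7])
```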
# Track an Experiment manually
You can initialize an empty Experiment in Cnvrg:
cnvrg = Cnvrg()
proj = cnvrg.projects.get('my-project')
e = proj.experiments.init(title='my-exp')
Now that the Experiment is initialized, its status is ONGOING, and you can perform operations from within your code just like with regular cnvrg Experiments in order to track it.
If you have initialized an Experiment object, you should conclude the experiment with the e.finish() command.
To conclude an experiment object:
exit_status = 0
e.finish(exit_status=exit_status)
NOTE
0 is success, -1 is aborted, 1 and higher is an error
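The convention in the note above can be captured in a small illustrative helper (not part of the SDK):

```python
def describe_exit_status(exit_status):
    """Map a cnvrg experiment exit status to a human-readable outcome,
    following the convention: 0 success, -1 aborted, 1 and higher error."""
    if exit_status == 0:
        return "success"
    if exit_status == -1:
        return "aborted"
    if exit_status >= 1:
        return "error"
    raise ValueError(f"unknown exit status: {exit_status}")
```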
# Examples
# Metadata operations on experiments
You can create your own logs in an experiment (the timestamp default is utcnow()):
from datetime import datetime
e.log("my first log", timestamp=datetime.now())
e.log(["my first log","my second log"])
Get the experiment's last 40 logs:
logs = e.logs()
# Create a Tag
e.log_param("key","value")
# Charts
You can create various charts using the SDK. For example, create a line chart showing the experiment's loss:
from cnvrgv2 import Cnvrg, LineChart
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
e = myproj.experiments.get("exp-slug")
loss_vals = []
# experiment loop:
for epoch in range(8):
    loss_vals.append(loss_func())  # loss_func is a placeholder for your own loss computation
# attach the chart to the experiment
loss_chart = LineChart('loss')
loss_chart.add_series(loss_vals, 's1')
e.log_metric(loss_chart)
WARNING
chart_name can't include "/"
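If your metric names come from sources that may contain "/" (for example, nested metric paths like "train/loss"), you can sanitize them before creating a chart. This helper is illustrative, not part of the SDK:

```python
def safe_chart_name(name, replacement="-"):
    """Replace '/' characters, which are not allowed in chart names."""
    return name.replace("/", replacement)

# "train/loss" becomes a valid chart name
chart_name = safe_chart_name("train/loss")
```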
You will immediately see the chart on the experiment's page.
You can create many different types of charts:
# Heatmap:
In the case of a Heatmap, pass a list of tuples that form a matrix, e.g. a 2x2 matrix: [(0.5,1),(1,1)]
from cnvrgv2 import Heatmap
heatmap_chart = Heatmap('heatmap_example',
x_ticks=['x', 'y'], y_ticks=['a', 'b'],
colors=[[0,'#000000'],[1, '#7EB4EB']],
min=0,
max=10)
heatmap_chart.add_series('s1', [(0.5,1),(1,1)])
e.create_chart(heatmap_chart)
Typing information: x_ticks and y_ticks must be a List, and matrix is a list of tuples in the struct (x,y,z). color_stops is optional and is a List of Lists of size 2, where the nested first value is a float 0 <= X <= 1 and the second value is the hex value for the color to represent matrix values at that point of the scale. min and max are optional and should be numbers corresponding to the minimum and maximum values for the key (scaling will be done automatically when these values are not submitted).
Each struct corresponds to a row in the matrix and to a label from the y_ticks list. The matrix is built from the bottom up, with the first struct and y_tick at the bottom edge. Each value inside the struct corresponds to each x_tick.
Steps and groups:
Using steps and groups allows you to submit the same heatmap across different steps and visualize them in a single chart with a slider to easily switch between the steps. steps should be an integer and group should be a string. Multiple steps should be grouped with a single group.
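Because the matrix is supplied bottom-up, it can be convenient to write your data top-row-first and flip it. The helper below is an illustrative sketch, not part of the SDK:

```python
def rows_to_heatmap_matrix(rows):
    """Convert a 2D list written top-row-first into the bottom-up list of
    row tuples described above (first tuple = bottom edge of the heatmap)."""
    return [tuple(row) for row in reversed(rows)]

# Top row [1, 1] over bottom row [0.5, 1] becomes the documented 2x2 example
matrix = rows_to_heatmap_matrix([[1, 1], [0.5, 1]])
```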
# Bar chart:
Single bar:
from cnvrgv2 import BarChart
bar_chart = BarChart('bar_example', x_ticks=['bar1', 'bar2'])
bar_chart.add_series('s1', [1, 2])
e.create_chart(bar_chart)
Multiple bars:
from cnvrgv2 import BarChart
bar_chart = BarChart('bar_example', x_ticks=['bar1', 'bar2'])
bar_chart.add_series('s1', [1, 2, 3])
bar_chart.add_series('s2', [3, 4])
e.create_chart(bar_chart)
The x_ticks list will populate the labels for the bars, and the corresponding series values will dictate the value of the bar for that category. min and max are optional numbers that correspond to the lower and upper bounds for the y values. Optionally, you can set each bar to be a specific color using the colors list of hex values, with each hex value corresponding to each x value.
Steps and groups:
Using steps and groups allows you to submit bar charts across different steps and visualize them in a single chart with a slider to easily move between the steps. steps should be an integer and group should be a string. Multiple steps should be grouped with a single group.
# Scatter Plot:
You can pass a list of tuple pairs representing points on the axis
- Single set of points:
from cnvrgv2 import ScatterPlot
points_list = [(1,1), (2,2), (3,3), (4,4)]
scatter_chart = ScatterPlot('scatter_example')
scatter_chart.add_series('s1', points_list)
- Multiple sets of points:
from cnvrgv2 import ScatterPlot
points_list = [(1,1), (2,2), (3,3), (4,4)]
scatter_chart = ScatterPlot('scatter_example')
scatter_chart.add_series('s1', points_list)
scatter_chart.add_series('s2', points_list[::-1])  # reversed version of the list
# Upload artifacts
You can add local files to the Experiment's artifacts and create a new commit for it:
paths = ['output/model.h5']
e.log_artifacts(paths=paths)
Parameter | Type | Description |
---|---|---|
paths | list | List of paths of artifacts to save |
NOTE
Log images with log_images(file_paths=[<images_paths>])
# Download the Experiment's artifacts
You can download the artifacts to your local working directory:
e.pull_artifacts(wait_until_success=True, poll_interval=5)
Parameter | Type | Description |
---|---|---|
wait_until_success | bool | Wait until current experiment is done before pulling artifacts |
poll_interval | int | If wait_until_success is True, poll_interval represents the time between status poll loops in seconds |
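The wait-and-poll behavior described by wait_until_success and poll_interval can be sketched generically as a polling loop. This helper is illustrative, not the SDK's implementation:

```python
import time

def poll_until(predicate, poll_interval=5, timeout=60):
    """Call predicate() every poll_interval seconds until it returns True
    or timeout seconds elapse. Returns True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll_interval)
    return False
```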
# Flow Operations
Flows can be created and run from any environment using the SDK. Creating flows requires using a flow configuration YAML file.
# Create a Flow
You can use a flow YAML to create a flow inside a project. You can use either the absolute path to a YAML file or include the YAML content directly. Use the flows.create command:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
proj = cnvrg.projects.get("myproject")
flow = proj.flows.create(yaml_path='YAML_PATH')
Parameter | type | description | required | default |
---|---|---|---|---|
yaml_path | path | A path to the YAML configuration file. | No | None |
# Example YAML:
---
flow: Flow Example
recurring:
tasks:
- title: Training Task
type: exec
input: python3 train.py
computes:
- medium
image: cnvrg:v5.0
relations: []
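If you build the configuration in code, you can write the example YAML above to a file and pass its path to flows.create. This sketch mirrors the example above; the task command and compute names are placeholders:

```python
# The YAML content reproduces the documented example flow configuration.
FLOW_YAML = """\
---
flow: Flow Example
recurring:
tasks:
  - title: Training Task
    type: exec
    input: python3 train.py
    computes:
      - medium
    image: cnvrg:v5.0
relations: []
"""

def write_flow_yaml(path):
    """Write the example flow configuration to path and return the path,
    ready to be passed as yaml_path to proj.flows.create()."""
    with open(path, "w") as f:
        f.write(FLOW_YAML)
    return path
```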
# Access Flow attributes
You can access the Flow's attributes by using regular dot notation:
Example:
>>> flow.title
'Flow Example'
# Flow Attributes:
Parameter | type | description |
---|---|---|
title | string | The name of the Flow |
slug | string | The flow slug value |
created_at | datetime | The time that the Flow was created |
updated_at | datetime | The time that the Flow was last updated |
cron_syntax | string | The schedule Cron expression string (If the Flow was scheduled) |
webhook_url | string | |
trigger_dataset | string | A dataset whose changes will trigger this Flow |
# Flow slug
In some commands, you will need to use a Flow slug. The Flow slug can be found in the Flow page URL.
For example, if you have a Flow that lives at https://app.cnvrg.io/my_org/projects/my_project/flows/iakzsmftgewhpxx9pqfo, the Flow slug is iakzsmftgewhpxx9pqfo.
# Get a Flow
Get an existing Flow by passing its slug value or title:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
proj = cnvrg.projects.get("myproject")
flow = proj.flows.get("slug/title")
# List Flows
You can list all existing flows:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
proj = cnvrg.projects.get("myproject")
flows = proj.flows.list()
for flow in flows:
print(flow.title)
# Run a Flow
To run the Flow's latest version:
flow.run()
# Update Flow
You can update the existing Flow's attributes:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
proj = cnvrg.projects.get("myproject")
flow = proj.flows.get("slug/title")
flow.update(title="My Updated Flow")
# Delete Flow
You can delete an existing Flow:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
proj = cnvrg.projects.get("myproject")
flow = proj.flows.get("slug/title")
flow.delete()
Or delete multiple Flows at once by listing their slug values:
proj.flows.delete(["FLOW1_SLUG", "FLOW2_SLUG"])
# Schedule a Flow
You can make the Flow run on schedule by using Cron expression syntax.
# Set a new schedule:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
proj = cnvrg.projects.get("myproject")
flow = proj.flows.get("slug/title")
flow.set_schedule("* * * * *") # Run every minute
Disable it with:
flow.clear_schedule()
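Before calling set_schedule, a quick sanity check that the expression has the five space-separated cron fields can catch typos early. This is only a shape check, not full cron validation, and it is not part of the SDK:

```python
def looks_like_cron(expr):
    """Return True if expr has exactly five whitespace-separated fields,
    the basic shape of a cron expression like '* * * * *'."""
    return len(expr.split()) == 5
```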
# Trigger webhook
You can create a webhook that will trigger the Flow run.
Toggle it on/off by setting the toggle parameter to True or False:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
proj = cnvrg.projects.get("myproject")
flow = proj.flows.get("slug/title")
flow.toggle_webhook(True)
Get the webhook url:
flow.webhook_url
NOTE
If you just toggled the webhook, use flow.reload() before fetching the webhook_url
# Toggle dataset update trigger
You can toggle the option to trigger on dataset update on/off by setting the toggle parameter to True or False:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
ds = cnvrg.datasets.get("myds")
proj = cnvrg.projects.get("myproject")
flow = proj.flows.get("slug/title")
flow.toggle_dataset_update(True)
# Flow versions
Every Flow can have multiple versions, and you can access them.
List all the flow versions:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
proj = cnvrg.projects.get("myproject")
flow = proj.flows.get("slug/title")
flow_versions = flow.flow_versions.list()
for fv in flow_versions:
print(fv.title)
Get a specific flow version object by slug or title:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
proj = cnvrg.projects.get("myproject")
flow = proj.flows.get("slug/title")
flow_version = flow.flow_versions.get("Version 1")
Get the status info of a flow version:
info = flow_version.info()
Stop a running Flow version:
flow_version.stop()
# Endpoint Operations
# Create Endpoint
from cnvrgv2 import Cnvrg
from cnvrgv2 import EndpointKind, EndpointEnvSetup
cnvrg = Cnvrg()
proj = cnvrg.projects.get("myproject")
ep = proj.endpoints.create(title="myendpoint",
templates=["small","medium"],
kind=EndpointKind.WEB_SERVICE,
file_name="predict.py",
function_name="predict",
env_setup=EndpointEnvSetup.PYTHON3,
kafka_brokers=None,
kafka_input_topics=None,
*args,
**kwargs)
You can use the following parameters to build your Endpoint:
Parameter | type | description | required | default |
---|---|---|---|---|
title | string | Name of the Endpoint | Yes | |
kind | int | The kind of endpoint to deploy (example: EndpointKind.WEB_SERVICE , options: [WEB_SERVICE, STREAM, BATCH] ) | No | EndpointKind.WEB_SERVICE |
templates | List | List of template names to be used | No | None |
image | Image | Image object to create endpoint with | No | organization default image |
file_name | string | The file containing the endpoint's functions | Yes | |
function_name | string | The name of the function the endpoint will route to | Yes | |
env_setup | string | The interpreter to use (example: EndpointEnvSetup.PYTHON3 , options: [PYTHON2, PYTHON3, PYSPARK, RENDPOINT] ) | No | None |
kafka_brokers | List | List of kafka brokers | No | None |
kafka_input_topics | List | List of topics to register as input | No | None |
queue | string | Name of the queue to run this job on | No | None |
kafka_output_topics | List | List of topics to register as output | No | None |
# Endpoint slug
In many commands, you will need to use an endpoint slug. The endpoint slug can be found in the URL for the endpoint.
For example, if you have an endpoint that lives at https://app.cnvrg.io/my_org/projects/my_project/endpoints/show/j46mbomoyyqj4xx5f53f, the endpoint slug is j46mbomoyyqj4xx5f53f.
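Since the slug is simply the last path segment of the endpoint URL, a small helper (hypothetical, not part of the SDK) can extract it:

```python
def slug_from_url(url):
    # The endpoint slug is the last path segment of its URL
    return url.rstrip("/").rsplit("/", 1)[-1]

slug_from_url(
    "https://app.cnvrg.io/my_org/projects/my_project/endpoints/show/j46mbomoyyqj4xx5f53f"
)
# -> 'j46mbomoyyqj4xx5f53f'
```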
# Get Endpoint object
You can get Endpoints by passing their slug value:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
proj = cnvrg.projects.get("myproject")
ep = proj.endpoints.get('slug')
NOTE
You can also reference the current running endpoint from within its job scope:
from cnvrgv2 import Endpoint
ep = Endpoint()
# List Endpoints
ep_list = proj.endpoints.list(sort='-created_at') # Descending order
TIP
Sort the list by key: -key -> DESC, key -> ASC.
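The same -key/key convention applies to every list() call in the SDK. A tiny helper (illustrative only, not part of the SDK) makes the intent explicit:

```python
def sort_param(key, descending=True):
    # "-key" sorts descending, plain "key" sorts ascending
    return f"-{key}" if descending else key

sort_param("created_at")         # -> '-created_at'
sort_param("created_at", False)  # -> 'created_at'
```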
# Stop running Endpoints
Stop a running Endpoint by passing its slug value (sync=False by default; all Endpoints must be running):
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
ep = myproj.endpoints.get("endpoint-slug")
ep.stop(sync=False)
# Do bulk stop on multiple endpoints
myproj.endpoints.stop(['endpoint-slug1','endpoint-slug2'])
# Start a stopped Endpoint
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
ep = myproj.endpoints.get("endpoint-slug")
ep.start()
# Delete Endpoints
You can delete an Endpoint from a project by its slug value:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
ep = myproj.endpoints.get("endpoint-slug")
ep.delete()
# Do bulk delete on multiple endpoints
myproj.endpoints.delete(['endpoint-slug1','endpoint-slug2'])
# Endpoint Attributes
You can access the Endpoint attributes by using regular dot notation, for example:
>>> ep.api_key
'43iVTWTp55N7p62iSZYZLyuk'
Attribute | type | description |
---|---|---|
title | string | Name of the Endpoint |
kind | int | The kind of endpoint (webservice, stream, batch) |
updated_at | string | When this Endpoint was last updated |
last_deployment | dict | details about the Endpoint's last deployment |
deployments | List | list of dictionaries containing details about all of the Endpoint's deployments |
deployments_count | int | The number of deployments that the Endpoint had |
templates | List | List of compute templates that are assigned to the Endpoint |
endpoint_url | string | The Endpoint's requests URL |
url | string | The Endpoint's base URL |
current_deployment | dict | The active deployment's data |
compute_name | string | The name of the current compute template that is being used for the Endpoint to run |
image_name | string | Name of the Endpoint's environment that is currently deployed |
image_slug | string | The slug value of the Endpoint's deployed image |
api_key | string | API key to access the Endpoint securely |
created_at | string | The time that this endpoint was created |
max_replica | int | Maximum number of pods to run this endpoint on |
min_replica | int | Minimum number of pods to run this endpoint on |
export_data | bool | whether to export data or not |
conditions | dict | Conditions attached to this Endpoint that trigger a Flow/email each time one of them is met |
# Update The Endpoint's version
You can deploy a new version to the Endpoint and change some of its settings, for example:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
ep = myproj.endpoints.get("endpoint-slug")
ep.update_version(file_name="new_predict.py", commit="q7veenevzd83rewxgncx")
# Update the Endpoint's replica set
You can update the minimum and maximum number of pods to run the Endpoint on:
ep.update_replicas(min_replica=2, max_replica=5)
# Get sample code
You can fetch the sample code to query the Endpoint (as shown in the Endpoint's main screen):
example:
>>> sample_code = ep.get_sample_code()
>>> sample_code['curl']
'curl -X POST \\\n http://endpoint_title.cnvrg.io/api/v1/endpoints/q7veenevzd83rewx...'
# Poll charts
You can fetch a dictionary with data about the Endpoint's latency performance, number of requests and user generated metrics from the Endpoint's charts:
>>> ep.poll_charts()
# Rollback version
If you want to roll back the Endpoint version to a previous one, you just need to pass the current version's slug value, for example:
>>> ep.current_deployment['title']
3 # current version is 3
>>> last_version_slug = ep.current_deployment["slug"]
>>> ep.rollback(version_slug=last_version_slug)
>>> ep.reload()
>>> ep.current_deployment['title']
2 # after the rollback the Endpoint's version is now 2
NOTE
To fetch the most updated attributes of the Endpoint, use ep.reload()
# Set feedback loop
You can grab all inbound data and feed it into a dataset for various uses, such as continuous learning for your models, for example:
from cnvrgv2 import Cnvrg
from cnvrgv2.modules.workflows.endpoint.endpoint import FeedbackLoopKind
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
ep = myproj.endpoints.get("endpoint-slug")
ds_slug = "dataset-name"
ep.configure_feedback_loop(dataset_slug=ds_slug,
scheduling_type=FeedbackLoopKind.IMMEDIATE)
Set up the feedback loop behavior with the following parameters:
Parameter | type | description | required | default |
---|---|---|---|---|
dataset_slug | string | Slug of the receiving dataset | No | None |
scheduling_type | int | Whether the feedback loop is immediate (for every request) or recurring (for every time interval); use FeedbackLoopKind.IMMEDIATE or FeedbackLoopKind.RECURRING | No | FeedbackLoopKind.IMMEDIATE |
cron_string | string | Cron syntax string if scheduling type is recurring | No | None |
Disable the feedback loop:
ep.stop_feedback_loop()
NOTE
The data will be automatically saved in predict/predictions.csv
# Control batch Endpoint
If the Endpoint is of batch type, then you can control it straight from the SDK:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
ep = myproj.endpoints.get("endpoint-slug")
# Check if it is running
ep.batch_is_running()
# Scale it up or down
ep.batch_scale_up()
ep.batch_scale_down()
# Webapps Operations
# Create a webapp
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
wb = myproj.webapps.create("mywebapp",
templates=["small","medium"],
webapp_type="dash",
file_name="app.py")
# Available parameters:
Parameter | type | description | required | default |
---|---|---|---|---|
webapp_type | string | The type of webapp to create ("shiny" , "dash" or "voila" ) | Yes | |
file_name | string | File name of the main app script | Yes | |
title | string | Name of the webapp | No | None |
templates | list | List of template names to be used | No | None |
datasets | list | List of datasets to connect with the webapp. | No | None |
# Available attributes:
You can access the WebApp attributes by using regular dot notation, for example:
>>> wb.webapp_type
'dash'
Attribute | type | description |
---|---|---|
webapp_type | string | The type of webapp ("shiny" , "dash" or "voila" ) |
template_ids | List | IDs of the compute templates assigned to the webapp |
title | string | The name of the webapp |
members | List | List of collaborators on this webapp |
category | string | The data structure category |
description | string | Description of the webapp |
num_files | int | The number of files in the webapp |
last_commit | string | The last commit on this webapp |
current_commit | string | The current commit on this webapp object |
# Get webapp object
Get a specific WebApp by its slug value:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
wb = myproj.webapps.get("webapp-slug")
NOTE
You can also reference the current running webapp from within its job scope:
from cnvrgv2 import Webapp
wb = Webapp()
# List all the webapps in the project
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
# Sort in descending order
wb = myproj.webapps.list(sort="-created_at")
TIP
Sort the list by key: -key -> DESC, key -> ASC.
# Delete webapp
You can delete a WebApp from a project by its slug value:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
wb = myproj.webapps.get("webapp-slug")
wb.delete()
# Do bulk delete on multiple webapps
myproj.webapps.delete(['webapp-slug1','webapp-slug2'])
# Stop running webapp
Stop a running WebApp by passing its slug value (sync=False by default; all WebApps must be running):
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
myproj = cnvrg.projects.get("myproject")
wb = myproj.webapps.get("webapp-slug")
wb.stop(sync=False)
# Do bulk stop on multiple webapps
myproj.webapps.stop(['webapp-slug1','webapp-slug2'])
# Dataset Operations
# Create a Dataset
You can create a new Dataset in cnvrg:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
ds = cnvrg.datasets.create(name="MyDataset",category='general')
You can use the following parameters to customize the Dataset:
Parameter | type | description | required | default |
---|---|---|---|---|
name | string | The name of the new Dataset | Yes | |
category | string | The type of dataset, can be one of the following: [general , images , audio , video , text , tabular ] | No | "general" |
# Dataset ID
In some methods, you will need to use a dataset ID. The dataset ID is the name used for the dataset in its URL.
For example, if you have a dataset that lives at https://app.cnvrg.io/my_org/datasets/MyDataset, the dataset ID is MyDataset.
# Access Dataset attributes
You can access the Dataset attributes by using regular dot notation:
ds.slug
ds.members
ds.last_commit
# Available attributes:
Attribute | type | description |
---|---|---|
slug | string | The unique slug value of the Dataset |
size | int | The size of the Dataset |
title | string | The name of the Dataset |
members | List | List of collaborators on this Dataset |
category | string | The data structure category |
description | string | Description of the Dataset |
num_files | int | The number of files in the Dataset |
last_commit | string | The last commit on this Dataset |
current_commit | string | The current commit on this Dataset object |
# Get a Dataset
To fetch a Dataset from Cnvrg you can use:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
ds = cnvrg.datasets.get("MyDataset")
# List all existing Datasets
You can list all the datasets in the current organization:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
ds = cnvrg.datasets.list(sort="-created_at")
TIP
Sort the list by key: -key -> DESC, key -> ASC.
# Delete a Dataset
To delete a Dataset, call the delete() function on its instance:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
ds = cnvrg.datasets.get("MyDataset")
ds.delete()
# Dataset Commits
Every Dataset in Cnvrg may contain multiple data commits that you can interact with in the following manner:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
ds = cnvrg.datasets.get("MyDataset") # Get the Dataset object
# Get A specific commit by its sha1 value
cm = ds.get_commit('xxxxxxxxx')
# OR list all available commits
commits = [cm for cm in ds.list_commits()]
# Last and current commits are available as attributes
last_commit = ds.last_commit
current_commit = ds.current_commit
Each commit contains the following attributes:
Attribute | type | description |
---|---|---|
sha1 | string | The unique sha1 value of this commit |
source | int | Where this commit was created from |
message | string | The commit message |
created_at | string | The date the commit was created |
# Dataset Queries
You can create and use queries directly from the cnvrg SDK to filter and use the Dataset exactly the way you want, using Query language syntax.
# Create a new Query:
from cnvrgv2 import Cnvrg
cnvrg = Cnvrg()
ds = cnvrg.datasets.get("MyDataset") # Get the Dataset object
ds.queries.create(name='OnlyPngFiles',
query='{"fullpath":"*.png"}',
commit_sha1='xxxxxxxxx')
Query parameters:
Parameter | type | description | required | default |
---|---|---|---|---|
name | string | The name of the query | Yes | |
query | string | The query string according to Query language syntax | Yes | |
commit_sha1 | string | The sha1 value of the commit that this query will be based on | No | None |
# List all of the Dataset queries:
ds.queries.list(sort="-created_at")
TIP
Sort the list by key: -key -> DESC, key -> ASC.
# Get a specific query
q = ds.queries.get('slug')
# Delete a query
q.delete()
# Dataset File operations
# Upload files to a dataset
ds.put_files(paths=['/files_dir/file1.txt', '/files_dir/file2.txt'],
pattern='*')
Available parameters:
Parameter | type | description | required | default |
---|---|---|---|---|
paths | List | The list of file paths that will be uploaded to the dataset | Yes | |
pattern | string | String defining the filename pattern | No | "*" |
message | string | The commit message | No | "" |
override | bool | Whether or not to re-upload even if the file already exists | No | False |
force | bool | Should the new commit copy files from its parent | No | False |
NOTE
If a folder is given, all files in that folder that match the pattern will be uploaded.
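The default pattern '*' suggests shell-style globbing. You can preview which local files a pattern would select with Python's standard fnmatch module (the SDK's exact matching semantics may differ slightly):

```python
import fnmatch

# Preview which files a pattern would select before uploading
candidates = ["img/a.png", "img/b.jpg", "notes.txt"]
matches = [f for f in candidates if fnmatch.fnmatch(f, "*.png")]
# -> ['img/a.png']
```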
# Remove files from a dataset
You can remove files from the dataset:
ds.remove_files(paths='*',
message='This will delete everything!')
NOTE
When deleting files from a dataset, the paths parameter can be either a list of file paths or a string pattern like '*'.
# List Dataset content
You can list all of the files and folders that are in the dataset:
ds.list_files(query_raw='{"color": "yellow"}',
sort='-id')
ds.list_folders(commit_sha1='xxxxxxxxx')
Available Parameters
Parameter | type | description | required | default |
---|---|---|---|---|
commit_sha1 | string | Sha1 string of the commit to list the files from | No | None |
query | string | Query slug to list files from | No | None |
query_raw | string | Raw query to list files according to Query language syntax | No | None |
sort | string | Key to sort the list by (-key -> DESC / key -> ASC) | No | "-id" |
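Since query_raw takes a JSON string, a small helper (illustrative only, not part of the SDK) can build one from keyword arguments instead of hand-writing the JSON:

```python
import json

def raw_query(**filters):
    # Serialize keyword filters into the JSON string expected by query_raw
    return json.dumps(filters)

raw_query(color="yellow")  # -> '{"color": "yellow"}'
```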
# Clone the dataset to the current working directory
ds.clone()
# Download dataset latest commit
ds.download()
WARNING
The Dataset must be cloned first
# Sync the local dataset with the remote one
ds.sync_local()