# cnvrg SDK

The cnvrg SDK is written in Python and is designed to help data scientists interact with cnvrg from their code, experiments and models. Through the SDK, you can create experiments, manage models, automate your machine learning pipeline and more.


# Download and Install the cnvrg SDK

To install the cnvrg SDK, use the following pip command:

pip install cnvrg

Or if you're using pip for Python 3, use the following:

pip3 install cnvrg

WARNING

The cnvrg SDK requires Python 3.5 or later.

# SDK Operations

# Authenticating the cnvrg SDK

# Inside a cnvrg workspace or experiment

The cnvrg SDK will already be initialized and authenticated with cnvrg using the account that is logged in. You can start using cnvrg SDK functions immediately.

# Authenticate using the SDK

You can authenticate with the SDK's Cnvrg class inside any script you want to run locally. Authentication using this method does not persist after the script finishes running.

Import the Cnvrg class and initialize a Cnvrg object with your API URL, email and password.

from cnvrg import Cnvrg 
Cnvrg(url='/api', email='email_address', password='password')

In the code above, replace url='/api' with your deployment's API URL and fill in your email address and password. If you are not sure what your unique web URL is, log in via the web and copy the URL from your browser, or contact support for additional help.
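If you know your web URL but not the API endpoint, a common pattern (an assumption here, so verify it against your deployment) is that the API lives at the web domain plus /api. A small helper that derives it:

```python
from urllib.parse import urlparse

def api_url_from_web_url(web_url):
    """Derive the API endpoint from a cnvrg web URL.

    Assumes (not guaranteed for every deployment) that the API
    is served at <scheme>://<host>/api.
    """
    parts = urlparse(web_url)
    return "{}://{}/api".format(parts.scheme, parts.netloc)

# For example, a project page URL maps to the app-wide API endpoint:
api_url_from_web_url("https://app.cnvrg.io/my_org/projects/my_project")
# → "https://app.cnvrg.io/api"
```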

# Authenticate using the CLI

If you are running an experiment on your local machine or in another environment, you can authenticate your local workspace using the cnvrg CLI.

After authenticating with the CLI, ensure you are working in a folder that is linked with cnvrg (you can link a folder using either the cnvrg link or cnvrg clone command).

# Project Operations

# Create a new project

To create a new project:

from cnvrg import Project
p = Project.create('Project_Name')

# Upload files and artifacts to a project

To upload a list of locally stored files to the project, pass put_files() a list containing the absolute file paths of the files to upload:

from cnvrg import Project
# set a project object from existing project or create a new one.
p = Project.create('Project_Name')
p.put_files(['/cnvrg/notebook.ipynb','/cnvrg/model.h5'])

NOTE

If you are using Python on Windows, make sure the file path is formatted correctly, either as a raw string (for example, r'C:\User\cnvrg\notebook.ipynb') or with doubled backslashes (for example, 'C:\\User\\cnvrg\\notebook.ipynb').
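To see that the two spellings above denote the same path, and that pathlib sidesteps the escaping problem entirely, compare them directly:

```python
from pathlib import PureWindowsPath

# Two equivalent spellings of the same Windows path:
raw = r'C:\User\cnvrg\notebook.ipynb'        # raw string
escaped = 'C:\\User\\cnvrg\\notebook.ipynb'  # doubled backslashes

# pathlib accepts forward slashes and normalizes them to backslashes
# for Windows-style paths, so no escaping is needed at all.
p = PureWindowsPath('C:/User/cnvrg/notebook.ipynb')
```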

# Get a list of existing projects

To return a list of existing projects:

from cnvrg import Project
Project.list()

# Experiment Operations

# Experiment ID

In many commands, you will need to use an experiment ID. The experiment ID can be found in the URL for the experiment.

For example, if you have an experiment that lives at: https://app.cnvrg.io/my_org/projects/my_project/experiments/kxdjsuvfdcpqkjma5ppq, the experiment ID is kxdjsuvfdcpqkjma5ppq.

Then to use e=Experiment(), the command would be e=Experiment('kxdjsuvfdcpqkjma5ppq').
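Since the ID is always the last path segment of the experiment URL, a tiny hypothetical helper (not part of the SDK) can extract it:

```python
def id_from_url(url):
    # The experiment ID is the last path segment of its URL.
    return url.rstrip('/').split('/')[-1]

id_from_url("https://app.cnvrg.io/my_org/projects/my_project/experiments/kxdjsuvfdcpqkjma5ppq")
# → "kxdjsuvfdcpqkjma5ppq"
```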

# Run an experiment

If you have a function or file that you would like to run as an experiment, use the Experiment.run() command. The code can be run locally (on your own machine or in a cnvrg workspace) or on remote compute (using the compute parameter).

A function can only be run as an experiment locally (if compute="local").

To create a new experiment and run specific code locally or on a remote machine:

from cnvrg import Experiment
e = Experiment.run(command,title='my title', compute='small')
| Parameter | Type | Description |
|---|---|---|
| command | string | The command to run. Can be a shell command or a function. For example, python3 train.py or main. |
| title | string | The title for the experiment. |
| image | string | The image to use for the experiment. |
| compute | string | The compute to run the experiment on. Default is None (runs locally). For Kubernetes compute, use the format cluster_name.template_name, for example workers.medium; you can leave out cluster_name to use the default cluster. For an on-premise machine, simply write the name of the machine, for example worker1. |
| datasets | array of dictionaries | A list of datasets and commits/queries to attach to the experiment, that is: [{"id": "dataset_id-1", "commit": "commit_sha1-1", "query": "query_name-1", "tree_only": true, "use_cached": true}, {dataset_dict_2}, ...]. You can use either commit or query, not both. use_cached is only relevant when integrated with NFS and is true by default. |
| callable | callable object | A function to run as the experiment, instead of command. |
| arguments | dictionary | Parameters to be parsed on the command line (for example, parameters for grid search). |
| schedule | string | A time at which to execute the experiment, relative to the timezone defined for the current user, in either 24- or 12-hour clock format: dd.mm.yyyy hh:mm<PM/AM> or yyyy.mm.dd hh:mm<PM/AM>. For example, to specify 4:25 PM on the 31st of January 2021, use any of: 31.01.2021 16:25, 31.01.2021 4:25PM, 2021.01.31 16:25, 2021.01.31 4:25PM. |
| recurring | string | A schedule for executing a recurring experiment, in cron format. For example, to execute the experiment at 30 minutes past every hour, specify "30 * * * *". |
| notify_on_success | boolean | Receive an email when the experiment ends successfully. Default is False. |
| notify_on_error | boolean | Receive an email if an error occurs while your experiment is running. Default is False. |
| emails | string | Extra recipients for email notifications, as a comma-separated list. For example, "person1@email.com,person2@email.com". |
| git_branch | string | (When the project is connected to git) The git branch to clone. Default is the project's default. |
| git_commit | string | (When the project is connected to git) The commit of the git branch to clone. Default is the project's default. |
| commit | string | (When the project is connected to git) The commit of the cnvrg project files to clone. Default is the project's default. |
| output_dir | string | (When the project is connected to git) The path of the folder with experiment artifacts to be synced as the commit for the experiment. Default is the project's default. |
| git_diff | boolean | (When the project is connected to git) Whether to additionally sync the files in the git diff. Default is False. |

NOTE

The git_diff parameter is useful when developing in a workspace. With it, you can experiment with your code and run experiments without first syncing to git. The code you run with will still be synced, so you can eventually push back the version you are most happy with.
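The four schedule spellings accepted by the schedule parameter can be checked locally with datetime.strptime; a small sketch (an illustration of the formats only, not cnvrg's own parser) showing that all four resolve to the same moment:

```python
from datetime import datetime

# The four documented schedule formats.
FORMATS = [
    "%d.%m.%Y %H:%M",    # 31.01.2021 16:25
    "%d.%m.%Y %I:%M%p",  # 31.01.2021 4:25PM
    "%Y.%m.%d %H:%M",    # 2021.01.31 16:25
    "%Y.%m.%d %I:%M%p",  # 2021.01.31 4:25PM
]

def parse_schedule(text):
    """Parse a schedule string in any of the documented formats."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized schedule: %r" % text)
```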

# Examples

  • Run an experiment on a remote small compute, using a script with a dataset and a query:

    from cnvrg import Experiment
    e = Experiment.run('python3 train.py',
                        title='my title',
                        compute='small',
                        datasets=['dogscats.only_dogs'])
    
  • Run a grid search:

    from cnvrg import Experiment
    e = Experiment.run('python3 train.py',
                        title='grid search',
                        arguments={"epochs": [10,15,20], "batch_size": [128,256]})
    
  • Run a grid search using a YAML file:

    You can also run a grid search using the SDK by including the parameter information in a YAML file. To use the YAML file, you must first load it.

    from cnvrg import Experiment
    import yaml
    with open('/path/parameters.yaml') as f:
        grid = yaml.full_load(f)
    e = Experiment.run('python3 train.py', grid=grid)
    

    TIP

    Learn more about running grid searches and hyperparameter optimization here.

  • Run an experiment locally:

    from cnvrg import Experiment
    e = Experiment.run('python3 train.py',title='my title')
    
  • Run an experiment on a remote compute:

    from cnvrg import Experiment
    e = Experiment.run('python3 train.py',title='my title',compute='gpu')
    
  • Run an experiment using a specific Docker image:

    from cnvrg import Experiment
    e=Experiment()
    e.run('command', compute='medium', image="tensorflow:19.07")
    
  • Run a local experiment using a function:

    from cnvrg import Experiment
    def main(experiment):
        [print(i) for i in range(100)]
    e = Experiment.run(main,title='my function')
    
  • Load an existing experiment object from inside a running experiment:

    from cnvrg import Experiment
    e = Experiment()
    # run commands on the experiment object:
    e.log_param("tag_key", "tag_value")
    
  • Use email notifications for your experiment:

    from cnvrg import Experiment
    e = Experiment.run('python3 train.py', notify_on_success=True, notify_on_error=True)
    
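For the grid-search examples above, cnvrg runs one experiment per permutation of the argument values. The expansion happens server-side, but its effect can be sketched in plain Python:

```python
from itertools import product

grid = {"epochs": [10, 15, 20], "batch_size": [128, 256]}

# One experiment per combination of argument values: 3 * 2 = 6 runs.
runs = [dict(zip(grid, combo)) for combo in product(*grid.values())]
```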

# Initialize an experiment object

You can also create an Experiment object with the Experiment.init() command. This allows you to easily add SDK tracking into code you will run. Simply initialize the object and then run your code. Use the metadata operations to control what will be tracked by cnvrg.

To create a new experiment object that can track logs, metrics and artifacts:

from cnvrg import Experiment
e = Experiment.init()
# Then run your code and track it using the sdk, for example:
e.log("Starting my experiment")

You can also use the project parameter to tell cnvrg which project the experiment should be part of. If you use Experiment.init() in a cnvrg project folder linked with the CLI (either online or locally) you do not need to set this parameter.

from cnvrg import Experiment
e = Experiment.init(project=Project('slug'))

# Finish an experiment object

If you have initialized an Experiment object using the .init() command, you should conclude the experiment with the .finish() command.

To conclude an experiment object:

from cnvrg import Experiment
e.finish(exit_status=exit_status)

The default exit_status is aborted (exit_status=-1).

# Examples

    # Metadata operations on experiments

    # Add logs to an experiment:

    from cnvrg import Experiment
    e = Experiment()
    e.log("my first log")
    e.log(["my first log","my second log"])
    

    # Get logs of an experiment:

    from cnvrg import Experiment
    e = Experiment()
    e.logs() ### prints the logs of the experiment
             ### until the status of the experiment
             ### is error/aborted/success
    

    # Get the title of an experiment and set a new one:

    from cnvrg import Experiment
    e = Experiment("EXPERIMENT_SLUG")
    e.title # get title
    e.title = "dogs-and-cats" # set new title
    

    # Get various other metadata:

    from cnvrg import Experiment
    e = Experiment("EXPERIMENT_SLUG")
    e['status'] # get current status of experiment
    e['full_href'] # get full URL of the experiment
    e['commit'] # get the start commit of the experiment
    e['end_commit'] # get the end commit of the experiment
    

    # Create a tag:

    from cnvrg import Experiment
    e = Experiment()
    e.log_param("my_tag", "my_value")
    

    # Create a line chart:

    • Single line:

      from cnvrg import Experiment
      e = Experiment()
      e.log_metric("chart_name",
              Ys=[0.1, 0.2, 0.3],
              Xs=[1,2,3])
      
    • Multiple lines:

      from cnvrg import Experiment
      e = Experiment()
      e.log_metric("chart_name",
              Ys=[0.1, 0.2, 0.3],
              Xs=[1,1,1],
              grouping=["loss", "val_loss", "acc"])
      

    Xs and grouping are optional. When Xs is set, you control the X value corresponding to each Y value. When grouping is set, multiple lines are created: the values at the same index in grouping, Xs and Ys are matched as a point on the corresponding line. Xs and Ys can be a float, an integer, a single numpy value, a list of floats, a list of numpy values or a numpy array.

    WARNING

    chart_name can't include "/"
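Conceptually, the grouping match works like zipping the three lists and appending each (x, y) point to its group's line; a plain-Python illustration (not cnvrg's internals):

```python
# Same inputs as the multiple-lines example above.
Xs = [1, 1, 1]
Ys = [0.1, 0.2, 0.3]
grouping = ["loss", "val_loss", "acc"]

# Each (group, x, y) triple becomes a point on that group's line.
lines = {}
for g, x, y in zip(grouping, Xs, Ys):
    lines.setdefault(g, []).append((x, y))
# lines == {"loss": [(1, 0.1)], "val_loss": [(1, 0.2)], "acc": [(1, 0.3)]}
```

Repeated calls to e.log_metric() with the same chart name append further points, so logging once per training step with grouping builds the multi-line chart incrementally.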

    # Create a bar chart:

    Example bar chart

    • Single bar:

      from cnvrg import Experiment
      from cnvrg.charts import Bar
      e = Experiment()
      x_value=["bar1","bar2",...,"barN"]
      y_value=[1,2,...,n]
      e.log_chart("chart_key", title="chart_title",
          data=Bar(x=x_value, y=y_value, name="y_value"))
      
    • Multiple bars:

      from cnvrg import Experiment
      from cnvrg.charts import Bar
      e = Experiment()
      x_value=["bar1","bar2",...,"barN"]
      y_value1=[1,2...n]
      y_value2=[n, n-1,...,1]    
      e.log_chart("chart_key", title="chart_title",
          data=[Bar(x=x_value, y=y_value1, name="y_value1", min_val=0, max_val=10),
                Bar(x=x_value, y=y_value2, name="y_value2", colors=['#FF0000', '#00FF00', '#0000FF',...])])
      

    Typing information: x must be a List and y must be an Array, np.ndarray, pd.array or pd.Series.

    The x list populates the labels for the bars, and the corresponding y value dictates the value of the bar for that category. The name of the y array is the name of the set/category in the graph. min_val and max_val are optional numbers corresponding to the lower and upper bounds for the y values. Optionally, you can set each bar to a specific color using the colors list of hex values, with each hex value corresponding to an x value.

    Steps and groups:

    Using steps and groups allows you to submit bar charts across different steps and visualize them in a single chart with a slider to easily move between the steps. steps should be an integer and group should be a string. Multiple steps should be grouped under a single group.

    Animated bargraph

    NOTE

    e.log_chart() creates a new graph or overwrites an existing graph. It will not append new bars to the previous version of the graph (unlike e.log_metric()).
    You must therefore include all bars in a single e.log_chart() call every time you update the chart, or use a different chart_key.

    # Create a heat map/confusion matrix:

    Example heat map

    from cnvrg import Experiment
    from cnvrg.charts import MatrixHeatmap
    e = Experiment()
    e.log_chart("chart_key", title="Heatmap", x_ticks=['x', 'y'], y_ticks=['a', 'b'],
        data=MatrixHeatmap(matrix=[(0,5), (1,1)],
                           color_stops=[[0,'#000000'],[1, '#7EB4EB']],
                           min_val=0,
                           max_val=10))
    

    Typing information: x_ticks and y_ticks must be a List and matrix is a list of tuples in struct (x,y,z). color_stops is optional and is a List of Lists of size 2, where the first value is a float 0 <= X <= 1 and the second value is the hex color representing matrix values at that point of the scale. min_val and max_val are optional numbers corresponding to the minimum and maximum values for the key (scaling is done automatically when these values are not submitted).

    Each struct corresponds to a row in the matrix and to a label from the y_ticks list. The matrix is built from the bottom up, with the first struct and y_tick at the bottom edge. Each value inside the struct corresponds to an x_tick.

    Steps and groups:

    Using steps and groups allows you to submit heatmaps across different steps and visualize them in a single chart with a slider to easily move between the steps. steps should be an integer and group should be a string. Multiple steps should be grouped under a single group.

    Animated Heatmap

    TIP

    When using the group parameter, make sure the chart's key is unique across the different steps

    NOTE

    When using e.log_chart() you are creating or overwriting an existing graph. It will not append new values to the previous version (unlike e.log_metric()).
    You must therefore include all values in a single e.log_chart() call every time you update the chart, or use a different chart_key.

    # Create a scatter plot:

    Example scatter plot

    • Single set of points:
      from cnvrg import Experiment
      from cnvrg.charts import Scatterplot
      e=Experiment()
      x_values=[1,2,3,4,5]
      y_values=[5,4,3,2,1]  
      e.log_chart("chart_key", title="chart_title",
          data=Scatterplot(x=x_values, y=y_values, name="name"))
      
    • Multiple sets of points:
      from cnvrg import Experiment
      from cnvrg.charts import Scatterplot
      e=Experiment()
      x1_values=[1,2,3,4,5]
      x2_values=[1,2,3,4,5]
      y1_values=[5,4,3,2,1]  
      y2_values=[1,2,3,4,5] 
      e.log_chart("chart_key", title="chart_title",
          data=[Scatterplot(x=x1_values, y=y1_values, name="name"),
                Scatterplot(x=x2_values, y=y2_values, name="name2")])
      

    Typing information: x and y must be an Array, np.ndarray, pd.array or pd.Series. x is the list of x values and y is the list of y values.

    NOTE

    When using e.log_chart() you are creating or overwriting an existing graph. It will not append new values to the previous version (unlike e.log_metric()).
    You must therefore include all values in a single e.log_chart() call every time you update the chart, or use a different chart_key.

    # Log artifacts for your experiment:

    from cnvrg import Experiment
    e = Experiment()
    e.log_artifacts(['path/file.png','path/file2.png'], 'target_dir/')
    
    | Parameter | Type | Description |
    |---|---|---|
    | files | list of strings | List of files to upload (using relative paths). |
    | target_dir | string | Path in the experiment file system to upload the files to. |

    # Show a line chart:

    from cnvrg import Experiment
    e = Experiment("experiment_id")
    e.show_chart("chart_name") ## will open a matplotlib chart
    

    # Operations on experiments

    • Sync during a running experiment:
    from cnvrg import Experiment
    e = Experiment()
    e.sync(message="my commit")
    
    • Set the title of an experiment:
    from cnvrg import Experiment
    e = Experiment()
    e.set_title("my new title")
    
    • Send an email alert during an experiment:
    from cnvrg import Experiment
    e = Experiment()
    e.alert(message="Alert: Accuracy is less than 0.5", subject="Accuracy alert", recipients=["jp@myorg.com"])
    
    • Get a list of commits per experiment
    from cnvrg import Experiment
    e = Experiment()
    e.commits()
    
    • Get the latest commit per experiment
    from cnvrg import Experiment
    e = Experiment()
    e.commits(latest=True)
    

    # Experiment artifacts

    • Retrieve a list of artifacts for a specific experiment:
    from cnvrg import Experiment
    e = Experiment("experiment_id")
    e.artifacts()
    
    • Retrieve a list of artifacts after running an experiment:
    from cnvrg import Experiment
    e = Experiment.run('python3 train.py',title='my title',compute='gpu')
    e.artifacts()
    

    Response:

    [{'name': 'artifact.txt',
      'url': 'URL LINK',
      'type': 'blob',
      'size': '121 Bytes',
      'fullpath': 'artifact.txt',
      'updated': '14 minutes ago'}]
    
    • Pull artifacts of a specific experiment:
    from cnvrg import Experiment
    e = Experiment("experiment_id")
    e.pull_artifacts()
    
    | Parameter | Type | Description |
    |---|---|---|
    | commit | string | A specific commit to list artifacts from. Default is the experiment's latest commit. |
    | wait_until_success | boolean | Wait until the experiment has finished successfully and then pull the artifacts. |
    • Pull artifacts to a specific path:
    from cnvrg import Experiment
    e = Experiment("experiment_id")
    e.pull_artifacts("path_to_folder")
    
    • Wait until the experiment has finished running successfully and then pull the artifacts from the end commit:
    from cnvrg import Experiment
    e = Experiment("experiment_id")
    e.pull_artifacts(wait_until_success=True)
    
    • Pull artifacts after running an experiment:
    from cnvrg import Experiment
    e = Experiment.run('python3 train.py',title='my title',compute='gpu')
    e.pull_artifacts()
    
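Because artifacts() returns plain dictionaries in the format shown above, you can post-process the response with ordinary Python; for example, mapping file names to their download URLs:

```python
# Sample response in the documented artifacts() format.
artifacts = [{'name': 'artifact.txt',
              'url': 'URL LINK',
              'type': 'blob',
              'size': '121 Bytes',
              'fullpath': 'artifact.txt',
              'updated': '14 minutes ago'}]

# Collect download URLs for files (type 'blob'), keyed by name.
urls_by_name = {a['name']: a['url'] for a in artifacts if a['type'] == 'blob'}
```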

    # Flow Operations

    Flows can be created and run from any environment using the SDK. Creating flows requires using a flow configuration YAML file.

    # Create a flow

    You can use a flow YAML to create a flow inside a project. You can use either the absolute path to a YAML file or include the YAML content directly. Use the Flow.create command:

    from cnvrg import Flow
    f = Flow.create(yaml_content=string, project=Project('Project_Name'))
    #OR
    f = Flow.create(file=path_to_YAML, project=Project('Project_Name'))
    
    | Parameter | Type | Description |
    |---|---|---|
    | yaml_content | string | Required (if not using file). A string with the contents of the YAML configuration. |
    | file | path | Required (if not using yaml_content). A path to the YAML configuration file. |
    | project | Project object | A cnvrg Project object for the project to create the flow in. Can be omitted if you are in a cnvrg project (linked with the CLI). |

    # Examples

    1. Using path:

      from cnvrg import Flow
      from cnvrg import Project
      f = Flow.create(file='/local_path/flow.yaml', project=Project('Project_Name'))
      
    2. Using YAML directly:

      ## Define your flow
      example_yaml = '''---
      flow: Flow Example
      recurring: 
      tasks:
      - title: Training Task
        type: exec
        input: python3 train.py
        computes:
        - medium
        image: cnvrg:v5.0
      relations: []'''
      
      from cnvrg import Flow
      from cnvrg import Project
      f = Flow.create(yaml_content=example_yaml, project=Project('Project_Name'))
      

    # Run a flow

    To trigger a flow and run it as described in the YAML file or as constructed using the online flow builder:

    from cnvrg import Flow
    f = Flow("flow_title")  # or use the Flow object returned by Flow.create
    fv = f.run()
    

    # Monitor the status of a running flow

    To check the status of a running flow:

    from cnvrg import Flow
    f = Flow("flow_title")
    fv = f.run()
    fv.info()
    

    # Pull latest task artifacts

    This command pulls the artifacts that were created by the task's experiment in the latest flow version that ran.

    NOTE

    You can only pull artifacts from a task that runs a single experiment.

    from cnvrg import Flow
    flow = Flow("training") #name of the flow
    flow.pull_artifacts(task="SVM")
    
    | Parameter | Type | Description |
    |---|---|---|
    | task | string | Required. The name of the flow task to pull artifacts from. |

    # Endpoint Operations

    # Endpoint ID

    In many commands, you will need to use an endpoint ID. The endpoint ID can be found in the URL for the endpoint.

    For example, if you have an endpoint that lives at: https://app.cnvrg.io/my_org/projects/my_project/endpoints/show/j46mbomoyyqj4xx5f53f, the endpoint ID is j46mbomoyyqj4xx5f53f.

    Then to use e=Endpoint(), the command would be e=Endpoint('j46mbomoyyqj4xx5f53f').

    # Load an endpoint

    To load an endpoint and use it with the SDK:

    from cnvrg import Endpoint
    e = Endpoint("endpoint_id")
    

    # Query an endpoint

    • Send a single prediction to endpoint:
    from cnvrg import Endpoint
    e = Endpoint("endpoint_id")
    e.predict("my_predict")
    
    
    • Send batch predictions to endpoint:
    from cnvrg import Endpoint
    e = Endpoint("endpoint_id")
    e.batch_predict(["my_predict1", "my_predict2",...,"my_predictn"], pool_size=100)
    
    

    # Endpoint tracking and visualizing

    To log a metric in your endpoint, use the log_metric() method in your code:

    from cnvrg import Endpoint
    e = Endpoint()
    e.log_metric(k,v)
    

    k is the name of the metric and v is its value.

    For example: e.log_metric('confidence', 0.8)

    The triggers in your endpoint leverage the metrics you log with this command. For example, to set a trigger on confidence, you must be logging that metric using the cnvrg SDK. Tracked metrics are also used to create live graphs on the Charts tab.

    WARNING

    e.log_metric() currently only supports logging numbers.

    # Get predictions

    To get the previous 1000 predictions made by a model, use the get_predictions() method:

    from cnvrg import Endpoint
    e = Endpoint("endpoint_id")
    predictions_list = e.get_predictions()
    

    This method will return a list with the last 1000 predictions made by the endpoint.

    # Rollback to previous model

    To roll back a model that is in production, use the rollback() method. This rolls back the latest deployed model, and the endpoint will serve 100% of the traffic using the previous model:

    from cnvrg import Endpoint
    e = Endpoint("endpoint_id")
    e.rollback()
    

    # Miscellaneous

    • Log a prediction in your endpoint:
    from cnvrg import Endpoint
    e = Endpoint("endpoint_id")
    e.log_request(input, output, start_time)  # start_time in milliseconds
    
    • Disable logging input in logs:
    from cnvrg import Endpoint
    e = Endpoint("endpoint_id")
    e.disable_log_input()
    
    • Disable logging output in logs:
    from cnvrg import Endpoint
    e = Endpoint("endpoint_id")
    e.disable_log_output()
    

    # Dataset Operations

    # Dataset ID

    In many methods, you will need to use a dataset ID. The dataset ID is the name used for the dataset in its URL.

    For example, if you have a dataset that lives at: https://app.cnvrg.io/my_org/datasets/dataset_name, the dataset ID is dataset_name.

    Then to use ds=Dataset(), the command would be ds=Dataset('dataset_name').

    # Load a dataset

    To load a dataset with the SDK use the Dataset module.

    from cnvrg import Dataset
    ds = Dataset('dataset_id')
    

    Ensure that the dataset slug matches the dataset's URL inside cnvrg.

    # List dataset query files

    from cnvrg import Dataset
    ds = Dataset('dataset_id') # Initializing dataset
    query_files = ds.get_query("query_slug")
    # [{'name': 'create_tags.rb',
    #  'fullpath': 'create_tags.rb',
    #  'url': 'https://object_storage_url...',
    #  'metadata': {'color': 'blue', 'animal': 'dino'}
    #  },
    # {'name': 'dataset_tags.rb',
    #  'fullpath': 'dataset_tags.rb',
    #  'url': 'https://object_storage_url...',
    #  'metadata': {'color': 'white', 'animal': 'dog'}
    #  }]
    

    Ensure that the dataset slug matches the dataset's URL inside cnvrg and the query slug matches the query's URL in cnvrg.

    # Parameters

    | Parameter | Type | Description |
    |---|---|---|
    | data_dir | string | The dataset's local directory, if the files exist locally. |
    | filter | string | Comma-separated list of result fields to keep: fullpath, name, metadata, url. |

    # Examples

    • Using the data_dir parameter:

      from cnvrg import Dataset
      ds = Dataset('dataset_id') # Initializing dataset
      query_files = ds.get_query("query_slug",data_dir="/home/my_data")
      # [{'name': 'create_tags.rb',
      #  'fullpath': '/home/my_data/create_tags.rb',
      #  'url': 'https://object_storage_url...',
      #  'metadata': {'color': 'blue', 'animal': 'dino'}
      #  },
      # {'name': 'dataset_tags.rb',
      #  'fullpath': '/home/my_data/dataset_tags.rb',
      #  'url': 'https://object_storage_url...',
      #  'metadata': {'color': 'white', 'animal': 'dog'}
      #  }]
      
    • Using a filter:

      from cnvrg import Dataset
      ds = Dataset('dataset_id') # Initializing dataset
      query_files = ds.get_query("query_slug",data_dir="/home/my_data",filter="name, fullpath")
      # [{'name': 'create_tags.rb',
      #  'fullpath': '/home/my_data/create_tags.rb',}
      #  },
      # {'name': 'dataset_tags.rb',
      #  'fullpath': '/home/my_data/dataset_tags.rb',}
      #  }]
      
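The filter parameter is a comma-separated list of keys to keep; roughly, it performs a selection like the following sketch (an illustration, not cnvrg's actual implementation):

```python
def apply_filter(records, filter_str):
    # Keep only the comma-separated keys named in filter_str.
    keys = [k.strip() for k in filter_str.split(',')]
    return [{k: r[k] for k in keys if k in r} for r in records]

# Sample record in the documented query-result format.
records = [{'name': 'create_tags.rb',
            'fullpath': '/home/my_data/create_tags.rb',
            'url': 'https://object_storage_url...',
            'metadata': {'color': 'blue', 'animal': 'dino'}}]

filtered = apply_filter(records, "name, fullpath")
```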

    # Search dataset using a query

    from cnvrg import Dataset
    ds = Dataset('dataset_id') # Initializing dataset
    query_files = ds.search("{'type':'dog'}")
    # [{'name': 'create_tags.rb',
    #  'fullpath': 'create_tags.rb',
    #  'url': 'https://object_storage_url...',
    #  'metadata': {'color': 'blue', 'animal': 'dino'}
    #  },
    # {'name': 'dataset_tags.rb',
    #  'fullpath': 'dataset_tags.rb',
    #  'url': 'https://object_storage_url...',
    #  'metadata': {'color': 'white', 'animal': 'dog'}
    #  }]
    

    Ensure that the dataset slug matches the dataset's URL inside cnvrg and the query slug matches the query's URL in cnvrg.

    # Parameters

    | Parameter | Type | Description |
    |---|---|---|
    | data_dir | string | The dataset's local directory, if the files exist locally. |
    | filter | string | Comma-separated list of result fields to keep: fullpath, name, metadata, url. |

    # Examples

    • Using the data_dir parameter:

      ds = Dataset('dataset_id') # Initializing dataset
      query_files = ds.search("{'type':'dog'}",data_dir="/home/my_data")
      # [{'name': 'create_tags.rb',
      #  'fullpath': '/home/my_data/create_tags.rb',
      #  'url': 'https://object_storage_url...',
      #  'metadata': {'color': 'blue', 'animal': 'dino'}
      #  },
      # {'name': 'dataset_tags.rb',
      #  'fullpath': '/home/my_data/dataset_tags.rb',
      #  'url': 'https://object_storage_url...',
      #  'metadata': {'color': 'white', 'animal': 'dog'}
      #  }]
      
    • Using a filter:

      ds = Dataset('dataset_id') # Initializing dataset
      query_files = ds.search("{'type':'dog'}",data_dir="/home/my_data",filter="name, fullpath")
      # [{'name': 'create_tags.rb',
      #  'fullpath': '/home/my_data/create_tags.rb',}
      #  },
      # {'name': 'dataset_tags.rb',
      #  'fullpath': '/home/my_data/dataset_tags.rb',}
      #  }]
      

    # Fetch the metadata of dataset files

    from cnvrg import Dataset
    ds = Dataset("dataset_id")
    metadata = ds.get_metadata(["dogs/black_dog.png", "cats/black_cat.png", "cats/white_cat.png", "non_existent_file", "file_without_metadata"])
    # {
    #     "dogs/black_dog.png": {
    #                     "color": "black",
    #                     "animal": "dog"
    #                  },
    #     "cats/black_cat.png": {
    #                     "color": "black",
    #                     "animal": "cat"
    #                  }
    #     "cats/white_cat.png": {
    #                     "color": "white",
    #                     "animal": "cat"
    #                  }
    #     "file_without_metadata": {}
    # }
    

    Ensure that the dataset slug matches the dataset's URL inside cnvrg.
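Because get_metadata() returns a plain dict of file path to metadata, you can post-process it directly; for example, grouping files by a metadata key:

```python
# Sample response in the documented get_metadata() format.
metadata = {
    "dogs/black_dog.png": {"color": "black", "animal": "dog"},
    "cats/black_cat.png": {"color": "black", "animal": "cat"},
    "cats/white_cat.png": {"color": "white", "animal": "cat"},
    "file_without_metadata": {},
}

# Group file paths by the value of the "animal" key, skipping
# files that have no metadata.
by_animal = {}
for path, meta in metadata.items():
    if "animal" in meta:
        by_animal.setdefault(meta["animal"], []).append(path)
# by_animal == {"dog": ["dogs/black_dog.png"],
#               "cat": ["cats/black_cat.png", "cats/white_cat.png"]}
```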

    # Cache and clear commits

    • To cache a commit to an external NFS disk:

      from cnvrg import Dataset
      ds = Dataset('dataset_id')
      ds.cache_commit(commit_sha1='sha1', external_disk_slug='disk_slug')
      
    • To clear a commit from an external NFS disk:

      from cnvrg import Dataset
      ds = Dataset('dataset_id')
      ds.cache_clear(commit_sha1='sha1', external_disk_slug='disk_slug')
      
    | Parameter | Type | Description |
    |---|---|---|
    | commit_sha1 | sha1 (string) | The sha1 of the commit. |
    | external_disk_slug | string | The slug of the external disk. |

    # Dataset metadata (commits and external disks)

    • To return the current commit:

      from cnvrg import Dataset
      ds = Dataset('dataset_id')
      ds.get_current_commit()
      
    • To return a tree-structured dictionary of the dataset files in the current commit:

      from cnvrg import Dataset
      ds = Dataset('dataset_id')
      ds.get_current_tree()
      

    # AI Library Operations

    You can run a component from the AI Library using the following method:

    from cnvrg import Library
    library = Library('/cnvrg_libraries/library_name')
    experiment = library.run(arg1="val1",arg2="val2",...)
    

    You can also run a grid search by passing an array for an argument. cnvrg will identify how many experiments need to be run, one per permutation, and start them all. Use the following format:

    from cnvrg import Library
    library = Library('/cnvrg_libraries/library_name')
    experiment = library.run(arg=["val1","val2"])
    

    All of the component's arguments can be included in the library.run() method. If an argument has no default value, it must be provided in the method call.

    You can also configure the experiment's environment using the following arguments:

    command
    arguments
    grid
    title
    project
    computes
    datasets
    local
    library
    working_directory
    sync_before
    sync_after
    prerun
    requirements
    image
    

    For example, to run a library locally:

    from cnvrg import Library
    library = Library('/cnvrg_libraries/library_name')
    experiment = library.run(computes='local')
    

    # Troubleshooting the cnvrg SDK

    If you experience issues running or installing the cnvrg SDK, troubleshoot by performing the following checks:

    # Update the cnvrg SDK

    Sometimes updating the SDK can help resolve the issue.

    To update the SDK, run the command:

    pip install cnvrg -U
    

    Or if you're using pip for Python 3, use the following:

    pip3 install cnvrg -U
    

    If you do not have the latest version, the newest version will be downloaded and installed.

    After updating the SDK, check if your issue persists.

    # Install Xcode Command Line Tools (MacOS)

    If you repeatedly encounter errors installing pip or the SDK, you may not have the Xcode Command Line Tools installed on your macOS machine. Run the following command and then try again:

    xcode-select --install
    

    NOTE

    Sometimes, updating your version of MacOS requires a reinstallation of the Xcode Command Line Tools to fix common command line issues.

    # Update or remove the snowflake-connector-python

    If you are using the python package: snowflake-connector-python in your experiment, please upgrade it to version 2.2.7 or above using the following command:

    pip3 install --upgrade 'snowflake-connector-python>=2.2.7'
    

    If you no longer need the package at all, you can delete it with the following command:

    pip3 uninstall snowflake-connector-python
    
    Last Updated: 10/13/2021, 1:54:34 PM