# Flows

Flows in cnvrg are production-ready machine learning pipelines that allow you to build complex DAG (directed acyclic graph) pipelines and run your ML components (tasks) with just drag-n-drop.

Each task in a flow is an ML component that is fully customizable and can run on different computes with different docker images. For example, you can have feature engineering running on a Spark cluster, followed by a training task running on a GPU instance on AWS.

Each run of the DAG will produce an experiment for fully tracked and reproducible machine learning.



# Flows Components

A Flow can have an unlimited number of components. A component can be a data task, custom task or deploy task. Each task is flexible and able to leverage different types of computes, environments, and frameworks.

# Data Tasks

A data task represents a dataset that is hosted and accessible via the cnvrg platform. You may select any dataset, pinned to a specific version or query. You can also add multiple data tasks to a single flow.

Adding a data task to your Flow will automatically mount the chosen dataset to the connected task. For example, if you add the hotdogs dataset and connect it to VGG/ResNet/InceptionV3, then the dataset hotdogs will be accessible in each of those tasks.

TIP

A mounted dataset will be accessible at /data/dataset_name.
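As a quick sketch, a task script could locate its mounted dataset as follows (the dataset name hotdogs is illustrative, and the /data mount only exists inside a running cnvrg job):

```python
import os

# Every dataset attached to a task is mounted at /data/<dataset_name>.
dataset_name = "hotdogs"  # illustrative dataset name
dataset_dir = os.path.join("/data", dataset_name)

# List the dataset's files if the mount exists (it only does inside a cnvrg job).
files = os.listdir(dataset_dir) if os.path.isdir(dataset_dir) else []
print(dataset_dir)  # /data/hotdogs
```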

# Custom tasks and AI Library tasks

A task in cnvrg represents a machine learning component. A task can be any component you want, with full flexibility to design and code it however you need.

A task holds the following information:

# Parameters tab

  • Command: Every task starts with a command. It could be your Python script or any other executable (R/Bash/Java/etc). For example: python3 train.py
  • Parameters: Hyperparameters, data parameters, or any kind of argument you would like to pass to your command. All parameters are automatically captured for reproducible data science.

TIP

For parameters, you can also use comma-separated values and cnvrg will automatically run an experiment for each permutation, as in a grid search.
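For instance, entering comma-separated values for two parameters behaves like a grid search over their combinations. The parameter names below are illustrative; this sketch only counts the permutations:

```python
from itertools import product

# Hypothetical Parameters-tab entries: comma-separated values per key.
params = {
    "learning_rate": ["0.1", "0.01", "0.001"],
    "batch_size": ["64", "128"],
}

# cnvrg runs one experiment per permutation, like a grid search.
runs = [dict(zip(params, combo)) for combo in product(*params.values())]
print(len(runs))  # 6 experiments (3 x 2)
```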

# Conditions tab

  • Add a condition that will dictate whether or not the task is executed.
  • Choose users who will approve the execution of the task.

# Advanced tab

  • (If project connected to git) Git Branch and Git Commit: Which git branch and commit to clone when executing the task.
  • Compute: The compute template/s that the task will try to run on.
  • (If project connected to git) Output Folder: Which folder you will save artifacts to. The contents of only this folder will be available in the following tasks.
  • Container: The Docker image to use for the virtual environment of the task.


# YAML

The YAML config representation of the task.

# Deploy tasks

# Setup tab

Fill in the information for the web service you will be creating or updating:

To update an existing web service, choose the name of the endpoint from the Select Endpoint menu.

To create a new web service, click + Add New from the Select Endpoint menu. Then fill in the required details for the service:

  • Endpoint Title: The name of the service. This is also used as part of the URL for the REST API and as such can only contain alphanumeric characters and hyphens.
  • Compute: The compute that the service will run on.
  • Image: The Docker container that will serve as the virtual environment for the endpoint.
  • File: The file containing your predict function.
  • Function: The name of your predict function.

Click Advanced Settings to reveal advanced options:

  • Number of Pods: Set the lower and upper bounds for auto-scaling the service under demand. Set them according to the predicted traffic to the endpoint.
  • Language: The language your service is written in.
  • Flask Config: Change any Flask config variables for any specific configuration you desire.
  • Function accepts file as input: Enable or disable the ability for the model to accept files as input.

# Conditions tab

  • Add a condition that will dictate whether or not the task is executed.
  • Choose users who will approve the execution of the task.

# The Files in a Task

Similar to the execution of any other machine learning workload in cnvrg, when a task is executed, it constructs its environment as follows:

  1. Pull and execute the chosen container.
  2. Clone the chosen git branch and commit (if project connected to git) into /cnvrg.
  3. Clone the latest version of the files from your project's Files tab and artifacts from preceding tasks into /cnvrg.
  4. Install packages from the requirements.txt file (if it exists).
  5. Execute the prerun.sh script (if it exists).

Only after all these steps have been completed will the task execute the given Command. That means the Command, or your own code, can call or use any of the files made available by the preparation of the environment.

For more information see Environment.

# Artifacts Flow

Each task in the flow has access to all of the artifacts of each of the previous connected tasks. They can be found and accessed from the work directory of the task which is /cnvrg.

For example, if Task A creates a file named output.txt, that file will be in the workdir of the following tasks and could be accessed from the full path ~/cnvrg/output.txt.

TIP

If your project is connected to git, only the files that are saved to the chosen Output Folder (/output by default) will be available in the following tasks. They will be available in ~/cnvrg/PATH-OF-OUTPUT-FOLDER.

# Tags & Parameters Flow

Each task can produce tags like hyperparameters, metrics, and more. The parameters and tags of preceding tasks are available as environment variables within the following tasks.

The key (name) of the environment variable will be in the following format:

CNVRG_TASKNAME_TAGNAME

The variable will always be in all uppercase (even if the original parameter was not). Spaces in task and tag names will be replaced with underscores. For example, if you had a parameter in Task A named model accuracy, the corresponding environment variable in following tasks will be: CNVRG_TASK_A_MODEL_ACCURACY.
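The naming rule can be sketched as follows (the task and tag names are illustrative, and the variable only exists inside a real flow run):

```python
import os

# Derive the env-var key cnvrg uses for a tag from a previous task:
# uppercase everything and replace spaces with underscores.
task_name = "Task A"         # illustrative task name
tag_name = "model accuracy"  # illustrative tag name
env_key = "CNVRG_{}_{}".format(task_name, tag_name).upper().replace(" ", "_")
print(env_key)  # CNVRG_TASK_A_MODEL_ACCURACY

# Read it with a fallback, since the variable only exists inside a flow run.
accuracy = os.environ.get(env_key, "0.0")
```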

You can use a tag from a previous experiment as a parameter by using the following format:

{ { task_name.tag_name } }

The tag will retain the same casing (uppercase and lowercase letters) as was recorded in the task where it was created. Spaces in task and tag names are preserved. For example, if you had a parameter in Task A named model accuracy, the corresponding format would be: { { Task A.model accuracy } }.


# Execution and Reproducibility

Running the Flow by clicking the Play button in the flow bar will execute the Flow's DAG in an optimized way using your cloud/on-premise compute resources.

# Experiments

Each run will generate an experiment page so you can track the progress of your flow in real time. Using the experiment page, you'll be able to track metrics, hyperparameters, algorithms, and more. Additionally, you have the option to stop and terminate runs.

TIP

You can use the experiment comparison feature to visualize different models and routes to understand your Flow performance better.

# Routes

cnvrg will automatically calculate all the routes inside your flow and will create a tracking page for each route.

Each task will wait for any preceding connected tasks to complete before starting to run. This way, multiple tasks can be inputs to a single task that runs on the output of several tasks.

Flow route

In the above image, Task C will only run once Task A and Task B have concluded. Task C will have access to the outputs of both Task A and Task B. There is only 1 route.


# Grid Searches

Each task can conduct a grid search over the parameters you enter. A single task will run once for each combination of its parameters. Any following task will then run once for each of the preceding tasks' combinations, multiplied by the combinations of its own grid search.

Specifically, each task in the graph will run N = X * Y times, where X is the number of permutations in its internal grid search and Y is the number of permutations produced by its parent tasks.

# Example: 2 tasks

In this example, the first task, Grid Search A, consists of 3 experiments and is followed by a second task, Grid Search B, with 3 experiments. This results in 9 (3*3) different flow routes, accommodating all the permutations.

# Example: 3 tasks

In this example, both Grid Search A and Grid Search B are connected to Grid Search C. Grid Search A consists of 2 experiments, Grid Search B consists of 3 experiments and Grid Search C consists of 5 experiments. When the flow is run, cnvrg will execute 30 (2*3*5) flow routes.
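The route counts in both examples are simply products of each task's permutation count; a quick check:

```python
from math import prod

# Experiments (grid-search permutations) per task along each example's route.
two_tasks = [3, 3]       # Grid Search A -> Grid Search B
three_tasks = [2, 3, 5]  # Grid Search A and B -> Grid Search C

print(prod(two_tasks))    # 9 flow routes
print(prod(three_tasks))  # 30 flow routes
```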

# Conditions

You can add conditions to custom tasks, deploy tasks and AI Library tasks within your flow. A condition controls which of the previous experiments continue to the task it is set on. A condition can be set against any of the cnvrg Research Assistant tags within your experiments.

To set a new condition, click the chosen task and then click the Conditions tab on the card that appears.

  1. Choose which previous tasks in the flow will have the condition tested against them.
  2. Write the name of the cnvrg Research Assistant tag you will be testing.
  3. Choose the type of test (see the list below).
  4. Choose the value (if applicable) that must be met for the condition to pass.

A condition can be any of the following:

  • Greater than (>): any experiment whose matching tag's value is greater than the value will continue.
  • Less than (<): any experiment whose matching tag's value is less than the value will continue.
  • Equal (=): any experiment whose matching tag's value equals the value will continue.
  • The Maximum: the experiment with the highest value for the matching tag will continue.
  • The Minimum: the experiment with the lowest value for the matching tag will continue.

# Human Validation

You are able to set mandatory approvals on exec tasks, deploy tasks and AI Library tasks within your flow. If human validation is enabled for a task, the task will not begin executing until the approval is received from one of the chosen users.

When the flow reaches a task with human validation enabled, the chosen users will receive an email notification alerting them that the task is awaiting their confirmation. The email will contain a link to the flow and the pending task. They can then access the flows page, read information about the previous tasks, and decide whether or not to approve the next task to run. If they approve the task, the flow will continue; otherwise, the flow will be aborted.

A flow that contains a task pending approval will have the status Pending Approval, both in the experiments table and on the page for the flow run. The individual task awaiting approval will also have the status Pending Approval.

# Enable Human Validation on a task

To enable approvals for a task, click on the desired task and then click on the Conditions tab:

  1. Enable approvals by clicking the toggle under Human Validation.
  2. In the user selector, choose the users that will be given the authority to approve the task.

# Approve or reject a task

If a task is awaiting approval, the relevant users will receive an email notification. To approve or reject the task:

  1. Open the page for the task awaiting approval.
  2. The approval pop-up will appear automatically; select either Approve or Cancel as desired.

If you would like to view more information about the previous tasks, click away from the pop up and it will disappear. Once you are ready to approve or reject the task:

  1. Click View next to the status Pending Approval.
  2. In the pop up, either select Continue Flow or Stop Flow as desired.

The flow will continue to run or be aborted according to the decision made.

# Settings

# Edit flow title


You can easily edit your Flow title by clicking the title on the top left of the Flow canvas.

# Versioning


cnvrg allows you to manage and create versions of your Flows. You can save a Flow and revert back to previous versions with just a click. To do so, click the dropdown next to your Flow title.

# Continual Learning


Continual learning is a method of machine learning in which input data is continuously used to extend the existing model's knowledge, that is, to further train the model.

cnvrg makes it easy to set up Continual Learning for your flow. Click the magic-wand icon in the Flow bar to open the Continual Learning popup.

# Continual Learning Triggers

  • Dataset updates - The flow will be triggered whenever there's a new version of the chosen dataset.
  • Webhook - The flow will be triggered when the webhook URL is called.

TIP

Your latest version of the flow will run when triggered.

# Scheduling


You may schedule your Flow to run periodically by clicking the Clock icon in your Flow bar and selecting the schedule in the popup:

# YAML Files

A YAML file may be used to create or edit a flow. Flow YAML files store the flow title, all task parameters, and the relationships between tasks. To create (or edit) a flow using a YAML file, click the YAML icon, and an editor will open in a popup. Any changes made in the popup will be saved to the file only after clicking Save.

Alternatively, you can import a flow to the project using the CLI:

cnvrg flow import --file=MY_FILE.yaml

Or run a flow directly from a YAML file:

cnvrg flow run --file=MY_FILE.yaml

A flow YAML file consists of the following sections:

| Key | Value | Required | Description |
|---|---|---|---|
| flow: | TITLE | Yes | The title of the flow. |
| tasks: | LIST_OF_TASKS | Yes | The tasks that exist in the flow. |
| relations: | -RELATION | Yes | List of the relationships between tasks. A relationship consists of from: (input) and to: (output). |
| schedule: | DATE TIME | No* | When to run the flow. You can specify the time using either a 24- or 12-hour clock format. For example, to specify 4:25 PM on the 31st of January 2021, you can use any of these: 31.01.2021 16:25, 31.01.2021 4:25PM, 2021.01.31 16:25, 2021.01.31 4:25PM. |
| recurring: | CRON | No* | Set a schedule for executing a recurring flow. Specify the schedule using Cron format. For example, to execute the flow at 30 minutes past every hour, specify: "30 * * * *" |

NOTE

schedule: and recurring: cannot be used together. If neither is used, the flow will run immediately when triggered.

For example:
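The following is a minimal sketch of a flow YAML; the flow title, task titles, dataset and compute names are illustrative, and the task fields are described in the sections below:

```yaml
---
flow: Train Flow
tasks:
  - title: Hotdogs
    type: data
    dataset: hotdogs
  - title: Train
    type: exec
    input: python3 train.py
    computes:
      - medium
relations:
  - from: Hotdogs
    to: Train
```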

# Tasks

For each task there are both required and optional fields. Some are unique per task type and others are general for all tasks.

# Fields for all tasks

The following table includes fields that are relevant for all types of flow tasks:

| Key | Value | Required | Description |
|---|---|---|---|
| title: | TASK_TITLE | Yes | The name of the task. |
| type: | data OR exec OR deploy | Yes | The type of task you are defining: either data, exec or deploy. |
| description: | "task description" | No | The description of the task to appear in the documentation. |
| top: | int | No | The position of the task card in the UI, relative to the top. |
| left: | int | No | The position of the task card in the UI, relative to the left. |

NOTE

If the top and left keys aren't specified, the corresponding task cards will appear in the top-left corner.

# Condition fields for custom and deploy tasks

The conditions field for a custom or deploy task can contain a condition that must be met in order for the flow to advance. Each condition consists of a target:, task:, objective:, tag_key: and value:. This format is used both for regular conditions (where a metric is checked automatically) and approval conditions (where a user's approval is required).

For regular conditions, only target:, task:, objective: and tag_key: are used.
For approval conditions, only objective: and value: are used.

| Key | Value | Description |
|---|---|---|
| target: | float | The value that your metric is checked against. Leave this field blank for Human-in-the-loop conditions. |
| task: | name_of_previous_task | Which previous task this condition will be run on. If blank, all tasks will be subjected to the condition. Leave this field blank for Human-in-the-loop conditions. |
| objective: | min/max/gt/lt/eq/human_validation | The form of comparison: either min (minimum), max (maximum), gt (greater than), lt (less than), eq (equals) or human_validation (Human-in-the-loop). |
| tag_key: | KEY | The metric that is checked in the condition. It should be a metric tracked in the experiment. Leave this field blank for Human-in-the-loop conditions. |
| value: | USERNAME1,USERNAME2,... | The comma-separated usernames of users who will be asked to approve the running of this task. Leave this field blank for regular conditions. |

See the below examples for clarification:
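As hedged sketches (the task name, tag key, threshold and usernames are illustrative, and the exact nesting may vary by cnvrg version):

```yaml
# Regular condition: continue only if the "accuracy" tag from task Train is > 0.8
conditions:
  - target: 0.8
    task: Train
    objective: gt
    tag_key: accuracy
```

```yaml
# Approval condition: wait for one of the listed users to approve the task
conditions:
  - objective: human_validation
    value: alice,bob
```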

# Data task fields

The following table includes the fields that can be specified for a data task. Make sure to include the required fields from the general table as well:

| Key | Value | Required | Description |
|---|---|---|---|
| dataset: | DATASET_NAME | Yes | The name of the dataset to be used. |
| data_commit: | COMMIT_SHA1 | No | Which commit of the dataset to use. If blank, the latest version will be used. Leave blank if using data_query. |
| data_query: | QUERY_NAME | No | Which query of the dataset to use. Leave blank if using data_commit. |
| use_cached: | boolean | No | Whether to use the cache of the chosen commit. |

# Custom task fields

The following table includes the fields that can be specified for a custom task. Make sure to include the required fields from the general table as well:

| Key | Value | Required | Description |
|---|---|---|---|
| input: | COMMAND | Yes | The command to be run for this task. For example, python3 train.py. |
| computes: | -TEMPLATE | Yes | The list of compute templates to try and use. The compute templates should be under the computes: heading, each on their own line and preceded by a -. See the examples for clarification. You must include at least one entry. |
| image: | REPO:TAG | No | The container to use for the task. If left empty, the project's default container will be used. |
| conditions: | CONDITION | No | The condition for whether the task should be executed. See the condition fields above for information. |
| params: | PARAMETERS | No | The parameters that the task will use. See the parameter fields below for information. |
| git_branch: | BRANCH | No | Which git branch to clone. If empty, master will be used. Only relevant when the project is connected to git. |
| git_commit: | COMMIT | No | Which git commit to clone. If empty, the latest will be used. Only relevant when the project is connected to git. |
| mount_folders: | MOUNT_STRINGS | No | Used to mount network drives in the pod. Each should be in the format: <ip_of_network_drive>:/<name_of_folder>. |
| periodic_sync: | BOOLEAN | No | Whether or not to enable periodic sync for the task. |
| restart_if_stuck: | BOOLEAN | No | Whether or not to restart if the experiment has an error. |
| prerun: | BOOLEAN | No | Whether or not to run the prerun.sh script if it exists. |
| requirements: | BOOLEAN | No | Whether or not to use the requirements.txt list if it exists. |
| notify_on_error: | BOOLEAN | No | Whether or not to send an email notification if the experiment reaches an error. |
| notify_on_success: | BOOLEAN | No | Whether or not to send an email notification if the experiment finishes successfully. |
| emails: | EMAILS | No | Additional recipients of email notifications for this experiment. |


# Parameter fields

The params field for a custom task can contain multiple parameters. Each parameter consists of two fields: key: and value:. There can be one key for each parameter, but each parameter can have multiple values.

cnvrg will automatically calculate the different permutations of parameters and run a grid search. Each possible combination of parameters is its own experiment and run of the flow.

See the below examples for clarification:

```yaml
params:
- key: data
  value: "data.csv" # single value example
- key: epochs
  value: [3,5,12]   # array example
- key: batch_size
  value:            # list example
  - '128'
  - '256'
```

# Deploy task fields

| Key | Value | Required | Description |
|---|---|---|---|
| endpoint_title: | ENDPOINT_NAME | Yes* | For a new endpoint: the name of the desired endpoint. A new endpoint will be created with this name. |
| endpoint_id: | ENDPOINT_SLUG | Yes* | For an existing endpoint: supply the ID of the existing endpoint to update the service. |
| computes: | -TEMPLATE | Yes | The list of compute templates to try and use. The compute templates should be under the computes: heading, each on their own line and preceded by a -. See the examples for clarification. You must include at least one entry. |
| image: | REPO:TAG | No | The container to use for the task. If left empty, the project's default container will be used. |
| conditions: | CONDITION | No | The condition for whether the task should be executed. See the condition fields above for information. |
| file_name: | FILE_NAME | Yes | The file which contains the function to be used for the service. |
| function_name: | FUNCTION_NAME | Yes | The function used to manage the input and output of the service. |
| env_setup: | python_3 OR r_endpoint | No | Whether the endpoint is Python or R based. The default is python_3. |
| min_replica: | int | Yes | Minimum number of pods to use for auto-scaling. |
| max_replica: | int | Yes | Maximum number of pods to use for auto-scaling. |
| config_vars: | [key1=val1,...,key=val] | No | Used to add key-value pairs to the Flask config of the endpoint. |
| accept_files: | boolean | No | Whether or not to accept files as an input. Default is False. |
| git_branch: | BRANCH | No | Which git branch to clone. If empty, master will be used. Only relevant when the project is connected to git. |
| git_commit: | COMMIT | No | Which git commit to clone. If empty, the latest will be used. Only relevant when the project is connected to git. |

NOTE

Use either endpoint_title: or endpoint_id:, not both.

# Example YAMLs

Below are various examples of different YAML tasks and files:
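As one hedged end-to-end sketch, the following flow combines a data, a custom (exec) and a deploy task; all titles, dataset, compute, file and function names are illustrative:

```yaml
---
flow: Hotdog Classifier
tasks:
  - title: Hotdogs
    type: data
    dataset: hotdogs
  - title: Train
    type: exec
    input: python3 train.py
    computes:
      - gpu
    params:
      - key: epochs
        value: [3, 5]
  - title: Serve
    type: deploy
    endpoint_title: hotdog-classifier
    computes:
      - medium
    file_name: predict.py
    function_name: predict
    min_replica: 1
    max_replica: 2
relations:
  - from: Hotdogs
    to: Train
  - from: Train
    to: Serve
```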

Last Updated: 7/2/2020, 3:28:37 PM