# Setup and Make a Batch Prediction

cnvrg can help you get to production quickly and easily. The platform supports many different types of endpoints, including batch prediction endpoints.

A batch prediction endpoint allows you to quickly productionize a model for batch inference. cnvrg wraps your model in an API and uses the predict function you supply to query it. When the model isn't being queried, the service does not use any pods or any of your other resources.

In this tutorial, we will step through a simple example, creating two flows:

  • A flow to train a model and deploy it as a batch prediction endpoint
  • A flow to run a batch prediction in your new endpoint

We will be using the IMDB example project.

# About IMDB

In this example, we’ll train a text classification model using the IMDB dataset, a set of 50,000 highly polarized reviews from the Internet Movie Database.

The model will be a simple binary classification model. If all goes according to plan, it should be able to accept a review and tell whether it is a ‘positive’ or ‘negative’ review.

Our dataset has already been preprocessed: the written reviews have been translated into integers, where each integer corresponds to a word in a dictionary.
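To make the integer encoding concrete, here is a minimal sketch of the idea. The vocabulary `TOY_WORD_INDEX`, the `UNKNOWN` id, and both helper functions are made up for illustration; the real IMDB word index is far larger and its ids will differ.

```python
# Toy vocabulary for illustration only; the real IMDB word index differs.
TOY_WORD_INDEX = {"this": 1, "movie": 2, "was": 3, "great": 4, "terrible": 5}
UNKNOWN = 0  # assumption: id 0 stands for any word not in the vocabulary


def encode_review(text, word_index=TOY_WORD_INDEX):
    """Lowercase the text, split on whitespace, and map each word to its id."""
    return [word_index.get(word, UNKNOWN) for word in text.lower().split()]


def decode_review(ids, word_index=TOY_WORD_INDEX):
    """Map ids back to words; ids missing from the vocabulary become '?'."""
    reverse = {i: word for word, i in word_index.items()}
    return " ".join(reverse.get(i, "?") for i in ids)


encoded = encode_review("This movie was great")
print(encoded)                 # [1, 2, 3, 4]
print(decode_review(encoded))  # this movie was great
```

The model never sees the words themselves, only sequences of ids like the one printed above.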

# Getting the project started

On the website, navigate to the Projects tab.

Welcome to the home of your code, experiments, flows and deployments. Here everything lives and works together.

For this example, we’ll use the prebuilt example project. On the top right, click Example Projects.

Select Text Classification with Keras and IMDB dataset.

Now you’ve created a cnvrg project titled imdb. The imdb project dashboard is displayed. Let’s have a closer look at what’s inside the project and files.

# Train and Deploy a Batch Prediction Endpoint

Let's create a flow to train and deploy a model.

# Create a new flow

  1. Go to the Flows tab for the project.
  2. Open an existing flow or create a new one by clicking New Flow.
  3. Click the name of the flow and rename it as "Train & Deploy".

# Add a custom task to train a model

  1. Click New Task > Custom Task.
  2. In the panel that appears, set the command to: python3 train.py. Click Create a Custom Script to confirm.
  3. Click "Task 1" to rename the task as "train".
  4. In the Advanced tab, set the Compute to large.
  5. Click Save Changes.

# Add a deploy task to deploy the batch prediction endpoint

  1. Click New Task > Deploy Task.
  2. Click Select Endpoint > + Add New.
  3. Click "Deploy Task 1" to rename the task as "Deploy Model".
  4. Fill in the endpoint form:
    • Click Batch.
    • Endpoint Title: Batch-Endpoint
    • Compute: medium
    • Number of Pods: 1 to 3
    • Predict:
      • File: predict.py
      • Function: predict
  5. Click Save Changes.
  6. Connect the right dot of the train task to the left dot of the new Deploy Model task.
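The endpoint form points at a function named predict in predict.py. The sketch below shows roughly what such a function could look like. It is not the file shipped with the example project: the signature, the `threshold` parameter, the `POSITIVE_IDS` set, and the `_score` stand-in are all assumptions for illustration, and a real predict.py would load the trained model instead of using a fake score.

```python
# Hypothetical sketch of predict.py; the example project's real file may differ.

# Made-up ids standing in for words a trained model would treat as positive.
POSITIVE_IDS = {4}


def _score(encoded_review):
    """Stand-in for a trained model: return a fake probability of 'positive'."""
    return 0.9 if any(i in POSITIVE_IDS for i in encoded_review) else 0.1


def predict(encoded_review, threshold=0.5):
    """Classify one integer-encoded review as 'positive' or 'negative'."""
    return "positive" if _score(encoded_review) >= threshold else "negative"


print(predict([1, 2, 3, 4]))  # positive
print(predict([1, 2, 3, 5]))  # negative
```

Keeping the model call behind a single function like this is what lets the endpoint query the model without knowing how it was trained.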

# Run flow

Great! You now have a finished two-task flow. The flow will train a model and then deploy the model as a batch prediction endpoint. When we run the flow, everything will be handled by cnvrg: compute scheduling, experiment tracking and deployment.

Click Run (the blue arrow). The flow will start running and you can track its progress from the flow run page.

# Run a Batch Prediction

While our model is training and deploying, let's work on the second flow. This flow will run a batch prediction.

# Create a new dataset for the prediction

We are going to use a CSV for our batch prediction. We have created a CSV file to use, but first you will create a dataset to save it in.

  1. Go to the Datasets tab.
  2. Click + New Dataset.
  3. In the dataset setup page, set:
    • Dataset Name: movie-reviews
    • Type: Tabular
  4. Click Save Dataset.

# Download the CSV

Download the CSV below by right-clicking the link and saving the file:

Download this CSV

# Add the CSV to the dataset

Use drag-and-drop to add the downloaded CSV file to your dataset. Then click Save to upload the file.

Great! Now we have some data to use for our batch prediction!

# Create a new flow

  1. Go to the Flows tab for the IMDB project.
  2. Open an existing flow or create a new one by clicking New Flow.
  3. Click the name of the flow and rename it as "Batch Prediction".

# Create a data task

  1. Click New Task > Data Task.
  2. In the panel that appears, set the dataset as: movie-reviews
  3. Click Save Changes.

# Create a Batch Predict task

  1. Click New Task > Batch Predict.
  2. Set the correct parameters:
    • endpoint_id: For this parameter, go to your deployed batch prediction endpoint. The URL of that page contains an ID; copy that ID for use here. For example, if your URL is https://app.cnvrg.io/cnvrg/projects/imdb/endpoints/show/iztpzuntqzw2g9ksydxk, the ID is iztpzuntqzw2g9ksydxk.
    • input_file: /data/movie-reviews/reviews.csv (this is where the dataset and CSV file will be located in the experiment)
    • output_file: predictions.csv
    • dataset: movie-reviews
  3. Click Save Changes.
  4. Connect the right dot of the movie-reviews task to the left dot of the new Batch Predict task.
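If you prefer not to copy the ID by hand, the parameters above can be derived with a few lines of Python. This is only a sketch using the standard library: `endpoint_id_from_url` and `batch_predict_params` are hypothetical helpers, not part of cnvrg, and the resulting dict simply mirrors the values you type into the task form.

```python
import posixpath
from urllib.parse import urlparse


def endpoint_id_from_url(url):
    """Return the last path segment of an endpoint page URL."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def batch_predict_params(endpoint_url, dataset, input_name, output_name):
    """Collect the four task parameters in a plain dict (not a cnvrg API call)."""
    return {
        "endpoint_id": endpoint_id_from_url(endpoint_url),
        # The tutorial places the dataset under /data/<dataset name>.
        "input_file": posixpath.join("/data", dataset, input_name),
        "output_file": output_name,
        "dataset": dataset,
    }


params = batch_predict_params(
    "https://app.cnvrg.io/cnvrg/projects/imdb/endpoints/show/iztpzuntqzw2g9ksydxk",
    dataset="movie-reviews",
    input_name="reviews.csv",
    output_name="predictions.csv",
)
for name, value in params.items():
    print(f"{name}: {value}")
```

With the example URL, this prints the same four values listed in the steps above.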

# Run flow

Awesome! You have now created a flow that takes the dataset and CSV file we created and uses them in a batch prediction against our new endpoint. It will create an output CSV file called 'predictions.csv' and sync it back to our original movie-reviews dataset.

Click Run (the blue arrow). The flow will start running and the batch prediction will start. You can track its performance from the experiments page.

# Check the Results

When the batch prediction has concluded, you can go to the dataset and find the output CSV.

  1. Go to the Datasets tab.
  2. Click movie-reviews.
  3. Click the file predictions.csv.

The file will load. Here you can see the inputs alongside the sentiments predicted by our batch prediction.
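If you download predictions.csv, a quick tally of the predicted labels is an easy sanity check. The sketch below assumes the file has a header row and a column named `prediction`; that column name and the sample rows are made up for illustration, so adjust them to match the actual file.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, column="prediction"):
    """Count how often each label appears in the given column of a CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Made-up sample standing in for a downloaded predictions.csv.
sample = io.StringIO(
    "review,prediction\n"
    "loved it,positive\n"
    "hated it,negative\n"
    "so good,positive\n"
)
counts = count_labels(sample)
print(counts["positive"], counts["negative"])  # 2 1
```

For a real file, pass `open("predictions.csv", newline="")` instead of the in-memory sample.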

Additionally, everything in cnvrg is tracked, so you can also go to the endpoints page and see comprehensive information about the health and performance of the service.

# Conclusion

cnvrg is the easiest way to set up and make batch predictions, and you can use it to create batch prediction endpoints for your own models as well. The combination of flows for end-to-end pipelines and comprehensive, easy-to-use serving and MLOps makes cnvrg the best solution for getting to production quickly and easily.

Last Updated: 8/29/2022, 1:10:15 PM