# Serving

Publishing and deploying your models has never been easier. With one click, you can quickly deploy models on Kubernetes clusters.

cnvrg supports three types of endpoints: web services, Kafka streams and batch predictions.

For web services, cnvrg will automatically encase your code and models in a lightweight REST API and expose it to the web. It will generate an authentication token to protect the use of the endpoint and automatically build comprehensive model tracking and monitoring tools.

For Kafka streams, cnvrg supports integrating model predictions with existing Kafka streams. This allows you to ingest Kafka streams and perform real-time predictions with comprehensive model monitoring features.

For batch predictions, cnvrg will create a model endpoint that will remain inactive with 0 pods until a prediction is made. The endpoint will then scale up as needed and scale back down after the prediction is completed.

# Requirements for a cnvrg Service

Creating services in cnvrg is a simple process. There are, however, a few requirements that your code must meet for everything to work. Before deploying an endpoint, make sure your code adheres to the following requirements:

  • You must create a function that handles querying the supplied model and returns the output.
  • The function must return a simple object or a type that Flask's jsonify function can handle.
  • You cannot return an already-jsonified object, as cnvrg will perform that step for you.
  • The input from the query will be sent to your code exactly as entered, so the predict function should be defined accordingly, for example: def predict(*input):.
  • The name of the file containing the function cannot have a dash (-) in it.
  • When Function accepts file as input is enabled: the service will expect a Base64-encoded file. cnvrg will then decode the Base64 and pass the decoded file to your predict function.

TIP

Because the service is constructed using your code, you can execute any code from your git and project files from within the predict function. This allows you to process inputs and trigger other logic in addition to querying your model.

# Examples of predict functions

  • Example 1 - Receiving a sentence for sentiment analysis and processing it before using the model:

    def predict(sentence):
        encoded = encode_sentence(sentence)
        pred = np.array([encoded])
        pred = vectorize_sequences(pred)
        a = model.predict(pred)
        return str(a[0][0])
    
  • Example 2 - Receiving an array and parsing it before using the model:

    def predict(*data):
        data = np.asarray(data)
        a = model.predict(data)
        return round(a[0][0], 0)
    
  • Example 3 - Receiving a file and processing it before using the model:

    def predict(file_path):
        img = load_img(file_path, grayscale=True, target_size=(28, 28))
        img = img_to_array(img)
        img = img.reshape(1, 28, 28, 1)
        img = img.astype('float32')
        img = img / 255.0
        digit = model.predict_classes(img)
        return str(digit[0])
    
  • Example 4 - Receiving multiple parameters:

    def predict(a, b, c):
        prediction = model.predict(a, b)
        result = prediction[0] - c
        return str(result)
    

# Preprocess Function

All endpoints can optionally include a preprocessing function. As with the predict function, all inputs will be passed to the preprocess function and processed by it. The output of the preprocess function is then used as the input for the predict function.

This makes it easy to apply any processing on the inputs that may be required before the prediction can be made.

The same requirements that apply to a predict function also apply to the preprocessing function. However, you must verify that the output of your preprocessing function is compatible with the input of your predict function.
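As a minimal sketch, a preprocess/predict pair might look like the following. The scaling step and the summing stand-in for a model call are illustrative assumptions, not cnvrg requirements:

```python
def preprocess(*inputs):
    # Scale raw pixel-style inputs into the 0-1 range before they
    # reach the model (the scaling itself is just an example).
    return [float(x) / 255.0 for x in inputs]

def predict(*data):
    # `data` receives exactly what preprocess returned.
    # Summing stands in for a real model call here.
    return str(sum(data))
```

Whatever preprocess returns must match the signature your predict function expects.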

# Web Services

cnvrg can help you set up a real time web service. cnvrg will take your code and model and encase it in a lightweight REST API service, enabling it to be queried easily through the web.

All that is required is for you to provide a compatible predict function and cnvrg will do the rest. Additionally, you can optionally use a preprocessing function, which will process each input before the prediction is made.

cnvrg will set up all the network requirements and deploy the service to Kubernetes, allowing you to leverage auto-scaling and ensure you can accommodate all incoming traffic. It will add an authentication token to secure access. Furthermore, it will add an entire suite of monitoring tools, simplifying the process of managing your web service and its performance.

# Publishing a new web service endpoint

You can create a new web service endpoint in cnvrg in two ways:

  1. Publishing a new model from the Serving tab.
  2. Using a Deploy Task in a flow.

# Query a web service

There are two main methods that can be used to query a cnvrg service:

1. Using API calls
2. Using the Python SDK
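As a sketch of the API-call route, using only the Python standard library. The header name and payload shape below are assumptions; copy the exact URL, token and request format from your endpoint's page in cnvrg:

```python
import json
import urllib.request

def build_request(url, token, input_params):
    # Assemble the URL, JSON body and headers for a POST to the endpoint.
    body = json.dumps({"input_params": input_params}).encode("utf-8")
    headers = {
        "Cnvrg-Api-Key": token,  # header name is an assumption
        "Content-Type": "application/json",
    }
    return url, body, headers

def query_endpoint(url, token, input_params):
    # Send the request and return the decoded JSON response.
    u, body, headers = build_request(url, token, input_params)
    req = urllib.request.Request(u, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```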

# Batch Predictions

cnvrg can help you set up all the required infrastructure to easily query your model with batch predictions. cnvrg will take your code and model and encase it in a lightweight REST API service, enabling it to be queried easily through the web. However, when not in use, the service will be scaled down to 0 pods, ensuring you are not using compute resources while not making predictions.

You will also be able to track each batch prediction that is made through the service. A batch predict endpoint comes with the same monitoring and tracking features that web services and Kafka streams have.

All that is required is for you to provide a compatible predict function. Additionally, you can optionally use a preprocessing function, which will process each input before the prediction is made.

# Predictions Tab

cnvrg will track all the batch predictions that are made to the endpoint. You can access them from the Predictions tab.

There you can see all the batch predictions that ran using the specific endpoint. You can click on the name to go to the experiment page for the specific prediction. Clicking the Compare All button will take you to an experiment comparison of all of the batch prediction experiments.

# Publishing a new batch predict endpoint

You can create a new batch prediction endpoint in cnvrg in two ways:

1. Publishing a new model from the Serving tab.
2. Using a Deploy Task in a flow.

# Make a batch prediction

You can conduct a batch prediction using the Batch Predict AI Library in a flow or with the SDK.

When the library is run, the chosen batch predict endpoint will be scaled up. A batch prediction will then be made using the CSV input file. The output predictions will be saved in a new output CSV file, which will then be synced to the chosen dataset.

The AI Library has four parameters:

• endpoint_id: The name of the batch predict endpoint you are using. The endpoint must already be deployed.
• input_file: The path of a CSV file with the data for the batch prediction.
• output_file: The path of the CSV output file. It must not start with a /. For example, output/predictions.csv.
• dataset: The name of an existing dataset that the output CSV will be uploaded to.
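For example, an input CSV for the library might be prepared like this. The column names are hypothetical; use whatever columns your predict function expects:

```python
import csv

# Hypothetical feature rows to run batch predictions on.
rows = [[5.1, 3.5], [6.2, 2.9]]

with open("batch_input.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["feature_1", "feature_2"])  # hypothetical header
    writer.writerows(rows)
```

The resulting file path would then be passed as input_file, with output_file and dataset naming where the predictions should land.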

# Kafka Streams

cnvrg supports integrating model predictions with existing Kafka streams. This allows you to ingest Kafka streams and perform real-time predictions with comprehensive model monitoring features.

Kafka is an open-source platform for low-latency data streaming, with no batching. With these key benefits in mind, streaming can be useful for moving and processing data as soon as it is ingested, avoiding complex triggering and scheduling as data comes in. With the cnvrg integration, you can perform model predictions on a Kafka stream, unlocking the ability to predict in real time on all incoming data. Both Kafka and cnvrg leverage clusters and scaling technology, ensuring that your service will be capable of dealing with any incoming demand.

You will need to provide:

• A compatible predict function. The input will be in the format of a bytes literal.
• A running Kafka service including:
  • a Kafka broker
  • input topics
  • an output topic

Additionally, you can optionally use a preprocessing function, which will process each input before the prediction is made.

cnvrg will stream the data from the input topics to the model via the predict function and stream the output back to the output topic.

Using Kafka streams with cnvrg allows you to make real-time predictions while also leveraging all the other cnvrg model tracking features (logs, A/B testing, canary deployment, integration with flows, continual learning, etc.).
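A minimal sketch of a stream predict function follows. It assumes the messages on the input topic are JSON-encoded; that format, and the summing stand-in for a model call, are assumptions for illustration:

```python
import json

def predict(message):
    # The stream delivers each message as a bytes literal,
    # e.g. b'{"x": 1.5, "y": 0.5}', so decode it first.
    record = json.loads(message.decode("utf-8"))
    # A real implementation would call the model here; summing the
    # fields stands in for a prediction.
    return str(sum(record.values()))
```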

# Streaming behavior

When you start a new Kafka stream endpoint, the model will make predictions on messages from the latest point on the input stream. That means predictions will only be made on entries that arrive on the input stream after the endpoint has been deployed.

When rolling over a new model into production for an existing endpoint, the old model will be used until the new model finishes deploying. Then, the new model will continue making predictions from the same point on the input stream that the previous model finished at. This ensures zero downtime and complete data continuity.

# Autoscaling with Kafka

As with web service endpoints, cnvrg supports autoscaling for Kafka streams. As Kafka streams are backed by Kubernetes, cnvrg will automatically replicate the service to accommodate incoming demand, allowing you to have a powerful and stable service.

You can control the lower and upper bounds for autoscaling by setting the Number of Pods to the minimum and maximum number of pods for the stream. The maximum number of pods is bounded by the number of partitions of the input topics.

# Publishing a new Kafka stream

You can create a new Kafka stream in cnvrg in two ways:

1. Publishing a new model from the Serving tab.
2. Using a Deploy Task in a flow.

# Charts

To make tracking your service as easy as possible, cnvrg supports automatic and custom tracking on all endpoints. These can be found in the Charts tab for your service.

cnvrg will automatically create live charts for the Requests per Minute (traffic) and the Endpoint Latency (response time).

You can create new line graphs using the Python SDK. Use the log_metric() method to track new metrics like accuracy and drift.

# Logs and Endpoint Data

You can find all the logs for each prediction in the Logs tab.

Every prediction, input and output of the serving endpoint is recorded and logged. All data is tracked, and you can quickly see how your models work in a production environment. You can also Export the data as a CSV and analyze it.

# Feedback Loops

With cnvrg feedback loops you can export endpoint data to a cnvrg dataset. Queries, predictions and endpoint logs are important pieces of data. Training on real predictions can help improve model performance, which is why feedback loops can be incredibly useful.

A feedback loop will export the endpoint data (queries, predictions, metadata, etc.) either immediately or according to the schedule that you set.

# Create a feedback loop

To create a feedback loop:

1. Go to the endpoint of your choice.
2. Go to the Logs tab.
3. Click Configure Feedback Loop. A new form will appear.
4. Click the Dataset selector to choose the dataset to sync the data to.
5. For Scheduling, choose whether to sync the data Now or Recurring.
6. (If you chose Recurring) Set the schedule by choosing:
  • Save every: How frequently to run the feedback loop.
  • On the: The details of when exactly the repeated feedback loop should occur.
7. Click Save.

# End a feedback loop

To end a recurring feedback loop:

1. Go to the endpoint of your choice.
2. Go to the Logs tab.
3. Click Stop next to Configure Feedback Loop.

# Monitoring Systems

cnvrg's endpoints are backed by Kubernetes, and cnvrg automatically installs advanced monitoring tools for production-grade readiness, including Grafana for dashboards and Kibana for log visualization.

# Grafana

Grafana is a platform for analytics and monitoring, used here for monitoring a Kubernetes cluster.

You can see the Grafana dashboard directly from the UI by going to the running endpoint and selecting the Grafana tab.

cnvrg preconfigures a dashboard for monitoring the health of your pods, including:

• Memory usage: The memory usage of each pod
• CPU usage: The CPU usage of each pod
• Network I/O: The traffic in and out of each pod

# Kibana

Kibana allows you to visualize your Elasticsearch data. Kibana shows everything your code prints and can visualize the output by turning it into graphs.

You can access Kibana in the Kibana tab in your endpoint.

# Update the Model

For endpoints that are in the Running state, you can update models easily. Go to the Config tab and click Update Model. You will then be prompted with the new model form, where you can specify a file, a function and/or a different commit version.

When you click Update, cnvrg will gradually roll out updates to the endpoint using the canary deployment mechanism (see below).

You can also update the model using a flow.

# Canary Release

Canary release is a technique to reduce the risk of introducing a new software version in production. It allows you to slowly roll out the change to a small percentage of the traffic before rolling it out to the entire infrastructure and making it available to everybody. This lets you check your model during its gradual rollout, and if any issues are revealed, you can undo the change before it reaches too many users.

When rolling out a new model to an existing service, you can choose a Canary Rollout. This indicates how much of the incoming traffic is served by the new model. You can subsequently increase that ratio as desired.

You also have the ability to roll back the model if it is not fulfilling your needs. In this case, the previous model will be served to all traffic.

Using flows, you can create a fully automated canary release with gradual rollout according to custom validation, and the ability to roll back automatically if needed.

# Roll out a new model with Canary Release

You can roll out a new model to a chosen ratio of incoming traffic in two ways:

1. On the Endpoint's Config tab.
2. Using a Deploy Task in a flow.

# Roll back a model

You can roll back a new model using the rollback AI Library with the SDK or within a flow.

When the library is run, the chosen endpoint will have its latest model rolled back. The previous model will then be used.

The AI Library has one parameter:

• endpoint_id: The name of the endpoint you are rolling back.

# Continual Learning (Triggers)

With cnvrg you can ensure your endpoint is performing accurately and not experiencing decay by leveraging triggers within the platform. A trigger allows you to run an action based on the metrics being tracked in your endpoint. These actions allow you to simply and easily manage the endpoint and automate the upkeep of your service. You can send emails to your team or even run an end-to-end flow to retrain your model and update your endpoint automatically, with zero downtime!

Triggers are based on the metrics you log within your endpoint using the cnvrg SDK. To do so, first import the package: from cnvrg import Endpoint, then initialize an endpoint object: e = Endpoint(), and finally log a metric: e.log_metric("key", value).
Now that you are logging a metric, you can use that metric as the tag inside your new trigger.

TIP

For more information on e.log_metric(), see the cnvrg SDK documentation.

To create a new trigger, click the Continual Learning tab in your endpoint. Click New Alert, then fill in the necessary details in the panel that appears.

# Info:

• Title: The name for the trigger.
• Severity: An indication of the importance of this alert (Info, Warning or Critical).

# If Condition:

• Tag: The cnvrg SDK metric used for the trigger. (Only tags with numeric values are currently supported.)
• Comparison Type: The type of comparison used for comparing with your set value (greater than or less than).
• Value: The value you are comparing against.
• Run Condition Every: How often to poll your endpoint to test for the trigger.
• Minimum events: The number of times the condition needs to be fulfilled before the alert is triggered.

# Action:

• Trigger: The type of action that will occur when the alert is triggered.
• Email (if applicable): The email address/es that will receive an email when the alert is triggered.
• Flow (if applicable): The flow that will run when the alert is triggered.

Finally, click Add and you will have successfully created a new trigger! Whenever the criteria you set in the trigger are fulfilled, the action you have set will automatically run.

NOTE

Make sure the tag you choose for the trigger exactly matches the metric you are logging in the endpoint.

# Accepting files as an Input

Web services and batch prediction endpoints in cnvrg can be set to accept files. Enable this functionality when launching a web service or batch predict endpoint by enabling Function accepts file as input.

To send a file to the endpoint, you must encode it in Base64 and then send the encoded string as the input to the model.

cnvrg will handle the decoding of the file for you and pass the decoded file to your predict function. That means your predict function can simply expect a file as the input, and no further decoding is needed.
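Client-side, encoding the file before sending it might look like this minimal sketch, using only the standard library:

```python
import base64

def encode_file(path):
    # Read the file as raw bytes and return its Base64 string,
    # ready to be sent as the input to the endpoint.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```

The returned string is what you send in the request; cnvrg decodes it server-side before calling your predict function.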

# Flask Config

Flask is a lightweight web framework that cnvrg uses to create the API for Batch and Web Service endpoints.

You can alter the configuration of Flask in your endpoint by adding KEY-VALUE pairs in the Flask Config section of the endpoint form.

Each KEY-VALUE pair will be added to the Flask configuration file as follows:

    app.config['KEY'] = 'VALUE'

These KEY-VALUE pairs will then be exported as environment variables that your deployment can access.

Information on Flask configuration keys and values can be found in the Flask documentation.

For example, to set FLASK_ENV to development, fill in the following:

• KEY: FLASK_ENV
• VALUE: development

The value will be exported in the config file and set in Flask.

# Gunicorn Config

Gunicorn is a simple, scalable Python WSGI HTTP server that cnvrg uses to run the server for Batch and Web Service endpoints.

You can change the Gunicorn settings in your endpoint by adding KEY-VALUE pairs in the Gunicorn Config section of the endpoint form.

There are two requirements:

• You cannot alter the bind setting.
• You can only change settings that take a KEY-VALUE pair, not settings that are just a flag or KEY. For example, you cannot use reload, as it does not take a value. Otherwise, you can use any of the config settings from the Gunicorn documentation.

There are two default Gunicorn config settings that cnvrg sets:

• workers: Set according to the CPU of the compute template you select.
• timeout: Defaults to 1800.

These default values can be overridden by submitting alternative values in the Gunicorn Config section.

For example, to change the number of threads on each worker for handling requests to 4, fill in the following:

• KEY: threads
• VALUE: 4

When the service is initialized, Gunicorn will be run with --threads 4.

# Updating Endpoints with Flows

You can roll over an endpoint to a new model within a flow. To do so, add a Deploy task to your flow and set it to update an existing endpoint of your choice:

1. Go to the Flows tab of your project.
2. Open an existing flow or create a new one.
3. Click the New Task menu.
4. Choose Deploy Task from the menu.
5. In the Select Endpoint menu, click the name of the existing endpoint you would like to update.
6. Click Save Changes.

When the flow runs successfully, the endpoint will roll over to the end commit of the task connected to the newly created deploy task, with zero downtime.

Learn more about flows in the flows docs.

# Rerun Endpoints

Reproducible code and environments are key elements of cnvrg.

To quickly and easily rerun an endpoint using all the same settings (file, command, compute, docker image and so on), select an endpoint, click Menu and then select Rerun from the dropdown menu.

You will be taken to the new endpoint page with all of the details pre-selected. You can then check and change anything before finally running this new version of the endpoint.

Last Updated: 6/16/2020, 8:54:02 AM