
Create and deploy an ONNX pipeline

How to use the onnx_to_pipeline wrapper to create pipelines from ONNX files

Create pipeline from ONNX file

Create a pipeline from an ONNX file by passing the model filepath, ONNX_FILEPATH, to the onnx_to_pipeline wrapper:

from pipeline import onnx_to_pipeline

onnx_filepath = "{ONNX_FILEPATH}"
onnx_pipeline = onnx_to_pipeline(onnx_filepath)

onnx_pipeline is now a pipeline that can be run locally, or uploaded to PipelineCloud and run via the API.

Get input and output names

To run an ONNX model, we need its input and output names. These are defined when the ONNX model is constructed, and can also be found by loading the model in onnxruntime:

import onnxruntime

ort_session = onnxruntime.InferenceSession(
    onnx_filepath,
    providers=[
        "CPUExecutionProvider",
    ],
)

# Get the list of output name strings
OUTPUT_NAMES = [output.name for output in ort_session.get_outputs()]

# Get the list of input name strings
INPUT_NAMES = [input.name for input in ort_session.get_inputs()]
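For the pretrained MODNet model used in the worked example below, this should yield a single input named "input" and a single output named "output":

print(INPUT_NAMES)   # ['input']
print(OUTPUT_NAMES)  # ['output']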

Running locally

Now we have the input and output names we can try running the model locally.

onnx_pipeline.run() takes two positional arguments: a list of output names, followed by a dictionary whose keys are input name strings and whose values are the corresponding model inputs.

Model inputs are the same as those accepted by onnxruntime: either a NumPy array of the data type expected by the ONNX model, or a list (which is automatically converted to a NumPy array of a compatible data type).
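For instance, for a hypothetical model whose single input expects a float32 tensor of shape (1, 3, 224, 224) (the shape here is purely illustrative), either of these forms is a valid model input:

import numpy as np

# a NumPy array with the dtype the model expects
model_input = np.random.rand(1, 3, 224, 224).astype('float32')

# ...or an equivalent nested list, which the pipeline converts
# automatically to a compatible NumPy array
model_input_as_list = model_input.tolist()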

For our example we will assume the model has a single input and a single output, with the model input model_input.

result_local = onnx_pipeline.run(OUTPUT_NAMES, {INPUT_NAMES[0]: model_input})

This takes exactly the same form as running with onnxruntime. The only difference is that onnxruntime lets you pass OUTPUT_NAMES = None to avoid explicitly specifying the model's output names; for pipeline runs, use OUTPUT_NAMES = [] instead.
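As a quick sanity check, you can compare the pipeline output against a direct onnxruntime run. A sketch, reusing the ort_session from above and assuming both calls return a list of output arrays:

import numpy as np

# run the same input through onnxruntime and through the pipeline
ort_result = ort_session.run(None, {INPUT_NAMES[0]: model_input})
pipeline_result = onnx_pipeline.run([], {INPUT_NAMES[0]: model_input})

# the first (and here, only) output should match closely
np.testing.assert_allclose(ort_result[0], pipeline_result[0], rtol=1e-4)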

Running in PipelineCloud

To upload onnx_pipeline to PipelineCloud so that it can be run via the API:

from pipeline import PipelineCloud

# Authenticate
api = PipelineCloud(token="{YOUR_API_KEY}")

# Upload onnx_pipeline to PipelineCloud
uploaded_pipeline = api.upload_pipeline(onnx_pipeline)

print(f"Uploaded pipeline: {uploaded_pipeline.id}")

Now we can call the API to run onnx_pipeline on PipelineCloud.

api.run_pipeline() takes two positional arguments. The first references the uploaded pipeline: either the uploaded pipeline object uploaded_pipeline or its id string uploaded_pipeline.id. The second is a list of the arguments used when running locally.

# Run on pipeline
result_upload = api.run_pipeline(uploaded_pipeline.id, [OUTPUT_NAMES, {INPUT_NAMES[0]: model_input}])
# Filter out metadata
print(result_upload["result_preview"])
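result_preview only inlines the run output up to a size limit (around 2 MB). For larger outputs it is None, and the full result must be fetched in a separate call. A sketch, using the api.download_result call from the worked example below:

result = result_upload["result_preview"]
# large results are not inlined; download them separately
if result is None:
    result = api.download_result(result_upload)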


Worked example

In this example, we will use an open-source image background removal model called MODNet. You can download the pretrained ONNX model as linked on the official MODNet repo here.

The ONNX model takes a preprocessed image as input and returns an alpha matte. Compositing the alpha matte with the original image gives us the foreground result.
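The compositing step itself is a standard alpha blend. As a sketch, with the matte scaled to [0, 1] and a white background assumed (this is the same blend used in the full script below):

# alpha blend: foreground = image * alpha + background * (1 - alpha)
foreground = image * matte + np.full(image.shape, 255) * (1 - matte)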

You can download the starting image of Dr. Mike Levin here.

Here is our complete script:

import cv2
import numpy as np
from PIL import Image
from pipeline import PipelineCloud, onnx_to_pipeline

# read image
img = cv2.imread('{IMAGE_FILEPATH}')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
im_h, im_w, im_c = img.shape

def preprocessing(im):
    # determines input resolution to MODNet's model
    ref_size = 512

    # Get resized dim for MODNet input
    def get_resize(im_h, im_w, ref_size):
        if im_w >= im_h:
            im_rh = ref_size
            im_rw = int(im_w / im_h * ref_size)
        elif im_w < im_h:
            im_rw = ref_size
            im_rh = int(im_h / im_w * ref_size)

        im_rw = im_rw - im_rw % 32
        im_rh = im_rh - im_rh % 32
        
        return im_rw, im_rh

    # unify image channels to 3
    if len(im.shape) == 2:
        im = im[:, :, None]
    if im.shape[2] == 1:
        im = np.repeat(im, 3, axis=2)
    elif im.shape[2] == 4:
        im = im[:, :, 0:3]

    # normalize pixel values to the range [-1, 1]
    im = (im - 127.5) / 127.5
    # get resize dimensions for MODNet inference
    x, y = get_resize(im_h, im_w, ref_size) 

    # resize image
    im = cv2.resize(im, (x, y), interpolation=cv2.INTER_AREA)

    # prepare input shape: (H, W, C) -> (1, C, H, W)
    im = np.transpose(im)
    im = np.swapaxes(im, 1, 2)
    im = np.expand_dims(im, axis=0).astype('float32')

    return im 

def post_processing(result, im):
    matte = (np.squeeze(result) * 255).astype('uint8')
    # resize matte to original image dim
    matte = cv2.resize(matte, (im_w, im_h), interpolation=cv2.INTER_AREA)

    def combined_display(image, matte):
        # calculate display resolution
        w, h = image.width, image.height
        rw, rh = 800, int(h * 800 / (3 * w))

        # obtain predicted foreground
        image = np.asarray(image)
        if len(image.shape) == 2:
            image = image[:, :, None]
        if image.shape[2] == 1:
            image = np.repeat(image, 3, axis=2)
        elif image.shape[2] == 4:
            image = image[:, :, 0:3]
        matte = np.repeat(np.asarray(matte)[:, :, None], 3, axis=2) / 255
        foreground = image * matte + np.full(image.shape, 255) * (1 - matte)

        # combine image, foreground, and alpha into one line
        combined = np.concatenate((image, matte * 255, foreground), axis=1)
        combined = Image.fromarray(np.uint8(combined)).resize((rw, rh))
        return combined

    # show composite
    combined_display(Image.fromarray(im), matte).show()

im = preprocessing(img)

# authenticate with api
api = PipelineCloud(token="{YOUR_API_TOKEN}")

# create pipeline
onnx_filepath = "{ONNX_FILEPATH}"
onnx_pipeline = onnx_to_pipeline(onnx_filepath)
# upload pipeline
uploaded_pipeline = api.upload_pipeline(onnx_pipeline)
print(f"Uploaded pipeline: {uploaded_pipeline.id}")
# inference API call
result_detailed = api.run_pipeline(uploaded_pipeline.id, [["output"], {"input": im}])
# get MODNet result without metadata
result = result_detailed['result_preview']
# if the result exceeds 2 MB, we need to download it in a separate call
if result_detailed['result_preview'] is None:
    result = api.download_result(result_detailed)
# create and show composite
post_processing(result, img)

Related Medium article: https://medium.com/@neil_wang/serverless-gpu-machine-learning-deployments-on-pipeline-ai-fe52d680ce24