
Create and use a custom environment

How to use the Pipeline Library to create custom Python environments to run your pipelines.

By default, when you upload a pipeline, any runs performed on that pipeline will be executed using our default environment, with a set list of Python packages. But what if you want to run a pipeline using an additional package, or a specific version of a package?

In this guide, we'll show you how to create your own custom environment, add Python packages to it and upload a pipeline which uses that environment. To illustrate the key ideas and keep things simple, we'll create a very basic pipeline which simply computes a dot-product using pandas.

📘

First, log in with our CLI

We will be interacting with the Pipeline API using the CLI and will assume you have already authenticated. For more information about how to authenticate using the CLI, see our authentication guide.

NOTE: This is a walkthrough, so many of the code snippets below are chunks of a larger script. If you're skimming or just want to see the code, skip to the conclusion, where you'll find the complete script.

The default environment

Before creating a custom environment, you can check whether the packages you need already exist in the default environment. To see the packages used in the default environment, you can visit the environments page on your dashboard. Alternatively, you can retrieve the environment using the CLI command:

pipeline environments get --default

Here you should receive a response which looks something like:

{
  "id": "environment_35a54969b7474150b87afdf155431884",
  "name": "public/mystic-default-20230126",
  "python_requirements": [
    "torch==1.13.1",
    "torchvision==0.14.1",
    "torchaudio==0.13.1",
    "transformers==4.21.2",
    ...
  ],
  "locked": true
}

where you can see all the Python packages used in the environment. Notice also the "locked": true property, which indicates that no packages can be added to or removed from the environment until it is unlocked. You can also retrieve the default environment by its ID:

pipeline environments get environment_35a54969b7474150b87afdf155431884

However, it's a bit annoying to have to know an environment's ID in order to query it; the environment name is far more memorable, especially since you'll be naming your own environments yourself. To get an environment by name, just use the -n option, followed by the name:

pipeline environments get -n public/mystic-default-20230126

The -n option can also be used for updating and deleting environments, as we'll see below.
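For example, deleting an environment by name might look something like this (a sketch; we assume a delete sub-command, which isn't covered further in this guide):

pipeline environments delete -n generic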

TIP: To see CLI command usage and options, use the --help flag (or -h), e.g. pipeline environments --help

Create a custom environment

To see all your environments, you can use the list sub-command:

pipeline environments list

If you have not previously created any environments, you should get an empty response. You may be wondering why the default environment we just fetched above isn't listed here. This is because the default environment is a public environment, whereas only your own (private) environments are listed using the above command. If you want to see all our publicly available environments, use the --public flag:

pipeline environments list --public

You should, at the very least, see the default environment listed here.

Let's create an environment named "generic" by using the create sub-command:

pipeline environments create generic

The new environment should appear when you run the list command above.

We can check whether the environment has been created correctly by trying to fetch it:

pipeline environments get -n generic

You should see an empty python_requirements list, since we have not added any packages yet.
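The response should look something like the following (a sketch based on the response shape shown earlier; your environment ID will differ):

{
  "id": "environment_YOUR_NEW_ENVIRONMENT_ID",
  "name": "generic",
  "python_requirements": [],
  "locked": false
}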

Updating environments

By using the update sub-command, we can add packages to the environment one at a time, or multiple at a time. For instance, let's add specific versions of pandas and pillow:

pipeline environments update -n generic add pandas==1.5.3 pillow==9.4.0

You can check that the packages were added successfully by fetching the environment details, which should produce a response similar to:

{
  "id": "environment_6aae191cd71f4968bcbd86bbc72c95bb",
  "name": "generic",
  "python_requirements": ["pillow==9.4.0", "pandas==1.5.3"],
  "locked": false
}

We aren't actually going to use the pillow package in our example; it was just to show that you can add multiple packages at a time. In fact, if you had a list of packages saved in a text file, e.g. packages-to-add.txt, and wanted to add them all in one go, you could leverage some shell functionality by running:

cat packages-to-add.txt | xargs pipeline environments update -n generic add
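Here, packages-to-add.txt is simply a plain text file listing one requirement per line; xargs passes each entry as an argument to the add command. A hypothetical example of its contents:

numpy==1.24.2
scipy==1.10.1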

Seeing as we won't be needing pillow, we can remove it:

pipeline environments update -n generic remove pillow==9.4.0

Now that we are happy with the current state of our environment, we can lock it:

pipeline environments update -n generic lock

to ensure we don't accidentally update it in the future.
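If you fetch the environment again (pipeline environments get -n generic), the response should now show "locked": true, something like:

{
  "id": "environment_6aae191cd71f4968bcbd86bbc72c95bb",
  "name": "generic",
  "python_requirements": ["pandas==1.5.3"],
  "locked": true
}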

Building the pipeline

Having created a custom environment containing the pandas package, it's time to build our compute pipeline. This is essentially a computational graph in which input variables are fed into operations, whose outputs are generally fed into further operations, and so on. Our compute pipeline, however, will be a very simple one: it computes the dot-product of two pandas Series:

import typing as t

import pandas as pd
from pipeline import Pipeline, Variable, pipeline_function

PIPELINE_NAME = "pandas-dot-product"


@pipeline_function
def compute_dot_product(list_1: t.List[int], list_2: t.List[int]) -> int:
    series_1 = pd.Series(list_1)
    series_2 = pd.Series(list_2)
    return int(series_1.dot(series_2))


with Pipeline(PIPELINE_NAME) as pipeline:
    input_1 = Variable(type_class=list, is_input=True)
    input_2 = Variable(type_class=list, is_input=True)

    pipeline.add_variables(input_1, input_2)

    output = compute_dot_product(input_1, input_2)
    pipeline.output(output)

output_pipeline = Pipeline.get_pipeline(PIPELINE_NAME)

There are 2 main parts in this snippet: a decorated function and a context manager for building the compute-pipeline. The decorated function, compute_dot_product, takes in two lists of integers, converts them into pandas Series and returns the dot-product of the two series. The decorator ensures that the runtime values of the input variables will be passed into the function when the pipeline is executed, rather than the Variable-type objects themselves. In general, any function called in the compute-pipeline whose parameters are evaluated at runtime should be pipeline_function-decorated.

In the context manager, we configure the pipeline to accept 2 input list variables, call compute_dot_product at runtime with the provided input lists, and output the result of this operation.

🚧

Pandas data types cannot be JSON-serialised

Annoyingly, pandas data types aren't JSON-serialisable, which is why we haven't used them directly as inputs or output; instead we cast them to list or int.
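For instance, attempting to serialise a Series directly fails, which is a quick way to see the constraint for yourself:

import json

import pandas as pd

json.dumps(pd.Series([1, 2, 3]))
# TypeError: Object of type Series is not JSON serializable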

Using the custom environment

Having configured our custom environment and pipeline, we can make remote inference calls using the custom environment simply by specifying which environment to use when uploading the pipeline (you can find the ID of your generic environment by running pipeline environments get -n generic, as shown earlier):

api = PipelineCloud()
uploaded_pipeline: PipelineGetDetailed = api.upload_pipeline(
    output_pipeline, environment="YOUR_ENVIRONMENT_ID"
)

Now, any run performed using uploaded_pipeline will be executed using the custom environment:

run: RunGetDetailed = api.run_pipeline(uploaded_pipeline.id, [[1, 2, 3], [1, 2, 3]])
print(f"RESULT: {run.result_preview}")

If all has gone well, you should see the result 14 (that is, 1×1 + 2×2 + 3×3).

Conclusion

We've seen how to create a custom Python runtime environment using the Pipeline CLI and perform basic updates, such as adding/removing packages and locking the environment. We also built a basic pipeline requiring a package not included in the default environment, connected it to our custom environment and made remote inference calls on it.

Complete code

First, the CLI commands used to set up the custom environment:

pipeline environments create generic
pipeline environments update -n generic add pandas==1.5.3
pipeline environments update -n generic lock

And the full Python script:

import typing as t

import pandas as pd
from pipeline import Pipeline, PipelineCloud, Variable, pipeline_function
from pipeline.schemas.pipeline import PipelineGetDetailed
from pipeline.schemas.run import RunGetDetailed

PIPELINE_NAME = "pandas-dot-product"


api = PipelineCloud()


@pipeline_function
def compute_dot_product(list_1: t.List[int], list_2: t.List[int]) -> int:
    series_1 = pd.Series(list_1)
    series_2 = pd.Series(list_2)
    return int(series_1.dot(series_2))


with Pipeline(PIPELINE_NAME) as pipeline:
    input_1 = Variable(type_class=list, is_input=True)
    input_2 = Variable(type_class=list, is_input=True)

    pipeline.add_variables(input_1, input_2)

    output = compute_dot_product(input_1, input_2)
    pipeline.output(output)

output_pipeline = Pipeline.get_pipeline(PIPELINE_NAME)
uploaded_pipeline: PipelineGetDetailed = api.upload_pipeline(
    output_pipeline, environment="YOUR_ENVIRONMENT_ID"
)
run: RunGetDetailed = api.run_pipeline(uploaded_pipeline.id, [[1, 2, 3], [1, 2, 3]])

print(f"RESULT: {run.result_preview}")