
Create and deploy a spaCy pipeline

How to use the spacy_to_pipeline wrapper to create pipelines from spaCy models

Create a pipeline using the spaCy wrapper

spacy_to_pipeline is a convenience wrapper that lets you create a pipeline from any pretrained spaCy model (a trained spaCy "pipeline" such as en_core_web_sm).

Optional keyword arguments let you post-process the spaCy output within the pipeline and give the pipeline a name.

The signature and docstring of spacy_to_pipeline are shown below:

from pipeline import spacy_to_pipeline

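# Signature as shown in the library; t refers to the typing module and Graph is the library's pipeline graph type
def spacy_to_pipeline(spacy_model: str, func: t.Optional[t.Callable] = None, name: str = "Spacy pipeline") -> Graph: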
    """
    Create a pipeline using Spacy
        Parameters:
                spacy_model (str): tokenizer model name (trained Spacy "pipeline")
                func (Optional[Callable]): function to be called on spacy output
                name (str): Name to be given to this pipeline

        Returns:
                pipeline (Graph): Executable Pipeline Graph object
    """
    ...

Here is a complete example that runs locally:

from pipeline import spacy_to_pipeline

def func(doc):
    return [[token.text, token.lemma_, token.pos_] for token in doc]

spacy_pipeline = spacy_to_pipeline("en_core_web_sm", func=func, name="my-spacy-pipeline")

# run locally; run() returns a list of results, so unpack the single output
input = "Apple is looking at buying U.K. startup for $1 billion"
[output] = spacy_pipeline.run(input)

print(output)

Above, we created a pipeline from the spaCy model "en_core_web_sm", named it "my-spacy-pipeline", and post-processed the model output (the Doc of tokens) with func(), which returns the text, lemma and part-of-speech tag of each token.
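Any callable that takes the spaCy output can serve as func. As a minimal sketch, the hypothetical extract_entities function below collects named entities instead of tokens, using the doc.ents and ent.label_ attributes of spaCy. The fake_doc object and its entity values are hand-written stand-ins (not real model output) so that the snippet runs without spaCy installed.

```python
from types import SimpleNamespace


def extract_entities(doc):
    """Return a [text, label] pair for every named entity in a spaCy Doc."""
    return [[ent.text, ent.label_] for ent in doc.ents]


# Stand-in for a spaCy Doc (assumption: only the .ents attribute is needed).
# The entities are hand-written placeholders, not real model output.
fake_doc = SimpleNamespace(
    ents=[
        SimpleNamespace(text="Apple", label_="ORG"),
        SimpleNamespace(text="U.K.", label_="GPE"),
    ]
)

print(extract_entities(fake_doc))
```

With the library installed, this function could presumably be passed as func=extract_entities to spacy_to_pipeline in the same way as func in the example above.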

The my-spacy-pipeline example above is equivalent to the following plain spaCy code:

import spacy

def func(doc):
    return [[token.text, token.lemma_, token.pos_] for token in doc]

input = "Apple is looking at buying U.K. startup for $1 billion"

nlp = spacy.load("en_core_web_sm")
doc = nlp(input)
# pipelines return their results in a list, so wrap and unpack to mirror that
[output] = [func(doc)]

print(output)
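The equivalence above suggests a simple mental model: apply the model to the input, post-process the result with func, and return the result wrapped in a list. The sketch below expresses that model with a hypothetical run_like_pipeline helper. It is not the library's implementation, and returning the raw model output when func is None is an assumption. A stand-in fake_nlp callable replaces spacy.load(...) so the snippet runs without spaCy.

```python
from typing import Any, Callable, List, Optional


def run_like_pipeline(
    nlp: Callable[[str], Any],
    text: str,
    func: Optional[Callable[[Any], Any]] = None,
) -> List[Any]:
    """Hypothetical helper mirroring the plain spaCy code above."""
    doc = nlp(text)
    # Assumption: without func, the raw model output is returned unchanged.
    result = func(doc) if func is not None else doc
    # Results are wrapped in a list, as pipelines return a list of outputs.
    return [result]


def fake_nlp(text: str) -> List[str]:
    """Stand-in for a loaded spaCy model: splits on whitespace only."""
    return text.split()


[output] = run_like_pipeline(fake_nlp, "Apple is looking", func=len)
print(output)
```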

Run in PipelineCloud

The same example can be run in PipelineCloud. Note that api.run_pipeline takes the pipeline input as a list.

from pipeline import (
    PipelineCloud,
    spacy_to_pipeline
)

api = PipelineCloud(token="YOUR TOKEN HERE")

def func(doc):
    return [[token.text, token.lemma_, token.pos_] for token in doc]

spacy_pipeline = spacy_to_pipeline("en_core_web_sm", func=func, name="spacy-get-all")

uploaded_pipeline = api.upload_pipeline(spacy_pipeline)
print(f"Uploaded pipeline: {uploaded_pipeline.id}")

print("Run uploaded pipeline")

run_result = api.run_pipeline(
    uploaded_pipeline, ["Apple is looking at buying U.K. startup for $1 billion"]
)
try:
    result_preview = run_result.result_preview
    print("Run result:", result_preview)
except KeyError:
    print(api.download_result(run_result))
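The example above places the API token directly in the source. One common alternative is to read it from an environment variable, sketched below. The load_api_token helper and the PIPELINE_API_TOKEN variable name are hypothetical choices for illustration, not something the pipeline library defines.

```python
import os


def load_api_token(env_var: str = "PIPELINE_API_TOKEN") -> str:
    """Read the API token from an environment variable (hypothetical name)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your PipelineCloud token"
        )
    return token


# Usage with the example above (assumes the variable has been set):
# api = PipelineCloud(token=load_api_token())
```

Keeping the token out of the script makes it easier to share the code without exposing credentials.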