A pipeline surrounds a model with inference logic so that your user's inputs and the model's raw outputs can be processed to produce the desired output. For example, the GPT-J pipeline adds input arguments and logit sampling techniques to the underlying GPT-J model within it.
ML models typically run on GPUs because the matrix multiplications that dominate neural network training and inference are extremely efficient on them. The state of the art in many tasks is now achieved by large models with many billions of parameters, which don't fit on consumer graphics cards. As a result, users are turning to cloud infrastructure to train and serve their models.
Pipeline Cloud is an all-in-one platform for building and deploying models without the hassle of a traditional cloud provider. There are no hourly GPU rental fees, as we charge on a pay-as-you-go basis. Infrastructure automatically scales with your usage and our software handles the task success logic entirely for you. Simply submit a request to our REST API and you'll receive the output of the pipeline when the run completes.
A pipeline is a generic sequential workload for execution either on CPU or GPU. From simple addition tasks to the training of a multi-GPU neural network, a pipeline takes in inputs, performs tasks on them, and then returns outputs. They are flexible, diverse, and highly composable.
The smaller units of a pipeline might be a model or a function, helping you break down workloads into their most maintainable and reusable parts. For example, Pipeline B can borrow functions from Pipeline A, and Pipeline C can use the model from Pipeline D. Currently, pipelines are assembled in our Python library, either for local runs or for uploading and running in the cloud.
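To make the idea of reusable units concrete, here is a minimal sketch in plain Python. It does not use the pipeline-ai library: the `Pipeline` class, its `run` method, and the step functions are all hypothetical names chosen for illustration, and `toy_model` is only a placeholder for a real model.

```python
from typing import Any, Callable, List


class Pipeline:
    """Hypothetical stand-in for a pipeline: an ordered list of steps."""

    def __init__(self, steps: List[Callable[[Any], Any]]) -> None:
        self.steps = list(steps)

    def run(self, value: Any) -> Any:
        # Feed the output of each step into the next one, in order.
        for step in self.steps:
            value = step(value)
        return value


def normalise(text: str) -> str:
    # Trim surrounding whitespace and lower-case the input.
    return text.strip().lower()


def count_words(text: str) -> int:
    return len(text.split())


def toy_model(text: str) -> str:
    # Placeholder for a real model; it simply reverses the word order.
    return " ".join(reversed(text.split()))


pipeline_a = Pipeline([normalise, count_words])
# Pipeline B borrows the first function of Pipeline A and adds a "model" step.
pipeline_b = Pipeline([pipeline_a.steps[0], toy_model])

print(pipeline_a.run("  Hello Pipeline Cloud  "))
print(pipeline_b.run("  Hello Pipeline Cloud  "))
```

Because each step is an ordinary callable, sharing a function or a model between two pipelines amounts to placing the same object in both step lists.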
A 'deployed' pipeline is a pipeline which has been linked to your account and your project. As such, any pipeline uploaded by a user is by default 'deployed' because it has already been linked to their account. However, unlike uploaded pipelines, a Model Hub pipeline must be connected to a user before it can be run. This is why, when you select a pipeline from within the Model Hub, you'll be prompted to 'Deploy' it in a project.
Pipeline Cloud follows a shared infrastructure philosophy, where requests (runs) are not assigned to specific machines, but distributed to the next fastest machine. This gives our users great performance at a lower price, perfect for growing companies who don't want to rent GPUs by the hour. You will only pay for the milliseconds of compute that your run requires to complete.
For example, if you were using GPT-J and submitted 100 requests per day, each with an average compute time of 4500ms, then your GPU usage would cost $7.43 per month: 100 (requests per day) * 30 (days in month) * 4500 (milliseconds per request) * 0.00000055 (cost per millisecond) = 7.425, which rounds to $7.43.
There is also a $12.99 flat monthly subscription fee which grants access to the Pipeline Cloud infrastructure.
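The pricing arithmetic above can be reproduced with a short script. This is a sketch, assuming the per-millisecond rate and subscription fee quoted in this FAQ still apply; the function and constant names are hypothetical and not part of any Pipeline Cloud tooling.

```python
from decimal import Decimal, ROUND_HALF_UP

# Figures quoted in this FAQ; check current pricing before relying on them.
COST_PER_MS = Decimal("0.00000055")
SUBSCRIPTION_FEE = Decimal("12.99")
CENT = Decimal("0.01")


def monthly_gpu_cost(requests_per_day: int, avg_compute_ms: int, days: int = 30) -> Decimal:
    """GPU usage cost for one month, rounded to the nearest cent."""
    total_ms = requests_per_day * days * avg_compute_ms
    return (total_ms * COST_PER_MS).quantize(CENT, rounding=ROUND_HALF_UP)


def monthly_total(requests_per_day: int, avg_compute_ms: int, days: int = 30) -> Decimal:
    """GPU usage cost plus the flat monthly subscription fee."""
    return monthly_gpu_cost(requests_per_day, avg_compute_ms, days) + SUBSCRIPTION_FEE


print(monthly_gpu_cost(100, 4500))  # 7.43
print(monthly_total(100, 4500))     # 20.42
```

`Decimal` is used instead of floats so that 7.425 rounds to 7.43 deterministically rather than depending on binary floating-point representation.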
There are no limits on the number of requests you can send to the API, nor on the size or computation time of pipelines. The only constraint is physical: can your pipeline fit on a GPU/CPU or is it too large? In the case of large models, please get in touch so we can enable multi-GPU support on your account.
Our task distribution and execution platform has no concept of simultaneous requests, so they'll be treated as two individual tasks like normal. We have our own on-premises server farm which handles the majority of our traffic, and there is a significant buffer so that concurrent requests can be served smoothly and neatly.
Yes. When usage spikes, or even when it climbs slowly, we automatically launch new machines internally and on external cloud providers, to ensure that API performance remains seamless. A sudden increase in traffic will be subject to the usual constraints, such as bandwidth and caching the model on GPU, but our proprietary software is designed to minimise latency in all cases. We designed Pipeline Cloud to be as worry-free for the user as possible, and our auto-scaling logic is a core component of the service.
Currently you can view public endpoints, expected request and response schemas, and methods here: <<<https://api.pipeline.ai/docs#/>>>.
We are working on adding a more user-friendly API reference to this docs site.
For sure! We can easily assemble pipelines from most popular open source models, especially the ones hosted on the Hugging Face hub. To request a new pipeline, join our Discord server and let us know!
We're happy to give you as much support as you need! Send an email to [email protected] with your question and one of us will be in touch within 24 hours. Please include all relevant information to help us find a solution for you as quickly as possible. Alternatively, you can join our Discord server and reach out to a member of the team (their names are in orange) who will be glad to help you.
Analytics and usage metrics for each pipeline can be found in the dashboard. Traffic (including the number of runs) and per-run compute time statistics are shown as graphs over three time periods: 24 hours, 7 days, and 12 months. Data for the ten most recent runs is shown in a table on the right.
You can also view information about individual runs by heading to the project where your pipeline is deployed and selecting the Runs view. From there you can find any run on that pipeline and its related metadata.
You can use our API directly in your scripts by sending HTTP requests, or you can use our library, pipeline-ai. At the moment, the library is designed primarily to provide 'helper' functions which make creating and running ML pipelines in the cloud simple. In the future, the library will support end-to-end training and inference on your choice of compute platform, whether local or remote.
The pipeline-ai library is available on PyPI.
!pip install -U pipeline-ai
from pipeline import PipelineCloud
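For the direct HTTP route, a request might be assembled along the following lines. This is only a sketch: the host comes from the API docs link given earlier, but the `/v2/runs` path, the JSON field names, and the bearer-token header are assumptions made for illustration, so check the API docs for the real endpoint and schema. The request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.pipeline.ai"  # host from the API docs link
RUNS_PATH = "/v2/runs"  # ASSUMPTION: hypothetical path, confirm in the API docs


def build_run_request(token: str, pipeline_id: str, inputs: list) -> urllib.request.Request:
    """Build (but do not send) a POST request that would start a run."""
    # ASSUMPTION: the field names "pipeline_id" and "data" are illustrative only.
    body = json.dumps({"pipeline_id": pipeline_id, "data": inputs}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + RUNS_PATH,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # ASSUMPTION: bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request("YOUR_API_TOKEN", "your_pipeline_id", ["Hello"])
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the payload before spending any compute.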
Yay! Our core library, pipeline-ai, is the open-source backbone of our product, providing the logic for pipeline assembly and execution. Any contributions on GitHub are very gratefully received.
Just by being active in our Discord server, you'll be helping us identify what users want most from an ML orchestration platform like ours. And if you're extremely keen to get involved, then check out our jobs page!