Torchpipe

torchpipe is an alternative to Triton Inference Server, offering similar functionality such as shared memory, Ensemble, and a BLS (Business Logic Scripting) mechanism.

For serving scenarios, TorchPipe is designed to support multi-instance deployment, pipeline parallelism, adaptive batching, GPU-accelerated operators, and reduced head-of-line (HOL) blocking. It acts as a bridge between lower-level acceleration libraries (e.g., TensorRT, OpenCV, CVCUDA) and RPC frameworks (e.g., Thrift). At its core, it is an engine that enables programmable scheduling.

Updates

  • [20260104] We switched to tvm_ffi to provide clearer C++-Python interaction.

Usage

Below are some usage examples; for more, check out the examples.

Initialize and Prepare Pipeline

from torchpipe import pipe
import torch

from torchvision.models.resnet import resnet101

# create some regular pytorch model...
model = resnet101(pretrained=True).eval().cuda()

# export the model to ONNX with a dynamic batch dimension
model_path = "./resnet101.onnx"
x = torch.ones((1, 3, 224, 224)).cuda()
torch.onnx.export(model, x, model_path, opset_version=17,
                  input_names=['input'], output_names=['output'],
                  dynamic_axes={'input': {0: 'batch_size'},
                                'output': {0: 'batch_size'}})

thread_safe_pipe = pipe({
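    # Two nodes: 'preprocessor' (GPU decode/resize/color conversion) and
    # 'model' (TensorRT inference); 'next' routes each request between them.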
    "preprocessor": {
        "backend": "S[DecodeTensor,ResizeTensor,CvtColorTensor,SyncTensor]",
        # "backend": "S[DecodeMat,ResizeMat,CvtColorMat,Mat2Tensor,SyncTensor]",
        'instance_num': 2,
        'color': 'rgb',
        'resize_h': '224',
        'resize_w': '224',
        'next': 'model',
    },
    "model": {
        "backend": "SyncTensor[TensorrtTensor]",
        "model": model_path,
        "model::cache": model_path.replace(".onnx", ".trt"),
        "max": '4',
        'batching_timeout': 4,  # ms, timeout for batching
        'instance_num': 2,
        'mean': "123.675, 116.28, 103.53",
        'std': "58.395, 57.120, 57.375",  # merged into trt
    }}
)

Execute

We can execute the returned thread_safe_pipe just like the original PyTorch model, but in a thread-safe manner.

data = {'data': open('/path/to/img.jpg', 'rb').read()}
thread_safe_pipe(data) # <-- this is thread-safe
result = data['result']
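
The pipe writes its output back into the input dict, so each thread should pass its own dict. A minimal sketch of concurrent calls, assuming the thread_safe_pipe built above and an example image at /path/to/img.jpg:

from concurrent.futures import ThreadPoolExecutor

def infer(img_bytes):
    inp = {'data': img_bytes}
    thread_safe_pipe(inp)   # safe to call from multiple threads
    return inp['result']

img_bytes = open('/path/to/img.jpg', 'rb').read()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(infer, [img_bytes] * 8))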

Setup

Note: compiling torchpipe depends on the TensorRT C++ API. Please follow the TensorRT Installation Guide. You may also try installing torchpipe inside one of the NGC PyTorch docker containers (e.g., nvcr.io/nvidia/pytorch:25.05-py3).

Installation

To install the torchpipe Python library, run the following.

Inside NGC Docker Containers

Tested on NGC PyTorch containers 25.05, 24.05, 23.05, and 22.12.

git clone https://github.com/torchpipe/torchpipe.git
cd torchpipe/

img_name=nvcr.io/nvidia/pytorch:25.05-py3

docker run --rm --gpus all -it --network host \
    -v $(pwd):/workspace/ --privileged \
    -w /workspace/ \
    $img_name \
    bash

# pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

cd /workspace/plugins/torchpipe && python setup.py install --cv2

The same steps apply to other NGC docker containers.
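
After installation, a quick smoke test confirms the package imports (a minimal sketch; the __version__ attribute is assumed to be exposed):

import torchpipe
print(torchpipe.__version__)  # assumed attribute; prints the installed version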

How does it work?

See Basic Usage.

How to add (or override) a backend

WIP

Version Migration Notes

TorchPipe (v1, this version) is a collection of deep learning computation backends built on the Omniback library. Not all computation backends from TorchPipe (v0) have been ported to TorchPipe (v1) yet.