
phpmlkit/onnxruntime


ONNX Runtime for PHP


Run machine learning models in PHP using ONNX Runtime. This library provides a complete, type-safe interface to Microsoft's ONNX Runtime through PHP's Foreign Function Interface (FFI).

What is ONNX Runtime?

ONNX Runtime is a high-performance inference engine for machine learning models. It supports models from PyTorch, TensorFlow, scikit-learn, and many other frameworks that can be converted to the ONNX (Open Neural Network Exchange) format.

This library brings that power to PHP, allowing you to:

  • Run pre-trained ML models for image classification, text analysis, recommendations, and more
  • Integrate AI capabilities into your PHP applications without external services
  • Work with all major ML frameworks through the universal ONNX format

About This Library

This library is a reimagined and optimized version inspired by the original onnxruntime-php by Andrew Kane. While the original library provides excellent basic functionality, this version focuses on:

  • FFI-First Architecture: Direct FFI buffer handling for zero-copy operations with other libraries
  • Comprehensive Type Support: Full support for sequences, maps, and all ONNX value types
  • NDArray Interoperability: Convert between OrtValue and NDArray for numerical computing workflows
  • Exposed API: Direct access to OrtValue objects for inputs/outputs instead of PHP arrays only

The key difference: this library exposes OrtValue objects directly, allowing you to pass data from other FFI libraries without the overhead of copying through PHP arrays. NDArray users can still interoperate via OrtValue::fromNDArray() and OrtValue::toNDArray() when they explicitly want conversion.


Requirements

PHP Requirements

  • PHP 8.1 or higher
  • FFI extension enabled

Checking FFI Availability

Most PHP installations include FFI, but it may be disabled. Check whether the extension is loaded:

php -m | grep -i ffi

If FFI is not listed, enable it in your php.ini:

; Enable the FFI extension (bundled with PHP since 7.4)
extension=ffi

; Ensure FFI is not disabled
ffi.enable=true

Installation

Install via Composer:

composer require phpmlkit/onnxruntime

By default, this installs the cpu runtime for your platform.

To use a different runtime, set a runtime override in your application's composer.json:

{
  "extra": {
    "platform-packages": {
      "phpmlkit/onnxruntime": {
        "runtime": "cuda12"
      }
    }
  }
}

And then reinstall the package to fetch the correct distribution archive:

composer reinstall phpmlkit/onnxruntime

Important

Run composer require or composer reinstall on your target platform. Release artifacts include platform-specific native binaries.

Runtime Variants

Platform Supported Runtimes
Linux x86_64 cpu, cuda12, cuda13
Linux ARM64 cpu
macOS ARM64 cpu
Windows x64 cpu, cuda12, cuda13

Note

If your configured runtime is unavailable for your platform, composer will fall back to the cpu runtime.

Tip

For detailed information about using CUDA, CoreML, and TensorRT providers, see the Execution Providers section.

Manual Library Download

If the native library is missing from your installation, download it manually:

./vendor/bin/download-onnxruntime

Download with specific options:

./vendor/bin/download-onnxruntime --runtime cuda12
./vendor/bin/download-onnxruntime --runtime cuda13
./vendor/bin/download-onnxruntime --platform windows-x64
./vendor/bin/download-onnxruntime --version 1.24.3

Supported script options:

  • --runtime <cpu|cuda12|cuda13>
  • --platform <linux-x86_64|linux-arm64|darwin-arm64|windows-x64>
  • --version <onnx-runtime-version>

You might need this if:

  • You installed a dev version (branch/tag instead of a release)
  • Platform-specific package download failed and composer fell back to source
  • You moved vendor/ directory between different platforms

Quick Start

Here's a complete example to get you running your first model:

<?php
require_once 'vendor/autoload.php';

use PhpMlKit\ONNXRuntime\InferenceSession;
use PhpMlKit\ONNXRuntime\OrtValue;
use PhpMlKit\ONNXRuntime\Enums\DataType;

$session = InferenceSession::fromFile('/path/to/model.onnx');

$inputData = [1.0, 2.0, 3.0, 4.0, 5.0];
$input = OrtValue::fromArray($inputData, DataType::FLOAT);

$outputs = $session->run(['input' => $input]);

$result = $outputs['output']->toArray();
print_r($result);

Where to Get Models

Wondering where to find ONNX models for the example above? Here are your options:

1. Hugging Face Hub (Recommended)

Hugging Face Hub is the world's largest collection of machine learning models, including thousands of ONNX-compatible models ready to use. You can browse and filter specifically for ONNX models: https://huggingface.co/models?library=onnx

The easiest way to download these models directly from PHP is using the Hugging Face PHP client:

composer require codewithkyrian/huggingface

use Codewithkyrian\HuggingFace\HuggingFace;

$hf = HuggingFace::client();

$modelPath = $hf->hub()
    ->repo('onnx-community/detr-resnet-50-ONNX')
    ->download('onnx/model.onnx');

$session = InferenceSession::fromFile($modelPath);

2. ONNX Model Zoo (Deprecated)

The official ONNX Model Zoo has been deprecated as of July 2025. Most models previously available there have been migrated to Hugging Face and can be found at:

https://huggingface.co/onnxmodelzoo

3. Convert from Other Frameworks

PyTorch:

import torch

model = MyModel()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, 'model.onnx')

TensorFlow/Keras:

# Install the tf2onnx converter, then convert a SavedModel
pip install tf2onnx
python -m tf2onnx.convert --saved-model saved_model --output model.onnx

scikit-learn:

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

initial_type = [('float_input', FloatTensorType([None, 4]))]
onnx_model = convert_sklearn(model, initial_types=initial_type)

with open('model.onnx', 'wb') as f:
    f.write(onnx_model.SerializeToString())

4. Custom Training

Train your own models using any framework (PyTorch, TensorFlow, JAX, etc.) and export to ONNX format.

Core Concepts

InferenceSession

The InferenceSession is your main interface to ONNX Runtime. It loads models and runs inference.

Creating a Session

use PhpMlKit\ONNXRuntime\InferenceSession;

// From file
$session = InferenceSession::fromFile('path/to/model.onnx');

// From bytes
$modelBytes = file_get_contents('model.onnx');
$session = InferenceSession::fromBytes($modelBytes);

Running Inference

// Basic inference with OrtValue
$input = OrtValue::fromArray([1.0, 2.0, 3.0], DataType::FLOAT);
$outputs = $session->run(['input' => $input]);
$result = $outputs['output']->toArray();

// Get specific outputs only
$outputs = $session->run(
    ['input' => $input],
    ['output1', 'output2']
);

// With run options
$runOptions = RunOptions::default();
$outputs = $session->run(['input' => $input], options: $runOptions);

Inspecting Inputs and Outputs

// Get input information
$inputs = $session->inputs();
foreach ($inputs as $name => $meta) {
    echo "Input: $name\n";
    echo "  Shape: " . json_encode($meta->getShape()) . "\n";
    echo "  Type: {$meta->getDataType()->name}\n";
}

// Get output information
$outputs = $session->outputs();
foreach ($outputs as $name => $meta) {
    echo "Output: $name\n";
    echo "  Shape: " . json_encode($meta->getShape()) . "\n";
    echo "  Type: {$meta->getDataType()->name}\n";
}

Session Lifecycle

Sessions automatically clean up when they go out of scope, but you can explicitly close them:

$session = InferenceSession::fromFile('model.onnx');
// ... use session ...
$session->dispose();  // Explicit cleanup

// Or let PHP handle it automatically when $session goes out of scope

Important

The ONNX environment is shared across all sessions and uses reference counting. It will be automatically cleaned up when the last session closes.

OrtValue

OrtValue is the universal container for all data in ONNX Runtime. It handles:

  • Tensors: Multi-dimensional arrays of numbers or strings
  • Sequences: Ordered collections of values
  • Maps: Key-value pairs
  • Optional: Optional type wrappers

Creating Tensors

use PhpMlKit\ONNXRuntime\OrtValue;
use PhpMlKit\ONNXRuntime\Enums\DataType;

// 1D tensor
$tensor1D = OrtValue::fromArray([1.0, 2.0, 3.0], DataType::FLOAT);

// 2D tensor (matrix)
$tensor2D = OrtValue::fromArray(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], 
    DataType::FLOAT
);

// 3D tensor
$tensor3D = OrtValue::fromArray(
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]]], 
    DataType::INT32
);

// String tensor
$stringTensor = OrtValue::fromArray(
    ['hello', 'world', 'test'], 
    DataType::STRING
);

// With explicit shape
$data = [1, 2, 3, 4, 5, 6];
$tensor = OrtValue::fromArray($data, DataType::INT32, [2, 3]);

Converting Back to PHP Arrays

// Get data back as PHP array
$result = $tensor->toArray();

// Get tensor information
$shape = $tensor->shape();        // [2, 3]
$type = $tensor->dataType();      // DataType::FLOAT
$count = $tensor->elementCount(); // 6
$bytes = $tensor->sizeInBytes();  // 24 (6 elements × 4 bytes)

SessionOptions

Configure how the session runs with SessionOptions:

use PhpMlKit\ONNXRuntime\SessionOptions;
use PhpMlKit\ONNXRuntime\Enums\GraphOptimizationLevel;
use PhpMlKit\ONNXRuntime\Enums\ExecutionMode;

// Method 1: Create with specific options
$options = new SessionOptions(
    graphOptimizationLevel: GraphOptimizationLevel::ENABLE_ALL,
    executionMode: ExecutionMode::PARALLEL,
    interOpNumThreads: 4,
    intraOpNumThreads: 4
);

// Method 2: Use fluent builder
$options = SessionOptions::default()
    ->withGraphOptimizationLevel(GraphOptimizationLevel::ENABLE_ALL)
    ->withExecutionMode(ExecutionMode::PARALLEL)
    ->withInterOpThreads(4)
    ->withIntraOpThreads(4);

// Create session with options
$session = InferenceSession::fromFile('model.onnx', $options);

Presets

// CPU-optimized preset
$options = SessionOptions::cpuOptimized();

// GPU parallel preset
$options = SessionOptions::gpuParallel();

// Debug preset (verbose logging)
$options = SessionOptions::debug();

RunOptions

Configure individual inference runs:

use PhpMlKit\ONNXRuntime\RunOptions;
use PhpMlKit\ONNXRuntime\Enums\LoggingLevel;

$runOptions = new RunOptions(
    logVerbosityLevel: LoggingLevel::VERBOSE,
    runTag: 'inference_batch_123'
);

// Or use presets
$runOptions = RunOptions::debug();
$runOptions = RunOptions::withTag('my_batch');

// Run with options
$outputs = $session->run($inputs, options: $runOptions);

Working with Data

Inspecting Model Metadata

Access model-level metadata to understand the model's origin, version, and custom properties:

$metadata = $session->metadata();

echo $metadata->getProducerName();      // e.g., 'pytorch'
echo $metadata->getGraphName();         // e.g., 'torch-jit-export'
echo $metadata->getDomain();            // e.g., '' or 'com.example'
echo $metadata->getDescription();       // Model description
echo $metadata->getGraphDescription();  // Graph-level description
echo $metadata->getVersion();           // Version number (int)

// Custom metadata key-value pairs
$custom = $metadata->getCustomMetadataMap();
foreach ($custom as $key => $value) {
    echo "$key: $value\n";
}

Model Metadata Properties:

Property Type Description
producerName string Framework/tool that created the model
graphName string Name of the computation graph
domain string Model domain (namespace)
description string Human-readable model description
graphDescription string Graph-level description
version int Model version number
customMetadataMap array Key-value pairs of custom metadata

Inspecting Node Metadata (Inputs & Outputs)

Query input and output node information to understand data requirements:

use PhpMlKit\ONNXRuntime\Metadata\TensorMetadata;
use PhpMlKit\ONNXRuntime\Metadata\SequenceMetadata;
use PhpMlKit\ONNXRuntime\Metadata\MapMetadata;

$inputs = $session->inputs();
foreach ($inputs as $name => $metadata) {
    echo "Input: $name\n";
    echo "  Type: " . $metadata->getType()->name . "\n";
    
    if ($metadata instanceof TensorMetadata) {
        echo "  Shape: " . json_encode($metadata->getShape()) . "\n";
        echo "  Data Type: " . $metadata->getDataType()->name . "\n";
        echo "  Symbolic Shape: " . json_encode($metadata->getSymbolicShape()) . "\n";
    }
}

$outputs = $session->outputs();

// Or get just names
$inputNames = $session->inputNames();
$outputNames = $session->outputNames();

Node Metadata Types:

Type Class Key Properties
Tensor TensorMetadata dataType, shape, symbolicShape
Sequence SequenceMetadata elementMetadata
Map MapMetadata keyType, valueMetadata

Tensors

Tensors are the primary data structure in machine learning. This library supports:

  • Numeric tensors: FLOAT, DOUBLE, INT8/16/32/64, UINT8/16/32/64
  • String tensors: Variable-length strings
  • Boolean tensors: true/false values
  • Multi-dimensional: 1D, 2D, 3D, and higher dimensions

Shape Handling

// Shape is automatically inferred from nested arrays
$tensor = OrtValue::fromArray([[1, 2, 3], [4, 5, 6]], DataType::INT32);
print_r($tensor->shape());  // [2, 3]

// Or explicitly specified
$tensor = OrtValue::fromArray([1, 2, 3, 4, 5, 6], DataType::INT32, [2, 3]);

Dynamic Shapes

Some models accept dynamic shapes (indicated by -1 in shape):

// Model accepts variable-length input
$meta = $session->inputs()['input'];
print_r($meta->getShape());  // Might be [-1] or [-1, 3, 224, 224]

// You can provide any size
$input = OrtValue::fromArray([1, 2, 3], DataType::FLOAT);  // Works
$input = OrtValue::fromArray([1, 2, 3, 4, 5], DataType::FLOAT);  // Also works

Sequences

Sequences are ordered collections of values. Supported element types:

  • STRING, INT64, FLOAT, DOUBLE (the element types listed in the ONNX documentation)
  • In practice, any tensor element type can be used

// Create a sequence of tensors
$tensor1 = OrtValue::fromArray([1, 2], DataType::INT32);
$tensor2 = OrtValue::fromArray([3, 4], DataType::INT32);
$tensor3 = OrtValue::fromArray([5, 6], DataType::INT32);

$sequence = OrtValue::sequence([$tensor1, $tensor2, $tensor3]);

// Get sequence length
$length = $sequence->sequenceLength();  // 3

// Access elements
$first = $sequence->getSequenceElement(0);
print_r($first->toArray());  // [1, 2]

// Iterate over all elements
$sequence->foreachSequenceElement(function($value, $index) {
    echo "Element $index: " . json_encode($value->toArray()) . "\n";
});

// Convert to PHP array
$result = $sequence->toArray();  // [[1, 2], [3, 4], [5, 6]]

Maps

Maps are key-value pairs with specific type constraints.

Supported Map Types

Key Types: INT64, STRING
Value Types: INT64, FLOAT, DOUBLE, STRING

use PhpMlKit\ONNXRuntime\OrtValue;
use PhpMlKit\ONNXRuntime\Enums\DataType;

// INT64 keys with FLOAT values
$keys = OrtValue::fromArray([1, 2, 3], DataType::INT64);
$values = OrtValue::fromArray([10.0, 20.0, 30.0], DataType::FLOAT);
$map = OrtValue::map($keys, $values);

$result = $map->toArray();  // [1 => 10.0, 2 => 20.0, 3 => 30.0]

// STRING keys with STRING values
$keys = OrtValue::fromArray(['a', 'b'], DataType::STRING);
$values = OrtValue::fromArray(['x', 'y'], DataType::STRING);
$map = OrtValue::map($keys, $values);

$result = $map->toArray();  // ['a' => 'x', 'b' => 'y']

// Access keys and values separately
$mapKeys = $map->mapKeys();
$mapValues = $map->mapValues();

Important

Other type combinations will throw a FailException.
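
The constraint table above can be expressed as a small validity check. This is a pure-PHP sketch of the documented rule for illustration only; the helper and its string type names are not part of the library API, which instead throws a FailException when OrtValue::map() receives an unsupported combination.

```php
<?php
// Illustrative check of the documented map type constraints
// (not a library API; the library itself throws FailException).
function isSupportedMapType(string $keyType, string $valueType): bool
{
    $validKeys   = ['INT64', 'STRING'];
    $validValues = ['INT64', 'FLOAT', 'DOUBLE', 'STRING'];

    return in_array($keyType, $validKeys, true)
        && in_array($valueType, $validValues, true);
}

var_dump(isSupportedMapType('INT64', 'FLOAT'));   // bool(true)
var_dump(isSupportedMapType('FLOAT', 'INT64'));   // bool(false), FLOAT keys unsupported
```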

Maps in Sequences

ONNX Runtime also supports sequences of maps (specifically for FLOAT values):

// Create maps
$keys1 = OrtValue::fromArray([1, 2], DataType::INT64);
$values1 = OrtValue::fromArray([10.0, 20.0], DataType::FLOAT);
$map1 = OrtValue::map($keys1, $values1);

$keys2 = OrtValue::fromArray([3, 4], DataType::INT64);
$values2 = OrtValue::fromArray([30.0, 40.0], DataType::FLOAT);
$map2 = OrtValue::map($keys2, $values2);

// Create sequence of maps
$sequence = OrtValue::sequence([$map1, $map2]);
$result = $sequence->toArray();  // [[1 => 10.0, 2 => 20.0], [3 => 30.0, 4 => 40.0]]

Note

Sequences of maps only work with INT64/STRING keys and FLOAT values. Other combinations will fail.

Type Support

All ONNX tensor element types are supported:

Type PHP Equivalent Notes
FLOAT float 32-bit floating point
DOUBLE float 64-bit floating point
INT8 int 8-bit signed integer
INT16 int 16-bit signed integer
INT32 int 32-bit signed integer
INT64 int 64-bit signed integer
UINT8 int 8-bit unsigned integer
UINT16 int 16-bit unsigned integer
UINT32 int 32-bit unsigned integer
UINT64 int 64-bit unsigned integer
BOOL bool Boolean values
STRING string Variable-length strings

Note

FLOAT16, BFLOAT16, COMPLEX64, and COMPLEX128 are defined but may have limited support.
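
The sizeInBytes() figure shown earlier (24 bytes for 6 FLOAT elements) follows directly from the per-element widths in the table above. A sketch of that arithmetic; the width map is written out here for illustration and is not a library API:

```php
<?php
// Bytes per element for the fixed-width tensor types in the table above.
// Illustrative helper only, not part of the library.
function elementSizeInBytes(string $type): int
{
    return match ($type) {
        'FLOAT', 'INT32', 'UINT32'  => 4,
        'DOUBLE', 'INT64', 'UINT64' => 8,
        'INT16', 'UINT16'           => 2,
        'INT8', 'UINT8', 'BOOL'     => 1,
        default => throw new \InvalidArgumentException("Variable-width or unsupported type: $type"),
    };
}

// A [2, 3] FLOAT tensor: 6 elements × 4 bytes = 24 bytes
echo array_product([2, 3]) * elementSizeInBytes('FLOAT');  // 24
```

STRING tensors have no fixed element width, which is why they are excluded here.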

Memory Management

All major resources in this library implement the Disposable interface, providing automatic cleanup when objects go out of scope while still allowing explicit cleanup when you need to free resources early.

Automatic Cleanup

When a disposable resource goes out of scope or is no longer referenced, its destructor automatically releases the underlying native resources:

function processModel() {
    $session = InferenceSession::fromFile('model.onnx');
    $input = OrtValue::fromArray([1, 2, 3], DataType::FLOAT);
    $outputs = $session->run(['input' => $input]);
    
    return $outputs['result']->toArray();
    // $session, $input, $outputs all cleaned up automatically
}

This RAII-style pattern means you rarely need to think about cleanup - resources are managed naturally through PHP's object lifecycle.

Explicit Cleanup

When you need deterministic resource management or want to free memory before a variable goes out of scope, call dispose():

// Sessions
$session = InferenceSession::fromFile('model.onnx');
// ... use session ...
$session->dispose();  // Release session resources immediately

// OrtValues
$tensor = OrtValue::fromArray([1, 2, 3], DataType::FLOAT);
// ... use tensor ...
$tensor->dispose();  // Release tensor resources immediately

// Safe to call multiple times
$tensor->dispose();  // No error, already disposed

This is useful for:

  • Long-running scripts where you want to release memory as soon as possible
  • Processing large batches of data iteratively
  • Ensuring resources are freed at specific points in your code
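
The batch-processing case can be sketched as follows. The input name 'input' and output name 'output' are placeholders for your model's actual node names; the point is disposing each OrtValue inside the loop rather than letting them accumulate until the function returns.

```php
<?php
use PhpMlKit\ONNXRuntime\InferenceSession;
use PhpMlKit\ONNXRuntime\OrtValue;
use PhpMlKit\ONNXRuntime\Enums\DataType;

// Sketch: free native memory eagerly on each iteration of a large batch job,
// instead of waiting for every tensor to go out of scope at the end.
function runBatches(InferenceSession $session, iterable $batches): array
{
    $results = [];
    foreach ($batches as $batch) {
        $input = OrtValue::fromArray($batch, DataType::FLOAT);
        $outputs = $session->run(['input' => $input]);

        // Copy out what we need, then release the native buffers immediately
        $results[] = $outputs['output']->toArray();
        $input->dispose();
        $outputs['output']->dispose();
    }
    return $results;
}
```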

Internal Buffer Management

From Array (Internal Buffer)

When you create an OrtValue from a PHP array, the library:

  1. Allocates an FFI buffer
  2. Copies data from PHP array to FFI buffer
  3. Creates ONNX tensor referencing the buffer
  4. Keeps buffer reference to prevent garbage collection
  5. Automatically releases both on disposal

$tensor = OrtValue::fromArray([1, 2, 3], DataType::FLOAT);
// Buffer created internally, managed automatically
// Just let $tensor go out of scope or call dispose()

From Buffer (External Buffer)

When you create an OrtValue from an existing FFI buffer (zero-copy):

  1. ONNX tensor references your existing buffer
  2. You are responsible for ensuring buffer outlives the tensor
  3. You must free the buffer if needed
  4. dispose() only releases the tensor handle, not your buffer

$ffi = FFI::cdef();
$buffer = $ffi->new('float[100]');  // Your buffer

$tensor = OrtValue::fromBuffer($buffer, 400, DataType::FLOAT, [100]);
// ... use tensor ...

$tensor->dispose();  // Releases tensor only
// You must keep $buffer alive while the tensor is in use; here PHP's
// garbage collector frees it automatically once $buffer goes out of scope

Use Case: External buffers are useful for:

  • Working with C libraries
  • Pre-allocated memory pools

Session Environment

Sessions share a global ONNX environment with reference counting:

$session1 = InferenceSession::fromFile('model1.onnx');
$session2 = InferenceSession::fromFile('model2.onnx');
// Both share the same environment

$session1->dispose();  // Environment kept alive by session2
$session2->dispose();  // Environment released (no more sessions)

This is handled automatically - you don't need to manage it.

Execution Providers

Execution providers are the computational backends that ONNX Runtime uses to run your models. By default, the CPU execution provider is used, which works on all platforms. For better performance, you can use hardware-accelerated providers like CUDA (NVIDIA GPUs), CoreML (Apple Neural Engine), or TensorRT (optimized NVIDIA inference).

Available Providers

Provider Description Runtime Required Platforms
CPUExecutionProvider Default CPU backend cpu (included by default) All platforms
CUDAExecutionProvider NVIDIA GPU acceleration cuda12 or cuda13 Linux x86_64, Windows x64
CoreMLExecutionProvider Apple Neural Engine/GPU cpu (included on macOS) macOS ARM64
TensorRTExecutionProvider Optimized NVIDIA inference cuda12 or cuda13 Linux x86_64, Windows x64

Using the CPU Provider

The CPU provider is the default and works out of the box on all platforms. It uses optimized CPU instructions (AVX, AVX2, AVX-512) when available.

use PhpMlKit\ONNXRuntime\InferenceSession;
use PhpMlKit\ONNXRuntime\SessionOptions;

// CPU is the default - no special configuration needed
$session = InferenceSession::fromFile('model.onnx');

// Or explicitly configure for CPU
$options = SessionOptions::default();
$session = InferenceSession::fromFile('model.onnx', $options);

Using CoreML (macOS)

CoreML provider is automatically included in the CPU runtime on macOS ARM64. It accelerates inference using the Apple Neural Engine (ANE) and GPU.

use PhpMlKit\ONNXRuntime\InferenceSession;
use PhpMlKit\ONNXRuntime\SessionOptions;
use PhpMlKit\ONNXRuntime\Providers\CoreMLProviderOptions;

// Use CoreML with default settings
$options = SessionOptions::default()
    ->withCoreMLProvider();

$session = InferenceSession::fromFile('model.onnx', $options);

CoreML Configuration Options:

use PhpMlKit\ONNXRuntime\Enums\CoreMLComputeUnits;
use PhpMlKit\ONNXRuntime\Enums\CoreMLModelFormat;

// Configure CoreML for specific hardware
$options = SessionOptions::default()
    ->withCoreMLProvider(
        CoreMLProviderOptions::default()
            ->withComputeUnits(CoreMLComputeUnits::ALL)  // Use ANE + GPU + CPU
            ->withModelFormat(CoreMLModelFormat::ML_PROGRAM)
            ->withStaticShapes(true)  // Optimize for fixed-size inputs
    );

$session = InferenceSession::fromFile('model.onnx', $options);

Compute Units:

  • ALL - Use all available compute units (ANE, GPU, CPU)
  • CPU_AND_NEURAL_ENGINE - CPU and Apple Neural Engine only
  • CPU_AND_GPU - CPU and GPU only
  • CPU_ONLY - CPU only

Note

CoreML works best with models that use standard operations. Some complex operations may fall back to CPU execution.

Using CUDA (NVIDIA GPUs)

CUDA provider requires installing the CUDA runtime variant. First, switch your runtime:

# Update composer.json to use CUDA 12 or CUDA 13
composer reinstall phpmlkit/onnxruntime

Then configure your session:

use PhpMlKit\ONNXRuntime\InferenceSession;
use PhpMlKit\ONNXRuntime\SessionOptions;
use PhpMlKit\ONNXRuntime\Providers\CudaProviderOptions;
use PhpMlKit\ONNXRuntime\Enums\ArenaExtendStrategy;
use PhpMlKit\ONNXRuntime\Enums\CudnnConvAlgoSearch;

// Use CUDA with default settings (device 0)
$options = SessionOptions::default()
    ->withCudaProvider();

$session = InferenceSession::fromFile('model.onnx', $options);

// Configure CUDA with specific options
$options = SessionOptions::default()
    ->withCudaProvider(
        CudaProviderOptions::default()
            ->withDeviceId(0)                    // GPU device ID
            ->withMemoryLimit(2147483648)       // 2GB memory limit
            ->withArenaExtendStrategy(ArenaExtendStrategy::NEXT_POWER_OF_TWO)
            ->withCudnnConvAlgoSearch(CudnnConvAlgoSearch::HEURISTIC)
    );

$session = InferenceSession::fromFile('model.onnx', $options);

CUDA Presets:

// High performance preset (may use more memory)
$options = SessionOptions::default()
    ->withCudaProvider(CudaProviderOptions::highPerformance());

// Memory-conservative preset (slower but uses less GPU memory)
$options = SessionOptions::default()
    ->withCudaProvider(CudaProviderOptions::memoryConservative());

Using TensorRT (Optimized NVIDIA Inference)

TensorRT provides highly optimized inference for NVIDIA GPUs by compiling models specifically for your hardware. It requires the CUDA runtime and builds on top of CUDA.

use PhpMlKit\ONNXRuntime\InferenceSession;
use PhpMlKit\ONNXRuntime\SessionOptions;
use PhpMlKit\ONNXRuntime\Providers\TensorRTProviderOptions;

// Use TensorRT with default settings
$options = SessionOptions::default()
    ->withTensorRTProvider();

$session = InferenceSession::fromFile('model.onnx', $options);

TensorRT Configuration:

// Configure TensorRT with caching for faster subsequent loads
$options = SessionOptions::default()
    ->withTensorRTProvider(
        TensorRTProviderOptions::default()
            ->withCachePath('/path/to/trt_cache')     // Cache compiled engines
            ->withMaxWorkspaceSize(2147483648)        // 2GB workspace
            ->withFp16(true)                          // Enable FP16 precision
            ->withInt8(true)                          // Enable INT8 precision
            ->withMaxPartitionIterations(1000)
            ->withMinSubgraphSize(1)
    );

$session = InferenceSession::fromFile('model.onnx', $options);

TensorRT Presets:

// Maximum performance (FP16/INT8, aggressive optimization)
$options = SessionOptions::default()
    ->withTensorRTProvider(
        TensorRTProviderOptions::maximumPerformance()
    );

// With caching enabled for production
$options = SessionOptions::default()
    ->withTensorRTProvider(
        TensorRTProviderOptions::withCache('/app/cache/tensorrt')
    );

Important

TensorRT compiles models specifically for your GPU architecture. The first load of a model may take several minutes as TensorRT builds the optimized engine. Use caching to save compiled engines for faster subsequent loads.

Switching Runtime Variants

To use CUDA or TensorRT providers, you need the corresponding runtime:

1. Update your composer.json:

{
  "extra": {
    "platform-packages": {
      "phpmlkit/onnxruntime": {
        "runtime": "cuda12"
      }
    }
  }
}

Available runtimes: cpu, cuda12, cuda13

2. Reinstall the package:

composer reinstall phpmlkit/onnxruntime

Note

The CUDA runtime includes both CUDA and TensorRT providers. CoreML is included in the CPU runtime on macOS.

Checking Available Providers

You can check which execution providers are available at runtime:

use PhpMlKit\ONNXRuntime\FFI\Lib;

$api = Lib::api();
$providers = $api->getAvailableProviders();

print_r($providers);
// Output: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']

Provider Fallback

If a configured provider fails to initialize (e.g., CUDA not available), ONNX Runtime automatically falls back to the CPU provider. You can check which provider is actually being used by profiling or checking the available providers list.
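
If you prefer an explicit check before configuring a session, you can pick a provider from the getAvailableProviders() output shown above. This is a sketch of the preference-order logic only; the helper is not a library API, and the provider name strings are taken from the example output above.

```php
<?php
// Sketch: choose the first preferred provider that is actually available,
// mirroring ONNX Runtime's own fall-back to CPU.
function pickProvider(array $available, array $preferred): string
{
    foreach ($preferred as $name) {
        if (in_array($name, $available, true)) {
            return $name;
        }
    }
    return 'CPUExecutionProvider';  // always present
}

$available = ['CUDAExecutionProvider', 'CPUExecutionProvider'];
echo pickProvider($available, ['TensorrtExecutionProvider', 'CUDAExecutionProvider']);
// CUDAExecutionProvider
```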

Error Handling

The library provides specific exceptions for different error conditions:

use PhpMlKit\ONNXRuntime\Exceptions\NoSuchFileException;
use PhpMlKit\ONNXRuntime\Exceptions\InvalidProtobufException;
use PhpMlKit\ONNXRuntime\Exceptions\InvalidArgumentException;
use PhpMlKit\ONNXRuntime\Exceptions\FailException;

try {
    $session = InferenceSession::fromFile('model.onnx');
    $outputs = $session->run(['input' => $data]);
} catch (NoSuchFileException $e) {
    // Model file doesn't exist
    echo "Model not found: " . $e->getMessage();
} catch (InvalidProtobufException $e) {
    // File exists but isn't a valid ONNX model
    echo "Invalid model format: " . $e->getMessage();
} catch (InvalidArgumentException $e) {
    // Wrong input name, shape mismatch, etc.
    echo "Invalid input: " . $e->getMessage();
} catch (FailException $e) {
    // General ONNX Runtime error
    echo "ONNX error: " . $e->getMessage();
}

Exceptions

All of these live under PhpMlKit\ONNXRuntime\Exceptions except the abstract base, which is PhpMlKit\ONNXRuntime\Exception.

Exception Cause Solution
Exception (abstract base) Parent of most ONNX-specific errors Catch this type to handle them together
FailException Generic ONNX Runtime failure Read the message; inspect model and inputs
InvalidArgumentException Bad arguments (validation or ORT) Check inputs, names, shapes, options
NoSuchFileException Model path not found Fix the file path
InvalidProtobufException Invalid or corrupt ONNX model bytes Re-export or verify the .onnx file
NoModelException Operation needs a loaded model Ensure the session is created correctly
EngineErrorException Inference engine error Read the message
RuntimeException ONNX Runtime RUNTIME_EXCEPTION (extends PHP \RuntimeException) Read the message
ModelLoadedException Conflicts with an already-loaded model Avoid double load / wrong API sequence
NotImplementedException Feature not implemented in this ORT build or in the package Use a supported model or API
InvalidGraphException Invalid model graph Fix or replace the model
ExecutionProviderException Execution provider failed Check provider config, drivers, GPU
InvalidOperationException Wrong use of API (e.g. disposed session, wrong OrtValue kind) Fix call order and resource lifetime
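
When you don't need to distinguish the cases in the table, catching the abstract base class (PhpMlKit\ONNXRuntime\Exception, as noted above) handles all of them at once. A sketch, with the model path supplied by the caller:

```php
<?php
use PhpMlKit\ONNXRuntime\Exception as OnnxException;
use PhpMlKit\ONNXRuntime\InferenceSession;

// Sketch: one catch block covering every ONNX-specific exception in the table.
function tryLoadModel(string $path): ?InferenceSession
{
    try {
        return InferenceSession::fromFile($path);
    } catch (OnnxException $e) {
        error_log('ONNX error: ' . $e->getMessage());
        return null;
    }
}
```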

Advanced Usage

Working with Raw Buffers

For zero-copy operations with other FFI libraries:

$ffi = FFI::cdef();

// Create buffer
$bufferSize = 100 * 4;  // 100 floats × 4 bytes
$buffer = $ffi->new("uint8_t[{$bufferSize}]");

// Fill with data (from another library, file, etc.)
// ... fill $buffer ...

// Create tensor from buffer (zero-copy)
$tensor = OrtValue::fromBuffer(
    $buffer,
    $bufferSize,
    DataType::FLOAT,
    [100]
);

// Use tensor
$outputs = $session->run(['input' => $tensor]);

// Clean up
$tensor->dispose();  // Releases tensor only
// You must free $buffer separately if needed

NDArray Interoperability

When the phpmlkit/ndarray package is installed, you can convert between NDArray and OrtValue explicitly:

use PhpMlKit\NDArray\NDArray;
use PhpMlKit\NDArray\DType;
use PhpMlKit\ONNXRuntime\OrtValue;

// Create NDArray input
$input = NDArray::array([[1.0, 2.0], [3.0, 4.0]], DType::Float32);

// Run inference (InferenceSession accepts OrtValue inputs)
$outputs = $session->run(['input' => OrtValue::fromNDArray($input)]);

// Convert tensor output to NDArray when needed
$output = $outputs['output'];

echo $output->toNDArray();
// array(2, 2)
// [1. 2.]
// [3. 4.]

This provides seamless integration with the NDArray ecosystem for numerical computing in PHP.

Profiling

Enable profiling to analyze model performance:

use PhpMlKit\ONNXRuntime\SessionOptions;

$options = SessionOptions::default()
    ->withProfiling(true, 'my_model_profile');

$session = InferenceSession::fromFile('model.onnx', $options);

// Run inference multiple times
for ($i = 0; $i < 100; $i++) {
    $session->run($inputs);
}

$session->dispose();  // Profile saved to my_model_profile_*.json

Tip

For detailed information about execution providers (CPU, CUDA, CoreML, TensorRT), see the Execution Providers section.

FFI Direct Access

For advanced use cases, you can access the underlying FFI layer directly:

use PhpMlKit\ONNXRuntime\FFI\Lib;
use PhpMlKit\ONNXRuntime\FFI\Api;
use PhpMlKit\ONNXRuntime\Enums\AllocatorType;
use PhpMlKit\ONNXRuntime\Enums\MemoryType;

// Get FFI instance
$ffi = Lib::get();

// Get typed API wrapper
$api = Lib::api();

// Access low-level C API functions
$memoryInfo = $api->createCpuMemoryInfo(
    AllocatorType::ARENA_ALLOCATOR,
    MemoryType::DEFAULT
);

// Don't forget to release resources
$api->releaseMemoryInfo($memoryInfo);

Warning: Direct FFI access requires knowledge of the ONNX Runtime C API. Use with caution as improper resource management can cause memory leaks or crashes.

C API Header

The C API header is located at vendor/phpmlkit/onnxruntime/include/onnxruntime.h. You can reference it for available functions and types.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

# Clone repository
git clone https://github.com/phpmlkit/onnxruntime.git
cd onnxruntime

# Install dependencies
composer install

# Generate test models (requires Python)
pip install onnx numpy
python scripts/generate_test_models.py

# Run tests
composer test

# Run tests (pretty)
composer test:pretty

# Check code style
composer cs:check

# Fix code style
composer cs:fix

# Run static analysis
composer lint

Code Style

This project follows PSR-12 coding standards. Please run the linter before submitting:

composer cs:fix

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This library was inspired by the original onnxruntime-php by Andrew Kane.

Support

For bugs and feature requests, please open an issue on the GitHub repository.


Happy inferencing! 🚀
