Run machine learning models in PHP using ONNX Runtime. This library provides a complete, type-safe interface to Microsoft's ONNX Runtime through PHP's Foreign Function Interface (FFI).
ONNX Runtime is a high-performance inference engine for machine learning models. It supports models from PyTorch, TensorFlow, scikit-learn, and many other frameworks that can be converted to the ONNX (Open Neural Network Exchange) format.
This library brings that power to PHP, allowing you to:
- Run pre-trained ML models for image classification, text analysis, recommendations, and more
- Integrate AI capabilities into your PHP applications without external services
- Work with all major ML frameworks through the universal ONNX format
This library is a reimagined and optimized version inspired by the original onnxruntime-php by Andrew Kane. While the original library provides excellent basic functionality, this version focuses on:
- FFI-First Architecture: Direct FFI buffer handling for zero-copy operations with other libraries
- Comprehensive Type Support: Full support for sequences, maps, and all ONNX value types
- NDArray Interoperability: Convert between OrtValue and NDArray for numerical computing workflows
- Exposed API: Direct access to OrtValue objects for inputs/outputs instead of PHP arrays only
The key difference: this library exposes OrtValue objects directly, allowing you to pass data from other FFI libraries without the overhead of copying through PHP arrays. NDArray users can still interoperate via OrtValue::fromNDArray() and OrtValue::toNDArray() when they explicitly want conversion.
- Requirements
- Installation
- Quick Start
- Where to Get Models
- Core Concepts
- Working with Data
- Type Support
- Memory Management
- Execution Providers
- Error Handling
- Examples
- Advanced Usage
- Supported Platforms
- FFI Direct Access
- Contributing
- PHP 8.1 or higher
- FFI extension enabled
Most PHP installations include FFI but it may be disabled. Check your php.ini:
php -m | grep ffi

If FFI is not listed, enable it in your php.ini:
; Enable the FFI extension
extension=ffi
; Ensure FFI is not disabled
ffi.enable=true

Install via Composer:
composer require phpmlkit/onnxruntime

By default, this installs the cpu runtime for your platform.
To use a different runtime, set a runtime override in your application's composer.json:
{
"extra": {
"platform-packages": {
"phpmlkit/onnxruntime": {
"runtime": "cuda12"
}
}
}
}

Then reinstall the package to fetch the correct distribution archive:
composer reinstall phpmlkit/onnxruntime

Important
Run composer require or composer reinstall on your target platform. Release artifacts include platform-specific native binaries.
| Platform | Supported Runtimes |
|---|---|
| Linux x86_64 | cpu, cuda12, cuda13 |
| Linux ARM64 | cpu |
| macOS ARM64 | cpu |
| Windows x64 | cpu, cuda12, cuda13 |
Note
If your configured runtime is unavailable for your platform, composer will fall back to the cpu runtime.
Tip
For detailed information about using CUDA, CoreML, and TensorRT providers, see the Execution Providers section.
If the native library is missing from your installation, download it manually:
./vendor/bin/download-onnxruntime

Download with specific options:
./vendor/bin/download-onnxruntime --runtime cuda12
./vendor/bin/download-onnxruntime --runtime cuda13
./vendor/bin/download-onnxruntime --platform windows-x64
./vendor/bin/download-onnxruntime --version 1.24.3

Supported script options:
- --runtime <cpu|cuda12|cuda13>
- --platform <linux-x86_64|linux-arm64|darwin-arm64|windows-x64>
- --version <onnx-runtime-version>
You might need this if:
- You installed a dev version (branch/tag instead of a release)
- Platform-specific package download failed and composer fell back to source
- You moved the vendor/ directory between different platforms
Here's a complete example to get you running your first model:
<?php
require_once 'vendor/autoload.php';
use PhpMlKit\ONNXRuntime\InferenceSession;
use PhpMlKit\ONNXRuntime\OrtValue;
use PhpMlKit\ONNXRuntime\Enums\DataType;
$session = InferenceSession::fromFile('/path/to/model.onnx');
$inputData = [1.0, 2.0, 3.0, 4.0, 5.0];
$input = OrtValue::fromArray($inputData, DataType::FLOAT);
$outputs = $session->run(['input' => $input]);
$result = $outputs['output']->toArray();
print_r($result);

Wondering where to find ONNX models for the example above? Here are your options:
Hugging Face Hub is the world's largest collection of machine learning models, including thousands of ONNX-compatible models ready to use. You can browse and filter specifically for ONNX models: https://huggingface.co/models?library=onnx
The easiest way to download these models directly from PHP is using the Hugging Face PHP client:
composer require codewithkyrian/huggingface

use Codewithkyrian\HuggingFace\HuggingFace;
$hf = HuggingFace::client();
$modelPath = $hf->hub()
->repo('onnx-community/detr-resnet-50-ONNX')
->download('onnx/model.onnx');
$session = InferenceSession::fromFile($modelPath);

The official ONNX Model Zoo has been deprecated as of July 2025. Most models previously available there have been migrated to Hugging Face and can be found at:
https://huggingface.co/onnxmodelzoo
PyTorch:
import torch
model = MyModel()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, 'model.onnx')

TensorFlow/Keras:
# Use tf2onnx to convert
# pip install tf2onnx
# python -m tf2onnx.convert --saved-model saved_model --output model.onnx

scikit-learn:
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
initial_type = [('float_input', FloatTensorType([None, 4]))]
onnx_model = convert_sklearn(model, initial_types=initial_type)
with open('model.onnx', 'wb') as f:
    f.write(onnx_model.SerializeToString())

Train your own models using any framework (PyTorch, TensorFlow, JAX, etc.) and export to ONNX format.
The InferenceSession is your main interface to ONNX Runtime. It loads models and runs inference.
use PhpMlKit\ONNXRuntime\InferenceSession;
// From file
$session = InferenceSession::fromFile('path/to/model.onnx');
// From bytes
$modelBytes = file_get_contents('model.onnx');
$session = InferenceSession::fromBytes($modelBytes);

// Basic inference with OrtValue
$input = OrtValue::fromArray([1.0, 2.0, 3.0], DataType::FLOAT);
$outputs = $session->run(['input' => $input]);
$result = $outputs['output']->toArray();
// Get specific outputs only
$outputs = $session->run(
['input' => $input],
['output1', 'output2']
);
// With run options
$runOptions = RunOptions::default();
$outputs = $session->run(['input' => $input], options: $runOptions);

// Get input information
$inputs = $session->inputs();
foreach ($inputs as $name => $meta) {
    echo "Input: $name\n";
    echo "  Shape: " . json_encode($meta->getShape()) . "\n";
    echo "  Type: {$meta->getDataType()->name}\n";
}

// Get output information
$outputs = $session->outputs();
foreach ($outputs as $name => $meta) {
    echo "Output: $name\n";
    echo "  Shape: " . json_encode($meta->getShape()) . "\n";
    echo "  Type: {$meta->getDataType()->name}\n";
}

Sessions automatically clean up when they go out of scope, but you can explicitly close them:
$session = InferenceSession::fromFile('model.onnx');
// ... use session ...
$session->dispose(); // Explicit cleanup
// Or let PHP handle it automatically when $session goes out of scope

Important
The ONNX environment is shared across all sessions and uses reference counting. It will be automatically cleaned up when the last session closes.
OrtValue is the universal container for all data in ONNX Runtime. It handles:
- Tensors: Multi-dimensional arrays of numbers or strings
- Sequences: Ordered collections of values
- Maps: Key-value pairs
- Optional: Optional type wrappers
use PhpMlKit\ONNXRuntime\OrtValue;
use PhpMlKit\ONNXRuntime\Enums\DataType;
// 1D tensor
$tensor1D = OrtValue::fromArray([1.0, 2.0, 3.0], DataType::FLOAT);
// 2D tensor (matrix)
$tensor2D = OrtValue::fromArray(
[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
DataType::FLOAT
);
// 3D tensor
$tensor3D = OrtValue::fromArray(
[[[1, 2], [3, 4]], [[5, 6], [7, 8]]],
DataType::INT32
);
// String tensor
$stringTensor = OrtValue::fromArray(
['hello', 'world', 'test'],
DataType::STRING
);
// With explicit shape
$data = [1, 2, 3, 4, 5, 6];
$tensor = OrtValue::fromArray($data, DataType::INT32, [2, 3]);

// Get data back as PHP array
$result = $tensor->toArray();
// Get tensor information
$shape = $tensor->shape();        // [2, 3]
$type = $tensor->dataType();      // DataType::INT32
$count = $tensor->elementCount(); // 6
$bytes = $tensor->sizeInBytes();  // 24 (6 elements × 4 bytes)

Configure how the session runs with SessionOptions:
use PhpMlKit\ONNXRuntime\SessionOptions;
use PhpMlKit\ONNXRuntime\Enums\GraphOptimizationLevel;
use PhpMlKit\ONNXRuntime\Enums\ExecutionMode;
// Method 1: Create with specific options
$options = new SessionOptions(
graphOptimizationLevel: GraphOptimizationLevel::ENABLE_ALL,
executionMode: ExecutionMode::PARALLEL,
interOpNumThreads: 4,
intraOpNumThreads: 4
);
// Method 2: Use fluent builder
$options = SessionOptions::default()
->withGraphOptimizationLevel(GraphOptimizationLevel::ENABLE_ALL)
->withExecutionMode(ExecutionMode::PARALLEL)
->withInterOpThreads(4)
->withIntraOpThreads(4);
// Create session with options
$session = InferenceSession::fromFile('model.onnx', $options);

// CPU-optimized preset
$options = SessionOptions::cpuOptimized();
// GPU parallel preset
$options = SessionOptions::gpuParallel();
// Debug preset (verbose logging)
$options = SessionOptions::debug();

Configure individual inference runs:
use PhpMlKit\ONNXRuntime\RunOptions;
use PhpMlKit\ONNXRuntime\Enums\LoggingLevel;
$runOptions = new RunOptions(
logVerbosityLevel: LoggingLevel::VERBOSE,
runTag: 'inference_batch_123'
);
// Or use presets
$runOptions = RunOptions::debug();
$runOptions = RunOptions::withTag('my_batch');
// Run with options
$outputs = $session->run($inputs, options: $runOptions);

Access model-level metadata to understand the model's origin, version, and custom properties:
$metadata = $session->metadata();
echo $metadata->getProducerName(); // e.g., 'pytorch'
echo $metadata->getGraphName(); // e.g., 'torch-jit-export'
echo $metadata->getDomain(); // e.g., '' or 'com.example'
echo $metadata->getDescription(); // Model description
echo $metadata->getGraphDescription(); // Graph-level description
echo $metadata->getVersion(); // Version number (int)
// Custom metadata key-value pairs
$custom = $metadata->getCustomMetadataMap();
foreach ($custom as $key => $value) {
echo "$key: $value\n";
}

Model Metadata Properties:

| Property | Type | Description |
|---|---|---|
| producerName | string | Framework/tool that created the model |
| graphName | string | Name of the computation graph |
| domain | string | Model domain (namespace) |
| description | string | Human-readable model description |
| graphDescription | string | Graph-level description |
| version | int | Model version number |
| customMetadataMap | array | Key-value pairs of custom metadata |
Query input and output node information to understand data requirements:
use PhpMlKit\ONNXRuntime\Metadata\TensorMetadata;
use PhpMlKit\ONNXRuntime\Metadata\SequenceMetadata;
use PhpMlKit\ONNXRuntime\Metadata\MapMetadata;
$inputs = $session->inputs();
foreach ($inputs as $name => $metadata) {
echo "Input: $name\n";
echo " Type: " . $metadata->getType()->name . "\n";
if ($metadata instanceof TensorMetadata) {
echo " Shape: " . json_encode($metadata->getShape()) . "\n";
echo " Data Type: " . $metadata->getDataType()->name . "\n";
echo " Symbolic Shape: " . json_encode($metadata->getSymbolicShape()) . "\n";
}
}
$outputs = $session->outputs();
// Or get just names
$inputNames = $session->inputNames();
$outputNames = $session->outputNames();

Node Metadata Types:

| Type | Class | Key Properties |
|---|---|---|
| Tensor | TensorMetadata | dataType, shape, symbolicShape |
| Sequence | SequenceMetadata | elementMetadata |
| Map | MapMetadata | keyType, valueMetadata |
Tensors are the primary data structure in machine learning. This library supports:
- Numeric tensors: FLOAT, DOUBLE, INT8/16/32/64, UINT8/16/32/64
- String tensors: Variable-length strings
- Boolean tensors: true/false values
- Multi-dimensional: 1D, 2D, 3D, and higher dimensions
// Shape is automatically inferred from nested arrays
$tensor = OrtValue::fromArray([[1, 2, 3], [4, 5, 6]], DataType::INT32);
print_r($tensor->shape()); // [2, 3]

// Or explicitly specified
$tensor = OrtValue::fromArray([1, 2, 3, 4, 5, 6], DataType::INT32, [2, 3]);

Some models accept dynamic shapes (indicated by -1 in the shape):
// Model accepts variable-length input
$meta = $session->inputs()['input'];
print_r($meta->getShape()); // Might be [-1] or [-1, 3, 224, 224]
// You can provide any size
$input = OrtValue::fromArray([1, 2, 3], DataType::FLOAT); // Works
$input = OrtValue::fromArray([1, 2, 3, 4, 5], DataType::FLOAT); // Also works

Sequences are ordered collections of values. Supported element types:
- STRING, INT64, FLOAT, DOUBLE (the officially documented element types)
- In practice, all tensor types work as sequence elements
// Create a sequence of tensors
$tensor1 = OrtValue::fromArray([1, 2], DataType::INT32);
$tensor2 = OrtValue::fromArray([3, 4], DataType::INT32);
$tensor3 = OrtValue::fromArray([5, 6], DataType::INT32);
$sequence = OrtValue::sequence([$tensor1, $tensor2, $tensor3]);
// Get sequence length
$length = $sequence->sequenceLength(); // 3
// Access elements
$first = $sequence->getSequenceElement(0);
print_r($first->toArray()); // [1, 2]
// Iterate over all elements
$sequence->foreachSequenceElement(function($value, $index) {
echo "Element $index: " . json_encode($value->toArray()) . "\n";
});
// Convert to PHP array
$result = $sequence->toArray(); // [[1, 2], [3, 4], [5, 6]]

Maps are key-value pairs with specific type constraints.
Key Types: INT64, STRING
Value Types: INT64, FLOAT, DOUBLE, STRING
use PhpMlKit\ONNXRuntime\OrtValue;
use PhpMlKit\ONNXRuntime\Enums\DataType;
// INT64 keys with FLOAT values
$keys = OrtValue::fromArray([1, 2, 3], DataType::INT64);
$values = OrtValue::fromArray([10.0, 20.0, 30.0], DataType::FLOAT);
$map = OrtValue::map($keys, $values);
$result = $map->toArray(); // [1 => 10.0, 2 => 20.0, 3 => 30.0]
// STRING keys with STRING values
$keys = OrtValue::fromArray(['a', 'b'], DataType::STRING);
$values = OrtValue::fromArray(['x', 'y'], DataType::STRING);
$map = OrtValue::map($keys, $values);
$result = $map->toArray(); // ['a' => 'x', 'b' => 'y']
// Access keys and values separately
$mapKeys = $map->mapKeys();
$mapValues = $map->mapValues();

Important
Other type combinations will throw a FailException.
ONNX Runtime also supports sequences of maps (specifically for FLOAT values):
// Create maps
$keys1 = OrtValue::fromArray([1, 2], DataType::INT64);
$values1 = OrtValue::fromArray([10.0, 20.0], DataType::FLOAT);
$map1 = OrtValue::map($keys1, $values1);
$keys2 = OrtValue::fromArray([3, 4], DataType::INT64);
$values2 = OrtValue::fromArray([30.0, 40.0], DataType::FLOAT);
$map2 = OrtValue::map($keys2, $values2);
// Create sequence of maps
$sequence = OrtValue::sequence([$map1, $map2]);
$result = $sequence->toArray(); // [[1 => 10.0, 2 => 20.0], [3 => 30.0, 4 => 40.0]]

Note
Sequences of maps only work with INT64/STRING keys and FLOAT values. Other combinations will fail.
All ONNX tensor element types are supported:
| Type | PHP Equivalent | Notes |
|---|---|---|
| FLOAT | float | 32-bit floating point |
| DOUBLE | float | 64-bit floating point |
| INT8 | int | 8-bit signed integer |
| INT16 | int | 16-bit signed integer |
| INT32 | int | 32-bit signed integer |
| INT64 | int | 64-bit signed integer |
| UINT8 | int | 8-bit unsigned integer |
| UINT16 | int | 16-bit unsigned integer |
| UINT32 | int | 32-bit unsigned integer |
| UINT64 | int | 64-bit unsigned integer |
| BOOL | bool | Boolean values |
| STRING | string | Variable-length strings |
Note
FLOAT16, BFLOAT16, COMPLEX64, and COMPLEX128 are defined but may have limited support.
All major resources in this library implement the Disposable interface, providing automatic cleanup when objects go out of scope while still allowing explicit cleanup when you need to free resources early.
When a disposable resource goes out of scope or is no longer referenced, its destructor automatically releases the underlying native resources:
function processModel() {
$session = InferenceSession::fromFile('model.onnx');
$input = OrtValue::fromArray([1, 2, 3], DataType::FLOAT);
$outputs = $session->run(['input' => $input]);
return $outputs['result']->toArray();
// $session, $input, $outputs all cleaned up automatically
}

This RAII-style pattern means you rarely need to think about cleanup: resources are managed naturally through PHP's object lifecycle.
When you need deterministic resource management or want to free memory before a variable goes out of scope, call dispose():
// Sessions
$session = InferenceSession::fromFile('model.onnx');
// ... use session ...
$session->dispose(); // Release session resources immediately
// OrtValues
$tensor = OrtValue::fromArray([1, 2, 3], DataType::FLOAT);
// ... use tensor ...
$tensor->dispose(); // Release tensor resources immediately
// Safe to call multiple times
$tensor->dispose(); // No error, already disposed

This is useful for:
- Long-running scripts where you want to release memory as soon as possible
- Processing large batches of data iteratively
- Ensuring resources are freed at specific points in your code
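The batch-processing case above can be sketched as follows. This is illustrative only: `loadBatch()` is a hypothetical helper standing in for whatever data source you use, and the input/output names assume the Quick Start model.

```php
<?php
// Illustrative batch loop: dispose per-iteration so native memory stays flat
// instead of accumulating until garbage collection.
$results = [];
foreach (loadBatch() as $row) { // loadBatch() is hypothetical
    $input = OrtValue::fromArray($row, DataType::FLOAT);
    $outputs = $session->run(['input' => $input]);
    $results[] = $outputs['output']->toArray();

    // Free the native buffers immediately
    $input->dispose();
    foreach ($outputs as $value) {
        $value->dispose();
    }
}
```

Disposing each tensor as soon as its results are copied out keeps peak memory proportional to one batch item rather than the whole run.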
When you create an OrtValue from a PHP array, the library:
- Allocates an FFI buffer
- Copies data from PHP array to FFI buffer
- Creates ONNX tensor referencing the buffer
- Keeps buffer reference to prevent garbage collection
- Automatically releases both on disposal
$tensor = OrtValue::fromArray([1, 2, 3], DataType::FLOAT);
// Buffer created internally, managed automatically
// Just let $tensor go out of scope or call dispose()

When you create an OrtValue from an existing FFI buffer (zero-copy):
- ONNX tensor references your existing buffer
- You are responsible for ensuring buffer outlives the tensor
- You must free the buffer if needed
- dispose() only releases the tensor handle, not your buffer
$ffi = FFI::cdef();
$buffer = $ffi->new('float[100]'); // Your buffer
$tensor = OrtValue::fromBuffer($buffer, 400, DataType::FLOAT, [100]);
// ... use tensor ...
$tensor->dispose(); // Releases tensor only
// You must manage the $buffer lifetime yourself (here, PHP frees it when $buffer goes out of scope)

Use Case: External buffers are useful for:
- Working with C libraries
- Pre-allocated memory pools
Sessions share a global ONNX environment with reference counting:
$session1 = InferenceSession::fromFile('model1.onnx');
$session2 = InferenceSession::fromFile('model2.onnx');
// Both share the same environment
$session1->dispose(); // Environment kept alive by session2
$session2->dispose(); // Environment released (no more sessions)

This is handled automatically; you don't need to manage it.
Execution providers are the computational backends that ONNX Runtime uses to run your models. By default, the CPU execution provider is used, which works on all platforms. For better performance, you can use hardware-accelerated providers like CUDA (NVIDIA GPUs), CoreML (Apple Neural Engine), or TensorRT (optimized NVIDIA inference).
| Provider | Description | Runtime Required | Platforms |
|---|---|---|---|
| CPUExecutionProvider | Default CPU backend | cpu (included by default) |
All platforms |
| CUDAExecutionProvider | NVIDIA GPU acceleration | cuda12 or cuda13 |
Linux x86_64, Windows x64 |
| CoreMLExecutionProvider | Apple Neural Engine/GPU | cpu (included on macOS) |
macOS ARM64 |
| TensorRTExecutionProvider | Optimized NVIDIA inference | cuda12 or cuda13 |
Linux x86_64, Windows x64 |
The CPU provider is the default and works out of the box on all platforms. It uses optimized CPU instructions (AVX, AVX2, AVX-512) when available.
use PhpMlKit\ONNXRuntime\InferenceSession;
use PhpMlKit\ONNXRuntime\SessionOptions;
// CPU is the default - no special configuration needed
$session = InferenceSession::fromFile('model.onnx');
// Or explicitly configure for CPU
$options = SessionOptions::default();
$session = InferenceSession::fromFile('model.onnx', $options);

The CoreML provider is automatically included in the CPU runtime on macOS ARM64. It accelerates inference using the Apple Neural Engine (ANE) and GPU.
use PhpMlKit\ONNXRuntime\InferenceSession;
use PhpMlKit\ONNXRuntime\SessionOptions;
use PhpMlKit\ONNXRuntime\Providers\CoreMLProviderOptions;
// Use CoreML with default settings
$options = SessionOptions::default()
->withCoreMLProvider();
$session = InferenceSession::fromFile('model.onnx', $options);

CoreML Configuration Options:
use PhpMlKit\ONNXRuntime\Enums\CoreMLComputeUnits;
use PhpMlKit\ONNXRuntime\Enums\CoreMLModelFormat;
// Configure CoreML for specific hardware
$options = SessionOptions::default()
->withCoreMLProvider(
CoreMLProviderOptions::default()
->withComputeUnits(CoreMLComputeUnits::ALL) // Use ANE + GPU + CPU
->withModelFormat(CoreMLModelFormat::ML_PROGRAM)
->withStaticShapes(true) // Optimize for fixed-size inputs
);
$session = InferenceSession::fromFile('model.onnx', $options);

Compute Units:
- ALL - Use all available compute units (ANE, GPU, CPU)
- CPU_AND_NEURAL_ENGINE - CPU and Apple Neural Engine only
- CPU_AND_GPU - CPU and GPU only
- CPU_ONLY - CPU only
Note
CoreML works best with models that use standard operations. Some complex operations may fall back to CPU execution.
CUDA provider requires installing the CUDA runtime variant. First, switch your runtime:
# Update composer.json to use CUDA 12 or CUDA 13
composer reinstall phpmlkit/onnxruntime

Then configure your session:
use PhpMlKit\ONNXRuntime\InferenceSession;
use PhpMlKit\ONNXRuntime\SessionOptions;
use PhpMlKit\ONNXRuntime\Providers\CudaProviderOptions;
use PhpMlKit\ONNXRuntime\Enums\ArenaExtendStrategy;
use PhpMlKit\ONNXRuntime\Enums\CudnnConvAlgoSearch;
// Use CUDA with default settings (device 0)
$options = SessionOptions::default()
->withCudaProvider();
$session = InferenceSession::fromFile('model.onnx', $options);
// Configure CUDA with specific options
$options = SessionOptions::default()
->withCudaProvider(
CudaProviderOptions::default()
->withDeviceId(0) // GPU device ID
->withMemoryLimit(2147483648) // 2GB memory limit
->withArenaExtendStrategy(ArenaExtendStrategy::NEXT_POWER_OF_TWO)
->withCudnnConvAlgoSearch(CudnnConvAlgoSearch::HEURISTIC)
);
$session = InferenceSession::fromFile('model.onnx', $options);

CUDA Presets:
// High performance preset (may use more memory)
$options = SessionOptions::default()
->withCudaProvider(CudaProviderOptions::highPerformance());
// Memory-conservative preset (slower but uses less GPU memory)
$options = SessionOptions::default()
    ->withCudaProvider(CudaProviderOptions::memoryConservative());

TensorRT provides highly optimized inference for NVIDIA GPUs by compiling models specifically for your hardware. It requires the CUDA runtime and builds on top of CUDA.
use PhpMlKit\ONNXRuntime\InferenceSession;
use PhpMlKit\ONNXRuntime\SessionOptions;
use PhpMlKit\ONNXRuntime\Providers\TensorRTProviderOptions;
// Use TensorRT with default settings
$options = SessionOptions::default()
->withTensorRTProvider();
$session = InferenceSession::fromFile('model.onnx', $options);

TensorRT Configuration:
// Configure TensorRT with caching for faster subsequent loads
$options = SessionOptions::default()
->withTensorRTProvider(
TensorRTProviderOptions::default()
->withCachePath('/path/to/trt_cache') // Cache compiled engines
->withMaxWorkspaceSize(2147483648) // 2GB workspace
->withFp16(true) // Enable FP16 precision
->withInt8(true) // Enable INT8 precision
->withMaxPartitionIterations(1000)
->withMinSubgraphSize(1)
);
$session = InferenceSession::fromFile('model.onnx', $options);

TensorRT Presets:
// Maximum performance (FP16/INT8, aggressive optimization)
$options = SessionOptions::default()
->withTensorRTProvider(
TensorRTProviderOptions::maximumPerformance()
);
// With caching enabled for production
$options = SessionOptions::default()
->withTensorRTProvider(
TensorRTProviderOptions::withCache('/app/cache/tensorrt')
    );

Important
TensorRT compiles models specifically for your GPU architecture. The first load of a model may take several minutes as TensorRT builds the optimized engine. Use caching to save compiled engines for faster subsequent loads.
To use CUDA or TensorRT providers, you need the corresponding runtime:
1. Update your composer.json:
{
"extra": {
"platform-packages": {
"phpmlkit/onnxruntime": {
"runtime": "cuda12"
}
}
}
}

Available runtimes: cpu, cuda12, cuda13
2. Reinstall the package:
composer reinstall phpmlkit/onnxruntime

Note
The CUDA runtime includes both CUDA and TensorRT providers. CoreML is included in the CPU runtime on macOS.
You can check which execution providers are available at runtime:
use PhpMlKit\ONNXRuntime\FFI\Lib;
$api = Lib::api();
$providers = $api->getAvailableProviders();
print_r($providers);
// Output: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']

If a configured provider fails to initialize (e.g., CUDA is not available), ONNX Runtime automatically falls back to the CPU provider. You can check which provider is actually being used by profiling or by checking the available providers list.
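One hedged way to avoid a silent CPU fallback is to check availability before configuring the session. A sketch, combining the APIs shown in this section; the provider name strings are those reported by ONNX Runtime:

```php
<?php
use PhpMlKit\ONNXRuntime\FFI\Lib;
use PhpMlKit\ONNXRuntime\SessionOptions;
use PhpMlKit\ONNXRuntime\InferenceSession;

$options = SessionOptions::default();

// Only request CUDA when the runtime actually reports it as available
if (in_array('CUDAExecutionProvider', Lib::api()->getAvailableProviders(), true)) {
    $options = $options->withCudaProvider();
}

$session = InferenceSession::fromFile('model.onnx', $options);
```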
The library provides specific exceptions for different error conditions:
use PhpMlKit\ONNXRuntime\Exceptions\NoSuchFileException;
use PhpMlKit\ONNXRuntime\Exceptions\InvalidProtobufException;
use PhpMlKit\ONNXRuntime\Exceptions\InvalidArgumentException;
use PhpMlKit\ONNXRuntime\Exceptions\FailException;
try {
$session = InferenceSession::fromFile('model.onnx');
$outputs = $session->run(['input' => $data]);
} catch (NoSuchFileException $e) {
// Model file doesn't exist
echo "Model not found: " . $e->getMessage();
} catch (InvalidProtobufException $e) {
// File exists but isn't a valid ONNX model
echo "Invalid model format: " . $e->getMessage();
} catch (InvalidArgumentException $e) {
// Wrong input name, shape mismatch, etc.
echo "Invalid input: " . $e->getMessage();
} catch (FailException $e) {
// General ONNX Runtime error
echo "ONNX error: " . $e->getMessage();
}

All of these live under PhpMlKit\ONNXRuntime\Exceptions except the abstract base, which is PhpMlKit\ONNXRuntime\Exception.
| Exception | Cause | Solution |
|---|---|---|
| Exception (abstract base) | Parent of most ONNX-specific errors | Catch this type to handle them together |
| FailException | Generic ONNX Runtime failure | Read the message; inspect model and inputs |
| InvalidArgumentException | Bad arguments (validation or ORT) | Check inputs, names, shapes, options |
| NoSuchFileException | Model path not found | Fix the file path |
| InvalidProtobufException | Invalid or corrupt ONNX model bytes | Re-export or verify the .onnx file |
| NoModelException | Operation needs a loaded model | Ensure the session is created correctly |
| EngineErrorException | Inference engine error | Read the message |
| RuntimeException | ONNX Runtime RUNTIME_EXCEPTION (extends PHP \RuntimeException) | Read the message |
| ModelLoadedException | Conflicts with an already-loaded model | Avoid double load / wrong API sequence |
| NotImplementedException | Feature not implemented in this ORT build or in the package | Use a supported model or API |
| InvalidGraphException | Invalid model graph | Fix or replace the model |
| ExecutionProviderException | Execution provider failed | Check provider config, drivers, GPU |
| InvalidOperationException | Wrong use of API (e.g. disposed session, wrong OrtValue kind) | Fix call order and resource lifetime |
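When fine-grained handling isn't needed, the abstract base class can catch every ONNX-specific exception at once. A sketch; the alias OnnxException is just for readability and is not part of the library:

```php
<?php
use PhpMlKit\ONNXRuntime\Exception as OnnxException; // abstract base class

try {
    $outputs = $session->run(['input' => $input]);
} catch (OnnxException $e) {
    // One handler for all ONNX-specific errors
    error_log('ONNX Runtime error: ' . $e->getMessage());
}
```

Note that RuntimeException extends PHP's \RuntimeException, so a generic \RuntimeException handler will also catch it.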
For zero-copy operations with other FFI libraries:
$ffi = FFI::cdef();
// Create buffer
$bufferSize = 100 * 4; // 100 floats × 4 bytes
$buffer = $ffi->new("uint8_t[{$bufferSize}]");
// Fill with data (from another library, file, etc.)
// ... fill $buffer ...
// Create tensor from buffer (zero-copy)
$tensor = OrtValue::fromBuffer(
$buffer,
$bufferSize,
DataType::FLOAT,
[100]
);
// Use tensor
$outputs = $session->run(['input' => $tensor]);
// Clean up
$tensor->dispose(); // Releases tensor only
// You must free $buffer separately if needed

When the phpmlkit/ndarray package is installed, you can convert between NDArray and OrtValue explicitly:
use PhpMlKit\NDArray\NDArray;
use PhpMlKit\NDArray\DType;
use PhpMlKit\ONNXRuntime\OrtValue;
// Create NDArray input
$input = NDArray::array([[1.0, 2.0], [3.0, 4.0]], DType::Float32);
// Run inference (InferenceSession accepts OrtValue inputs)
$outputs = $session->run(['input' => OrtValue::fromNDArray($input)]);
// Convert tensor output to NDArray when needed
$output = $outputs['output'];
echo $output->toNDArray();
// array(2, 2)
// [1. 2.]
// [3. 4.]

This provides seamless integration with the NDArray ecosystem for numerical computing in PHP.
Enable profiling to analyze model performance:
use PhpMlKit\ONNXRuntime\SessionOptions;
$options = SessionOptions::default()
->withProfiling(true, 'my_model_profile');
$session = InferenceSession::fromFile('model.onnx', $options);
// Run inference multiple times
for ($i = 0; $i < 100; $i++) {
$session->run($inputs);
}
$session->dispose(); // Profile saved to my_model_profile_*.json

Tip
For detailed information about execution providers (CPU, CUDA, CoreML, TensorRT), see the Execution Providers section.
For advanced use cases, you can access the underlying FFI layer directly:
use PhpMlKit\ONNXRuntime\FFI\Lib;
use PhpMlKit\ONNXRuntime\FFI\Api;
use PhpMlKit\ONNXRuntime\Enums\AllocatorType;
use PhpMlKit\ONNXRuntime\Enums\MemoryType;
// Get FFI instance
$ffi = Lib::get();
// Get typed API wrapper
$api = Lib::api();
// Access low-level C API functions
$memoryInfo = $api->createCpuMemoryInfo(
AllocatorType::ARENA_ALLOCATOR,
MemoryType::DEFAULT
);
// Don't forget to release resources
$api->releaseMemoryInfo($memoryInfo);

Warning: Direct FFI access requires knowledge of the ONNX Runtime C API. Use with caution, as improper resource management can cause memory leaks or crashes.
The C API header is located at vendor/phpmlkit/onnxruntime/include/onnxruntime.h. You can reference it for available functions and types.
Contributions are welcome! Please feel free to submit a Pull Request.
# Clone repository
git clone https://github.com/phpmlkit/onnxruntime.git
cd onnxruntime
# Install dependencies
composer install
# Generate test models (requires Python)
pip install onnx numpy
python scripts/generate_test_models.py
# Run tests
composer test
# Run tests (pretty)
composer test:pretty
# Check code style
composer cs:check
# Fix code style
composer cs:fix
# Run static analysis
composer lint

This project follows PSR-12 coding standards. Please run the linter before submitting:
composer cs:fix

This project is licensed under the MIT License; see the LICENSE file for details.
- Microsoft ONNX Runtime - The underlying inference engine
- onnxruntime-php - The original PHP library that inspired this reimagined version
- PHP FFI - Foreign Function Interface
- Codewithkyrian Platform Package Installer - Automatic library distribution
- Issues: https://github.com/phpmlkit/onnxruntime/issues
- Documentation: This README and inline PHPDoc
- Examples: See the examples/ directory
Happy inferencing! 🚀