Local AI inference in PHP — run ONNX models in-process via FrankenPHP + ONNX Runtime.
Models are loaded once at startup and shared across all requests. Inference is a function call, not a network request — sub-10ms for text, ~15ms for images. No APIs, no Python, no microservices.
Note: This is a companion repo for my FrankenPHP conference talks. It's meant as inspiration and a reference implementation, not a production framework. Feel free to explore, fork, and adapt the patterns for your own projects.
No talks yet — but here's the abstract if you want to present this:
Running AI Models Natively in PHP
What if PHP could run AI models directly — no APIs, no Python, no microservices? With ONNX Runtime and FrankenPHP, this is now possible. Sentiment analysis in 5 ms. Text embeddings in 3 ms. Object detection in 15 ms. Text-to-speech in 200 ms. All running in-process, using pre-trained models from HuggingFace, with a two-line PHP API. No network latency, no API keys, no external dependencies at runtime.
This talk shows how to bridge PHP to ONNX Runtime via a Go inference layer, why in-process inference changes the economics of AI features, and how any PHP developer can add ML capabilities to their application.
See talk.md for the full narrative, demo walkthrough, and architecture breakdown.
Four demo models, all open-source from HuggingFace:
| Model | Task | Input | Output | Time |
|---|---|---|---|---|
| DistilBERT SST-2 | Sentiment analysis | Text | Positive/Negative + score | ~5 ms |
| all-MiniLM-L6-v2 | Text embeddings | Text | 384-dim float vector | ~3 ms |
| YOLOv8n | Object detection | JPEG/PNG bytes | Bounding boxes + labels | ~15 ms |
| Piper lessac | Text to speech | Text | WAV audio (16-bit PCM) | ~200 ms |
Models run on CPU — no GPU required. Files are downloaded once to `~/.frankenonnx/models/` (local) or baked into the Docker image.
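The usage examples later in this README cover sentiment, embeddings, and detection; a TTS call would follow the same two-line pattern. A sketch only — the registry name `'tts'` and the raw-WAV return value are assumptions, not confirmed by this README:

```php
<?php

use FrankenPHP\ONNX;

// 'tts' is a hypothetical registry name for the Piper lessac model.
$model = ONNX::load('tts');

// Per the table above, the output is WAV audio (16-bit PCM).
$wav = $model->run('Hello from FrankenPHP!');
file_put_contents('hello.wav', $wav);
```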
FrankenONNX combines three key pieces:
- FrankenPHP — embeds PHP in a Go process, giving us a Go server that serves PHP pages
- ONNX Runtime — Microsoft's high-performance inference engine for `.onnx` models, with hardware-specific optimizations (SIMD, CoreML, CUDA)
- hugot — Go library for HuggingFace transformer pipelines (tokenization + inference)
Because FrankenPHP runs PHP inside Go, and Go has mature ONNX Runtime bindings, the Go process becomes the bridge — a thin C extension connects PHP to Go, and Go connects to ONNX Runtime.
PHP Go Host ONNX Runtime
┌───────────────────────┐ ┌──────────────────────────────┐ ┌───────────────────────┐
│ │ │ │ │ │
│ $m = ONNX::load(..) │ │ ┌─────────┐ ┌───────────┐ │ │ ┌─────────────────┐ │
│ $m->run($input) ├───►│ │ C ext ├─►│ Go onnx/ │ │ │ │ sentiment.onnx │ │
│ │ │ │ (CGo) │ │ registry ├─┼───►│ │ embedding.onnx │ │
│ // => PHP array │◄───┤ │ │◄─┤ │ │ │ │ yolov8n.onnx │ │
│ │ │ └─────────┘ └─────┬─────┘ │◄───┤ │ │ │
│ │ │ ┌─────┴─────┐ │ │ └─────────────────┘ │
│ │ │ │ hugot │ │ │ │
│ │ │ │ pipelines │ │ │ libonnxruntime.dylib │
│ │ │ └───────────┘ │ │ (C++ engine) │
└───────────────────────┘ └──────────────────────────────┘ └───────────────────────┘
FrankenPHP Model Registry ONNX Runtime
(PHP embedded in Go) (lazy-loading singletons) (inference engine)
- Models are `.onnx` files downloaded from HuggingFace with their tokenizer configs
- The registry (`onnx/`) lazy-loads models on first use and keeps them for the process lifetime
- PHP code calls model inference through the `FrankenPHP\ONNX` class
- Results are JSON-encoded in Go and decoded to PHP arrays by the C extension
Models are process-lifetime singletons — no per-request context needed. FrankenWASM needs per-request WASM plugin instances (sandbox isolation). FrankenAsync needs per-request task managers. FrankenONNX needs neither — models are global read-only resources. The CGo bridge is simpler: no thread index, no request context, just load(name) and run(name, input).
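In PHP terms, the singleton behavior means repeated `load()` calls are cheap — a sketch assuming the two-method API shown later in this README:

```php
<?php

use FrankenPHP\ONNX;

// First call loads sentiment.onnx into the Go registry
// (slow, happens once per process).
$model = ONNX::load('sentiment');

// Any later call — in this request or any other handled by the same
// process — reuses the already-loaded model; only inference time is paid.
$same = ONNX::load('sentiment');
$result = $same->run('Warm start!');
```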
FrankenONNX requires a fork of FrankenPHP that adds frankenphp.RegisterExtension() for registering C Zend extensions from Go — the same fork used by FrankenWASM and FrankenAsync.
Unlike those projects, FrankenONNX does not need frankenphp.Thread() or frankenphp_thread_index() since models are global singletons, not per-request state.
The fork is referenced via a replace directive in go.mod:
```
replace github.com/dunglas/frankenphp v1.11.2 => ../frankenphp
```
```shell
docker build -t frankenonnx .
docker run -p 8082:8082 frankenonnx
```

The multi-stage Dockerfile handles everything — PHP build, ONNX Runtime + libtokenizers download, model download from HuggingFace (~415 MB), and the host binary. Models are baked into the image. Open http://localhost:8082 to see the demos.
The PHP build stage uses static-php-cli which can download pre-built libraries from GitHub instead of compiling from source. This requires a GitHub token to avoid API rate limits:
```shell
GITHUB_TOKEN=$(gh auth token) docker build \
  --secret id=github_token,env=GITHUB_TOKEN \
  -t frankenonnx .
```

Without the token the build still works — it just compiles all libraries from source, which takes longer.
- macOS Apple Silicon (ARM64)
- Go 1.24+
- The FrankenPHP fork cloned as a sibling directory (`../frankenphp`)
```shell
make php     # Build PHP 8.3 (ZTS, embed) via static-php-cli (one-time)
make env     # Generate env.yaml with CGO flags from the PHP build
make ort     # Download ONNX Runtime + libtokenizers to dist/
make models  # Download model files to ~/.frankenonnx/models/
make run     # Build the host binary + start the server on :8082
```

The PHP build is cached in `build/.php/` — subsequent runs skip the build if `libphp.a` exists. To rebuild PHP from scratch:

```shell
make php-clean  # Remove cached downloads and build artifacts
make php        # Rebuild
make env        # Regenerate env.yaml
```

`make ort` downloads pre-built binaries from GitHub releases (~30 MB total):
- `libonnxruntime.dylib` — ONNX Runtime shared library
- `libtokenizers.a` — Rust-based tokenizer (used by hugot)
`make models` downloads model files from HuggingFace (~350 MB total):

- `sentiment/` — DistilBERT SST-2 (model + tokenizer)
- `embedding/` — all-MiniLM-L6-v2 (model + tokenizer)
If you prefer to use your own PHP build, create an env.yaml manually:
```yaml
HOME: "/Users/you"
GOPATH: "/Users/you/go"
GOFLAGS: "-tags=nowatcher,ORT"
CGO_ENABLED: "1"
CGO_CFLAGS: "-I/path/to/php/include ..."
CGO_CPPFLAGS: "-I/path/to/php/include ..."
CGO_LDFLAGS: "-L/path/to/php/lib -lphp -L/path/to/frankenonnx/dist -lonnxruntime ..."
```

The CGO flags must point to your PHP build's include headers and libraries. PHP must be built with ZTS (`--enable-zts`) and embed (`--enable-embed`). The `-tags ORT` build tag is required by hugot's ONNX Runtime backend.
Install the EnvFile plugin, then in your Run Configuration enable EnvFile and add env.yaml to load the CGO flags automatically.
| Variable | Default | Description |
|---|---|---|
| `FRANKENONNX_DOC_ROOT` | `demo` | PHP document root directory |
| `FRANKENONNX_ORT_LIB` | system default | Path to `libonnxruntime.dylib` |
| `FRANKENONNX_MODELS_DIR` | `~/.frankenonnx/models` | Directory containing model files |
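For example, to serve a different document root with an explicitly chosen ONNX Runtime library — a sketch assuming you run the built binary from the repo root (the `public` directory is illustrative):

```shell
FRANKENONNX_DOC_ROOT=public \
FRANKENONNX_ORT_LIB=./dist/libonnxruntime.dylib \
./dist/frankenonnx
```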
Basic usage:

```php
use FrankenPHP\ONNX;

// Load a model (lazy-loads on first call, singleton for process lifetime)
$model = ONNX::load('sentiment');

// Run inference
$result = $model->run($input);
```

Sentiment analysis:

```php
$model = ONNX::load('sentiment');

$result = $model->run('FrankenPHP is amazing!');
// => [['label' => 'POSITIVE', 'score' => 0.9999]]

$result = $model->run('This is terrible.');
// => [['label' => 'NEGATIVE', 'score' => 0.9995]]
```

Text embeddings:

```php
$model = ONNX::load('embedding');

$vec = $model->run('Hello world');
// => [0.0623, -0.0418, 0.1201, ...] (384 floats)

// Cosine similarity between two texts
$vecA = $model->run('I love programming');
$vecB = $model->run('I enjoy coding');
// cosine_similarity($vecA, $vecB) => 0.89
```

Object detection:

```php
$model = ONNX::load('yolov8n');

$result = $model->run(file_get_contents('photo.jpg'));
// => [['label' => 'person', 'confidence' => 0.95,
//      'x' => 0.1, 'y' => 0.2, 'w' => 0.3, 'h' => 0.4], ...]
```

Error handling:

- `ONNX::load()` throws `\RuntimeException` if the model name is unknown or loading fails
- `->run()` throws `\RuntimeException` on inference failure
- No `false`/`null` returns — always throws on failure
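The `cosine_similarity()` helper referenced in the embedding example is not part of the extension API; a minimal PHP sketch for the 384-float vectors the embedding model returns:

```php
<?php

// Hypothetical helper — not provided by FrankenONNX.
// Computes the cosine similarity of two equal-length float vectors.
function cosine_similarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot   += $x * $b[$i];
        $normA += $x * $x;
        $normB += $b[$i] * $b[$i];
    }

    // 1.0 means identical direction, 0.0 means orthogonal.
    return $dot / (sqrt($normA) * sqrt($normB));
}
```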
frankenonnx/
├── main.go # HTTP server + FrankenPHP init
├── Makefile # Build orchestration
├── go.mod
├── env.yaml # Generated CGO flags (via make env)
├── phpext/
│ ├── phpext.c # Zend extension — module lifecycle
│ ├── phpext.h # Module declarations
│ ├── phpext.go # CGo exports (registered via init())
│ ├── phpext_cgo.h # CGo header binding
│ ├── onnxmodel.c # FrankenPHP\ONNX class — method implementations
│ └── onnxmodel.h # Class declarations + arginfo
├── onnx/
│ ├── registry.go # Model registry + lazy loading
│ ├── nlp.go # hugot pipelines (sentiment, embedding)
│ ├── yolo.go # YOLOv8n via onnxruntime_go
│ └── tts.go # Piper VITS text-to-speech
├── scripts/
│ └── download-models.sh # Download models from HuggingFace
├── dist/ # Built binary + native libs
│ ├── frankenonnx # The compiled binary
│ ├── libonnxruntime.dylib
│ └── libtokenizers.a
├── build/
│ └── php/
│ └── Makefile # PHP build via static-php-cli (ZTS + embed)
└── demo/
├── index.php # Landing page with card grid
├── style.php # Shared CSS (dark theme support)
├── _header.php # Shared header template
├── _footer.php # Shared footer template
├── sentiment/
│ └── index.php # Sentiment analysis demo
└── embedding/
└── index.php # Text embeddings + cosine similarity demo
FrankenONNX depends on two native libraries, downloaded to dist/ by make ort:
| Library | Version | Source | Purpose |
|---|---|---|---|
| `libonnxruntime.dylib` | 1.22.0 | microsoft/onnxruntime | ONNX model inference engine |
| `libtokenizers.a` | 1.26.0 | daulet/tokenizers | Rust-based HuggingFace tokenizer |
Code is MIT — see LICENSE.md. Talk material is licensed under CC BY 4.0 — free to share and adapt with attribution.
