OpenArc Overview

Relevant source files

OpenArc is a high-performance inference engine specifically designed for Intel hardware. It enables serving Large Language Models (LLMs), Vision-Language Models (VLMs), Whisper (ASR), Kokoro (TTS), Qwen3 (ASR/TTS), Embeddings, and Rerankers via OpenAI-compatible API endpoints. Powered by the OpenVINO toolkit, OpenArc provides a local, private, and open-source solution for AI inference on CPUs, GPUs (integrated and discrete), and NPUs README.md12-14

System Architecture

OpenArc employs a three-tier architecture that decouples the HTTP interface from the underlying hardware-specific inference engines. This design allows for model concurrency, where multiple models can be loaded and queried simultaneously README.md51-52 The system is built using modern Python standards, requiring version 3.12 or higher pyproject.toml10

High-Level Component Flow

The following diagram illustrates how a request moves from the "Natural Language Space" (API) into the "Code Entity Space" (Internal Registries and Engines).

Request Orchestration Diagram

Sources: README.md40-47 docs/index.md27-33 pyproject.toml15-39

Key Subsystems

1. Server & API Layer

The server is built with FastAPI and provides a suite of OpenAI-compatible endpoints, including /v1/chat/completions, /v1/audio/transcriptions, /v1/audio/speech, and /v1/embeddings README.md41-47 It supports advanced features like streaming cancellation for LLMs/VLMs, speculative decoding for increased throughput, and tool calling with parallel support README.md35-49

For details, see Getting Started and OpenAI-Compatible API Endpoints.

2. Model & Worker Registries

The ModelRegistry handles the lifecycle of models, while the WorkerRegistry manages request routing by maintaining per-model asyncio.Queue instances. This ensures that inference requests for one model do not block others, supporting full model concurrency and automatic unloading on inference failure README.md51-53

For details, see Model Registry and Lifecycle and Worker Registry and Request Orchestration.

3. Inference Engines

OpenArc abstracts hardware complexity through specialized engine families:

OVGenAI: High-performance LLM, VLM, and Whisper pipelines utilizing openvino-genai README.md42-44 pyproject.toml24
Optimum: Feature Extraction (Embeddings) and Reranking powered by optimum-intel README.md46-47 pyproject.toml25
OpenVINO Native: Native implementations for Kokoro TTS and the Qwen3 suite, including ASR and specialized TTS modes like voice_clone, voice_design, and custom_voice README.md64-65 docs/models.md84-86
For details, see Supported Model Types and Engines and Inference Engines.

Hardware Targets

OpenArc is optimized for the Intel AI PC ecosystem and data center hardware:

CPU: Standard Intel processors.
GPU: Integrated Intel Graphics and discrete Intel GPUs. Supports Multi GPU Pipeline Parallel and CPU offload/Hybrid modes README.md37-38
NPU: Intel AI Boost (Neural Processing Unit) found in Core Ultra processors README.md39

Engine to Code Entity Mapping

Sources: README.md42-47 README.md64-65 docs/models.md20-96 pyproject.toml24-25

Getting Started & Support

New users should begin by installing the requirements and obtaining pre-converted OpenVINO IR models from sources like the HuggingFace OpenVINO collection or the Echo9Zulu repository docs/models.md5-15

Quickstart Guide: See Getting Started.
Model Reference: See Supported Model Types and Engines.
Benchmarking: Use the openarc bench CLI to measure metrics such as TTFT, TPOT, and throughput, with results stored in an automatic SQLite database README.md54-61

Sources:

README.md:12-65
docs/index.md:14-56
docs/models.md:5-96
pyproject.toml:1-41

OpenArc Overview

Relevant source files

System Architecture

High-Level Component Flow

The following diagram illustrates how a request moves from the "Natural Language Space" (API) into the "Code Entity Space" (Internal Registries and Engines).

Request Orchestration Diagram

Sources: README.md40-47 docs/index.md27-33 pyproject.toml15-39

Key Subsystems

1. Server & API Layer

For details, see Getting Started and OpenAI-Compatible API Endpoints.

2. Model & Worker Registries

For details, see Model Registry and Lifecycle and Worker Registry and Request Orchestration.

3. Inference Engines

OpenArc abstracts hardware complexity through specialized engine families:

OVGenAI: High-performance LLM, VLM, and Whisper pipelines utilizing openvino-genai README.md42-44 pyproject.toml24
Optimum: Feature Extraction (Embeddings) and Reranking powered by optimum-intel README.md46-47 pyproject.toml25
OpenVINO Native: Native implementations for Kokoro TTS and the Qwen3 suite, including ASR and specialized TTS modes like voice_clone, voice_design, and custom_voice README.md64-65 docs/models.md84-86
For details, see Supported Model Types and Engines and Inference Engines.

Hardware Targets

OpenArc is optimized for the Intel AI PC ecosystem and data center hardware:

CPU: Standard Intel processors.
GPU: Integrated Intel Graphics and discrete Intel GPUs. Supports Multi GPU Pipeline Parallel and CPU offload/Hybrid modes README.md37-38
NPU: Intel AI Boost (Neural Processing Unit) found in Core Ultra processors README.md39

Engine to Code Entity Mapping

Sources: README.md42-47 README.md64-65 docs/models.md20-96 pyproject.toml24-25

Getting Started & Support

Quickstart Guide: See Getting Started.
Model Reference: See Supported Model Types and Engines.
Benchmarking: Use the openarc bench CLI to measure metrics such as TTFT, TPOT, and throughput, with results stored in an automatic SQLite database README.md54-61

Sources:

README.md:12-65
docs/index.md:14-56
docs/models.md:5-96
pyproject.toml:1-41

OpenArc Overview

System Architecture

High-Level Component Flow

Key Subsystems

1. Server & API Layer

2. Model & Worker Registries

3. Inference Engines

Hardware Targets

Getting Started & Support

On this page

OpenArc Overview

System Architecture

High-Level Component Flow

Key Subsystems

1. Server & API Layer

2. Model & Worker Registries

3. Inference Engines

Hardware Targets

Getting Started & Support

On this page