OpenArc is a high-performance inference engine specifically designed for Intel hardware. It enables serving Large Language Models (LLMs), Vision-Language Models (VLMs), Whisper (ASR), Kokoro (TTS), Qwen3 (ASR/TTS), Embeddings, and Rerankers via OpenAI-compatible API endpoints. Powered by the OpenVINO toolkit, OpenArc provides a local, private, and open-source solution for AI inference on CPUs, GPUs (integrated and discrete), and NPUs (README.md:12-14).
OpenArc employs a three-tier architecture that decouples the HTTP interface from the underlying hardware-specific inference engines. This design allows for model concurrency, where multiple models can be loaded and queried simultaneously (README.md:51-52). The system is built using modern Python standards, requiring version 3.12 or higher (pyproject.toml:10).
The following diagram illustrates how a request moves from the "Natural Language Space" (API) into the "Code Entity Space" (Internal Registries and Engines).
Request Orchestration Diagram
Sources: README.md:40-47, docs/index.md:27-33, pyproject.toml:15-39
The server is built with FastAPI and provides a suite of OpenAI-compatible endpoints, including /v1/chat/completions, /v1/audio/transcriptions, /v1/audio/speech, and /v1/embeddings (README.md:41-47). It supports advanced features like streaming cancellation for LLMs/VLMs, speculative decoding for increased throughput, and tool calling with parallel support (README.md:35-49).
The ModelRegistry handles the lifecycle of models, while the WorkerRegistry manages request routing by maintaining per-model asyncio.Queue instances. This ensures that inference requests for one model do not block others, supporting full model concurrency and automatic unloading on inference failure (README.md:51-53).
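The per-model queue pattern described above can be sketched as follows. This is a minimal illustration of the idea, not OpenArc's implementation: `WorkerRegistrySketch`, its methods, and the two fake models are hypothetical names, and lifecycle concerns such as unloading on failure are left out.

```python
import asyncio


class WorkerRegistrySketch:
    """Hypothetical stand-in: one asyncio.Queue and one worker task per model."""

    def __init__(self) -> None:
        self._queues: dict[str, asyncio.Queue] = {}
        self._workers: dict[str, asyncio.Task] = {}

    def register(self, model_name, infer) -> None:
        # Must be called from inside a running event loop.
        queue: asyncio.Queue = asyncio.Queue()
        self._queues[model_name] = queue
        self._workers[model_name] = asyncio.create_task(self._worker(queue, infer))

    async def _worker(self, queue, infer) -> None:
        # Each worker drains only its own queue, so a slow model
        # never delays requests addressed to another model.
        while True:
            prompt, future = await queue.get()
            try:
                result = await infer(prompt)
                if not future.done():
                    future.set_result(result)
            except Exception as exc:
                if not future.done():
                    future.set_exception(exc)
            finally:
                queue.task_done()

    async def submit(self, model_name: str, prompt: str) -> str:
        future = asyncio.get_running_loop().create_future()
        await self._queues[model_name].put((prompt, future))
        return await future

    async def shutdown(self) -> None:
        for task in self._workers.values():
            task.cancel()
        await asyncio.gather(*self._workers.values(), return_exceptions=True)


async def demo() -> list[str]:
    registry = WorkerRegistrySketch()
    finished: list[str] = []

    async def slow_model(prompt: str) -> str:
        await asyncio.sleep(0.2)  # simulated long inference
        return f"slow:{prompt}"

    async def fast_model(prompt: str) -> str:
        await asyncio.sleep(0.01)  # simulated short inference
        return f"fast:{prompt}"

    registry.register("slow", slow_model)
    registry.register("fast", fast_model)

    async def call(name: str, prompt: str) -> None:
        finished.append(await registry.submit(name, prompt))

    # The slow request is submitted first, yet the fast model answers first.
    await asyncio.gather(call("slow", "a"), call("fast", "b"))
    await registry.shutdown()
    return finished
```

Running `asyncio.run(demo())` records the fast model's reply before the slow one's, even though the slow request was queued first.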
OpenArc abstracts hardware complexity through specialized engine families:
- openvino-genai (README.md:42-44, pyproject.toml:24)
- optimum-intel (README.md:46-47, pyproject.toml:25)
- voice_clone, voice_design, and custom_voice (README.md:64-65, docs/models.md:84-86)
OpenArc is optimized for the Intel AI PC ecosystem and data center hardware.
Engine to Code Entity Mapping
Sources: README.md:42-47, README.md:64-65, docs/models.md:20-96, pyproject.toml:24-25
New users should begin by installing the requirements and obtaining pre-converted OpenVINO IR models from sources like the HuggingFace OpenVINO collection or the Echo9Zulu repository (docs/models.md:5-15).
The openarc bench CLI measures metrics such as TTFT (time to first token), TPOT (time per output token), and throughput, with results stored automatically in a SQLite database (README.md:54-61).
Sources: README.md:12-65, docs/index.md:14-56, docs/models.md:5-96, pyproject.toml:1-41
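For readers unfamiliar with the benchmark metrics named above, the sketch below shows one common way to derive TTFT, TPOT, and throughput from token arrival timestamps. The function, dataclass, and exact formulas are assumptions for illustration; openarc bench may define or compute these differently (for example, throughput is sometimes measured over decode time only).

```python
from dataclasses import dataclass


@dataclass
class GenerationMetrics:
    ttft_s: float            # time to first token, in seconds
    tpot_s: float            # mean time per output token after the first, in seconds
    throughput_tok_s: float  # output tokens per second over the whole request


def compute_metrics(request_start: float, token_times: list[float]) -> GenerationMetrics:
    """Derive latency metrics from the request start time and per-token arrival times."""
    if not token_times:
        raise ValueError("at least one token timestamp is required")
    count = len(token_times)
    first, last = token_times[0], token_times[-1]
    ttft = first - request_start
    # TPOT averages the gaps between consecutive tokens; undefined for a single token.
    tpot = (last - first) / (count - 1) if count > 1 else 0.0
    total = last - request_start
    throughput = count / total if total > 0 else float("inf")
    return GenerationMetrics(ttft, tpot, throughput)
```

For example, five tokens arriving at 0.5 s, 0.75 s, 1.0 s, 1.25 s, and 1.5 s after a request sent at 0.0 s give a TTFT of 0.5 s, a TPOT of 0.25 s, and a throughput of about 3.33 tokens per second.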