Please see our website for the most up-to-date information.
Orla is a library for building and running LLM-based agentic systems. Modern agentic applications are workflows that combine multiple LLM calls, tool invocations, and heterogeneous infrastructure. Today, developers often stitch these pieces together manually using orchestration code, LLM serving engines, and tool execution logic.
Orla simplifies this process by separating workflow-level decisions from request execution. Developers define workflows as stages, while Orla handles how those stages are mapped to models and backends, scheduled and executed, and coordinated through shared inference state.
The system exposes three core components: a Stage Mapper for heterogeneous model routing, a Workflow Orchestrator for scheduling and executing stages, and a Memory Manager that handles the KV cache across workflow stages.
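To make the stage abstraction concrete, here is a minimal sketch of what a stage-based workflow definition could look like from the client side. The `pyorla` class and method names (`Stage`, `Workflow`, `submit`) are illustrative assumptions for this sketch, not the actual SDK interface.

```python
# Hypothetical sketch only: Stage, Workflow, and submit are illustrative
# assumptions, not the documented pyorla API.
import pyorla  # client SDK (assumed import name)

# Each stage names the model it wants; the Stage Mapper decides which
# backend actually serves it.
plan = pyorla.Stage(
    name="plan",
    model="llama-3-70b",
    prompt="Break the user request into steps: {request}",
)
act = pyorla.Stage(
    name="act",
    model="llama-3-8b",
    prompt="Execute step: {step}",
    tools=["search", "python"],
)

# The Workflow Orchestrator schedules the stages; the Memory Manager can
# reuse KV cache produced by `plan` when running `act`.
workflow = pyorla.Workflow(stages=[plan, act])
result = workflow.submit(request="Summarize today's arXiv papers on KV caching")
print(result)
```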
Orla is a project of Dr. Minlan Yu's lab at Harvard SEAS.
We welcome any and all open-source contributions to Orla. Orla is designed to be a community-focused project and runs on individual contributions from amazing people around the world. This document provides guidelines and instructions for contributing to the project.
Installing the orla daemon:

```sh
brew install --cask harvard-cns/orla/orla
```

Installing the orla client SDK:

```sh
pip install pyorla
```

Visit our website to learn more.
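As a quick check that the client SDK installed correctly, importing it should succeed. The `__version__` attribute below is an assumption (many Python packages expose one); a successful import alone is enough.

```python
# Smoke test for the client SDK installation.
# Assumption: pyorla may expose a __version__ attribute; if it does not,
# the import succeeding is sufficient confirmation.
import pyorla

print(getattr(pyorla, "__version__", "pyorla imported successfully"))
```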
If you use Orla in your research, we would greatly appreciate a citation to our demo paper:
```bibtex
@misc{shahout2026orlalibraryservingllmbased,
  title={Orla: A Library for Serving LLM-Based Multi-Agent Systems},
  author={Rana Shahout and Hayder Tirmazi and Minlan Yu and Michael Mitzenmacher},
  year={2026},
  eprint={2603.13605},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2603.13605},
}
```
- For technical questions and feature requests, please use GitHub Issues.
- For security disclosures, please see SECURITY.md.
