This is an OpenAI-compatible API server for Anemll models. It provides a `/v1/chat/completions` endpoint that follows the OpenAI API format, as well as the `/v1/models` endpoint, which allows it to work with Open WebUI.
- OpenAI-compatible API
- Streaming responses
- System prompt, conversation history supported
- Works with Open WebUI (other frontends not tested, but should work as well)
A version of `chat_full.py` (from the `swift-inference` branch, which runs fastest for me) is included for convenience.
- Install the required dependencies, preferably in a conda or venv environment:

```shell
pip install -r requirements.txt
```

- You will also need to download an Anemll model. I have used this one from the official Anemll Hugging Face; 0.1.1 should also work fine.
Modify the `MODEL_DIR` variable in `server.py` to your Anemll model path:

```python
# Hardcoded model directory path
MODEL_DIR = "/example-path/anemll-Meta-Llama-3.2-1B-ctx2048_0.1.2"
```

Run the server with:

```shell
python server.py
```

The server will start on 0.0.0.0:8000 by default.
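Before launching, it can help to sanity-check the path you set. The sketch below only verifies that the directory exists and is non-empty; it does not validate the model files themselves, and the path shown is the placeholder from `server.py`:

```python
import os

# Placeholder path -- substitute the MODEL_DIR you set in server.py.
MODEL_DIR = "/example-path/anemll-Meta-Llama-3.2-1B-ctx2048_0.1.2"

def check_model_dir(path: str) -> bool:
    """True if the path is an existing, non-empty directory."""
    return os.path.isdir(path) and len(os.listdir(path)) > 0
```

If `check_model_dir(MODEL_DIR)` returns `False`, the server will not find the model.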
To connect Open WebUI, go to "Connections" in the settings and enter `http://0.0.0.0:8000/v1` as the base URL.
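Since the server speaks the OpenAI format, any OpenAI-style client can talk to it. Here is a minimal stdlib-only sketch of a non-streaming request, assuming the server is running on the default host and port (the helper names are mine, not part of this repo):

```python
import json
import urllib.request

# Assumes the server is running locally on the default port.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(user_message, system_prompt="You are a helpful assistant.",
                       temperature=0.7, stream=False):
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": "anemll-model",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": temperature,
        "stream": stream,
    }

def chat(user_message):
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(chat("Hello, how are you?"))` should print the model's reply.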
Rarely, the server will hit a GIL error when you try to generate a response after startup. Just restart the server; it will most likely work on the next run, and keep working from then on.
Anemll is still in its early stages, with a limited amount of models on Hugging Face and development of the core library still ongoing. This presents a unique opportunity to become an early contributor to this emerging technology. Whether you're interested in experimenting with the library, converting models, contributing code, or simply raising awareness - your involvement can help shape the future of on-device AI acceleration. The ANE represents a significant advancement in efficient ML inference, and community participation is vital to realizing its full potential.
The `/v1/chat/completions` endpoint follows the OpenAI API format for chat completions.

Example request:

```json
{
  "model": "anemll-model",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"}
  ],
  "temperature": 0.7,
  "stream": true
}
```

The `/v1/models` endpoint lists available models; it is needed to work with Open WebUI.
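When `"stream": true` is set, as in the example request above, the response arrives as server-sent events, each a `data: {json}` line ending with `data: [DONE]`. A minimal sketch of consuming such a stream (the chunk shape follows OpenAI's streaming format, which this server is assumed to match):

```python
import json

def iter_stream_content(lines):
    """Yield content deltas from OpenAI-style SSE lines.

    Each event is a 'data: {json}' line; the stream ends with
    'data: [DONE]'. Chunk shape is assumed to follow the OpenAI
    streaming format.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines between events
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]
```

You can feed it the decoded lines of a streaming HTTP response and print each delta as it arrives.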
Non-streaming request:

```shell
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anemll-model",
    "messages": [
      {"role": "system", "content": "Whatever you do, always reply in ALL CAPS!"},
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,
    "stream": false
  }'
```

Streaming request:

```shell
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anemll-model",
    "messages": [
      {"role": "system", "content": "Whatever you do, always reply in ALL CAPS!"},
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,
    "stream": true
  }'
```

List available models:

```shell
curl http://localhost:8000/v1/models
```

Unofficial Discord server for Anemll:
If you're interested in a general AI system performing any kind of digital labour for you, visit:
MIT - Do whatever you want with this.