A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs. Works seamlessly with OpenAI, Azure OpenAI, Volcengine (火山引擎), DeepSeek, Groq, Together AI, and any other provider that implements the OpenAI API specification.
Designed for simplicity and compatibility – no heavy dependencies, just pure Ruby with Net::HTTP.
- 🔌 Universal compatibility – Works with any OpenAI-compatible API provider
- 🌊 Streaming support – Native SSE streaming for chat completions
- 🧵 Fiber-friendly – Compatible with Ruby 3 Fiber scheduler, works great with Falcon
- 🔧 Flexible configuration – Customizable API prefix for non-standard endpoints
- 🎯 Simple interface – Receive-an-Object / Return-an-Object style API
- 📦 Zero runtime dependencies – Uses only Ruby standard library
Add to your Gemfile:
```ruby
gem "simple_inference"
```

Then run:

```shell
bundle install
```

```ruby
require "simple_inference"

# Connect to OpenAI
client = SimpleInference::Client.new(
  base_url: "https://api.openai.com",
  api_key: ENV["OPENAI_API_KEY"]
)

result = client.chat(
  model: "gpt-4o-mini",
  messages: [{ "role" => "user", "content" => "Hello!" }]
)

puts result.content
p result.usage
```

| Option | Env Variable | Default | Description |
|---|---|---|---|
| `base_url` | `SIMPLE_INFERENCE_BASE_URL` | `http://localhost:8000` | API base URL |
| `api_key` | `SIMPLE_INFERENCE_API_KEY` | `nil` | API key (sent as `Authorization: Bearer <token>`) |
| `api_prefix` | `SIMPLE_INFERENCE_API_PREFIX` | `/v1` | API path prefix (e.g. `/v1`, or an empty string for some providers) |
| `timeout` | `SIMPLE_INFERENCE_TIMEOUT` | `nil` | Request timeout in seconds |
| `open_timeout` | `SIMPLE_INFERENCE_OPEN_TIMEOUT` | `nil` | Connection open timeout in seconds |
| `read_timeout` | `SIMPLE_INFERENCE_READ_TIMEOUT` | `nil` | Read timeout in seconds |
| `raise_on_error` | `SIMPLE_INFERENCE_RAISE_ON_ERROR` | `true` | Raise exceptions on HTTP errors |
| `headers` | – | `{}` | Additional headers to send with requests |
| `adapter` | – | Default | HTTP adapter (see Adapters) |
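As the `api_prefix` option suggests, the request URL is composed from the base URL, the prefix, and the endpoint path. A conceptual sketch (the gem's internals may differ):

```ruby
# Conceptual URL composition (a sketch, not the gem's actual code):
# base_url + api_prefix + endpoint path.
def endpoint_url(base_url, api_prefix, path)
  "#{base_url}#{api_prefix}#{path}"
end

endpoint_url("https://api.openai.com", "/v1", "/chat/completions")
# => "https://api.openai.com/v1/chat/completions"

# Providers whose base URL already includes the full path need api_prefix: "":
endpoint_url("https://ark.cn-beijing.volces.com/api/v3", "", "/chat/completions")
# => "https://ark.cn-beijing.volces.com/api/v3/chat/completions"
```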
```ruby
client = SimpleInference::Client.new(
  base_url: "https://api.openai.com",
  api_key: ENV["OPENAI_API_KEY"]
)
```

Volcengine's API paths do not include the `/v1` prefix, so set `api_prefix: ""`:

```ruby
client = SimpleInference::Client.new(
  base_url: "https://ark.cn-beijing.volces.com/api/v3",
  api_key: ENV["ARK_API_KEY"],
  api_prefix: "" # Important: Volcengine does not use the /v1 prefix
)
```
```ruby
result = client.chat(
  model: "deepseek-v3-250324",
  messages: [
    { "role" => "system", "content" => "You are an AI assistant" },
    { "role" => "user", "content" => "Hello" }
  ]
)

puts result.content
```

```ruby
# DeepSeek
client = SimpleInference::Client.new(
  base_url: "https://api.deepseek.com",
  api_key: ENV["DEEPSEEK_API_KEY"]
)
```

```ruby
# Groq
client = SimpleInference::Client.new(
  base_url: "https://api.groq.com/openai",
  api_key: ENV["GROQ_API_KEY"]
)
```

```ruby
# Together AI
client = SimpleInference::Client.new(
  base_url: "https://api.together.xyz",
  api_key: ENV["TOGETHER_API_KEY"]
)
```

```ruby
# Ollama
client = SimpleInference::Client.new(
  base_url: "http://localhost:11434"
)
```
```ruby
# vLLM
client = SimpleInference::Client.new(
  base_url: "http://localhost:8000"
)
```

Some providers use non-standard authentication headers:

```ruby
client = SimpleInference::Client.new(
  base_url: "https://my-service.example.com",
  api_prefix: "/v1",
  headers: {
    "x-api-key" => ENV["MY_SERVICE_KEY"]
  }
)
```

```ruby
result = client.chat(
  model: "gpt-4o-mini",
  messages: [
    { "role" => "system", "content" => "You are a helpful assistant." },
    { "role" => "user", "content" => "Hello!" }
  ],
  temperature: 0.7,
  max_tokens: 1000
)

puts result.content
p result.usage
```

```ruby
result = client.chat(
  model: "gpt-4o-mini",
  messages: [{ "role" => "user", "content" => "Tell me a story" }],
  stream: true,
  include_usage: true
) do |delta|
  print delta
end

puts
p result.usage
```

Low-level streaming (events) is also available, and can be used as an Enumerator:
```ruby
stream = client.chat_completions_stream(
  model: "gpt-4o-mini",
  messages: [{ "role" => "user", "content" => "Hello" }]
)

stream.each do |event|
  # process event
end
```

Or as an Enumerable of delta strings:
```ruby
stream = client.chat_stream(
  model: "gpt-4o-mini",
  messages: [{ "role" => "user", "content" => "Hello" }],
  include_usage: true
)

stream.each { |delta| print delta }
puts
p stream.result&.usage
```

```ruby
response = client.embeddings(
  model: "text-embedding-3-small",
  input: "Hello, world!"
)

vector = response.body["data"][0]["embedding"]
```

```ruby
response = client.rerank(
  model: "bge-reranker-v2-m3",
  query: "What is machine learning?",
  documents: [
    "Machine learning is a subset of AI...",
    "The weather today is sunny...",
    "Deep learning uses neural networks..."
  ]
)
```

```ruby
response = client.audio_transcriptions(
  model: "whisper-1",
  file: File.open("audio.mp3", "rb")
)

puts response.body["text"]
```

```ruby
response = client.audio_translations(
  model: "whisper-1",
  file: File.open("audio.mp3", "rb")
)
```

```ruby
model_ids = client.models
```

```ruby
# Returns full response
response = client.health

# Returns boolean
if client.healthy?
  puts "Service is up!"
end
```

All HTTP methods return a `SimpleInference::Response` with:

```ruby
response.status   # Integer HTTP status code
response.headers  # Hash with downcased String keys
response.body     # Parsed JSON (Hash/Array), raw String, or nil (SSE success)
response.success? # true for 2xx
```

By default, non-2xx responses raise exceptions:

```ruby
begin
  client.chat_completions(model: "invalid", messages: [])
rescue SimpleInference::Errors::HTTPError => e
  puts "HTTP #{e.status}: #{e.message}"
  p e.body        # parsed body (Hash/Array/String)
  puts e.raw_body # raw response body string (if available)
end
```

Other exception types:
- `SimpleInference::Errors::TimeoutError` – Request timed out
- `SimpleInference::Errors::ConnectionError` – Network error
- `SimpleInference::Errors::DecodeError` – JSON parsing failed
- `SimpleInference::Errors::ConfigurationError` – Invalid configuration
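Transient failures such as timeouts and dropped connections are often worth retrying. A minimal, generic retry helper (not part of the gem; the error class below is a stand-in) might look like:

```ruby
# Generic retry helper (illustrative; not part of simple_inference).
# Retries the block on the listed exception classes, up to `attempts` tries,
# with optional linear backoff between tries.
def with_retries(attempts: 3, on: [StandardError], backoff: 0)
  tries = 0
  begin
    yield
  rescue *on
    tries += 1
    raise if tries >= attempts
    sleep(backoff * tries) if backoff.positive?
    retry
  end
end

# Stand-in error class for the demo; with the real gem you would list
# SimpleInference::Errors::TimeoutError and ConnectionError instead.
class FakeTimeoutError < StandardError; end

calls = 0
result = with_retries(attempts: 3, on: [FakeTimeoutError]) do
  calls += 1
  raise FakeTimeoutError if calls < 3
  "ok"
end
result # => "ok" (succeeded on the third try)
```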
To handle errors manually:

```ruby
client = SimpleInference::Client.new(
  base_url: "https://api.openai.com",
  api_key: ENV["OPENAI_API_KEY"],
  raise_on_error: false
)

response = client.chat_completions(model: "gpt-4o-mini", messages: [...])

if response.success?
  # success
else
  puts "Error: #{response.status} - #{response.body}"
end
```

The default adapter uses Ruby's built-in Net::HTTP. It's thread-safe and compatible with the Ruby 3 Fiber scheduler.
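Streamed responses arrive over the wire as Server-Sent Events: each `data:` frame carries a JSON chunk whose content delta the client extracts, and `data: [DONE]` terminates the stream. A sketch of that parsing (not the gem's actual implementation):

```ruby
require "json"

# Sketch of SSE delta extraction (illustrative, not the gem's code):
# keep only "data: <json>" frames, skip the "[DONE]" sentinel, and pull
# out choices[0].delta.content from each chunk.
def extract_deltas(raw)
  raw.each_line.filter_map do |line|
    payload = line[/\Adata: (.*)\z/m, 1]&.strip
    next if payload.nil? || payload.empty? || payload == "[DONE]"
    JSON.parse(payload).dig("choices", 0, "delta", "content")
  end
end

raw = <<~SSE
  data: {"choices":[{"delta":{"content":"Hel"}}]}

  data: {"choices":[{"delta":{"content":"lo"}}]}

  data: [DONE]
SSE

extract_deltas(raw).join # => "Hello"
```

The high-level `chat(stream: true)` and `chat_stream` APIs perform this extraction for you, yielding the delta strings directly.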
For better performance or async environments, use the optional HTTPX adapter:

```ruby
# Gemfile
gem "httpx"
```

```ruby
adapter = SimpleInference::HTTPAdapters::HTTPX.new(timeout: 30.0)

client = SimpleInference::Client.new(
  base_url: "https://api.openai.com",
  api_key: ENV["OPENAI_API_KEY"],
  adapter: adapter
)
```

Implement your own adapter by subclassing `SimpleInference::HTTPAdapter`:
```ruby
class MyAdapter < SimpleInference::HTTPAdapter
  def call(request)
    # request keys: :method, :url, :headers, :body, :timeout, :open_timeout, :read_timeout
    # Must return: { status: Integer, headers: Hash, body: String }
  end

  def call_stream(request, &block)
    # For streaming support (optional)
    # Yield raw chunks to the block for SSE responses
  end
end
```

Create an initializer `config/initializers/simple_inference.rb`:

```ruby
INFERENCE_CLIENT = SimpleInference::Client.new(
  base_url: ENV.fetch("INFERENCE_BASE_URL", "https://api.openai.com"),
  api_key: ENV["INFERENCE_API_KEY"]
)
```

Use in controllers:
```ruby
class ChatsController < ApplicationController
  def create
    response = INFERENCE_CLIENT.chat_completions(
      model: "gpt-4o-mini",
      messages: [{ "role" => "user", "content" => params[:prompt] }]
    )

    render json: response.body
  end
end
```

Use in background jobs:

```ruby
class EmbedJob < ApplicationJob
  def perform(text)
    response = INFERENCE_CLIENT.embeddings(
      model: "text-embedding-3-small",
      input: text
    )

    vector = response.body["data"][0]["embedding"]
    # Store vector...
  end
end
```

The client is thread-safe:
- No global mutable state
- Per-client configuration only
- Each request uses its own HTTP connection
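Because of this, a single client instance can be shared across threads. A sketch with a hypothetical stand-in client (the stub only echoes locally; the real `#chat` calls the API):

```ruby
# Hypothetical stand-in client, used only to illustrate concurrent calls.
# Like the real client, it holds no mutable state between requests.
class StubClient
  def chat(model:, messages:)
    "echo: #{messages.last["content"]}"
  end
end

client = StubClient.new

# One shared client, many threads; each call is independent.
replies = 4.times.map do |i|
  Thread.new do
    client.chat(model: "gpt-4o-mini", messages: [{ "role" => "user", "content" => "msg #{i}" }])
  end
end.map(&:value)

replies.sort # => ["echo: msg 0", "echo: msg 1", "echo: msg 2", "echo: msg 3"]
```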
MIT License. See LICENSE for details.