feat: AI model proxy, local model lifecycle, multi-provider routing, voice pipeline #10

Merged
bglusman merged 11 commits into main from rebase/proxy-voice-onto-main on Apr 21, 2026

Conversation

@bglusman
Owner

Summary

Adds substantial new capabilities on top of main with a clean single commit rebased directly onto main HEAD:

  • OpenAI-compatible proxy server: [proxy] config block, POST /v1/chat/completions, GET /v1/models, GET /health
  • Multi-provider routing: [[proxy.providers]] and [[proxy.model_routes]] pattern-matched to different backend URLs/keys/timeouts
  • Local model lifecycle: [local_models] manages mlx_lm.server or llama-server processes; POST /control/local/switch hot-swaps models at runtime
  • Voice pipeline passthrough: [proxy.voice.stt/tts] routes audio to any OpenAI-compatible server; GET /v1/tools/manifest surfaces capabilities as tool definitions; optional shell hooks for pre/post processing
  • Model alloys: [[alloys]] weighted/round_robin blending across multiple providers
  • Traceloop and Helicone gateway integrations (feature-gated)
  • Matrix channel rewritten with raw HTTP, removing the matrix-sdk compile-time dependency conflict
  • Config validator: --validate flag for config file checking at deploy time
  • Mock channel for integration testing
  • Persistent context (optional sqlx feature)

Replaces PR #9 (which had git history conflicts from the filter-branch operation).

Test plan

  • cargo test — all 18 test suites pass
  • cargo build --release -p zeroclawed succeeds
  • --validate validates config and exits cleanly
  • Proxy routes to named providers based on model pattern
  • POST /control/local/switch switches local model
  • GET /v1/tools/manifest reflects configured capabilities
  • Voice endpoints return 501 when not configured

…ng, and voice pipeline

Adds the following new capabilities on top of main:

- OpenAI-compatible HTTP proxy server ([proxy] config)
- Multi-provider routing via [[proxy.providers]] and [[proxy.model_routes]]
- Local model lifecycle management ([local_models] config) with mlx_lm/llama.cpp
  support and hot-swap via POST /control/local/switch
- Voice pipeline passthrough: POST /v1/audio/transcriptions, /v1/audio/speech,
  GET /v1/tools/manifest with optional shell hooks
- Model alloy blending ([[alloys]]) with weighted/round_robin strategies
- Traceloop and Helicone gateway integrations
- Matrix channel rewritten with raw HTTP (removes matrix-sdk dependency)
- Config validator (--validate flag)
- Mock channel for testing
- Persistent context store (optional feature)
- Agent delegation and slash command interception scaffold
- exclude loom-tests from default cargo test via workspace default-members
Copilot AI review requested due to automatic review settings April 20, 2026 18:34

Copilot AI left a comment

Pull request overview

Adds a substantial OpenAI-compatible proxy/voice surface to zeroclawed, plus infrastructure for multi-provider routing, local model lifecycle management, and (feature-gated) persistent conversation context.

Changes:

  • Introduces a new proxy server (/v1/chat/completions, /v1/models, /health) with provider routing, retries, and optional Helicone/Traceloop integrations.
  • Adds local model lifecycle management + a voice passthrough layer (STT/TTS forwarding + tool manifest endpoint).
  • Adds a unified context abstraction and a new persistent SQLite-backed context store (feature-gated).

Reviewed changes

Copilot reviewed 57 out of 58 changed files in this pull request and generated 8 comments.

File Description
crates/zeroclawed/src/voice/tools.rs Builds tool manifest JSON based on configured capabilities
crates/zeroclawed/src/voice/mod.rs Voice config schema for STT/TTS + hooks
crates/zeroclawed/src/voice/forward.rs Generic HTTP forwarder + hook runner for voice endpoints
crates/zeroclawed/src/unified_context.rs Unifies in-memory and persistent context store behind a trait/enum
crates/zeroclawed/src/sync.rs Loom-aware reexports for sync primitives
crates/zeroclawed/src/router.rs Adds dispatch API supporting optional model override
crates/zeroclawed/src/proxy/voice_handlers.rs Axum handlers for STT/TTS passthrough + tools manifest
crates/zeroclawed/src/proxy/traceloop/test.rs Adds Traceloop router unit tests (partial/placeholder)
crates/zeroclawed/src/proxy/traceloop/openai.rs Traceloop OpenAI-provider implementation
crates/zeroclawed/src/proxy/traceloop/kimi.rs Kimi provider wrapper via OpenAI-compatible API
crates/zeroclawed/src/proxy/traceloop/deepseek.rs DeepSeek provider implementation
crates/zeroclawed/src/proxy/traceloop/anthropic.rs Anthropic provider implementation + response conversion
crates/zeroclawed/src/proxy/streaming.rs Stub for SSE stream transformation (currently empty)
crates/zeroclawed/src/proxy/routing.rs Builds provider routing table from config patterns
crates/zeroclawed/src/proxy/retry_gateway.rs Adds a retrying gateway wrapper (configurable policy)
crates/zeroclawed/src/proxy/retry.rs Defines retry configuration + backoff iterator
crates/zeroclawed/src/proxy/openai.rs Adds OpenAI-compatible request/response types
crates/zeroclawed/src/proxy/mod.rs Proxy server bootstrapping + routes registration
crates/zeroclawed/src/proxy/helicone_router.rs HTTP-based Helicone gateway integration
crates/zeroclawed/src/proxy/handlers.rs Core proxy handlers: chat completions, models, health, local switch
crates/zeroclawed/src/proxy/auth_test.rs Adds auth tests (currently API-mismatched)
crates/zeroclawed/src/proxy/auth.rs Model access policy evaluation (auth scaffolding partially commented out)
crates/zeroclawed/src/proxy/alloy_router.rs Back-compat alloy router wrapper around Helicone router
crates/zeroclawed/src/providers/mod.rs Introduces provider registry + loom tests
crates/zeroclawed/src/providers/alloy.rs Implements alloy selection strategies and stats
crates/zeroclawed/src/persistent_context.rs Adds SQLite persistent context store (sqlx)
crates/zeroclawed/src/main.rs Adds --validate, --proxy-only, proxy startup, local model autoload, unified context init
crates/zeroclawed/src/local_model/mod.rs Adds local model manager and switch hooks
crates/zeroclawed/src/local_model/mlx_lm.rs Spawns/stops mlx_lm.server and waits for readiness
crates/zeroclawed/src/install/ssh.rs Switches mock client mutexes to loom-aware mutex
crates/zeroclawed/src/install/health.rs Switches mock health checker mutexes to loom-aware mutex
crates/zeroclawed/src/install/executor.rs Switches to loom-aware Arc
crates/zeroclawed/src/delegation.rs Adds delegation marker parsing + delegation engine (currently incomplete)
crates/zeroclawed/src/context.rs Switches to loom-aware sync primitives; adjusts exchange_count internals
crates/zeroclawed/src/config/validator.rs Adds config validation support + --validate plumbing
crates/zeroclawed/src/commands.rs Extends !model to list/activate alloys and trigger local/provider switches
crates/zeroclawed/src/channels/whatsapp.rs Adds post-auth !model handling
crates/zeroclawed/src/channels/telegram.rs Adds post-auth !model handling
crates/zeroclawed/src/channels/signal.rs Adds post-auth !model handling
crates/zeroclawed/src/channels/mod.rs Registers new mock channel module
crates/zeroclawed/src/channels/mock.rs Adds mock channel + control API server
crates/zeroclawed/src/auth.rs Updates tests/config construction for new config fields
crates/zeroclawed/src/adapters/openclaw_native.rs Plumbs model_override through DispatchContext in tests
crates/zeroclawed/src/adapters/openclaw_channel.rs Switches to loom-aware sync primitives; passes model_override through
crates/zeroclawed/src/adapters/openclaw.rs Adds model override support in OpenClaw + NZC webhook payload
crates/zeroclawed/src/adapters/nzc_native.rs Plumbs model_override through to inner adapter
crates/zeroclawed/src/adapters/mod.rs Adds model_override field to DispatchContext
crates/zeroclawed/src/adapters/cli.rs Makes /bin/false test OS-path tolerant
crates/zeroclawed/src/adapters/acp.rs Switches to loom-aware Arc (test)
crates/zeroclawed/Cargo.toml Adds features for proxy integrations + sqlx; adds deps (rand/uuid/backon/sqlx)
README.md Documents proxy, routing, local models, voice pipeline, tools manifest
Cargo.toml Sets workspace default-members to exclude loom-tests by default

Comment thread README.md
Comment on lines +219 to +222
curl -X POST http://localhost:8080/control/local/switch \
-H "Content-Type: application/json" \
-d '{"model_id": "gemma4-26b"}'
```
Copilot AI Apr 20, 2026

The README’s example request body uses {"model_id": "..."}, but the handler expects a model field. Either update the handler to accept model_id (or both for compatibility) or fix the README example so users can successfully switch local models.

Comment thread crates/zeroclawed/src/proxy/retry_gateway.rs Outdated
Comment thread crates/zeroclawed/src/channels/mock.rs Outdated
Comment on lines +114 to +117
// Use default control port (9090)
// TODO: Add control_port field to ChannelConfig
let control_port = 9090;

Copilot AI Apr 20, 2026

This channel docstring advertises a configurable control_port, but the implementation ignores config and always binds to 9090. Either add control_port (and test_users, if intended) to the channel config schema and read it here, or remove it from the docs to avoid a misleading configuration surface.

Comment on lines +6 to +9
use super::*;
use crate::config::PolyConfig;
use crate::proxy::auth::{check_model_access, model_matches, ProxyAccessPolicy};

Copilot AI Apr 20, 2026

This new test module does not compile against the current config/auth APIs: PolyConfig.proxy is an Option<ProxyConfig> (so config.proxy.default_policy won’t compile), check_model_access returns bool (not Result), and model_matches is private to proxy::auth. Update the tests to use ProxyConfig directly (like proxy/auth.rs tests) or expose the necessary APIs in proxy::auth.

Comment thread crates/zeroclawed/src/delegation.rs Outdated
Comment on lines +268 to +293
// Build context for delegation based on mode
let delegate_text = match delegation.context {
ContextMode::None => &delegation.message,
ContextMode::Recent => {
// Get recent context and prepend
let recent = context_store.get_recent(chat_key, 5).await;
if recent.is_empty() {
&delegation.message
} else {
// Build message with context preamble
let preamble = recent.join("\n\n");
let combined = format!("{}\n\n{}", preamble, delegation.message);
// Store combined for this call
// TODO: Properly handle lifetime here
&delegation.message
}
}
ContextMode::Fork => {
// Fork: isolated context, just the message for now
&delegation.message
}
};

// Recursive call for delegation chain
self.dispatch_with_delegation(
delegate_text,
Copilot AI Apr 20, 2026

ContextStore has no get_recent(..) method (and this file currently builds a combined string but then discards it), so this won’t compile and delegation won’t include recent context as intended. Either add the needed API to the context store / unified context abstraction, or change this to use the existing context augmentation/preamble mechanism.

Suggested change
// Build context for delegation based on mode.
//
// NOTE: Recent-context augmentation is not assembled here because the
// context store API used previously (`get_recent`) is not available in
// this file's abstraction. Until recent context is plumbed through the
// existing context augmentation/preamble mechanism, delegate the
// message payload directly.
let delegate_text = match delegation.context {
ContextMode::None => delegation.message.clone(),
ContextMode::Recent => delegation.message.clone(),
ContextMode::Fork => {
// Fork: isolated context, just the message for now
delegation.message.clone()
}
};
// Recursive call for delegation chain
self.dispatch_with_delegation(
&delegate_text,
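
The lifetime problem flagged above (building a combined string and then returning a borrow of the original message) can also be handled without cloning in every branch. A minimal sketch, assuming the recent exchanges are already available as a slice of strings; `build_delegate_text` and this `ContextMode` are hypothetical stand-ins, not the PR's actual types:

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the PR's `ContextMode`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ContextMode {
    None,
    Recent,
    Fork,
}

/// Returns the text to delegate: borrowed when nothing is prepended,
/// owned when a preamble of recent exchanges is attached.
pub fn build_delegate_text<'a>(
    mode: ContextMode,
    message: &'a str,
    recent: &[String],
) -> Cow<'a, str> {
    match mode {
        ContextMode::Recent if !recent.is_empty() => {
            let preamble = recent.join("\n\n");
            Cow::Owned(format!("{preamble}\n\n{message}"))
        }
        // None, Fork, or Recent with no history: pass the message through.
        _ => Cow::Borrowed(message),
    }
}
```

A caller can pass `&*text` wherever a `&str` is expected, so the combined string lives as long as the `Cow` binding rather than being dropped inside the `match` arm.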

Comment thread crates/zeroclawed/src/persistent_context.rs Outdated
Comment thread crates/zeroclawed/src/main.rs Outdated
Comment on lines +44 to +62
// Extract agent ID from header
let agent_id = headers
    .get("x-agent-id")
    .and_then(|h| h.to_str().ok())
    .unwrap_or("anonymous");

// Check model access for this agent
if !crate::proxy::auth::check_model_access(&state.config, agent_id, &req.model) {
    warn!(agent_id = %agent_id, model = %req.model, "Model access denied");
    return api_error(
        StatusCode::FORBIDDEN,
        "model_access_denied",
        &format!(
            "Agent '{}' does not have access to model '{}'",
            agent_id, req.model
        ),
        None,
    );
}
Copilot AI Apr 20, 2026

/v1/chat/completions trusts the caller-provided x-agent-id header but does not enforce proxy.api_key (or any agent-specific API key) for authentication. As a result, a client can impersonate any agent and bypass model access controls. Add an Authorization check (global key and/or agent-specific keys) and derive agent_id from the authenticated key rather than from an untrusted header.
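
One way to read the suggestion above is to resolve the agent identity from the presented credential instead of from `x-agent-id`. The following is a sketch only: the key-to-agent map, `AuthError`, and `authenticate` are hypothetical names, and the PR's real config shape may differ.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum AuthError {
    MissingCredentials,
    UnknownKey,
}

/// Resolves the agent id from an `Authorization: Bearer <key>` header value.
/// `keys` maps API key -> agent id (assumed config shape).
/// Note: a plain map lookup is not a constant-time comparison.
pub fn authenticate<'a>(
    authorization: Option<&str>,
    keys: &'a HashMap<String, String>,
) -> Result<&'a str, AuthError> {
    let token = authorization
        .and_then(|h| h.strip_prefix("Bearer "))
        .map(str::trim)
        .filter(|t| !t.is_empty())
        .ok_or(AuthError::MissingCredentials)?;
    keys.get(token)
        .map(String::as_str)
        .ok_or(AuthError::UnknownKey)
}
```

With this shape the handler would call the model access check with the returned agent id, so a caller cannot pick an identity it holds no key for.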

Adds /hooks/claude-code endpoint that speaks Claude Code's PreToolUse
hookSpecificOutput format, a Starlark policy tuned for Claude Code tools,
and scripts/setup-claude-hooks.sh to wire clashd as the policy engine
for Claude Code (builds, installs, launchd service, settings.json update).
- retry_gateway: use retry_if predicate so 4xx/non-retryable errors
  are not retried (was computing should_retry but ignoring it)
- persistent_context: remove unused Arc and Mutex imports
- mock channel: remove advertised but unimplemented control_port config
- README: fix model switch example (model_id -> model)
- style: cargo fmt --all (clashd missed in previous fmt commit)
Copilot AI review requested due to automatic review settings April 20, 2026 18:56

Copilot AI left a comment

Pull request overview

Copilot reviewed 60 out of 61 changed files in this pull request and generated 9 comments.


Comment on lines +191 to +195
let choices = response_json["choices"]
.as_array()
.unwrap_or(&vec![])
.iter()
.map(|choice| {
Copilot AI Apr 20, 2026

unwrap_or(&vec![]) takes a reference to a temporary Vec, which will not compile (borrowed value does not live long enough). Use unwrap_or(&[]) (slice) or bind a local empty Vec/&[] fallback before iterating.
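
A slice fallback of the kind suggested above might look like the sketch below. It uses a plain `Option<&Vec<String>>` to stand in for the `Option<&Vec<Value>>` that `serde_json::Value::as_array` returns; the function name is illustrative. Because the option holds a `&Vec`, it is mapped to a slice first so that `&[]` has a matching type.

```rust
/// Collects the non-empty entries from an optional array.
/// `Option<&Vec<T>>` mirrors the shape returned by `as_array()`.
pub fn non_empty_entries(items: Option<&Vec<String>>) -> Vec<String> {
    items
        .map(Vec::as_slice) // Option<&Vec<String>> -> Option<&[String]>
        .unwrap_or(&[]) // empty slice fallback; no temporary Vec is created
        .iter()
        .filter(|s| !s.is_empty())
        .cloned()
        .collect()
}
```

`unwrap_or_default()` on the mapped option would behave the same way, since `&[T]` implements `Default`.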

Comment on lines +183 to +187
let choices = response_json["choices"]
.as_array()
.unwrap_or(&vec![])
.iter()
.map(|choice| {
Copilot AI Apr 20, 2026

unwrap_or(&vec![]) takes a reference to a temporary Vec, which will not compile. Use unwrap_or(&[]) / unwrap_or_default() on an owned collection, or store an empty Vec in a local variable and reference that.

Comment on lines +188 to +192
let content = response_json["content"]
.as_array()
.unwrap_or(&vec![])
.iter()
.filter_map(|item| {
Copilot AI Apr 20, 2026

.unwrap_or(&vec![]) references a temporary Vec, which will not compile (borrowed value does not live long enough). Use a slice fallback (unwrap_or(&[])) or bind an empty Vec in a local variable before iterating.

Comment thread crates/zeroclawed/src/main.rs Outdated
Comment on lines +198 to +203
// TODO: Actually route the message through the system
// For now, just log it
debug!("Would route message from {}: {}", req.sender, req.text);

// Simulate a response
let response_text = format!("Mock response to: {}", req.text);
Copilot AI Apr 20, 2026

/send doesn’t route the message through the real system yet (it just logs and returns a hard-coded "Mock response"). That makes the mock channel unsuitable for integration testing as described; either implement dispatch via router/command_handler/context_store, or clearly scope this as a stub.

Comment on lines +203 to +206
let tool_calls = response_json["content"]
.as_array()
.unwrap_or(&vec![])
.iter()
Copilot AI Apr 20, 2026

.unwrap_or(&vec![]) again borrows a temporary Vec and won’t compile. Prefer unwrap_or(&[]) / a local empty collection to avoid referencing a temporary.

Comment on lines +91 to +93
// Thread 1: Register providers
let registry1 = registry.clone();
let p1 = provider1.clone();
Copilot AI Apr 20, 2026

Under cfg(all(test, loom)) these tests call registry.clone(), but ProviderRegistry does not implement Clone, so this won’t compile. Wrap ProviderRegistry in crate::sync::Arc (or implement Clone) for sharing across loom threads.

Comment on lines +1 to +8
use std::sync::Arc;

use async_trait::async_trait;
use backon::{ExponentialBuilder, Retryable};
use tracing::{info, warn};

use crate::proxy::gateway::{BackendError, ChatCompletionRequest, ChatCompletionResponse, Gateway, ModelInfo};

Copilot AI Apr 20, 2026

This module is not referenced from proxy/mod.rs (no mod retry_gateway;), so it will never be compiled. As written, it also imports symbols that don’t exist in crate::proxy::gateway (e.g., Gateway, BackendError), so it’s likely stale/incomplete—either wire it up properly or remove it to avoid bitrot.

Comment on lines +1 to +5
// Unit tests for auth module
// These are the kinds of tests mutation testing works against

#[cfg(test)]
mod tests {
Copilot AI Apr 20, 2026

This file isn’t referenced by the module tree (no mod auth_test;), so it won’t be compiled or run. Consider moving these tests to crates/zeroclawed/tests/ (integration tests) or including the module from proxy/mod.rs under #[cfg(test)] so auth behavior is actually exercised.

bglusman and others added 6 commits April 20, 2026 15:19
Adds POST /hooks/zeroclaw-audit to receive zeroclaw webhook_audit
fire-and-forget payloads — evaluates against policy and logs/warns on
deny verdicts (monitoring only, zeroclaw does not read the response).

Adds scripts/setup-agents.sh: detect-or-install opencode (brew),
openclaw (npm), zeroclaw (brew) and wire clashd policy integration for
each — zeroclaw webhook_audit, openclaw exec-approvals (restricted+ask),
opencode plugin stub. Supports --configure-only, --install-only, --agents.
…ay naming

security-gateway crate renamed to security-proxy (binary, package name,
workspace deps, crate path). Default port changed 8080→8888 to avoid
conflict with the model gateway (8080). Adds SECURITY_PROXY_PORT env var
override consistent with CLASHD_PORT.

Removes misleading "Alloy proxy/Alloy Model Proxy Server" labels from
the model gateway (proxy/mod.rs, handlers.rs, backend.rs) — alloys are
one routing feature; the gateway also includes Traceloop observability,
retries, Helicone, multi-provider routing, and local model management.
Single entrypoint replacing setup-claude-hooks.sh + setup-agents.sh.
Builds zeroclawed, clashd, security-proxy (release), installs to
~/.local/bin/, creates launchd services for both clashd and security-proxy.

Wires clashd policy hooks for all four agents: Claude Code (PreToolUse
hook), opencode (plugin stub), openclaw (exec-approvals), zeroclaw
(webhook_audit + autonomy). Prompts before installing any missing tool
(--yes skips prompts; --configure-only skips installs entirely).

Correctly detects zeroclaw needing onboard before attempting service start.
…ters

install.sh gains --nodes-file <path> and --nodes-only flags.
For each node in the JSON config: cross-compile locally (cross/zigbuild)
or fall back to building on the remote via SSH; rsync binary + policy
files; install systemd service (Linux) or launchd plist (macOS).

Adds deploy/nodes.example.json documenting the node config format
(host, user, ssh_key, arch, os, services, install_dir, config_dir).
Supports x86_64/aarch64 Linux and macOS nodes from one build machine.
- security-proxy: security_gateway → security_proxy in integration tests
- adversary-detector: sort_by → sort_by_key
- host-agent/pct: collapse nested if into match guard
- zeroclawed: redundant field names (shorthand), manual checked_div,
  strip_suffix, collapsible if let in matrix.rs, useless vec!, assert!
  for bool, let-else for single-variant match in main.rs (extracted
  into_in_memory() method on UnifiedContextStore to avoid irrefutable
  pattern lint across feature configurations)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 20, 2026 20:46

Copilot AI left a comment

Pull request overview

Copilot reviewed 70 out of 78 changed files in this pull request and generated 13 comments.


Comment thread README.md
Comment on lines +156 to +165
[[proxy.providers]]
id = "local"
models = ["local/*", "llama*", "qwen*", "gemma*"]
url = "http://localhost:8888/v1"

[[proxy.providers]]
id = "fast-provider"
models = ["fast/*"]
url = "https://api.fast-provider.example.com/v1"
api_key_file = "/etc/zeroclawed/secrets/fast-key"
Copilot AI Apr 20, 2026

The README example uses model patterns like "llama*", "qwen*", and "gemma*", but the routing implementation only supports exact match, *, and prefix/* globs. As written, these patterns will never match and routing will fall back to the default backend. Either update the docs to use supported patterns (e.g. qwen/*) or extend model_matches_pattern to support * suffix wildcards.
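
For illustration, the second option (extending the matcher) could be sketched as follows. This is not the PR's `model_matches_pattern`; it is a hypothetical reimplementation of the three forms described above (exact, `*`, `prefix/*`) plus an assumed extension for a bare trailing `*`.

```rust
/// Returns true when `model` matches `pattern`.
pub fn model_matches_pattern(pattern: &str, model: &str) -> bool {
    // Catch-all and exact match.
    if pattern == "*" || pattern == model {
        return true;
    }
    // `prefix/*` matches `prefix/<anything>` but not `prefixed/...`.
    if let Some(prefix) = pattern.strip_suffix("/*") {
        return model
            .strip_prefix(prefix)
            .is_some_and(|rest| rest.starts_with('/'));
    }
    // Assumed extension: bare trailing `*`, e.g. `qwen*`.
    if let Some(prefix) = pattern.strip_suffix('*') {
        return model.starts_with(prefix);
    }
    false
}
```

Under this sketch `qwen*` matches `qwen3-35b`, while `fast/*` still rejects `faster/small`.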

Comment thread README.md Outdated
Comment on lines +204 to +214
[[local_models.models]]
id = "qwen3-35b"
path = "~/.cache/huggingface/hub/models--unsloth--Qwen3.6-35B-A3B-MLX-8bit"
backend = "mlx_lm"
port = 8888

[[local_models.models]]
id = "gemma4-26b"
path = "~/.cache/huggingface/hub/models--unsloth--gemma-4-26b-a4b-it-MLX-8bit"
backend = "mlx_lm"
port = 8888
Copilot AI Apr 20, 2026

The [local_models] README example doesn’t match the actual config schema: [[local_models.models]] expects hf_id (and optional provider_type/display_name), while port/host are configured under [local_models.mlx_lm]. The example currently uses path, backend, and per-model port, which won’t deserialize. Please align the documentation with LocalModelsConfig/LocalModelDef.

Suggested change
[local_models.mlx_lm]
host = "127.0.0.1"
port = 8888
[[local_models.models]]
id = "qwen3-35b"
hf_id = "unsloth/Qwen3.6-35B-A3B-MLX-8bit"
provider_type = "mlx_lm"
display_name = "Qwen3 35B"
[[local_models.models]]
id = "gemma4-26b"
hf_id = "unsloth/gemma-4-26b-a4b-it-MLX-8bit"
provider_type = "mlx_lm"
display_name = "Gemma 4 26B"

Comment on lines +70 to +97
#[tokio::test]
async fn test_caching_behavior() {
// Create a mock router with a single provider
let _router = TraceloopRouter::new(vec![ProviderConfig {
id: "test".to_string(),
r#type: ProviderType::OpenAI,
api_key: "test-key".to_string(),
base_url: Some("http://localhost:9999".to_string()),
default_model: "test-model".to_string(),
}])
.unwrap();

let _messages = [ChatMessage {
role: "user".to_string(),
content: Some(MessageContent::Text("Test message".to_string())),
name: None,
tool_calls: None,
tool_call_id: None,
reasoning: None,
reasoning_content: None,
}]; // test only verifies code compiles (mock provider not running)

// First request should miss cache
// Note: This will fail because the mock provider can't connect to localhost:9999
// In a real test, we'd use a mock HTTP server
// For now, we just verify the code compiles and the caching logic is sound
println!("Caching test would run here with proper mock setup");
}
Copilot AI Apr 20, 2026

This test module doesn’t currently test caching behavior: test_caching_behavior contains no assertions and just prints a message. Either convert it into a real test using a mock HTTP server (e.g. mockito) to assert cache hits/misses, or remove/mark it clearly as a placeholder so it doesn’t give a false sense of coverage.

Comment on lines +1 to +14
// Unit tests for auth module
// These are the kinds of tests mutation testing works against

#[cfg(test)]
mod tests {
use super::*;
use crate::config::PolyConfig;
use crate::proxy::auth::{check_model_access, model_matches, ProxyAccessPolicy};

#[test]
fn test_model_matches_exact() {
// Exact match should work
assert!(model_matches("deepseek-chat", "deepseek-chat"));
assert!(model_matches("kimi/kimi-for-coding", "kimi/kimi-for-coding"));
Copilot AI Apr 20, 2026

crates/zeroclawed/src/proxy/auth_test.rs appears to be an orphaned test file: it isn’t referenced from the module tree, and it uses an old check_model_access(...).is_ok()/is_err() API that doesn’t match the current bool return type. Consider either wiring it in via #[cfg(test)] mod auth_test; (after updating it to compile) or deleting it to avoid confusion and bitrot.

Comment thread crates/clashd/src/main.rs Outdated
Comment on lines +130 to +156
let context = serde_json::json!({
"agent_id": "claude-code",
"cwd": req.cwd.unwrap_or_default(),
"session_id": req.session_id.unwrap_or_default(),
});

let result = state
.engine
.evaluate(&req.tool_name, &req.tool_input, Some("claude-code"))
.await;

let (decision, reason) = match result.verdict.to_string().as_str() {
"deny" => ("deny", result.reason),
"review" => ("ask", result.reason),
_ => ("allow", None),
};

info!(
tool = %req.tool_name,
decision = %decision,
reason = ?reason,
"claude-code hook evaluated"
);

// Suppress unused variable warning — context available for future use
let _ = context;

Copilot AI Apr 20, 2026

This handler builds a context object with cwd/session_id, but it isn’t passed into PolicyEngine::evaluate (the engine constructs its own context), so cwd will always be empty in Starlark policies. Either remove the unused context construction, or extend PolicyEngine::evaluate to accept and merge additional context fields so policies like claude-code-policy.star can actually use cwd/session_id.

Comment on lines +90 to +106

// Thread 1: Register providers
let registry1 = registry.clone();
let p1 = provider1.clone();
let p2 = provider2.clone();
let t1 = thread::spawn(move || {
registry1.register(p1);
registry1.register(p2);
});

// Thread 2: Read while writing
let registry2 = registry.clone();
let t2 = thread::spawn(move || {
// This might read empty or partial state
let _ids = registry2.list_ids();
let _provider = registry2.get("provider1");
});
Copilot AI Apr 20, 2026

The Loom tests call registry.clone(), but ProviderRegistry doesn’t implement Clone, so this will not compile under cfg(loom). Consider wrapping the registry in crate::sync::Arc inside the Loom test (and cloning the Arc), or implement Clone for ProviderRegistry by storing the inner map behind an Arc<RwLock<...>>.
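
The first option (wrapping the registry in an `Arc` and cloning the handle) can be sketched with the standard library alone. The `ProviderRegistry` below is a simplified stand-in that stores URL strings, and it uses `std::sync` and `std::thread`; in the actual Loom test these would presumably come from `crate::sync` and `loom::thread` inside `loom::model`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Simplified stand-in for the PR's `ProviderRegistry`; deliberately not `Clone`.
#[derive(Default)]
pub struct ProviderRegistry {
    providers: RwLock<HashMap<String, String>>,
}

impl ProviderRegistry {
    pub fn register(&self, id: &str, url: &str) {
        self.providers
            .write()
            .unwrap()
            .insert(id.to_string(), url.to_string());
    }

    pub fn list_ids(&self) -> Vec<String> {
        let mut ids: Vec<String> = self.providers.read().unwrap().keys().cloned().collect();
        ids.sort();
        ids
    }
}

/// Registers from one thread while another reads, sharing the registry via `Arc`.
pub fn register_concurrently() -> Vec<String> {
    let registry = Arc::new(ProviderRegistry::default());

    let writer = {
        let registry = Arc::clone(&registry);
        thread::spawn(move || {
            registry.register("provider1", "http://localhost:1");
            registry.register("provider2", "http://localhost:2");
        })
    };
    let reader = {
        let registry = Arc::clone(&registry);
        // May observe zero, one, or two providers depending on scheduling.
        thread::spawn(move || registry.list_ids().len())
    };

    writer.join().unwrap();
    let seen = reader.join().unwrap();
    assert!(seen <= 2);
    registry.list_ids()
}
```

Cloning the `Arc` only copies the handle, so both threads operate on the same map and the registry type itself does not need a `Clone` impl.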

Comment thread crates/zeroclawed/src/voice/forward.rs Outdated
Comment on lines +55 to +90
let result = tokio::task::spawn_blocking({
    let path = path.to_string();
    let input = input.clone();
    move || -> Result<Vec<u8>> {
        use std::io::Write;
        use std::process::{Command, Stdio};

        let mut child = Command::new(&path)
            .stdin(Stdio::piped())
            .stdout(Stdio::piped())
            .stderr(Stdio::piped())
            .spawn()
            .with_context(|| format!("failed to spawn hook {path}"))?;

        child
            .stdin
            .take()
            .unwrap()
            .write_all(&input)
            .context("writing hook stdin")?;

        let out = child.wait_with_output().context("waiting for hook")?;

        if !out.stderr.is_empty() {
            let msg = String::from_utf8_lossy(&out.stderr);
            warn!(hook = %path, stderr = %msg.trim(), "hook stderr");
        }

        if out.status.success() {
            Ok(out.stdout)
        } else {
            anyhow::bail!("hook exited with status {}", out.status);
        }
    }
})
.await;

Copilot AI Apr 20, 2026

run_hook spawns an external process and waits for it to finish, but there’s no timeout/kill logic. A hung hook will stall the request indefinitely even though the comment implies bounded time. Consider enforcing a timeout (e.g., tokio::time::timeout around spawn_blocking and/or killing the child after a deadline) so voice requests can fail fast or degrade gracefully.
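
One way to bound the wait, sketched here with the standard library only: poll `Child::try_wait` against a deadline and kill the child once the deadline passes. The function name `run_hook_with_deadline`, the extra `args` parameter, and the 10 ms poll interval are assumptions made for the example and do not come from the PR. It also assumes the input is small enough to fit in the pipe buffer, since the write happens before the deadline loop starts.

```rust
use std::io::{self, Read, Write};
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Runs `path` with `args`, feeds it `input` on stdin, and returns its stdout.
/// Returns `Ok(None)` if the process had to be killed after `limit` elapsed.
fn run_hook_with_deadline(
    path: &str,
    args: &[&str],
    input: &[u8],
    limit: Duration,
) -> io::Result<Option<Vec<u8>>> {
    let mut child = Command::new(path)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()?;

    // Write the payload, then drop stdin so the hook sees EOF.
    if let Some(mut stdin) = child.stdin.take() {
        stdin.write_all(input)?;
    }

    // Drain stdout on a helper thread so a chatty hook cannot fill the pipe.
    let mut stdout = child.stdout.take().expect("stdout was piped");
    let reader = thread::spawn(move || {
        let mut buf = Vec::new();
        stdout.read_to_end(&mut buf).map(|_| buf)
    });

    let deadline = Instant::now() + limit;
    let status = loop {
        if let Some(status) = child.try_wait()? {
            break status;
        }
        if Instant::now() >= deadline {
            // Deadline passed: kill the hook and reap it so no zombie is left.
            let _ = child.kill();
            child.wait()?;
            let _ = reader.join();
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(10));
    };

    let out = reader
        .join()
        .map_err(|_| io::Error::new(io::ErrorKind::Other, "reader thread panicked"))??;
    if status.success() {
        Ok(Some(out))
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("hook exited with status {status}"),
        ))
    }
}
```

In an async handler this would still run inside `spawn_blocking`. Wrapping only the `spawn_blocking` future in `tokio::time::timeout` would unblock the request but leave the child process running, which is why the sketch kills the child itself.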

Comment thread crates/zeroclawed/src/voice/forward.rs Outdated
Comment on lines +118 to +121
let client = reqwest::Client::builder()
    .timeout(Duration::from_secs(config.timeout_seconds))
    .build()
    .context("building reqwest client")?;

Copilot AI Apr 20, 2026

A new reqwest::Client is constructed for every STT/TTS request. Creating clients is relatively expensive (connection pools, TLS config) and can become a throughput bottleneck. Consider reusing a single client (e.g., store it in ProxyState or cache per-upstream config) rather than rebuilding per call.
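
The reuse pattern can be illustrated with `std::sync::OnceLock`, which builds the value on first access and returns the same instance afterwards. To keep the sketch dependency-free, `HttpClient` below is a hypothetical stand-in for `reqwest::Client`, and the build counter exists only to make the single construction observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::time::Duration;

/// Counts how many times the (expensive) client has been constructed.
static BUILDS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `reqwest::Client`.
struct HttpClient {
    default_timeout: Duration,
}

impl HttpClient {
    fn build() -> Self {
        BUILDS.fetch_add(1, Ordering::SeqCst);
        HttpClient {
            default_timeout: Duration::from_secs(30),
        }
    }

    /// A per-request timeout, when given, overrides the client default.
    fn effective_timeout(&self, per_request: Option<Duration>) -> Duration {
        per_request.unwrap_or(self.default_timeout)
    }
}

/// Returns the process-wide client, constructing it on first use only.
fn shared_client() -> &'static HttpClient {
    static CLIENT: OnceLock<HttpClient> = OnceLock::new();
    CLIENT.get_or_init(HttpClient::build)
}

fn build_count() -> usize {
    BUILDS.load(Ordering::SeqCst)
}
```

With the real `reqwest` crate, the per-request deadline would typically be set on the request builder (`RequestBuilder::timeout`) instead of on the shared client, so each STT/TTS config can keep its own `timeout_seconds`.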

Comment thread scripts/setup-claude-hooks.sh Outdated
Comment on lines +42 to +44
"$CARGO" build --release -p clashd -p zeroclawed 2>&1 \
    | grep -E "^error|Compiling|Finished" || true
ok "Build complete"

Copilot AI Apr 20, 2026

The cargo build pipeline masks build failures (... | grep ... || true) even though set -euo pipefail is enabled. If the build fails, the script will still print "Build complete" and then cp non-existent binaries, producing confusing errors. Consider removing the || true (and/or using grep ... || { cat; exit 1; }) so the script aborts on build failure while still filtering output.

Comment on lines +138 to +148
let unified_context_store = UnifiedContextStore::new(
    config.context.buffer_size,
    config.context.inject_depth,
    config.context.persistent.as_ref(),
)
.await?;

let context_store_arc = unified_context_store.into_in_memory()?;

// Clone the inner ContextStore for channel functions
let context_store = (*context_store_arc).clone();

Copilot AI Apr 20, 2026

UnifiedContextStore::new(...) can return a persistent store when persistent-context is enabled and [context.persistent] is configured, but main immediately calls into_in_memory()?, which will bail for the persistent variant. This makes the persistent-context feature unusable in practice. Consider plumbing UnifiedContextStore (or a dyn ContextStoreTrait) through channels/CommandHandler instead of forcing an in-memory ContextStore clone.
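
A rough sketch of the trait-object route, under the assumption that handlers only need to append a message and read back the most recent ones. The trait `ContextBackend`, the `InMemoryBackend` type, and the `CommandHandler` shape shown here are hypothetical and do not mirror the crate's actual definitions; a persistent backend would implement the same trait.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Hypothetical storage interface that handlers depend on.
trait ContextBackend: Send + Sync {
    fn push(&self, message: String);
    fn recent(&self, depth: usize) -> Vec<String>;
}

/// Bounded in-memory backend; the oldest message is evicted when full.
struct InMemoryBackend {
    buffer_size: usize,
    messages: Mutex<VecDeque<String>>,
}

impl ContextBackend for InMemoryBackend {
    fn push(&self, message: String) {
        let mut queue = self.messages.lock().unwrap();
        if queue.len() == self.buffer_size {
            queue.pop_front();
        }
        queue.push_back(message);
    }

    fn recent(&self, depth: usize) -> Vec<String> {
        let queue = self.messages.lock().unwrap();
        let skip = queue.len().saturating_sub(depth);
        queue.iter().skip(skip).cloned().collect()
    }
}

/// The handler holds a trait object, so it never needs to know which
/// backend variant it was given.
struct CommandHandler {
    context: Arc<dyn ContextBackend>,
    inject_depth: usize,
}

impl CommandHandler {
    /// Records the message and returns the context that would be injected.
    fn handle(&self, text: &str) -> Vec<String> {
        self.context.push(text.to_string());
        self.context.recent(self.inject_depth)
    }
}

fn demo() -> Vec<String> {
    let backend: Arc<dyn ContextBackend> = Arc::new(InMemoryBackend {
        buffer_size: 3,
        messages: Mutex::new(VecDeque::new()),
    });
    let handler = CommandHandler {
        context: backend,
        inject_depth: 2,
    };
    handler.handle("a");
    handler.handle("b");
    handler.handle("c");
    handler.handle("d")
}
```

With this shape, `main` would pass the `Arc<dyn ContextBackend>` straight to channels and handlers, and no `into_in_memory()` conversion would be needed.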

- Delete three orphaned files never wired into the module tree
  (retry_gateway.rs, auth_test.rs, delegation.rs)
- Read mock channel control_port from config (was hardcoded 9090)
- Document mock /send as an intentional stub with clear scope note
- Remove unused context object in clashd claude-code hook; log
  cwd/session_id directly in the tracing fields instead
- Fix setup-claude-hooks.sh: propagate cargo build failures instead
  of masking with || true; gate launchctl section on macOS
- Share a single reqwest::Client across voice forward calls (OnceLock)
  with per-request timeout; add 30s timeout to run_hook subprocess
- Stop sending "Bearer no-key" to unauthenticated HTTP backends:
  pass None through BackendConfig and skip Authorization header
  when api_key is empty
- Fix README: model routing patterns (llama* → llama/*), local_models
  schema (path/backend/port → hf_id/provider_type + mlx_lm section)
- Add comment explaining persistent-context feature is intentionally
  deferred until plumbed through all channel/handler call sites

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bglusman bglusman merged commit 64c4fd2 into main Apr 21, 2026
13 checks passed
bglusman added a commit that referenced this pull request Apr 25, 2026
…voice pipeline (#10)

* feat: add AI model proxy, local model lifecycle, multi-provider routing, and voice pipeline

Adds the following new capabilities on top of main:

- OpenAI-compatible HTTP proxy server ([proxy] config)
- Multi-provider routing via [[proxy.providers]] and [[proxy.model_routes]]
- Local model lifecycle management ([local_models] config) with mlx_lm/llama.cpp
  support and hot-swap via POST /control/local/switch
- Voice pipeline passthrough: POST /v1/audio/transcriptions, /v1/audio/speech,
  GET /v1/tools/manifest with optional shell hooks
- Model alloy blending ([[alloys]]) with weighted/round_robin strategies
- Traceloop and Helicone gateway integrations
- Matrix channel rewritten with raw HTTP (removes matrix-sdk dependency)
- Config validator (--validate flag)
- Mock channel for testing
- Persistent context store (optional feature)
- Agent delegation and slash command interception scaffold
- exclude loom-tests from default cargo test via workspace default-members

* style: apply cargo fmt across zeroclawed and clashd

* feat(clashd): add Claude Code hook endpoint, policy, and setup script

Adds /hooks/claude-code endpoint that speaks Claude Code's PreToolUse
hookSpecificOutput format, a Starlark policy tuned for Claude Code tools,
and scripts/setup-claude-hooks.sh to wire clashd as the policy engine
for Claude Code (builds, installs, launchd service, settings.json update).

* fix: address AI review feedback

- retry_gateway: use retry_if predicate so 4xx/non-retryable errors
  are not retried (was computing should_retry but ignoring it)
- persistent_context: remove unused Arc and Mutex imports
- mock channel: remove advertised but unimplemented control_port config
- README: fix model switch example (model_id -> model)
- style: cargo fmt --all (clashd missed in previous fmt commit)

* feat(clashd): add zeroclaw audit hook endpoint and agent setup script

Adds POST /hooks/zeroclaw-audit to receive zeroclaw webhook_audit
fire-and-forget payloads — evaluates against policy and logs/warns on
deny verdicts (monitoring only, zeroclaw does not read the response).

Adds scripts/setup-agents.sh: detect-or-install opencode (brew),
openclaw (npm), zeroclaw (brew) and wire clashd policy integration for
each — zeroclaw webhook_audit, openclaw exec-approvals (restricted+ask),
opencode plugin stub. Supports --configure-only, --install-only, --agents.

* refactor: rename security-gateway→security-proxy, clarify model gateway naming

security-gateway crate renamed to security-proxy (binary, package name,
workspace deps, crate path). Default port changed 8080→8888 to avoid
conflict with the model gateway (8080). Adds SECURITY_PROXY_PORT env var
override consistent with CLASHD_PORT.

Removes misleading "Alloy proxy/Alloy Model Proxy Server" labels from
the model gateway (proxy/mod.rs, handlers.rs, backend.rs) — alloys are
one routing feature; the gateway also includes Traceloop observability,
retries, Helicone, multi-provider routing, and local model management.

* feat: add unified install.sh — builds all binaries, wires all agents

Single entrypoint replacing setup-claude-hooks.sh + setup-agents.sh.
Builds zeroclawed, clashd, security-proxy (release), installs to
~/.local/bin/, creates launchd services for both clashd and security-proxy.

Wires clashd policy hooks for all four agents: Claude Code (PreToolUse
hook), opencode (plugin stub), openclaw (exec-approvals), zeroclaw
(webhook_audit + autonomy). Prompts before installing any missing tool
(--yes skips prompts; --configure-only skips installs entirely).

Correctly detects zeroclaw needing onboard before attempting service start.

* feat(install): add multi-node SSH deployment for Proxmox/homelab clusters

install.sh gains --nodes-file <path> and --nodes-only flags.
For each node in the JSON config: cross-compile locally (cross/zigbuild)
or fall back to building on the remote via SSH; rsync binary + policy
files; install systemd service (Linux) or launchd plist (macOS).

Adds deploy/nodes.example.json documenting the node config format
(host, user, ssh_key, arch, os, services, install_dir, config_dir).
Supports x86_64/aarch64 Linux and macOS nodes from one build machine.

* fix: resolve all clippy and fmt failures from CI

- security-proxy: security_gateway → security_proxy in integration tests
- adversary-detector: sort_by → sort_by_key
- host-agent/pct: collapse nested if into match guard
- zeroclawed: redundant field names (shorthand), manual checked_div,
  strip_suffix, collapsible if let in matrix.rs, useless vec!, assert!
  for bool, let-else for single-variant match in main.rs (extracted
  into_in_memory() method on UnifiedContextStore to avoid irrefutable
  pattern lint across feature configurations)

* fix: rename security-gateway to security-proxy in CI workflow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address all active Copilot review comments

- Delete three orphaned files never wired into the module tree
  (retry_gateway.rs, auth_test.rs, delegation.rs)
- Read mock channel control_port from config (was hardcoded 9090)
- Document mock /send as an intentional stub with clear scope note
- Remove unused context object in clashd claude-code hook; log
  cwd/session_id directly in the tracing fields instead
- Fix setup-claude-hooks.sh: propagate cargo build failures instead
  of masking with || true; gate launchctl section on macOS
- Share a single reqwest::Client across voice forward calls (OnceLock)
  with per-request timeout; add 30s timeout to run_hook subprocess
- Stop sending "Bearer no-key" to unauthenticated HTTP backends:
  pass None through BackendConfig and skip Authorization header
  when api_key is empty
- Fix README: model routing patterns (llama* → llama/*), local_models
  schema (path/backend/port → hf_id/provider_type + mlx_lm section)
- Add comment explaining persistent-context feature is intentionally
  deferred until plumbed through all channel/handler call sites

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@bglusman bglusman deleted the rebase/proxy-voice-onto-main branch May 1, 2026 17:21