feat(zeph-llm): GonkaProvider chat / chat_stream / embed (signed transport) #3611

@bug-ops

Part of epic #3602.

Scope

Implement GonkaProvider covering the non-tools subset of LlmProvider: chat, chat_stream, embed, embed_batch. Tool-calling and structured output land in #10.

Design

GonkaProvider owns:

  • inner: OpenAiProvider — used only for body construction (debug_request_json) and supports flags (e.g., supports_embeddings).
  • signer: Arc<RequestSigner>
  • pool: Arc<EndpointPool>
  • client: reqwest::Client from crate::http::llm_client()
  • timeout: Duration
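As a rough sketch of that ownership layout, the struct could look like the block below. The stand-in types (`OpenAiProvider`, `RequestSigner`, `EndpointPool`, `HttpClient`) are hypothetical stubs so the block compiles on its own, and the constructor and accessors are assumptions; none of these signatures are specified in this issue.

```rust
use std::sync::Arc;
use std::time::Duration;

// Hypothetical stand-ins so the sketch compiles on its own. The real types
// live in zeph-llm, and `client` would be a `reqwest::Client`.
#[derive(Debug, Default)]
pub struct OpenAiProvider;
#[derive(Debug, Default)]
pub struct RequestSigner;
#[derive(Debug, Default)]
pub struct EndpointPool;
#[derive(Debug, Default, Clone)]
pub struct HttpClient;

/// Field layout described in the list above.
#[allow(dead_code)]
#[derive(Debug)]
pub struct GonkaProvider {
    /// Used only to build request bodies and answer `supports_*` queries.
    inner: OpenAiProvider,
    /// Shared signer; signing takes `&self`, so no lock is needed.
    signer: Arc<RequestSigner>,
    /// Shared pool of candidate endpoints.
    pool: Arc<EndpointPool>,
    /// HTTP client (stand-in for `reqwest::Client`).
    client: HttpClient,
    /// Upper bound applied to every external await.
    timeout: Duration,
}

impl GonkaProvider {
    /// Hypothetical constructor; the real signature is not given in this issue.
    pub fn new(signer: Arc<RequestSigner>, pool: Arc<EndpointPool>, timeout: Duration) -> Self {
        Self {
            inner: OpenAiProvider,
            signer,
            pool,
            client: HttpClient,
            timeout,
        }
    }

    /// Timeout applied to each external call.
    pub fn timeout(&self) -> Duration {
        self.timeout
    }

    /// Shared handle to the signer.
    pub fn signer(&self) -> &Arc<RequestSigner> {
        &self.signer
    }
}
```

Holding the signer and pool behind `Arc` lets several provider instances (or clones) share them without copying key material or endpoint state.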

For each call:

  1. Build body via inner.debug_request_json(messages, &[], stream).
  2. body_bytes = serde_json::to_vec(&body)?.
  3. Loop: pick an endpoint, sign with a fresh timestamp_ns, and POST the raw bytes via RequestBuilder::body(body_bytes.clone()); on retry, re-sign with a fresh timestamp.
  4. Decode the response via the shared openai/wire.rs types (the extraction is part of this PR).
  5. Streaming: reuse crate::sse::openai_sse_to_stream(response) after the signed POST.
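The serialize-once, re-sign-per-attempt loop from steps 2 and 3 might be sketched as follows. This is a synchronous, std-only illustration: the `Signer` and `Transport` traits, `SignedRequest`, `ToySigner`, `FlakyTransport`, the round-robin endpoint choice, and the `max_attempts` parameter are all assumptions made for the example, not the crate's API. The real code would POST through `reqwest` and pick endpoints via `EndpointPool`.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical signed request: endpoint, timestamp, signature, raw body.
#[derive(Debug, Clone, PartialEq)]
pub struct SignedRequest {
    pub endpoint: String,
    pub timestamp_ns: u128,
    pub signature: Vec<u8>,
    pub body: Vec<u8>,
}

/// Hypothetical signer interface: signing only needs `&self`.
pub trait Signer {
    fn sign(&self, body: &[u8], timestamp_ns: u128) -> Vec<u8>;
}

/// Hypothetical transport; stands in for the HTTP POST.
pub trait Transport {
    fn post(&mut self, req: &SignedRequest) -> Result<Vec<u8>, String>;
}

/// Reuse the same body bytes on every attempt, but re-sign each attempt
/// with a fresh timestamp so a retry never replays an old signature.
pub fn send_signed<S: Signer, T: Transport>(
    signer: &S,
    transport: &mut T,
    endpoints: &[String],
    body_bytes: &[u8],
    mut now_ns: impl FnMut() -> u128,
    max_attempts: usize,
) -> Result<Vec<u8>, String> {
    if endpoints.is_empty() {
        return Err("no endpoints configured".into());
    }
    let mut last_err = String::from("no attempts made");
    for attempt in 0..max_attempts {
        // Round-robin is an assumption; EndpointPool may use another policy.
        let endpoint = endpoints[attempt % endpoints.len()].clone();
        let timestamp_ns = now_ns();
        let signature = signer.sign(body_bytes, timestamp_ns);
        let req = SignedRequest { endpoint, timestamp_ns, signature, body: body_bytes.to_vec() };
        match transport.post(&req) {
            Ok(resp) => return Ok(resp),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}

/// Wall-clock nanoseconds since the Unix epoch.
pub fn unix_now_ns() -> u128 {
    SystemTime::now().duration_since(UNIX_EPOCH).map(|d| d.as_nanos()).unwrap_or(0)
}

/// Toy signer for the sketch only; NOT a real signature scheme.
pub struct ToySigner;
impl Signer for ToySigner {
    fn sign(&self, body: &[u8], timestamp_ns: u128) -> Vec<u8> {
        let mut out = timestamp_ns.to_be_bytes().to_vec();
        out.push(body.iter().fold(0u8, |acc, b| acc ^ b));
        out
    }
}

/// Toy transport that fails a fixed number of times and records each request.
pub struct FlakyTransport {
    pub failures_left: usize,
    pub seen: Vec<SignedRequest>,
}
impl Transport for FlakyTransport {
    fn post(&mut self, req: &SignedRequest) -> Result<Vec<u8>, String> {
        self.seen.push(req.clone());
        if self.failures_left > 0 {
            self.failures_left -= 1;
            return Err(format!("{} unavailable", req.endpoint));
        }
        Ok(b"ok".to_vec())
    }
}
```

Passing the clock in as a closure keeps the loop deterministic under test; production code would pass `unix_now_ns`. Because the body is serialized once, the bytes that are signed are exactly the bytes that go on the wire.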

Files to create / modify

  • crates/zeph-llm/src/openai/wire.rs (new) — extract OpenAiChatResponse, OpenAiUsage, ChatChoice, ChatMessage, EmbeddingResponse, EmbeddingData from openai/mod.rs (currently pub(crate)); make pub(crate) and re-import in both modules.
  • crates/zeph-llm/src/gonka/mod.rs — GonkaProvider struct, partial LlmProvider impl (chat / chat_stream / embed / embed_batch / name / model_identifier / supports_*).
  • crates/zeph-llm/src/gonka/tests.rs — wiremock test:
    • Records inbound request, verifies Authorization is base64 of 64 bytes, X-Timestamp is decimal nanoseconds, X-Requester-Address matches the signer.
    • Returns canned OpenAI-shaped response; asserts decoded ChatResponse::Text matches.
    • Streaming test: returns SSE; asserts streamed deltas decoded correctly.
  • insta snapshot of the signed-request body for drift detection.
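The header checks in the wiremock test could be backed by small helpers like the ones below. This is a std-only sketch: the function names are hypothetical, and it assumes the signature uses the standard padded base64 alphabet (RFC 4648), which the issue does not state. A real test would more likely call a base64 crate.

```rust
/// Decode standard-alphabet, padded base64 (RFC 4648).
/// Returns `None` on malformed input. Lenient about unused trailing bits.
pub fn base64_decode(input: &str) -> Option<Vec<u8>> {
    let bytes = input.as_bytes();
    if bytes.len() % 4 != 0 {
        return None;
    }
    // Count trailing '=' padding; more than two is never valid.
    let pad = bytes.iter().rev().take_while(|&&b| b == b'=').count();
    if pad > 2 {
        return None;
    }
    let data = &bytes[..bytes.len() - pad];
    let mut out = Vec::with_capacity(bytes.len() / 4 * 3);
    let mut acc: u32 = 0;
    let mut bits: u32 = 0;
    for &c in data {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            _ => return None, // includes '=' appearing before the end
        };
        acc = (acc << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// True if `value` is base64 for exactly 64 raw bytes.
pub fn is_signature_header(value: &str) -> bool {
    base64_decode(value).map_or(false, |raw| raw.len() == 64)
}

/// True if `value` is a non-empty run of ASCII digits that fits in `u128`.
pub fn is_decimal_nanos(value: &str) -> bool {
    !value.is_empty()
        && value.bytes().all(|b| b.is_ascii_digit())
        && value.parse::<u128>().is_ok()
}
```

In the test, `is_signature_header` would be applied to the recorded Authorization value and `is_decimal_nanos` to X-Timestamp; rejecting signs, exponents, and decimal points guards against a timestamp accidentally formatted as a float.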

Await discipline (per .claude/rules/rust-code.md)

  • Every external .await wrapped in tokio::time::timeout(self.timeout, ...).
  • Tracing spans: llm.gonka.request, llm.gonka.sign, llm.gonka.endpoint.next.
  • debug! before / after each await.
  • No locks held across await; signer is shared via Arc and signing is &self.
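The last point, sharing the signer through `Arc` with a `&self` signing method, can be illustrated without any lock at all. The sketch below uses OS threads rather than async tasks to stay std-only; `RequestSigner` here is a hypothetical stand-in with a toy checksum in place of a real signature, and `sign_concurrently` is an illustrative helper, not part of the crate.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical signer: immutable key material, so `sign` can take `&self`.
pub struct RequestSigner {
    key: [u8; 4],
}

impl RequestSigner {
    pub fn new(key: [u8; 4]) -> Self {
        Self { key }
    }

    /// Toy keyed checksum, NOT a real signature. Only the `&self`
    /// receiver matters for this illustration.
    pub fn sign(&self, body: &[u8], timestamp_ns: u64) -> u64 {
        let mut acc = timestamp_ns;
        for (i, b) in body.iter().enumerate() {
            acc = acc.rotate_left(5) ^ u64::from(*b ^ self.key[i % self.key.len()]);
        }
        acc
    }
}

/// Sign the same body from several workers without a Mutex: each worker
/// clones the `Arc` and calls `sign` through a shared reference.
pub fn sign_concurrently(
    signer: Arc<RequestSigner>,
    body: &'static [u8],
    workers: u64,
) -> Vec<u64> {
    let handles: Vec<_> = (0..workers)
        .map(|ts| {
            let signer = Arc::clone(&signer);
            thread::spawn(move || signer.sign(body, ts))
        })
        .collect();
    // Join in spawn order so results line up with the timestamps used.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```

Because nothing in `sign` mutates the signer, there is no guard that could accidentally be held across an `.await` in the async version of the same code.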

Acceptance

  • cargo nextest run -p zeph-llm -E 'test(gonka)' green.
  • cargo clippy --workspace --features full -- -D warnings green.
  • cargo +nightly fmt --check green.
  • cargo insta test --workspace --features full --check --lib --bins green.
  • All pub items have /// doc comments with # Examples.
  • CHANGELOG.md [Unreleased] updated.

Depends on

#3607, #3608, #3609, #3610.

Size

L (~8h)

Labels

  • P1: High ROI, low complexity; do next sprint
  • enhancement: New feature or request
  • feature: New functionality
  • llm: zeph-llm crate (Ollama, Claude)
  • size/L: Large PR (201-500 lines)
  • streaming: LLM response streaming
