Skip to content

Refactor clients.py to add modern features and more flexible configuration#459

Merged
VVoruganti merged 46 commits into
mainfrom
vineeth/dev-1454-full
Apr 20, 2026
Merged

Refactor clients.py to add modern features and more flexible configuration#459
VVoruganti merged 46 commits into
mainfrom
vineeth/dev-1454-full

Conversation

@VVoruganti

@VVoruganti VVoruganti commented Mar 28, 2026

Copy link
Copy Markdown
Collaborator

Summary by CodeRabbit

  • Configuration Updates

    • Switched to per-feature nested MODEL_CONFIG env keys with per-feature base‑URL/API‑key overrides; added explicit embedding settings (vector dims, token limits) and simplified OpenAI API‑key guidance.
  • New Features

    • Added transport-backed support (OpenAI/Anthropic/Gemini), per-feature thinking‑budget/model overrides, structured-output validation & repair, Gemini prompt‑caching, and runtime embedding/vector-store dimension checks.
  • Removed

    • Deprecated legacy flat provider env keys and obsolete provider types.
  • Docs & Tests

    • Updated docs and expanded unit and live integration tests.

VVoruganti and others added 3 commits March 26, 2026 16:24
… and Gemini thinking budget support

LengthFinishReasonError from OpenAI-compatible providers (custom, openai, groq) was crashing the deriver
with 14k+ occurrences in production. The vLLM path already had repair logic but it was gated on
provider=="vllm", unreachable when routing through litellm as a custom provider.

- Extract shared _repair_response_model_json() helper for all providers
- Catch LengthFinishReasonError in OpenAI/custom parse() path and repair truncated JSON
- Add repair fallback to Anthropic and Gemini response_model paths
- Add repair fallback to Groq response_model path
- Pass thinking_budget_tokens to Gemini 2.5 models via thinking_config
- Add 14 tests covering repair paths for all providers and Gemini thinking budget

Fixes HONCHO-YC

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Mar 28, 2026

Copy link
Copy Markdown
Contributor

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b175a974-a5ad-4958-88dc-f5c89f980291

📥 Commits

Reviewing files that changed from the base of the PR and between 7020d8c and a151bf6.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • CHANGELOG.md
  • README.md
  • docs/v3/guides/integrations/paperclip.mdx
  • pyproject.toml

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.


Walkthrough

Restructures runtime/configuration to per-feature nested MODEL_CONFIGs; introduces a centralized src/llm package (honcho_llm_call) with provider backends (Anthropic/Gemini/OpenAI), request/runtime planning, structured-output validation/repair, Gemini caching, embedding config/validation, registry, and widespread callsite/test migrations to ModelConfig-based APIs.

Changes

Cohort / File(s) Summary
Config templates & docs
\.env.template, config.toml.example, docs/v3/contributing/configuration.mdx, docs/v3/contributing/self-hosting.mdx, docs/v3/contributing/troubleshooting.mdx, README.md
Replace flat PROVIDER/MODEL env vars with nested *_MODEL_CONFIG__* keys; add embedding VECTOR_DIMENSIONS/MAX_INPUT_TOKENS/MAX_TOKENS_PER_REQUEST; document supported transports and per-feature base_url/API-key overrides.
Core settings & schema
src/config.py
Add ModelConfig/ConfiguredModelSettings/EmbeddingSettings, legacy provider normalization, thinking-budget validation, resolve_model_config/resolve_embedding_model_config, and embedding→vector-store dimension propagation with validation.
LLM public API & types
src/llm/__init__.py, src/llm/types.py
New LLM package public surface and typed models for honcho calls, streaming chunks, reasoning/verbosity types, and ProviderClient union.
Orchestration & runtime
src/llm/api.py, src/llm/executor.py, src/llm/runtime.py, src/llm/tool_loop.py, src/llm/request_builder.py
Add honcho_llm_call orchestration (attempt planning, retries, trace/track hooks), single-attempt executor, tool-loop orchestration, per-attempt planning and request-building helpers.
Backend interfaces & registry
src/llm/backend.py, src/llm/registry.py, src/llm/credentials.py
Introduce ProviderBackend protocol and normalized CompletionResult/StreamChunk/ToolCallResult; client factories, cached override clients, credential resolution, and get_backend/client_for_model_config helpers.
Provider implementations
src/llm/backends/...
src/llm/backends/anthropic.py, .../gemini.py, .../openai.py, .../__init__.py
Add Anthropic/Gemini/OpenAI backend adapters implementing complete/stream, structured-output parsing, thinking-config translation, Gemini cached-content integration and schema sanitization.
Structured output & caching
src/llm/structured_output.py, src/llm/caching.py
Structured-output validation/repair APIs and policies; deterministic Gemini cache-key builder and in-memory LRU cache store.
Embedding client & vector-store
src/embedding_client.py, src/vector_store/lancedb.py
Embedding client becomes config-driven (EmbeddingModelConfig), removes OpenRouter branch, enforces returned vector-dimension validation; LanceDB schema uses configured embedding.vector_dimensions.
Callsite migrations
src/deriver/deriver.py, src/dialectic/core.py, src/dreamer/specialists.py, src/utils/summarizer.py
Callsites updated to pass model_config/ConfiguredModelSettings instead of legacy llm_settings and per-call args; output-token and thinking controls moved into model-config.
Client surface & utils rewrite
src/utils/clients.py, src/utils/types.py, src/telemetry/reasoning_traces.py
Refactor to runtime model-config selection, backend+history-adapter flow; remove legacy providers (groq/custom/vllm) and migrate telemetry to use model_config fields.
Embedding refs & validations
src/crud/*, src/schemas/api.py, src/utils/*, tests/integration/test_message_embeddings.py
Replace legacy embedding token-limit refs with settings.EMBEDDING.MAX_INPUT_TOKENS; adjust messages and propagate EMBEDDING.VECTOR_DIMENSIONS.
Tests & live infra
pyproject.toml, tests/**, tests/live_llm/**
Remove groq dep; add pytest markers; many new/updated tests for backends, request flow, structured-output repair, caching, FakeBackend fixture, live-LLM matrix and helpers; update mocks to new src.llm API.
Misc & housekeeping
various small files
Minor logging/test-fixture signature changes, stricter agent tool schemas, small docs/template edits and some test deletions/additions.

Sequence Diagram(s)

sequenceDiagram
  rect rgba(220,240,255,0.5)
  participant Client as Caller
  participant Orch as honcho_llm_call
  participant Planner as RuntimePlanner
  participant Registry as BackendRegistry
  participant Backend as ProviderBackend
  end

  Client->>Orch: honcho_llm_call(model_config, prompt, ...)
  Orch->>Planner: plan_attempt / effective_config_for_call
  Orch->>Registry: client_for_model_config / get_backend
  Orch->>Backend: execute_completion/stream(model, messages, max_tokens, extra_params...)
  Backend-->>Orch: CompletionResult / StreamChunks
  Orch-->>Client: HonchoLLMCallResponse / StreamingResponseWithMetadata
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • Rajat-Ahuja1997
  • dr-frmr

Poem

🐰 I hopped through configs, tidy and bright,
New MODEL_CONFIGs set each feature right.
Backends hum—Anthropic, Gemini, OpenAI—
Caches and schemas neat beneath the sky.
The rabbit cheers the refactor's clear light!

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch vineeth/dev-1454-full

@VVoruganti VVoruganti marked this pull request as ready for review March 30, 2026 15:13

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/embedding_client.py (1)

29-51: ⚠️ Potential issue | 🟠 Major

Reject unsupported embedding providers instead of silently routing them to OpenAI.

Lines 32-51 and 365-369 currently treat every non-gemini value as OpenAI. A typo or a lingering legacy provider value will therefore instantiate the OpenAI client with the wrong credentials/base URL and fail much later than the actual config mistake.

🐛 Suggested fix
+from src.exceptions import ValidationException
+
 from .config import settings
-        else:  # openai
+        elif self.provider == "openai":
             if api_key is None:
                 api_key = settings.LLM.OPENAI_API_KEY
             if not api_key:
                 raise ValueError("OpenAI API key is required")
             self.client = AsyncOpenAI(api_key=api_key)
             self.model = "text-embedding-3-small"
             self.max_embedding_tokens = settings.MAX_EMBEDDING_TOKENS
             self.max_batch_size = 2048  # OpenAI batch limit
+        else:
+            raise ValidationException(
+                f"Unsupported embedding provider: {self.provider}"
+            )
-                    if provider == "gemini":
-                        api_key = settings.LLM.GEMINI_API_KEY
-                    else:
-                        api_key = settings.LLM.OPENAI_API_KEY
+                    if provider == "gemini":
+                        api_key = settings.LLM.GEMINI_API_KEY
+                    elif provider == "openai":
+                        api_key = settings.LLM.OPENAI_API_KEY
+                    else:
+                        raise ValidationException(
+                            f"Unsupported embedding provider: {provider}"
+                        )

As per coding guidelines, "Use explicit error handling with appropriate exception types" and "Use custom exceptions defined in src/exceptions.py with specific exception types (ResourceNotFoundException, ValidationException, etc.)".

Also applies to: 362-373

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/embedding_client.py` around lines 29 - 51, The constructor (__init__)
currently treats any non-"gemini" provider as OpenAI, which hides configuration
typos; change __init__ to explicitly accept only supported providers ("gemini"
and "openai") and raise a ValidationException (from src.exceptions import
ValidationException) for any other value of self.provider, ensuring you do not
silently instantiate AsyncOpenAI for unknown providers; keep the existing gemini
and openai branches (symbols: self.provider, genai.Client, AsyncOpenAI,
self.model, self.client, self.max_embedding_tokens, self.max_batch_size) but add
the explicit else that raises ValidationException with a clear message. Also
apply the same explicit-provider validation and exception raising to the other
provider-handling block in this file (the second location handling provider
around lines ~362-373) so both paths reject unsupported providers instead of
defaulting to OpenAI.
src/dreamer/specialists.py (1)

207-228: ⚠️ Potential issue | 🟠 Major

Configured specialist token caps are still a no-op here.

This path always passes an explicit max_tokens=self.get_max_tokens(), so the new DREAM_MODEL_CONFIG.max_output_tokens / specialist inherited overrides never influence the actual LLM request.

🔧 Minimal fix
         model_config = self.get_model_config()
+        max_tokens = model_config.max_output_tokens or self.get_max_tokens()
 
         # Track iterations via callback
         iteration_count = 0
@@
         response: HonchoLLMCallResponse[str] = await honcho_llm_call(
             model_config=model_config,
             prompt="",  # Ignored since we pass messages
-            max_tokens=self.get_max_tokens(),
+            max_tokens=max_tokens,
             tools=self.get_tools(peer_card_enabled=peer_card_enabled),
             tool_choice=None,
             tool_executor=tool_executor,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/dreamer/specialists.py` around lines 207 - 228, The code always passes
max_tokens=self.get_max_tokens() into honcho_llm_call, which ignores per-model
or specialist overrides (e.g. DREAM_MODEL_CONFIG.max_output_tokens); update the
honcho_llm_call invocation in this function so max_tokens is derived from the
resolved model_config (use model_config.max_output_tokens when set, falling back
to self.get_max_tokens()), thereby ensuring get_model_config() and any
specialist overrides actually control the LLM request.
🧹 Nitpick comments (10)
tests/unit/llm/test_backends/test_openai.py (1)

65-68: Unusual return type annotation formatting.

The function return type annotation is split across multiple lines in an unconventional way. While technically valid Python, it's non-standard.

-async def test_openai_backend_passes_thinking_effort_through_for_non_gpt5_models() -> (
-    None
-):
+async def test_openai_backend_passes_thinking_effort_through_for_non_gpt5_models() -> None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/llm/test_backends/test_openai.py` around lines 65 - 68, The test
function test_openai_backend_passes_thinking_effort_through_for_non_gpt5_models
has its return type annotation split across lines; consolidate the annotation so
the signature is on a single line (e.g., move "-> None" to the same line as the
def) to match standard Python formatting and improve readability.
tests/unit/llm/conftest.py (1)

9-26: Clean test double implementation.

The FakeBackend provides a useful test helper for capturing call arguments and serving canned responses. One consideration: if tests don't queue enough responses, next(self._responses) will raise StopIteration, which will bubble up as-is. This is generally acceptable for tests (failing clearly), but a more descriptive error could aid debugging.

💡 Optional: wrap with descriptive error
     async def complete(self, **kwargs: Any) -> CompletionResult:
         self.calls.append(kwargs)
-        return next(self._responses)
+        try:
+            return next(self._responses)
+        except StopIteration:
+            raise RuntimeError("FakeBackend: no more queued responses") from None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/llm/conftest.py` around lines 9 - 26, The FakeBackend test double
currently calls next(self._responses) in both complete and stream which will
raise StopIteration with no context if tests run out of canned responses; update
FakeBackend (constructor: _responses, methods: complete and stream) to catch
StopIteration around next(self._responses) and raise a more descriptive error
(e.g. RuntimeError or AssertionError) that includes the method name and a hint
that no more fake responses were queued, so test failures clearly indicate
missing canned responses.
src/llm/__init__.py (1)

55-80: Don't memoize override clients by raw credential tuples.

@cache keeps every (base_url, api_key) pair for the full process lifetime. If these overrides are test-, tenant-, or rotation-specific, this turns into an unbounded client cache and keeps secret-bearing keys resident in memory longer than necessary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/__init__.py` around lines 55 - 80, The override client factory
functions (_get_openai_override_client, _get_anthropic_override_client,
_get_groq_override_client, _get_gemini_override_client) must not be memoized by
raw (base_url, api_key) tuples because that caches secret-bearing keys for the
process lifetime; remove the `@cache` decorator from those functions and make them
return a fresh client per call (or replace with a bounded/ephemeral cache if you
truly need caching), and update callers to create/use/close clients per
operation rather than relying on a global persistent cache.
src/utils/clients.py (1)

616-638: Groq provider uses OpenAIHistoryAdapter by default.

_history_adapter_for_provider returns OpenAIHistoryAdapter for both "openai" and "groq" providers (via the fallback return). This is likely intentional since Groq uses OpenAI-compatible APIs, but consider adding an explicit case for clarity.

💡 Explicit groq handling
 def _history_adapter_for_provider(
     provider: SupportedProviders,
 ) -> AnthropicHistoryAdapter | GeminiHistoryAdapter | OpenAIHistoryAdapter:
     if provider == "anthropic":
         return AnthropicHistoryAdapter()
     if provider == "google":
         return GeminiHistoryAdapter()
+    # Groq uses OpenAI-compatible message format
     return OpenAIHistoryAdapter()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/clients.py` around lines 616 - 638, The
_history_adapter_for_provider function currently falls through to
OpenAIHistoryAdapter for both "openai" and "groq"; make the Groq handling
explicit by adding an if branch for provider == "groq" that returns
OpenAIHistoryAdapter(), keeping the existing return type and behavior; update
the function body in _history_adapter_for_provider to include the explicit
"groq" case (so callers like _history_adapter_for_provider("groq") clearly
return OpenAIHistoryAdapter).
src/llm/request_builder.py (1)

102-152: Consider extracting shared parameter-building logic to reduce duplication.

execute_stream duplicates the entire parameter-building logic from execute_completion. This could be extracted into a shared helper.

♻️ Suggested extraction (conceptual)
def _build_request_params(
    config: ModelConfig,
    max_tokens: int,
    cache_policy: PromptCachePolicy | None,
    extra_params: dict[str, Any] | None,
) -> tuple[ModelCapabilities, dict[str, Any], int, int | None, dict[str, Any]]:
    capabilities = get_model_capabilities(config)
    credentials = resolve_credentials(config)
    thinking_budget_tokens = (
        config.thinking_budget_tokens
        if config.thinking_budget_tokens is not None
        and config.thinking_budget_tokens > 0
        else None
    )
    requested_output_tokens = config.max_output_tokens or max_tokens
    effective_max_tokens = _adjust_max_tokens_for_explicit_budget(
        capabilities, requested_output_tokens, thinking_budget_tokens
    )
    merged_extra_params = {
        **_build_config_extra_params(config),
        **(extra_params or {}),
    }
    if cache_policy is not None:
        merged_extra_params["cache_policy"] = cache_policy
    return capabilities, credentials, effective_max_tokens, thinking_budget_tokens, merged_extra_params
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/request_builder.py` around lines 102 - 152, execute_stream duplicates
the parameter-building logic from execute_completion; extract that shared logic
into a helper (e.g., _build_request_params) that returns capabilities,
credentials, effective_max_tokens, thinking_budget_tokens, and
merged_extra_params. Replace the duplicated block in execute_stream with a call
to the new helper and pass returned values into backend.stream; do the same
refactor in execute_completion so both functions reuse the helper and avoid
divergent logic (refer to execute_stream, execute_completion,
_adjust_max_tokens_for_explicit_budget, _build_config_extra_params, and
resolve_credentials).
src/llm/backends/openai.py (1)

263-274: Model-specific branching uses string containment check.

The "gpt-5" in model check is fragile and may match unintended models (e.g., hypothetical "not-gpt-5-model"). Consider a more explicit check or extracting this to a model capabilities lookup.

💡 Alternative approach
GPT5_MODEL_PREFIXES = ("gpt-5", "o1-", "o3-")  # Models requiring max_completion_tokens

def _is_reasoning_model(model: str) -> bool:
    return any(model.startswith(prefix) for prefix in GPT5_MODEL_PREFIXES)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/backends/openai.py` around lines 263 - 274, The model check using
`"gpt-5" in model` is fragile; replace it with an explicit capability lookup
helper (e.g., add a GPT5_MODEL_PREFIXES tuple and implement a function like
_is_reasoning_model(model: str) that returns any(model.startswith(prefix) for
prefix in GPT5_MODEL_PREFIXES)) and then use that helper in the openai.py
branching where the code sets params["max_completion_tokens"] vs
params["max_tokens"] (the block containing the "gpt-5" in model check and the
thinking_effort/verbosity handling); update references to
extra_params/temperature logic to use the same helper so only intended models
use max_completion_tokens and reasoning flags.
src/llm/backends/groq.py (1)

58-76: Add defensive check for empty choices array.

Line 59 accesses response.choices[0] without verifying that choices exist. While unlikely in normal operation, a malformed response could cause an IndexError.

🛡️ Defensive check
         response = await self._client.chat.completions.create(**params)
-        if response.choices[0].message.content is None:
+        if not response.choices:
+            raise ValueError("No choices in response")
+        if response.choices[0].message.content is None:
             raise ValueError("No content in response")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/backends/groq.py` around lines 58 - 76, Add a defensive check that
response.choices is non-empty before accessing response.choices[0] in the chat
completion handling after calling self._client.chat.completions.create; if
choices is empty or None, raise a clear ValueError (or handle gracefully) with
context (e.g., include response metadata) instead of indexing into it. Update
the block that reads response.choices[0].message.content,
response.choices[0].finish_reason, and the JSON parsing/repair code paths
(response_format.model_validate and repair_response_model_json) to only run
after this validation so you never trigger IndexError on a malformed response.
tests/live_llm/test_live_gemini.py (1)

146-157: Consider iterable unpacking for cleaner list construction.

The static analysis tool flags this concatenation pattern. Using unpacking is slightly more idiomatic and avoids intermediate list creation.

♻️ Suggested refactor
-    replay_messages = initial_messages + [
-        adapter.format_assistant_tool_message(first),
-        *adapter.format_tool_results(
-            [
-                {
-                    "tool_id": tool_call.id,
-                    "tool_name": tool_call.name,
-                    "result": tool_result,
-                }
-            ]
-        ),
-    ]
+    replay_messages = [
+        *initial_messages,
+        adapter.format_assistant_tool_message(first),
+        *adapter.format_tool_results(
+            [
+                {
+                    "tool_id": tool_call.id,
+                    "tool_name": tool_call.name,
+                    "result": tool_result,
+                }
+            ]
+        ),
+    ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/live_llm/test_live_gemini.py` around lines 146 - 157, Replace the
current list concatenation when building replay_messages with iterable unpacking
to avoid creating intermediate lists: spread initial_messages, then include
adapter.format_assistant_tool_message(first) as an element, and spread the
result of adapter.format_tool_results(...) so the final construction uses
[*initial_messages, adapter.format_assistant_tool_message(first),
*adapter.format_tool_results(...)] while keeping the same input dict for
tool_call (tool_call.id, tool_call.name, tool_result) to preserve behavior.
.env.template (1)

97-111: Minor: Variable ordering could be more consistent.

DERIVER_DEDUPLICATE (line 100) is interleaved within the MODEL_CONFIG variables, breaking the logical grouping. Consider moving it before or after the MODEL_CONFIG block for better readability.

📝 Suggested reordering
 # DERIVER_MODEL_CONFIG__TRANSPORT=gemini
 # DERIVER_MODEL_CONFIG__MODEL=gemini-2.5-flash-lite
 # DERIVER_MODEL_CONFIG__TEMPERATURE=
-# DERIVER_DEDUPLICATE=true
 # DERIVER_MODEL_CONFIG__MAX_OUTPUT_TOKENS=4096
 # DERIVER_MODEL_CONFIG__THINKING_BUDGET_TOKENS=1024
+# DERIVER_DEDUPLICATE=true
 # DERIVER_LOG_OBSERVATIONS=false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.env.template around lines 97 - 111, Move the DERIVER_DEDUPLICATE variable
out of the middle of the DERIVER_MODEL_CONFIG__* block so related MODEL_CONFIG
keys stay grouped; specifically, relocate DERIVER_DEDUPLICATE (currently between
DERIVER_MODEL_CONFIG__TEMPERATURE and DERIVER_MODEL_CONFIG__MAX_OUTPUT_TOKENS)
to either just before the DERIVER_MODEL_CONFIG__* block or immediately after it,
keeping all DERIVER_MODEL_CONFIG__* entries contiguous for clearer grouping and
ordering.
src/llm/backends/anthropic.py (1)

74-90: Consider extracting duplicated schema injection logic.

The schema injection (schema_json creation and appending to message content) is duplicated in lines 80-84 and 86-90. While the conditions differ, the core logic is identical.

♻️ Optional refactor to reduce duplication
+        # Inject JSON schema into the last message if response_format is a Pydantic model
+        if response_format and isinstance(response_format, type):
+            schema_json = json.dumps(response_format.model_json_schema(), indent=2)
+            params["messages"][-1]["content"] += (
+                f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
+            )
+
         use_json_prefill = (
             bool(response_format or self._json_mode(extra_params))
             and not thinking_budget_tokens
             and self._supports_assistant_prefill(model)
         )
         if use_json_prefill:
-            if response_format and isinstance(response_format, type):
-                schema_json = json.dumps(response_format.model_json_schema(), indent=2)
-                params["messages"][-1]["content"] += (
-                    f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
-                )
             params["messages"].append({"role": "assistant", "content": "{"})
-        elif response_format and isinstance(response_format, type):
-            schema_json = json.dumps(response_format.model_json_schema(), indent=2)
-            params["messages"][-1]["content"] += (
-                f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
-            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/backends/anthropic.py` around lines 74 - 90, Move the duplicated
schema injection into a small helper and call it from both branches: create a
helper (e.g., _append_schema_to_last_message or build_and_append_schema) that
takes response_format and params (and uses response_format.model_json_schema(),
json.dumps, and appends to params["messages"][-1]["content"]); then replace the
duplicate blocks in the use_json_prefill branch and the else branch with calls
to that helper. Ensure the helper is only invoked when response_format is truthy
and a type, and keep the existing use_json_prefill logic (including appending
the assistant "{" when applicable) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/v3/contributing/configuration.mdx`:
- Line 402: The example line "DERIVER_MODEL_CONFIG__TEMPERATURE=  # Optional
temperature override (unset by default)" currently assigns an empty string;
change the docs to avoid a blank env assignment by either removing the
assignment or showing it commented out and providing a concrete example value —
e.g. replace with a note that the variable is unset by default and show
"DERIVER_MODEL_CONFIG__TEMPERATURE=0.7" as an example, or present "#
DERIVER_MODEL_CONFIG__TEMPERATURE=" to indicate omission; ensure the symbol
DERIVER_MODEL_CONFIG__TEMPERATURE is referenced and the text explicitly says
"omit the variable to leave it unset" rather than showing a blank assignment.

In `@src/config.py`:
- Around line 601-604: The issue is that nested MODEL_CONFIG blocks get fully
replaced by TOML/env partial overrides instead of being merged, causing
validation failures when only a few nested keys (e.g., MAX_TOOL_ITERATIONS or
TOOL_CHOICE) are supplied. Fix by giving MODEL_CONFIG a proper default instance
so Pydantic will merge values rather than drop the whole object: change the
annotation to a concrete field with a default_factory that constructs a
ConfiguredModelSettings (e.g., replace MODEL_CONFIG:
Annotated[ConfiguredModelSettings, Field(validation_alias="model_config")] with
a field like model_config: ConfiguredModelSettings =
Field(default_factory=ConfiguredModelSettings, validation_alias="model_config")
and apply the same pattern for other nested setting blocks referenced in
AppSettings to ensure partial overrides merge correctly.
- Around line 20-21: The dotenv loading currently calls
load_dotenv(override=True) which lets .env values overwrite existing environment
variables; update the call in src/config.py (the load_dotenv invocation guarded
by os.getenv("PYTHON_DOTENV_DISABLED")) to use override=False so that existing
environment variables take precedence over .env file entries, preserving
deployment-time configuration order.

In `@src/llm/backends/anthropic.py`:
- Around line 82-85: The code mutates the incoming request_messages by directly
editing params["messages"][-1]["content"] and appending to params["messages"],
which can produce side effects; fix it by creating a copy of the messages before
modification (e.g., make params["messages"] = [m.copy() for m in
request_messages] or use copy.deepcopy(request_messages]) and then modify the
copied last message content and append the assistant message to the copied list
so request_messages remains unchanged; update the logic around
params["messages"] and any use of request_messages to operate on the copied list
instead.
- Around line 149-153: The stream() implementation uses a different JSON-prefill
check than complete(): update stream() to compute use_json_prefill the same way
as complete() by calling self._json_mode(extra_params) (in addition to checking
response_format, thinking_budget_tokens, and
self._supports_assistant_prefill(model)), and ensure extra_params is not deleted
before this check (or pass the original extra_params into _json_mode) so JSON
prefill triggers when extra_params={"json_mode": True}; reference the stream()
and complete() functions, the use_json_prefill variable, and the helper
_json_mode(extra_params) when making the change.

In `@src/llm/backends/gemini.py`:
- Around line 192-200: The code currently treats thinking_budget_tokens and
thinking_effort with truthiness checks which drop explicit zero values (e.g.,
thinking_budget_tokens = 0) so Gemini's thinking_config is omitted; update the
conditional logic in gemini.py around thinking_config, thinking_budget_tokens,
and thinking_effort to test for presence explicitly (e.g., use "is not None" or
explicit comparisons) rather than truthiness, so zeros are preserved when set,
and keep the existing mutual-exclusion check (len(thinking_config) > 1) intact.
- Around line 68-86: When cache_policy is a PromptCachePolicy (detected in
cache_policy and handled by _attach_cached_content), avoid re-sending the full
cached prompt to self._client.aio.models.generate_content: after
_attach_cached_content returns (it provides the cached_content handle), filter
or replace the original contents passed to generate_content so it contains only
the new/suffix messages (not the already-cached messages). Update the flow
around the cache check in complete() to detect PromptCachePolicy, obtain the
cached_content from _attach_cached_content, compute the suffix/new messages from
the original contents, and pass only that suffix as contents to generate_content
(keeping config and the cached_content handle as needed).

In `@src/llm/backends/groq.py`:
- Around line 38-42: Replace the silent deletion of tools and tool_choice in the
Groq backend with an explicit error: in src/llm/backends/groq.py where variables
tools and tool_choice are currently deleted (and alongside the existing check
for thinking_budget_tokens/thinking_effort), add a guard that raises
ValueError("Groq backend does not support tool calling") when either tools is
not None or tool_choice is not None; keep the existing ValueError for
thinking_budget_tokens/thinking_effort intact (look for the GroqBackend
constructor or the function handling these kwargs to locate the spot).

In `@src/llm/backends/openai.py`:
- Around line 146-159: The current bare except around the fallback path masks
error details; update the handler in the block that calls
_create_structured_response, _parse_or_repair_structured_content, and
_normalize_response to either catch specific exceptions you expect (e.g.,
ValueError, JSONDecodeError, OpenAIError) or, if you must catch all exceptions,
change it to except Exception as e: and log the exception (type and message)
using the module/logger (e.g., self.logger.exception or process_logger.error)
before proceeding to build the fallback_response so failures are visible in
logs.

In `@src/llm/caching.py`:
- Around line 45-65: The InMemoryGeminiCacheStore leaks expired entries because
eviction only happens in get(); update the set method (and/or constructor) to
purge expired handles before inserting to prevent unbounded growth: inside
InMemoryGeminiCacheStore.set (and while holding self._lock) iterate through
self._handles and remove any entries whose expires_at <=
datetime.now(timezone.utc) before assigning self._handles[handle.key] = handle
(ensure you use the same timezone-aware comparison as in get); this minimal
change reclaims expired entries on writes and prevents memory leaks for keys
that are never read again.

In `@src/llm/credentials.py`:
- Around line 3-25: The function _default_transport_credentials currently raises
a bare ValueError for unknown transports; change this to raise the project's
ValidationException (imported from src.exceptions) so misconfigurations use the
repo's typed exception system—update the imports at the top of
src/llm/credentials.py to include ValidationException, and replace the final
raise ValueError(f"Unknown transport: {transport}") with raise
ValidationException(f"Unknown transport: {transport}") so resolve_credentials
and callers receive the correct exception type.

In `@src/llm/structured_output.py`:
- Around line 82-92: The repair path currently only catches
StructuredOutputError in attempt_structured_output_repair when calling
repair_response_model_json, so pydantic ValidationError can still bubble and
bypass failure_policy; import and catch pydantic's ValidationError (e.g., from
pydantic import ValidationError) alongside StructuredOutputError and return None
on either exception so the failure_policy branches still run, and make the
identical change to the other equivalent try/except block later in the file that
follows the same pattern.

In `@src/utils/clients.py`:
- Around line 1542-1601: Remove the dead helper _repair_response_model_json (and
its pyright ignore) because the project uses the public
repair_response_model_json instead; delete the entire function definition for
_repair_response_model_json and any exclusively-used local references (and clean
up now-unused imports if they become unused after removal), ensuring all callers
use repair_response_model_json (from structured_output) instead.

In `@tests/live_llm/conftest.py`:
- Around line 38-45: The require_provider_key helper raises KeyError for
model_spec.provider == "groq" because the mapping only includes
"anthropic"/"openai"/"gemini"; update require_provider_key to include an
explicit "groq": bool(settings.LLM.GROQ_API_KEY) entry (and optionally handle
unknown providers by calling pytest.skip with a clear message) so that
require_provider_key and its use of model_spec.provider no longer blow up for
Groq LiveModelSpec.

In `@tests/unit/llm/test_model_config.py`:
- Around line 254-267: The test test_env_template_uses_nested_model_config_keys
is asserting the wrong env var prefix for nested dialectic settings; update the
assertions to expect the double-underscore nested form
(DIALECTIC__LEVELS__minimal__MODEL_CONFIG__MODEL and
DIALECTIC__LEVELS__minimal__PROVIDER=) instead of DIALECTIC_LEVELS__... so the
test validates the correct nested override key shape for
settings.DIALECTIC.LEVELS; change both occurences in that test to use
DIALECTIC__LEVELS__minimal__... to match the new per-level provider
configuration naming convention.

In `@tests/unit/llm/test_request_builder.py`:
- Around line 13-32: The async test functions (e.g.,
test_gemini_explicit_budget_adjusts_transport_max_tokens and the other three
async def tests in tests/unit/llm/test_request_builder.py) are missing the
`@pytest.mark.asyncio` decorator; add `@pytest.mark.asyncio` above each async test
function and ensure pytest.mark is imported (import pytest) at the top of the
file so pytest will properly await the coroutines during test runs.

---

Outside diff comments:
In `@src/dreamer/specialists.py`:
- Around line 207-228: The code always passes max_tokens=self.get_max_tokens()
into honcho_llm_call, which ignores per-model or specialist overrides (e.g.
DREAM_MODEL_CONFIG.max_output_tokens); update the honcho_llm_call invocation in
this function so max_tokens is derived from the resolved model_config (use
model_config.max_output_tokens when set, falling back to self.get_max_tokens()),
thereby ensuring get_model_config() and any specialist overrides actually
control the LLM request.

In `@src/embedding_client.py`:
- Around line 29-51: The constructor (__init__) currently treats any
non-"gemini" provider as OpenAI, which hides configuration typos; change
__init__ to explicitly accept only supported providers ("gemini" and "openai")
and raise a ValidationException (from src.exceptions import ValidationException)
for any other value of self.provider, ensuring you do not silently instantiate
AsyncOpenAI for unknown providers; keep the existing gemini and openai branches
(symbols: self.provider, genai.Client, AsyncOpenAI, self.model, self.client,
self.max_embedding_tokens, self.max_batch_size) but add the explicit else that
raises ValidationException with a clear message. Also apply the same
explicit-provider validation and exception raising to the other
provider-handling block in this file (the second location handling provider
around lines ~362-373) so both paths reject unsupported providers instead of
defaulting to OpenAI.

---

Nitpick comments:
In @.env.template:
- Around line 97-111: Move the DERIVER_DEDUPLICATE variable out of the middle of
the DERIVER_MODEL_CONFIG__* block so related MODEL_CONFIG keys stay grouped;
specifically, relocate DERIVER_DEDUPLICATE (currently between
DERIVER_MODEL_CONFIG__TEMPERATURE and DERIVER_MODEL_CONFIG__MAX_OUTPUT_TOKENS)
to either just before the DERIVER_MODEL_CONFIG__* block or immediately after it,
keeping all DERIVER_MODEL_CONFIG__* entries contiguous for clearer grouping and
ordering.

In `@src/llm/__init__.py`:
- Around line 55-80: The override client factory functions
(_get_openai_override_client, _get_anthropic_override_client,
_get_groq_override_client, _get_gemini_override_client) must not be memoized by
raw (base_url, api_key) tuples because that caches secret-bearing keys for the
process lifetime; remove the `@cache` decorator from those functions and make them
return a fresh client per call (or replace with a bounded/ephemeral cache if you
truly need caching), and update callers to create/use/close clients per
operation rather than relying on a global persistent cache.

In `@src/llm/backends/anthropic.py`:
- Around line 74-90: Move the duplicated schema injection into a small helper
and call it from both branches: create a helper (e.g.,
_append_schema_to_last_message or build_and_append_schema) that takes
response_format and params (and uses response_format.model_json_schema(),
json.dumps, and appends to params["messages"][-1]["content"]); then replace the
duplicate blocks in the use_json_prefill branch and the else branch with calls
to that helper. Ensure the helper is only invoked when response_format is truthy
and a type, and keep the existing use_json_prefill logic (including appending
the assistant "{" when applicable) unchanged.

In `@src/llm/backends/groq.py`:
- Around line 58-76: Add a defensive check that response.choices is non-empty
before accessing response.choices[0] in the chat completion handling after
calling self._client.chat.completions.create; if choices is empty or None, raise
a clear ValueError (or handle gracefully) with context (e.g., include response
metadata) instead of indexing into it. Update the block that reads
response.choices[0].message.content, response.choices[0].finish_reason, and the
JSON parsing/repair code paths (response_format.model_validate and
repair_response_model_json) to only run after this validation so you never
trigger IndexError on a malformed response.

In `@src/llm/backends/openai.py`:
- Around line 263-274: The model check using `"gpt-5" in model` is fragile;
replace it with an explicit capability lookup helper (e.g., add a
GPT5_MODEL_PREFIXES tuple and implement a function like
_is_reasoning_model(model: str) that returns any(model.startswith(prefix) for
prefix in GPT5_MODEL_PREFIXES)) and then use that helper in the openai.py
branching where the code sets params["max_completion_tokens"] vs
params["max_tokens"] (the block containing the "gpt-5" in model check and the
thinking_effort/verbosity handling); update references to
extra_params/temperature logic to use the same helper so only intended models
use max_completion_tokens and reasoning flags.

In `@src/llm/request_builder.py`:
- Around line 102-152: execute_stream duplicates the parameter-building logic
from execute_completion; extract that shared logic into a helper (e.g.,
_build_request_params) that returns capabilities, credentials,
effective_max_tokens, thinking_budget_tokens, and merged_extra_params. Replace
the duplicated block in execute_stream with a call to the new helper and pass
returned values into backend.stream; do the same refactor in execute_completion
so both functions reuse the helper and avoid divergent logic (refer to
execute_stream, execute_completion, _adjust_max_tokens_for_explicit_budget,
_build_config_extra_params, and resolve_credentials).

In `@src/utils/clients.py`:
- Around line 616-638: The _history_adapter_for_provider function currently
falls through to OpenAIHistoryAdapter for both "openai" and "groq"; make the
Groq handling explicit by adding an if branch for provider == "groq" that
returns OpenAIHistoryAdapter(), keeping the existing return type and behavior;
update the function body in _history_adapter_for_provider to include the
explicit "groq" case (so callers like _history_adapter_for_provider("groq")
clearly return OpenAIHistoryAdapter).

In `@tests/live_llm/test_live_gemini.py`:
- Around line 146-157: Replace the current list concatenation when building
replay_messages with iterable unpacking to avoid creating intermediate lists:
spread initial_messages, then include
adapter.format_assistant_tool_message(first) as an element, and spread the
result of adapter.format_tool_results(...) so the final construction uses
[*initial_messages, adapter.format_assistant_tool_message(first),
*adapter.format_tool_results(...)] while keeping the same input dict for
tool_call (tool_call.id, tool_call.name, tool_result) to preserve behavior.

In `@tests/unit/llm/conftest.py`:
- Around line 9-26: The FakeBackend test double currently calls
next(self._responses) in both complete and stream which will raise StopIteration
with no context if tests run out of canned responses; update FakeBackend
(constructor: _responses, methods: complete and stream) to catch StopIteration
around next(self._responses) and raise a more descriptive error (e.g.
RuntimeError or AssertionError) that includes the method name and a hint that no
more fake responses were queued, so test failures clearly indicate missing
canned responses.

In `@tests/unit/llm/test_backends/test_openai.py`:
- Around line 65-68: The test function
test_openai_backend_passes_thinking_effort_through_for_non_gpt5_models has its
return type annotation split across lines; consolidate the annotation so the
signature is on a single line (e.g., move "-> None" to the same line as the def)
to match standard Python formatting and improve readability.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 250f06dc-3d67-4269-844d-c6a531e7eeb6

📥 Commits

Reviewing files that changed from the base of the PR and between ef3426d and f9726ca.

📒 Files selected for processing (50)
  • .env.template
  • config.toml.example
  • docs/v3/contributing/configuration.mdx
  • pyproject.toml
  • src/config.py
  • src/deriver/deriver.py
  • src/dialectic/core.py
  • src/dreamer/specialists.py
  • src/embedding_client.py
  • src/llm/__init__.py
  • src/llm/backend.py
  • src/llm/backends/__init__.py
  • src/llm/backends/anthropic.py
  • src/llm/backends/gemini.py
  • src/llm/backends/groq.py
  • src/llm/backends/openai.py
  • src/llm/caching.py
  • src/llm/capabilities.py
  • src/llm/credentials.py
  • src/llm/history_adapters.py
  • src/llm/request_builder.py
  • src/llm/structured_output.py
  • src/telemetry/reasoning_traces.py
  • src/utils/clients.py
  • src/utils/summarizer.py
  • src/utils/types.py
  • tests/__init__.py
  • tests/conftest.py
  • tests/deriver/test_deriver_processing.py
  • tests/dialectic/test_model_config_usage.py
  • tests/dreamer/test_model_config_usage.py
  • tests/live_llm/README.md
  • tests/live_llm/__init__.py
  • tests/live_llm/conftest.py
  • tests/live_llm/model_matrix.py
  • tests/live_llm/test_live_anthropic.py
  • tests/live_llm/test_live_gemini.py
  • tests/live_llm/test_live_openai.py
  • tests/unit/llm/conftest.py
  • tests/unit/llm/test_backends/test_anthropic.py
  • tests/unit/llm/test_backends/test_gemini.py
  • tests/unit/llm/test_backends/test_openai.py
  • tests/unit/llm/test_capabilities.py
  • tests/unit/llm/test_credentials.py
  • tests/unit/llm/test_history_adapters.py
  • tests/unit/llm/test_model_config.py
  • tests/unit/llm/test_request_builder.py
  • tests/utils/test_clients.py
  • tests/utils/test_length_finish_reason.py
  • tests/utils/test_summarizer.py

Comment thread docs/v3/contributing/configuration.mdx Outdated
Comment thread src/config.py
Comment thread src/config.py
Comment thread src/llm/backends/anthropic.py Outdated
Comment thread src/llm/backends/anthropic.py
Comment thread src/llm/structured_output.py
Comment thread src/utils/clients.py Outdated
Comment thread tests/live_llm/conftest.py
Comment thread tests/llm/test_model_config.py
Comment thread tests/unit/llm/test_request_builder.py Outdated
Comment thread src/embedding_client.py Outdated
Comment thread src/config.py
Comment thread src/utils/clients.py Outdated
Comment thread src/llm/caching.py Outdated
Comment thread src/llm/backends/openai.py Outdated
Comment thread src/llm/backends/anthropic.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
tests/unit/llm/test_model_config.py (1)

240-252: Use raw strings for regex patterns with metacharacters.

The match= pattern contains . which is a regex metacharacter matching any character. While it works here because the expected text contains a literal ., using a raw string makes the intent clearer and avoids potential false matches.

Suggested fix
     with pytest.raises(
         ValueError,
-        match="VECTOR_STORE.DIMENSIONS must match EMBEDDING.VECTOR_DIMENSIONS",
+        match=r"VECTOR_STORE\.DIMENSIONS must match EMBEDDING\.VECTOR_DIMENSIONS",
     ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/llm/test_model_config.py` around lines 240 - 252, The
pytest.assert in
test_app_settings_require_matching_embedding_and_vector_store_dimensions uses a
normal string for the regex match which contains metacharacters (the dot);
update the match argument to use a raw string (e.g., r"...") so the literal
period is matched explicitly. Locate the pytest.raises call in
test_app_settings_require_matching_embedding_and_vector_store_dimensions and
change the match value accordingly while leaving the
AppSettings/EmbeddingSettings/VectorStoreSettings inputs unchanged.
tests/unit/llm/test_embedding_client.py (1)

15-21: Rename input parameter to avoid shadowing builtin.

The input parameter shadows Python's built-in input() function. While functional, this is flagged by Ruff (A002) and could cause confusion.

Suggested fix
-    async def create(self, *, model: str, input: str | list[str]) -> SimpleNamespace:
-        self.calls.append({"model": model, "input": input})
-        if isinstance(input, list):
+    async def create(self, *, model: str, text_input: str | list[str]) -> SimpleNamespace:
+        self.calls.append({"model": model, "input": text_input})
+        if isinstance(text_input, list):
             data = [SimpleNamespace(embedding=self.embedding) for _ in input]

Note: The test assertions at line 54 can remain {"model": ..., "input": ...} since that's what the actual OpenAI API uses.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/llm/test_embedding_client.py` around lines 15 - 21, The create
method currently uses a parameter named input which shadows the built-in; rename
the parameter in async def create(...) (e.g., to input_text or inputs) and
update all uses inside the function (the isinstance check, list comprehension
and return construction) to use the new name, but keep the recorded dict key in
self.calls as "input" so existing tests/assertions that expect {"model": ...,
"input": ...} still pass; ensure the SimpleNamespace construction and type hints
are updated to the new parameter name (function: create).
src/embedding_client.py (1)

419-432: Consider caching resolved config to avoid duplicate resolution.

_resolve_runtime_config() is called twice when checking signatures and rebuilding the client. While not a significant performance issue, you could cache the resolution result within _get_client to avoid redundant env lookups.

Minor optimization
     def _get_client(self) -> _EmbeddingClient:
-        signature = self._get_settings_signature()
+        runtime_config = self._resolve_runtime_config()
+        signature = self._compute_signature(runtime_config)
         if self._instance is None or self._instance_signature != signature:
             with self._lock:
                 if self._instance is None or self._instance_signature != signature:
-                    runtime_config = self._resolve_runtime_config()
                     self._instance = _EmbeddingClient(
                         runtime_config,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/embedding_client.py` around lines 419 - 432, The code calls
_resolve_runtime_config() twice via _get_settings_signature() and later when
rebuilding the client; to avoid redundant env lookups, resolve the runtime
config once inside _get_client and reuse it: call self._resolve_runtime_config()
into a local variable, pass that runtime_config into the signature check and
into the client construction/rebuild logic (or modify _get_settings_signature to
accept a runtime_config parameter), and remove the duplicate internal call to
_resolve_runtime_config() so _get_settings_signature(),
_resolve_runtime_config(), and the client build all reuse the same resolved
object.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/embedding_client.py`:
- Around line 419-432: The code calls _resolve_runtime_config() twice via
_get_settings_signature() and later when rebuilding the client; to avoid
redundant env lookups, resolve the runtime config once inside _get_client and
reuse it: call self._resolve_runtime_config() into a local variable, pass that
runtime_config into the signature check and into the client construction/rebuild
logic (or modify _get_settings_signature to accept a runtime_config parameter),
and remove the duplicate internal call to _resolve_runtime_config() so
_get_settings_signature(), _resolve_runtime_config(), and the client build all
reuse the same resolved object.

In `@tests/unit/llm/test_embedding_client.py`:
- Around line 15-21: The create method currently uses a parameter named input
which shadows the built-in; rename the parameter in async def create(...) (e.g.,
to input_text or inputs) and update all uses inside the function (the isinstance
check, list comprehension and return construction) to use the new name, but keep
the recorded dict key in self.calls as "input" so existing tests/assertions that
expect {"model": ..., "input": ...} still pass; ensure the SimpleNamespace
construction and type hints are updated to the new parameter name (function:
create).

In `@tests/unit/llm/test_model_config.py`:
- Around line 240-252: The pytest.assert in
test_app_settings_require_matching_embedding_and_vector_store_dimensions uses a
normal string for the regex match which contains metacharacters (the dot);
update the match argument to use a raw string (e.g., r"...") so the literal
period is matched explicitly. Locate the pytest.raises call in
test_app_settings_require_matching_embedding_and_vector_store_dimensions and
change the match value accordingly while leaving the
AppSettings/EmbeddingSettings/VectorStoreSettings inputs unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b43962e0-dd48-43fa-8907-70bc0a219ade

📥 Commits

Reviewing files that changed from the base of the PR and between f9726ca and 1ad6321.

📒 Files selected for processing (15)
  • .env.template
  • config.toml.example
  • docs/v3/contributing/configuration.mdx
  • src/config.py
  • src/crud/document.py
  • src/crud/representation.py
  • src/embedding_client.py
  • src/schemas/api.py
  • src/utils/agent_tools.py
  • src/utils/search.py
  • src/vector_store/lancedb.py
  • tests/conftest.py
  • tests/integration/test_message_embeddings.py
  • tests/unit/llm/test_embedding_client.py
  • tests/unit/llm/test_model_config.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/v3/contributing/configuration.mdx

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/unit/llm/test_model_config.py (1)

266-300: Consider using raw strings for regex patterns with metacharacters.

The match= patterns contain literal . characters which are regex metacharacters. While they work here because the actual strings contain dots at those positions, using raw strings makes the intent clearer.

-        match="VECTOR_STORE.DIMENSIONS must match EMBEDDING.VECTOR_DIMENSIONS",
+        match=r"VECTOR_STORE\.DIMENSIONS must match EMBEDDING\.VECTOR_DIMENSIONS",

This is a minor static analysis hint (RUF043) - the tests will pass either way.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/llm/test_model_config.py` around lines 266 - 300, The
pytest.raises match patterns in tests
test_app_settings_require_matching_embedding_and_vector_store_dimensions and
test_app_settings_reject_non_1536_dimensions_while_pgvector_or_dual_write_active
should use raw string literals for the regex patterns (e.g.,
r"VECTOR_STORE.DIMENSIONS must match EMBEDDING.VECTOR_DIMENSIONS" and
r"EMBEDDING.VECTOR_DIMENSIONS must remain 1536") to avoid accidental
interpretation of regex metacharacters; update the match= arguments in those
tests where AppSettings, EmbeddingSettings, and VectorStoreSettings are
instantiated to use raw strings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/llm/backends/anthropic.py`:
- Around line 80-92: Ensure you validate params["messages"] before mutating its
last entry: check that params.get("messages") is a non-empty list and that
params["messages"][-1] is a dict with a string "content" key; if the check
fails, raise the repo's ValidationException (import from src.exceptions) with a
clear message instead of allowing IndexError/TypeError. Apply this validation
and exception raise in both places where you append schema guidance (the
use_json_prefill branch that mutates params["messages"][-1]["content"] and the
later branch at lines ~156-167), and only append when the validation passes; if
needed, add a small helper like _ensure_last_message_has_content to reuse the
check.
- Around line 300-313: The _convert_tool_choice function currently returns None
for tool_choice == "none", causing the tool_choice field to be omitted and
Anthropic to default to auto; update _convert_tool_choice (the function handling
tool_choice) so that when tool_choice == "none" it returns {"type":"none"}
instead of None, ensuring the API request includes an explicit {"type":"none"}
to disable tools.
- Around line 75-99: When assistant prefill is used (the use_json_prefill flag
computed the same way as in the shown send/complete path), the stream()
implementation must compensate for the prefixed "{" that the API omits in
streamed deltas; update stream() to detect use_json_prefill (same condition:
bool(response_format or self._json_mode(extra_params)) and not
thinking_budget_tokens and self._supports_assistant_prefill(model)) and, when
True, either (a) prepend a single "{" to the very first yielded chunk before
passing through subsequent chunks, or (b) accumulate stream chunks and call the
same reconstruction logic used by _normalize_response (pass a
prefilled_json=True-style flag or reuse its logic) so the final concatenation
produces valid JSON; ensure this behavior is applied regardless of whether
response_format is a Pydantic type or plain json_mode so streamed output matches
complete() normalization.

---

Nitpick comments:
In `@tests/unit/llm/test_model_config.py`:
- Around line 266-300: The pytest.raises match patterns in tests
test_app_settings_require_matching_embedding_and_vector_store_dimensions and
test_app_settings_reject_non_1536_dimensions_while_pgvector_or_dual_write_active
should use raw string literals for the regex patterns (e.g.,
r"VECTOR_STORE.DIMENSIONS must match EMBEDDING.VECTOR_DIMENSIONS" and
r"EMBEDDING.VECTOR_DIMENSIONS must remain 1536") to avoid accidental
interpretation of regex metacharacters; update the match= arguments in those
tests where AppSettings, EmbeddingSettings, and VectorStoreSettings are
instantiated to use raw strings.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c1d7f702-5ac7-4786-bff4-4efed84b493c

📥 Commits

Reviewing files that changed from the base of the PR and between 1ad6321 and d6cdd08.

📒 Files selected for processing (11)
  • .env.template
  • config.toml.example
  • docs/v3/contributing/configuration.mdx
  • src/config.py
  • src/llm/backends/anthropic.py
  • src/llm/backends/gemini.py
  • src/llm/backends/openai.py
  • src/llm/caching.py
  • src/llm/structured_output.py
  • src/utils/clients.py
  • tests/unit/llm/test_model_config.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • docs/v3/contributing/configuration.mdx
  • src/llm/caching.py
  • .env.template
  • src/llm/structured_output.py
  • src/llm/backends/gemini.py
  • src/utils/clients.py
  • src/llm/backends/openai.py

Comment thread src/llm/backends/anthropic.py
Comment thread src/llm/backends/anthropic.py Outdated
Comment thread src/llm/backends/anthropic.py
VVoruganti and others added 3 commits April 19, 2026 23:46
…ients.py

The 1624-line src/utils/clients.py has been carved up into focused modules
under src/llm/ and deleted. There is now one golden path for LLM
orchestration and no dual entrypoint.

New module layout:

  src/llm/
    __init__.py       thin stable re-export surface
    api.py            public honcho_llm_call with retry + fallback + tool
                      loop delegation
    executor.py       honcho_llm_call_inner (single-call executor); bridges
                      to request_builder.execute_completion / execute_stream
    tool_loop.py      execute_tool_loop + stream_final_response, plus
                      assistant-tool-message and tool-result formatting
    runtime.py        AttemptPlan dataclass (replaces the loose
                      ProviderSelection NamedTuple), effective_config_for_call,
                      plan_attempt, per-retry temperature bump, attempt
                      ContextVar
    registry.py       single owner of CLIENTS dict + cached default and
                      override SDK-client factories + backend/history-adapter
                      selection + high-level get_backend(config)
    conversation.py   count_message_tokens, tool-aware message grouping,
                      truncate_messages_to_fit
    types.py          HonchoLLMCallResponse, HonchoLLMCallStreamChunk,
                      StreamingResponseWithMetadata, IterationData,
                      IterationCallback, ReasoningEffortType, VerbosityType,
                      ProviderClient
    request_builder.py low-level request assembly (ModelConfig → backend
                      complete/stream); no longer owns credential resolution
    credentials.py    default_transport_api_key, resolve_credentials
    caching.py        gemini_cache_store; re-exports PromptCachePolicy
                      from src.config
    backend.py        Protocol + normalized result types
    history_adapters.py provider-specific assistant/tool message shapes
    structured_output.py
    backends/         AnthropicBackend, OpenAIBackend, GeminiBackend

handle_streaming_response had no production callers; it is deleted. The
three tests that used it now drive honcho_llm_call_inner(stream=True,
client_override=...) directly, which exercises the same code path the
public API uses.

Dead credential passthrough removed. The ProviderBackend Protocol and
all three concrete backends no longer accept api_key / api_base — those
are baked into the underlying SDK client at registry construction time
and were being del'd everywhere they appeared. request_builder also
stops resolving and forwarding them.

Client construction is unified. The cached default-client factories
(get_anthropic_client, get_openai_client, get_gemini_client) and override
factories (get_*_override_client) are promoted to public API; the
module-level CLIENTS dict populates from them and remains the
patch.dict(CLIENTS, {...}) mocking seam tests rely on. Old duplicate
helpers (_build_client, _default_credentials_for_provider) are gone.
default_transport_api_key is promoted to public.

Application imports now come from src.llm (dreamer, dialectic, deriver,
summarizer, telemetry-adjacent tests). No code imports from
src.utils.clients anywhere in the repo.

ruff: clean. basedpyright: 0 errors, 0 warnings. Tests: 1013/1013 pass
across the entire non-infra test suite (excluding tests/unified,
tests/bench, tests/live_llm, tests/alembic).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lidator

Gemini's native-transport function-declarations validator accepts a narrow
subset of JSON-Schema / OpenAPI: type, format, description, nullable, enum,
properties, required, items, minItems, maxItems, minimum, maximum, title.
Anything else — additionalProperties, allOf, if/then/else, $ref, anyOf,
oneOf, $defs, patternProperties — triggers an INVALID_ARGUMENT 400 at call
time.

Our agent tool schemas in src/utils/agent_tools.py use several of those
(additionalProperties: false, allOf + if/then conditionals) because they
were authored for OpenAI strict-mode + Anthropic, which need the richer
vocabulary. GeminiBackend._convert_tools was passing them straight through.

Add _sanitize_schema(): walks the parameters tree and drops unsupported
keywords while preserving semantics for the keywords that hold user data
(properties maps field-name → sub-schema; required / enum are lists of
literals; items is a single sub-schema). Other backends are untouched and
continue to receive the full strict schemas.

Regression tests:
- test_gemini_sanitize_schema_strips_unsupported_keywords: confirms
  additionalProperties, allOf + if/then, and $defs are stripped at nested
  levels while legitimate fields survive.
- test_gemini_convert_tools_sanitizes_parameters_schema: end-to-end
  _convert_tools output has no forbidden keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

♻️ Duplicate comments (3)
tests/llm/test_model_config.py (1)

253-257: ⚠️ Potential issue | 🟡 Minor

Still relevant: escape the dots in these pytest.raises(match=...) patterns.

. is regex-any-character here, not a literal dot, so these assertions are looser than they look, and Ruff is already flagging all three cases.

🧪 Tighten the three regexes
-        match="VECTOR_STORE.DIMENSIONS must match EMBEDDING.VECTOR_DIMENSIONS",
+        match=r"VECTOR_STORE\.DIMENSIONS must match EMBEDDING\.VECTOR_DIMENSIONS",
@@
-        match="EMBEDDING.VECTOR_DIMENSIONS must remain 1536",
+        match=r"EMBEDDING\.VECTOR_DIMENSIONS must remain 1536",
@@
-        match="EMBEDDING.VECTOR_DIMENSIONS must remain 1536",
+        match=r"EMBEDDING\.VECTOR_DIMENSIONS must remain 1536",

Also applies to: 271-283

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llm/test_model_config.py` around lines 253 - 257, The
pytest.raises(match=...) patterns in the tests (notably
test_app_settings_require_matching_embedding_and_vector_store_dimensions and the
similar assertions around lines 271-283) use unescaped dots so "." acts as a
regex wildcard; update those match strings to escape literal dots (e.g., use
"\." or construct the pattern with re.escape) so the message is matched exactly;
locate the raises calls in the test function names mentioned and replace the
loose regexes with escaped-dot versions or safe escaped strings.
src/llm/backends/openai.py (2)

388-402: ⚠️ Potential issue | 🟠 Major

Still relevant: normalize each tool independently.

Returning early when tools[0] is already OpenAI-shaped leaves later Anthropic-shaped entries untouched, so partially normalized tool lists can still produce invalid requests.

🛠️ Convert per item
     `@staticmethod`
     def _convert_tools(tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
-        if not tools or tools[0].get("type") == "function":
-            return tools
-        return [
-            {
-                "type": "function",
-                "function": {
-                    "name": tool["name"],
-                    "description": tool["description"],
-                    "parameters": tool["input_schema"],
-                    "strict": True,
-                },
-            }
-            for tool in tools
-        ]
+        converted: list[dict[str, Any]] = []
+        for tool in tools:
+            if tool.get("type") == "function":
+                converted.append(tool)
+                continue
+            converted.append(
+                {
+                    "type": "function",
+                    "function": {
+                        "name": tool["name"],
+                        "description": tool["description"],
+                        "parameters": tool["input_schema"],
+                        "strict": True,
+                    },
+                }
+            )
+        return converted
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/backends/openai.py` around lines 388 - 402, The _convert_tools
function currently returns early based on tools[0] which leaves later
Anthropic-shaped entries unnormalized; change it to normalize each item
individually by iterating over the tools list and for each tool: if
tool.get("type") == "function" keep it as-is, otherwise replace it with the
OpenAI-shaped dict structure using keys "type":"function" and a nested
"function" object containing "name" (from tool["name"]), "description" (from
tool["description"]), "parameters" (from tool["input_schema"]) and "strict":
True; preserve the original order and return the mapped list.

321-330: ⚠️ Potential issue | 🟠 Major

Still relevant: don't log raw tool arguments.

Malformed tool args can contain user content or secrets, and this warning currently writes the full payload to logs. Log the tool id/name plus the parse error instead.

🔒 Safer logging
-                    except (json.JSONDecodeError, TypeError):
+                    except (json.JSONDecodeError, TypeError) as exc:
                         logger.warning(
                             "Malformed tool arguments for %s (id=%s): %s",
                             tool_call.function.name,
                             tool_call.id,
-                            tool_call.function.arguments,
+                            exc,
                         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/backends/openai.py` around lines 321 - 330, The warning currently
logs raw tool arguments (tool_call.function.arguments) which may contain
sensitive user data; update the except block that catches json.JSONDecodeError
and TypeError so it does not include the full arguments payload—instead catch
the exception as e and call logger.warning with a message that includes only the
tool identifier and name (tool_call.id and tool_call.function.name) plus the
parse error (e) or e.__class__.__name__, removing tool_call.function.arguments
from the log; locate this change around the code handling
tool_call.function.arguments and the existing logger.warning call.
🧹 Nitpick comments (4)
src/llm/runtime.py (1)

185-190: Consider documenting the temperature bump rationale.

The 0.0 → 0.2 temperature bump on retries is an interesting heuristic for adding variety, but a brief docstring explaining why this specific value was chosen would help future maintainers understand the tradeoff.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/runtime.py` around lines 185 - 190, Update the docstring for
effective_temperature to briefly explain the rationale and tradeoffs for bumping
0.0 → 0.2 on retries: note that this heuristic introduces slight randomness to
avoid repeated identical deterministic outputs on subsequent attempts, why 0.2
was chosen as a small-but-noticeable perturbation, and that the bump only
applies when current_attempt.get() > 1 (log via logger.debug already present);
reference the function name effective_temperature and the current_attempt check
in the description so maintainers understand when and why the change occurs.
src/llm/registry.py (2)

58-81: Unbounded @cache for override clients may cause memory growth.

As noted in PR objectives, the unbounded functools.cache for override clients could grow indefinitely if many unique (base_url, api_key) combinations are used. Consider using lru_cache with a bounded maxsize to cap memory usage.

Suggested fix
-@cache
+@lru_cache(maxsize=128)
 def get_openai_override_client(
     base_url: str | None, api_key: str | None
 ) -> AsyncOpenAI:

-@cache
+@lru_cache(maxsize=128)
 def get_anthropic_override_client(
     base_url: str | None,
     api_key: str | None,
 ) -> AsyncAnthropic:

-@cache
+@lru_cache(maxsize=128)
 def get_gemini_override_client(
     base_url: str | None, api_key: str | None
 ) -> genai.Client:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/registry.py` around lines 58 - 81, Replace the unbounded
functools.cache decorators with a bounded lru_cache to prevent unbounded memory
growth: import lru_cache from functools and annotate get_openai_override_client,
get_anthropic_override_client, and get_gemini_override_client with
`@lru_cache`(maxsize=128) (or another reasonable maxsize), keeping the same
function signatures so the cache keys remain the (base_url, api_key) pair; this
limits the number of cached override clients while preserving behavior.

84-102: Duplicate client instances created at import time.

CLIENTS creates new client instances at import time (lines 89, 95, 100), while the get_*_client() factories with lru_cache(maxsize=1) create separate cached instances. This results in two sets of clients for the same credentials.

Consider populating CLIENTS using the cached factories:

Suggested fix
 CLIENTS: dict[ModelTransport, ProviderClient] = {}

 if settings.LLM.ANTHROPIC_API_KEY:
-    CLIENTS["anthropic"] = AsyncAnthropic(
-        api_key=settings.LLM.ANTHROPIC_API_KEY,
-        timeout=600.0,
-    )
+    CLIENTS["anthropic"] = get_anthropic_client()

 if settings.LLM.OPENAI_API_KEY:
-    CLIENTS["openai"] = AsyncOpenAI(
-        api_key=settings.LLM.OPENAI_API_KEY,
-    )
+    CLIENTS["openai"] = get_openai_client()

 if settings.LLM.GEMINI_API_KEY:
-    CLIENTS["gemini"] = genai.client.Client(
-        api_key=settings.LLM.GEMINI_API_KEY,
-    )
+    CLIENTS["gemini"] = get_gemini_client()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/registry.py` around lines 84 - 102, CLIENTS is instantiating provider
clients at import time (AsyncAnthropic, AsyncOpenAI, genai.client.Client) while
separate cached factories (the get_*_client() functions decorated with
lru_cache(maxsize=1)) also create instances, causing duplicate clients; change
CLIENTS population to call and store the cached factory functions instead of
instantiating directly (i.e., use the
get_anthropic_client/get_openai_client/get_gemini_client factories when settings
keys exist), ensuring CLIENTS references the single cached instances and
removing direct construction at import time.
src/llm/types.py (1)

60-65: Inconsistent default values for token fields.

input_tokens has a default of 0, but output_tokens is required (no default). This inconsistency could cause issues when constructing responses where output token count isn't yet known.

Consider adding a default for consistency:

     input_tokens: int = 0
-    output_tokens: int
+    output_tokens: int = 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/types.py` around lines 60 - 65, The token fields in the response type
are inconsistent: input_tokens, cache_creation_input_tokens, and
cache_read_input_tokens have defaults but output_tokens is required; update the
type definition to give output_tokens a default (e.g., 0) to match the others
and avoid construction errors, and while editing the same dataclass/structure
adjust finish_reasons to use a safe default (e.g., an empty list via a
default_factory) rather than a required list so the object can be instantiated
without explicit finish_reasons; locate the declaration containing the fields
content, input_tokens, output_tokens, cache_creation_input_tokens,
cache_read_input_tokens, finish_reasons and apply these default-value changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/llm/api.py`:
- Around line 176-186: The current defaulting of locals (temperature, stop_seqs,
thinking_budget_tokens, reasoning_effort) from runtime_model_config happens too
early and causes these values to be treated as explicit overrides by
effective_config_for_call(), preventing fallback model defaults from applying;
change the code so these parameters remain None until after the chosen
attempt/model is selected (or only populate them from the selected model's
runtime_model_config when that model is the primary/intentional one), ensuring
you do not pass non-None values through to effective_config_for_call() unless
they are genuine call overrides; apply this same fix in the other affected
blocks (the tool-loop path and the other ranges around lines 217-253 and
323-325) and reference the variables temperature, stop_seqs,
thinking_budget_tokens, reasoning_effort and the helper
effective_config_for_call() when locating where to move the defaulting logic.
- Around line 188-193: Replace the raw ValueError raised in the stream+tools
guard with the project's ValidationException; import ValidationException from
src.exceptions and raise ValidationException(...) instead of ValueError(...) in
the block that checks "if stream and tools and not stream_final_only" in
src/llm/api.py, preserving the existing error message text so callers receive
the same message but as a ValidationException for consistent public input
validation handling.

In `@src/llm/backends/openai.py`:
- Around line 170-177: The code path in the structured response handling (the
block checking if parsed is None and using refusal and _normalize_response)
currently raises a raw ValueError; replace this with raising the project's
ValidationException (import it where exceptions are handled) so both
structured-output failure branches use the same exception type; update the raise
to include a clear message (e.g., "No parsed content in structured response")
and any relevant context from response/refusal as the exception message to aid
upstream handling, keeping the existing behavior of returning
_normalize_response when refusal is present.

In `@src/llm/conversation.py`:
- Around line 37-56: The helpers _is_tool_use_message and
_is_tool_result_message currently detect Anthropic (content list with type) and
OpenAI (tool_calls / role) formats but miss Gemini messages; update both to also
inspect msg.get("parts") when it's a list and return True if any part dict
contains "function_call" (for _is_tool_use_message) or "function_response" (for
_is_tool_result_message). Preserve the existing checks (content list/type,
tool_calls, role == "tool") and add the new parts-based detection so
Gemini-formatted messages are recognized by those functions.

In `@src/llm/registry.py`:
- Around line 120-123: Replace the plain ValueError with the custom
ValidationException from src.exceptions: import ValidationException (if not
already imported) and change the raise in the api_key validation block inside
src/llm/registry.py (the logic around api_key = model_config.api_key or
default_transport_api_key(provider), base_url = model_config.base_url, and the
provider/model_config check) to raise ValidationException with the same message.
Ensure the exception import is added and used where the current raise
ValueError(f"Missing API key for {provider} model config") appears.

In `@src/llm/tool_loop.py`:
- Around line 87-131: stream_final_response currently rebuilds the client from
model_config and calls honcho_llm_call_inner directly, which can lose the
winning/selected config and retry behavior; instead thread the selected/winning
config (the same object used by the successful tool attempt) into
stream_final_response and invoke the same retry wrapper used by _final_call so
streaming uses the exact selected_config and retry logic. Concretely, accept a
selected_config or winning attempt parameter in stream_final_response (or pass
through the existing selected_config if available), avoid calling
client_for_model_config inside the function, and call the retry wrapper that
wraps _final_call (or call _final_call with stream=True) so the stream is
produced by the same retry-handling path and uses the same
client_override/selected_config as the successful attempt.
- Around line 374-394: After appending the synthesis_prompt to
conversation_messages, call truncate_messages_to_fit(...) again to ensure
conversation_messages is bounded before making the final LLM calls;
specifically, re-run truncate_messages_to_fit with the same parameters used
earlier just after conversation_messages.append(...) so that both the streaming
path (stream_final_response) and the non-streaming final call use the truncated
history and cannot exceed max_input_tokens (apply the same fix for the analogous
block around the non-streaming final call in the 408-428 region).
- Around line 43-45: Validate the incoming max_tool_iterations parameter at the
start of the tool loop entry function (the function that accepts
max_tool_iterations — e.g., run_tool_loop or similar) by importing and using
ValidationException from src.exceptions and the existing MIN_TOOL_ITERATIONS and
MAX_TOOL_ITERATIONS constants; if max_tool_iterations < MIN_TOOL_ITERATIONS or >
MAX_TOOL_ITERATIONS raise ValidationException with a clear message, and remove
any existing behavior that allows 0 or oversized values to bypass or endlessly
run the loop so the validated value is guaranteed before any tool-enabled call
or iteration logic.

In `@tests/utils/test_clients.py`:
- Around line 1101-1154: The test currently only verifies provider_params reach
OpenAIBackend.complete by patching OpenAIBackend.complete (capture_extra) but
doesn't assert they reach the actual SDK call; update
test_provider_params_passthrough to also inspect the mocked SDK call
(mock_client.chat.completions.create) call kwargs after honcho_llm_call and
assert that the custom sentinel (provider_params["honcho_sentinel"] == "zap") is
present in the arguments passed to the SDK (e.g., in extra_params or the final
kwargs), ensuring the full path from honcho_llm_call ->
OpenAIBackend._build_params -> OpenAIBackend.complete ->
mock_client.chat.completions.create is covered.

---

Duplicate comments:
In `@src/llm/backends/openai.py`:
- Around line 388-402: The _convert_tools function currently returns early based
on tools[0] which leaves later Anthropic-shaped entries unnormalized; change it
to normalize each item individually by iterating over the tools list and for
each tool: if tool.get("type") == "function" keep it as-is, otherwise replace it
with the OpenAI-shaped dict structure using keys "type":"function" and a nested
"function" object containing "name" (from tool["name"]), "description" (from
tool["description"]), "parameters" (from tool["input_schema"]) and "strict":
True; preserve the original order and return the mapped list.
- Around line 321-330: The warning currently logs raw tool arguments
(tool_call.function.arguments) which may contain sensitive user data; update the
except block that catches json.JSONDecodeError and TypeError so it does not
include the full arguments payload—instead catch the exception as e and call
logger.warning with a message that includes only the tool identifier and name
(tool_call.id and tool_call.function.name) plus the parse error (e) or
e.__class__.__name__, removing tool_call.function.arguments from the log; locate
this change around the code handling tool_call.function.arguments and the
existing logger.warning call.

In `@tests/llm/test_model_config.py`:
- Around line 253-257: The pytest.raises(match=...) patterns in the tests
(notably
test_app_settings_require_matching_embedding_and_vector_store_dimensions and the
similar assertions around lines 271-283) use unescaped dots so "." acts as a
regex wildcard; update those match strings to escape literal dots (e.g., use
"\." or construct the pattern with re.escape) so the message is matched exactly;
locate the raises calls in the test function names mentioned and replace the
loose regexes with escaped-dot versions or safe escaped strings.

---

Nitpick comments:
In `@src/llm/registry.py`:
- Around line 58-81: Replace the unbounded functools.cache decorators with a
bounded lru_cache to prevent unbounded memory growth: import lru_cache from
functools and annotate get_openai_override_client,
get_anthropic_override_client, and get_gemini_override_client with
`@lru_cache`(maxsize=128) (or another reasonable maxsize), keeping the same
function signatures so the cache keys remain the (base_url, api_key) pair; this
limits the number of cached override clients while preserving behavior.
- Around line 84-102: CLIENTS is instantiating provider clients at import time
(AsyncAnthropic, AsyncOpenAI, genai.client.Client) while separate cached
factories (the get_*_client() functions decorated with lru_cache(maxsize=1))
also create instances, causing duplicate clients; change CLIENTS population to
call and store the cached factory functions instead of instantiating directly
(i.e., use the get_anthropic_client/get_openai_client/get_gemini_client
factories when settings keys exist), ensuring CLIENTS references the single
cached instances and removing direct construction at import time.

In `@src/llm/runtime.py`:
- Around line 185-190: Update the docstring for effective_temperature to briefly
explain the rationale and tradeoffs for bumping 0.0 → 0.2 on retries: note that
this heuristic introduces slight randomness to avoid repeated identical
deterministic outputs on subsequent attempts, why 0.2 was chosen as a
small-but-noticeable perturbation, and that the bump only applies when
current_attempt.get() > 1 (log via logger.debug already present); reference the
function name effective_temperature and the current_attempt check in the
description so maintainers understand when and why the change occurs.

In `@src/llm/types.py`:
- Around line 60-65: The token fields in the response type are inconsistent:
input_tokens, cache_creation_input_tokens, and cache_read_input_tokens have
defaults but output_tokens is required; update the type definition to give
output_tokens a default (e.g., 0) to match the others and avoid construction
errors, and while editing the same dataclass/structure adjust finish_reasons to
use a safe default (e.g., an empty list via a default_factory) rather than a
required list so the object can be instantiated without explicit finish_reasons;
locate the declaration containing the fields content, input_tokens,
output_tokens, cache_creation_input_tokens, cache_read_input_tokens,
finish_reasons and apply these default-value changes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 94a33b16-9e97-4751-8724-5372cee85dcc

📥 Commits

Reviewing files that changed from the base of the PR and between 63581a4 and a97ca2b.

📒 Files selected for processing (32)
  • src/config.py
  • src/deriver/deriver.py
  • src/dialectic/core.py
  • src/dreamer/specialists.py
  • src/llm/__init__.py
  • src/llm/api.py
  • src/llm/backend.py
  • src/llm/backends/anthropic.py
  • src/llm/backends/gemini.py
  • src/llm/backends/openai.py
  • src/llm/caching.py
  • src/llm/conversation.py
  • src/llm/credentials.py
  • src/llm/executor.py
  • src/llm/registry.py
  • src/llm/request_builder.py
  • src/llm/runtime.py
  • src/llm/tool_loop.py
  • src/llm/types.py
  • src/telemetry/reasoning_traces.py
  • src/utils/clients.py
  • src/utils/summarizer.py
  • src/utils/types.py
  • tests/conftest.py
  • tests/deriver/test_deriver_processing.py
  • tests/dialectic/test_model_config_usage.py
  • tests/dreamer/test_model_config_usage.py
  • tests/integration/test_token_metrics.py
  • tests/llm/test_model_config.py
  • tests/utils/test_clients.py
  • tests/utils/test_length_finish_reason.py
  • tests/utils/test_summarizer.py
💤 Files with no reviewable changes (1)
  • src/utils/types.py
✅ Files skipped from review due to trivial changes (2)
  • tests/integration/test_token_metrics.py
  • src/llm/backend.py
🚧 Files skipped from review as they are similar to previous changes (11)
  • tests/utils/test_summarizer.py
  • src/utils/summarizer.py
  • tests/dreamer/test_model_config_usage.py
  • tests/conftest.py
  • src/deriver/deriver.py
  • tests/dialectic/test_model_config_usage.py
  • src/dreamer/specialists.py
  • src/llm/init.py
  • src/llm/request_builder.py
  • src/llm/caching.py
  • src/llm/backends/anthropic.py

Comment thread src/llm/api.py Outdated
Comment thread src/llm/api.py
Comment thread src/llm/backends/openai.py Outdated
Comment thread src/llm/conversation.py
Comment thread src/llm/registry.py Outdated
Comment thread src/llm/tool_loop.py
Comment thread src/llm/tool_loop.py Outdated
Comment thread src/llm/tool_loop.py
Comment on lines +1101 to +1154
async def test_provider_params_passthrough(self):
"""Operator-supplied provider_params must reach extra_params.

async def test_groq_no_content_error(self):
"""Test Groq error handling when no content in response"""
from groq import AsyncGroq
Uses a custom sentinel key that won't be intercepted by a backend —
we verify it shows up in the backend call by monkeypatching the
OpenAIBackend at the extra_params boundary.
"""
from openai import AsyncOpenAI

mock_client = AsyncMock(spec=AsyncGroq)
mock_client = AsyncMock(spec=AsyncOpenAI)
mock_response = ChatCompletion(
id="test-id",
object="chat.completion",
created=1234567890,
model="llama-3.1-70b",
model="gpt-4.1",
choices=[
Choice(
index=0,
message=ChatCompletionMessage(role="assistant", content=None),
message=ChatCompletionMessage(role="assistant", content="ok"),
finish_reason="stop",
)
],
usage=CompletionUsage(
prompt_tokens=10, completion_tokens=0, total_tokens=10
prompt_tokens=10, completion_tokens=5, total_tokens=15
),
)
mock_client.chat.completions.create = AsyncMock(return_value=mock_response)

captured_extra: dict[str, Any] = {}

from src.llm.backends.openai import OpenAIBackend

original_complete = OpenAIBackend.complete

async def capture_extra(self: Any, **kwargs: Any) -> Any:
captured_extra.update(kwargs.get("extra_params") or {})
return await original_complete(self, **kwargs)

with (
patch.dict(CLIENTS, {"groq": mock_client}),
pytest.raises(ValueError, match="No content in response"),
patch.dict(CLIENTS, {"openai": mock_client}),
patch.object(OpenAIBackend, "complete", capture_extra),
):
await honcho_llm_call_inner(
provider="groq",
model="llama-3.1-70b",
await honcho_llm_call(
model_config=ModelConfig(
model="gpt-4.1",
transport="openai",
provider_params={"honcho_sentinel": "zap"},
),
prompt="Hello",
max_tokens=100,
enable_retry=False,
)

async def test_groq_streaming(self):
"""Test Groq streaming response"""
from groq import AsyncGroq

mock_client = AsyncMock(spec=AsyncGroq)
assert captured_extra.get("honcho_sentinel") == "zap"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

This regression test stops one layer too early.

It only asserts that provider_params reaches OpenAIBackend.complete(). _build_params() can still drop those keys before chat.completions.create(), so this test will pass even when the actual SDK request is missing the custom params. Assert on the mocked OpenAI call kwargs to cover the full path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils/test_clients.py` around lines 1101 - 1154, The test currently
only verifies provider_params reach OpenAIBackend.complete by patching
OpenAIBackend.complete (capture_extra) but doesn't assert they reach the actual
SDK call; update test_provider_params_passthrough to also inspect the mocked SDK
call (mock_client.chat.completions.create) call kwargs after honcho_llm_call and
assert that the custom sentinel (provider_params["honcho_sentinel"] == "zap") is
present in the arguments passed to the SDK (e.g., in extra_params or the final
kwargs), ensuring the full path from honcho_llm_call ->
OpenAIBackend._build_params -> OpenAIBackend.complete ->
mock_client.chat.completions.create is covered.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/conftest.py (1)

717-755: ⚠️ Potential issue | 🟠 Major

The mock cannot be awaited—it returns a decorator function instead of a coroutine.

honcho_llm_call is defined as async def in src/llm/api.py, but the fixture's decorator_factory (via side_effect) returns a regular function (mock_llm_decorator). When test code awaits the result (e.g., await honcho_llm_call(...)), it tries to await a function object, which raises TypeError: a coroutine was expected, got <function>.

Tests in tests/utils/test_clients.py are not blocklisted and will apply this fixture, causing await honcho_llm_call(...) calls on lines 901, 929, 968, etc. to fail at runtime.

Replace the decorator-factory approach with an AsyncMock that directly returns mock responses:

Suggested fix
-    def decorator_factory(*args: Any, **kwargs: Any) -> Callable[..., Any]:
-        """Factory function that creates the mock decorator"""
-
-        def mock_llm_decorator(func: Callable[..., Any]) -> Callable[..., Any]:
-            async def async_wrapper(*func_args: Any, **func_kwargs: Any) -> Any:
-                return create_mock_response(
-                    response_model=kwargs.get("response_model"),
-                    stream=kwargs.get("stream", False),
-                    return_call_response=kwargs.get("return_call_response", False),
-                )
-
-            def sync_wrapper(*func_args: Any, **func_kwargs: Any) -> Any:
-                return create_mock_response(
-                    response_model=kwargs.get("response_model"),
-                    stream=kwargs.get("stream", False),
-                    return_call_response=kwargs.get("return_call_response", False),
-                )
-
-            import inspect
-            if inspect.iscoroutinefunction(func):
-                return async_wrapper
-            else:
-                return sync_wrapper
-
-        return mock_llm_decorator
-
-    with patch("src.llm.honcho_llm_call", side_effect=decorator_factory):
-        yield decorator_factory
+    async def mock_honcho_call(**kwargs: Any) -> Any:
+        return create_mock_response(
+            response_model=kwargs.get("response_model"),
+            stream=kwargs.get("stream", False),
+            return_call_response=kwargs.get("return_call_response", False),
+        )
+
+    with patch("src.llm.honcho_llm_call", new=AsyncMock(side_effect=mock_honcho_call)):
+        yield
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/conftest.py` around lines 717 - 755, The fixture currently provides a
synchronous decorator via decorator_factory -> mock_llm_decorator, but
honcho_llm_call in src/llm/api.py is async and test code does await
honcho_llm_call(...), so awaiting returns a plain function and raises TypeError;
fix by making the patched honcho_llm_call an async-returning mock: either change
decorator_factory to be async (async def decorator_factory(...) -> Callable:
return mock_llm_decorator) so the patch's side_effect yields a coroutine that
resolves to the decorator, or use an AsyncMock that has return_value set to
mock_llm_decorator (e.g., patch("src.llm.honcho_llm_call",
new_callable=AsyncMock, return_value=mock_llm_decorator)); keep
mock_llm_decorator and create_mock_response unchanged so awaited
honcho_llm_call(...) returns a callable decorator that then produces the
expected mock responses.
src/dreamer/specialists.py (1)

209-223: ⚠️ Potential issue | 🟠 Major

Honor model_config.max_output_tokens in the specialist LLM call.

Line 222 always uses self.get_max_tokens(), so per-feature overrides (*_MODEL_CONFIG__MAX_OUTPUT_TOKENS) are ignored.

Proposed fix
         model_config = self.get_model_config()
+        max_tokens = (
+            model_config.max_output_tokens
+            if model_config.max_output_tokens is not None
+            else self.get_max_tokens()
+        )
@@
         response: HonchoLLMCallResponse[str] = await honcho_llm_call(
             model_config=model_config,
             prompt="",  # Ignored since we pass messages
-            max_tokens=self.get_max_tokens(),
+            max_tokens=max_tokens,
             tools=self.get_tools(peer_card_enabled=peer_card_enabled),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/dreamer/specialists.py` around lines 209 - 223, The specialist LLM call
always passes self.get_max_tokens() to honcho_llm_call, ignoring per-feature
overrides in model_config.max_output_tokens; change the code to compute an
effective max_tokens value that prefers model_config.max_output_tokens when set
(non-None/positive) and falls back to self.get_max_tokens(), then pass that
effective value into honcho_llm_call (refer to get_model_config(),
model_config.max_output_tokens, get_max_tokens(), and honcho_llm_call to locate
and update the call).
♻️ Duplicate comments (1)
tests/llm/test_model_config.py (1)

256-256: ⚠️ Potential issue | 🟡 Minor

Escape regex metacharacters in pytest.raises(match=...) patterns.

At Line 256, Line 273, and Line 282, . is interpreted as “any character”, so these assertions are looser than intended.

Proposed fix
-        match="VECTOR_STORE.DIMENSIONS must match EMBEDDING.VECTOR_DIMENSIONS",
+        match=r"VECTOR_STORE\.DIMENSIONS must match EMBEDDING\.VECTOR_DIMENSIONS",
@@
-        match="EMBEDDING.VECTOR_DIMENSIONS must remain 1536",
+        match=r"EMBEDDING\.VECTOR_DIMENSIONS must remain 1536",
@@
-        match="EMBEDDING.VECTOR_DIMENSIONS must remain 1536",
+        match=r"EMBEDDING\.VECTOR_DIMENSIONS must remain 1536",
#!/bin/bash
# Verify this file has no unescaped regex-pattern warnings.
python -m ruff check tests/llm/test_model_config.py --select RUF043

Expected result: no RUF043 findings for these three lines.

Also applies to: 273-273, 282-282

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llm/test_model_config.py` at line 256, The pytest.raises(match=...)
patterns use unescaped dots so "." matches any char; update the three
occurrences that assert messages like "VECTOR_STORE.DIMENSIONS must match
EMBEDDING.VECTOR_DIMENSIONS" to use an exact literal regex by escaping
metacharacters (e.g. escape the dots) or call re.escape on the expected message;
ensure you import re if using re.escape and replace the raw string match with
match=re.escape("VECTOR_STORE.DIMENSIONS must match
EMBEDDING.VECTOR_DIMENSIONS") (and similarly for the other two assertions) so
pytest.raises enforces an exact match.
🧹 Nitpick comments (5)
tests/llm/test_agent_tool_schemas.py (2)

37-46: Also assert additionalProperties is disabled for specialist schemas.

You already lock required and minItems; adding this guard will prevent schema drift allowing unknown fields.

Suggested test additions
 def test_deductive_specialist_tool_requires_evidence_fields() -> None:
     items = _observation_items_schema("create_observations_deductive")
@@
     assert items["required"] == ["content", "source_ids", "premises"]
+    assert items["additionalProperties"] is False
     assert items["properties"]["source_ids"]["minItems"] == 1
     assert items["properties"]["premises"]["minItems"] == 1
@@
 def test_inductive_specialist_tool_requires_pattern_fields() -> None:
     items = _observation_items_schema("create_observations_inductive")
@@
     assert items["required"] == [
         "content",
         "source_ids",
         "sources",
         "pattern_type",
         "confidence",
     ]
+    assert items["additionalProperties"] is False
     assert items["properties"]["source_ids"]["minItems"] == 2
     assert items["properties"]["sources"]["minItems"] == 2

Also applies to: 48-63

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llm/test_agent_tool_schemas.py` around lines 37 - 46, The test for
specialist observation tool schemas (in
test_deductive_specialist_tool_requires_evidence_fields using
_observation_items_schema and TOOLS["create_observations_deductive"]) currently
asserts required and minItems but doesn't assert that additionalProperties is
disabled; update this test to also assert items["additionalProperties"] is False
to prevent unknown fields, and make the same change in the corresponding
specialist test that covers the other schema (the similar test around the other
specialist tool) so both specialist schemas enforce additionalProperties =
False.

10-72: Add Google-style docstrings for helper/tests to align with repository standards.

Functions in this module currently have no docstrings, including _observation_items_schema and test cases.

As per coding guidelines "Use Google style docstrings".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llm/test_agent_tool_schemas.py` around lines 10 - 72, Add Google-style
docstrings to the helper and each test in this module: document the purpose,
parameters, return types and any important behavior for the helper
_observation_items_schema(tool_key: str) -> dict[str, Any], and add short one-
or two-line docstrings to each test_... function describing the scenario being
tested and expected outcome (e.g.,
test_generic_create_observations_schema_has_level_specific_requirements,
test_deductive_specialist_tool_requires_evidence_fields,
test_inductive_specialist_tool_requires_pattern_fields,
test_dreamer_specialists_use_level_specific_creation_tools); follow the repo's
Google style (Args, Returns sections where applicable) and place the docstrings
immediately below each def.
tests/llm/test_backends/test_openai.py (1)

233-276: Consider one integration test for max_output_tokens + extra_params forwarding.

You already test _uses_max_completion_tokens thoroughly, but complete() also applies max_output_tokens or max_tokens and forwards model-tuning params (top_p, penalties, seed, GPT-5 verbosity). A focused async test would lock down that ModelConfig forwarding path.

🧪 Suggested test shape
+@pytest.mark.asyncio
+async def test_openai_backend_prefers_max_output_tokens_and_forwards_extra_params() -> None:
+    client = Mock()
+    client.chat.completions.create = AsyncMock(
+        return_value=SimpleNamespace(
+            choices=[
+                SimpleNamespace(
+                    finish_reason="stop",
+                    message=SimpleNamespace(content="ok", tool_calls=[], reasoning_details=[]),
+                )
+            ],
+            usage=SimpleNamespace(prompt_tokens=1, completion_tokens=1, prompt_tokens_details=None),
+        )
+    )
+
+    backend = OpenAIBackend(client)
+    await backend.complete(
+        model="gpt-5-mini",
+        messages=[{"role": "user", "content": "Hello"}],
+        max_tokens=100,
+        max_output_tokens=42,
+        extra_params={
+            "top_p": 0.9,
+            "frequency_penalty": 0.1,
+            "presence_penalty": 0.2,
+            "seed": 7,
+            "verbosity": "low",
+        },
+    )
+
+    call = client.chat.completions.create.await_args.kwargs
+    assert call["max_completion_tokens"] == 42
+    assert call["top_p"] == 0.9
+    assert call["frequency_penalty"] == 0.1
+    assert call["presence_penalty"] == 0.2
+    assert call["seed"] == 7
+    assert call["verbosity"] == "low"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llm/test_backends/test_openai.py` around lines 233 - 276, Add one async
integration test in tests/llm/test_backends/test_openai.py that verifies
ModelConfig forwarding: construct a ModelConfig with max_output_tokens plus
extra_params (e.g., top_p, presence_penalty, frequency_penalty, seed, and GPT‑5
specific verbosity), then call the backend.complete(...) (use the openai
backend's complete function) for both a reasoning model (e.g., "gpt-5") and a
classic model (e.g., "gpt-4o") while mocking the HTTP/OpenAI client; assert the
outgoing request payload includes the correct token param name
(max_completion_tokens for reasoning via _uses_max_completion_tokens and
max_tokens otherwise) and that all extra_params (top_p, presence_penalty,
frequency_penalty, seed, verbosity) are forwarded unchanged. Ensure the test
uses async test harness and inspects the mocked client's received kwargs rather
than relying on response content.
tests/live_llm/model_matrix.py (1)

8-8: Expose tool_replay in FeatureName and filtering for API symmetry.

supports_tool_replay exists on specs, but get_live_model_specs(feature=...) cannot query it. Adding this keeps feature flags and filter API aligned.

Suggested refactor
-FeatureName = Literal["thinking", "structured_output", "caching", "reasoning"]
+FeatureName = Literal[
+    "thinking", "structured_output", "caching", "reasoning", "tool_replay"
+]
...
         if feature == "reasoning" and not spec.supports_reasoning:
             continue
+        if feature == "tool_replay" and not spec.supports_tool_replay:
+            continue

Also applies to: 151-169

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/live_llm/model_matrix.py` at line 8, The FeatureName literal lacks
"tool_replay", so update FeatureName = Literal[...] (the one in
tests/live_llm/model_matrix.py and the similar block around lines 151-169) to
include "tool_replay" and then extend get_live_model_specs(feature=...)
filtering logic to recognize "tool_replay" by mapping it to the existing
supports_tool_replay spec attribute (i.e., when feature == "tool_replay", filter
specs where spec.supports_tool_replay is true) so the feature name and filter
API remain symmetric with the spec field.
tests/deriver/test_deriver_processing.py (1)

58-71: Add an assertion for max_tokens forwarding in this regression test.

The test already validates model config fields; asserting max_tokens here would lock down the max_output_tokens/default fallback contract too.

Suggested test extension
         assert kwargs["model_config"].stop_sequences == expected_config.stop_sequences
+        expected_max_tokens = (
+            expected_config.max_output_tokens or settings.LLM.DEFAULT_MAX_TOKENS
+        )
+        assert kwargs["max_tokens"] == expected_max_tokens
         assert "llm_settings" not in kwargs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/deriver/test_deriver_processing.py` around lines 58 - 71, Add an
assertion that the forwarded max tokens matches the expected config: verify
kwargs["model_config"].max_tokens == expected_config.max_output_tokens (using
the existing expected_config from settings.DERIVER.MODEL_CONFIG.model_copy and
the model_config in kwargs) to lock down the max_output_tokens/default fallback
contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/v3/contributing/troubleshooting.mdx`:
- Around line 149-152: The docs currently reference the generic env key
THINKING_BUDGET_TOKENS which is misleading; update the guidance to use the
correct nested per-component key pattern *_MODEL_CONFIG__THINKING_BUDGET_TOKENS
(matching examples and schema) and change the sentence and the Fix line to
instruct setting each component's *_MODEL_CONFIG__THINKING_BUDGET_TOKENS=0 when
using models that don't support thinking, ensuring the example and any
surrounding text use the *_MODEL_CONFIG__... form consistently.

In `@src/dreamer/specialists.py`:
- Around line 39-46: The helper _require_specialist_model_config currently
raises a built-in ValueError; replace this with the project's
ValidationException from src/exceptions.py: import ValidationException and
change the raise to raise ValidationException(f"{specialist_name} MODEL_CONFIG
must be resolved before use") so the module uses the project-specific validation
exception type (keep the existing message and function name
_require_specialist_model_config unchanged).

In `@src/llm/backends/gemini.py`:
- Around line 421-447: The loop that builds `contents` in gemini.py currently
discards non-"text" blocks when handling message["content"] lists; update the
logic in that loop (the code manipulating `system_messages`, `contents`, and
`message["content"]`) to explicitly handle supported block types (e.g.,
translate "image", "tool", or other allowed types into the Gemini-compatible
parts format) and, if an unknown/unsupported block type is encountered, raise a
ValidationException (or the existing validation error type used in the project)
with a clear message about the unsupported block type and the offending message;
ensure assistant→model role mapping and existing branches for string content and
parts lists remain unchanged.
- Around line 370-383: The cache key currently omits system_instruction and
tool_config, causing wrong cache hits for requests that differ only by system
prompt or tool constraints; update the call to build_cache_key in the Gemini
path (where cache_key is created before gemini_cache_store.get) to include
system_instruction and tool_config (pull them from config or from
_cache_model_config if appropriate) so the key reflects
config.get("system_instruction") and config.get("tool_config") alongside
contents and tools; ensure build_cache_key's parameters (as used by the caching
implementation) receive these values so cached_handle lookup is correctly
discriminative.
- Around line 165-189: stream() currently always yields a terminal StreamChunk
even when the model returned an empty/blocked response
(SAFETY/RECITATION/PROHIBITED_CONTENT/BLOCKLIST), which differs from complete()
that raises an LLMError; update stream() to mirror complete() behavior by
detecting blocked responses from final_chunk (inspect
final_chunk.candidates[0].finish_reason and any safety/block indicators or usage
metadata similarly to complete()) and raise the same LLMError instead of
yielding the final is_done=True chunk when a blocked/empty response is detected;
keep existing logic for normal completions to compute finish_reason and
output_tokens and yield the terminal StreamChunk only in non-blocked cases.

In `@tests/bench/harness.py`:
- Around line 656-677: _compact currently prints nested BaseModel and dict
values without redaction and _mask assumes value has len(), which can raise;
update _compact to pass the full field path into _mask (e.g., build a full_key
like f"{parent}.{field_name}" or just field_name when top-level) and apply _mask
to every emitted value and to inner compacted strings, and handle dicts of
BaseModel by iterating values and compacting them through _compact; also make
_mask robust by converting non-string values to str (use "None" for falsy/None)
before measuring length and returning a masked string of the same length so
masking never raises. Ensure calls that append parts use the masked field
name/value and that recursive _compact calls return masked output.

In `@tests/llm/test_conversation.py`:
- Around line 14-33: The test fixture uses OpenAI-style tool_calls/role:"tool"
shapes so it never exercises the tool-unit grouping in _group_into_units; update
the messages in test_truncate_messages_to_fit_preserves_tool_result_pair to use
the internal SDK shape (e.g., an assistant message whose content is a dict with
type: "tool_use" and includes the function name/arguments and id "call_1",
followed by a tool-result message whose content is a dict with type:
"tool_result" and tool_call_id "call_1" and the result text) so
truncate_messages_to_fit actually triggers the tool-use/tool-result grouping
logic in _group_into_units and validates pair preservation.

---

Outside diff comments:
In `@src/dreamer/specialists.py`:
- Around line 209-223: The specialist LLM call always passes
self.get_max_tokens() to honcho_llm_call, ignoring per-feature overrides in
model_config.max_output_tokens; change the code to compute an effective
max_tokens value that prefers model_config.max_output_tokens when set
(non-None/positive) and falls back to self.get_max_tokens(), then pass that
effective value into honcho_llm_call (refer to get_model_config(),
model_config.max_output_tokens, get_max_tokens(), and honcho_llm_call to locate
and update the call).

In `@tests/conftest.py`:
- Around line 717-755: The fixture currently provides a synchronous decorator
via decorator_factory -> mock_llm_decorator, but honcho_llm_call in
src/llm/api.py is async and test code does await honcho_llm_call(...), so
awaiting returns a plain function and raises TypeError; fix by making the
patched honcho_llm_call an async-returning mock: either change decorator_factory
to be async (async def decorator_factory(...) -> Callable: return
mock_llm_decorator) so the patch's side_effect yields a coroutine that resolves
to the decorator, or use an AsyncMock that has return_value set to
mock_llm_decorator (e.g., patch("src.llm.honcho_llm_call",
new_callable=AsyncMock, return_value=mock_llm_decorator)); keep
mock_llm_decorator and create_mock_response unchanged so awaited
honcho_llm_call(...) returns a callable decorator that then produces the
expected mock responses.

---

Duplicate comments:
In `@tests/llm/test_model_config.py`:
- Line 256: The pytest.raises(match=...) patterns use unescaped dots so "."
matches any char; update the three occurrences that assert messages like
"VECTOR_STORE.DIMENSIONS must match EMBEDDING.VECTOR_DIMENSIONS" to use an exact
literal regex by escaping metacharacters (e.g. escape the dots) or call
re.escape on the expected message; ensure you import re if using re.escape and
replace the raw string match with match=re.escape("VECTOR_STORE.DIMENSIONS must
match EMBEDDING.VECTOR_DIMENSIONS") (and similarly for the other two assertions)
so pytest.raises enforces an exact match.

---

Nitpick comments:
In `@tests/deriver/test_deriver_processing.py`:
- Around line 58-71: Add an assertion that the forwarded max tokens matches the
expected config: verify kwargs["model_config"].max_tokens ==
expected_config.max_output_tokens (using the existing expected_config from
settings.DERIVER.MODEL_CONFIG.model_copy and the model_config in kwargs) to lock
down the max_output_tokens/default fallback contract.

In `@tests/live_llm/model_matrix.py`:
- Line 8: The FeatureName literal lacks "tool_replay", so update FeatureName =
Literal[...] (the one in tests/live_llm/model_matrix.py and the similar block
around lines 151-169) to include "tool_replay" and then extend
get_live_model_specs(feature=...) filtering logic to recognize "tool_replay" by
mapping it to the existing supports_tool_replay spec attribute (i.e., when
feature == "tool_replay", filter specs where spec.supports_tool_replay is true)
so the feature name and filter API remain symmetric with the spec field.

In `@tests/llm/test_agent_tool_schemas.py`:
- Around line 37-46: The test for specialist observation tool schemas (in
test_deductive_specialist_tool_requires_evidence_fields using
_observation_items_schema and TOOLS["create_observations_deductive"]) currently
asserts required and minItems but doesn't assert that additionalProperties is
disabled; update this test to also assert items["additionalProperties"] is False
to prevent unknown fields, and make the same change in the corresponding
specialist test that covers the other schema (the similar test around the other
specialist tool) so both specialist schemas enforce additionalProperties =
False.
- Around line 10-72: Add Google-style docstrings to the helper and each test in
this module: document the purpose, parameters, return types and any important
behavior for the helper _observation_items_schema(tool_key: str) -> dict[str,
Any], and add short one- or two-line docstrings to each test_... function
describing the scenario being tested and expected outcome (e.g.,
test_generic_create_observations_schema_has_level_specific_requirements,
test_deductive_specialist_tool_requires_evidence_fields,
test_inductive_specialist_tool_requires_pattern_fields,
test_dreamer_specialists_use_level_specific_creation_tools); follow the repo's
Google style (Args, Returns sections where applicable) and place the docstrings
immediately below each def.

In `@tests/llm/test_backends/test_openai.py`:
- Around line 233-276: Add one async integration test in
tests/llm/test_backends/test_openai.py that verifies ModelConfig forwarding:
construct a ModelConfig with max_output_tokens plus extra_params (e.g., top_p,
presence_penalty, frequency_penalty, seed, and GPT‑5 specific verbosity), then
call the backend.complete(...) (use the openai backend's complete function) for
both a reasoning model (e.g., "gpt-5") and a classic model (e.g., "gpt-4o")
while mocking the HTTP/OpenAI client; assert the outgoing request payload
includes the correct token param name (max_completion_tokens for reasoning via
_uses_max_completion_tokens and max_tokens otherwise) and that all extra_params
(top_p, presence_penalty, frequency_penalty, seed, verbosity) are forwarded
unchanged. Ensure the test uses async test harness and inspects the mocked
client's received kwargs rather than relying on response content.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 364a0f30-d4dc-4bf4-a8d3-0e9dfcc680fe

📥 Commits

Reviewing files that changed from the base of the PR and between a97ca2b and ab4bccf.

📒 Files selected for processing (23)
  • .env.template
  • README.md
  • config.toml.example
  • docs/v3/contributing/configuration.mdx
  • docs/v3/contributing/self-hosting.mdx
  • docs/v3/contributing/troubleshooting.mdx
  • src/config.py
  • src/deriver/deriver.py
  • src/dreamer/specialists.py
  • src/llm/backends/gemini.py
  • src/llm/backends/openai.py
  • src/llm/conversation.py
  • src/utils/agent_tools.py
  • tests/bench/harness.py
  • tests/conftest.py
  • tests/deriver/test_deriver_processing.py
  • tests/live_llm/README.md
  • tests/live_llm/model_matrix.py
  • tests/llm/test_agent_tool_schemas.py
  • tests/llm/test_backends/test_gemini.py
  • tests/llm/test_backends/test_openai.py
  • tests/llm/test_conversation.py
  • tests/llm/test_model_config.py
✅ Files skipped from review due to trivial changes (2)
  • tests/live_llm/README.md
  • src/llm/backends/openai.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/llm/test_backends/test_gemini.py
  • src/llm/conversation.py
  • src/utils/agent_tools.py
  • src/config.py

Comment thread docs/v3/contributing/troubleshooting.mdx Outdated
Comment thread src/dreamer/specialists.py
Comment thread src/llm/backends/gemini.py
Comment thread src/llm/backends/gemini.py
Comment thread src/llm/backends/gemini.py
Comment thread tests/bench/harness.py
Comment on lines +656 to +677
def _mask(full_key, value):
if isinstance(full_key, str) and any(t in full_key.lower() for t in SENSITIVE_TOKENS):
return '*' * len(value) if value else 'None'
return value

def _compact(model):
# Render a pydantic BaseModel as `field=value` pairs, skipping
# None / empty-dict fields and recursing into nested models.
parts = []
for field_name in type(model).model_fields:
val = getattr(model, field_name)
if val is None:
continue
if isinstance(val, BaseModel):
inner = _compact(val)
if inner:
parts.append(f"{{field_name}}=({{inner}})")
continue
if isinstance(val, dict) and not val:
continue
parts.append(f"{{field_name}}={{val!r}}")
return " ".join(parts)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Nested Pydantic settings still bypass redaction.

_compact() never calls _mask(), so the new BaseModel / dict[str, BaseModel] branches print sensitive nested fields like api_key, password, or *_uri in cleartext. Also, _mask() assumes len(value) exists, which can raise on non-string secret-like values and break the config dump.

🔒 Suggested fix
-    def _mask(full_key, value):
-        if isinstance(full_key, str) and any(t in full_key.lower() for t in SENSITIVE_TOKENS):
-            return '*' * len(value) if value else 'None'
-        return value
+    def _mask(full_key, value):
+        rendered = repr(value)
+        if isinstance(full_key, str) and any(
+            token in full_key.lower() for token in SENSITIVE_TOKENS
+        ):
+            return "*" * len(rendered) if rendered else "None"
+        return rendered

-    def _compact(model):
+    def _compact(model, prefix=""):
         # Render a pydantic BaseModel as `field=value` pairs, skipping
         # None / empty-dict fields and recursing into nested models.
         parts = []
         for field_name in type(model).model_fields:
             val = getattr(model, field_name)
+            full_key = f"{prefix}.{field_name}" if prefix else field_name
             if val is None:
                 continue
             if isinstance(val, BaseModel):
-                inner = _compact(val)
+                inner = _compact(val, full_key)
                 if inner:
                     parts.append(f"{field_name}=({inner})")
                 continue
             if isinstance(val, dict) and not val:
                 continue
-            parts.append(f"{field_name}={val!r}")
+            parts.append(f"{field_name}={_mask(full_key, val)}")
         return " ".join(parts)
...
-                    rendered = _compact(v)
+                    rendered = _compact(v, f"{full_key}.{k}")
...
-                rendered = _compact(value)
+                rendered = _compact(value, full_key)

Also applies to: 689-705

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/bench/harness.py` around lines 656 - 677, _compact currently prints
nested BaseModel and dict values without redaction and _mask assumes value has
len(), which can raise; update _compact to pass the full field path into _mask
(e.g., build a full_key like f"{parent}.{field_name}" or just field_name when
top-level) and apply _mask to every emitted value and to inner compacted
strings, and handle dicts of BaseModel by iterating values and compacting them
through _compact; also make _mask robust by converting non-string values to str
(use "None" for falsy/None) before measuring length and returning a masked
string of the same length so masking never raises. Ensure calls that append
parts use the masked field name/value and that recursive _compact calls return
masked output.

Comment thread tests/llm/test_conversation.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
src/llm/tool_loop.py (1)

87-140: ⚠️ Potential issue | 🟠 Major

Pin final streaming to the successful attempt.

stream_final_response() re-runs get_attempt_plan() inside its own retry wrapper. If the tool loop already succeeded on a fallback attempt, the first streaming call uses that winner, but a retry can recalculate back onto the primary model/client. Thread the winning plan into this helper and retry that fixed target instead of replanning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/tool_loop.py` around lines 87 - 140, The streaming logic re-calls
get_attempt_plan() inside the retry wrapper which can switch back to a different
client/model; compute the winning AttemptPlan once and pin it for all retries by
capturing it outside the retry (or add a winning_plan parameter to
stream_final_response) and use that fixed plan inside _setup_stream (remove the
inner call to get_attempt_plan()); ensure the retry wrapper and
honcho_llm_call_inner use plan.provider, plan.model, plan.client,
plan.selected_config, plan.reasoning_effort, and plan.thinking_budget_tokens
from that pinned plan so streaming always targets the successful attempt.
tests/utils/test_clients.py (1)

1068-1121: ⚠️ Potential issue | 🟡 Minor

Extend this regression test to the SDK call.

Right now it only proves provider_params reaches OpenAIBackend.complete(). The test still passes if _build_params() drops the sentinel before chat.completions.create(), which is the actual regression surface.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils/test_clients.py` around lines 1068 - 1121, The test only verifies
provider_params reach OpenAIBackend.complete but not that _build_params
preserves them into the SDK call; wrap or replace the mocked AsyncOpenAI
client's chat.completions.create with a capturing AsyncMock (e.g.,
capture_sdk_create or captured_sdk_extra dict) that records kwargs (or
kwargs.get("extra_params")) when invoked, keep the existing patch of
OpenAIBackend.complete, then after honcho_llm_call assert the captured SDK
kwargs include the "honcho_sentinel" == "zap" to ensure _build_params passes the
sentinel through to chat.completions.create.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/llm/registry.py`:
- Around line 158-189: get_backend currently builds fresh clients for the
default path, bypassing the CLIENTS seam and the missing-API-key checks in
client_for_model_config; change get_backend so that after resolving credentials
it uses the override client helpers only when config.api_key or config.base_url
are provided (using the already-resolved credentials.get("api_base") and
credentials.get("api_key")), but for the default/no-override branch delegate to
client_for_model_config(config) (wrapped in the appropriate backend class like
AnthropicBackend/GeminiBackend/OpenAIBackend) so CLIENTS and its validation are
used; preserve the transport dispatch and final assert_never(config.transport).
- Around line 59-82: Replace the unbounded `@cache` on get_openai_override_client,
get_anthropic_override_client, and get_gemini_override_client with a bounded
cache decorator (e.g., `@lru_cache`(maxsize=128)) to prevent unbounded retention
of distinct (base_url, api_key) pairs; update imports to bring in
functools.lru_cache if not present and keep the function signatures and return
values unchanged.

---

Duplicate comments:
In `@src/llm/tool_loop.py`:
- Around line 87-140: The streaming logic re-calls get_attempt_plan() inside the
retry wrapper which can switch back to a different client/model; compute the
winning AttemptPlan once and pin it for all retries by capturing it outside the
retry (or add a winning_plan parameter to stream_final_response) and use that
fixed plan inside _setup_stream (remove the inner call to get_attempt_plan());
ensure the retry wrapper and honcho_llm_call_inner use plan.provider,
plan.model, plan.client, plan.selected_config, plan.reasoning_effort, and
plan.thinking_budget_tokens from that pinned plan so streaming always targets
the successful attempt.

In `@tests/utils/test_clients.py`:
- Around line 1068-1121: The test only verifies provider_params reach
OpenAIBackend.complete but not that _build_params preserves them into the SDK
call; wrap or replace the mocked AsyncOpenAI client's chat.completions.create
with a capturing AsyncMock (e.g., capture_sdk_create or captured_sdk_extra dict)
that records kwargs (or kwargs.get("extra_params")) when invoked, keep the
existing patch of OpenAIBackend.complete, then after honcho_llm_call assert the
captured SDK kwargs include the "honcho_sentinel" == "zap" to ensure
_build_params passes the sentinel through to chat.completions.create.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: faf205a6-a2d7-4bd5-aeb5-36a013354c86

📥 Commits

Reviewing files that changed from the base of the PR and between ab4bccf and 7d106b7.

📒 Files selected for processing (17)
  • docs/v3/contributing/troubleshooting.mdx
  • src/dreamer/specialists.py
  • src/llm/api.py
  • src/llm/backends/gemini.py
  • src/llm/backends/openai.py
  • src/llm/caching.py
  • src/llm/conversation.py
  • src/llm/registry.py
  • src/llm/tool_loop.py
  • tests/deriver/test_queue_processing.py
  • tests/integration/test_enqueue.py
  • tests/llm/test_conversation.py
  • tests/llm/test_model_config.py
  • tests/routes/test_peers.py
  • tests/routes/test_queue_status.py
  • tests/routes/test_scoped_api.py
  • tests/utils/test_clients.py
💤 Files with no reviewable changes (4)
  • tests/routes/test_queue_status.py
  • tests/integration/test_enqueue.py
  • tests/routes/test_scoped_api.py
  • tests/routes/test_peers.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • tests/llm/test_conversation.py
  • src/llm/conversation.py
  • src/llm/caching.py
  • tests/llm/test_model_config.py
  • src/llm/backends/openai.py

Comment thread src/llm/registry.py Outdated
Comment thread src/llm/registry.py Outdated
@VVoruganti VVoruganti merged commit b65d03d into main Apr 20, 2026
9 checks passed
@VVoruganti VVoruganti deleted the vineeth/dev-1454-full branch April 20, 2026 06:46
jooray added a commit to jooray/honcho that referenced this pull request Apr 20, 2026
Replace hardcoded 1536-dimension pgvector columns with configurable
dimensions via EMBEDDING.VECTOR_DIMENSIONS setting. Drops the post-plastic-labs#459
validator guard that blocked non-1536 dims on pgvector, letting self-hosted
deployments use models like nomic-embed-text (768) with pgvector instead of
requiring LanceDB/Turbopuffer.

Ported from the pre-plastic-labs#459 feature branch onto current upstream main.
jooray added a commit to jooray/honcho that referenced this pull request Apr 22, 2026
Replace hardcoded 1536-dimension pgvector columns with configurable
dimensions via EMBEDDING.VECTOR_DIMENSIONS setting. Drops the post-plastic-labs#459
validator guard that blocked non-1536 dims on pgvector, letting self-hosted
deployments use models like nomic-embed-text (768) with pgvector instead of
requiring LanceDB/Turbopuffer.

Ported from the pre-plastic-labs#459 feature branch onto current upstream main.
jooray added a commit to jooray/honcho that referenced this pull request Apr 24, 2026
Replace hardcoded 1536-dimension pgvector columns with configurable
dimensions via EMBEDDING.VECTOR_DIMENSIONS setting. Drops the post-plastic-labs#459
validator guard that blocked non-1536 dims on pgvector, letting self-hosted
deployments use models like nomic-embed-text (768) with pgvector instead of
requiring LanceDB/Turbopuffer.

Re-ported onto upstream main (07e7a99).
fossouo pushed a commit to fossouo/honcho that referenced this pull request Apr 28, 2026


Remove MCP aws_rds_status tool, property-based tests, and lock file
to reduce scope per maintainer feedback. Rebased on merged plastic-labs#459.
MCP tools and PBT tests will be a follow-up PR.
jooray added a commit to jooray/honcho that referenced this pull request Apr 30, 2026
Replace hardcoded 1536-dimension pgvector columns with configurable
dimensions via EMBEDDING.VECTOR_DIMENSIONS setting. Drops the post-plastic-labs#459
validator guard that blocked non-1536 dims on pgvector, letting self-hosted
deployments use models like nomic-embed-text (768) with pgvector instead of
requiring LanceDB/Turbopuffer.

Re-ported onto upstream main (a05c2f8).
jooray added a commit to jooray/honcho that referenced this pull request May 1, 2026
Replace hardcoded 1536-dimension pgvector columns with configurable
dimensions via EMBEDDING.VECTOR_DIMENSIONS setting. Drops the post-plastic-labs#459
validator guard that blocked non-1536 dims on pgvector, letting self-hosted
deployments use models like nomic-embed-text (768) with pgvector instead of
requiring LanceDB/Turbopuffer.

Re-ported onto upstream main (f37338b).
jooray added a commit to jooray/honcho that referenced this pull request May 7, 2026
Replace hardcoded 1536-dimension pgvector columns with configurable
dimensions via EMBEDDING.VECTOR_DIMENSIONS setting. Drops the post-plastic-labs#459
validator guard that blocked non-1536 dims on pgvector, letting self-hosted
deployments use models like nomic-embed-text (768) with pgvector instead of
requiring LanceDB/Turbopuffer.

Re-ported onto upstream main (f37338b).
ranc1 added a commit to ranc1/honcho that referenced this pull request May 21, 2026
Re-port the fork's ACP provider hooks onto the new src/llm/ module
layout introduced upstream by PR plastic-labs#459, which deleted src/utils/clients.py
and split its responsibilities across src/llm/{registry,executor,credentials}.

Conflict resolutions:

- src/utils/types.py: dropped SupportedProviders Literal entirely (upstream
  killed it; ACP is now expressed via ModelTransport).
- src/config.py: kept upstream's per-transport BASE_URL fields, dropped the
  fork's VLLM_* knobs (use OPENAI_BASE_URL instead), kept ACP_GATEWAY_URL,
  ACP_TIMEOUT_MS, EMBEDDING_PROVIDER. Added "acp" to ModelTransport and
  "openrouter" to EmbeddingTransport. Default VECTOR_DIMENSIONS now 1024
  (matches qwen3-embedding:0.6b).
- src/embedding_client.py: kept the openrouter branch with qwen3-embedding
  default; updated to use upstream's `self.transport` (was `self.provider`)
  and OPENAI_API_KEY/OPENAI_BASE_URL settings, with default URL fallback to
  OpenRouter.
- src/models.py: accepted upstream's _VECTOR_DIM (configurable, defaults 1024).
- src/main.py: kept both fork's mcp_router import and upstream's
  validate_embedding_schema import.
- src/utils/clients.py: accepted upstream deletion. Re-ported the ACP
  dispatch hooks to src/llm/{registry,executor,runtime}.
- Dockerfile: kept fork's HEALTHCHECK + ENTRYPOINT (entrypoint.sh runs
  migrations, MCP, deriver, and API).
- README.md: kept fork's ACP-provider documentation.
- mcp/src/{index,config}.ts: kept fork's standalone Node.js HTTP server
  (over upstream's Cloudflare Workers fetch handler).

Re-ported ACP integration in src/llm/:

- registry.py: widened CLIENTS dict to dict[ModelTransport, ProviderClient |
  str]; register settings.LLM.ACP_GATEWAY_URL under "acp". client_for_model_config
  / backend_for_provider raise ValidationException/ValueError for "acp" since
  the executor short-circuits before backend dispatch.
- runtime.py: AttemptPlan.client widened to ProviderClient | str.
  resolve_backend_for_plan rejects ACP plans with a clear error.
- executor.py: client_override widened. Before backend dispatch,
  honcho_llm_call_inner short-circuits provider == "acp" to
  honcho_llm_call_inner_acp; on stream=True it yields a single is_done chunk
  carrying the final text (matches the original streaming behavior in
  src/utils/clients.py:handle_streaming_response).

src/utils/acp_provider.py is unchanged — self-contained module that imports
only httpx, pydantic, and stdlib.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants