refactor: replace mirascope with handrolled client#202
Conversation
WalkthroughReplaces mirascope decorator-driven LLM calls with an explicit honcho_llm_call surface, centralizes provider clients in CLIENTS, migrates Langfuse usage to module-level get_client(), introduces SupportedProviders and groq/tenacity deps, updates streaming to AsyncIterator, and extends tests for the new clients and streaming behavior. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant API as Router (/peers/chat)
participant Dial as dialectic.chat
participant LF as Langfuse (lf)
participant Clients as honcho_llm_call
participant Provider as LLM Provider
API->>Dial: chat(query, stream=True)
Dial->>LF: update_current_trace(metadata: dialectic_model)
Dial->>Clients: honcho_llm_call(prompt, provider, model, stream=True)
activate Clients
Clients->>Provider: create streaming completions
Provider-->>Clients: stream chunks
Clients-->>Dial: AsyncIterator<HonchoLLMCallStreamChunk>
deactivate Clients
loop stream
Dial-->>API: chunk.content
end
sequenceDiagram
autonumber
participant Deriver as deriver.derive_new_insights
participant Prompts as prompts.critical_analysis_prompt
participant Clients as honcho_llm_call
participant LF as Langfuse (lf)
participant Provider as LLM Provider
Deriver->>Prompts: build prompt(...)
Prompts-->>Deriver: prompt (string)
Deriver->>Clients: honcho_llm_call(prompt, response_model?, json_mode)
activate Clients
Clients->>Provider: provider-specific call
Provider-->>Clients: response (text/structured)
Clients-->>Deriver: response.content
deactivate Clients
Deriver->>LF: update_current_generation(input=prompt, output=response.content)
Deriver-->>Deriver: parse JSON / extract thinking
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (2){src,tests}/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
tests/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (1)📓 Common learnings🧬 Code graph analysis (1)tests/utils/test_clients.py (1)
🔇 Additional comments (7)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Ruff (0.13.1)tests/utils/test_clients.py�[1;31mruff failed�[0m Comment |
There was a problem hiding this comment.
Actionable comments posted: 26
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (21)
src/utils/logging.py (4)
34-41: Guard the langfuse import; avoid crashing when key is set but package is missing.If LANGFUSE_PUBLIC_KEY is present but langfuse isn't installed/available, importing will raise and break function definition.
- if settings.LANGFUSE_PUBLIC_KEY: - # Import here to avoid circular imports and only import when needed - from langfuse import observe # pyright: ignore - - return observe()(func) + if settings.LANGFUSE_PUBLIC_KEY: + # Import here to avoid circular imports and only import when needed + try: + from langfuse import observe # pyright: ignore + except Exception as e: # ImportError or runtime import issues + logger.warning("Langfuse enabled but import failed: %s", e) + return func + return observe()(func)Add a module logger (outside this range):
from logging import getLogger logger = getLogger(__name__)
244-255: Fix KeyError and docstring mismatch in log_performance_metrics.
- Calling clear() on a missing task_name key raises KeyError.
- Docstring refers to a dict, but the param is a list of tuples.
- """ - Log performance metrics in a clean table. - Args: - metrics: Dictionary of metric names and (value, unit) tuples - title: Table title - """ - if not accumulated_metrics.get(task_name) and not metrics: - return - if metrics is None: - metrics = [] - metrics = accumulated_metrics.get(task_name, []) + metrics - accumulated_metrics[task_name].clear() + """ + Log performance metrics in a clean table. + Args: + task_name: Logical task key for accumulated metrics + metrics: Optional list of (metric, value, unit) tuples to print now + title: Table title + """ + if metrics is None: + metrics = [] + accumulated = accumulated_metrics.pop(task_name, []) + if not accumulated and not metrics: + return + metrics = accumulated + metrics
266-273: Avoid TypeError when formatting non-numeric values.Value can be str per annotation; formatting with :0f/:3f would raise.
- for metric, value, unit in metrics: - if unit == "ms": - formatted_value = f"{value:.0f}" - elif unit == "s": - formatted_value = f"{value:.3f}" - else: - formatted_value = str(value) + for metric, value, unit in metrics: + if isinstance(value, (int, float)): + if unit == "ms": + formatted_value = f"{value:.0f}" + elif unit == "s": + formatted_value = f"{value:.3f}" + else: + formatted_value = str(value) + else: + formatted_value = str(value)
192-199: Consider integrating rich output with standard logging.console.print writes to stdout and bypasses structured logging/context. For production observability, prefer logging handlers (e.g., RichHandler) or gate these prints behind a dev flag.
src/dialectic/prompts.py (4)
100-135: Tighten JSON validity instructions (reduce LLM formatting drift).Explicitly require strict JSON to lower parse failures.
-Respond with 3-5 search queries as a JSON object with a "queries" field containing an array of strings. Each query should target different aspects or reasoning levels to maximize retrieval coverage. +Respond with 3-5 search queries as a strict JSON object with a "queries" field containing an array of strings. Each query should target different aspects or reasoning levels to maximize retrieval coverage. @@ -Format: `{{"queries": ["query1", "query2", "query3"]}}` - -No markdown, no explanations, just the JSON object. +Format: `{{"queries": ["query1", "query2", "query3"]}}` + +Constraints: +- Strict JSON only (double-quoted keys/strings, no comments, no trailing commas). +- No markdown, no explanations—just the JSON object.
27-46: Handle None safely and complete the docstring for dialectic_prompt.
- recent_conversation_history and working_representation can be None; current output will render "None" in the prompt.
- Docstring is missing several parameters and should follow Google style per repo guidelines.
Example updated function (apply adaptively):
def dialectic_prompt( query: str, working_representation: str | None, recent_conversation_history: str | None, additional_context: str | None, peer_name: str, peer_card: list[str] | None, target_name: str | None = None, target_peer_card: list[str] | None = None, ) -> str: """ Generate the main dialectic prompt for context synthesis. Args: query: Application question/request about the user. working_representation: Current session conclusions (may be None). recent_conversation_history: Recent dialog context (may be None). additional_context: Historical/global conclusions (may be None). peer_name: Name of the user/peer being queried. peer_card: Biographical lines for the user (may be None). target_name: Optional name of the target peer. target_peer_card: Optional biographical lines for the target peer. Returns: Formatted prompt string for the dialectic model. """ def _join(lines: list[str] | None) -> str: return "\n".join(lines) if lines else "(none)" if target_name: query_target = f"""The query is about user {peer_name}'s understanding of {target_name}. The user's known biographical information: {_join(peer_card)} The target's known biographical information: {_join(target_peer_card)} If the user's name or nickname is known, exclusively refer to them by that name. If the target's name or nickname is known, exclusively refer to them by that name. """ else: query_target = f"""The query is about user {peer_name}. The user's known biographical information: {_join(peer_card)} If the user's name or nickname is known, exclusively refer to them by that name. """ return c( f""" ... (unchanged preamble) ... {query_target} <recent_conversation_history> {(recent_conversation_history or "").strip()} </recent_conversation_history> <query>{query}</query> <working_representation>{(working_representation or "").strip()}</working_representation> {f"<global_context>{additional_context}</global_context>" if additional_context else ""}""" )Also applies to: 76-83
31-35: Use '\n' instead of chr(10) for readability.Semantically identical; clearer to readers.
-{chr(10).join(peer_card) if peer_card else "(none)"} +{\n.join(peer_card) if peer_card else "(none)"} @@ -{chr(10).join(target_peer_card) if target_peer_card else "(none)"} +{\n.join(target_peer_card) if target_peer_card else "(none)"}Also applies to: 43-46
86-97: Docstring params out of sync for query_generation_prompt.Mentions peer_name/target_name that aren’t parameters. Align with function signature.
src/utils/types.py (1)
1-4: Declare the alias explicitly with TypeAlias (pyright-friendly).Improves clarity for basedpyright and prevents misinterpretation as a value.
-from typing import Literal +from typing import Literal, TypeAlias @@ -SupportedProviders = Literal["anthropic", "openai", "google", "groq", "custom"] +SupportedProviders: TypeAlias = Literal["anthropic", "openai", "google", "groq", "custom"]src/utils/embedding_store.py (2)
64-64: Consistency nit: traced scope label typo."ed_embedding_store.save_unified_observations" looks accidental; align with file name for consistent tracing.
- async with tracked_db("ed_embedding_store.save_unified_observations") as db: + async with tracked_db("embedding_store.save_unified_observations") as db:
97-105: Centralize direct OpenAI embedding calls
Remove the two directembedding_client.client.embeddings.createinvocations in src/utils/embedding_store.py (lines 100 and 428) and route both loops through a single helper in src/embedding_client (e.g.batch_embed_texts(texts: list[str]) -> list[list[float]]) to ensure consistent model/version handling, retries/timeouts, and testability.src/deriver/consumer.py (1)
51-56: Wrap Langfuse trace calls in try/except and use precise ignores
- In both summary (consumer.py 51–56) and representation (66–71) branches, keep the
if settings.LANGFUSE_PUBLIC_KEYcheck, but wrap thelf.update_current_tracecall in atry/except Exceptionand log failures at debug level.- Replace the broad
# type: ignorewith# type: ignore[attr-defined]for both calls.src/routers/peers.py (2)
195-197: Improve error logging and avoid leaking internal messages.Use logger.exception for stack traces and return a generic error to clients.
- except Exception as e: - logger.error(f"Error in stream: {str(e)}") - raise HTTPException(status_code=500, detail=str(e)) from e + except Exception as e: + logger.exception("Error in stream", extra={"workspace_id": workspace_id, "peer_id": peer_id}) + raise HTTPException(status_code=500, detail="Stream failed") from e
199-201: Add SSE headers to reduce buffering.Helps proxies and clients stream promptly.
- return StreamingResponse( - content=parse_stream(), media_type="text/event-stream", status_code=200 - ) + return StreamingResponse( + content=parse_stream(), + media_type="text/event-stream", + status_code=200, + headers={ + "Cache-Control": "no-cache", + "X-Accel-Buffering": "no", + "Connection": "keep-alive", + }, + )src/dialectic/utils.py (4)
80-97: Harden Langfuse updates (JSON-serializable payload + isolation).
- Ensure payloads are JSON-serializable (use model_dump()).
- Don’t let telemetry failures affect product flow.
- if settings.LANGFUSE_PUBLIC_KEY: - lf.update_current_generation( + if settings.LANGFUSE_PUBLIC_KEY: + lf = get_client() + try: + lf.update_current_generation( input={ "query": query, "include_premises": include_premises, }, output={ - "search_queries": search_queries_result, + "search_queries": search_queries_result.model_dump(), "all_results": all_results, "unique_observations": unique_observations, }, - ) - - lf.update_current_trace( - metadata={ - "search_queries": search_queries, - "observations_retrieved": unique_observations, - } - ) + ) + lf.update_current_trace( + metadata={ + "search_queries": search_queries, + "observations_retrieved_count": len(unique_observations), + } + ) + except Exception: + logger.debug("Langfuse update skipped due to client error", exc_info=True)
65-72: Avoid duplicate query entry.Guard against the original query already appearing in the model output.
- # Include the original query in the search queries - search_queries.append(query) + if query not in search_queries: + search_queries.append(query)
232-247: Tighten LLM call: enable JSON mode + add tracking + (optional) provider cast.
- json_mode=True increases robustness for structured output.
- track_name adds traceability.
- If basedpyright complains about provider type, cast to SupportedProviders.
async def generate_semantic_queries( query: str, target_peer_name: str ) -> SemanticQueries: """Generate semantic search queries for observation retrieval.""" prompt = query_generation_prompt(query, target_peer_name) response = await honcho_llm_call( - provider=settings.DIALECTIC.QUERY_GENERATION_PROVIDER, + provider=settings.DIALECTIC.QUERY_GENERATION_PROVIDER, model=settings.DIALECTIC.QUERY_GENERATION_MODEL, prompt=prompt, max_tokens=settings.LLM.DEFAULT_MAX_TOKENS, + track_name="dialectic.generate_semantic_queries", response_model=SemanticQueries, + json_mode=True, enable_retry=True, retry_attempts=3, ) return response.contentIf strict typing is desired, also update imports at the top:
from typing import Any, cast from src.utils.clients import honcho_llm_call, SupportedProvidersand cast provider:
provider=cast(SupportedProviders, settings.DIALECTIC.QUERY_GENERATION_PROVIDER),
35-48: Docstring param completeness.Document target_peer_name in get_observations’ docstring for clarity.
src/dialectic/chat.py (1)
343-346: Nit: remove stale comment and unnecessary str() cast
responseis already astr; the Anthropic wrapper note is obsolete.- log_performance_metrics(f"dialectic_chat_{dialectic_chat_uuid}") - # Convert AnthropicCallResponse to string for compatibility - return str(response) + log_performance_metrics(f"dialectic_chat_{dialectic_chat_uuid}") + return responsesrc/utils/summarizer.py (1)
350-361: Fix metrics: avoid non-numeric value and guard empty messages
accumulate_metriclikely expects numeric values; passingf"{summary_tokens} (est.)"risks runtime issues. Also, deriving metric keys frommessages[-1]can crash on empty lists.- response: HonchoLLMCallResponse[str] | None = None + response: HonchoLLMCallResponse[str] | None = None @@ - summary_tokens = response.output_tokens + summary_tokens = response.output_tokens @@ - accumulate_metric( - f"summary_{messages[-1].workspace_name}_{messages[-1].id}", + workspace = messages[-1].workspace_name if messages else "unknown" + last_id = messages[-1].id if messages else 0 + accumulate_metric( + f"summary_{workspace}_{last_id}", f"{summary_type.name}_summary_size", - response.output_tokens if response else f"{summary_tokens} (est.)", + response.output_tokens if response else summary_tokens, "tokens", )Also applies to: 380-385
src/deriver/deriver.py (1)
362-378: Thinking extraction likely always None nowGiven the new client surface,
ReasoningResponsewon’t naturally carry_responseorthinking. If “thinking” matters operationally (it’s logged and sent to Langfuse), we should explicitly pass it from the client layer rather than rely on introspection.If acceptable, I can open a follow-up to add an optional
thinkingfield toHonchoLLMCallResponseand set it for OpenAI reasoning models; we then consume it here.Also applies to: 385-397
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (19)
docs/v2/documentation/introduction/vibecoding.mdx(0 hunks)docs/v2/documentation/reference/platform.mdx(2 hunks)pyproject.toml(3 hunks)src/config.py(4 hunks)src/crud/representation.py(1 hunks)src/deriver/consumer.py(4 hunks)src/deriver/deriver.py(6 hunks)src/deriver/prompts.py(0 hunks)src/dialectic/chat.py(8 hunks)src/dialectic/prompts.py(1 hunks)src/dialectic/utils.py(5 hunks)src/routers/peers.py(2 hunks)src/utils/clients.py(1 hunks)src/utils/embedding_store.py(3 hunks)src/utils/logging.py(1 hunks)src/utils/summarizer.py(10 hunks)src/utils/types.py(1 hunks)tests/bench/peer_card_bench.py(2 hunks)tests/conftest.py(2 hunks)
💤 Files with no reviewable changes (2)
- src/deriver/prompts.py
- docs/v2/documentation/introduction/vibecoding.mdx
🧰 Additional context used
📓 Path-based instructions (4)
{src,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
{src,tests}/**/*.py: Follow isort conventions with absolute imports preferred
Use snake_case for variables/functions and PascalCase for classes
Keep line length to 88 characters (Black compatible)
Ensure code typechecks with basedpyright
Files:
src/utils/logging.pysrc/utils/types.pysrc/config.pysrc/crud/representation.pysrc/utils/clients.pysrc/dialectic/prompts.pysrc/dialectic/utils.pysrc/deriver/consumer.pytests/bench/peer_card_bench.pysrc/utils/embedding_store.pysrc/dialectic/chat.pytests/conftest.pysrc/deriver/deriver.pysrc/utils/summarizer.pysrc/routers/peers.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use explicit error handling with appropriate exception types
Write docstrings using Google style
Use specific exception types (e.g., ResourceNotFoundException, ValidationException) instead of generic ones
Use structured logging with context; avoid print statements
Use ruff for linting and formatting (ruff check/format)
Files:
src/utils/logging.pysrc/utils/types.pysrc/config.pysrc/crud/representation.pysrc/utils/clients.pysrc/dialectic/prompts.pysrc/dialectic/utils.pysrc/deriver/consumer.pysrc/utils/embedding_store.pysrc/dialectic/chat.pysrc/deriver/deriver.pysrc/utils/summarizer.pysrc/routers/peers.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Write tests using pytest
Files:
tests/bench/peer_card_bench.pytests/conftest.py
tests/conftest.py
📄 CodeRabbit inference engine (CLAUDE.md)
Place shared pytest fixtures in tests/conftest.py
Files:
tests/conftest.py
🧬 Code graph analysis (6)
src/utils/clients.py (3)
src/dialectic/chat.py (1)
chat(152-345)src/routers/peers.py (1)
chat(156-201)tests/conftest.py (3)
client(168-189)async_wrapper(460-466)sync_wrapper(468-474)
src/dialectic/utils.py (3)
src/utils/shared_models.py (1)
SemanticQueries(179-184)src/dialectic/prompts.py (1)
query_generation_prompt(86-135)src/utils/clients.py (4)
honcho_llm_call(102-118)honcho_llm_call(122-137)honcho_llm_call(141-156)honcho_llm_call(159-221)
tests/bench/peer_card_bench.py (3)
src/deriver/prompts.py (1)
peer_card_prompt(102-158)src/utils/clients.py (4)
honcho_llm_call(102-118)honcho_llm_call(122-137)honcho_llm_call(141-156)honcho_llm_call(159-221)src/utils/shared_models.py (1)
PeerCardQuery(196-205)
src/dialectic/chat.py (2)
src/utils/clients.py (5)
HonchoLLMCallStreamChunk(86-98)honcho_llm_call(102-118)honcho_llm_call(122-137)honcho_llm_call(141-156)honcho_llm_call(159-221)src/dialectic/prompts.py (1)
dialectic_prompt(4-83)
src/deriver/deriver.py (3)
src/utils/shared_models.py (2)
ReasoningResponse(91-101)PeerCardQuery(196-205)src/deriver/prompts.py (2)
critical_analysis_prompt(12-99)peer_card_prompt(102-158)src/utils/clients.py (4)
honcho_llm_call(102-118)honcho_llm_call(122-137)honcho_llm_call(141-156)honcho_llm_call(159-221)
src/utils/summarizer.py (2)
src/utils/clients.py (5)
HonchoLLMCallResponse(70-83)honcho_llm_call(102-118)honcho_llm_call(122-137)honcho_llm_call(141-156)honcho_llm_call(159-221)src/models.py (1)
Message(173-230)
🪛 LanguageTool
docs/v2/documentation/reference/platform.mdx
[grammar] ~74-~74: There might be a mistake here.
Context: ...tive utilities page. Next to the Peer name you can edit the [Global Peer Configura...
(QB_NEW_EN_OTHER)
[grammar] ~74-~74: Use correct spacing
Context: ...w, explore all utilities for the Peer.
Utilities include: - Message search ...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~94-~94: Use correct spacing
Context: ... a Session to open its utilities page.
Here you can: - *View and add Messages...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
🔇 Additional comments (27)
src/dialectic/prompts.py (1)
1-1: All mirascope decorators removed – no@prompt_templateorfrom mirascopereferences detected; prompt functions are now plain.src/utils/types.py (1)
3-3: Provider strings are consistent
No lingeringProvidersalias; the"google"literal inSupportedProviders,config.py, andclients.pyaligns across runtime configuration and client initialization.src/utils/embedding_store.py (1)
73-76: Harden Langfuse update_current_generation callpyproject.toml already pins langfuse≥3.3.2, so
update_current_generationis guaranteed to exist—wrap this call in a client‐existence check and swallow any exceptions to avoid flaky tests:- lf.update_current_generation( - input={"observations": [obs.model_dump() for obs in observations]}, - output={"unique_conclusions": unique_conclusions}, - ) + if lf: + try: + lf.update_current_generation( # type: ignore[attr-defined] + input={"observations": [obs.model_dump() for obs in observations]}, + output={"unique_conclusions": unique_conclusions}, + ) + except Exception as exc: + logger.debug("Langfuse update_current_generation failed: %s", exc)Likely an incorrect or invalid review comment.
pyproject.toml (2)
12-12: Groq client addition looks good.No issues spotted; matches Dialectic QUERY_GENERATION_PROVIDER default in config.
30-30: API key plumbing for google-genai is correct. settings.LLM.GEMINI_API_KEY is loaded from LLM_GEMINI_API_KEY and passed directly to genai.client.Client(api_key=…), so no reliance on GOOGLE_API_KEY.src/config.py (2)
190-191: LGTM on provider type migrations.Annotations align with the new multi-provider surface; no functional changes.
Also applies to: 197-198, 219-225, 246-247
17-17: SupportedProviders includes all default providers.tests/conftest.py (1)
305-371: No changes needed to mock dialectic_stream or add embeddings stub
dialectic_stream isn’t implemented as an async generator nor iterated withasync for, so returning an AsyncMock works; and there’s nosrc.embedding_client…embeddings.createcall to stub.Likely an incorrect or invalid review comment.
tests/bench/peer_card_bench.py (2)
145-147: LGTM on prompt construction.Prompt assembly matches the prompt API and keeps inputs explicit.
161-162: Return model content directly.Returning the parsed model keeps call sites simple and typed.
src/routers/peers.py (2)
2-2: OK to import AsyncIterator.Switching to collections.abc.AsyncIterator is the right runtime check target.
169-179: Remove suggestion: non-streamdialectic_chatalready returns a string
async def chatends withreturn str(response), so in the non-stream pathresponseis the content itself, not a wrapper. Wrappingstr(response)inDialecticResponseis correct—no change needed.Likely an incorrect or invalid review comment.
src/dialectic/chat.py (4)
39-41: Module-level Langfuse client initialization looks goodCreating a single
lfviaget_client()and gating usage onsettings.LANGFUSE_PUBLIC_KEYis sensible.
152-160: Return type change is fine; verify callers handle AsyncIterator
chat()now returnsstr | AsyncIterator[HonchoLLMCallStreamChunk]. Confirm all call sites (e.g., routers/handlers) branch onstream=Trueand iterate chunks appropriately.
193-199: Nice: richer Langfuse metadataAdding
dialectic_modelalongside query-generation fields improves trace clarity.
66-75: Confirm provider typing compatibilityType-check tool wasn’t found in the sandbox—manually verify that
settings.DIALECTIC.PROVIDERis one of theSupportedProvidersenum/literal values so your type checker won’t flag astrvs literal mismatch.src/utils/summarizer.py (3)
5-5: Good call using cleandoc aliascKeeps prompt templates tight and readable.
80-85: Public return types updated to HonchoLLMCallResponse[str]—goodThis enables token accounting downstream.
Also applies to: 130-134
97-121: Prompts are clear and scoped; nice upgradeInclusive-of-previous-summary instruction will reduce drift. No issues spotted.
Also applies to: 143-169
src/deriver/deriver.py (3)
51-51: Module-level Langfuse client is consistent with the new patternMatches the rest of the refactor.
54-86: Return only parsed content—verify “thinking” preservation
critical_analysis_callreturnsReasoningResponseviaresponse.content. Downstream (derive_new_insights) tries to pullthinkingfrom hidden attributes on the object. Confirmhoncho_llm_callattaches provider “thinking” to the parsed model (e.g., via a private attribute) or we’ll silently lose it. If not attached, consider returning the fullHonchoLLMCallResponse[ReasoningResponse]and extending it to carry optionalthinking.Would you like a patch to plumb an optional
thinking: str | Nonefield throughHonchoLLMCallResponse?
320-328: Langfuse generation input/output updates look correctNice integration points for reasoning traces.
Also applies to: 411-413
src/utils/clients.py (5)
65-67: Good validation of required providers at module load timeThis proactive validation ensures all required LLM providers are configured before the application starts, preventing runtime failures. This is a solid defensive programming practice.
190-194: Verify retry configuration parameters are appropriateThe retry configuration uses an exponential backoff with multiplier=1, min=4s, max=10s. This means wait times of 4s, 4s, 8s (clamped to max=10s). Consider if these values are appropriate for your use case.
Are these retry parameters (3 attempts, 4-10s waits) aligned with your SLA requirements and typical LLM provider response times?
343-351: Excellent handling of GPT-5 specific parametersThe conditional logic properly differentiates between GPT-5 models (using
max_completion_tokens) and other models (usingmax_tokens), while also supporting GPT-5 specific features likereasoning_effortandverbosity.
479-502: Well-structured streaming response handlerThe
_handle_streaming_responsefunction provides a clean abstraction for handling streaming across different providers. The docstring is comprehensive and the parameter handling is consistent.
642-694: Clean implementation of retry decoratorThe retry decorator properly handles both sync and async functions with appropriate sleep mechanisms. The implementation is well-structured and follows Python best practices.
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
src/crud/representation.py (1)
18-22: Also mark remaining module constants as Final.Keep consistency for constants used as keys.
-# The key for the working representation in the session peer's internal_metadata -WORKING_REPRESENTATION_METADATA_KEY = "working_representation" +# The key for the working representation in the session peer's internal_metadata +WORKING_REPRESENTATION_METADATA_KEY: Final[str] = "working_representation" -# Old working representation key--remove in 2.3.0? -WORKING_REPRESENTATION_LEGACY_METADATA_KEY = "global_representation" +# Old working representation key--remove in 2.3.0? +WORKING_REPRESENTATION_LEGACY_METADATA_KEY: Final[str] = "global_representation"src/dialectic/chat.py (1)
345-347: Avoid redundant str() conversion.response is already a string from dialectic_call.
- # Convert AnthropicCallResponse to string for compatibility - return str(response) + return responsesrc/deriver/deriver.py (6)
314-321: Same: guard Langfuse generation update behind client presenceReplace
if settings.LANGFUSE_PUBLIC_KEY:withif lf:.
356-371: JSON fallback likely dead after structured return; either remove or ensure it's reachableGiven
critical_analysis_callcurrently returnsReasoningResponse, this branch won’t run. If you keep provider-gated unstructured calls, leave it; otherwise remove for simplicity.
444-448: Type mismatch passed to _save_new_observationsYou pass
ReasoningResponseWithThinkingwhile the callee expectsReasoningResponse. Works at runtime but will fail type-checking.Apply:
- async def _save_new_observations( - self, - original_working_representation: ReasoningResponse, - revised_observations: ReasoningResponse, + async def _save_new_observations( + self, + original_working_representation: ReasoningResponseWithThinking, + revised_observations: ReasoningResponseWithThinking, ) -> dict[str, list[str]]:…and update the return type below (see next comment).
487-491: Return type under-declares actual element types
new_observations_by_levelcan holdDeductiveObservationentries, not onlystr.Apply:
- new_observations_by_level: dict[str, list[str]] = find_new_observations( + new_observations_by_level: dict[str, list[str | DeductiveObservation]] = find_new_observations( original_working_representation, revised_observations )Optionally introduce a
type aliasfor readability.
553-566: Handle string peer-card responses defensivelyIf
peer_card_callever returns a string (e.g., Anthropic path),.cardwill raise. Parse when needed.Apply:
- response = await peer_card_call(old_peer_card, new_observations) - new_peer_card = response.card + response = await peer_card_call(old_peer_card, new_observations) + if isinstance(response, str): + try: + parsed = PeerCardQuery.model_validate_json(response) + except Exception: + logger.info("No changes to peer card") + return + new_peer_card = parsed.card + else: + new_peer_card = response.card
61-86: Gate structured output by provider
- In
src/deriver/deriver.py(lines 61–86), add a flag before callinghoncho_llm_callto disableresponse_modelwhensettings.DERIVER.PROVIDER == "anthropic".- Pass
response_model=ReasoningResponse if use_structured else Nonein the call.- Update the function’s return annotation to
ReasoningResponse | strto reflect the possible string fallback.
♻️ Duplicate comments (13)
src/crud/representation.py (1)
15-15: Constant annotated as Final — resolved.GLOBAL_REPRESENTATION_COLLECTION_NAME is now immutable as suggested previously.
pyproject.toml (1)
26-26: Bump Langfuse to a version exposing update_current_trace.chat.py uses lf.update_current_trace; this isn’t available in 3.3.2. Raise the floor.
- "langfuse>=3.3.2", + "langfuse>=3.4.0",src/dialectic/chat.py (1)
90-92: Redact/truncate prompts in logs to reduce PII exposure.Log a bounded slice and use formatter placeholders.
- logger.debug("=== DIALECTIC PROMPT ===") - logger.debug(prompt) - logger.debug("=== END DIALECTIC PROMPT ===") + logger.debug("=== DIALECTIC PROMPT (truncated) ===") + logger.debug("%s", (prompt[:2000] + "…") if len(prompt) > 2000 else prompt) + logger.debug("=== END DIALECTIC PROMPT ===")- logger.debug("=== DIALECTIC PROMPT (STREAM) ===") - logger.debug(prompt) - logger.debug("=== END DIALECTIC PROMPT ===") + logger.debug("=== DIALECTIC PROMPT (STREAM, truncated) ===") + logger.debug("%s", (prompt[:2000] + "…") if len(prompt) > 2000 else prompt) + logger.debug("=== END DIALECTIC PROMPT ===")Also applies to: 145-147
src/deriver/deriver.py (1)
51-52: Avoid module-level client init; gate on config or lazy-initModule-level
lf = get_client()can race at import time and is unnecessary if Langfuse is disabled. Gate on settings or lazy-init.Apply:
-lf = get_client() +lf = get_client() if settings.LANGFUSE_PUBLIC_KEY else NoneAnd guard uses:
- if settings.LANGFUSE_PUBLIC_KEY: + if lf: lf.update_current_trace( output=format_reasoning_response_as_markdown(final_observations) )Also apply the same
if lf:guard at Lines 314-321 and 404-406.src/utils/clients.py (9)
27-28: Avoid module-level Langfuse client initializationSame concern: prefer lazy-init or gating to prevent import-time side effects.
Apply:
-lf = get_client() +lf = get_client() if settings.LANGFUSE_PUBLIC_KEY else None
175-177: Improve error message for missing runtime clientMirror the earlier error improvement here as well.
Apply:
- if not client: - raise ValueError(f"Missing client for {provider}") + if not client: + available = ", ".join(sorted(CLIENTS.keys())) or "none" + raise ValueError(f"Missing client for {provider}. Available: {available}")
315-319: Anthropic JSON-mode hack is brittleAppending
"{"as assistant content is fragile. Prefer a system instruction for JSON-only or a robust extractor.Apply:
- if json_mode: - anthropic_params["messages"].append( - {"role": "assistant", "content": "{"} - ) + if json_mode: + anthropic_params["system"] = "Respond with valid JSON only."Also update the streaming branch (Lines 517-519).
404-422: Gemini: surface None content as error rather than empty stringRaising helps callers differentiate empty content vs. transport/model issues.
Apply:
- text_content = gemini_response.text if gemini_response.text else "" + if gemini_response.text is None: + raise ValueError("No text content in Gemini response") + text_content = gemini_response.text
582-599: Gemini streaming likely blocks the event loopUse the async streaming API to avoid blocking.
Apply:
- if response_model is not None: - response_stream = client.models.generate_content_stream( + if response_model is not None: + response_stream = await client.aio.models.generate_content_stream( # type: ignore[attr-defined] model=params["model"], contents=prompt_text, config={ "response_mime_type": "application/json", "response_schema": response_model, }, ) else: - response_stream = client.models.generate_content_stream( + response_stream = await client.aio.models.generate_content_stream( # type: ignore[attr-defined] model=params["model"], contents=prompt_text, config={ "response_mime_type": "application/json" if json_mode else None, }, ) @@ - final_chunk = None - for chunk in response_stream: + final_chunk = None + async for chunk in response_stream: if chunk.text: yield HonchoLLMCallStreamChunk(content=chunk.text) final_chunk = chunk
721-749: Avoid shadowing built-ins (min,max)Rename to prevent confusion and improve readability.
Apply:
-def wait_exponential( - *, multiplier: float = 1.0, min: float | None = None, max: float | None = None +def wait_exponential( + *, multiplier: float = 1.0, min_wait: float | None = None, max_wait: float | None = None ) -> Callable[[int], float]: @@ - clamped_min = float(min) if min is not None else None - clamped_max = float(max) if max is not None else None + clamped_min = float(min_wait) if min_wait is not None else None + clamped_max = float(max_wait) if max_wait is not None else NoneAnd update the caller (Line 193):
- wait=wait_exponential(multiplier=1, min=4, max=10), + wait=wait_exponential(multiplier=1, min_wait=4, max_wait=10),
751-757: Ensure Langfuse generation is closed on success/errorAvoid dangling generations.
Apply:
def with_langfuse(func: Callable[..., Any]) -> Callable[..., Any]: @wraps(func) async def wrapper(*args: Any, **kwargs: Any) -> Any: - lf.start_as_current_generation(name="LLM Call") - return await func(*args, **kwargs) + if lf: + generation = lf.start_as_current_generation(name="LLM Call") + try: + result = await func(*args, **kwargs) + if lf: + generation.end() # type: ignore[reportAttributeAccessIssue] + return result + except Exception as e: + if lf: + generation.end(error=str(e)) # type: ignore[reportAttributeAccessIssue] + raise
101-119: Overload consistency forstreamtypingMinor: consider consistent use of
Literal[True, False]across overloads or keep as-is but add a short docstring clarifying the overload behavior.Also applies to: 141-157
65-67: Improve missing-client error to aid opsInclude available providers to speed up misconfig triage.
Apply:
- if provider_value not in CLIENTS: - raise ValueError(f"Missing client for {provider_name}: {provider_value}") + if provider_value not in CLIENTS: + available = ", ".join(sorted(CLIENTS.keys())) or "none" + raise ValueError( + f"Missing client for {provider_name}: {provider_value}. " + f"Available providers: {available}" + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
pyproject.toml(3 hunks)src/crud/representation.py(2 hunks)src/deriver/deriver.py(6 hunks)src/deriver/prompts.py(0 hunks)src/dialectic/chat.py(8 hunks)src/utils/clients.py(1 hunks)
💤 Files with no reviewable changes (1)
- src/deriver/prompts.py
🧰 Additional context used
📓 Path-based instructions (2)
{src,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
{src,tests}/**/*.py: Follow isort conventions with absolute imports preferred
Use snake_case for variables/functions and PascalCase for classes
Keep line length to 88 characters (Black compatible)
Ensure code typechecks with basedpyright
Files:
src/crud/representation.pysrc/utils/clients.pysrc/dialectic/chat.pysrc/deriver/deriver.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use explicit error handling with appropriate exception types
Write docstrings using Google style
Use specific exception types (e.g., ResourceNotFoundException, ValidationException) instead of generic ones
Use structured logging with context; avoid print statements
Use ruff for linting and formatting (ruff check/format)
Files:
src/crud/representation.pysrc/utils/clients.pysrc/dialectic/chat.pysrc/deriver/deriver.py
🧬 Code graph analysis (1)
src/utils/clients.py (2)
src/routers/peers.py (1)
chat(156-201)tests/conftest.py (2)
async_wrapper(460-466)sync_wrapper(468-474)
🔇 Additional comments (3)
src/crud/representation.py (1)
2-2: Importing Final is correct.Matches the coding guideline and enables immutability signaling.
pyproject.toml (1)
97-97: Cleanup looks good.Dropping "langfuse.decorators" from allowedUntypedLibraries matches the codebase.
src/dialectic/chat.py (1)
160-160: Return type update to AsyncIterator for streaming is correct.Matches the new streaming surface.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/deriver/deriver.py (1)
358-376: Consider simplifying response parsing logicThe response parsing logic handles both string and object responses, but this dual path increases complexity. Consider standardizing on one approach.
The current implementation correctly handles both string and structured responses from the LLM call. However, since
honcho_llm_callis configured withresponse_model=ReasoningResponse, it should always return a properly typed response. Consider asserting the expected type:- # If response is a string, try to parse as JSON - if isinstance(response_obj, str): - try: - response_data = json.loads(response_obj) - new_insights = ReasoningResponse( - explicit=response_data.get("explicit", []), - deductive=[ - DeductiveObservation(**item) - for item in response_data.get("deductive", []) - ], - ) - except (json.JSONDecodeError, KeyError, TypeError) as e: - if settings.SENTRY.ENABLED: - sentry_sdk.capture_exception(e) - logger.warning("Failed to parse string response as JSON: %s", e) - new_insights = ReasoningResponse(explicit=[], deductive=[]) - else: - # If response is already a ReasoningResponse object - new_insights = response_obj + # Response should always be a ReasoningResponse when response_model is set + if not isinstance(response_obj, ReasoningResponse): + logger.error("Unexpected response type: %s", type(response_obj)) + raise TypeError(f"Expected ReasoningResponse, got {type(response_obj)}") + new_insights = response_obj
♻️ Duplicate comments (14)
src/deriver/deriver.py (6)
54-86: Add return type annotation to critical_analysis_callThe function is missing an explicit return type annotation, which would improve type safety and documentation.
Apply this diff to add the return type:
async def critical_analysis_call( peer_id: str, peer_card: list[str] | None, message_created_at: datetime.datetime, working_representation: str | None, history: str, new_turn: str, ) -> ReasoningResponse:
316-324: Use consistent Langfuse client checkUpdate this check to match the conditional initialization pattern.
Apply this diff:
- if settings.LANGFUSE_PUBLIC_KEY: + if lf: lf.update_current_generation(
51-52: Guard Langfuse client initialization with conditional checkThe Langfuse client is initialized unconditionally but should only be created when Langfuse is enabled.
Apply this diff to conditionally initialize the client:
-lf = get_client() - +lf = get_client() if settings.LANGFUSE_PUBLIC_KEY else NoneThen update usage sites to check if the client exists:
- if settings.LANGFUSE_PUBLIC_KEY: + if lf: lf.update_current_trace(
88-115: Add comprehensive docstring and verify provider compatibilityThe function has a minimal docstring and may fail with Anthropic if it doesn't support response_model.
Apply this diff to improve documentation and add provider validation:
async def peer_card_call( old_peer_card: list[str] | None, new_observations: list[str], ) -> PeerCardQuery: """ - Generate peer card prompt, call LLM with response model. + Update a peer card from old card + new observations via LLM. + + Args: + old_peer_card: Existing peer card lines or None. + new_observations: New observations to merge. + + Returns: + PeerCardQuery: Structured peer card result from the model. + + Raises: + NotImplementedError: If the configured provider doesn't support response models. """ prompt = peer_card_prompt( old_peer_card=old_peer_card, new_observations=new_observations, ) + + # Verify provider supports response_model + if settings.DERIVER.PEER_CARD_PROVIDER == "anthropic": + raise NotImplementedError( + "Anthropic provider doesn't support response_model for peer card generation" + )
286-289: Use consistent Langfuse client checkFollowing the suggested conditional initialization pattern, update this check for consistency.
Apply this diff after implementing the conditional initialization:
- if settings.LANGFUSE_PUBLIC_KEY: + if lf: lf.update_current_trace(
406-409: Use consistent Langfuse client checkUpdate this check to match the conditional initialization pattern.
Apply this diff:
- if settings.LANGFUSE_PUBLIC_KEY: + if lf: lf.update_current_generation(src/utils/clients.py (8)
27-28: Consider lazy initialization for Langfuse clientThe module-level Langfuse client initialization may cause issues in async environments or during testing. Consider lazy initialization to avoid potential race conditions.
Apply this approach for safer initialization:
-lf = get_client() +lf = None + +def get_lf_client(): + """Get or create the Langfuse client instance. + + Uses lazy initialization to avoid module-level side effects + and potential race conditions in async environments. + """ + global lf + if lf is None and settings.LANGFUSE_PUBLIC_KEY: + lf = get_client() + return lfThen update the
with_langfusedecorator to use this:def with_langfuse(func: Callable[..., Any]) -> Callable[..., Any]: @wraps(func) async def wrapper(*args: Any, **kwargs: Any) -> Any: - lf.start_as_current_generation(name="LLM Call") + client = get_lf_client() + if client: + client.start_as_current_generation(name="LLM Call") return await func(*args, **kwargs)
175-177: Add list of available providers to error messageThe error message could be more helpful by listing available providers.
Apply this diff:
client = CLIENTS.get(provider) if not client: - raise ValueError(f"Missing client for {provider}") + available_providers = list(CLIENTS.keys()) + raise ValueError( + f"Missing client for {provider}. Available providers: {available_providers}" + )
315-318: Anthropic JSON mode implementation is fragileAppending
"{"as an assistant message is a workaround that could lead to malformed JSON if the model doesn't complete the structure properly.Consider using Anthropic's system message for JSON mode or implementing robust JSON extraction:
if json_mode: - anthropic_params["messages"].append( - {"role": "assistant", "content": "{"} - ) + # Use system message to instruct JSON-only response + anthropic_params["system"] = ( + "You must respond with valid JSON only. " + "Do not include any text outside the JSON structure." + )
600-618: Gemini streaming uses synchronous iteratorThe Gemini streaming implementation uses a regular
forloop which will block the event loop. This should use async iteration.The google-genai library provides async streaming support. Update to use the async API:
if response_model is not None: - response_stream = client.models.generate_content_stream( + response_stream = await client.aio.models.generate_content_stream( model=params["model"], contents=prompt_text, config={ "response_mime_type": "application/json", "response_schema": response_model, }, ) else: - response_stream = client.models.generate_content_stream( + response_stream = await client.aio.models.generate_content_stream( model=params["model"], contents=prompt_text, config={ "response_mime_type": "application/json" if json_mode else None, }, ) final_chunk = None - for chunk in response_stream: + async for chunk in response_stream: if chunk.text: yield HonchoLLMCallStreamChunk(content=chunk.text) final_chunk = chunk
722-723: Avoid shadowing Python built-in functionsThe parameter names
minandmaxshadow Python's built-in functions.Apply this diff to use non-shadowing names:
def wait_exponential( - *, multiplier: float = 1.0, min: float | None = None, max: float | None = None + *, multiplier: float = 1.0, min_wait: float | None = None, max_wait: float | None = None ) -> Callable[[int], float]:And update the implementation:
- clamped_min = float(min) if min is not None else None - clamped_max = float(max) if max is not None else None + clamped_min = float(min_wait) if min_wait is not None else None + clamped_max = float(max_wait) if max_wait is not None else NoneAlso update the call site:
- wait=wait_exponential(multiplier=1, min=4, max=10), + wait=wait_exponential(multiplier=1, min_wait=4, max_wait=10),
751-757: Add error handling to Langfuse wrapperThe Langfuse wrapper should properly handle errors and ensure the generation is closed even on failure.
Apply this diff to ensure proper cleanup:
def with_langfuse(func: Callable[..., Any]) -> Callable[..., Any]: @wraps(func) async def wrapper(*args: Any, **kwargs: Any) -> Any: - lf.start_as_current_generation(name="LLM Call") - return await func(*args, **kwargs) + client = get_lf_client() # Use the lazy init function + if not client: + return await func(*args, **kwargs) + + generation = client.start_as_current_generation(name="LLM Call") + try: + result = await func(*args, **kwargs) + generation.end() + return result + except Exception as e: + generation.end(error=str(e)) + raise return wrapper
516-519: Anthropic streaming JSON mode uses same fragile approachThe streaming implementation has the same JSON mode issue as the non-streaming path.
Apply the same fix as suggested for the non-streaming path:
if json_mode: - anthropic_params["messages"].append( - {"role": "assistant", "content": "{"} - ) + # Use system message to instruct JSON-only response + anthropic_params["system"] = ( + "You must respond with valid JSON only. " + "Do not include any text outside the JSON structure." + )
65-67: Include provider value in error message for better debuggingThe error message should include the actual provider value to help with debugging configuration issues.
Apply this diff to improve the error message:
for provider_name, provider_value in SELECTED_PROVIDERS: if provider_value not in CLIENTS: - raise ValueError(f"Missing client for {provider_name}: {provider_value}") + available = list(CLIENTS.keys()) + raise ValueError( + f"Missing client for {provider_name}. " + f"Requested: '{provider_value}', Available: {available}" + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
src/deriver/deriver.py(6 hunks)src/utils/clients.py(1 hunks)tests/bench/harness.py(1 hunks)tests/utils/__init__.py(1 hunks)tests/utils/test_clients.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
{src,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
{src,tests}/**/*.py: Follow isort conventions with absolute imports preferred
Use snake_case for variables/functions and PascalCase for classes
Keep line length to 88 characters (Black compatible)
Ensure code typechecks with basedpyright
Files:
tests/bench/harness.pytests/utils/__init__.pytests/utils/test_clients.pysrc/utils/clients.pysrc/deriver/deriver.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Write tests using pytest
Files:
tests/bench/harness.pytests/utils/__init__.pytests/utils/test_clients.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use explicit error handling with appropriate exception types
Write docstrings using Google style
Use specific exception types (e.g., ResourceNotFoundException, ValidationException) instead of generic ones
Use structured logging with context; avoid print statements
Use ruff for linting and formatting (ruff check/format)
Files:
src/utils/clients.pysrc/deriver/deriver.py
🧬 Code graph analysis (3)
tests/utils/test_clients.py (1)
src/utils/clients.py (15)
HonchoLLMCallResponse(70-83)HonchoLLMCallStreamChunk(86-98)handle_streaming_response(481-645)honcho_llm_call(102-118)honcho_llm_call(122-137)honcho_llm_call(141-156)honcho_llm_call(159-221)honcho_llm_call_inner(225-237)honcho_llm_call_inner(241-253)honcho_llm_call_inner(257-269)honcho_llm_call_inner(272-478)retry(648-699)stop_after_attempt(702-718)wait_exponential(721-748)with_langfuse(751-757)
src/utils/clients.py (3)
src/dialectic/chat.py (1)
chat(152-347)src/routers/peers.py (1)
chat(156-201)tests/conftest.py (3)
client(168-189)async_wrapper(460-466)sync_wrapper(468-474)
src/deriver/deriver.py (3)
src/utils/shared_models.py (2)
ReasoningResponse(91-101)PeerCardQuery(196-205)src/deriver/prompts.py (2)
critical_analysis_prompt(12-99)peer_card_prompt(102-157)src/utils/clients.py (4)
honcho_llm_call(102-118)honcho_llm_call(122-137)honcho_llm_call(141-156)honcho_llm_call(159-221)
🔇 Additional comments (7)
tests/bench/harness.py (1)
330-331: LGTM!The addition of "google.genai.models" to the filter list correctly aligns with the dependency update from google-generativeai to google-genai package. This ensures both the old and new module paths are filtered from the output, maintaining backward compatibility during the transition.
tests/utils/test_clients.py (4)
1-27: LGTM! Comprehensive test coverage structureThe file header and imports are well-organized, covering all necessary dependencies for testing the LLM clients. The docstring clearly outlines the test scope across providers, response models, streaming, and error handling.
42-48: LGTM! Well-designed test modelThe
SampleTestModelPydantic model is appropriately designed for testing structured output, with a good mix of required and optional fields.
50-93: LGTM! Thorough test coverage for response modelsExcellent test coverage for both
HonchoLLMCallResponseandHonchoLLMCallStreamChunkmodels, including generic type testing and default value verification.
1107-1143: LGTM! Excellent edge case coverageGreat job testing edge cases including zero multiplier, very large attempt numbers, and mutable default handling for finish_reasons. These tests ensure robust behavior under unusual conditions.
src/utils/clients.py (2)
553-556: Use reasoning_effort and verbosity in OpenAI streamingThe streaming handler accepts
reasoning_effortandverbosityparameters but they're not being used in the OpenAI case.These parameters are actually being used correctly in lines 553-556 for GPT-5 models. The function signature is correct.
405-405: Consider consistent null checking for Gemini responsesWhile the code provides a fallback empty string for missing text, consider applying consistent null checking like other providers.
The current implementation with the fallback empty string is actually reasonable for Gemini, as it handles the case gracefully without throwing an error. The defensive coding approach here is appropriate.
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
pyproject.toml (1)
26-26: Bump Langfuse to a version with tracing update APIsYour code uses get_client()/update_current_trace/update_current_generation; these aren’t in 3.3.2. Increase the minimum.
Apply:
- "langfuse>=3.3.2", + "langfuse>=3.4.0",Optionally verify minimal working version:
#!/bin/bash python - <<'PY' import importlib, pkg_resources import langfuse as lf print("langfuse:", pkg_resources.get_distribution("langfuse").version) print("has get_client:", hasattr(lf, "get_client")) print("has update_current_trace:", hasattr(lf, "update_current_trace")) print("has update_current_generation:", hasattr(lf, "update_current_generation")) PY
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
pyproject.toml(3 hunks)src/config.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
{src,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
{src,tests}/**/*.py: Follow isort conventions with absolute imports preferred
Use snake_case for variables/functions and PascalCase for classes
Keep line length to 88 characters (Black compatible)
Ensure code typechecks with basedpyright
Files:
src/config.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use explicit error handling with appropriate exception types
Write docstrings using Google style
Use specific exception types (e.g., ResourceNotFoundException, ValidationException) instead of generic ones
Use structured logging with context; avoid print statements
Use ruff for linting and formatting (ruff check/format)
Files:
src/config.py
🔇 Additional comments (3)
pyproject.toml (2)
12-12: groq dependency: remove if unused or justify retentionrg found only a logger reference at src/deriver/main.py:43 — logging.getLogger("groq._base_client").setLevel(...); no direct imports/usages detected. If groq is not required at runtime, remove it; if it’s required via dynamic import/plugin, keep and document.
Apply if unused:
- "groq>=0.31.0",
97-97: Approve — trimmed allowedUntypedLibraries looks good; re-run type checkerVerification failed: "/bin/bash: line 2: uvx: command not found". Re-run the project's type checker and paste the output (examples: npx pyright --project pyrightconfig.json, pipx run pyright, or poetry run pyright).
src/config.py (1)
17-17: Type alias swap is correct — SupportedProviders includes required literals.SupportedProviders is defined as Literal["anthropic","openai","google","groq","custom"] (src/utils/types.py:3). src/config.py and src/utils/clients.py use SupportedProviders and set PROVIDER values to "google", "openai", "anthropic", and "groq". No exact 'Providers' alias or direct imports of Providers were found in the repo.
Rajat-Ahuja1997
left a comment
There was a problem hiding this comment.
looks great, just one comment about the retry logic
There was a problem hiding this comment.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
pyproject.toml(3 hunks)src/utils/clients.py(1 hunks)tests/utils/test_clients.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
{src,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
{src,tests}/**/*.py: Follow isort conventions with absolute imports preferred
Use snake_case for variables/functions and PascalCase for classes
Keep line length to 88 characters (Black compatible)
Ensure code typechecks with basedpyright
Files:
tests/utils/test_clients.pysrc/utils/clients.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Write tests using pytest
Files:
tests/utils/test_clients.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use explicit error handling with appropriate exception types
Write docstrings using Google style
Use specific exception types (e.g., ResourceNotFoundException, ValidationException) instead of generic ones
Use structured logging with context; avoid print statements
Use ruff for linting and formatting (ruff check/format)
Files:
src/utils/clients.py
🧬 Code graph analysis (2)
tests/utils/test_clients.py (1)
src/utils/clients.py (12)
HonchoLLMCallResponse(67-80)HonchoLLMCallStreamChunk(83-95)handle_streaming_response(478-642)honcho_llm_call(99-115)honcho_llm_call(119-134)honcho_llm_call(138-153)honcho_llm_call(156-218)honcho_llm_call_inner(222-234)honcho_llm_call_inner(238-250)honcho_llm_call_inner(254-266)honcho_llm_call_inner(269-475)with_langfuse(645-651)
src/utils/clients.py (2)
src/dialectic/chat.py (1)
chat(152-347)src/routers/peers.py (1)
chat(156-201)
🔇 Additional comments (12)
tests/utils/test_clients.py (4)
639-646: Future-proof Google streaming test to async APIProd code should switch to async Gemini streaming (see src/utils/clients.py comment). When you do, update this test to patch
client.aio.models.generate_content_streamand iterate withasync for.Would you like me to push the matching test changes once the impl flips to async?
253-255: Fix async streaming mock: use real async iterator
__aiter__is set to a sync iterator;async forwill fail at runtime.Apply:
- # Set up the async iterator (same as working test_streaming_call) - mock_stream.__aiter__.return_value = iter(mock_chunks) + # Set up the async iterator + async def async_iter(): + for c in mock_chunks: + yield c + mock_stream.__aiter__.return_value = async_iter()
932-934: Same issue: make__aiter__asyncRepeat of the above in main streaming test; needs the same fix.
- mock_stream.__aenter__.return_value = mock_stream - mock_stream.__aiter__.return_value = iter(mock_chunks) + mock_stream.__aenter__.return_value = mock_stream + async def async_iter(): + for c in mock_chunks: + yield c + mock_stream.__aiter__.return_value = async_iter()
499-508: Optional: simplify by using AsyncMock directlyCleaner and consistent with other tests.
- async def mock_create(**_kwargs: Any): - return async_chunk_iterator() - - mock_client.chat.completions.create = mock_create + mock_client.chat.completions.create = AsyncMock(return_value=async_chunk_iterator())src/utils/clients.py (6)
172-175: Improve missing‑client error with available providersFaster diagnosis in prod/test.
- if not client: - raise ValueError(f"Missing client for {provider}") + if not client: + available = sorted(CLIENTS.keys()) + raise ValueError(f"Missing client for {provider}. Available: {available}")
312-316: Anthropic JSON “{” prompt hack — prefer system instructionMore robust than seeding with a dangling brace.
- if json_mode: - anthropic_params["messages"].append( - {"role": "assistant", "content": "{"} - ) + if json_mode: + anthropic_params["system"] = "Respond with valid JSON only."
645-651: End Langfuse generation on success/errorEnsure cleanup and useful error context.
def with_langfuse(func: Callable[..., Any]) -> Callable[..., Any]: @wraps(func) async def wrapper(*args: Any, **kwargs: Any) -> Any: - lf.start_as_current_generation(name="LLM Call") - return await func(*args, **kwargs) + generation = lf.start_as_current_generation(name="LLM Call") if lf else None + try: + result = await func(*args, **kwargs) + if generation: + generation.end() + return result + except Exception as e: + if generation: + generation.end(error=str(e)) + raise return wrapper
24-25: Guard Langfuse client init to avoid import‑time failures
get_client()at import time can raise when env keys aren’t set. Only initialize when Langfuse is enabled.-lf = get_client() +lf = get_client() if settings.LANGFUSE_PUBLIC_KEY else None
401-419: Optional: fail fast on empty Gemini textCurrent fallback to
""can hide provider issues. Consider explicit error when bothtextand candidates are missing.- text_content = gemini_response.text if gemini_response.text else "" + if gemini_response.text is None and not getattr(gemini_response, "candidates", []): + raise ValueError("No text content in Gemini response") + text_content = gemini_response.text or ""
62-64: Defer provider validation to app startup (avoid import-time ValueError)Import-time raise makes tests brittle when credentials/config are absent; validate at startup and raise a clearer runtime/config error.
-for provider_name, provider_value in SELECTED_PROVIDERS: - if provider_value not in CLIENTS: - raise ValueError(f"Missing client for {provider_name}: {provider_value}") +def validate_clients() -> None: + missing = [(n, v) for n, v in SELECTED_PROVIDERS if v not in CLIENTS] + if missing: + detail = ", ".join(f"{n}={v}" for n, v in missing) + raise RuntimeError(f"Missing LLM clients: {detail}") +# Call validate_clients() during app startup, not at import time.Confirm where to call validate_clients() in this repo’s startup lifecycle — e.g. FastAPI lifespan handler, Flask app factory after config, Django AppConfig.ready, or worker startup — so the call can be placed in the proper startup hook.
pyproject.toml (2)
98-98: Confirm type-stub noise; add exceptions only if flagged by basedpyrightbasedpyright wasn't available in the verification environment (command not found) and the rg invocation failed — run locally:
basedpyright -p pyproject.tomland search for "Missing type stubs". If it reports google-genai, groq, or anthropic, add them to allowedUntypedLibraries in pyproject.toml (currently ["langfuse"] at ~line 98); otherwise keep the list minimal.
27-27: Do not bump the Langfuse requirement — v3 already includes the tracing APIs
update_current_trace and start_as_current_generation were added in the Langfuse Python SDK v3 (OTel-based) release (June 5, 2025), so keep "langfuse>=3.3.2".Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/utils/clients.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
{src,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
{src,tests}/**/*.py: Follow isort conventions with absolute imports preferred
Use snake_case for variables/functions and PascalCase for classes
Keep line length to 88 characters (Black compatible)
Ensure code typechecks with basedpyright
Files:
src/utils/clients.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use explicit error handling with appropriate exception types
Write docstrings using Google style
Use specific exception types (e.g., ResourceNotFoundException, ValidationException) instead of generic ones
Use structured logging with context; avoid print statements
Use ruff for linting and formatting (ruff check/format)
Files:
src/utils/clients.py
🧬 Code graph analysis (1)
src/utils/clients.py (2)
src/dialectic/chat.py (1)
chat(152-347)src/routers/peers.py (1)
chat(156-201)
🪛 GitHub Actions: FastAPI Tests with PostgreSQL and uv
src/utils/clients.py
[error] 595-595: TypeError: object Mock can't be used in 'await' expression while awaiting client.aio.models.generate_content_stream in handle_streaming_response (used by tests/utils/test_clients.py::TestGoogleClient::test_google_streaming). The test run (uv run pytest -x) failed due to improper async mocking; consider using AsyncMock or enabling proper async mocks for the genai client in tests.
🔇 Additional comments (9)
src/utils/clients.py (9)
83-95: Good: streaming chunk model avoids mutable defaults.Using
Field(default_factory=list)is correct and pyright-friendly.
47-50: Instantiate Gemini client via genai.Client (ensures match/case and type checks).Use the top-level constructor to align with your
match client: case genai.Client().Apply this diff:
- google = genai.client.Client(api_key=settings.LLM.GEMINI_API_KEY) + google = genai.Client(api_key=settings.LLM.GEMINI_API_KEY)
24-25: Make Langfuse client lazy and always end generations (even on error).Avoid module-level initialization in async contexts and ensure proper cleanup.
Apply this diff:
-lf = get_client() +lf_instance: Any | None = None + +def get_lf() -> Any: + """Lazily initialize and return the Langfuse client.""" + global lf_instance + if lf_instance is None: + lf_instance = get_client() + return lf_instance @@ def with_langfuse(func: Callable[..., Any]) -> Callable[..., Any]: @wraps(func) async def wrapper(*args: Any, **kwargs: Any) -> Any: - lf.start_as_current_generation(name="LLM Call") - return await func(*args, **kwargs) + generation = get_lf().start_as_current_generation(name="LLM Call") + try: + result = await func(*args, **kwargs) + generation.end() + return result + except Exception as e: + generation.end(error=str(e)) + raise return wrapperAlso applies to: 651-657
172-175: Improve missing-client error message with available providers.Aids debugging misconfigurations.
Apply this diff:
- if not client: - raise ValueError(f"Missing client for {provider}") + if not client: + available = sorted(CLIENTS.keys()) + raise ValueError( + f"Missing client for {provider}. Available providers: {available}" + )
318-322: Replace Anthropic JSON-mode “{” workaround with system instruction.Appending a lone brace is brittle and can yield malformed JSON.
Apply this diff:
- if json_mode: - anthropic_params["messages"].append( - {"role": "assistant", "content": "{"} - ) + if json_mode: + anthropic_params["system"] = "Respond with valid JSON only."
407-425: Harden Gemini non-streaming: fail loudly on missing text.Silent fallback to "" can mask upstream issues.
Apply this diff:
- # Safely extract response data - text_content = gemini_response.text if gemini_response.text else "" + # Safely extract response data + if gemini_response.text is None: + raise ValueError("No text content in Gemini response") + text_content = gemini_response.text
98-115: Unify overload typing forstreamparam for readability.Use consistent
Literal[False]/Literal[True]across overloads and group OpenAI-only params together.Would you like me to push a small typing-only diff aligning overloads (no behavior change)?
Also applies to: 118-135, 137-154, 221-235, 237-251, 253-267
528-545: Guard Anthropic streaming with try/except to avoid resource leaks and emit done chunk on error.Ensure cleanup and consistent stream termination.
Apply this diff:
- async with client.messages.stream(**anthropic_params) as anthropic_stream: - async for chunk in anthropic_stream: - if ( - chunk.type == "content_block_delta" - and hasattr(chunk, "delta") - and hasattr(chunk.delta, "text") - ): - text_content = getattr(chunk.delta, "text", "") - yield HonchoLLMCallStreamChunk(content=text_content) - final_message = await anthropic_stream.get_final_message() - yield HonchoLLMCallStreamChunk( - content="", - is_done=True, - finish_reasons=[final_message.stop_reason] - if final_message.stop_reason - else [], - ) + try: + async with client.messages.stream(**anthropic_params) as anthropic_stream: + async for chunk in anthropic_stream: + if ( + chunk.type == "content_block_delta" + and hasattr(chunk, "delta") + and hasattr(chunk.delta, "text") + ): + text_content = getattr(chunk.delta, "text", "") + yield HonchoLLMCallStreamChunk(content=text_content) + final_message = await anthropic_stream.get_final_message() + yield HonchoLLMCallStreamChunk( + content="", + is_done=True, + finish_reasons=[final_message.stop_reason] + if final_message.stop_reason + else [], + ) + except Exception: + yield HonchoLLMCallStreamChunk( + content="", is_done=True, finish_reasons=["error"] + ) + raise
582-621: Fix Gemini streaming: don't await stream factory; handle both awaitable and iterator (unblocks tests).Awaiting
client.aio.models.generate_content_stream(...)causes TypeError with mocked clients and is unnecessary if the SDK returns an async iterator. Handle both cases and iterate withasync for.Apply this diff:
+import inspect @@ case genai.Client(): prompt_text = params["messages"][0]["content"] if params["messages"] else "" if response_model is not None: - response_stream = await client.aio.models.generate_content_stream( - model=params["model"], - contents=prompt_text, - config={ - "response_mime_type": "application/json", - "response_schema": response_model, - }, - ) + _call = client.aio.models.generate_content_stream( + model=params["model"], + contents=prompt_text, + config={ + "response_mime_type": "application/json", + "response_schema": response_model, + }, + ) + response_stream = await _call if inspect.isawaitable(_call) else _call else: - response_stream = await client.aio.models.generate_content_stream( - model=params["model"], - contents=prompt_text, - config={ - "response_mime_type": "application/json" if json_mode else None, - }, - ) + _call = client.aio.models.generate_content_stream( + model=params["model"], + contents=prompt_text, + config={ + "response_mime_type": "application/json" if json_mode else None, + }, + ) + response_stream = await _call if inspect.isawaitable(_call) else _call final_chunk = None async for chunk in response_stream: if chunk.text: yield HonchoLLMCallStreamChunk(content=chunk.text) final_chunk = chunk
* feat: add optional JWT and webhook secrets to honcho instance creation * chore: ignore spurious warnings * feat: add response format if using gpt-5 model family * feat: add response models to all apis except anthropic * fix: raise NotImplementedError for response models in AsyncAnthropic client * chore: address review * chore: add tests, cleanup * fix: use tenacity instead of custom fns * fix: pass all params to streaming, nonblocking streaming * chore: fix test mock
Summary by CodeRabbit
New Features
Improvements
Tests
Chores