Skip to content

refactor: replace mirascope with handrolled client#202

Merged
VVoruganti merged 13 commits into
mainfrom
ben/replace-mirascope
Sep 24, 2025
Merged

refactor: replace mirascope with handrolled client#202
VVoruganti merged 13 commits into
mainfrom
ben/replace-mirascope

Conversation

@dr-frmr

@dr-frmr dr-frmr commented Sep 3, 2025

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

  • New Features

    • Unified LLM call surface with support for Google GenAI and Groq, plus streaming responses.
  • Improvements

    • Streaming chat yields live async chunks; short/long summaries produce clearer, more inclusive outputs.
    • Better reliability with configurable retries and enhanced tracing/observability.
  • Tests

    • Added comprehensive provider and streaming tests.
  • Chores

    • Updated dependencies and typing/configuration.

@dr-frmr dr-frmr marked this pull request as draft September 3, 2025 16:26
@plastic-labs plastic-labs deleted a comment from coderabbitai Bot Sep 3, 2025
@coderabbitai

coderabbitai Bot commented Sep 3, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Replaces mirascope decorator-driven LLM calls with an explicit honcho_llm_call surface, centralizes provider clients in CLIENTS, migrates Langfuse usage to module-level get_client(), introduces SupportedProviders and groq/tenacity deps, updates streaming to AsyncIterator, and extends tests for the new clients and streaming behavior.

Changes

Cohort / File(s) Summary
Dependencies & config
pyproject.toml
Added groq>=0.31.0, tenacity>=9.1.2; replaced mirascope[...] with langfuse>=3.3.2; replaced google-generativeai with google-genai>=1.32.0; narrowed allowedUntypedLibraries to ["langfuse"].
Provider typing & config
src/utils/types.py, src/config.py
Replaced Providers alias with SupportedProviders literal and updated config type hints to use SupportedProviders.
Unified LLM client surface
src/utils/clients.py
Added CLIENTS registry and SELECTED_PROVIDERS; introduced HonchoLLMCallResponse and HonchoLLMCallStreamChunk; implemented honcho_llm_call (sync + streaming), provider dispatch, streaming iterator handling, retry helpers, and langfuse wiring.
Langfuse client migration
src/deriver/consumer.py, src/utils/embedding_store.py, src/utils/logging.py, src/dialectic/utils.py
Replaced decorator/context usage with from langfuse import get_client and module-level lf = get_client(); switched calls to lf.update_current_*.
Deriver: inline LLM calls & tracing
src/deriver/deriver.py, src/deriver/prompts.py, src/deriver/consumer.py
Removed prompt_template decorators; inlined prompt construction; replaced decorator-based calls with direct honcho_llm_call invocations; functions now consume/return raw response.content strings and use lf for tracing.
Dialectic: streaming & call refactor
src/dialectic/chat.py, src/dialectic/prompts.py, src/dialectic/utils.py
Removed prompt_template decorators; replaced decorated calls with honcho_llm_call (streaming supported); added module lf client; chat() may return str or AsyncIterator[HonchoLLMCallStreamChunk].
Summarizer: prompt & return changes
src/utils/summarizer.py
Removed decorators; build prompts with cleandoc; call honcho_llm_call directly; functions return HonchoLLMCallResponse[str]; updated prompt text, token budgeting, and metric extraction.
Routers: streaming adjustments
src/routers/peers.py
Switched from mirascope Stream to AsyncIterator; iterate async for chunk in stream and yield chunk.content; adjusted imports.
Typing/constants minor
src/crud/representation.py, src/config.py
Annotated GLOBAL_REPRESENTATION_COLLECTION_NAME: Final[str]; propagated SupportedProviders in config.
Embedding store & logging observe
src/utils/embedding_store.py, src/utils/logging.py
Replaced langfuse decorator/context imports with get_client() usage (lf = get_client()); changed observe import source.
Tests & benches
tests/conftest.py, tests/bench/peer_card_bench.py, tests/bench/harness.py, tests/utils/*
Renamed/expanded fixtures to mock LLM call sites; removed langfuse_context mock; inlined honcho_llm_call in bench helpers; added tests/utils/test_clients.py and tests/utils/__init__.py; adjusted harness filter patterns.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant API as Router (/peers/chat)
  participant Dial as dialectic.chat
  participant LF as Langfuse (lf)
  participant Clients as honcho_llm_call
  participant Provider as LLM Provider

  API->>Dial: chat(query, stream=True)
  Dial->>LF: update_current_trace(metadata: dialectic_model)
  Dial->>Clients: honcho_llm_call(prompt, provider, model, stream=True)
  activate Clients
  Clients->>Provider: create streaming completions
  Provider-->>Clients: stream chunks
  Clients-->>Dial: AsyncIterator<HonchoLLMCallStreamChunk>
  deactivate Clients
  loop stream
    Dial-->>API: chunk.content
  end
Loading
sequenceDiagram
  autonumber
  participant Deriver as deriver.derive_new_insights
  participant Prompts as prompts.critical_analysis_prompt
  participant Clients as honcho_llm_call
  participant LF as Langfuse (lf)
  participant Provider as LLM Provider

  Deriver->>Prompts: build prompt(...)
  Prompts-->>Deriver: prompt (string)
  Deriver->>Clients: honcho_llm_call(prompt, response_model?, json_mode)
  activate Clients
  Clients->>Provider: provider-specific call
  Provider-->>Clients: response (text/structured)
  Clients-->>Deriver: response.content
  deactivate Clients
  Deriver->>LF: update_current_generation(input=prompt, output=response.content)
  Deriver-->>Deriver: parse JSON / extract thinking
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • Vineeth/dev 1027 #177 — Modifies same deriver functions (derive_new_insights, process_representation_task); likely overlapping changes to LLM call/tracing.
  • Peer Cards #180 — Overlaps on deriver/prompts/clients and Langfuse integration migration from decorators to explicit clients.
  • Summarizer improvements #173 — Related changes in summarizer paths and token/summary handling.

Suggested reviewers

  • danibalcells
  • VVoruganti

Poem

I’m a rabbit hopping through the code so bright,
I swapped decorators for a honcho light.
Streams now trickle, clients align,
Langfuse hums softly, traces shine.
Tests hop forward — nibble, ship, delight. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly and concisely describes the main intent of the changeset—replacing the external mirascope dependency with a hand-rolled client—and matches the diffs that remove mirascope decorators and add a central clients implementation; it is specific, focused, and readable for someone scanning PR history.
Docstring Coverage ✅ Passed Docstring coverage is 86.84% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ben/replace-mirascope

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a472b21 and 72411e8.

📒 Files selected for processing (1)
  • tests/utils/test_clients.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

{src,tests}/**/*.py: Follow isort conventions with absolute imports preferred
Use snake_case for variables/functions and PascalCase for classes
Keep line length to 88 characters (Black compatible)
Ensure code typechecks with basedpyright

Files:

  • tests/utils/test_clients.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Write tests using pytest

Files:

  • tests/utils/test_clients.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-27T14:54:38.262Z
Learning: Abstract LLM providers behind a model client to support multiple providers
🧬 Code graph analysis (1)
tests/utils/test_clients.py (1)
src/utils/clients.py (12)
  • HonchoLLMCallResponse (67-80)
  • HonchoLLMCallStreamChunk (83-95)
  • handle_streaming_response (484-648)
  • honcho_llm_call (99-115)
  • honcho_llm_call (119-134)
  • honcho_llm_call (138-153)
  • honcho_llm_call (156-218)
  • honcho_llm_call_inner (222-234)
  • honcho_llm_call_inner (238-250)
  • honcho_llm_call_inner (254-266)
  • honcho_llm_call_inner (269-481)
  • with_langfuse (651-657)
🔇 Additional comments (7)
tests/utils/test_clients.py (7)

951-960: Make test hermetic wrt Langfuse settings

If LANGFUSE_PUBLIC_KEY is set in CI, honcho_llm_call will wrap with Langfuse and touch lf. Patch it off in this test.

-        with patch.dict(CLIENTS, {"anthropic": mock_client}):
+        with (
+            patch.dict(CLIENTS, {"anthropic": mock_client}),
+            patch("src.utils.clients.settings.LANGFUSE_PUBLIC_KEY", ""),
+        ):

979-986: Make retry-disabled test hermetic wrt Langfuse settings

Same rationale as above.

-        with patch.dict(CLIENTS, {"anthropic": mock_client}):
+        with (
+            patch.dict(CLIENTS, {"anthropic": mock_client}),
+            patch("src.utils.clients.settings.LANGFUSE_PUBLIC_KEY", ""),
+        ):

253-255: Fix async iteration: use an async generator for aiter

iter(mock_chunks) is synchronous and will fail under async for. Return an async iterator.

-        # Set up the async iterator (same as working test_streaming_call)
-        mock_stream.__aiter__.return_value = iter(mock_chunks)
+        # Set up the async iterator
+        async def async_iter():
+            for chunk in mock_chunks:
+                yield chunk
+        mock_stream.__aiter__.return_value = async_iter()

499-508: Optional: simplify OpenAI stream mocking with AsyncMock

Use a single AsyncMock that yields chunks; reduces nested functions.

-        # Create async iterator
-        async def async_chunk_iterator():
-            for chunk in mock_chunks:
-                yield chunk
-
-        # OpenAI's create method returns an awaitable that resolves to an async iterator
-        async def mock_create(**_kwargs: Any):
-            return async_chunk_iterator()
-
-        mock_client.chat.completions.create = mock_create
+        # Create async iterator mock directly
+        mock_stream = AsyncMock()
+        async def async_iter():
+            for chunk in mock_chunks:
+                yield chunk
+        mock_stream.__aiter__.return_value = async_iter()
+        mock_client.chat.completions.create = AsyncMock(return_value=mock_stream)

348-366: Add streaming coverage for GPT‑5 params (reasoning_effort/verbosity)

Prevent regressions by asserting GPT‑5 streaming passes these params.

+    async def test_openai_gpt5_streaming_parameters(self):
+        """Test GPT-5 params applied in streaming path"""
+        from openai import AsyncOpenAI
+
+        mock_client = AsyncMock(spec=AsyncOpenAI)
+        mock_chunks = [
+            ChatCompletionChunk(
+                id="id1",
+                object="chat.completion.chunk",
+                created=1,
+                model="gpt-5-turbo",
+                choices=[ChunkChoice(index=0, delta=ChoiceDelta(content="hi"), finish_reason=None)],
+            ),
+            ChatCompletionChunk(
+                id="id2",
+                object="chat.completion.chunk",
+                created=2,
+                model="gpt-5-turbo",
+                choices=[ChunkChoice(index=0, delta=ChoiceDelta(content=None), finish_reason="stop")],
+            ),
+        ]
+        mock_stream = AsyncMock()
+        async def async_iter():
+            for c in mock_chunks:
+                yield c
+        mock_stream.__aiter__.return_value = async_iter()
+        mock_client.chat.completions.create = AsyncMock(return_value=mock_stream)
+
+        with patch.dict(CLIENTS, {"openai": mock_client}):
+            chunks = []
+            async for chunk in handle_streaming_response(
+                client=mock_client,
+                params={
+                    "model": "gpt-5-turbo",
+                    "max_tokens": 123,
+                    "messages": [{"role": "user", "content": "Hello"}],
+                    "stream": True,
+                },
+                json_mode=False,
+                thinking_budget_tokens=None,
+                response_model=None,
+                reasoning_effort="high",
+                verbosity="medium",
+            ):
+                chunks.append(chunk)
+
+        mock_client.chat.completions.create.assert_called_once()
+        kwargs = mock_client.chat.completions.create.call_args.kwargs
+        assert kwargs["max_completion_tokens"] == 123
+        assert kwargs["reasoning_effort"] == "high"
+        assert kwargs["verbosity"] == "medium"

899-902: Align Groq streaming mocking pattern with OpenAI test

Use the same async generator + function style for consistency.

-        # Mock the create method to return the async generator when awaited
-        mock_client.chat.completions.create = AsyncMock(
-            return_value=async_chunk_iterator()
-        )
+        # Mock the create method to return the async generator when awaited
+        async def mock_create(**_kwargs: Any):
+            return async_chunk_iterator()
+        mock_client.chat.completions.create = mock_create

942-944: Fix async iteration in main streaming call test

iter(mock_chunks) is sync; return an async iterator instead.

-        mock_stream.__aiter__.return_value = iter(mock_chunks)
+        async def async_iter():
+            for c in mock_chunks:
+                yield c
+        mock_stream.__aiter__.return_value = async_iter()

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Ruff (0.13.1)
tests/utils/test_clients.py

�[1;31mruff failed�[0m
�[1mCause:�[0m Failed to load configuration /ruff.toml
�[1mCause:�[0m Failed to parse /ruff.toml
�[1mCause:�[0m TOML parse error at line 26, column 3
|
26 | "RSE100", # Use of assert detected
| ^^^^^^^^
Unknown rule selector: RSE100


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 26

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (21)
src/utils/logging.py (4)

34-41: Guard the langfuse import; avoid crashing when key is set but package is missing.

If LANGFUSE_PUBLIC_KEY is present but langfuse isn't installed/available, importing will raise and break function definition.

-    if settings.LANGFUSE_PUBLIC_KEY:
-        # Import here to avoid circular imports and only import when needed
-        from langfuse import observe  # pyright: ignore
-
-        return observe()(func)
+    if settings.LANGFUSE_PUBLIC_KEY:
+        # Import here to avoid circular imports and only import when needed
+        try:
+            from langfuse import observe  # pyright: ignore
+        except Exception as e:  # ImportError or runtime import issues
+            logger.warning("Langfuse enabled but import failed: %s", e)
+            return func
+        return observe()(func)

Add a module logger (outside this range):

from logging import getLogger

logger = getLogger(__name__)

244-255: Fix KeyError and docstring mismatch in log_performance_metrics.

  • Calling clear() on a missing task_name key raises KeyError.
  • Docstring refers to a dict, but the param is a list of tuples.
-    """
-    Log performance metrics in a clean table.
-    Args:
-        metrics: Dictionary of metric names and (value, unit) tuples
-        title: Table title
-    """
-    if not accumulated_metrics.get(task_name) and not metrics:
-        return
-    if metrics is None:
-        metrics = []
-    metrics = accumulated_metrics.get(task_name, []) + metrics
-    accumulated_metrics[task_name].clear()
+    """
+    Log performance metrics in a clean table.
+    Args:
+        task_name: Logical task key for accumulated metrics
+        metrics: Optional list of (metric, value, unit) tuples to print now
+        title: Table title
+    """
+    if metrics is None:
+        metrics = []
+    accumulated = accumulated_metrics.pop(task_name, [])
+    if not accumulated and not metrics:
+        return
+    metrics = accumulated + metrics

266-273: Avoid TypeError when formatting non-numeric values.

Value can be str per annotation; formatting with :0f/:3f would raise.

-    for metric, value, unit in metrics:
-        if unit == "ms":
-            formatted_value = f"{value:.0f}"
-        elif unit == "s":
-            formatted_value = f"{value:.3f}"
-        else:
-            formatted_value = str(value)
+    for metric, value, unit in metrics:
+        if isinstance(value, (int, float)):
+            if unit == "ms":
+                formatted_value = f"{value:.0f}"
+            elif unit == "s":
+                formatted_value = f"{value:.3f}"
+            else:
+                formatted_value = str(value)
+        else:
+            formatted_value = str(value)

192-199: Consider integrating rich output with standard logging.

console.print writes to stdout and bypasses structured logging/context. For production observability, prefer logging handlers (e.g., RichHandler) or gate these prints behind a dev flag.

src/dialectic/prompts.py (4)

100-135: Tighten JSON validity instructions (reduce LLM formatting drift).

Explicitly require strict JSON to lower parse failures.

-Respond with 3-5 search queries as a JSON object with a "queries" field containing an array of strings. Each query should target different aspects or reasoning levels to maximize retrieval coverage.
+Respond with 3-5 search queries as a strict JSON object with a "queries" field containing an array of strings. Each query should target different aspects or reasoning levels to maximize retrieval coverage.
@@
-Format: `{{"queries": ["query1", "query2", "query3"]}}`
-
-No markdown, no explanations, just the JSON object.
+Format: `{{"queries": ["query1", "query2", "query3"]}}`
+
+Constraints:
+- Strict JSON only (double-quoted keys/strings, no comments, no trailing commas).
+- No markdown, no explanations—just the JSON object.

27-46: Handle None safely and complete the docstring for dialectic_prompt.

  • recent_conversation_history and working_representation can be None; current output will render "None" in the prompt.
  • Docstring is missing several parameters and should follow Google style per repo guidelines.

Example updated function (apply adaptively):

def dialectic_prompt(
    query: str,
    working_representation: str | None,
    recent_conversation_history: str | None,
    additional_context: str | None,
    peer_name: str,
    peer_card: list[str] | None,
    target_name: str | None = None,
    target_peer_card: list[str] | None = None,
) -> str:
    """
    Generate the main dialectic prompt for context synthesis.

    Args:
        query: Application question/request about the user.
        working_representation: Current session conclusions (may be None).
        recent_conversation_history: Recent dialog context (may be None).
        additional_context: Historical/global conclusions (may be None).
        peer_name: Name of the user/peer being queried.
        peer_card: Biographical lines for the user (may be None).
        target_name: Optional name of the target peer.
        target_peer_card: Optional biographical lines for the target peer.

    Returns:
        Formatted prompt string for the dialectic model.
    """
    def _join(lines: list[str] | None) -> str:
        return "\n".join(lines) if lines else "(none)"

    if target_name:
        query_target = f"""The query is about user {peer_name}'s understanding of {target_name}.

The user's known biographical information:
{_join(peer_card)}

The target's known biographical information:
{_join(target_peer_card)}

If the user's name or nickname is known, exclusively refer to them by that name.
If the target's name or nickname is known, exclusively refer to them by that name.
"""
    else:
        query_target = f"""The query is about user {peer_name}.

The user's known biographical information:
{_join(peer_card)}

If the user's name or nickname is known, exclusively refer to them by that name.
"""

    return c(
        f"""
... (unchanged preamble) ...
{query_target}

<recent_conversation_history>
{(recent_conversation_history or "").strip()}
</recent_conversation_history>

<query>{query}</query>
<working_representation>{(working_representation or "").strip()}</working_representation>
{f"<global_context>{additional_context}</global_context>" if additional_context else ""}"""
    )

Also applies to: 76-83


31-35: Use '\n' instead of chr(10) for readability.

Semantically identical; clearer to readers.

-{chr(10).join(peer_card) if peer_card else "(none)"}
+{\n.join(peer_card) if peer_card else "(none)"}
@@
-{chr(10).join(target_peer_card) if target_peer_card else "(none)"}
+{\n.join(target_peer_card) if target_peer_card else "(none)"}

Also applies to: 43-46


86-97: Docstring params out of sync for query_generation_prompt.

Mentions peer_name/target_name that aren’t parameters. Align with function signature.

src/utils/types.py (1)

1-4: Declare the alias explicitly with TypeAlias (pyright-friendly).

Improves clarity for basedpyright and prevents misinterpretation as a value.

-from typing import Literal
+from typing import Literal, TypeAlias
@@
-SupportedProviders = Literal["anthropic", "openai", "google", "groq", "custom"]
+SupportedProviders: TypeAlias = Literal["anthropic", "openai", "google", "groq", "custom"]
src/utils/embedding_store.py (2)

64-64: Consistency nit: traced scope label typo.

"ed_embedding_store.save_unified_observations" looks accidental; align with file name for consistent tracing.

-        async with tracked_db("ed_embedding_store.save_unified_observations") as db:
+        async with tracked_db("embedding_store.save_unified_observations") as db:

97-105: Centralize direct OpenAI embedding calls
Remove the two direct embedding_client.client.embeddings.create invocations in src/utils/embedding_store.py (lines 100 and 428) and route both loops through a single helper in src/embedding_client (e.g. batch_embed_texts(texts: list[str]) -> list[list[float]]) to ensure consistent model/version handling, retries/timeouts, and testability.

src/deriver/consumer.py (1)

51-56: Wrap Langfuse trace calls in try/except and use precise ignores

  • In both summary (consumer.py 51–56) and representation (66–71) branches, keep the if settings.LANGFUSE_PUBLIC_KEY check, but wrap the lf.update_current_trace call in a try/except Exception and log failures at debug level.
  • Replace the broad # type: ignore with # type: ignore[attr-defined] for both calls.
src/routers/peers.py (2)

195-197: Improve error logging and avoid leaking internal messages.

Use logger.exception for stack traces and return a generic error to clients.

-        except Exception as e:
-            logger.error(f"Error in stream: {str(e)}")
-            raise HTTPException(status_code=500, detail=str(e)) from e
+        except Exception as e:
+            logger.exception("Error in stream", extra={"workspace_id": workspace_id, "peer_id": peer_id})
+            raise HTTPException(status_code=500, detail="Stream failed") from e

199-201: Add SSE headers to reduce buffering.

Helps proxies and clients stream promptly.

-    return StreamingResponse(
-        content=parse_stream(), media_type="text/event-stream", status_code=200
-    )
+    return StreamingResponse(
+        content=parse_stream(),
+        media_type="text/event-stream",
+        status_code=200,
+        headers={
+            "Cache-Control": "no-cache",
+            "X-Accel-Buffering": "no",
+            "Connection": "keep-alive",
+        },
+    )
src/dialectic/utils.py (4)

80-97: Harden Langfuse updates (JSON-serializable payload + isolation).

  • Ensure payloads are JSON-serializable (use model_dump()).
  • Don’t let telemetry failures affect product flow.
-        if settings.LANGFUSE_PUBLIC_KEY:
-            lf.update_current_generation(
+        if settings.LANGFUSE_PUBLIC_KEY:
+            lf = get_client()
+            try:
+                lf.update_current_generation(
                     input={
                         "query": query,
                         "include_premises": include_premises,
                     },
                     output={
-                    "search_queries": search_queries_result,
+                    "search_queries": search_queries_result.model_dump(),
                     "all_results": all_results,
                     "unique_observations": unique_observations,
                     },
-            )
-
-            lf.update_current_trace(
-                metadata={
-                    "search_queries": search_queries,
-                    "observations_retrieved": unique_observations,
-                }
-            )
+                )
+                lf.update_current_trace(
+                    metadata={
+                        "search_queries": search_queries,
+                        "observations_retrieved_count": len(unique_observations),
+                    }
+                )
+            except Exception:
+                logger.debug("Langfuse update skipped due to client error", exc_info=True)

65-72: Avoid duplicate query entry.

Guard against the original query already appearing in the model output.

-        # Include the original query in the search queries
-        search_queries.append(query)
+        if query not in search_queries:
+            search_queries.append(query)

232-247: Tighten LLM call: enable JSON mode + add tracking + (optional) provider cast.

  • json_mode=True increases robustness for structured output.
  • track_name adds traceability.
  • If basedpyright complains about provider type, cast to SupportedProviders.
 async def generate_semantic_queries(
     query: str, target_peer_name: str
 ) -> SemanticQueries:
     """Generate semantic search queries for observation retrieval."""
     prompt = query_generation_prompt(query, target_peer_name)
     response = await honcho_llm_call(
-        provider=settings.DIALECTIC.QUERY_GENERATION_PROVIDER,
+        provider=settings.DIALECTIC.QUERY_GENERATION_PROVIDER,
         model=settings.DIALECTIC.QUERY_GENERATION_MODEL,
         prompt=prompt,
         max_tokens=settings.LLM.DEFAULT_MAX_TOKENS,
+        track_name="dialectic.generate_semantic_queries",
         response_model=SemanticQueries,
+        json_mode=True,
         enable_retry=True,
         retry_attempts=3,
     )
     return response.content

If strict typing is desired, also update imports at the top:

from typing import Any, cast
from src.utils.clients import honcho_llm_call, SupportedProviders

and cast provider:

provider=cast(SupportedProviders, settings.DIALECTIC.QUERY_GENERATION_PROVIDER),

35-48: Docstring param completeness.

Document target_peer_name in get_observations’ docstring for clarity.

src/dialectic/chat.py (1)

343-346: Nit: remove stale comment and unnecessary str() cast

response is already a str; the Anthropic wrapper note is obsolete.

-    log_performance_metrics(f"dialectic_chat_{dialectic_chat_uuid}")
-    # Convert AnthropicCallResponse to string for compatibility
-    return str(response)
+    log_performance_metrics(f"dialectic_chat_{dialectic_chat_uuid}")
+    return response
src/utils/summarizer.py (1)

350-361: Fix metrics: avoid non-numeric value and guard empty messages

accumulate_metric likely expects numeric values; passing f"{summary_tokens} (est.)" risks runtime issues. Also, deriving metric keys from messages[-1] can crash on empty lists.

-    response: HonchoLLMCallResponse[str] | None = None
+    response: HonchoLLMCallResponse[str] | None = None
@@
-        summary_tokens = response.output_tokens
+        summary_tokens = response.output_tokens
@@
-    accumulate_metric(
-        f"summary_{messages[-1].workspace_name}_{messages[-1].id}",
+    workspace = messages[-1].workspace_name if messages else "unknown"
+    last_id = messages[-1].id if messages else 0
+    accumulate_metric(
+        f"summary_{workspace}_{last_id}",
         f"{summary_type.name}_summary_size",
-        response.output_tokens if response else f"{summary_tokens} (est.)",
+        response.output_tokens if response else summary_tokens,
         "tokens",
     )

Also applies to: 380-385

src/deriver/deriver.py (1)

362-378: Thinking extraction likely always None now

Given the new client surface, ReasoningResponse won’t naturally carry _response or thinking. If “thinking” matters operationally (it’s logged and sent to Langfuse), we should explicitly pass it from the client layer rather than rely on introspection.

If acceptable, I can open a follow-up to add an optional thinking field to HonchoLLMCallResponse and set it for OpenAI reasoning models; we then consume it here.

Also applies to: 385-397

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2c5c0f0 and 63a06ef.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • docs/v2/documentation/introduction/vibecoding.mdx (0 hunks)
  • docs/v2/documentation/reference/platform.mdx (2 hunks)
  • pyproject.toml (3 hunks)
  • src/config.py (4 hunks)
  • src/crud/representation.py (1 hunks)
  • src/deriver/consumer.py (4 hunks)
  • src/deriver/deriver.py (6 hunks)
  • src/deriver/prompts.py (0 hunks)
  • src/dialectic/chat.py (8 hunks)
  • src/dialectic/prompts.py (1 hunks)
  • src/dialectic/utils.py (5 hunks)
  • src/routers/peers.py (2 hunks)
  • src/utils/clients.py (1 hunks)
  • src/utils/embedding_store.py (3 hunks)
  • src/utils/logging.py (1 hunks)
  • src/utils/summarizer.py (10 hunks)
  • src/utils/types.py (1 hunks)
  • tests/bench/peer_card_bench.py (2 hunks)
  • tests/conftest.py (2 hunks)
💤 Files with no reviewable changes (2)
  • src/deriver/prompts.py
  • docs/v2/documentation/introduction/vibecoding.mdx
🧰 Additional context used
📓 Path-based instructions (4)
{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

{src,tests}/**/*.py: Follow isort conventions with absolute imports preferred
Use snake_case for variables/functions and PascalCase for classes
Keep line length to 88 characters (Black compatible)
Ensure code typechecks with basedpyright

Files:

  • src/utils/logging.py
  • src/utils/types.py
  • src/config.py
  • src/crud/representation.py
  • src/utils/clients.py
  • src/dialectic/prompts.py
  • src/dialectic/utils.py
  • src/deriver/consumer.py
  • tests/bench/peer_card_bench.py
  • src/utils/embedding_store.py
  • src/dialectic/chat.py
  • tests/conftest.py
  • src/deriver/deriver.py
  • src/utils/summarizer.py
  • src/routers/peers.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use explicit error handling with appropriate exception types
Write docstrings using Google style
Use specific exception types (e.g., ResourceNotFoundException, ValidationException) instead of generic ones
Use structured logging with context; avoid print statements
Use ruff for linting and formatting (ruff check/format)

Files:

  • src/utils/logging.py
  • src/utils/types.py
  • src/config.py
  • src/crud/representation.py
  • src/utils/clients.py
  • src/dialectic/prompts.py
  • src/dialectic/utils.py
  • src/deriver/consumer.py
  • src/utils/embedding_store.py
  • src/dialectic/chat.py
  • src/deriver/deriver.py
  • src/utils/summarizer.py
  • src/routers/peers.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Write tests using pytest

Files:

  • tests/bench/peer_card_bench.py
  • tests/conftest.py
tests/conftest.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place shared pytest fixtures in tests/conftest.py

Files:

  • tests/conftest.py
🧬 Code graph analysis (6)
src/utils/clients.py (3)
src/dialectic/chat.py (1)
  • chat (152-345)
src/routers/peers.py (1)
  • chat (156-201)
tests/conftest.py (3)
  • client (168-189)
  • async_wrapper (460-466)
  • sync_wrapper (468-474)
src/dialectic/utils.py (3)
src/utils/shared_models.py (1)
  • SemanticQueries (179-184)
src/dialectic/prompts.py (1)
  • query_generation_prompt (86-135)
src/utils/clients.py (4)
  • honcho_llm_call (102-118)
  • honcho_llm_call (122-137)
  • honcho_llm_call (141-156)
  • honcho_llm_call (159-221)
tests/bench/peer_card_bench.py (3)
src/deriver/prompts.py (1)
  • peer_card_prompt (102-158)
src/utils/clients.py (4)
  • honcho_llm_call (102-118)
  • honcho_llm_call (122-137)
  • honcho_llm_call (141-156)
  • honcho_llm_call (159-221)
src/utils/shared_models.py (1)
  • PeerCardQuery (196-205)
src/dialectic/chat.py (2)
src/utils/clients.py (5)
  • HonchoLLMCallStreamChunk (86-98)
  • honcho_llm_call (102-118)
  • honcho_llm_call (122-137)
  • honcho_llm_call (141-156)
  • honcho_llm_call (159-221)
src/dialectic/prompts.py (1)
  • dialectic_prompt (4-83)
src/deriver/deriver.py (3)
src/utils/shared_models.py (2)
  • ReasoningResponse (91-101)
  • PeerCardQuery (196-205)
src/deriver/prompts.py (2)
  • critical_analysis_prompt (12-99)
  • peer_card_prompt (102-158)
src/utils/clients.py (4)
  • honcho_llm_call (102-118)
  • honcho_llm_call (122-137)
  • honcho_llm_call (141-156)
  • honcho_llm_call (159-221)
src/utils/summarizer.py (2)
src/utils/clients.py (5)
  • HonchoLLMCallResponse (70-83)
  • honcho_llm_call (102-118)
  • honcho_llm_call (122-137)
  • honcho_llm_call (141-156)
  • honcho_llm_call (159-221)
src/models.py (1)
  • Message (173-230)
🪛 LanguageTool
docs/v2/documentation/reference/platform.mdx

[grammar] ~74-~74: There might be a mistake here.
Context: ...tive utilities page. Next to the Peer name you can edit the [Global Peer Configura...

(QB_NEW_EN_OTHER)


[grammar] ~74-~74: Use correct spacing
Context: ...w, explore all utilities for the Peer. Peer Management Dashboard Utilities include: - Message search ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~94-~94: Use correct spacing
Context: ... a Session to open its utilities page. Session Utilities Here you can: - *View and add Messages...

(QB_NEW_EN_OTHER_ERROR_IDS_5)

🔇 Additional comments (27)
src/dialectic/prompts.py (1)

1-1: All mirascope decorators removed – no @prompt_template or from mirascope references detected; prompt functions are now plain.

src/utils/types.py (1)

3-3: Provider strings are consistent
No lingering Providers alias; the "google" literal in SupportedProviders, config.py, and clients.py aligns across runtime configuration and client initialization.

src/utils/embedding_store.py (1)

73-76: Harden Langfuse update_current_generation call

pyproject.toml already pins langfuse≥3.3.2, so update_current_generation is guaranteed to exist—wrap this call in a client‐existence check and swallow any exceptions to avoid flaky tests:

-                lf.update_current_generation(
-                    input={"observations": [obs.model_dump() for obs in observations]},
-                    output={"unique_conclusions": unique_conclusions},
-                )
+                if lf:
+                    try:
+                        lf.update_current_generation(  # type: ignore[attr-defined]
+                            input={"observations": [obs.model_dump() for obs in observations]},
+                            output={"unique_conclusions": unique_conclusions},
+                        )
+                    except Exception as exc:
+                        logger.debug("Langfuse update_current_generation failed: %s", exc)

Likely an incorrect or invalid review comment.

pyproject.toml (2)

12-12: Groq client addition looks good.

No issues spotted; matches Dialectic QUERY_GENERATION_PROVIDER default in config.


30-30: API key plumbing for google-genai is correct. settings.LLM.GEMINI_API_KEY is loaded from LLM_GEMINI_API_KEY and passed directly to genai.client.Client(api_key=…), so no reliance on GOOGLE_API_KEY.

src/config.py (2)

190-191: LGTM on provider type migrations.

Annotations align with the new multi-provider surface; no functional changes.

Also applies to: 197-198, 219-225, 246-247


17-17: SupportedProviders includes all default providers.

tests/conftest.py (1)

305-371: No changes needed to mock dialectic_stream or add embeddings stub
dialectic_stream isn’t implemented as an async generator nor iterated with async for, so returning an AsyncMock works; and there’s no src.embedding_client…embeddings.create call to stub.

Likely an incorrect or invalid review comment.

tests/bench/peer_card_bench.py (2)

145-147: LGTM on prompt construction.

Prompt assembly matches the prompt API and keeps inputs explicit.


161-162: Return model content directly.

Returning the parsed model keeps call sites simple and typed.

src/routers/peers.py (2)

2-2: OK to import AsyncIterator.

Switching to collections.abc.AsyncIterator is the right runtime check target.


169-179: Remove suggestion: non-stream dialectic_chat already returns a string
async def chat ends with return str(response), so in the non-stream path response is the content itself, not a wrapper. Wrapping str(response) in DialecticResponse is correct—no change needed.

Likely an incorrect or invalid review comment.

src/dialectic/chat.py (4)

39-41: Module-level Langfuse client initialization looks good

Creating a single lf via get_client() and gating usage on settings.LANGFUSE_PUBLIC_KEY is sensible.


152-160: Return type change is fine; verify callers handle AsyncIterator

chat() now returns str | AsyncIterator[HonchoLLMCallStreamChunk]. Confirm all call sites (e.g., routers/handlers) branch on stream=True and iterate chunks appropriately.


193-199: Nice: richer Langfuse metadata

Adding dialectic_model alongside query-generation fields improves trace clarity.


66-75: Confirm provider typing compatibility

Type-check tool wasn’t found in the sandbox—manually verify that settings.DIALECTIC.PROVIDER is one of the SupportedProviders enum/literal values so your type checker won’t flag a str vs literal mismatch.

src/utils/summarizer.py (3)

5-5: Good call using cleandoc alias c

Keeps prompt templates tight and readable.


80-85: Public return types updated to HonchoLLMCallResponse[str]—good

This enables token accounting downstream.

Also applies to: 130-134


97-121: Prompts are clear and scoped; nice upgrade

Inclusive-of-previous-summary instruction will reduce drift. No issues spotted.

Also applies to: 143-169

src/deriver/deriver.py (3)

51-51: Module-level Langfuse client is consistent with the new pattern

Matches the rest of the refactor.


54-86: Return only parsed content—verify “thinking” preservation

critical_analysis_call returns ReasoningResponse via response.content. Downstream (derive_new_insights) tries to pull thinking from hidden attributes on the object. Confirm honcho_llm_call attaches provider “thinking” to the parsed model (e.g., via a private attribute) or we’ll silently lose it. If not attached, consider returning the full HonchoLLMCallResponse[ReasoningResponse] and extending it to carry optional thinking.

Would you like a patch to plumb an optional thinking: str | None field through HonchoLLMCallResponse?


320-328: Langfuse generation input/output updates look correct

Nice integration points for reasoning traces.

Also applies to: 411-413

src/utils/clients.py (5)

65-67: Good validation of required providers at module load time

This proactive validation ensures all required LLM providers are configured before the application starts, preventing runtime failures. This is a solid defensive programming practice.


190-194: Verify retry configuration parameters are appropriate

The retry configuration uses an exponential backoff with multiplier=1, min=4s, max=10s. This means wait times of 4s, 4s, 8s (clamped to max=10s). Consider if these values are appropriate for your use case.

Are these retry parameters (3 attempts, 4-10s waits) aligned with your SLA requirements and typical LLM provider response times?


343-351: Excellent handling of GPT-5 specific parameters

The conditional logic properly differentiates between GPT-5 models (using max_completion_tokens) and other models (using max_tokens), while also supporting GPT-5 specific features like reasoning_effort and verbosity.


479-502: Well-structured streaming response handler

The _handle_streaming_response function provides a clean abstraction for handling streaming across different providers. The docstring is comprehensive and the parameter handling is consistent.


642-694: Clean implementation of retry decorator

The retry decorator properly handles both sync and async functions with appropriate sleep mechanisms. The implementation is well-structured and follows Python best practices.

Comment thread docs/v2/documentation/reference/platform.mdx
Comment thread docs/v2/documentation/reference/platform.mdx
Comment thread pyproject.toml
Comment thread pyproject.toml Outdated
Comment thread src/crud/representation.py Outdated
Comment thread src/utils/embedding_store.py
Comment thread src/utils/summarizer.py
Comment thread tests/bench/peer_card_bench.py
Comment thread tests/bench/peer_card_bench.py
Comment thread tests/conftest.py
@dr-frmr dr-frmr marked this pull request as ready for review September 4, 2025 20:32
@dr-frmr

dr-frmr commented Sep 4, 2025

Copy link
Copy Markdown
Contributor Author

@CodeRabbit review

@coderabbitai

coderabbitai Bot commented Sep 4, 2025

Copy link
Copy Markdown
Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
src/crud/representation.py (1)

18-22: Also mark remaining module constants as Final.

Keep consistency for constants used as keys.

-# The key for the working representation in the session peer's internal_metadata
-WORKING_REPRESENTATION_METADATA_KEY = "working_representation"
+# The key for the working representation in the session peer's internal_metadata
+WORKING_REPRESENTATION_METADATA_KEY: Final[str] = "working_representation"

-# Old working representation key--remove in 2.3.0?
-WORKING_REPRESENTATION_LEGACY_METADATA_KEY = "global_representation"
+# Old working representation key--remove in 2.3.0?
+WORKING_REPRESENTATION_LEGACY_METADATA_KEY: Final[str] = "global_representation"
src/dialectic/chat.py (1)

345-347: Avoid redundant str() conversion.

response is already a string from dialectic_call.

-    # Convert AnthropicCallResponse to string for compatibility
-    return str(response)
+    return response
src/deriver/deriver.py (6)

314-321: Same: guard Langfuse generation update behind client presence

Replace if settings.LANGFUSE_PUBLIC_KEY: with if lf:.


356-371: JSON fallback likely dead after structured return; either remove or ensure it's reachable

Given critical_analysis_call currently returns ReasoningResponse, this branch won’t run. If you keep provider-gated unstructured calls, leave it; otherwise remove for simplicity.


444-448: Type mismatch passed to _save_new_observations

You pass ReasoningResponseWithThinking while the callee expects ReasoningResponse. Works at runtime but will fail type-checking.

Apply:

-    async def _save_new_observations(
-        self,
-        original_working_representation: ReasoningResponse,
-        revised_observations: ReasoningResponse,
+    async def _save_new_observations(
+        self,
+        original_working_representation: ReasoningResponseWithThinking,
+        revised_observations: ReasoningResponseWithThinking,
     ) -> dict[str, list[str]]:

…and update the return type below (see next comment).


487-491: Return type under-declares actual element types

new_observations_by_level can hold DeductiveObservation entries, not only str.

Apply:

-        new_observations_by_level: dict[str, list[str]] = find_new_observations(
+        new_observations_by_level: dict[str, list[str | DeductiveObservation]] = find_new_observations(
             original_working_representation, revised_observations
         )

Optionally introduce a type alias for readability.


553-566: Handle string peer-card responses defensively

If peer_card_call ever returns a string (e.g., Anthropic path), .card will raise. Parse when needed.

Apply:

-            response = await peer_card_call(old_peer_card, new_observations)
-            new_peer_card = response.card
+            response = await peer_card_call(old_peer_card, new_observations)
+            if isinstance(response, str):
+                try:
+                    parsed = PeerCardQuery.model_validate_json(response)
+                except Exception:
+                    logger.info("No changes to peer card")
+                    return
+                new_peer_card = parsed.card
+            else:
+                new_peer_card = response.card

61-86: Gate structured output by provider

  • In src/deriver/deriver.py (lines 61–86), add a flag before calling honcho_llm_call to disable response_model when settings.DERIVER.PROVIDER == "anthropic".
  • Pass response_model=ReasoningResponse if use_structured else None in the call.
  • Update the function’s return annotation to ReasoningResponse | str to reflect the possible string fallback.
♻️ Duplicate comments (13)
src/crud/representation.py (1)

15-15: Constant annotated as Final — resolved.

GLOBAL_REPRESENTATION_COLLECTION_NAME is now immutable as suggested previously.

pyproject.toml (1)

26-26: Bump Langfuse to a version exposing update_current_trace.

chat.py uses lf.update_current_trace; this isn’t available in 3.3.2. Raise the floor.

-    "langfuse>=3.3.2",
+    "langfuse>=3.4.0",
src/dialectic/chat.py (1)

90-92: Redact/truncate prompts in logs to reduce PII exposure.

Log a bounded slice and use formatter placeholders.

-    logger.debug("=== DIALECTIC PROMPT ===")
-    logger.debug(prompt)
-    logger.debug("=== END DIALECTIC PROMPT ===")
+    logger.debug("=== DIALECTIC PROMPT (truncated) ===")
+    logger.debug("%s", (prompt[:2000] + "…") if len(prompt) > 2000 else prompt)
+    logger.debug("=== END DIALECTIC PROMPT ===")
-    logger.debug("=== DIALECTIC PROMPT (STREAM) ===")
-    logger.debug(prompt)
-    logger.debug("=== END DIALECTIC PROMPT ===")
+    logger.debug("=== DIALECTIC PROMPT (STREAM, truncated) ===")
+    logger.debug("%s", (prompt[:2000] + "…") if len(prompt) > 2000 else prompt)
+    logger.debug("=== END DIALECTIC PROMPT ===")

Also applies to: 145-147

src/deriver/deriver.py (1)

51-52: Avoid module-level client init; gate on config or lazy-init

Module-level lf = get_client() can race at import time and is unnecessary if Langfuse is disabled. Gate on settings or lazy-init.

Apply:

-lf = get_client()
+lf = get_client() if settings.LANGFUSE_PUBLIC_KEY else None

And guard uses:

-    if settings.LANGFUSE_PUBLIC_KEY:
+    if lf:
         lf.update_current_trace(
             output=format_reasoning_response_as_markdown(final_observations)
         )

Also apply the same if lf: guard at Lines 314-321 and 404-406.

src/utils/clients.py (9)

27-28: Avoid module-level Langfuse client initialization

Same concern: prefer lazy-init or gating to prevent import-time side effects.

Apply:

-lf = get_client()
+lf = get_client() if settings.LANGFUSE_PUBLIC_KEY else None

175-177: Improve error message for missing runtime client

Mirror the earlier error improvement here as well.

Apply:

-    if not client:
-        raise ValueError(f"Missing client for {provider}")
+    if not client:
+        available = ", ".join(sorted(CLIENTS.keys())) or "none"
+        raise ValueError(f"Missing client for {provider}. Available: {available}")

315-319: Anthropic JSON-mode hack is brittle

Appending "{" as assistant content is fragile. Prefer a system instruction for JSON-only or a robust extractor.

Apply:

-            if json_mode:
-                anthropic_params["messages"].append(
-                    {"role": "assistant", "content": "{"}
-                )
+            if json_mode:
+                anthropic_params["system"] = "Respond with valid JSON only."

Also update the streaming branch (Lines 517-519).


404-422: Gemini: surface None content as error rather than empty string

Raising helps callers differentiate empty content vs. transport/model issues.

Apply:

-                text_content = gemini_response.text if gemini_response.text else ""
+                if gemini_response.text is None:
+                    raise ValueError("No text content in Gemini response")
+                text_content = gemini_response.text

582-599: Gemini streaming likely blocks the event loop

Use the async streaming API to avoid blocking.

Apply:

-            if response_model is not None:
-                response_stream = client.models.generate_content_stream(
+            if response_model is not None:
+                response_stream = await client.aio.models.generate_content_stream(  # type: ignore[attr-defined]
                     model=params["model"],
                     contents=prompt_text,
                     config={
                         "response_mime_type": "application/json",
                         "response_schema": response_model,
                     },
                 )
             else:
-                response_stream = client.models.generate_content_stream(
+                response_stream = await client.aio.models.generate_content_stream(  # type: ignore[attr-defined]
                     model=params["model"],
                     contents=prompt_text,
                     config={
                         "response_mime_type": "application/json" if json_mode else None,
                     },
                 )
@@
-            final_chunk = None
-            for chunk in response_stream:
+            final_chunk = None
+            async for chunk in response_stream:
                 if chunk.text:
                     yield HonchoLLMCallStreamChunk(content=chunk.text)
                 final_chunk = chunk

721-749: Avoid shadowing built-ins (min, max)

Rename to prevent confusion and improve readability.

Apply:

-def wait_exponential(
-    *, multiplier: float = 1.0, min: float | None = None, max: float | None = None
+def wait_exponential(
+    *, multiplier: float = 1.0, min_wait: float | None = None, max_wait: float | None = None
 ) -> Callable[[int], float]:
@@
-    clamped_min = float(min) if min is not None else None
-    clamped_max = float(max) if max is not None else None
+    clamped_min = float(min_wait) if min_wait is not None else None
+    clamped_max = float(max_wait) if max_wait is not None else None

And update the caller (Line 193):

-            wait=wait_exponential(multiplier=1, min=4, max=10),
+            wait=wait_exponential(multiplier=1, min_wait=4, max_wait=10),

751-757: Ensure Langfuse generation is closed on success/error

Avoid dangling generations.

Apply:

 def with_langfuse(func: Callable[..., Any]) -> Callable[..., Any]:
     @wraps(func)
     async def wrapper(*args: Any, **kwargs: Any) -> Any:
-        lf.start_as_current_generation(name="LLM Call")
-        return await func(*args, **kwargs)
+        if lf:
+            generation = lf.start_as_current_generation(name="LLM Call")
+        try:
+            result = await func(*args, **kwargs)
+            if lf:
+                generation.end()  # type: ignore[reportAttributeAccessIssue]
+            return result
+        except Exception as e:
+            if lf:
+                generation.end(error=str(e))  # type: ignore[reportAttributeAccessIssue]
+            raise

101-119: Overload consistency for stream typing

Minor: consider consistent use of Literal[True, False] across overloads or keep as-is but add a short docstring clarifying the overload behavior.

Also applies to: 141-157


65-67: Improve missing-client error to aid ops

Include available providers to speed up misconfig triage.

Apply:

-    if provider_value not in CLIENTS:
-        raise ValueError(f"Missing client for {provider_name}: {provider_value}")
+    if provider_value not in CLIENTS:
+        available = ", ".join(sorted(CLIENTS.keys())) or "none"
+        raise ValueError(
+            f"Missing client for {provider_name}: {provider_value}. "
+            f"Available providers: {available}"
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 63a06ef and ab16e01.

📒 Files selected for processing (6)
  • pyproject.toml (3 hunks)
  • src/crud/representation.py (2 hunks)
  • src/deriver/deriver.py (6 hunks)
  • src/deriver/prompts.py (0 hunks)
  • src/dialectic/chat.py (8 hunks)
  • src/utils/clients.py (1 hunks)
💤 Files with no reviewable changes (1)
  • src/deriver/prompts.py
🧰 Additional context used
📓 Path-based instructions (2)
{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

{src,tests}/**/*.py: Follow isort conventions with absolute imports preferred
Use snake_case for variables/functions and PascalCase for classes
Keep line length to 88 characters (Black compatible)
Ensure code typechecks with basedpyright

Files:

  • src/crud/representation.py
  • src/utils/clients.py
  • src/dialectic/chat.py
  • src/deriver/deriver.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use explicit error handling with appropriate exception types
Write docstrings using Google style
Use specific exception types (e.g., ResourceNotFoundException, ValidationException) instead of generic ones
Use structured logging with context; avoid print statements
Use ruff for linting and formatting (ruff check/format)

Files:

  • src/crud/representation.py
  • src/utils/clients.py
  • src/dialectic/chat.py
  • src/deriver/deriver.py
🧬 Code graph analysis (1)
src/utils/clients.py (2)
src/routers/peers.py (1)
  • chat (156-201)
tests/conftest.py (2)
  • async_wrapper (460-466)
  • sync_wrapper (468-474)
🔇 Additional comments (3)
src/crud/representation.py (1)

2-2: Importing Final is correct.

Matches the coding guideline and enables immutability signaling.

pyproject.toml (1)

97-97: Cleanup looks good.

Dropping "langfuse.decorators" from allowedUntypedLibraries matches the codebase.

src/dialectic/chat.py (1)

160-160: Return type update to AsyncIterator for streaming is correct.

Matches the new streaming surface.

Comment thread pyproject.toml
Comment thread src/deriver/deriver.py
Comment thread src/deriver/deriver.py
Comment thread src/dialectic/chat.py
Comment thread src/dialectic/chat.py
Comment thread src/utils/clients.py
@plastic-labs plastic-labs deleted a comment from coderabbitai Bot Sep 5, 2025

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/deriver/deriver.py (1)

358-376: Consider simplifying response parsing logic

The response parsing logic handles both string and object responses, but this dual path increases complexity. Consider standardizing on one approach.

The current implementation correctly handles both string and structured responses from the LLM call. However, since honcho_llm_call is configured with response_model=ReasoningResponse, it should always return a properly typed response. Consider asserting the expected type:

-        # If response is a string, try to parse as JSON
-        if isinstance(response_obj, str):
-            try:
-                response_data = json.loads(response_obj)
-                new_insights = ReasoningResponse(
-                    explicit=response_data.get("explicit", []),
-                    deductive=[
-                        DeductiveObservation(**item)
-                        for item in response_data.get("deductive", [])
-                    ],
-                )
-            except (json.JSONDecodeError, KeyError, TypeError) as e:
-                if settings.SENTRY.ENABLED:
-                    sentry_sdk.capture_exception(e)
-                logger.warning("Failed to parse string response as JSON: %s", e)
-                new_insights = ReasoningResponse(explicit=[], deductive=[])
-        else:
-            # If response is already a ReasoningResponse object
-            new_insights = response_obj
+        # Response should always be a ReasoningResponse when response_model is set
+        if not isinstance(response_obj, ReasoningResponse):
+            logger.error("Unexpected response type: %s", type(response_obj))
+            raise TypeError(f"Expected ReasoningResponse, got {type(response_obj)}")
+        new_insights = response_obj
♻️ Duplicate comments (14)
src/deriver/deriver.py (6)

54-86: Add return type annotation to critical_analysis_call

The function is missing an explicit return type annotation, which would improve type safety and documentation.

Apply this diff to add the return type:

 async def critical_analysis_call(
     peer_id: str,
     peer_card: list[str] | None,
     message_created_at: datetime.datetime,
     working_representation: str | None,
     history: str,
     new_turn: str,
 ) -> ReasoningResponse:

316-324: Use consistent Langfuse client check

Update this check to match the conditional initialization pattern.

Apply this diff:

-        if settings.LANGFUSE_PUBLIC_KEY:
+        if lf:
             lf.update_current_generation(

51-52: Guard Langfuse client initialization with conditional check

The Langfuse client is initialized unconditionally but should only be created when Langfuse is enabled.

Apply this diff to conditionally initialize the client:

-lf = get_client()
-
+lf = get_client() if settings.LANGFUSE_PUBLIC_KEY else None

Then update usage sites to check if the client exists:

-    if settings.LANGFUSE_PUBLIC_KEY:
+    if lf:
         lf.update_current_trace(

88-115: Add comprehensive docstring and verify provider compatibility

The function has a minimal docstring and may fail with Anthropic if it doesn't support response_model.

Apply this diff to improve documentation and add provider validation:

 async def peer_card_call(
     old_peer_card: list[str] | None,
     new_observations: list[str],
 ) -> PeerCardQuery:
     """
-    Generate peer card prompt, call LLM with response model.
+    Update a peer card from old card + new observations via LLM.
+
+    Args:
+        old_peer_card: Existing peer card lines or None.
+        new_observations: New observations to merge.
+
+    Returns:
+        PeerCardQuery: Structured peer card result from the model.
+
+    Raises:
+        NotImplementedError: If the configured provider doesn't support response models.
     """
     prompt = peer_card_prompt(
         old_peer_card=old_peer_card,
         new_observations=new_observations,
     )
+    
+    # Verify provider supports response_model
+    if settings.DERIVER.PEER_CARD_PROVIDER == "anthropic":
+        raise NotImplementedError(
+            "Anthropic provider doesn't support response_model for peer card generation"
+        )

286-289: Use consistent Langfuse client check

Following the suggested conditional initialization pattern, update this check for consistency.

Apply this diff after implementing the conditional initialization:

-    if settings.LANGFUSE_PUBLIC_KEY:
+    if lf:
         lf.update_current_trace(

406-409: Use consistent Langfuse client check

Update this check to match the conditional initialization pattern.

Apply this diff:

-        if settings.LANGFUSE_PUBLIC_KEY:
+        if lf:
             lf.update_current_generation(
src/utils/clients.py (8)

27-28: Consider lazy initialization for Langfuse client

The module-level Langfuse client initialization may cause issues in async environments or during testing. Consider lazy initialization to avoid potential race conditions.

Apply this approach for safer initialization:

-lf = get_client()
+lf = None
+
+def get_lf_client():
+    """Get or create the Langfuse client instance.
+    
+    Uses lazy initialization to avoid module-level side effects
+    and potential race conditions in async environments.
+    """
+    global lf
+    if lf is None and settings.LANGFUSE_PUBLIC_KEY:
+        lf = get_client()
+    return lf

Then update the with_langfuse decorator to use this:

 def with_langfuse(func: Callable[..., Any]) -> Callable[..., Any]:
     @wraps(func)
     async def wrapper(*args: Any, **kwargs: Any) -> Any:
-        lf.start_as_current_generation(name="LLM Call")
+        client = get_lf_client()
+        if client:
+            client.start_as_current_generation(name="LLM Call")
         return await func(*args, **kwargs)

175-177: Add list of available providers to error message

The error message could be more helpful by listing available providers.

Apply this diff:

     client = CLIENTS.get(provider)
     if not client:
-        raise ValueError(f"Missing client for {provider}")
+        available_providers = list(CLIENTS.keys())
+        raise ValueError(
+            f"Missing client for {provider}. Available providers: {available_providers}"
+        )

315-318: Anthropic JSON mode implementation is fragile

Appending "{" as an assistant message is a workaround that could lead to malformed JSON if the model doesn't complete the structure properly.

Consider using Anthropic's system message for JSON mode or implementing robust JSON extraction:

             if json_mode:
-                anthropic_params["messages"].append(
-                    {"role": "assistant", "content": "{"}
-                )
+                # Use system message to instruct JSON-only response
+                anthropic_params["system"] = (
+                    "You must respond with valid JSON only. "
+                    "Do not include any text outside the JSON structure."
+                )

600-618: Gemini streaming uses synchronous iterator

The Gemini streaming implementation uses a regular for loop which will block the event loop. This should use async iteration.

The google-genai library provides async streaming support. Update to use the async API:

             if response_model is not None:
-                response_stream = client.models.generate_content_stream(
+                response_stream = await client.aio.models.generate_content_stream(
                     model=params["model"],
                     contents=prompt_text,
                     config={
                         "response_mime_type": "application/json",
                         "response_schema": response_model,
                     },
                 )
             else:
-                response_stream = client.models.generate_content_stream(
+                response_stream = await client.aio.models.generate_content_stream(
                     model=params["model"],
                     contents=prompt_text,
                     config={
                         "response_mime_type": "application/json" if json_mode else None,
                     },
                 )

             final_chunk = None
-            for chunk in response_stream:
+            async for chunk in response_stream:
                 if chunk.text:
                     yield HonchoLLMCallStreamChunk(content=chunk.text)
                 final_chunk = chunk

722-723: Avoid shadowing Python built-in functions

The parameter names min and max shadow Python's built-in functions.

Apply this diff to use non-shadowing names:

 def wait_exponential(
-    *, multiplier: float = 1.0, min: float | None = None, max: float | None = None
+    *, multiplier: float = 1.0, min_wait: float | None = None, max_wait: float | None = None
 ) -> Callable[[int], float]:

And update the implementation:

-    clamped_min = float(min) if min is not None else None
-    clamped_max = float(max) if max is not None else None
+    clamped_min = float(min_wait) if min_wait is not None else None
+    clamped_max = float(max_wait) if max_wait is not None else None

Also update the call site:

-            wait=wait_exponential(multiplier=1, min=4, max=10),
+            wait=wait_exponential(multiplier=1, min_wait=4, max_wait=10),

751-757: Add error handling to Langfuse wrapper

The Langfuse wrapper should properly handle errors and ensure the generation is closed even on failure.

Apply this diff to ensure proper cleanup:

 def with_langfuse(func: Callable[..., Any]) -> Callable[..., Any]:
     @wraps(func)
     async def wrapper(*args: Any, **kwargs: Any) -> Any:
-        lf.start_as_current_generation(name="LLM Call")
-        return await func(*args, **kwargs)
+        client = get_lf_client()  # Use the lazy init function
+        if not client:
+            return await func(*args, **kwargs)
+        
+        generation = client.start_as_current_generation(name="LLM Call")
+        try:
+            result = await func(*args, **kwargs)
+            generation.end()
+            return result
+        except Exception as e:
+            generation.end(error=str(e))
+            raise

     return wrapper

516-519: Anthropic streaming JSON mode uses same fragile approach

The streaming implementation has the same JSON mode issue as the non-streaming path.

Apply the same fix as suggested for the non-streaming path:

             if json_mode:
-                anthropic_params["messages"].append(
-                    {"role": "assistant", "content": "{"}
-                )
+                # Use system message to instruct JSON-only response
+                anthropic_params["system"] = (
+                    "You must respond with valid JSON only. "
+                    "Do not include any text outside the JSON structure."
+                )

65-67: Include provider value in error message for better debugging

The error message should include the actual provider value to help with debugging configuration issues.

Apply this diff to improve the error message:

 for provider_name, provider_value in SELECTED_PROVIDERS:
     if provider_value not in CLIENTS:
-        raise ValueError(f"Missing client for {provider_name}: {provider_value}")
+        available = list(CLIENTS.keys())
+        raise ValueError(
+            f"Missing client for {provider_name}. "
+            f"Requested: '{provider_value}', Available: {available}"
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ab16e01 and e807b10.

📒 Files selected for processing (5)
  • src/deriver/deriver.py (6 hunks)
  • src/utils/clients.py (1 hunks)
  • tests/bench/harness.py (1 hunks)
  • tests/utils/__init__.py (1 hunks)
  • tests/utils/test_clients.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

{src,tests}/**/*.py: Follow isort conventions with absolute imports preferred
Use snake_case for variables/functions and PascalCase for classes
Keep line length to 88 characters (Black compatible)
Ensure code typechecks with basedpyright

Files:

  • tests/bench/harness.py
  • tests/utils/__init__.py
  • tests/utils/test_clients.py
  • src/utils/clients.py
  • src/deriver/deriver.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Write tests using pytest

Files:

  • tests/bench/harness.py
  • tests/utils/__init__.py
  • tests/utils/test_clients.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use explicit error handling with appropriate exception types
Write docstrings using Google style
Use specific exception types (e.g., ResourceNotFoundException, ValidationException) instead of generic ones
Use structured logging with context; avoid print statements
Use ruff for linting and formatting (ruff check/format)

Files:

  • src/utils/clients.py
  • src/deriver/deriver.py
🧬 Code graph analysis (3)
tests/utils/test_clients.py (1)
src/utils/clients.py (15)
  • HonchoLLMCallResponse (70-83)
  • HonchoLLMCallStreamChunk (86-98)
  • handle_streaming_response (481-645)
  • honcho_llm_call (102-118)
  • honcho_llm_call (122-137)
  • honcho_llm_call (141-156)
  • honcho_llm_call (159-221)
  • honcho_llm_call_inner (225-237)
  • honcho_llm_call_inner (241-253)
  • honcho_llm_call_inner (257-269)
  • honcho_llm_call_inner (272-478)
  • retry (648-699)
  • stop_after_attempt (702-718)
  • wait_exponential (721-748)
  • with_langfuse (751-757)
src/utils/clients.py (3)
src/dialectic/chat.py (1)
  • chat (152-347)
src/routers/peers.py (1)
  • chat (156-201)
tests/conftest.py (3)
  • client (168-189)
  • async_wrapper (460-466)
  • sync_wrapper (468-474)
src/deriver/deriver.py (3)
src/utils/shared_models.py (2)
  • ReasoningResponse (91-101)
  • PeerCardQuery (196-205)
src/deriver/prompts.py (2)
  • critical_analysis_prompt (12-99)
  • peer_card_prompt (102-157)
src/utils/clients.py (4)
  • honcho_llm_call (102-118)
  • honcho_llm_call (122-137)
  • honcho_llm_call (141-156)
  • honcho_llm_call (159-221)
🔇 Additional comments (7)
tests/bench/harness.py (1)

330-331: LGTM!

The addition of "google.genai.models" to the filter list correctly aligns with the dependency update from google-generativeai to google-genai package. This ensures both the old and new module paths are filtered from the output, maintaining backward compatibility during the transition.

tests/utils/test_clients.py (4)

1-27: LGTM! Comprehensive test coverage structure

The file header and imports are well-organized, covering all necessary dependencies for testing the LLM clients. The docstring clearly outlines the test scope across providers, response models, streaming, and error handling.


42-48: LGTM! Well-designed test model

The SampleTestModel Pydantic model is appropriately designed for testing structured output, with a good mix of required and optional fields.


50-93: LGTM! Thorough test coverage for response models

Excellent test coverage for both HonchoLLMCallResponse and HonchoLLMCallStreamChunk models, including generic type testing and default value verification.


1107-1143: LGTM! Excellent edge case coverage

Great job testing edge cases including zero multiplier, very large attempt numbers, and mutable default handling for finish_reasons. These tests ensure robust behavior under unusual conditions.

src/utils/clients.py (2)

553-556: Use reasoning_effort and verbosity in OpenAI streaming

The streaming handler accepts reasoning_effort and verbosity parameters but they're not being used in the OpenAI case.

These parameters are actually being used correctly in lines 553-556 for GPT-5 models. The function signature is correct.


405-405: Consider consistent null checking for Gemini responses

While the code provides a fallback empty string for missing text, consider applying consistent null checking like other providers.

The current implementation with the fallback empty string is actually reasonable for Gemini, as it handles the case gracefully without throwing an error. The defensive coding approach here is appropriate.

Comment thread tests/utils/__init__.py
Comment thread tests/utils/test_clients.py
Comment thread tests/utils/test_clients.py
Comment thread tests/utils/test_clients.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
pyproject.toml (1)

26-26: Bump Langfuse to a version with tracing update APIs

Your code uses get_client()/update_current_trace/update_current_generation; these aren’t in 3.3.2. Increase the minimum.

Apply:

-    "langfuse>=3.3.2",
+    "langfuse>=3.4.0",

Optionally verify minimal working version:

#!/bin/bash
python - <<'PY'
import importlib, pkg_resources
import langfuse as lf
print("langfuse:", pkg_resources.get_distribution("langfuse").version)
print("has get_client:", hasattr(lf, "get_client"))
print("has update_current_trace:", hasattr(lf, "update_current_trace"))
print("has update_current_generation:", hasattr(lf, "update_current_generation"))
PY
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e807b10 and 3fad59c.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • pyproject.toml (3 hunks)
  • src/config.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

{src,tests}/**/*.py: Follow isort conventions with absolute imports preferred
Use snake_case for variables/functions and PascalCase for classes
Keep line length to 88 characters (Black compatible)
Ensure code typechecks with basedpyright

Files:

  • src/config.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use explicit error handling with appropriate exception types
Write docstrings using Google style
Use specific exception types (e.g., ResourceNotFoundException, ValidationException) instead of generic ones
Use structured logging with context; avoid print statements
Use ruff for linting and formatting (ruff check/format)

Files:

  • src/config.py
🔇 Additional comments (3)
pyproject.toml (2)

12-12: groq dependency: remove if unused or justify retention

rg found only a logger reference at src/deriver/main.py:43 — logging.getLogger("groq._base_client").setLevel(...); no direct imports/usages detected. If groq is not required at runtime, remove it; if it’s required via dynamic import/plugin, keep and document.

Apply if unused:

-    "groq>=0.31.0",

97-97: Approve — trimmed allowedUntypedLibraries looks good; re-run type checker

Verification failed: "/bin/bash: line 2: uvx: command not found". Re-run the project's type checker and paste the output (examples: npx pyright --project pyrightconfig.json, pipx run pyright, or poetry run pyright).

src/config.py (1)

17-17: Type alias swap is correct — SupportedProviders includes required literals.

SupportedProviders is defined as Literal["anthropic","openai","google","groq","custom"] (src/utils/types.py:3). src/config.py and src/utils/clients.py use SupportedProviders and set PROVIDER values to "google", "openai", "anthropic", and "groq". No exact 'Providers' alias or direct imports of Providers were found in the repo.

Comment thread pyproject.toml
Comment thread src/config.py
Comment thread src/config.py
Comment thread src/config.py

@Rajat-Ahuja1997 Rajat-Ahuja1997 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, just one comment about the retry logic

Comment thread src/utils/clients.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3fad59c and ce877b8.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • pyproject.toml (3 hunks)
  • src/utils/clients.py (1 hunks)
  • tests/utils/test_clients.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

{src,tests}/**/*.py: Follow isort conventions with absolute imports preferred
Use snake_case for variables/functions and PascalCase for classes
Keep line length to 88 characters (Black compatible)
Ensure code typechecks with basedpyright

Files:

  • tests/utils/test_clients.py
  • src/utils/clients.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Write tests using pytest

Files:

  • tests/utils/test_clients.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use explicit error handling with appropriate exception types
Write docstrings using Google style
Use specific exception types (e.g., ResourceNotFoundException, ValidationException) instead of generic ones
Use structured logging with context; avoid print statements
Use ruff for linting and formatting (ruff check/format)

Files:

  • src/utils/clients.py
🧬 Code graph analysis (2)
tests/utils/test_clients.py (1)
src/utils/clients.py (12)
  • HonchoLLMCallResponse (67-80)
  • HonchoLLMCallStreamChunk (83-95)
  • handle_streaming_response (478-642)
  • honcho_llm_call (99-115)
  • honcho_llm_call (119-134)
  • honcho_llm_call (138-153)
  • honcho_llm_call (156-218)
  • honcho_llm_call_inner (222-234)
  • honcho_llm_call_inner (238-250)
  • honcho_llm_call_inner (254-266)
  • honcho_llm_call_inner (269-475)
  • with_langfuse (645-651)
src/utils/clients.py (2)
src/dialectic/chat.py (1)
  • chat (152-347)
src/routers/peers.py (1)
  • chat (156-201)
🔇 Additional comments (12)
tests/utils/test_clients.py (4)

639-646: Future-proof Google streaming test to async API

Prod code should switch to async Gemini streaming (see src/utils/clients.py comment). When you do, update this test to patch client.aio.models.generate_content_stream and iterate with async for.

Would you like me to push the matching test changes once the impl flips to async?


253-255: Fix async streaming mock: use real async iterator

__aiter__ is set to a sync iterator; async for will fail at runtime.

Apply:

-        # Set up the async iterator (same as working test_streaming_call)
-        mock_stream.__aiter__.return_value = iter(mock_chunks)
+        # Set up the async iterator
+        async def async_iter():
+            for c in mock_chunks:
+                yield c
+        mock_stream.__aiter__.return_value = async_iter()

932-934: Same issue: make __aiter__ async

Repeat of the above in main streaming test; needs the same fix.

-        mock_stream.__aenter__.return_value = mock_stream
-        mock_stream.__aiter__.return_value = iter(mock_chunks)
+        mock_stream.__aenter__.return_value = mock_stream
+        async def async_iter():
+            for c in mock_chunks:
+                yield c
+        mock_stream.__aiter__.return_value = async_iter()

499-508: Optional: simplify by using AsyncMock directly

Cleaner and consistent with other tests.

-        async def mock_create(**_kwargs: Any):
-            return async_chunk_iterator()
-
-        mock_client.chat.completions.create = mock_create
+        mock_client.chat.completions.create = AsyncMock(return_value=async_chunk_iterator())
src/utils/clients.py (6)

172-175: Improve missing‑client error with available providers

Faster diagnosis in prod/test.

-    if not client:
-        raise ValueError(f"Missing client for {provider}")
+    if not client:
+        available = sorted(CLIENTS.keys())
+        raise ValueError(f"Missing client for {provider}. Available: {available}")

312-316: Anthropic JSON “{” prompt hack — prefer system instruction

More robust than seeding with a dangling brace.

-            if json_mode:
-                anthropic_params["messages"].append(
-                    {"role": "assistant", "content": "{"}
-                )
+            if json_mode:
+                anthropic_params["system"] = "Respond with valid JSON only."

645-651: End Langfuse generation on success/error

Ensure cleanup and useful error context.

 def with_langfuse(func: Callable[..., Any]) -> Callable[..., Any]:
     @wraps(func)
     async def wrapper(*args: Any, **kwargs: Any) -> Any:
-        lf.start_as_current_generation(name="LLM Call")
-        return await func(*args, **kwargs)
+        generation = lf.start_as_current_generation(name="LLM Call") if lf else None
+        try:
+            result = await func(*args, **kwargs)
+            if generation:
+                generation.end()
+            return result
+        except Exception as e:
+            if generation:
+                generation.end(error=str(e))
+            raise
 
     return wrapper

24-25: Guard Langfuse client init to avoid import‑time failures

get_client() at import time can raise when env keys aren’t set. Only initialize when Langfuse is enabled.

-lf = get_client()
+lf = get_client() if settings.LANGFUSE_PUBLIC_KEY else None

401-419: Optional: fail fast on empty Gemini text

Current fallback to "" can hide provider issues. Consider explicit error when both text and candidates are missing.

-                text_content = gemini_response.text if gemini_response.text else ""
+                if gemini_response.text is None and not getattr(gemini_response, "candidates", []):
+                    raise ValueError("No text content in Gemini response")
+                text_content = gemini_response.text or ""

62-64: Defer provider validation to app startup (avoid import-time ValueError)

Import-time raise makes tests brittle when credentials/config are absent; validate at startup and raise a clearer runtime/config error.

-for provider_name, provider_value in SELECTED_PROVIDERS:
-    if provider_value not in CLIENTS:
-        raise ValueError(f"Missing client for {provider_name}: {provider_value}")
+def validate_clients() -> None:
+    missing = [(n, v) for n, v in SELECTED_PROVIDERS if v not in CLIENTS]
+    if missing:
+        detail = ", ".join(f"{n}={v}" for n, v in missing)
+        raise RuntimeError(f"Missing LLM clients: {detail}")
+# Call validate_clients() during app startup, not at import time.

Confirm where to call validate_clients() in this repo’s startup lifecycle — e.g. FastAPI lifespan handler, Flask app factory after config, Django AppConfig.ready, or worker startup — so the call can be placed in the proper startup hook.

pyproject.toml (2)

98-98: Confirm type-stub noise; add exceptions only if flagged by basedpyright

basedpyright wasn't available in the verification environment (command not found) and the rg invocation failed — run locally: basedpyright -p pyproject.toml and search for "Missing type stubs". If it reports google-genai, groq, or anthropic, add them to allowedUntypedLibraries in pyproject.toml (currently ["langfuse"] at ~line 98); otherwise keep the list minimal.


27-27: Do not bump the Langfuse requirement — v3 already includes the tracing APIs
update_current_trace and start_as_current_generation were added in the Langfuse Python SDK v3 (OTel-based) release (June 5, 2025), so keep "langfuse>=3.3.2".

Likely an incorrect or invalid review comment.

Comment thread src/utils/clients.py
Comment thread src/utils/clients.py
Comment thread src/utils/clients.py
Comment thread tests/utils/test_clients.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ce877b8 and a472b21.

📒 Files selected for processing (1)
  • src/utils/clients.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

{src,tests}/**/*.py: Follow isort conventions with absolute imports preferred
Use snake_case for variables/functions and PascalCase for classes
Keep line length to 88 characters (Black compatible)
Ensure code typechecks with basedpyright

Files:

  • src/utils/clients.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use explicit error handling with appropriate exception types
Write docstrings using Google style
Use specific exception types (e.g., ResourceNotFoundException, ValidationException) instead of generic ones
Use structured logging with context; avoid print statements
Use ruff for linting and formatting (ruff check/format)

Files:

  • src/utils/clients.py
🧬 Code graph analysis (1)
src/utils/clients.py (2)
src/dialectic/chat.py (1)
  • chat (152-347)
src/routers/peers.py (1)
  • chat (156-201)
🪛 GitHub Actions: FastAPI Tests with PostgreSQL and uv
src/utils/clients.py

[error] 595-595: TypeError: object Mock can't be used in 'await' expression while awaiting client.aio.models.generate_content_stream in handle_streaming_response (used by tests/utils/test_clients.py::TestGoogleClient::test_google_streaming). The test run (uv run pytest -x) failed due to improper async mocking; consider using AsyncMock or enabling proper async mocks for the genai client in tests.

🔇 Additional comments (9)
src/utils/clients.py (9)

83-95: Good: streaming chunk model avoids mutable defaults.

Using Field(default_factory=list) is correct and pyright-friendly.


47-50: Instantiate Gemini client via genai.Client (ensures match/case and type checks).

Use the top-level constructor to align with your match client: case genai.Client().

Apply this diff:

-    google = genai.client.Client(api_key=settings.LLM.GEMINI_API_KEY)
+    google = genai.Client(api_key=settings.LLM.GEMINI_API_KEY)

24-25: Make Langfuse client lazy and always end generations (even on error).

Avoid module-level initialization in async contexts and ensure proper cleanup.

Apply this diff:

-lf = get_client()
+lf_instance: Any | None = None
+
+def get_lf() -> Any:
+    """Lazily initialize and return the Langfuse client."""
+    global lf_instance
+    if lf_instance is None:
+        lf_instance = get_client()
+    return lf_instance
@@
 def with_langfuse(func: Callable[..., Any]) -> Callable[..., Any]:
     @wraps(func)
     async def wrapper(*args: Any, **kwargs: Any) -> Any:
-        lf.start_as_current_generation(name="LLM Call")
-        return await func(*args, **kwargs)
+        generation = get_lf().start_as_current_generation(name="LLM Call")
+        try:
+            result = await func(*args, **kwargs)
+            generation.end()
+            return result
+        except Exception as e:
+            generation.end(error=str(e))
+            raise
 
     return wrapper

Also applies to: 651-657


172-175: Improve missing-client error message with available providers.

Aids debugging misconfigurations.

Apply this diff:

-    if not client:
-        raise ValueError(f"Missing client for {provider}")
+    if not client:
+        available = sorted(CLIENTS.keys())
+        raise ValueError(
+            f"Missing client for {provider}. Available providers: {available}"
+        )

318-322: Replace Anthropic JSON-mode “{” workaround with system instruction.

Appending a lone brace is brittle and can yield malformed JSON.

Apply this diff:

-            if json_mode:
-                anthropic_params["messages"].append(
-                    {"role": "assistant", "content": "{"}
-                )
+            if json_mode:
+                anthropic_params["system"] = "Respond with valid JSON only."

407-425: Harden Gemini non-streaming: fail loudly on missing text.

Silent fallback to "" can mask upstream issues.

Apply this diff:

-            # Safely extract response data
-            text_content = gemini_response.text if gemini_response.text else ""
+            # Safely extract response data
+            if gemini_response.text is None:
+                raise ValueError("No text content in Gemini response")
+            text_content = gemini_response.text

98-115: Unify overload typing for stream param for readability.

Use consistent Literal[False]/Literal[True] across overloads and group OpenAI-only params together.

Would you like me to push a small typing-only diff aligning overloads (no behavior change)?

Also applies to: 118-135, 137-154, 221-235, 237-251, 253-267


528-545: Guard Anthropic streaming with try/except to avoid resource leaks and emit done chunk on error.

Ensure cleanup and consistent stream termination.

Apply this diff:

-            async with client.messages.stream(**anthropic_params) as anthropic_stream:
-                async for chunk in anthropic_stream:
-                    if (
-                        chunk.type == "content_block_delta"
-                        and hasattr(chunk, "delta")
-                        and hasattr(chunk.delta, "text")
-                    ):
-                        text_content = getattr(chunk.delta, "text", "")
-                        yield HonchoLLMCallStreamChunk(content=text_content)
-                final_message = await anthropic_stream.get_final_message()
-                yield HonchoLLMCallStreamChunk(
-                    content="",
-                    is_done=True,
-                    finish_reasons=[final_message.stop_reason]
-                    if final_message.stop_reason
-                    else [],
-                )
+            try:
+                async with client.messages.stream(**anthropic_params) as anthropic_stream:
+                    async for chunk in anthropic_stream:
+                        if (
+                            chunk.type == "content_block_delta"
+                            and hasattr(chunk, "delta")
+                            and hasattr(chunk.delta, "text")
+                        ):
+                            text_content = getattr(chunk.delta, "text", "")
+                            yield HonchoLLMCallStreamChunk(content=text_content)
+                    final_message = await anthropic_stream.get_final_message()
+                    yield HonchoLLMCallStreamChunk(
+                        content="",
+                        is_done=True,
+                        finish_reasons=[final_message.stop_reason]
+                        if final_message.stop_reason
+                        else [],
+                    )
+            except Exception:
+                yield HonchoLLMCallStreamChunk(
+                    content="", is_done=True, finish_reasons=["error"]
+                )
+                raise

582-621: Fix Gemini streaming: don't await stream factory; handle both awaitable and iterator (unblocks tests).

Awaiting client.aio.models.generate_content_stream(...) causes TypeError with mocked clients and is unnecessary if the SDK returns an async iterator. Handle both cases and iterate with async for.

Apply this diff:

+import inspect
@@
         case genai.Client():
             prompt_text = params["messages"][0]["content"] if params["messages"] else ""
 
             if response_model is not None:
-                response_stream = await client.aio.models.generate_content_stream(
-                    model=params["model"],
-                    contents=prompt_text,
-                    config={
-                        "response_mime_type": "application/json",
-                        "response_schema": response_model,
-                    },
-                )
+                _call = client.aio.models.generate_content_stream(
+                    model=params["model"],
+                    contents=prompt_text,
+                    config={
+                        "response_mime_type": "application/json",
+                        "response_schema": response_model,
+                    },
+                )
+                response_stream = await _call if inspect.isawaitable(_call) else _call
             else:
-                response_stream = await client.aio.models.generate_content_stream(
-                    model=params["model"],
-                    contents=prompt_text,
-                    config={
-                        "response_mime_type": "application/json" if json_mode else None,
-                    },
-                )
+                _call = client.aio.models.generate_content_stream(
+                    model=params["model"],
+                    contents=prompt_text,
+                    config={
+                        "response_mime_type": "application/json" if json_mode else None,
+                    },
+                )
+                response_stream = await _call if inspect.isawaitable(_call) else _call
 
             final_chunk = None
             async for chunk in response_stream:
                 if chunk.text:
                     yield HonchoLLMCallStreamChunk(content=chunk.text)
                 final_chunk = chunk

Comment thread src/utils/clients.py
Comment thread src/utils/clients.py
Comment thread src/utils/clients.py
@VVoruganti VVoruganti merged commit 5d88c45 into main Sep 24, 2025
11 checks passed
@dr-frmr dr-frmr deleted the ben/replace-mirascope branch September 24, 2025 19:51
This was referenced Oct 1, 2025
ranc1 pushed a commit to ranc1/honcho that referenced this pull request Apr 16, 2026
* feat: add optional JWT and webhook secrets to honcho instance creation

* chore: ignore spurious warnings

* feat: add response format if using gpt-5 model family

* feat: add response models to all apis except anthropic

* fix: raise NotImplementedError for response models in AsyncAnthropic client

* chore: address review

* chore: add tests, cleanup

* fix: use tenacity instead of custom fns

* fix: pass all params to streaming, nonblocking streaming

* chore: fix test mock
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants