Skip to content

Conversation

@nina-kollman
Copy link
Contributor

@nina-kollman nina-kollman commented Dec 15, 2025

Fixes TLP-1413

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Important

Introduces guardrails framework for AI evaluation, adds medical and travel examples, and improves response handling and evaluator execution.

  • New Features:
    • Introduced guardrails framework for real-time AI evaluation and safety checks.
    • Added guardrail_medical_chat_example.py for interactive medical chat with guardrails and evaluation callbacks.
    • Added guardrail_travel_agent_example.py for travel agent with PII-detection guardrail and privacy alerts.
    • Exposed Guardrails in client.py for easier access.
  • Improvements:
    • Accumulate and return streamed response text in handle_runner_stream() in travel_agent_example.py.
    • run_experiment_evaluator() in evaluator.py now waits for and returns streaming evaluation results.

This description was created by Ellipsis for 7799fd1. You can customize this summary. It will automatically update as commits are pushed.


Summary by CodeRabbit

  • New Features

    • Introduced guardrails framework for real-time AI evaluation and safety checks.
    • Added interactive medical chat example with guardrails and evaluation callbacks.
    • Added travel agent example with PII-detection guardrail, privacy alerts, and an interactive guarded agent.
  • Improvements

    • Accumulate and return streamed agent response text for richer outputs.
    • Evaluator execution now waits for and returns streaming evaluation results.
    • Exposed guardrails on the client surface for easier access.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 15, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a Guardrails SDK with Pydantic types and a decorator, integrates it into the Traceloop client, makes evaluator calls wait for SSE results, and provides sample guardrail examples (medical chat and PII-aware travel agent) plus travel-agent streaming response aggregation.

Changes

Cohort / File(s) Summary
Guardrails core
packages/traceloop-sdk/traceloop/sdk/guardrails/__init__.py, packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py, packages/traceloop-sdk/traceloop/sdk/guardrails/types.py
New guardrails package: exports, guardrail decorator, Guardrails class with execute_evaluator, and Pydantic models (InputExtractor, ExecuteEvaluatorRequest, OutputSchema).
SDK client integration
packages/traceloop-sdk/traceloop/sdk/client/client.py
Adds a new public guardrails attribute to Client, initialized with the SDK HTTP client.
Evaluator SSE behavior
packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py
run_experiment_evaluator now waits for and returns SSE streaming results (uses execution_id and stream_url, with timeout handling).
Travel agent streaming
packages/sample-app/sample_app/agents/travel_agent_example.py
Aggregates streamed response parts into response_text_parts; handle_runner_stream now returns (tool_calls_made, combined_response_text); run_travel_query gains return_response_text: bool = False and conditionally returns response text or tool calls.
Sample examples (medical & travel)
packages/sample-app/sample_app/guardrail_medical_chat_example.py, packages/sample-app/sample_app/guardrail_travel_agent_example.py
New interactive examples: medical chat with evaluation callback and PII-aware travel agent wrapper that captures stdout, applies guardrail evaluation, and emits privacy warnings or sanitized outputs.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant GuardedFn as Decorated Function
    participant GuardrailWrapper as Guardrail Decorator
    participant Client as Traceloop Client
    participant GuardrailsSvc as Guardrails.execute_evaluator
    participant EvaluatorSvc as Evaluator (SSE)
    participant Callback as on_evaluation_complete

    User->>GuardedFn: call(input)
    GuardedFn->>GuardrailWrapper: wrapper runs original (sync/async)
    GuardrailWrapper->>GuardedFn: execute function -> result dict
    GuardrailWrapper->>GuardrailsSvc: execute_evaluator(slug, mapped_input)
    GuardrailsSvc->>Client: send evaluator request (HTTP) -> execution_id, stream_url
    GuardrailsSvc->>EvaluatorSvc: wait for SSE stream using execution_id/stream_url
    EvaluatorSvc-->>GuardrailsSvc: SSE result (OutputSchema)
    alt callback provided
        GuardrailsSvc->>Callback: on_evaluation_complete(output, original)
        Callback-->>GuardrailWrapper: modified or original result
    end
    GuardrailWrapper-->>User: final result (possibly privacy warning)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Inspect async vs sync wrapper handling and event-loop management in guardrails/guardrails.py.
  • Verify SSE wait/timeout/error handling and returned payload shapes in evaluator/evaluator.py.
  • Validate Pydantic model fields and data mapping in guardrails/types.py.
  • Confirm backward compatibility of run_travel_query and its callers in agents/travel_agent_example.py.
  • Ensure Client.guardrails is initialized with the correct HTTP client in client/client.py.

Poem

🐰 I hopped through code with careful paws,

Built a fence of rules and thoughtful laws.
Evaluators listen, streams unwind,
Privacy first — for every mind.
The rabbit smiles: safe outputs, signed.

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(guardrail): Add guardrail decorator' directly and clearly describes the main change - introduction of a guardrail decorator for real-time evaluation.
Linked Issues check ✅ Passed All core requirements from TLP-1413 are implemented: guardrail decorator, Guardrails class, type models, client integration, evaluator updates, and example demos.
Out of Scope Changes check ✅ Passed All changes align with TLP-1413 guardrail feature scope; enhancements to travel_agent_example.py support the guardrail demo and are on-scope.
Docstring Coverage ✅ Passed Docstring coverage is 82.61% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch nk/new_guardrail

Comment @coderabbitai help to get the list of available commands and usage tips.

@nina-kollman nina-kollman marked this pull request as ready for review December 15, 2025 18:20
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Changes requested ❌

Reviewed everything up to 9afe4b4 in 2 minutes and 17 seconds. Click for details.
  • Reviewed 709 lines of code in 8 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/sample_app/agents/travel_agent_example.py:908
  • Draft comment:
    Using time.sleep in an async function can block the event loop. Consider using 'await asyncio.sleep(args.delay)' instead.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py:119
  • Draft comment:
    The synchronous wrapper uses asyncio.get_event_loop() and run_until_complete, which can raise an error if called within an already running event loop. Consider documenting this limitation or refactoring the sync wrapper.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment is technically correct - using get_event_loop() and run_until_complete() can fail if called within an already running event loop. However, the comment suggests two options: "documenting this limitation" or "refactoring". Suggesting documentation is not actionable as a code change. The "refactoring" part is vague and doesn't provide a clear solution. According to the rules, comments should be about clear code changes required, not speculative issues or documentation requests. The comment also says "Consider documenting" which is a soft suggestion, not a clear requirement. This violates the rule about not making comments unless there is clearly a code change required. While the technical issue is real, the comment might be pointing to a legitimate bug that should be fixed. If this code will fail in common scenarios (like being called from an async context), that's a real problem. However, the comment doesn't definitively say this WILL fail, just that it CAN fail, making it somewhat speculative. The comment is speculative ("can raise an error") rather than definitive. It also suggests documentation as an option, which is not an actionable code change. The comment doesn't provide a clear, specific fix. According to the rules, speculative comments like "If X, then Y is an issue" should be removed - this fits that pattern since it says it "can" raise an error, not that it "will" raise an error. This comment should be deleted because it's speculative ("can raise an error" rather than "will raise an error"), suggests documentation rather than a clear code fix, and doesn't provide an actionable solution. It violates the rule against speculative comments and comments that don't clearly require a code change.

Workflow ID: wflow_b2UL01oUBQnJZmxx

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

if not evaluator_result.success:
# Return a modified dict with error message
print(f"handle_medical_evaluation was activated - evaluator_result: {evaluator_result}")
return {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return statement returns a set literal instead of a dict. Likely intended to return {'text': 'There is an issue with the request. Please try again.'}.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (9)
packages/sample-app/sample_app/agents/travel_agent_example.py (1)

704-736: Consider using typing.overload for better type inference.

The conditional return type (either str or list) works but reduces type safety for callers. For a sample app this is acceptable, but for production code, consider using @overload decorators or separate functions.

from typing import overload, Literal

@overload
async def run_travel_query(query: str, return_response_text: Literal[True]) -> str: ...
@overload
async def run_travel_query(query: str, return_response_text: Literal[False] = ...) -> list: ...

async def run_travel_query(query: str, return_response_text: bool = False) -> str | list:
    ...
packages/traceloop-sdk/traceloop/sdk/guardrails/types.py (1)

5-9: Consider using Literal for source and adding validation for regex consistency.

The source field appears to accept only "input" or "output" based on the comment. Using Literal improves type safety. Additionally, when use_regex=True, regex_pattern should be validated as required.

-from typing import Dict, Optional, Any
+from typing import Dict, Optional, Any, Literal
 from pydantic import BaseModel
+from pydantic import model_validator


 class InputExtractor(BaseModel):
-    source: str  # "input" or "output"
+    source: Literal["input", "output"]
     key: Optional[str] = None  # Key to extract from
     use_regex: bool = False  # Whether to use regex pattern
     regex_pattern: Optional[str] = None  # Regex pattern to apply
+
+    @model_validator(mode='after')
+    def validate_regex_pattern(self):
+        if self.use_regex and not self.regex_pattern:
+            raise ValueError("regex_pattern is required when use_regex is True")
+        return self
packages/traceloop-sdk/traceloop/sdk/guardrails/__init__.py (1)

4-11: Consider sorting __all__ alphabetically.

Static analysis (Ruff RUF022) suggests sorting __all__ for consistency. This is a minor style improvement.

 __all__ = [
+    "ExecuteEvaluatorRequest",
+    "EvaluatorSpec",
+    "Guardrails",
+    "InputExtractor",
+    "OutputSchema",
     "guardrail",
-    "EvaluatorSpec",
-    "Guardrails",
-    "InputExtractor",
-    "OutputSchema",
-    "ExecuteEvaluatorRequest",
 ]
packages/sample-app/sample_app/guardrail_travel_agent_wrapper.py (2)

13-14: Remove unused noqa directives.

Ruff indicates the # noqa: E402 comments are not needed (E402 is not enabled). These can be safely removed.

-from traceloop.sdk.guardrails.guardrails import guardrail  # noqa: E402
-from traceloop.sdk.evaluator import EvaluatorMadeByTraceloop  # noqa: E402
+from traceloop.sdk.guardrails.guardrails import guardrail
+from traceloop.sdk.evaluator import EvaluatorMadeByTraceloop

84-100: Consider moving import io to the top of the file.

The io module import inside the function is unconventional. Moving it to the top of the file with other imports improves readability.

+import io
 import asyncio
 import sys
 from pathlib import Path

And remove the inner import:

     import io
-
     # Capture stdout to prevent streaming output before guardrail check
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (3)

1-7: Unused import: httpx is imported but never used directly.

The httpx module is imported at line 5 but the type hint for httpx.AsyncClient in the Guardrails class doesn't require the import at runtime. Consider removing it or adding a type-checking guard.

 from typing import Dict, Any, Optional, Callable, TypeVar, ParamSpec, Union
 from traceloop.sdk.evaluator.config import EvaluatorDetails
 from traceloop.sdk.evaluator.evaluator import Evaluator
 from .types import InputExtractor, OutputSchema
-import httpx
+from typing import TYPE_CHECKING
 import asyncio
 from functools import wraps
+
+if TYPE_CHECKING:
+    import httpx

Then update line 149 to use a string annotation: async_http_client: "httpx.AsyncClient".


45-46: Use TypeError for invalid type checks.

When the evaluator argument is neither a str nor EvaluatorDetails, a TypeError is more semantically appropriate than ValueError.

     else:
-        raise ValueError(f"evaluator must be str or EvaluatorDetails, got {type(evaluator)}")
+        raise TypeError(f"evaluator must be str or EvaluatorDetails, got {type(evaluator).__name__}")

66-68: Use logging module instead of print() for production SDK code.

Using print() for warnings and errors in SDK code is not appropriate. Users cannot control log levels or redirect output. Consider using Python's logging module.

+import logging
+
+logger = logging.getLogger(__name__)

 # In async_wrapper and sync_wrapper:
-            print(f"Warning: Could not get Traceloop client: {e}")
+            logger.warning("Could not get Traceloop client: %s", e)

 # In execute_evaluator:
-            print(f"Error executing evaluator {slug}. Error: {str(e)}")
+            logger.error("Error executing evaluator %s: %s", slug, e)

Also applies to: 115-117, 199-202

packages/sample-app/sample_app/guardrail_medical_chat_example.py (1)

20-44: Consider extracting the common LLM call logic to reduce duplication.

Three functions (get_doctor_response_simple, get_doctor_response_with_pii_check, get_doctor_response) share identical implementation logic—same system prompt, same API call, same return structure. For a sample app this is acceptable, but extracting the core logic could improve maintainability.

async def _call_doctor_llm(patient_message: str) -> dict:
    """Core LLM call logic for doctor responses."""
    system_prompt = """You are a medical AI assistant..."""
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": patient_message}
        ],
        max_tokens=500,
        temperature=0
    )
    return {"text": response.choices[0].message.content}

@guardrail(evaluator="medical-advice-given")
async def get_doctor_response_simple(patient_message: str) -> dict:
    return await _call_doctor_llm(patient_message)

Also applies to: 72-92, 95-118

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9d8cef6 and 9afe4b4.

📒 Files selected for processing (8)
  • packages/sample-app/sample_app/agents/travel_agent_example.py (3 hunks)
  • packages/sample-app/sample_app/guardrail_medical_chat_example.py (1 hunks)
  • packages/sample-app/sample_app/guardrail_travel_agent_wrapper.py (1 hunks)
  • packages/traceloop-sdk/traceloop/sdk/client/client.py (3 hunks)
  • packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py (1 hunks)
  • packages/traceloop-sdk/traceloop/sdk/guardrails/__init__.py (1 hunks)
  • packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1 hunks)
  • packages/traceloop-sdk/traceloop/sdk/guardrails/types.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/sample-app/sample_app/guardrail_travel_agent_wrapper.py
  • packages/traceloop-sdk/traceloop/sdk/guardrails/types.py
  • packages/traceloop-sdk/traceloop/sdk/client/client.py
  • packages/sample-app/sample_app/guardrail_medical_chat_example.py
  • packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py
  • packages/sample-app/sample_app/agents/travel_agent_example.py
  • packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py
  • packages/traceloop-sdk/traceloop/sdk/guardrails/__init__.py
🧬 Code graph analysis (4)
packages/sample-app/sample_app/guardrail_travel_agent_wrapper.py (2)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
  • guardrail (17-140)
packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py (2)
  • EvaluatorMadeByTraceloop (5-729)
  • pii_detector (22-38)
packages/traceloop-sdk/traceloop/sdk/client/client.py (1)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
  • Guardrails (143-202)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (3)
packages/traceloop-sdk/traceloop/sdk/__init__.py (3)
  • Traceloop (36-267)
  • init (48-198)
  • get (251-267)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
  • guardrail (17-140)
packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py (2)
  • EvaluatorMadeByTraceloop (5-729)
  • sexism_detector (340-357)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (4)
packages/traceloop-sdk/traceloop/sdk/evaluator/config.py (1)
  • EvaluatorDetails (5-23)
packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py (2)
  • Evaluator (16-148)
  • run_experiment_evaluator (70-112)
packages/traceloop-sdk/traceloop/sdk/guardrails/types.py (2)
  • InputExtractor (5-9)
  • OutputSchema (18-20)
packages/traceloop-sdk/traceloop/sdk/__init__.py (1)
  • Traceloop (36-267)
🪛 Ruff (0.14.8)
packages/sample-app/sample_app/guardrail_travel_agent_wrapper.py

13-13: Unused noqa directive (non-enabled: E402)

Remove unused noqa directive

(RUF100)


14-14: Unused noqa directive (non-enabled: E402)

Remove unused noqa directive

(RUF100)


158-158: Do not catch blind exception: Exception

(BLE001)

packages/sample-app/sample_app/guardrail_medical_chat_example.py

15-15: Avoid specifying long messages outside the exception class

(TRY003)


155-155: Do not catch blind exception: Exception

(BLE001)

packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py

46-46: Prefer TypeError exception for invalid type

(TRY004)


46-46: Avoid specifying long messages outside the exception class

(TRY003)


58-61: Prefer TypeError exception for invalid type

(TRY004)


58-61: Avoid specifying long messages outside the exception class

(TRY003)


66-66: Do not catch blind exception: Exception

(BLE001)


90-93: Prefer TypeError exception for invalid type

(TRY004)


90-93: Avoid specifying long messages outside the exception class

(TRY003)


101-104: Avoid specifying long messages outside the exception class

(TRY003)


115-115: Do not catch blind exception: Exception

(BLE001)


199-199: Do not catch blind exception: Exception

(BLE001)


200-200: Use explicit conversion flag

Replace with conversion flag

(RUF010)


202-202: Use explicit conversion flag

Replace with conversion flag

(RUF010)

packages/traceloop-sdk/traceloop/sdk/guardrails/__init__.py

4-11: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Lint
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Test Packages (3.10)
🔇 Additional comments (8)
packages/traceloop-sdk/traceloop/sdk/client/client.py (1)

10-10: LGTM!

The Guardrails integration follows the established pattern for other client components (user_feedback, datasets, experiment). The import, type annotation, and initialization are consistent with the existing architecture.

Also applies to: 29-29, 70-70

packages/sample-app/sample_app/agents/travel_agent_example.py (1)

653-701: LGTM!

The streaming handler correctly accumulates response text alongside tool calls. The tuple return enables the caller to access both.

packages/traceloop-sdk/traceloop/sdk/guardrails/types.py (1)

12-20: LGTM!

ExecuteEvaluatorRequest and OutputSchema are well-defined with appropriate field types and defaults.

packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py (1)

104-112: Verify the timeout behavior is intentional.

The timeout_in_sec parameter is applied independently to both the initial POST request and the SSE wait_for_result. In the worst case, the total wait time could be up to 2 * timeout_in_sec. If you intend for timeout_in_sec to be the total deadline, consider tracking elapsed time and passing the remaining time to the SSE client.

packages/sample-app/sample_app/guardrail_travel_agent_wrapper.py (2)

26-61: LGTM!

The PII detection handler correctly differentiates between success/failure cases, avoids mutating the original result by creating a copy, and properly handles the captured stdout.


103-162: LGTM!

The interactive main loop is well-structured with proper error handling. The broad Exception catch (flagged by Ruff BLE001) is appropriate for an interactive CLI demo to ensure graceful error recovery without crashing the session.

packages/sample-app/sample_app/guardrail_medical_chat_example.py (2)

9-17: LGTM on API key handling.

The API key is correctly retrieved from environment variables and validated before use, adhering to the coding guideline of storing API keys only in environment variables.


121-158: LGTM for sample app interactive session.

The interactive chat loop handles user input gracefully with appropriate exit conditions and error recovery. The broad exception catch on line 155 is acceptable for an interactive demo application.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed ab768fb in 1 minute and 36 seconds. Click for details.
  • Reviewed 196 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/sample_app/guardrail_medical_chat_example.py:61
  • Draft comment:
    Return value changed to a dict with a 'text' key. This fixes the previous bug where an invalid dict literal was returned.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, explaining a change that was made. It doesn't provide a suggestion, ask for confirmation, or point out a potential issue. It simply states what was done and why, which violates the rule against purely informative comments.
2. packages/sample-app/sample_app/guardrail_medical_chat_example.py:68
  • Draft comment:
    Evaluator changed from sexism_detector to pii_detector. Confirm that this switch aligns with your evaluation logic and that the callback handles the updated evaluator result appropriately.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
3. packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py:79
  • Draft comment:
    Converting all values from the original result to strings for evaluator_data. Ensure that this conversion (using str()) is appropriate for all expected data types.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the author to ensure that converting values to strings is appropriate, which is a form of asking for confirmation or double-checking. This violates the rule against asking the author to confirm their intention or ensure behavior is intended.
4. packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py:125
  • Draft comment:
    Removed usage of InputExtractor in the sync_wrapper and now directly converting values to strings. Confirm that this change meets the requirements for data conversion in the evaluator.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to confirm that the change meets certain requirements, which violates the rule against asking for confirmation or ensuring behavior is intended. It does not provide a specific suggestion or point out a potential issue with the code itself.
5. packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py:145
  • Draft comment:
    Using loop.run_until_complete in sync_wrapper might cause issues if an event loop is already running. Consider documenting or handling such edge cases.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_Tw4oSAOA7R9gy9zr

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)

145-150: Deprecated event loop pattern may fail in async contexts.

asyncio.get_event_loop() is deprecated in Python 3.10+ when no running loop exists, and loop.run_until_complete() will raise RuntimeError if called from within an already-running event loop (e.g., Jupyter notebooks, async web frameworks).

Consider using a more robust approach:

-            loop = asyncio.get_event_loop()
-            evaluator_result = loop.run_until_complete(
-                client_instance.guardrails.execute_evaluator(
-                    slug, evaluator_data, evaluator_version, evaluator_config
-                )
-            )
+            try:
+                loop = asyncio.get_running_loop()
+                # Already in async context - this is problematic
+                raise RuntimeError(
+                    "Cannot use sync guardrail wrapper within an async context. "
+                    "Use an async function instead."
+                )
+            except RuntimeError:
+                # No running loop, safe to use asyncio.run()
+                evaluator_result = asyncio.run(
+                    client_instance.guardrails.execute_evaluator(
+                        slug, evaluator_data, evaluator_version, evaluator_config
+                    )
+                )

Alternatively, document this limitation clearly in the docstring.

🧹 Nitpick comments (6)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (1)

95-118: Consider extracting shared logic to reduce duplication.

The functions get_doctor_response_simple, get_doctor_response_with_pii_check, and get_doctor_response share identical implementation logic (system prompt, OpenAI call, return structure). Consider extracting the common logic:

async def _call_medical_ai(patient_message: str) -> dict:
    """Internal helper for medical AI calls."""
    system_prompt = """You are a medical AI assistant..."""
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": patient_message}
        ],
        max_tokens=500,
        temperature=0
    )
    return {"text": response.choices[0].message.content}

@guardrail(evaluator="medical-advice-given")
async def get_doctor_response(patient_message: str) -> dict:
    return await _call_medical_ai(patient_message)

For a sample app, some duplication is acceptable if it improves readability for demonstration purposes.

packages/sample-app/sample_app/guardrail_travel_agent_example.py (2)

9-14: Remove unused # noqa directives.

The # noqa: E402 comments on lines 13-14 are unnecessary since these imports don't actually violate E402 (they're after path manipulation which is intentional). Static analysis confirms these are unused.

-from traceloop.sdk.guardrails.guardrails import guardrail  # noqa: E402
-from traceloop.sdk.evaluator import EvaluatorMadeByTraceloop  # noqa: E402
+from traceloop.sdk.guardrails.guardrails import guardrail
+from traceloop.sdk.evaluator import EvaluatorMadeByTraceloop

84-100: Consider moving io import to module level.

The import io on line 84 could be moved to the top of the file with other imports for consistency. However, keeping it local is acceptable if the intent is to minimize imports when the function isn't called.

The stdout capture/restore pattern with try/finally is correctly implemented to ensure cleanup even if exceptions occur.

packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (3)

51-52: Consider using TypeError for type validation.

Per Python conventions, TypeError is preferred over ValueError when the issue is an incorrect type.

     else:
-        raise ValueError(f"evaluator must be str or EvaluatorDetails, got {type(evaluator)}")
+        raise TypeError(f"evaluator must be str or EvaluatorDetails, got {type(evaluator)}")

227-230: Graceful error handling with minor style suggestion.

The broad exception handling with failure fallback is appropriate for guardrails—evaluation failures shouldn't crash the application. Consider using f-string conversion flags for cleaner formatting:

         except Exception as e:
-            print(f"Error executing evaluator {slug}. Error: {str(e)}")
+            print(f"Error executing evaluator {slug}. Error: {e!s}")
             # Return a failure result on error
-            return OutputSchema(reason=f"Evaluation error: {str(e)}", success=False)
+            return OutputSchema(reason=f"Evaluation error: {e!s}", success=False)

17-20: Type annotation doesn't reflect sync function support.

The return type Callable[[Callable[P, Awaitable[Dict[str, Any]]]], Callable[P, Awaitable[Dict[str, Any]]]] indicates only async functions are supported, but the decorator handles both async and sync functions (lines 163-166). This could cause type checker warnings for sync usage.

Consider using @overload or Union to properly type both cases:

from typing import overload

@overload
def guardrail(
    evaluator: EvaluatorSpec,
    on_evaluation_complete: Optional[Callable[[OutputSchema, Any], Any]] = None
) -> Callable[[Callable[P, Awaitable[Dict[str, Any]]]], Callable[P, Awaitable[Dict[str, Any]]]]: ...

@overload
def guardrail(
    evaluator: EvaluatorSpec,
    on_evaluation_complete: Optional[Callable[[OutputSchema, Any], Any]] = None
) -> Callable[[Callable[P, Dict[str, Any]]], Callable[P, Dict[str, Any]]]: ...

For now, the # type: ignore comments handle this, but proper typing would improve developer experience.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9afe4b4 and ab768fb.

📒 Files selected for processing (3)
  • packages/sample-app/sample_app/guardrail_medical_chat_example.py (1 hunks)
  • packages/sample-app/sample_app/guardrail_travel_agent_example.py (1 hunks)
  • packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/sample-app/sample_app/guardrail_medical_chat_example.py
  • packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py
  • packages/sample-app/sample_app/guardrail_travel_agent_example.py
🧬 Code graph analysis (2)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (3)
packages/traceloop-sdk/traceloop/sdk/__init__.py (3)
  • Traceloop (36-267)
  • init (48-198)
  • get (251-267)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
  • guardrail (17-168)
packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py (1)
  • pii_detector (22-38)
packages/sample-app/sample_app/guardrail_travel_agent_example.py (2)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
  • guardrail (17-168)
packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py (1)
  • pii_detector (22-38)
🪛 Ruff (0.14.8)
packages/sample-app/sample_app/guardrail_medical_chat_example.py

15-15: Avoid specifying long messages outside the exception class

(TRY003)


155-155: Do not catch blind exception: Exception

(BLE001)

packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py

52-52: Prefer TypeError exception for invalid type

(TRY004)


52-52: Avoid specifying long messages outside the exception class

(TRY003)


62-65: Prefer TypeError exception for invalid type

(TRY004)


62-65: Avoid specifying long messages outside the exception class

(TRY003)


75-78: Avoid specifying long messages outside the exception class

(TRY003)


88-88: Do not catch blind exception: Exception

(BLE001)


114-117: Prefer TypeError exception for invalid type

(TRY004)


114-117: Avoid specifying long messages outside the exception class

(TRY003)


127-130: Avoid specifying long messages outside the exception class

(TRY003)


141-141: Do not catch blind exception: Exception

(BLE001)


227-227: Do not catch blind exception: Exception

(BLE001)


228-228: Use explicit conversion flag

Replace with conversion flag

(RUF010)


230-230: Use explicit conversion flag

Replace with conversion flag

(RUF010)

packages/sample-app/sample_app/guardrail_travel_agent_example.py

13-13: Unused noqa directive (non-enabled: E402)

Remove unused noqa directive

(RUF100)


14-14: Unused noqa directive (non-enabled: E402)

Remove unused noqa directive

(RUF100)


158-158: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Lint
🔇 Additional comments (10)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (5)

13-17: LGTM! Secure API key handling.

The API key is correctly retrieved from an environment variable and validated before use, following the coding guidelines for secure secret handling.


20-43: LGTM!

The function correctly demonstrates slug-based guardrail usage and returns a properly structured dict with the text field.


46-64: LGTM! Previous issue resolved.

The callback now correctly returns a dict with the text key on evaluation failure, addressing the previous review concern.


67-92: LGTM! Previous issue resolved.

The function name and docstring now correctly match the pii_detector evaluator being used.


155-157: Broad exception handling acceptable for sample CLI app.

The broad except Exception is appropriate here for user-facing error handling in an interactive CLI context. For production code, consider catching more specific exceptions.

packages/sample-app/sample_app/guardrail_travel_agent_example.py (2)

26-61: LGTM!

The callback properly handles both PII detection (returning sanitized warning) and safe pass-through cases. Good use of .copy() to avoid mutating the original result, and proper cleanup of the internal _captured_stdout field.


155-162: Broad exception handling acceptable for sample CLI app.

The broad except Exception with traceback printing is reasonable for a sample/demo application to aid debugging. The import of traceback inside the except block is fine for this rarely-executed error path.

packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (3)

55-104: LGTM! Async wrapper implementation is now consistent with sync wrapper.

Both wrappers correctly build evaluator_data as Dict[str, str] by converting field values to strings. The previous review concern about inconsistency between async and sync wrappers has been addressed.


180-186: LGTM! Type annotation is consistent with callers.

The data: Dict[str, str] parameter type correctly matches what the decorator wrappers pass. The previous review concern about InputExtractor mismatch has been addressed.


213-225: Good dual-format result parsing.

The result parsing correctly handles both "Made by Traceloop" evaluator format (evaluator_result: pass/fail) and custom evaluator format. Using model_validate for the fallback case provides proper validation.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Changes requested ❌

Reviewed 8e27a1a in 2 minutes and 3 seconds. Click for details.
  • Reviewed 81 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/sample_app/guardrail_medical_chat_example.py:89
  • Draft comment:
    Evaluator slug changed from 'medical-advice-given' to 'medicaladvicegiven'. Confirm if the new slug is intentional and consistent.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. packages/sample-app/sample_app/guardrail_medical_chat_example.py:89
  • Draft comment:
    The evaluator argument value "medicaladvicegiven" has been changed from "medical-advice-given". Please confirm if this change is intentional or a typo.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment is asking the author to "confirm if this change is intentional or a typo." This directly violates the rule: "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended." The comment doesn't identify a definite bug - it's just questioning whether the change was intentional. The evaluator string could have been changed intentionally (perhaps the backend system changed its naming convention from kebab-case to camelCase). Without strong evidence that this is wrong, I should delete this comment. Could this actually be a typo that would break functionality? If the evaluator name is incorrect, it might fail at runtime. However, I don't have evidence that "medicaladvicegiven" is wrong - it could be the correct new name for the evaluator. While it's possible this could break functionality, the comment is explicitly asking for confirmation rather than stating there's a problem. If this were truly a bug, the comment should state definitively that the evaluator name is incorrect. Without that evidence, this is just asking the author to double-check, which violates the rules. This comment should be deleted because it asks the PR author to confirm their intention rather than identifying a definite issue. The change could be intentional (e.g., a naming convention update), and without strong evidence that it's wrong, the comment is not actionable.
3. packages/sample-app/sample_app/guardrail_medical_chat_example.py:91
  • Draft comment:
    Typo detected: "GPT-4o" in the docstring may be a mistake. Should it be "GPT-4"?
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_2OzR14oHf6FBpBPU

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (3)

13-17: Defer API key validation so importing the module doesn’t raise unexpectedly

Lines 13–17 read OPENAI_API_KEY and raise ValueError at import time, which will blow up simple imports of this example (e.g. from tests or docs) when the env var isn’t set. Consider moving the env-var check and AsyncOpenAI client initialization into main() or a small helper used by the async functions so module import stays side‑effect‑light.


88-106: DRY up the duplicated medical system prompt

Both get_doctor_response_with_pii_check and get_doctor_response embed the same system_prompt string. Consider extracting this to a module‑level constant (e.g., MEDICAL_SYSTEM_PROMPT) and reusing it in both functions so prompt changes only need to be made in one place.


122-150: Consider narrowing the broad except Exception in the chat loop

Catching a blanket Exception on lines 148–150 trips Ruff’s BLE001 and can hide real programming errors (e.g., bugs in prompt construction or unexpected attribute errors). For a sample it’s workable, but you may want to:

  • Catch only expected runtime issues (e.g., OpenAI/Traceloop/network‑related errors, ValueError from input handling), and
  • Let truly unexpected exceptions surface, or log them and re‑raise, or explicitly silence the Ruff rule with a comment if you intend to keep it broad.

This will make failures during development easier to notice and keep the linter happy if BLE001 is enforced.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ab768fb and 8e27a1a.

📒 Files selected for processing (1)
  • packages/sample-app/sample_app/guardrail_medical_chat_example.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/sample-app/sample_app/guardrail_medical_chat_example.py
🧬 Code graph analysis (1)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (2)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
  • guardrail (17-168)
packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py (1)
  • pii_detector (22-38)
🪛 Flake8 (7.3.0)
packages/sample-app/sample_app/guardrail_medical_chat_example.py

[error] 54-54: local variable 'personal_info_system_prompt' is assigned to but never used

(F841)

🪛 Ruff (0.14.8)
packages/sample-app/sample_app/guardrail_medical_chat_example.py

15-15: Avoid specifying long messages outside the exception class

(TRY003)


54-54: Local variable personal_info_system_prompt is assigned to but never used

Remove assignment to unused variable personal_info_system_prompt

(F841)


61-61: String contains ambiguous (RIGHT SINGLE QUOTATION MARK). Did you mean ``` (GRAVE ACCENT)?

(RUF001)


148-148: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Lint
  • GitHub Check: Build Packages (3.11)
🔇 Additional comments (1)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (1)

21-38: Callback return shape is consistent with guardrail usage and callers

handle_medical_evaluation now always returns a dict and provides a "text" key on the failure path, which matches the guardrail decorator’s expectation (dict output) and the later doctor_response.get("text", ...) usage in medical_chat_session(). This resolves the earlier set/dict mismatch and looks good.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed 7799fd1 in 1 minute and 51 seconds. Click for details.
  • Reviewed 15 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/sample_app/guardrail_medical_chat_example.py:53
  • Draft comment:
    The variable 'personal_info_system_prompt' is annotated with '# noqa: F841' to suppress unused warnings. Consider adding a comment on its intended future use or removing it if not needed.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 35% vs. threshold = 50% The comment is asking the author to either add a comment explaining future use or remove the variable. This is a reasonable code quality suggestion since having unused variables with noqa suppressions is generally a code smell. However, I need to check if this violates any rules. The comment is about a change (the noqa was added in the diff), it's actionable (remove the variable or add a comment), and it's not speculative. But is it obvious? The fact that a noqa was added suggests the author is aware the variable is unused and chose to keep it. The comment is essentially asking "why keep this unused variable?" which could be seen as asking the author to explain their intention, which violates the rules. The comment could be seen as asking the author to explain their intention ("Consider adding a comment on its intended future use"), which violates the rule about not asking authors to confirm intentions or explain things. Additionally, the author explicitly added the noqa, suggesting they intentionally want to keep the unused variable, so this comment might be questioning an intentional decision. While the comment does suggest explaining future use, it also provides a clear alternative action: "removing it if not needed." This makes it more actionable than just asking for explanation. However, the phrasing "Consider adding a comment" is still somewhat soft and not a clear code change requirement. The strongest interpretation is that this is a code quality issue where an unused variable should be removed. The comment is borderline. It's about a real code quality issue (unused variable with suppression), but the phrasing asks the author to either explain or remove it, which could be seen as asking for confirmation of intent. Given the rule to err on the side of deleting comments when unsure, and the fact that the author explicitly added the noqa (suggesting intentional behavior), this comment should be deleted.

Workflow ID: wflow_L9VFQJJwSCiyPXFo

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (2)

53-70: Remove unused variable instead of suppressing the warning.

The personal_info_system_prompt variable is assigned but never used in the function. While the # noqa: F841 comment suppresses the Flake8 warning, the coding guidelines require adhering to Flake8 rules rather than suppressing them. Since only system_prompt is passed to the API call (line 75), this entire block should be removed to maintain code cleanliness.

-    # This is the system prompt for the personal information case
-    personal_info_system_prompt = """You are a medical AI assistant that provides helpful, general medical information  # noqa: F841
-        tailored to the individual user.
-
-        When personal information is available (such as age, sex, symptoms, medical history,
-        lifestyle, medications, or concerns), actively incorporate it into your responses
-        to make guidance more relevant and personalized.
-
-        Adapt explanations, examples, and recommendations to the user's context whenever possible.
-        If key personal details are missing, ask concise and relevant follow-up questions
-        before giving advice.
-
-        Be clear about your limitations as an AI and do not provide diagnoses or definitive
-        treatment plans. Always encourage consultation with qualified healthcare professionals
-        for diagnosis, treatment, or urgent concerns.
-
-        Maintain a professional, empathetic, and supportive tone.
-        Avoid assumptions, respect privacy, and clearly distinguish general information
-        from personalized considerations."""
-

As per coding guidelines: "Use Flake8 for code linting and adhere to its rules."


60-60: Replace ambiguous quote character with standard apostrophe.

Line 60 contains a RIGHT SINGLE QUOTATION MARK (') instead of a standard apostrophe ('). This can cause encoding issues and may confuse code editors or linters.

-        If key personal details are missing, ask concise and relevant follow-up questions
+        If key personal details are missing, ask concise and relevant follow-up questions
🧹 Nitpick comments (1)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (1)

144-149: Consider catching more specific exceptions.

The broad except Exception can mask unexpected errors. For better error handling and debugging, consider catching specific exceptions that are expected in this context.

-        except Exception as e:
-            print(f"\n❌ An error occurred: {e}")
-            print("Please try again or type 'quit' to exit.")
+        except (openai.OpenAIError, ValueError, KeyError) as e:
+            print(f"\n❌ An error occurred: {e}")
+            print("Please try again or type 'quit' to exit.")
+        except Exception as e:
+            print(f"\n❌ Unexpected error: {e}")
+            print("Please try again or type 'quit' to exit.")

This allows you to handle expected API errors gracefully while still catching unexpected issues for debugging.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8e27a1a and 7799fd1.

📒 Files selected for processing (1)
  • packages/sample-app/sample_app/guardrail_medical_chat_example.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/sample-app/sample_app/guardrail_medical_chat_example.py
🧬 Code graph analysis (1)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (3)
packages/traceloop-sdk/traceloop/sdk/__init__.py (3)
  • Traceloop (36-267)
  • init (48-198)
  • get (251-267)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
  • guardrail (17-168)
packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py (2)
  • EvaluatorMadeByTraceloop (5-729)
  • pii_detector (22-38)
🪛 Ruff (0.14.8)
packages/sample-app/sample_app/guardrail_medical_chat_example.py

15-15: Avoid specifying long messages outside the exception class

(TRY003)


53-53: Local variable personal_info_system_prompt is assigned to but never used

Remove assignment to unused variable personal_info_system_prompt

(F841)


60-60: String contains ambiguous (RIGHT SINGLE QUOTATION MARK). Did you mean ``` (GRAVE ACCENT)?

(RUF001)


147-147: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Lint
🔇 Additional comments (2)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (2)

21-38: LGTM! Callback properly returns dict structure.

The callback correctly returns a dict with the expected structure on both success and failure paths. The previous issue with returning a set has been properly addressed.


88-110: LGTM! Function correctly implements guardrail pattern.

The function properly uses the guardrail decorator with a string evaluator slug and returns the expected dict structure with a "text" field.

@nina-kollman nina-kollman merged commit b7b4dc7 into main Dec 15, 2025
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants