-
Notifications
You must be signed in to change notification settings - Fork 868
feat(guardrail): Add guardrail decorator #3521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds a Guardrails SDK with Pydantic types and a decorator, integrates it into the Traceloop client, makes evaluator calls wait for SSE results, and provides sample guardrail examples (medical chat and PII-aware travel agent) plus travel-agent streaming response aggregation. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant GuardedFn as Decorated Function
participant GuardrailWrapper as Guardrail Decorator
participant Client as Traceloop Client
participant GuardrailsSvc as Guardrails.execute_evaluator
participant EvaluatorSvc as Evaluator (SSE)
participant Callback as on_evaluation_complete
User->>GuardedFn: call(input)
GuardedFn->>GuardrailWrapper: wrapper runs original (sync/async)
GuardrailWrapper->>GuardedFn: execute function -> result dict
GuardrailWrapper->>GuardrailsSvc: execute_evaluator(slug, mapped_input)
GuardrailsSvc->>Client: send evaluator request (HTTP) -> execution_id, stream_url
GuardrailsSvc->>EvaluatorSvc: wait for SSE stream using execution_id/stream_url
EvaluatorSvc-->>GuardrailsSvc: SSE result (OutputSchema)
alt callback provided
GuardrailsSvc->>Callback: on_evaluation_complete(output, original)
Callback-->>GuardrailWrapper: modified or original result
end
GuardrailWrapper-->>User: final result (possibly privacy warning)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Changes requested ❌
Reviewed everything up to 9afe4b4 in 2 minutes and 17 seconds. Click for details.
- Reviewed
709lines of code in8files - Skipped
0files when reviewing. - Skipped posting
2draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/sample_app/agents/travel_agent_example.py:908
- Draft comment:
Using time.sleep in an async function can block the event loop. Consider using 'await asyncio.sleep(args.delay)' instead. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py:119
- Draft comment:
The synchronous wrapper uses asyncio.get_event_loop() and run_until_complete, which can raise an error if called within an already running event loop. Consider documenting this limitation or refactoring the sync wrapper. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment is technically correct - usingget_event_loop()andrun_until_complete()can fail if called within an already running event loop. However, the comment suggests two options: "documenting this limitation" or "refactoring". Suggesting documentation is not actionable as a code change. The "refactoring" part is vague and doesn't provide a clear solution. According to the rules, comments should be about clear code changes required, not speculative issues or documentation requests. The comment also says "Consider documenting" which is a soft suggestion, not a clear requirement. This violates the rule about not making comments unless there is clearly a code change required. While the technical issue is real, the comment might be pointing to a legitimate bug that should be fixed. If this code will fail in common scenarios (like being called from an async context), that's a real problem. However, the comment doesn't definitively say this WILL fail, just that it CAN fail, making it somewhat speculative. The comment is speculative ("can raise an error") rather than definitive. It also suggests documentation as an option, which is not an actionable code change. The comment doesn't provide a clear, specific fix. According to the rules, speculative comments like "If X, then Y is an issue" should be removed - this fits that pattern since it says it "can" raise an error, not that it "will" raise an error. This comment should be deleted because it's speculative ("can raise an error" rather than "will raise an error"), suggests documentation rather than a clear code fix, and doesn't provide an actionable solution. It violates the rule against speculative comments and comments that don't clearly require a code change.
Workflow ID: wflow_b2UL01oUBQnJZmxx
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
| if not evaluator_result.success: | ||
| # Return a modified dict with error message | ||
| print(f"handle_medical_evaluation was activated - evaluator_result: {evaluator_result}") | ||
| return { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return statement returns a set literal instead of a dict. Likely intended to return {'text': 'There is an issue with the request. Please try again.'}.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (9)
packages/sample-app/sample_app/agents/travel_agent_example.py (1)
704-736: Consider usingtyping.overloadfor better type inference.The conditional return type (either
strorlist) works but reduces type safety for callers. For a sample app this is acceptable, but for production code, consider using@overloaddecorators or separate functions.from typing import overload, Literal @overload async def run_travel_query(query: str, return_response_text: Literal[True]) -> str: ... @overload async def run_travel_query(query: str, return_response_text: Literal[False] = ...) -> list: ... async def run_travel_query(query: str, return_response_text: bool = False) -> str | list: ...packages/traceloop-sdk/traceloop/sdk/guardrails/types.py (1)
5-9: Consider usingLiteralforsourceand adding validation for regex consistency.The
sourcefield appears to accept only"input"or"output"based on the comment. UsingLiteralimproves type safety. Additionally, whenuse_regex=True,regex_patternshould be validated as required.-from typing import Dict, Optional, Any +from typing import Dict, Optional, Any, Literal from pydantic import BaseModel +from pydantic import model_validator class InputExtractor(BaseModel): - source: str # "input" or "output" + source: Literal["input", "output"] key: Optional[str] = None # Key to extract from use_regex: bool = False # Whether to use regex pattern regex_pattern: Optional[str] = None # Regex pattern to apply + + @model_validator(mode='after') + def validate_regex_pattern(self): + if self.use_regex and not self.regex_pattern: + raise ValueError("regex_pattern is required when use_regex is True") + return selfpackages/traceloop-sdk/traceloop/sdk/guardrails/__init__.py (1)
4-11: Consider sorting__all__alphabetically.Static analysis (Ruff RUF022) suggests sorting
__all__for consistency. This is a minor style improvement.__all__ = [ + "ExecuteEvaluatorRequest", + "EvaluatorSpec", + "Guardrails", + "InputExtractor", + "OutputSchema", "guardrail", - "EvaluatorSpec", - "Guardrails", - "InputExtractor", - "OutputSchema", - "ExecuteEvaluatorRequest", ]packages/sample-app/sample_app/guardrail_travel_agent_wrapper.py (2)
13-14: Remove unusednoqadirectives.Ruff indicates the
# noqa: E402comments are not needed (E402 is not enabled). These can be safely removed.-from traceloop.sdk.guardrails.guardrails import guardrail # noqa: E402 -from traceloop.sdk.evaluator import EvaluatorMadeByTraceloop # noqa: E402 +from traceloop.sdk.guardrails.guardrails import guardrail +from traceloop.sdk.evaluator import EvaluatorMadeByTraceloop
84-100: Consider movingimport ioto the top of the file.The
iomodule import inside the function is unconventional. Moving it to the top of the file with other imports improves readability.+import io import asyncio import sys from pathlib import PathAnd remove the inner import:
import io - # Capture stdout to prevent streaming output before guardrail checkpackages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (3)
1-7: Unused import:httpxis imported but never used directly.The
httpxmodule is imported at line 5 but the type hint forhttpx.AsyncClientin theGuardrailsclass doesn't require the import at runtime. Consider removing it or adding a type-checking guard.from typing import Dict, Any, Optional, Callable, TypeVar, ParamSpec, Union from traceloop.sdk.evaluator.config import EvaluatorDetails from traceloop.sdk.evaluator.evaluator import Evaluator from .types import InputExtractor, OutputSchema -import httpx +from typing import TYPE_CHECKING import asyncio from functools import wraps + +if TYPE_CHECKING: + import httpxThen update line 149 to use a string annotation:
async_http_client: "httpx.AsyncClient".
45-46: UseTypeErrorfor invalid type checks.When the
evaluatorargument is neither astrnorEvaluatorDetails, aTypeErroris more semantically appropriate thanValueError.else: - raise ValueError(f"evaluator must be str or EvaluatorDetails, got {type(evaluator)}") + raise TypeError(f"evaluator must be str or EvaluatorDetails, got {type(evaluator).__name__}")
66-68: Useloggingmodule instead ofprint()for production SDK code.Using
print()for warnings and errors in SDK code is not appropriate. Users cannot control log levels or redirect output. Consider using Python'sloggingmodule.+import logging + +logger = logging.getLogger(__name__) # In async_wrapper and sync_wrapper: - print(f"Warning: Could not get Traceloop client: {e}") + logger.warning("Could not get Traceloop client: %s", e) # In execute_evaluator: - print(f"Error executing evaluator {slug}. Error: {str(e)}") + logger.error("Error executing evaluator %s: %s", slug, e)Also applies to: 115-117, 199-202
packages/sample-app/sample_app/guardrail_medical_chat_example.py (1)
20-44: Consider extracting the common LLM call logic to reduce duplication.Three functions (
get_doctor_response_simple,get_doctor_response_with_pii_check,get_doctor_response) share identical implementation logic—same system prompt, same API call, same return structure. For a sample app this is acceptable, but extracting the core logic could improve maintainability.async def _call_doctor_llm(patient_message: str) -> dict: """Core LLM call logic for doctor responses.""" system_prompt = """You are a medical AI assistant...""" response = await client.chat.completions.create( model="gpt-4o", messages=[ {"role": "system", "content": system_prompt}, {"role": "user", "content": patient_message} ], max_tokens=500, temperature=0 ) return {"text": response.choices[0].message.content} @guardrail(evaluator="medical-advice-given") async def get_doctor_response_simple(patient_message: str) -> dict: return await _call_doctor_llm(patient_message)Also applies to: 72-92, 95-118
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
packages/sample-app/sample_app/agents/travel_agent_example.py(3 hunks)packages/sample-app/sample_app/guardrail_medical_chat_example.py(1 hunks)packages/sample-app/sample_app/guardrail_travel_agent_wrapper.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/client/client.py(3 hunks)packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/guardrails/__init__.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/guardrails/types.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/sample-app/sample_app/guardrail_travel_agent_wrapper.pypackages/traceloop-sdk/traceloop/sdk/guardrails/types.pypackages/traceloop-sdk/traceloop/sdk/client/client.pypackages/sample-app/sample_app/guardrail_medical_chat_example.pypackages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.pypackages/sample-app/sample_app/agents/travel_agent_example.pypackages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.pypackages/traceloop-sdk/traceloop/sdk/guardrails/__init__.py
🧬 Code graph analysis (4)
packages/sample-app/sample_app/guardrail_travel_agent_wrapper.py (2)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
guardrail(17-140)packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py (2)
EvaluatorMadeByTraceloop(5-729)pii_detector(22-38)
packages/traceloop-sdk/traceloop/sdk/client/client.py (1)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
Guardrails(143-202)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (3)
packages/traceloop-sdk/traceloop/sdk/__init__.py (3)
Traceloop(36-267)init(48-198)get(251-267)packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
guardrail(17-140)packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py (2)
EvaluatorMadeByTraceloop(5-729)sexism_detector(340-357)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (4)
packages/traceloop-sdk/traceloop/sdk/evaluator/config.py (1)
EvaluatorDetails(5-23)packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py (2)
Evaluator(16-148)run_experiment_evaluator(70-112)packages/traceloop-sdk/traceloop/sdk/guardrails/types.py (2)
InputExtractor(5-9)OutputSchema(18-20)packages/traceloop-sdk/traceloop/sdk/__init__.py (1)
Traceloop(36-267)
🪛 Ruff (0.14.8)
packages/sample-app/sample_app/guardrail_travel_agent_wrapper.py
13-13: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
14-14: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
158-158: Do not catch blind exception: Exception
(BLE001)
packages/sample-app/sample_app/guardrail_medical_chat_example.py
15-15: Avoid specifying long messages outside the exception class
(TRY003)
155-155: Do not catch blind exception: Exception
(BLE001)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py
46-46: Prefer TypeError exception for invalid type
(TRY004)
46-46: Avoid specifying long messages outside the exception class
(TRY003)
58-61: Prefer TypeError exception for invalid type
(TRY004)
58-61: Avoid specifying long messages outside the exception class
(TRY003)
66-66: Do not catch blind exception: Exception
(BLE001)
90-93: Prefer TypeError exception for invalid type
(TRY004)
90-93: Avoid specifying long messages outside the exception class
(TRY003)
101-104: Avoid specifying long messages outside the exception class
(TRY003)
115-115: Do not catch blind exception: Exception
(BLE001)
199-199: Do not catch blind exception: Exception
(BLE001)
200-200: Use explicit conversion flag
Replace with conversion flag
(RUF010)
202-202: Use explicit conversion flag
Replace with conversion flag
(RUF010)
packages/traceloop-sdk/traceloop/sdk/guardrails/__init__.py
4-11: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
🔇 Additional comments (8)
packages/traceloop-sdk/traceloop/sdk/client/client.py (1)
10-10: LGTM!The
Guardrailsintegration follows the established pattern for other client components (user_feedback,datasets,experiment). The import, type annotation, and initialization are consistent with the existing architecture.Also applies to: 29-29, 70-70
packages/sample-app/sample_app/agents/travel_agent_example.py (1)
653-701: LGTM!The streaming handler correctly accumulates response text alongside tool calls. The tuple return enables the caller to access both.
packages/traceloop-sdk/traceloop/sdk/guardrails/types.py (1)
12-20: LGTM!
ExecuteEvaluatorRequestandOutputSchemaare well-defined with appropriate field types and defaults.packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py (1)
104-112: Verify the timeout behavior is intentional.The
timeout_in_secparameter is applied independently to both the initial POST request and the SSEwait_for_result. In the worst case, the total wait time could be up to2 * timeout_in_sec. If you intend fortimeout_in_secto be the total deadline, consider tracking elapsed time and passing the remaining time to the SSE client.packages/sample-app/sample_app/guardrail_travel_agent_wrapper.py (2)
26-61: LGTM!The PII detection handler correctly differentiates between success/failure cases, avoids mutating the original result by creating a copy, and properly handles the captured stdout.
103-162: LGTM!The interactive main loop is well-structured with proper error handling. The broad
Exceptioncatch (flagged by Ruff BLE001) is appropriate for an interactive CLI demo to ensure graceful error recovery without crashing the session.packages/sample-app/sample_app/guardrail_medical_chat_example.py (2)
9-17: LGTM on API key handling.The API key is correctly retrieved from environment variables and validated before use, adhering to the coding guideline of storing API keys only in environment variables.
121-158: LGTM for sample app interactive session.The interactive chat loop handles user input gracefully with appropriate exit conditions and error recovery. The broad exception catch on line 155 is acceptable for an interactive demo application.
packages/sample-app/sample_app/guardrail_medical_chat_example.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed ab768fb in 1 minute and 36 seconds. Click for details.
- Reviewed
196lines of code in2files - Skipped
0files when reviewing. - Skipped posting
5draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/sample_app/guardrail_medical_chat_example.py:61
- Draft comment:
Return value changed to a dict with a 'text' key. This fixes the previous bug where an invalid dict literal was returned. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative, explaining a change that was made. It doesn't provide a suggestion, ask for confirmation, or point out a potential issue. It simply states what was done and why, which violates the rule against purely informative comments.
2. packages/sample-app/sample_app/guardrail_medical_chat_example.py:68
- Draft comment:
Evaluator changed from sexism_detector to pii_detector. Confirm that this switch aligns with your evaluation logic and that the callback handles the updated evaluator result appropriately. - Reason this comment was not posted:
Comment looked like it was already resolved.
3. packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py:79
- Draft comment:
Converting all values from the original result to strings for evaluator_data. Ensure that this conversion (using str()) is appropriate for all expected data types. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%The comment is asking the author to ensure that converting values to strings is appropriate, which is a form of asking for confirmation or double-checking. This violates the rule against asking the author to confirm their intention or ensure behavior is intended.
4. packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py:125
- Draft comment:
Removed usage of InputExtractor in the sync_wrapper and now directly converting values to strings. Confirm that this change meets the requirements for data conversion in the evaluator. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%The comment is asking the PR author to confirm that the change meets certain requirements, which violates the rule against asking for confirmation or ensuring behavior is intended. It does not provide a specific suggestion or point out a potential issue with the code itself.
5. packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py:145
- Draft comment:
Using loop.run_until_complete in sync_wrapper might cause issues if an event loop is already running. Consider documenting or handling such edge cases. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_Tw4oSAOA7R9gy9zr
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
145-150: Deprecated event loop pattern may fail in async contexts.
asyncio.get_event_loop()is deprecated in Python 3.10+ when no running loop exists, andloop.run_until_complete()will raiseRuntimeErrorif called from within an already-running event loop (e.g., Jupyter notebooks, async web frameworks).Consider using a more robust approach:
- loop = asyncio.get_event_loop() - evaluator_result = loop.run_until_complete( - client_instance.guardrails.execute_evaluator( - slug, evaluator_data, evaluator_version, evaluator_config - ) - ) + try: + loop = asyncio.get_running_loop() + # Already in async context - this is problematic + raise RuntimeError( + "Cannot use sync guardrail wrapper within an async context. " + "Use an async function instead." + ) + except RuntimeError: + # No running loop, safe to use asyncio.run() + evaluator_result = asyncio.run( + client_instance.guardrails.execute_evaluator( + slug, evaluator_data, evaluator_version, evaluator_config + ) + )Alternatively, document this limitation clearly in the docstring.
🧹 Nitpick comments (6)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (1)
95-118: Consider extracting shared logic to reduce duplication.The functions
get_doctor_response_simple,get_doctor_response_with_pii_check, andget_doctor_responseshare identical implementation logic (system prompt, OpenAI call, return structure). Consider extracting the common logic:async def _call_medical_ai(patient_message: str) -> dict: """Internal helper for medical AI calls.""" system_prompt = """You are a medical AI assistant...""" response = await client.chat.completions.create( model="gpt-4o", messages=[ {"role": "system", "content": system_prompt}, {"role": "user", "content": patient_message} ], max_tokens=500, temperature=0 ) return {"text": response.choices[0].message.content} @guardrail(evaluator="medical-advice-given") async def get_doctor_response(patient_message: str) -> dict: return await _call_medical_ai(patient_message)For a sample app, some duplication is acceptable if it improves readability for demonstration purposes.
packages/sample-app/sample_app/guardrail_travel_agent_example.py (2)
9-14: Remove unused# noqadirectives.The
# noqa: E402comments on lines 13-14 are unnecessary since these imports don't actually violate E402 (they're after path manipulation which is intentional). Static analysis confirms these are unused.-from traceloop.sdk.guardrails.guardrails import guardrail # noqa: E402 -from traceloop.sdk.evaluator import EvaluatorMadeByTraceloop # noqa: E402 +from traceloop.sdk.guardrails.guardrails import guardrail +from traceloop.sdk.evaluator import EvaluatorMadeByTraceloop
84-100: Consider movingioimport to module level.The
import ioon line 84 could be moved to the top of the file with other imports for consistency. However, keeping it local is acceptable if the intent is to minimize imports when the function isn't called.The stdout capture/restore pattern with
try/finallyis correctly implemented to ensure cleanup even if exceptions occur.packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (3)
51-52: Consider usingTypeErrorfor type validation.Per Python conventions,
TypeErroris preferred overValueErrorwhen the issue is an incorrect type.else: - raise ValueError(f"evaluator must be str or EvaluatorDetails, got {type(evaluator)}") + raise TypeError(f"evaluator must be str or EvaluatorDetails, got {type(evaluator)}")
227-230: Graceful error handling with minor style suggestion.The broad exception handling with failure fallback is appropriate for guardrails—evaluation failures shouldn't crash the application. Consider using f-string conversion flags for cleaner formatting:
except Exception as e: - print(f"Error executing evaluator {slug}. Error: {str(e)}") + print(f"Error executing evaluator {slug}. Error: {e!s}") # Return a failure result on error - return OutputSchema(reason=f"Evaluation error: {str(e)}", success=False) + return OutputSchema(reason=f"Evaluation error: {e!s}", success=False)
17-20: Type annotation doesn't reflect sync function support.The return type
Callable[[Callable[P, Awaitable[Dict[str, Any]]]], Callable[P, Awaitable[Dict[str, Any]]]]indicates only async functions are supported, but the decorator handles both async and sync functions (lines 163-166). This could cause type checker warnings for sync usage.Consider using
@overloadorUnionto properly type both cases:from typing import overload @overload def guardrail( evaluator: EvaluatorSpec, on_evaluation_complete: Optional[Callable[[OutputSchema, Any], Any]] = None ) -> Callable[[Callable[P, Awaitable[Dict[str, Any]]]], Callable[P, Awaitable[Dict[str, Any]]]]: ... @overload def guardrail( evaluator: EvaluatorSpec, on_evaluation_complete: Optional[Callable[[OutputSchema, Any], Any]] = None ) -> Callable[[Callable[P, Dict[str, Any]]], Callable[P, Dict[str, Any]]]: ...For now, the
# type: ignorecomments handle this, but proper typing would improve developer experience.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
packages/sample-app/sample_app/guardrail_medical_chat_example.py(1 hunks)packages/sample-app/sample_app/guardrail_travel_agent_example.py(1 hunks)packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/sample-app/sample_app/guardrail_medical_chat_example.pypackages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.pypackages/sample-app/sample_app/guardrail_travel_agent_example.py
🧬 Code graph analysis (2)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (3)
packages/traceloop-sdk/traceloop/sdk/__init__.py (3)
Traceloop(36-267)init(48-198)get(251-267)packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
guardrail(17-168)packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py (1)
pii_detector(22-38)
packages/sample-app/sample_app/guardrail_travel_agent_example.py (2)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
guardrail(17-168)packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py (1)
pii_detector(22-38)
🪛 Ruff (0.14.8)
packages/sample-app/sample_app/guardrail_medical_chat_example.py
15-15: Avoid specifying long messages outside the exception class
(TRY003)
155-155: Do not catch blind exception: Exception
(BLE001)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py
52-52: Prefer TypeError exception for invalid type
(TRY004)
52-52: Avoid specifying long messages outside the exception class
(TRY003)
62-65: Prefer TypeError exception for invalid type
(TRY004)
62-65: Avoid specifying long messages outside the exception class
(TRY003)
75-78: Avoid specifying long messages outside the exception class
(TRY003)
88-88: Do not catch blind exception: Exception
(BLE001)
114-117: Prefer TypeError exception for invalid type
(TRY004)
114-117: Avoid specifying long messages outside the exception class
(TRY003)
127-130: Avoid specifying long messages outside the exception class
(TRY003)
141-141: Do not catch blind exception: Exception
(BLE001)
227-227: Do not catch blind exception: Exception
(BLE001)
228-228: Use explicit conversion flag
Replace with conversion flag
(RUF010)
230-230: Use explicit conversion flag
Replace with conversion flag
(RUF010)
packages/sample-app/sample_app/guardrail_travel_agent_example.py
13-13: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
14-14: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
158-158: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (10)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (5)
13-17: LGTM! Secure API key handling.The API key is correctly retrieved from an environment variable and validated before use, following the coding guidelines for secure secret handling.
20-43: LGTM!The function correctly demonstrates slug-based guardrail usage and returns a properly structured dict with the
textfield.
46-64: LGTM! Previous issue resolved.The callback now correctly returns a dict with the
textkey on evaluation failure, addressing the previous review concern.
67-92: LGTM! Previous issue resolved.The function name and docstring now correctly match the
pii_detectorevaluator being used.
155-157: Broad exception handling acceptable for sample CLI app.The broad
except Exceptionis appropriate here for user-facing error handling in an interactive CLI context. For production code, consider catching more specific exceptions.packages/sample-app/sample_app/guardrail_travel_agent_example.py (2)
26-61: LGTM!The callback properly handles both PII detection (returning sanitized warning) and safe pass-through cases. Good use of
.copy()to avoid mutating the original result, and proper cleanup of the internal_captured_stdoutfield.
155-162: Broad exception handling acceptable for sample CLI app.The broad
except Exceptionwith traceback printing is reasonable for a sample/demo application to aid debugging. The import oftracebackinside the except block is fine for this rarely-executed error path.packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (3)
55-104: LGTM! Async wrapper implementation is now consistent with sync wrapper.Both wrappers correctly build
evaluator_dataasDict[str, str]by converting field values to strings. The previous review concern about inconsistency between async and sync wrappers has been addressed.
180-186: LGTM! Type annotation is consistent with callers.The
data: Dict[str, str]parameter type correctly matches what the decorator wrappers pass. The previous review concern aboutInputExtractormismatch has been addressed.
213-225: Good dual-format result parsing.The result parsing correctly handles both "Made by Traceloop" evaluator format (
evaluator_result: pass/fail) and custom evaluator format. Usingmodel_validatefor the fallback case provides proper validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Changes requested ❌
Reviewed 8e27a1a in 2 minutes and 3 seconds. Click for details.
- Reviewed
81lines of code in1files - Skipped
0files when reviewing. - Skipped posting
3draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/sample_app/guardrail_medical_chat_example.py:89
- Draft comment:
Evaluator slug changed from 'medical-advice-given' to 'medicaladvicegiven'. Confirm if the new slug is intentional and consistent. - Reason this comment was not posted:
Comment looked like it was already resolved.
2. packages/sample-app/sample_app/guardrail_medical_chat_example.py:89
- Draft comment:
The evaluator argument value "medicaladvicegiven" has been changed from "medical-advice-given". Please confirm if this change is intentional or a typo. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment is asking the author to "confirm if this change is intentional or a typo." This directly violates the rule: "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended." The comment doesn't identify a definite bug - it's just questioning whether the change was intentional. The evaluator string could have been changed intentionally (perhaps the backend system changed its naming convention from kebab-case to camelCase). Without strong evidence that this is wrong, I should delete this comment. Could this actually be a typo that would break functionality? If the evaluator name is incorrect, it might fail at runtime. However, I don't have evidence that "medicaladvicegiven" is wrong - it could be the correct new name for the evaluator. While it's possible this could break functionality, the comment is explicitly asking for confirmation rather than stating there's a problem. If this were truly a bug, the comment should state definitively that the evaluator name is incorrect. Without that evidence, this is just asking the author to double-check, which violates the rules. This comment should be deleted because it asks the PR author to confirm their intention rather than identifying a definite issue. The change could be intentional (e.g., a naming convention update), and without strong evidence that it's wrong, the comment is not actionable.
3. packages/sample-app/sample_app/guardrail_medical_chat_example.py:91
- Draft comment:
Typo detected: "GPT-4o" in the docstring may be a mistake. Should it be "GPT-4"? - Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_2OzR14oHf6FBpBPU
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
packages/sample-app/sample_app/guardrail_medical_chat_example.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (3)
13-17: Defer API key validation so importing the module doesn’t raise unexpectedlyLines 13–17 read
OPENAI_API_KEYand raiseValueErrorat import time, which will blow up simple imports of this example (e.g. from tests or docs) when the env var isn’t set. Consider moving the env-var check andAsyncOpenAIclient initialization intomain()or a small helper used by the async functions so module import stays side‑effect‑light.
88-106: DRY up the duplicated medical system promptBoth
get_doctor_response_with_pii_checkandget_doctor_responseembed the samesystem_promptstring. Consider extracting this to a module‑level constant (e.g.,MEDICAL_SYSTEM_PROMPT) and reusing it in both functions so prompt changes only need to be made in one place.
122-150: Consider narrowing the broadexcept Exceptionin the chat loopCatching a blanket
Exceptionon lines 148–150 trips Ruff’sBLE001and can hide real programming errors (e.g., bugs in prompt construction or unexpected attribute errors). For a sample it’s workable, but you may want to:
- Catch only expected runtime issues (e.g., OpenAI/Traceloop/network‑related errors,
ValueErrorfrom input handling), and- Let truly unexpected exceptions surface, or log them and re‑raise, or explicitly silence the Ruff rule with a comment if you intend to keep it broad.
This will make failures during development easier to notice and keep the linter happy if
BLE001is enforced.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/sample-app/sample_app/guardrail_medical_chat_example.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/sample-app/sample_app/guardrail_medical_chat_example.py
🧬 Code graph analysis (1)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (2)
packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
guardrail(17-168)packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py (1)
pii_detector(22-38)
🪛 Flake8 (7.3.0)
packages/sample-app/sample_app/guardrail_medical_chat_example.py
[error] 54-54: local variable 'personal_info_system_prompt' is assigned to but never used
(F841)
🪛 Ruff (0.14.8)
packages/sample-app/sample_app/guardrail_medical_chat_example.py
15-15: Avoid specifying long messages outside the exception class
(TRY003)
54-54: Local variable personal_info_system_prompt is assigned to but never used
Remove assignment to unused variable personal_info_system_prompt
(F841)
61-61: String contains ambiguous ’ (RIGHT SINGLE QUOTATION MARK). Did you mean ``` (GRAVE ACCENT)?
(RUF001)
148-148: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Lint
- GitHub Check: Build Packages (3.11)
🔇 Additional comments (1)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (1)
21-38: Callback return shape is consistent with guardrail usage and callers
handle_medical_evaluationnow always returns adictand provides a"text"key on the failure path, which matches the guardrail decorator’s expectation (dict output) and the laterdoctor_response.get("text", ...)usage inmedical_chat_session(). This resolves the earlier set/dict mismatch and looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 7799fd1 in 1 minute and 51 seconds. Click for details.
- Reviewed
15lines of code in1files - Skipped
0files when reviewing. - Skipped posting
1draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/sample_app/guardrail_medical_chat_example.py:53
- Draft comment:
The variable 'personal_info_system_prompt' is annotated with '# noqa: F841' to suppress unused warnings. Consider adding a comment on its intended future use or removing it if not needed. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 35% vs. threshold = 50% The comment is asking the author to either add a comment explaining future use or remove the variable. This is a reasonable code quality suggestion since having unused variables with noqa suppressions is generally a code smell. However, I need to check if this violates any rules. The comment is about a change (the noqa was added in the diff), it's actionable (remove the variable or add a comment), and it's not speculative. But is it obvious? The fact that a noqa was added suggests the author is aware the variable is unused and chose to keep it. The comment is essentially asking "why keep this unused variable?" which could be seen as asking the author to explain their intention, which violates the rules. The comment could be seen as asking the author to explain their intention ("Consider adding a comment on its intended future use"), which violates the rule about not asking authors to confirm intentions or explain things. Additionally, the author explicitly added the noqa, suggesting they intentionally want to keep the unused variable, so this comment might be questioning an intentional decision. While the comment does suggest explaining future use, it also provides a clear alternative action: "removing it if not needed." This makes it more actionable than just asking for explanation. However, the phrasing "Consider adding a comment" is still somewhat soft and not a clear code change requirement. The strongest interpretation is that this is a code quality issue where an unused variable should be removed. The comment is borderline. It's about a real code quality issue (unused variable with suppression), but the phrasing asks the author to either explain or remove it, which could be seen as asking for confirmation of intent. Given the rule to err on the side of deleting comments when unsure, and the fact that the author explicitly added the noqa (suggesting intentional behavior), this comment should be deleted.
Workflow ID: wflow_L9VFQJJwSCiyPXFo
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (2)
53-70: Remove unused variable instead of suppressing the warning.The
personal_info_system_promptvariable is assigned but never used in the function. While the# noqa: F841comment suppresses the Flake8 warning, the coding guidelines require adhering to Flake8 rules rather than suppressing them. Since onlysystem_promptis passed to the API call (line 75), this entire block should be removed to maintain code cleanliness.- # This is the system prompt for the personal information case - personal_info_system_prompt = """You are a medical AI assistant that provides helpful, general medical information # noqa: F841 - tailored to the individual user. - - When personal information is available (such as age, sex, symptoms, medical history, - lifestyle, medications, or concerns), actively incorporate it into your responses - to make guidance more relevant and personalized. - - Adapt explanations, examples, and recommendations to the user's context whenever possible. - If key personal details are missing, ask concise and relevant follow-up questions - before giving advice. - - Be clear about your limitations as an AI and do not provide diagnoses or definitive - treatment plans. Always encourage consultation with qualified healthcare professionals - for diagnosis, treatment, or urgent concerns. - - Maintain a professional, empathetic, and supportive tone. - Avoid assumptions, respect privacy, and clearly distinguish general information - from personalized considerations.""" -As per coding guidelines: "Use Flake8 for code linting and adhere to its rules."
60-60: Replace ambiguous quote character with standard apostrophe.Line 60 contains a RIGHT SINGLE QUOTATION MARK (') instead of a standard apostrophe ('). This can cause encoding issues and may confuse code editors or linters.
- If key personal details are missing, ask concise and relevant follow-up questions + If key personal details are missing, ask concise and relevant follow-up questions
🧹 Nitpick comments (1)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (1)
144-149: Consider catching more specific exceptions.The broad
except Exceptioncan mask unexpected errors. For better error handling and debugging, consider catching specific exceptions that are expected in this context.- except Exception as e: - print(f"\n❌ An error occurred: {e}") - print("Please try again or type 'quit' to exit.") + except (openai.OpenAIError, ValueError, KeyError) as e: + print(f"\n❌ An error occurred: {e}") + print("Please try again or type 'quit' to exit.") + except Exception as e: + print(f"\n❌ Unexpected error: {e}") + print("Please try again or type 'quit' to exit.")This allows you to handle expected API errors gracefully while still catching unexpected issues for debugging.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/sample-app/sample_app/guardrail_medical_chat_example.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/sample-app/sample_app/guardrail_medical_chat_example.py
🧬 Code graph analysis (1)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (3)
packages/traceloop-sdk/traceloop/sdk/__init__.py (3)
Traceloop(36-267)init(48-198)get(251-267)packages/traceloop-sdk/traceloop/sdk/guardrails/guardrails.py (1)
guardrail(17-168)packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py (2)
EvaluatorMadeByTraceloop(5-729)pii_detector(22-38)
🪛 Ruff (0.14.8)
packages/sample-app/sample_app/guardrail_medical_chat_example.py
15-15: Avoid specifying long messages outside the exception class
(TRY003)
53-53: Local variable personal_info_system_prompt is assigned to but never used
Remove assignment to unused variable personal_info_system_prompt
(F841)
60-60: String contains ambiguous ’ (RIGHT SINGLE QUOTATION MARK). Did you mean ``` (GRAVE ACCENT)?
(RUF001)
147-147: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
🔇 Additional comments (2)
packages/sample-app/sample_app/guardrail_medical_chat_example.py (2)
21-38: LGTM! Callback properly returns dict structure.The callback correctly returns a dict with the expected structure on both success and failure paths. The previous issue with returning a set has been properly addressed.
88-110: LGTM! Function correctly implements guardrail pattern.The function properly uses the guardrail decorator with a string evaluator slug and returns the expected dict structure with a "text" field.
Fixes TLP-1413
feat(instrumentation): ...orfix(instrumentation): ....Important
Introduces guardrails framework for AI evaluation, adds medical and travel examples, and improves response handling and evaluator execution.
guardrail_medical_chat_example.pyfor interactive medical chat with guardrails and evaluation callbacks.guardrail_travel_agent_example.pyfor travel agent with PII-detection guardrail and privacy alerts.Guardrailsinclient.pyfor easier access.handle_runner_stream()intravel_agent_example.py.run_experiment_evaluator()inevaluator.pynow waits for and returns streaming evaluation results.This description was created by
for 7799fd1. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.