[PERFORMANCE][PLUGIN]: Optimize Cedar plugin - Replace synchronous requests with async #2082
Description
The Cedar plugin has several performance bottlenecks caused by synchronous calls:
1. Blocking Policy Evaluation (is_authorized)
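The call to is_authorized inside _evaluate_policy is synchronous. Although cedarpy wraps Rust code, the call runs on the event-loop thread, so the loop is blocked until it returns. Moving it to a worker thread lets the loop keep serving other tasks, provided the binding releases the Global Interpreter Lock (GIL) while the Rust code runs; if it does not, the worker holds the GIL for the duration of the call and the loop still stalls.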
Impact: Every policy check pauses the entire gateway, preventing it from handling other concurrent requests (e.g., streaming tokens for other users).
Fix: Offload this CPU-bound task to a thread pool using asyncio.to_thread (Python 3.9+).
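To see why this matters, here is a self-contained sketch. slow_check is a hypothetical stand-in for the blocking is_authorized call (it uses time.sleep), and a heartbeat task counts how often the loop gets to run. Called directly, the check starves the heartbeat; awaited through asyncio.to_thread, the heartbeat keeps ticking. time.sleep releases the GIL, so this shows the best case; the real Rust call behaves the same way only if cedarpy releases the GIL too.

```python
import asyncio
import time


def slow_check(request: dict) -> str:
    """Hypothetical stand-in for a blocking call such as is_authorized."""
    time.sleep(0.2)  # blocks the calling thread; releases the GIL while sleeping
    return "Allow"


async def heartbeat(counter: list[int]) -> None:
    # Increments once per loop turn it gets; stalls if the loop is blocked.
    while True:
        counter[0] += 1
        await asyncio.sleep(0.01)


async def measure(offload: bool) -> int:
    ticks = [0]
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat take its first step
    if offload:
        await asyncio.to_thread(slow_check, {})  # loop stays free while waiting
    else:
        slow_check({})  # loop is blocked for the full 0.2 s
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return ticks[0]


print("blocking:", asyncio.run(measure(offload=False)))
print("offloaded:", asyncio.run(measure(offload=True)))
```

The blocking run records only the heartbeat's first tick, while the offloaded run records many more over the same 0.2 s.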
```python
import asyncio

from cedarpy import AuthzResult, Decision, is_authorized


# Change _evaluate_policy to be async, or call it using to_thread
async def _evaluate_policy_async(self, request: dict, policy_expr: str) -> str:
    # Offload the blocking Rust call to a separate thread
    result: AuthzResult = await asyncio.to_thread(
        is_authorized, request, policy_expr, []
    )
    return "Allow" if result.decision == Decision.Allow else "Deny"
```

2. Redundant & Blocking Policy Parsing
The plugin converts the policy from YAML/DSL to Cedar text inside every hook (prompt_pre_fetch, tool_pre_invoke, etc.).
Lines 268-278 (and others): self._yamlpolicy2text and self._dsl2cedar are called on every request.
Impact: This string manipulation and regex parsing is CPU-bound. Repeating it on every request wastes CPU time and blocks the event loop while it runs.
Fix: Parse the policy once during init and store the resulting Cedar text string. Only re-parse if the configuration actually changes (unlikely in a plugin context).
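One way to make "only re-parse if the configuration changes" concrete is to remember the source that produced the cached text and convert again only when that source differs. This is a minimal sketch: PolicySource, CachedPolicy and fake_convert are hypothetical names, and fake_convert stands in for _yamlpolicy2text / _dsl2cedar.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class PolicySource:
    lang: str    # e.g. "cedar" or "custom_dsl"
    policy: Any  # raw policy as loaded from config (YAML structure or DSL text)


class CachedPolicy:
    """Converts a policy to Cedar text once and reuses it until the source changes."""

    def __init__(self, convert: Callable[[PolicySource], str]) -> None:
        self._convert = convert
        self._source: Optional[PolicySource] = None
        self._text = ""

    def text_for(self, source: PolicySource) -> str:
        # Compares by equality (no hashing), so unhashable YAML structures work.
        if source != self._source:
            self._text = self._convert(source)
            self._source = source
        return self._text


calls = []


def fake_convert(source: PolicySource) -> str:
    """Hypothetical stand-in for _yamlpolicy2text / _dsl2cedar; records each call."""
    calls.append(source)
    return f"// converted from {source.lang}\npermit(principal, action, resource);"


cache = CachedPolicy(fake_convert)
src = PolicySource("custom_dsl", "[role:admin] ...")
for _ in range(1000):
    cache.text_for(src)  # many requests, unchanged config
print(len(calls))  # 1
```

In the plugin the configuration does not change after __init__, so a plain attribute set once is enough; a wrapper like this only earns its keep if the configuration can be reloaded at runtime.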
```python
# In __init__
self.cached_policy_text = None
if self.cedar_config.policy:
    if self.cedar_config.policy_lang == "cedar":
        self.cached_policy_text = self._yamlpolicy2text(self.cedar_config.policy)
    elif self.cedar_config.policy_lang == "custom_dsl":
        self.cached_policy_text = self._dsl2cedar(self.cedar_config.policy)
# Then in hooks, simply use self.cached_policy_text
```

3. Synchronous Regex Redaction
The _redact_output method uses re.sub (Line 233). For large LLM outputs (e.g., tool_post_invoke payloads), regex operations on the main thread can cause noticeable latency spikes.
Fix: For large payloads, offload this to a thread as well.
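A minimal sketch of a size-gated offload follows. OFFLOAD_THRESHOLD, redact and redact_async are hypothetical names, and the threshold value is a placeholder to be tuned by measurement. One caveat: CPython's re module holds the GIL for the whole of a single sub() call, so moving one large substitution to a worker thread may not shorten loop stalls much on its own; measure before relying on it.

```python
from __future__ import annotations

import asyncio
import re

# Hypothetical cut-off: below this size the thread hand-off costs more than the regex.
OFFLOAD_THRESHOLD = 64 * 1024  # characters; placeholder, tune by measurement


def redact(text: str, pattern: re.Pattern[str], replacement: str = "[REDACTED]") -> str:
    """Synchronous redaction: a single re.sub over the whole payload."""
    return pattern.sub(replacement, text)


async def redact_async(text: str, pattern: re.Pattern[str], replacement: str = "[REDACTED]") -> str:
    """Redact inline for small payloads; hand large ones to the default thread pool."""
    if len(text) < OFFLOAD_THRESHOLD:
        return redact(text, pattern, replacement)
    return await asyncio.to_thread(redact, text, pattern, replacement)


pattern = re.compile(r"\bsecret\b")
print(asyncio.run(redact_async("a secret b", pattern)))  # a [REDACTED] b
```

Gating on size keeps the common case (short outputs) free of thread-pool overhead.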
Combining fixes 1 and 2, the plugin could look like this:

```python
import asyncio

from cedarpy import Decision, is_authorized

# ... existing plugin framework imports (Plugin, PluginConfig, CedarConfig,
# PromptPrehookPayload, PromptPrehookResult, PluginContext, logger) ...


class CedarPolicyPlugin(Plugin):
    def __init__(self, config: PluginConfig):
        super().__init__(config)
        self.cedar_config = CedarConfig.model_validate(self._config.config)
        self.jwt_info = {}
        # OPTIMIZATION 1: Pre-compute policy text at startup
        self.policy_text = ""
        if self.cedar_config.policy:
            if self.cedar_config.policy_lang == "cedar":
                self.policy_text = self._yamlpolicy2text(self.cedar_config.policy)
            elif self.cedar_config.policy_lang == "custom_dsl":
                self.policy_text = self._dsl2cedar(self.cedar_config.policy)
        logger.info(f"CedarPolicyPlugin initialised with configuration {self.cedar_config}")

    # OPTIMIZATION 2: Async wrapper for the blocking library call
    async def _evaluate_policy(self, request: dict, policy_expr: str) -> str:
        """Async wrapper for blocking is_authorized call."""

        def blocking_check():
            result = is_authorized(request, policy_expr, [])
            return "Allow" if result.decision == Decision.Allow else "Deny"

        return await asyncio.to_thread(blocking_check)

    async def prompt_pre_fetch(self, payload: PromptPrehookPayload, context: PluginContext) -> PromptPrehookResult:
        # ... setup code ...
        # Use cached policy text instead of re-parsing
        if not self.policy_text:
            # handle error
            pass
        if self.cedar_config.policy_output_keywords:
            # ... prepare requests ...
            if view_full and self.policy_text:
                request = self._preprocess_request(user, view_full, payload.prompt_id, hook_type)
                # Await the new async evaluator
                result_full = await self._evaluate_policy(request, self.policy_text)
            # ... repeat for view_redacted ...
```
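The view_full and view_redacted checks in prompt_pre_fetch appear to be independent, so they could also be awaited together with asyncio.gather instead of one after the other. This is a sketch with a hypothetical fake_is_authorized standing in for cedarpy's call; the principal/action request shape is illustrative only. The two checks overlap here because time.sleep releases the GIL; the real Rust call overlaps only if cedarpy releases it as well.

```python
import asyncio
import time


def fake_is_authorized(request: dict) -> str:
    """Hypothetical stand-in for cedarpy.is_authorized; sleeps to mimic a blocking call."""
    time.sleep(0.05)
    if request["principal"] == "admin" or request["action"] == "view_redacted":
        return "Allow"
    return "Deny"


async def evaluate(request: dict) -> str:
    # Same pattern as _evaluate_policy: run the blocking check in the thread pool.
    return await asyncio.to_thread(fake_is_authorized, request)


async def check_views(user: str) -> tuple[str, str]:
    full_req = {"principal": user, "action": "view_full"}
    redacted_req = {"principal": user, "action": "view_redacted"}
    # gather runs both coroutines as tasks, so both checks sit in the pool at once.
    full, redacted = await asyncio.gather(evaluate(full_req), evaluate(redacted_req))
    return full, redacted


print(asyncio.run(check_views("guest")))  # ('Deny', 'Allow')
```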