[PERFORMANCE][PLUGIN]: Optimize Cedar plugin - Replace synchronous requests with async #2082

@monshri

Description

The Cedar plugin has several performance bottlenecks caused by synchronous calls:

1. Blocking Policy Evaluation (is_authorized)

The call to is_authorized inside _evaluate_policy is synchronous. cedarpy wraps Rust code, but Python bindings do not necessarily release the Global Interpreter Lock (GIL), and even if this one does, the call runs on the event-loop thread and blocks the loop until it returns.

Impact: Every policy check stalls the event loop, so the gateway cannot serve other concurrent requests (e.g., streaming tokens for other users) until the check returns.

Fix: Offload this CPU-bound call to a worker thread using asyncio.to_thread (Python 3.9+). This frees the event loop only while the binding releases the GIL; if cedarpy holds the GIL for the whole call, a thread will not help, so that should be checked first.

# Make _evaluate_policy async (or keep it sync and call it through asyncio.to_thread)
import asyncio
from cedarpy import AuthzResult, Decision, is_authorized

async def _evaluate_policy_async(self, request: dict, policy_expr: str) -> str:
    # Offload the blocking Rust call to a worker thread; the entities list is empty
    result: AuthzResult = await asyncio.to_thread(
        is_authorized, request, policy_expr, []
    )
    return "Allow" if result.decision == Decision.Allow else "Deny"
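To see why the offload matters, here is a minimal, self-contained sketch. It does not use cedarpy: blocking_policy_check is a hypothetical stand-in that sleeps (and therefore releases the GIL), and count_ticks plays the role of the other requests the gateway would be serving. All names are illustrative assumptions.

```python
import asyncio
import time


def blocking_policy_check(delay: float = 0.2) -> str:
    """Hypothetical stand-in for a blocking authorization call."""
    time.sleep(delay)  # blocks the calling thread, but releases the GIL
    return "Allow"


async def count_ticks(stop: asyncio.Event) -> int:
    """Counts how often the event loop gets to run while the check is in flight."""
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(0.01)
        ticks += 1
    return ticks


async def measure(offload: bool) -> int:
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    await asyncio.sleep(0)  # give the ticker a chance to start
    if offload:
        await asyncio.to_thread(blocking_policy_check)  # loop keeps running
    else:
        blocking_policy_check()  # loop is frozen for the whole call
    stop.set()
    return await ticker


def compare() -> tuple[int, int]:
    """Returns (ticks with a direct call, ticks with to_thread)."""
    return asyncio.run(measure(offload=False)), asyncio.run(measure(offload=True))


if __name__ == "__main__":
    blocked, offloaded = compare()
    print(f"direct call: {blocked} ticks, to_thread: {offloaded} ticks")
```

With the direct call the ticker barely advances; with to_thread it keeps ticking throughout. The same gap should only be expected for is_authorized if the binding releases the GIL during evaluation.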

2. Redundant & Blocking Policy Parsing

The plugin converts the policy from YAML/DSL to Cedar text inside every hook (prompt_pre_fetch, tool_pre_invoke, etc.).

Lines 268-278 (and others): self._yamlpolicy2text and self._dsl2cedar are called on every request.

Impact: The conversion relies on string manipulation and regex parsing, which are CPU-intensive. Repeating it on every request wastes CPU and blocks the event loop, even though the result does not change while the configuration stays the same.

Fix: Parse the policy once in __init__ and store the resulting Cedar text. Re-parse only if the configuration actually changes (unlikely in a plugin context).

# In __init__
self.cached_policy_text = None
if self.cedar_config.policy:
    if self.cedar_config.policy_lang == "cedar":
        self.cached_policy_text = self._yamlpolicy2text(self.cedar_config.policy)
    elif self.cedar_config.policy_lang == "custom_dsl":
        self.cached_policy_text = self._dsl2cedar(self.cedar_config.policy)

# Then in hooks, simply use self.cached_policy_text
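The parse-once idea, including the "re-parse only on change" part, can be sketched in isolation. CachedPolicy and fake_dsl2cedar are hypothetical names made up for this example; the real plugin would pass in its own converter.

```python
import copy
from typing import Any, Callable, Optional


class CachedPolicy:
    """Converts a policy to Cedar text once and reuses the result."""

    def __init__(self, convert: Callable[[Any], str]) -> None:
        self._convert = convert
        self._source: Any = None
        self._text: Optional[str] = None
        self.conversions = 0  # how often the converter actually ran

    def get(self, source: Any) -> str:
        # Re-convert only on first use or when the policy source has changed.
        if self._text is None or source != self._source:
            self._text = self._convert(source)
            # Keep a snapshot so in-place edits to the source are detected.
            self._source = copy.deepcopy(source)
            self.conversions += 1
        return self._text


def fake_dsl2cedar(policy: list) -> str:
    """Hypothetical converter; stands in for the regex and string work."""
    return "\n".join(
        f'permit(principal, action == Action::"{action}", resource);'
        for action in policy
    )


if __name__ == "__main__":
    cache = CachedPolicy(fake_dsl2cedar)
    policy = ["view_full", "view_redacted"]
    for _ in range(1000):  # simulate 1000 hook invocations
        cache.get(policy)
    print(cache.conversions)  # 1
```

If the configuration is truly immutable for the plugin's lifetime, the plain attribute set in __init__ is simpler and avoids the per-call equality check.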

3. Synchronous Regex Redaction

The _redact_output method uses re.sub (Line 233). For large LLM outputs (e.g., tool_post_invoke payloads), regex operations on the main thread can cause noticeable latency spikes.

Fix: For large payloads, offload the redaction to a worker thread as well.
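A rough sketch of size-based offloading follows. The pattern, the replacement text, and the 64 KiB threshold are assumptions chosen for illustration, not values taken from the plugin. One caveat: as far as I know, CPython's re module keeps the GIL while matching, so the benefit of a thread here may be limited and is worth measuring before adopting.

```python
import asyncio
import re

# Hypothetical pattern and threshold, for illustration only.
REDACT_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
OFFLOAD_THRESHOLD = 64 * 1024  # characters


async def redact(text: str) -> str:
    """Redacts inline for small payloads, in a worker thread for large ones."""
    if len(text) < OFFLOAD_THRESHOLD:
        # Small input: the thread hand-off would cost more than it saves.
        return REDACT_PATTERN.sub("[REDACTED]", text)
    return await asyncio.to_thread(REDACT_PATTERN.sub, "[REDACTED]", text)


if __name__ == "__main__":
    print(asyncio.run(redact("id 123-45-6789 on file")))
```

Compiling the pattern once at module level also avoids repeated pattern lookups on each call.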

Putting fixes 1 and 2 together:

class CedarPolicyPlugin(Plugin):
    def __init__(self, config: PluginConfig):
        super().__init__(config)
        self.cedar_config = CedarConfig.model_validate(self._config.config)
        self.jwt_info = {}
        
        # OPTIMIZATION 1: Pre-compute policy text at startup
        self.policy_text = ""
        if self.cedar_config.policy:
            if self.cedar_config.policy_lang == "cedar":
                self.policy_text = self._yamlpolicy2text(self.cedar_config.policy)
            elif self.cedar_config.policy_lang == "custom_dsl":
                self.policy_text = self._dsl2cedar(self.cedar_config.policy)
        
        logger.info(f"CedarPolicyPlugin initialised with configuration {self.cedar_config}")

    # OPTIMIZATION 2: Async wrapper for the blocking library call
    async def _evaluate_policy(self, request: dict, policy_expr: str) -> str:
        """Async wrapper for blocking is_authorized call."""
        def blocking_check():
            result = is_authorized(request, policy_expr, [])
            return "Allow" if result.decision == Decision.Allow else "Deny"
            
        return await asyncio.to_thread(blocking_check)

    async def prompt_pre_fetch(self, payload: PromptPrehookPayload, context: PluginContext) -> PromptPrehookResult:
        # ... setup code ...
        
        # Use cached policy text instead of re-parsing
        if not self.policy_text:
            # handle error
            pass

        if self.cedar_config.policy_output_keywords:
            # ... prepare requests ...
            if view_full and self.policy_text:
                request = self._preprocess_request(user, view_full, payload.prompt_id, hook_type)
                # Await the new async evaluator
                result_full = await self._evaluate_policy(request, self.policy_text)
            
            # ... repeat for view_redacted ...
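Once the evaluator is async, the view_full and view_redacted checks no longer have to run one after the other. The sketch below shows one possible way to issue both with asyncio.gather. It is an optional extension, not part of the proposal above, and blocking_is_authorized is a hypothetical stand-in that simply allows view_full and denies everything else.

```python
import asyncio
import time


def blocking_is_authorized(request: dict) -> str:
    """Hypothetical stand-in for the blocking check; allows only view_full."""
    time.sleep(0.05)
    return "Allow" if request["action"] == "view_full" else "Deny"


async def evaluate(request: dict) -> str:
    # Same shape as the async wrapper: run the blocking call in a worker thread.
    return await asyncio.to_thread(blocking_is_authorized, request)


async def evaluate_views(user: str, resource: str) -> dict:
    """Evaluates both views concurrently and maps each action to its decision."""
    actions = ("view_full", "view_redacted")
    requests = [
        {"principal": user, "action": action, "resource": resource}
        for action in actions
    ]
    results = await asyncio.gather(*(evaluate(r) for r in requests))
    return dict(zip(actions, results))


if __name__ == "__main__":
    print(asyncio.run(evaluate_views("alice", "prompt-1")))
```

Whether this is worth doing depends on how long a single check takes; for very fast checks the sequential version is easier to read.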

Metadata

Assignees

Labels

SHOULD: P2: Important but not vital; high-value items that are not crucial for the immediate release
performance: Performance related items
plugins
python: Python / backend development (FastAPI)
