Skip to content

fix(api): resolve WebSocket 403 rejection (#549)#556

Merged
Aureliolo merged 7 commits intomainfrom
fix/websocket-403-bug
Mar 18, 2026
Merged

fix(api): resolve WebSocket 403 rejection (#549)#556
Aureliolo merged 7 commits intomainfrom
fix/websocket-403-bug

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

Closes #549

  • Root cause: Litestar's DI misidentified channels_plugin: ChannelsPlugin in the WebSocket handler signature as a query parameter, causing a 4500 close before the handler ran. Uvicorn logged this as HTTP 403.
  • Fix: Resolve ChannelsPlugin from socket.app.plugins.get() inside the handler body instead of relying on DI parameter injection
  • Hardening: Restrict auth middleware scopes to HTTP-only (ScopeType.HTTP), exclude WS path from rate limiting, add opt={"exclude_from_auth": True} on the handler
  • Observability: Add API_AUTH_GUARD_SKIPPED event constant and DEBUG logging to guard bypass paths, ticket validation stages, and role checks
  • Tests: Strengthen close code assertions (4001 vs 4500), add test_ws_accepts_valid_ticket (full round-trip), add regression guards for middleware scopes, rate limit exclusion, and handler signature

Test plan

  • test_ws_rejects_missing_ticket -- verifies close code 4001 (handler ran, not middleware rejection)
  • test_ws_rejects_invalid_ticket -- verifies close code 4001
  • test_ws_accepts_valid_ticket -- full round-trip: ticket creation, WS connect, subscribe, verify response
  • test_ws_auth_middleware_scopes_http_only -- verifies scopes == {ScopeType.HTTP}
  • test_ws_rate_limit_excludes_ws_path -- verifies WS path in rate limit exclude list
  • test_ws_handler_signature_no_channels_plugin_param -- regression guard against re-adding ChannelsPlugin as DI param
  • Full unit suite: 9265 passed, 93.91% coverage

Pre-reviewed by 6 agents (code-reviewer, security-reviewer, logging-audit, test-quality, issue-resolution, docs-consistency), 4 findings addressed.

🤖 Generated with Claude Code

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 18, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cb4ca731-02a6-410f-b9cb-778bc32914bb

📥 Commits

Reviewing files that changed from the base of the PR and between 5577644 and 0897242.

📒 Files selected for processing (2)
  • src/synthorg/api/controllers/ws.py
  • tests/unit/api/controllers/test_ws.py

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • WebSocket connections now bypass HTTP-only auth middleware, are excluded from HTTP rate limiting, and defer acceptance until authentication, preventing premature/incorrect accepts.
    • Improved runtime handling for channel/plugin resolution to fail gracefully when unavailable.
    • Enhanced observability with extra auth-stage and guard-skipped logging for better troubleshooting.
  • Documentation

    • Clarified WebSocket auth, rate-limiting, and plugin-resolution guidance.
  • Tests

    • Expanded WebSocket auth, rate-limit, observability, and handler-signature tests.

Walkthrough

WebSocket upgrade requests are excluded from HTTP rate-limiting and the HTTP-scoped JWT/auth middleware; the WS handler opts out of auth middleware, resolves ChannelsPlugin at runtime, performs ticket and role checks before accepting, emits guard-skipped observability events, and updates tests and docs accordingly.

Changes

Cohort / File(s) Summary
Middleware & App
src/synthorg/api/app.py, src/synthorg/api/auth/middleware.py
Compute ws_path once and add it to rate-limiter and auth exclusion lists; auth middleware restricted to ScopeType.HTTP so WebSocket scope is not processed by HTTP JWT/auth middleware.
WebSocket Handler
src/synthorg/api/controllers/ws.py
ws_handler now uses opt={"exclude_from_auth": True} and drops channels_plugin param; handler resolves ChannelsPlugin at runtime from socket.app.plugins, defers accept until after ticket/role/subscription succeed, and adds auth-stage and plugin-resolution logging and error handling.
Auth Controller & Observability
src/synthorg/api/auth/controller.py, src/synthorg/observability/events/api.py
Added API_AUTH_GUARD_SKIPPED constant and emit guard-skipped events when guards are bypassed for exempt suffixes or when no user is present in scope.
Tests
tests/unit/api/controllers/test_ws.py
Expanded WS tests: parameterized bad-ticket rejections, valid-ticket acceptance flow, assertions for JSON error messages, middleware scope enforcement, rate-limit exclusion, handler signature/opt flag, and guard behavior without a user in WS scope.
Documentation & Misc
docs/..., CLAUDE.md, docs/architecture/tech-stack.md, docs/design/operations.md, docs/security.md
Documentation updated to document runtime plugin resolution for WS handlers, clarify WS auth and rate-limit guidance, add an operations endpoint row, and minor doc edits.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant RateLimiter as RateLimiter
    participant AuthMiddleware as AuthMiddleware
    participant WSHandler as WSHandler
    participant TicketService as TicketService
    participant ChannelsPlugin as ChannelsPlugin

    Client->>RateLimiter: WS upgrade /api/v1/ws?ticket=...
    Note right of RateLimiter: WS path excluded -> pass through
    RateLimiter->>AuthMiddleware: Forward (scope=websocket)
    Note right of AuthMiddleware: Scoped to HTTP -> bypassed
    AuthMiddleware->>WSHandler: Invoke WS handler
    WSHandler->>TicketService: Validate ticket
    TicketService-->>WSHandler: ticket_valid / ticket_invalid
    alt ticket_valid
        WSHandler->>ChannelsPlugin: Resolve via app.plugins.get(...)
        ChannelsPlugin-->>WSHandler: subscribe / accept
        WSHandler-->>Client: Connection accepted (connected)
    else ticket_invalid
        WSHandler-->>Client: Close with auth-failed code (auth-failed)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: fixing WebSocket 403 rejection errors, which is directly addressed by the entire changeset.
Description check ✅ Passed The description is detailed and directly related to the changeset, covering root cause, fix, hardening measures, observability improvements, and test coverage with explicit references to issue #549.
Linked Issues check ✅ Passed All coding requirements from issue #549 are met: WS handler runs and logs validation events, auth middleware scopes restricted to HTTP-only, WS path excluded from rate limiting, handler signature fixed to avoid DI misidentification, and comprehensive test coverage added.
Out of Scope Changes check ✅ Passed All changes are directly scoped to resolving the WebSocket 403 issue and supporting hardening. Documentation updates (CLAUDE.md, tech-stack.md, operations.md, security.md) are all contextually related to the WebSocket fixes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/websocket-403-bug
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch fix/websocket-403-bug
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 18, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses and resolves a critical issue causing WebSocket connections to be prematurely rejected with a 403 error due to a misconfiguration in Litestar's dependency injection for ChannelsPlugin. The changes significantly improve the robustness and security of WebSocket handling by refining authentication middleware application, optimizing rate limiting for persistent connections, and enhancing debugging capabilities through detailed logging.

Highlights

  • WebSocket DI Fix: Resolved an issue where Litestar's Dependency Injection misidentified ChannelsPlugin in WebSocket handler signatures, leading to 403 rejections.
  • Improved WebSocket Handling: Modified the WebSocket handler to dynamically resolve ChannelsPlugin from the application's plugins instead of relying on parameter injection.
  • Enhanced Security & Performance: Restricted authentication middleware to HTTP-only scopes, excluded WebSocket paths from rate limiting, and marked the WebSocket handler to bypass general authentication.
  • Increased Observability: Introduced a new API_AUTH_GUARD_SKIPPED event constant and added detailed DEBUG logging for authentication guard bypasses, ticket validation, and role checks.
  • Comprehensive Testing: Added new tests to verify correct WebSocket close codes (4001 vs 4500), successful connections with valid tickets, proper middleware scope configuration, rate limit exclusion, and the updated WebSocket handler signature.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 18, 2026 19:50 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively resolves the WebSocket 403 rejection by correctly handling the ChannelsPlugin dependency and hardening the authentication and rate-limiting middleware configurations for WebSocket connections. The added logging improves observability, and the new tests provide strong regression guards for the fix. I've identified a minor improvement for one of the new tests to make it more robust.

Comment on lines +378 to +380
assert any("/ws" in pat for pat in (rl_config.exclude or [])), (
f"WS path not excluded from rate limit: {rl_config.exclude}"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The assertion any("/ws" in pat ...) is too broad and could lead to false positives. For example, it would match a path like /news if it were in the exclude list. It's better to assert that the specific WebSocket path pattern is present in the exclude list for a more robust test.

Suggested change
assert any("/ws" in pat for pat in (rl_config.exclude or [])), (
f"WS path not excluded from rate limit: {rl_config.exclude}"
)
ws_path = f"^{api_config.api_prefix}/ws$"
assert ws_path in (rl_config.exclude or []), (
f"WS path '{ws_path}' not excluded from rate limit: {rl_config.exclude}"
)

coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 18, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.55%. Comparing base (0e52c47) to head (0897242).
⚠️ Report is 3 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/api/controllers/ws.py 52.38% 7 Missing and 3 partials ⚠️
src/synthorg/api/app.py 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #556      +/-   ##
==========================================
+ Coverage   92.45%   92.55%   +0.10%     
==========================================
  Files         544      544              
  Lines       26783    26810      +27     
  Branches     2554     2558       +4     
==========================================
+ Hits        24762    24814      +52     
+ Misses       1615     1584      -31     
- Partials      406      412       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Aureliolo and others added 3 commits March 18, 2026 21:20
…#549)

Litestar's DI misidentified the `channels_plugin: ChannelsPlugin`
parameter as a query param, causing a 4500 close before the handler
ran. Uvicorn logged this as 403. Resolve the plugin from `app.plugins`
instead. Additionally restrict auth middleware scopes to HTTP-only,
exclude WS path from rate limiting, add `opt` auth exclusion on the
handler, and add debug logging throughout the WS auth pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-reviewed by 6 agents, 4 findings addressed:
- Rename API_AUTH_BYPASSED to API_AUTH_GUARD_SKIPPED (semantic fix)
- Remove misleading debug log in authenticate_request
- Add test_ws_rate_limit_excludes_ws_path
- Add test_ws_handler_signature_no_channels_plugin_param

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move scope_type to point of use in require_password_changed guard
- Add defensive try/except around plugins.get(ChannelsPlugin)
- Add inline comment explaining opt={"exclude_from_auth": True}
- Clarify Litestar-internal 4500 close code in 3 comments
- Tighten rate limit test to assert exact WS pattern (Gemini finding)
- Assert specific error messages in test_invalid_json/test_unknown_action
- Add test for require_password_changed guard on WS scope
- Add test for ws_handler opt exclude_from_auth marker
- Document HTTP-only middleware scope and WS rate limit exclusion in
  security.md
- Add DI limitation caveat for WS handlers in tech-stack.md
- Add secret.py to CLAUDE.md auth/ package structure
- Add backup endpoints to operations.md API Surface table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/api/controllers/ws.py`:
- Around line 58-63: The debug log in ws.py currently reuses the
API_WS_CONNECTED event constant for pre-connection ticket validation (see the
logger.debug call that passes API_WS_CONNECTED with stage="ticket_check" and
has_ticket), which is misleading; introduce and use a distinct event constant
(e.g., API_WS_AUTH_STAGE or API_WS_TICKET_INVALID) and update the logger.debug
calls in the ticket validation flow to emit that new constant while preserving
the stage and has_ticket/client fields so observability clearly differentiates
auth/pre-connection steps from actual connection events.
- Around line 228-244: Replace the incorrect call to socket.app.plugins.get(...)
(which raises AttributeError in Litestar 2.21.1) with the established pattern
from get_channels_plugin() in channels.py: iterate socket.app.plugins, find the
first plugin where isinstance(plugin, ChannelsPlugin) and assign it to
channels_plugin; if none found, call logger.warning(API_WS_SEND_FAILED,
reason="channels_plugin_not_registered"), await socket.close(code=1011,
reason="Internal error") and return. Ensure you reference ChannelsPlugin and
socket.app.plugins in the replacement so the code no longer relies on a
non-existent .get() method.

In `@tests/unit/api/controllers/test_ws.py`:
- Around line 365-382: The test currently assumes rate limit middleware is at
middleware[2]; instead, locate the rate limit middleware returned by
_build_middleware by identifying the DefineMiddleware instance that carries a
RateLimitConfig (e.g., find the middleware item where isinstance(item,
DefineMiddleware) and "config" in item.kwargs and
isinstance(item.kwargs["config"], RateLimitConfig)), then read rl_config =
that_item.kwargs["config"] and assert the WS path (constructed from
api_config.api_prefix) is in rl_config.exclude; update references to
rl_define_mw and rl_config accordingly so the test no longer relies on a fixed
index.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 524777cf-61c9-4520-a546-8e9d5ca8e8e9

📥 Commits

Reviewing files that changed from the base of the PR and between 6f48217 and a8ea652.

📒 Files selected for processing (10)
  • CLAUDE.md
  • docs/architecture/tech-stack.md
  • docs/design/operations.md
  • docs/security.md
  • src/synthorg/api/app.py
  • src/synthorg/api/auth/controller.py
  • src/synthorg/api/auth/middleware.py
  • src/synthorg/api/controllers/ws.py
  • src/synthorg/observability/events/api.py
  • tests/unit/api/controllers/test_ws.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Web
  • GitHub Check: Build Backend
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) — PEP 758 except syntax enforced by ruff for Python 3.14.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Line length: 88 characters (ruff).
Functions: < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).

Files:

  • src/synthorg/api/auth/controller.py
  • src/synthorg/api/app.py
  • src/synthorg/observability/events/api.py
  • tests/unit/api/controllers/test_ws.py
  • src/synthorg/api/auth/middleware.py
  • src/synthorg/api/controllers/ws.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Never mix static config fields with mutable runtime fields in one model. Use frozen Pydantic models for config/identity; use separate mutable-via-copy models for runtime state.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing + validating redundant fields.
Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code.
Variable name: always logger (not _logger, not log).
Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events..
Structured kwargs: always logger.info(EVENT, key=value) — never logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG for object creation, internal flow, entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.
All...

Files:

  • src/synthorg/api/auth/controller.py
  • src/synthorg/api/app.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/api/auth/middleware.py
  • src/synthorg/api/controllers/ws.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Async testing: asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.
Tests must use test-provider, test-small-001, etc. instead of real vendor names.
Property-based testing in Python uses Hypothesis (@given + @settings). Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins.

Files:

  • tests/unit/api/controllers/test_ws.py
🧠 Learnings (7)
📚 Learning: 2026-03-18T20:21:08.353Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to src/synthorg/**/*.py : Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T20:21:08.353Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to src/synthorg/**/*.py : Pure data models, enums, and re-exports do NOT need logging.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T20:21:08.353Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
  • src/synthorg/api/auth/controller.py
📚 Learning: 2026-03-18T20:21:08.353Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to src/synthorg/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T20:21:08.353Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.

Applied to files:

  • src/synthorg/api/auth/controller.py
  • src/synthorg/observability/events/api.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

  • src/synthorg/api/auth/controller.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/api/auth/controller.py
  • src/synthorg/observability/events/api.py
🧬 Code graph analysis (1)
tests/unit/api/controllers/test_ws.py (5)
src/synthorg/api/auth/models.py (2)
  • AuthenticatedUser (69-88)
  • AuthMethod (11-16)
src/synthorg/api/auth/ticket_store.py (1)
  • create (84-115)
src/synthorg/api/app.py (1)
  • _build_middleware (560-598)
src/synthorg/api/controllers/ws.py (1)
  • ws_handler (196-255)
src/synthorg/api/auth/controller.py (1)
  • require_password_changed (176-227)
🔇 Additional comments (21)
docs/design/operations.md (1)

999-999: LGTM!

The new API endpoint row for /api/v1/admin/backups is consistent with the table format and accurately reflects the backup REST API documented in the Backup and Restore section below.

docs/architecture/tech-stack.md (1)

93-93: LGTM!

The caveat accurately documents the Litestar DI limitation for WebSocket handlers and provides the correct workaround (app.plugins.get(PluginClass)). The issue reference (#549) adds helpful traceability.

CLAUDE.md (1)

118-118: LGTM!

The updated auth/ subpackage description accurately reflects its contents by adding "secret resolution" to the existing list of components.

src/synthorg/observability/events/api.py (1)

37-37: LGTM!

The new API_AUTH_GUARD_SKIPPED constant follows the established naming convention and is correctly typed with Final[str]. The value "api.auth.guard_skipped" semantically fits within the auth event namespace.

docs/security.md (1)

75-76: LGTM!

The documentation accurately describes the WebSocket authentication model: HTTP-only middleware scope with handler-level ticket validation, and rate limiting exclusion for persistent connections. This aligns with the implementation changes.

src/synthorg/api/app.py (2)

563-574: LGTM!

The WebSocket path exclusion from rate limiting is correctly implemented:

  • The regex pattern ^{prefix}/ws$ properly anchors both start and end for exact matching
  • The duplicate check (if ws_path not in rl_exclude) prevents redundant entries
  • The comment accurately explains the rationale (HTTP-style rate limiting is inappropriate for persistent WebSocket connections)

587-592: LGTM!

The auth middleware exclusion for WebSocket path is correctly implemented:

  • The comment clearly explains that the WS handler performs its own ticket-based auth
  • The tuple extension pattern (*exclude_paths, ws_path) properly preserves immutability
  • Using model_copy(update=...) follows the Pydantic v2 conventions per coding guidelines
src/synthorg/api/auth/controller.py (2)

24-24: LGTM!

The import of API_AUTH_GUARD_SKIPPED from synthorg.observability.events.api follows the coding guidelines for event constant imports.


197-216: LGTM!

The guard-skipped logging additions are well-implemented:

  • Uses the correct API_AUTH_GUARD_SKIPPED event constant
  • Structured kwargs (guard=, path=, reason=, scope_type=) follow coding guidelines
  • DEBUG level is appropriate for internal flow per guidelines ("DEBUG for object creation, internal flow, entry/exit of key functions")
  • The comment at lines 206-208 clearly explains the expected WebSocket upgrade scenario
src/synthorg/api/auth/middleware.py (2)

9-9: LGTM!

The ScopeType import is correctly placed with other litestar imports and is required for the HTTP-only scope restriction.


318-340: LGTM!

The HTTP-only scope restriction is correctly implemented:

  • The updated docstring clearly explains that WebSocket connections use ticket-based auth in the WS handler
  • Passing scopes={ScopeType.HTTP} to super().__init__ correctly restricts the middleware to HTTP requests only, bypassing it for WebSocket connections
  • This aligns with Litestar 2.21.1's AbstractAuthenticationMiddleware API, where scopes parameter accepts a set of scope types (HTTP and/or WebSocket) to control which connection types trigger the middleware
src/synthorg/api/controllers/ws.py (3)

74-78: LGTM!

Good addition of client context to the warning log—this will aid debugging of ticket validation failures in production.


102-107: LGTM!

Debug logging for role checks provides useful observability for troubleshooting access control issues.


192-198: LGTM!

The defense-in-depth approach with opt={"exclude_from_auth": True} combined with removing channels_plugin from the signature correctly addresses the root cause of #549. The layered safeguards (HTTP-only scope, regex exclusion, opt flag) provide robust protection against future regressions.

tests/unit/api/controllers/test_ws.py (7)

52-63: LGTM!

Assertions correctly verify error messages for invalid JSON and unknown action cases.


284-302: LGTM!

Excellent regression test. Asserting the exact close code (4001) rather than just WebSocketDisconnect ensures future regressions that cause middleware/DI failures (4500) will be caught. The docstring clearly explains the rationale.


304-316: LGTM!

Consistent close code assertion for invalid ticket scenario.


318-343: LGTM!

Comprehensive round-trip test covering the happy path: ticket creation, WS connection, subscription flow. This validates the entire fix for #549.


345-363: LGTM!

Important regression guard ensuring auth middleware stays HTTP-only scoped. The # type: ignore comments acknowledge the internal attribute access, which is acceptable for regression tests that need to verify implementation details.


384-401: LGTM!

Critical regression guard that directly tests the root cause fix for #549. Using inspect.signature to verify the handler doesn't declare channels_plugin as a parameter is a robust approach.


403-435: LGTM!

Good defense-in-depth coverage:

  • test_ws_handler_opt_exclude_from_auth validates the tertiary safeguard marker
  • test_ws_guard_passes_without_user_in_scope confirms the guard correctly handles WS upgrade requests where no user is in scope

- Add API_WS_AUTH_STAGE event constant for pre-connection auth debug
  logs -- stops misleading reuse of API_WS_CONNECTED for ticket
  validation and role check stages
- Replace socket.app.plugins.get(ChannelsPlugin) with iteration
  pattern matching get_channels_plugin() in channels.py -- the .get()
  method does not exist on Litestar's plugin registry
- Find rate limit middleware by type inspection (DefineMiddleware +
  RateLimitConfig isinstance check) instead of hardcoded index

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/synthorg/api/controllers/ws.py (1)

218-247: ⚠️ Potential issue | 🟡 Minor

Delay accept() and API_WS_CONNECTED until channel subscription is ready.

Right now the handler accepts the socket and logs a successful connection before it verifies ChannelsPlugin is available and before subscribe() succeeds. On the channels_plugin is None path, that produces a false-positive connected event and a briefly successful upgrade just before a 1011 close.

♻️ Suggested ordering change
-    socket.scope["user"] = user
-    await socket.accept()
-    logger.info(
-        API_WS_CONNECTED,
-        client=str(socket.client),
-        user_id=user.user_id,
-    )
+    socket.scope["user"] = user

     subscribed: set[str] = set()
     filters: dict[str, dict[str, str]] = {}
@@
     if channels_plugin is None:
         logger.warning(
             API_WS_SEND_FAILED,
             reason="channels_plugin_not_registered",
         )
         await socket.close(code=1011, reason="Internal error")
         return
     subscriber = await channels_plugin.subscribe(list(ALL_CHANNELS))
+
+    await socket.accept()
+    logger.info(
+        API_WS_CONNECTED,
+        client=str(socket.client),
+        user_id=user.user_id,
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/api/controllers/ws.py` around lines 218 - 247, Move the
WebSocket accept and the API_WS_CONNECTED log until after ChannelsPlugin
resolution and successful subscription: do not call socket.accept() or
logger.info(API_WS_CONNECTED, ...) before you iterate socket.app.plugins to find
ChannelsPlugin and await channels_plugin.subscribe(list(ALL_CHANNELS)); if
channels_plugin is None or subscribe() fails, log the failure
(API_WS_SEND_FAILED) and close the socket with 1011 without having accepted the
connection; once subscribe() succeeds, then set socket.scope["user"], call
socket.accept(), and emit the API_WS_CONNECTED log.
tests/unit/api/controllers/test_ws.py (1)

284-316: 🧹 Nitpick | 🔵 Trivial

Parameterize the two ticket-rejection cases.

These tests now only differ by the URL under test, so folding them into one parametrized case will keep the close-code regression coverage easier to extend and maintain.

♻️ Suggested refactor
-    def test_ws_rejects_missing_ticket(
-        self,
-        test_client: TestClient[Any],
-    ) -> None:
-        """WS connection without ?ticket= is rejected with code 4001.
-
-        Verifying the close code (not just WebSocketDisconnect) ensures
-        the rejection comes from the handler's ticket validation -- not
-        from a middleware or DI failure (which would produce a different
-        close code, such as Litestar's internal 4500).
-        """
-        from litestar.exceptions import WebSocketDisconnect
-
-        with (
-            pytest.raises(WebSocketDisconnect) as exc_info,
-            test_client.websocket_connect("/api/v1/ws"),
-        ):
-            pass
-        assert exc_info.value.code == _WS_CLOSE_AUTH_FAILED
-
-    def test_ws_rejects_invalid_ticket(
-        self,
-        test_client: TestClient[Any],
-    ) -> None:
-        """WS connection with a bogus ticket is rejected with code 4001."""
-        from litestar.exceptions import WebSocketDisconnect
-
-        with (
-            pytest.raises(WebSocketDisconnect) as exc_info,
-            test_client.websocket_connect("/api/v1/ws?ticket=bogus-ticket"),
-        ):
-            pass
-        assert exc_info.value.code == _WS_CLOSE_AUTH_FAILED
+    `@pytest.mark.parametrize`(
+        "path",
+        [
+            "/api/v1/ws",
+            "/api/v1/ws?ticket=bogus-ticket",
+        ],
+        ids=["missing_ticket", "invalid_ticket"],
+    )
+    def test_ws_rejects_bad_ticket(
+        self,
+        test_client: TestClient[Any],
+        path: str,
+    ) -> None:
+        """WS connection with a missing or invalid ticket is rejected with code 4001."""
+        from litestar.exceptions import WebSocketDisconnect
+
+        with (
+            pytest.raises(WebSocketDisconnect) as exc_info,
+            test_client.websocket_connect(path),
+        ):
+            pass
+        assert exc_info.value.code == _WS_CLOSE_AUTH_FAILED

As per coding guidelines, "Prefer @pytest.mark.parametrize for testing similar cases."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/api/controllers/test_ws.py` around lines 284 - 316, Replace the
two nearly identical tests test_ws_rejects_missing_ticket and
test_ws_rejects_invalid_ticket with a single parametrized test using
pytest.mark.parametrize that iterates over the two connection URLs (e.g.,
"/api/v1/ws" and "/api/v1/ws?ticket=bogus-ticket"); inside the test use
TestClient.websocket_connect and pytest.raises(WebSocketDisconnect) to assert
exc_info.value.code == _WS_CLOSE_AUTH_FAILED. Keep references to
TestClient[Any], WebSocketDisconnect, and the _WS_CLOSE_AUTH_FAILED constant so
the behavior and assertion remain identical to the original tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/synthorg/api/controllers/ws.py`:
- Around line 218-247: Move the WebSocket accept and the API_WS_CONNECTED log
until after ChannelsPlugin resolution and successful subscription: do not call
socket.accept() or logger.info(API_WS_CONNECTED, ...) before you iterate
socket.app.plugins to find ChannelsPlugin and await
channels_plugin.subscribe(list(ALL_CHANNELS)); if channels_plugin is None or
subscribe() fails, log the failure (API_WS_SEND_FAILED) and close the socket
with 1011 without having accepted the connection; once subscribe() succeeds,
then set socket.scope["user"], call socket.accept(), and emit the
API_WS_CONNECTED log.

In `@tests/unit/api/controllers/test_ws.py`:
- Around line 284-316: Replace the two nearly identical tests
test_ws_rejects_missing_ticket and test_ws_rejects_invalid_ticket with a single
parametrized test using pytest.mark.parametrize that iterates over the two
connection URLs (e.g., "/api/v1/ws" and "/api/v1/ws?ticket=bogus-ticket");
inside the test use TestClient.websocket_connect and
pytest.raises(WebSocketDisconnect) to assert exc_info.value.code ==
_WS_CLOSE_AUTH_FAILED. Keep references to TestClient[Any], WebSocketDisconnect,
and the _WS_CLOSE_AUTH_FAILED constant so the behavior and assertion remain
identical to the original tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9ab6f086-4bf2-4eb4-b096-6347a7591f65

📥 Commits

Reviewing files that changed from the base of the PR and between a8ea652 and 47d5345.

📒 Files selected for processing (3)
  • src/synthorg/api/controllers/ws.py
  • src/synthorg/observability/events/api.py
  • tests/unit/api/controllers/test_ws.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Web
  • GitHub Check: Build Backend
  • GitHub Check: Build Sandbox
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) — PEP 758 except syntax enforced by ruff for Python 3.14.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Line length: 88 characters (ruff).
Functions: < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).

Files:

  • src/synthorg/api/controllers/ws.py
  • tests/unit/api/controllers/test_ws.py
  • src/synthorg/observability/events/api.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Never mix static config fields with mutable runtime fields in one model. Use frozen Pydantic models for config/identity; use separate mutable-via-copy models for runtime state.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing + validating redundant fields.
Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code.
Variable name: always logger (not _logger, not log).
Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events..
Structured kwargs: always logger.info(EVENT, key=value) — never logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG for object creation, internal flow, entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.
All...

Files:

  • src/synthorg/api/controllers/ws.py
  • src/synthorg/observability/events/api.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Async testing: asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.
Tests must use test-provider, test-small-001, etc. instead of real vendor names.
Property-based testing in Python uses Hypothesis (@given + @settings). Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins.

Files:

  • tests/unit/api/controllers/test_ws.py
🧠 Learnings (2)
📚 Learning: 2026-03-18T20:21:08.353Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.

Applied to files:

  • src/synthorg/api/controllers/ws.py
  • src/synthorg/observability/events/api.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/api.py
🧬 Code graph analysis (1)
tests/unit/api/controllers/test_ws.py (5)
src/synthorg/api/auth/models.py (2)
  • AuthenticatedUser (69-88)
  • AuthMethod (11-16)
src/synthorg/api/auth/ticket_store.py (1)
  • create (84-115)
src/synthorg/api/app.py (1)
  • _build_middleware (560-598)
src/synthorg/api/controllers/ws.py (1)
  • ws_handler (197-257)
src/synthorg/api/auth/controller.py (1)
  • require_password_changed (176-227)
🔇 Additional comments (5)
src/synthorg/observability/events/api.py (1)

37-60: Dedicated event constants fit the new WS/auth instrumentation.

API_AUTH_GUARD_SKIPPED and API_WS_AUTH_STAGE keep the new logs on named API events instead of ad-hoc strings, which will make filtering and dashboards much cleaner.

Based on learnings, "Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.`."

src/synthorg/api/controllers/ws.py (2)

59-108: The new pre-connection DEBUG stages are well-scoped.

Using API_WS_AUTH_STAGE for ticket_check, ticket_valid, and role_check keeps flow tracing separate from terminal events like API_WS_CONNECTED and API_WS_TICKET_INVALID.

Based on learnings, "Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.`."


193-196: Good defense-in-depth on the WS route.

The explicit exclude_from_auth opt makes the handler’s auth model obvious even with HTTP-only middleware scopes and regex path exclusion upstream.

tests/unit/api/controllers/test_ws.py (2)

47-63: Exact parser error assertions are a good tightening.

These checks lock _handle_message() to its user-facing error contract instead of only asserting that some error payload came back.


318-452: Nice regression net around the WS 403 fix.

The valid-ticket round-trip plus the middleware-scope, rate-limit, handler-signature, decorator-opt, and guard-pass-through checks cover the exact integration edges that caused #549.

coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 18, 2026
…ize tests

- Move socket.accept() after ChannelsPlugin resolution and subscribe()
  so infrastructure failures never accept-then-immediately-close
- Merge test_ws_rejects_missing_ticket and test_ws_rejects_invalid_ticket
  into a single parametrized test_ws_rejects_bad_ticket

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/api/controllers/ws.py (1)

237-248: ⚠️ Potential issue | 🟡 Minor

Add explicit error logging if channels_plugin.subscribe() fails.

The subscribe() call on line 237 correctly precedes socket.accept() to avoid the accept-then-close pattern on infrastructure failures. However, if subscribe() raises an exception, it propagates without logging. This violates the "handle errors explicitly" guideline. Add a try/except around line 237 to log at WARNING or ERROR with context (e.g., reason code) before closing the socket, consistent with the error handling pattern used for channels_plugin is None at lines 30–36.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/api/controllers/ws.py` around lines 237 - 248, Wrap the call to
channels_plugin.subscribe(list(ALL_CHANNELS)) in a try/except that logs the
failure (using logger at WARNING or ERROR) with context (e.g., reason code,
client/socket info and user_id) before closing the socket; on exception, do not
call socket.accept(), log the error similarly to the existing channels_plugin is
None handling, close the socket (e.g., await socket.close()) and return/raise as
appropriate so the subsequent lines that set socket.scope["user"] and call
socket.accept() / logger.info (API_WS_CONNECTED) only run on success.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/api/controllers/test_ws.py`:
- Around line 432-450: Update the test
test_ws_guard_passes_without_user_in_scope to make the "no exception"
expectation explicit: call require_password_changed(connection, MagicMock()) and
capture its return (e.g. result = require_password_changed(...)) then add an
assertion that the result is None (or another explicit assertion like assert
result is None) so it's clear the guard passes through for WS connections;
reference the test function name and the require_password_changed function when
making the change.

---

Outside diff comments:
In `@src/synthorg/api/controllers/ws.py`:
- Around line 237-248: Wrap the call to
channels_plugin.subscribe(list(ALL_CHANNELS)) in a try/except that logs the
failure (using logger at WARNING or ERROR) with context (e.g., reason code,
client/socket info and user_id) before closing the socket; on exception, do not
call socket.accept(), log the error similarly to the existing channels_plugin is
None handling, close the socket (e.g., await socket.close()) and return/raise as
appropriate so the subsequent lines that set socket.scope["user"] and call
socket.accept() / logger.info (API_WS_CONNECTED) only run on success.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1836e564-ab5d-451c-a004-c7656f9b0d6d

📥 Commits

Reviewing files that changed from the base of the PR and between 47d5345 and 5577644.

📒 Files selected for processing (2)
  • src/synthorg/api/controllers/ws.py
  • tests/unit/api/controllers/test_ws.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Web
  • GitHub Check: Build Backend
  • GitHub Check: Build Sandbox
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) — PEP 758 except syntax enforced by ruff for Python 3.14.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Line length: 88 characters (ruff).
Functions: < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).

Files:

  • src/synthorg/api/controllers/ws.py
  • tests/unit/api/controllers/test_ws.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Never mix static config fields with mutable runtime fields in one model. Use frozen Pydantic models for config/identity; use separate mutable-via-copy models for runtime state.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing + validating redundant fields.
Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code.
Variable name: always logger (not _logger, not log).
Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events..
Structured kwargs: always logger.info(EVENT, key=value) — never logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG for object creation, internal flow, entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.
All...

Files:

  • src/synthorg/api/controllers/ws.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Async testing: asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.
Tests must use test-provider, test-small-001, etc. instead of real vendor names.
Property-based testing in Python uses Hypothesis (@given + @settings). Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins.

Files:

  • tests/unit/api/controllers/test_ws.py
🧠 Learnings (1)
📚 Learning: 2026-03-18T20:21:08.353Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.

Applied to files:

  • src/synthorg/api/controllers/ws.py
🧬 Code graph analysis (1)
tests/unit/api/controllers/test_ws.py (6)
src/synthorg/api/auth/models.py (2)
  • AuthenticatedUser (69-88)
  • AuthMethod (11-16)
src/synthorg/api/auth/ticket_store.py (1)
  • create (84-115)
src/synthorg/api/app.py (1)
  • _build_middleware (560-598)
src/synthorg/communication/config.py (1)
  • RateLimitConfig (192-213)
src/synthorg/api/controllers/ws.py (1)
  • ws_handler (197-261)
src/synthorg/api/auth/controller.py (1)
  • require_password_changed (176-227)
🔇 Additional comments (10)
src/synthorg/api/controllers/ws.py (4)

17-36: LGTM!

The import of ChannelsPlugin is correctly placed for runtime resolution, and API_WS_AUTH_STAGE is properly imported from the domain-specific events module as required by the coding guidelines.


59-64: LGTM!

The debug logging for authentication stages uses the correct event constant API_WS_AUTH_STAGE with structured kwargs, properly addressing the previous review feedback about using a distinct event for pre-connection validation stages.


193-196: LGTM!

The opt={"exclude_from_auth": True} provides defense-in-depth authentication bypass, and the comment clearly explains its role alongside the HTTP-only scope restriction and regex path exclusion.


218-236: LGTM!

The plugin resolution correctly uses the iteration pattern established in get_channels_plugin(), avoiding the non-existent .get() method on Litestar's PluginRegistry. The error handling with warning log and graceful 1011 close is appropriate.

tests/unit/api/controllers/test_ws.py (6)

52-63: LGTM!

The assertions now use exact error message comparisons ("Invalid JSON", "Unknown action") which provides stronger validation than checking for "error" in data.


284-314: LGTM!

Good use of @pytest.mark.parametrize to consolidate missing and invalid ticket scenarios. The close code assertion (_WS_CLOSE_AUTH_FAILED) is crucial for ensuring rejections come from the handler's ticket validation rather than middleware/DI failures.


316-341: LGTM!

Comprehensive round-trip test that validates the happy path: ticket creation → WebSocket connection → subscription → response verification. The test properly uses AuthMethod.WS_TICKET to match the expected auth method.


343-361: LGTM!

Important regression test ensuring the auth middleware is scoped to HTTP-only, preventing it from interfering with WebSocket upgrades.


363-397: LGTM!

The rate limit middleware is now located by type checking (DefineMiddleware with LitestarRateLimitConfig) rather than relying on a fragile index position, addressing the previous review feedback.


399-416: LGTM!

Excellent regression guard that prevents reintroduction of the channels_plugin parameter which caused the original DI misidentification bug (#549). Using inspect.signature on ws_handler.fn is the correct approach.

Comment on lines +432 to +450
def test_ws_guard_passes_without_user_in_scope(self) -> None:
"""require_password_changed guard must pass for WS connections.

The auth middleware is HTTP-only, so WebSocket upgrade requests
arrive at the guard with no user in scope. The guard must
return without raising PermissionDeniedException. The actual
WS auth happens via ticket validation inside the handler.
"""
from unittest.mock import MagicMock

from synthorg.api.auth.controller import require_password_changed

connection = MagicMock()
connection.url.path = "/api/v1/ws"
connection.scope = {"type": "websocket"}

# Should not raise -- the guard passes through when user is
# absent, which is the expected state for WS connections.
require_password_changed(connection, MagicMock())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider making the pass-through assertion explicit.

The test calls require_password_changed(connection, MagicMock()) and expects it to not raise, but there's no explicit assertion. While the test will fail if an exception is raised, consider adding a comment or using a pattern that makes the "no exception" expectation clearer:

♻️ Optional: Make the expectation explicit
         # Should not raise -- the guard passes through when user is
         # absent, which is the expected state for WS connections.
-        require_password_changed(connection, MagicMock())
+        # If this raises PermissionDeniedException, the test fails.
+        result = require_password_changed(connection, MagicMock())
+        assert result is None  # Guard returns None on pass-through

Alternatively, if the guard doesn't return a meaningful value, the current implicit assertion is acceptable—the comment already explains the expectation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/api/controllers/test_ws.py` around lines 432 - 450, Update the
test test_ws_guard_passes_without_user_in_scope to make the "no exception"
expectation explicit: call require_password_changed(connection, MagicMock()) and
capture its return (e.g. result = require_password_changed(...)) then add an
assertion that the result is None (or another explicit assertion like assert
result is None) so it's clear the guard passes through for WS connections;
reference the test function name and the require_password_changed function when
making the change.

Aureliolo and others added 2 commits March 18, 2026 22:24
- Wrap channels_plugin.subscribe() in try/except so a subscription
  failure logs context and closes without accepting
- Make test_ws_guard_passes_without_user_in_scope assertion explicit
  (assert result is None instead of relying on no-exception)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
require_password_changed is typed as returning None -- cannot
capture its return value.  Use implicit no-exception assertion
with explicit comment instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Aureliolo Aureliolo merged commit 60453d2 into main Mar 18, 2026
15 of 17 checks passed
@Aureliolo Aureliolo deleted the fix/websocket-403-bug branch March 18, 2026 21:34
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 18, 2026 21:34 — with GitHub Actions Inactive
Aureliolo added a commit that referenced this pull request Mar 18, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.3.5](v0.3.4...v0.3.5)
(2026-03-18)


### Features

* **api:** auto-wire backend services at startup
([#555](#555))
([0e52c47](0e52c47))


### Bug Fixes

* **api:** resolve WebSocket 403 rejection
([#549](#549))
([#556](#556))
([60453d2](60453d2))
* **cli:** verify SLSA provenance via GitHub attestation API
([#548](#548))
([91d4f79](91d4f79)),
closes [#532](#532)


### Performance

* **test:** speed up test suite -- reduce Hypothesis examples and
eliminate real sleeps
([#557](#557))
([d5f3a41](d5f3a41))


### Refactoring

* replace _ErrorResponseSpec NamedTuple with TypedDict
([#554](#554))
([71cc6e1](71cc6e1))


### Maintenance

* **docker:** suppress pydantic v1 warning on Python 3.14
([#552](#552))
([cbe1f05](cbe1f05)),
closes [#551](#551)

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: WebSocket connections rejected with 403 despite correct auth exclude path

1 participant