[BUG][TESTING]: Playwright UI tests flaky due to shared login state and HTMX sync races #3105
Description
Bug Summary
Several Playwright E2E tests fail intermittently when run together but pass in isolation. Root cause analysis identifies two distinct mechanisms: shared mutable login state across test fixtures, and hard-coded sleep-based HTMX synchronization that breaks under server load. A third factor, the multi-worker docker-compose topology, amplifies the first.
Related: #3099 covers broader Playwright stability patterns. This issue provides specific root cause identification and targeted fixes for the login state and HTMX sync mechanisms.
Classification: Test infrastructure issue, not a code bug. The gateway code correctly validates JWTs (stateless) on all workers. The flakiness originates entirely from test fixtures and synchronization patterns.
Affected Tests
| Test | Failure mode | Isolated | Together |
|---|---|---|---|
| `test_edit_user` | Timeout waiting for `.user-card` after create | PASS | FAIL |
| `test_force_password_change` | Timeout on `page.goto("/admin")` in fixture | PASS | FAIL |
| `test_form_validation_feedback` | Timeout on `wait_for_selector` (60s) | PASS | FAIL |
| `test_all_tabs_navigation[logs]` | Login redirect; URL assertion fails | PASS | FAIL |
| `test_token_lifecycle` (6 tests) | 401 Invalid authentication credentials | PASS | FAIL |
| `test_tool_form_submission` | 401 on POST `/admin/tools`; login state lost | PASS | FAIL |
| `test_tab_content_loading_via_javascript` | Login redirect; fixture `_ensure_admin_logged_in` fails | PASS | FAIL |
Affected Component
- mcpgateway UI (admin panel)
- Other: Playwright test infrastructure (`tests/playwright/conftest.py`, page objects)
Root Cause 1: Shared mutable login state (ADMIN_ACTIVE_PASSWORD)
Files: tests/playwright/conftest.py — _ensure_admin_logged_in(), ADMIN_ACTIVE_PASSWORD
The _ensure_admin_logged_in fixture has a multi-step login flow (form login → password change → retry → JWT cookie fallback). ADMIN_ACTIVE_PASSWORD is a module-level mutable list shared across all tests in a session. When tests run together:
- Test A's login triggers `submit_password_change`, updating `ADMIN_ACTIVE_PASSWORD[0]`
- Test B's fixture runs concurrently or afterwards and reads stale or already-changed password state
- The fallback chain usually recovers, but under load `page.goto("/admin")` itself times out (60s), or the login POST returns 401 before the JWT cookie fallback can kick in
This explains:
- `test_force_password_change` and `test_all_tabs_navigation[logs]`: fixture-level timeout/redirect during `_ensure_admin_logged_in`
- `test_token_lifecycle` 401s: the `admin_api` fixture in `security/conftest.py` generates a JWT at fixture creation time, but if the server's auth state is mid-transition from a concurrent UI test's password change, the token may be rejected
- `test_tool_form_submission`: the POST to `/admin/tools` returns 401 because the JWT cookie wasn't properly set or expired between test runs
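The stale-read half of this race can be replayed deterministically. The sketch below is a simulation, not the real `conftest.py`: `FakeGateway` and `simulate_interleaving` are hypothetical names, and it assumes Test B's fixture captures the password before Test A's password change lands. The actual fixture's retry and JWT-cookie fallback may recover in many interleavings; the point is only that any value read from the shared list can be invalidated by another test at any time.

```python
class FakeGateway:
    """Hypothetical stand-in for the server: only the current admin password is accepted."""

    def __init__(self, password: str) -> None:
        self.password = password

    def login(self, password: str) -> int:
        # Mimic the HTTP status of a form login attempt.
        return 200 if password == self.password else 401

    def change_password(self, new_password: str) -> None:
        self.password = new_password


def simulate_interleaving(active_password: list[str]) -> tuple[int, int]:
    """Replay one bad ordering; active_password plays the role of ADMIN_ACTIVE_PASSWORD."""
    server = FakeGateway(active_password[0])

    # Test B's fixture snapshots the shared password during setup...
    captured_by_b = active_password[0]

    # ...then Test A's login forces a password change and updates the shared list.
    server.change_password("new-secret")
    active_password[0] = "new-secret"

    # Test B logs in with its stale snapshot and is rejected.
    stale_status = server.login(captured_by_b)
    # Re-reading the shared list at login time succeeds in this ordering only.
    fresh_status = server.login(active_password[0])
    return stale_status, fresh_status
```

Re-reading the list merely narrows the window; with truly concurrent tests the change can still land between the read and the POST, which is why removing the shared password state altogether is the more robust fix.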
Fix: Each fixture should use a fresh JWT cookie directly (as security/conftest.py:_make_jwt already does) instead of the multi-step form login. This eliminates shared password state entirely.
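A minimal sketch of that approach using only the standard library, in case a stand-alone illustration helps. In the repository, `_make_jwt` in `security/conftest.py` should be reused instead; its signature and required claims are not shown here, so the claim set below (`sub`, `iat`, `exp`), the cookie name `jwt_token`, and the helper names `make_admin_jwt` / `admin_cookie` are all hypothetical. The cookie dict follows the shape accepted by Playwright's `BrowserContext.add_cookies()` (a `url` key instead of `domain` + `path`).

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(raw: bytes) -> str:
    # JWTs use unpadded base64url encoding (RFC 7515).
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_admin_jwt(secret: str, email: str, ttl_seconds: int = 3600) -> str:
    """Mint an HS256 JWT with the standard library only.

    Hypothetical claim set: the gateway may require additional claims
    (audience, issuer, admin flags) that _make_jwt already handles.
    """
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": email, "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def admin_cookie(base_url: str, token: str, cookie_name: str = "jwt_token") -> dict:
    """Cookie dict for BrowserContext.add_cookies(); cookie_name is a placeholder."""
    return {"name": cookie_name, "value": token, "url": base_url, "httpOnly": True}
```

In a function-scoped fixture this would become `context.add_cookies([admin_cookie(base_url, make_admin_jwt(secret, admin_email))])` followed by `page.goto("/admin")`, where `secret` must be the gateway's actual JWT signing key (how the tests obtain it is an assumption). Each test mints its own token, so there is no password to share, no form POST to race, and no dependence on which worker served an earlier login.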
Root Cause 2: Hard-coded HTMX sleep in reload_and_navigate_to_users
Files: tests/playwright/pages/users_page.py:183, tests/playwright/entities/test_users.py:38-41
```python
# users_page.py:183
def reload_and_navigate_to_users(self):
    self.page.wait_for_timeout(4000)  # <-- hard sleep
    self.page.wait_for_load_state("domcontentloaded")
    self.page.reload(wait_until="domcontentloaded")
    self.sidebar.click_users_tab()
    self.wait_for_users_loaded()
```

The 4-second hard sleep is the fragile synchronization point. When the server is under load from other tests, the HTMX partial response from user creation may not have completed before the reload fires. The subsequent `wait_for_selector(".user-card:has-text('email')")` (30s timeout) then fails because the page reload happened before the server committed the new user.
This explains test_edit_user — user creation POST returns 200, but the user card isn't visible after reload because the DB write hadn't flushed before the reload.
Fix: Replace wait_for_timeout(4000) with page.wait_for_load_state("networkidle") or a retry loop on the user card selector itself, so synchronization adapts to actual server speed.
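A sketch of the retry-loop variant, written against plain callables so the control flow can be checked without a browser. `wait_for_card_with_reload` is a hypothetical helper, and the attempt/poll defaults are guesses to tune. Playwright's documentation discourages `networkidle` for tests, and an idle network after reload still does not guarantee the user row was committed before the reload's request; polling for the card itself targets the condition the test actually needs. An auto-retrying `expect(locator).to_be_visible()` alone would not help either, because the new card only appears after a fresh reload.

```python
import time
from collections.abc import Callable


def wait_for_card_with_reload(
    is_visible: Callable[[], bool],
    reload: Callable[[], None],
    attempts: int = 5,
    poll_interval: float = 0.5,
    polls_per_attempt: int = 10,
) -> int:
    """Reload until the card shows up; return how many reloads it took.

    Each attempt reloads once, then polls is_visible() a few times before
    reloading again. Raises TimeoutError if the card never appears.
    """
    for attempt in range(attempts):
        reload()
        for _ in range(polls_per_attempt):
            if is_visible():
                return attempt + 1
            time.sleep(poll_interval)
    raise TimeoutError(f"user card not visible after {attempts} reloads")
```

Inside the page object, `is_visible` could be `lambda: self.page.locator(f".user-card:has-text('{email}')").is_visible()` (`Locator.is_visible()` checks immediately without waiting) and `reload` a small function that calls `self.page.reload(wait_until="domcontentloaded")` and then `self.sidebar.click_users_tab()`. Total wait then scales with how long the server actually takes, instead of a fixed 4 seconds.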
Root Cause 3: Multi-worker environment amplifies login flakiness
Files: tests/playwright/conftest.py — _ensure_admin_logged_in(), _set_admin_jwt_cookie()
In the docker-compose environment (3 gateway workers behind nginx), the login flakiness is amplified:
- The `_ensure_admin_logged_in` fixture performs a form login against one gateway worker
- Subsequent requests (e.g., POST `/admin/tools`) may be routed by nginx to a different gateway worker
- While JWTs are stateless and should work across workers, the multi-step login flow's fallback chain and cookie-setting timing become more fragile with load balancing
Evidence from 10-iteration stress test:
- `test_tool_form_submission` failed 1/10 runs with 401 (not 500 as initially suspected)
- The failure is identical to Root Cause 1: the JWT cookie wasn't properly established before the tool creation POST
Note: A "Tool creation failed" message in the gateway-1 structured logs was initially misattributed to this Playwright test. Investigation confirmed it was from a separate API-level duplicate-name test (test-api-tool-dup-*) that correctly returned 409 via the IntegrityError handler. The structured logger's generic "Tool creation failed" message without the exception class made triage harder.
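As an illustration of the triage point, a hypothetical helper (not the gateway's actual logging call) that puts the exception class into the message text, so an `IntegrityError`-driven 409 is distinguishable from an auth failure at a glance. It uses standard library `logging`; a structured logger could carry the class name as a separate field instead.

```python
import logging


def log_tool_creation_failure(logger: logging.Logger, tool_name: str, exc: Exception) -> None:
    """Hypothetical helper: include the exception class name in the log line."""
    logger.error(
        "Tool creation failed for %r: %s: %s",
        tool_name,
        type(exc).__name__,
        exc,
    )
```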
Steps to Reproduce
```bash
# All pass individually:
TEST_BASE_URL=http://localhost:8080 pytest tests/playwright/entities/test_users.py::TestUsersCRUD::test_edit_user -v --browser chromium
TEST_BASE_URL=http://localhost:8080 pytest tests/playwright/test_htmx_interactions.py::TestHTMXInteractions::test_tool_form_submission -v --browser chromium

# Fails intermittently when run together:
TEST_BASE_URL=http://localhost:8080 pytest tests/playwright/entities/test_users.py tests/playwright/test_htmx_interactions.py tests/playwright/security/test_token_lifecycle.py -v --browser chromium

# Rapid sequential runs also trigger it:
for i in $(seq 1 10); do pytest tests/playwright/test_htmx_interactions.py::TestHTMXInteractions::test_tool_form_submission -x -v; done
```

Expected Behavior
All Playwright E2E tests should pass reliably when run together, not just in isolation.
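The rapid-sequential bash loop prints each run but does not tally results. A small harness along these lines (hypothetical, not part of the repository) counts failing runs, which makes a figure like the 1/10 rate from the stress test easy to re-check before and after a fix.

```python
import subprocess
from collections.abc import Sequence


def count_failures(cmd: Sequence[str], runs: int = 10) -> int:
    """Run cmd `runs` times sequentially and return how many runs exited non-zero."""
    failures = 0
    for i in range(1, runs + 1):
        # The child inherits this process's environment, including TEST_BASE_URL.
        result = subprocess.run(list(cmd), capture_output=True, text=True)
        status = "PASS" if result.returncode == 0 else "FAIL"
        print(f"run {i}/{runs}: {status}")
        if result.returncode != 0:
            failures += 1
    return failures
```

For example, with `TEST_BASE_URL` exported: `count_failures([sys.executable, "-m", "pytest", "tests/playwright/test_htmx_interactions.py::TestHTMXInteractions::test_tool_form_submission", "-x", "-q", "--browser", "chromium"], runs=10)`. A reliable suite should return 0 here across repeated invocations, not just on a lucky run.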
Environment Info
| Key | Value |
|---|---|
| Runtime | Python 3.13, Playwright 1.58.0 |
| Platform | Linux (WSL2) |
| Browser | Chromium (headless) |
| Database | SQLite (local) / PostgreSQL 18 via pgbouncer (docker-compose) |
| Topology | 3 gateway workers behind nginx (docker-compose) |