
[BUG][TESTING]: Playwright UI tests flaky due to shared login state and HTMX sync races #3105

@crivetimihai

Description

Bug Summary

Several Playwright E2E tests fail intermittently when run together but pass in isolation. Root cause analysis identifies two distinct mechanisms: shared mutable login state across test fixtures, and hard-coded sleep-based HTMX synchronization that breaks under server load. The login-state problem is further amplified in the multi-worker docker-compose topology (Root Cause 3).

Related: #3099 covers broader Playwright stability patterns. This issue provides specific root cause identification and targeted fixes for the login state and HTMX sync mechanisms.

Classification: Test infrastructure issue, not a code bug. The gateway code correctly validates JWTs (stateless) on all workers. The flakiness originates entirely from test fixtures and synchronization patterns.

Affected Tests

| Test | Failure mode | Isolated | Together |
|---|---|---|---|
| test_edit_user | Timeout waiting for .user-card after create | PASS | FAIL |
| test_force_password_change | Timeout on page.goto("/admin") in fixture | PASS | FAIL |
| test_form_validation_feedback | Timeout on wait_for_selector (60s) | PASS | FAIL |
| test_all_tabs_navigation[logs] | Login redirect; URL assertion fails | PASS | FAIL |
| test_token_lifecycle (6 tests) | 401 Invalid authentication credentials | PASS | FAIL |
| test_tool_form_submission | 401 on POST /admin/tools; login state lost | PASS | FAIL |
| test_tab_content_loading_via_javascript | Login redirect; fixture _ensure_admin_logged_in fails | PASS | FAIL |

Affected Component

  • mcpgateway - UI (admin panel)
  • Other: Playwright test infrastructure (tests/playwright/conftest.py, page objects)

Root Cause 1: Shared mutable login state (ADMIN_ACTIVE_PASSWORD)

Files: tests/playwright/conftest.py (_ensure_admin_logged_in(), ADMIN_ACTIVE_PASSWORD)

The _ensure_admin_logged_in fixture has a multi-step login flow (form login → password change → retry → JWT cookie fallback). ADMIN_ACTIVE_PASSWORD is a module-level mutable list shared across all tests in a session. When tests run together:

  1. Test A's login triggers submit_password_change, updating ADMIN_ACTIVE_PASSWORD[0]
  2. Test B's fixture, running concurrently or afterwards, reads stale or already-changed password state
  3. The fallback chain usually recovers, but under load page.goto("/admin") itself times out (60s), or the login POST returns 401 before the JWT cookie fallback can kick in
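The interleaving in steps 1 and 2 can be reproduced deterministically without a browser. This is a simplified model, not the real conftest.py code: ADMIN_ACTIVE_PASSWORD and submit_password_change echo names from this issue, but their bodies, the server_state dict, and run_interleaved are hypothetical stand-ins. It shows that a value captured from the shared list before another test's password change becomes a stale credential.

```python
# Hypothetical, simplified model of the shared-state hazard in conftest.py.
# Names echo the issue; the bodies are illustrative stand-ins only.

ADMIN_ACTIVE_PASSWORD = ["changeme"]  # module-level list shared by every test in the session

# Stand-in for the server's current admin password.
server_state = {"admin_password": "changeme"}


def server_login(password: str) -> int:
    """Return an HTTP-like status: 200 if the password matches, else 401."""
    return 200 if password == server_state["admin_password"] else 401


def submit_password_change(new_password: str) -> None:
    """Test A's path: change the password on the server and in the shared list."""
    server_state["admin_password"] = new_password
    ADMIN_ACTIVE_PASSWORD[0] = new_password


def run_interleaved() -> list[int]:
    # Test B's fixture captures the password it intends to submit...
    password_seen_by_b = ADMIN_ACTIVE_PASSWORD[0]
    # ...then Test A changes it before B's login POST reaches the server.
    submit_password_change("new-secret-1")
    status_b = server_login(password_seen_by_b)            # stale credential
    status_fresh = server_login(ADMIN_ACTIVE_PASSWORD[0])  # re-read shared state
    return [status_b, status_fresh]


result = run_interleaved()
print(result)
```

Re-reading the list "fixes" the race here only because the model is sequential; with real concurrency there is no safe point to read it, which is why removing the shared state is preferable to patching the fallback chain.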

This explains:

  • test_force_password_change and test_all_tabs_navigation[logs]: fixture-level timeout/redirect during _ensure_admin_logged_in
  • test_token_lifecycle 401s: the admin_api fixture in security/conftest.py generates a JWT at fixture creation time, but if the server's auth state is mid-transition from a concurrent UI test's password change, the token may be rejected
  • test_tool_form_submission: the POST to /admin/tools returns 401 because the JWT cookie wasn't properly set or expired between test runs

Fix: Each fixture should use a fresh JWT cookie directly (as security/conftest.py:_make_jwt already does) instead of the multi-step form login. This eliminates shared password state entirely.

Root Cause 2: Hard-coded HTMX sleep in reload_and_navigate_to_users

Files: tests/playwright/pages/users_page.py:183, tests/playwright/entities/test_users.py:38-41

```python
# users_page.py:183
def reload_and_navigate_to_users(self):
    self.page.wait_for_timeout(4000)  # <-- hard sleep
    self.page.wait_for_load_state("domcontentloaded")
    self.page.reload(wait_until="domcontentloaded")
    self.sidebar.click_users_tab()
    self.wait_for_users_loaded()
```

The 4-second hard sleep is the fragile synchronization point. When the server is under load from other tests, the HTMX partial response from user creation may not have completed before the reload fires. The subsequent wait_for_selector(".user-card:has-text('email')") (30s timeout) then fails because the page reload happened before the server committed the new user.

This explains test_edit_user — user creation POST returns 200, but the user card isn't visible after reload because the DB write hadn't flushed before the reload.
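The race can be modelled without a browser. In this simplified simulation (all names hypothetical; time is simulated rather than slept), the new user becomes visible only after a commit delay that grows with server load. A single reload after a fixed 4-second wait misses the user whenever the delay exceeds 4 seconds, while a bounded reload-and-check loop succeeds as long as the delay fits within its overall budget.

```python
# Simplified, simulated-time model of the reload race; no real sleeping or browser.


def fixed_sleep_then_reload(commit_delay_s: float, sleep_s: float = 4.0) -> bool:
    """Mimic wait_for_timeout(4000) followed by one reload."""
    # The single reload sees the user only if the commit finished during the sleep.
    return commit_delay_s <= sleep_s


def poll_until_visible(commit_delay_s: float, poll_s: float = 0.5, budget_s: float = 30.0) -> bool:
    """Mimic reload-and-check every `poll_s` seconds until `budget_s` is used up."""
    t = 0.0
    while t <= budget_s:
        if t >= commit_delay_s:  # the reload at time t sees the committed user
            return True
        t += poll_s
    return False


for delay in (1.0, 3.5, 6.0, 12.0):  # light load -> heavy load
    print(delay, fixed_sleep_then_reload(delay), poll_until_visible(delay))
```

Both strategies agree under light load; only the polling variant keeps passing once the commit delay crosses the hard-coded threshold.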

Fix: Replace wait_for_timeout(4000) with page.wait_for_load_state("networkidle") or a retry loop on the user card selector itself, so synchronization adapts to actual server speed.
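A sketch of the retry-loop option, assuming it replaces the body of reload_and_navigate_to_users: reload, reopen the Users tab, and check for the new card, repeating until it appears or the attempts run out. The function name and parameters are hypothetical; `page` is a Playwright Page, and only `reload`, `locator(...).count()`, and `wait_for_timeout` are used. The selector mirrors the `.user-card:has-text(...)` check already used by the test.

```python
from typing import Callable


def reload_until_user_card(
    page,
    open_users_tab: Callable[[], None],
    email: str,
    attempts: int = 10,
    poll_ms: int = 1000,
) -> int:
    """Hypothetical replacement for the fixed 4s sleep.

    Returns the 1-based attempt on which the user card appeared, or raises
    AssertionError after `attempts` reloads.
    """
    selector = f".user-card:has-text('{email}')"
    for attempt in range(1, attempts + 1):
        page.reload(wait_until="domcontentloaded")
        open_users_tab()  # should click the Users tab AND wait for the list to render
        if page.locator(selector).count() > 0:
            return attempt
        page.wait_for_timeout(poll_ms)  # short, bounded back-off between attempts
    raise AssertionError(f"user card for {email!r} not visible after {attempts} reloads")
```

From the page object this might be called as `reload_until_user_card(self.page, lambda: (self.sidebar.click_users_tab(), self.wait_for_users_loaded()), email)`. The total wait now scales with how long the server actually takes, up to `attempts * poll_ms` plus reload time, instead of always paying (and sometimes undershooting) a fixed 4 seconds.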

Root Cause 3: Multi-worker environment amplifies login flakiness

Files: tests/playwright/conftest.py (_ensure_admin_logged_in(), _set_admin_jwt_cookie())

In the docker-compose environment (3 gateway workers behind nginx), the login flakiness is amplified:

  1. The _ensure_admin_logged_in fixture performs a form login against one gateway worker
  2. Subsequent requests (e.g., POST /admin/tools) may be routed by nginx to a different gateway worker
  3. While JWTs are stateless and should work across workers, the multi-step login flow's fallback chain and cookie-setting timing become more fragile with load balancing

Evidence from 10-iteration stress test:

  • test_tool_form_submission failed 1/10 runs with 401 (not 500 as initially suspected)
  • The failure is identical to Root Cause 1 — the JWT cookie wasn't properly established before the tool creation POST
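Since the failing run's symptom was a cookie that was not in place before the POST, a cheap guard is to assert on the browser context's cookies immediately before the tool-creation step, so the failure surfaces as an explicit message instead of a later 401. This is a hypothetical helper operating on the list returned by Playwright's BrowserContext.cookies(); the cookie name jwt_token is an assumption, and non-positive `expires` values are treated as session cookies (Playwright reports those with -1).

```python
import time
from typing import Iterable, Mapping, Optional


def require_auth_cookie(
    cookies: Iterable[Mapping],
    name: str = "jwt_token",  # ASSUMPTION: the gateway's auth cookie name
    now: Optional[float] = None,
) -> Mapping:
    """Return the named cookie, or raise AssertionError if it is missing or expired."""
    now = time.time() if now is None else now
    for cookie in cookies:
        if cookie.get("name") != name:
            continue
        expires = cookie.get("expires", -1)
        # Non-positive expires means a session cookie with no fixed expiry.
        if expires > 0 and expires <= now:
            raise AssertionError(f"cookie {name!r} expired at {expires}")
        return cookie
    raise AssertionError(f"cookie {name!r} not set; login fixture did not establish auth")
```

A typical call site would be `require_auth_cookie(page.context.cookies(base_url))` right before submitting the form that POSTs to /admin/tools.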

Note: A "Tool creation failed" message in the gateway-1 structured logs was initially misattributed to this Playwright test. Investigation confirmed it was from a separate API-level duplicate-name test (test-api-tool-dup-*) that correctly returned 409 via the IntegrityError handler. The structured logger's generic "Tool creation failed" message without the exception class made triage harder.
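One way to make such messages self-describing is to include the exception class in both the message text and structured fields. This sketch uses only the standard logging module; the helper name, logger name, and field names (error_type, error_detail, tool_name) are hypothetical and not the gateway's actual structured-logging API. ValueError stands in for the real IntegrityError.

```python
import logging


def log_tool_creation_failure(logger: logging.Logger, tool_name: str, exc: Exception) -> None:
    """Hypothetical helper: log a failure with the exception class attached."""
    error_type = type(exc).__name__
    logger.error(
        "Tool creation failed: %s (%s)",
        tool_name,
        error_type,
        # `extra` keys become attributes on the LogRecord for structured handlers.
        extra={"error_type": error_type, "error_detail": str(exc), "tool_name": tool_name},
    )


class CaptureHandler(logging.Handler):
    """Collects records in memory so the structured fields can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("example.tools")  # hypothetical logger name
handler = CaptureHandler()
logger.addHandler(handler)

try:
    raise ValueError("duplicate name")  # stand-in for the real IntegrityError
except ValueError as exc:
    log_tool_creation_failure(logger, "test-api-tool-dup-1", exc)

record = handler.records[-1]
print(record.getMessage(), record.error_type)
```

With the class name in the record, a 409 from a duplicate-name test is distinguishable from an unexpected failure at a glance.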


Steps to Reproduce

```bash
# All pass individually:
TEST_BASE_URL=http://localhost:8080 pytest tests/playwright/entities/test_users.py::TestUsersCRUD::test_edit_user -v --browser chromium
TEST_BASE_URL=http://localhost:8080 pytest tests/playwright/test_htmx_interactions.py::TestHTMXInteractions::test_tool_form_submission -v --browser chromium

# Fails intermittently when run together:
TEST_BASE_URL=http://localhost:8080 pytest tests/playwright/entities/test_users.py tests/playwright/test_htmx_interactions.py tests/playwright/security/test_token_lifecycle.py -v --browser chromium

# Rapid sequential runs also trigger it:
for i in $(seq 1 10); do pytest tests/playwright/test_htmx_interactions.py::TestHTMXInteractions::test_tool_form_submission -x -v; done
```
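To get a pass/fail ratio such as the 1/10 reported in the stress-test evidence, the sequential loop can be driven from Python and tallied, so every iteration runs regardless of earlier failures. A sketch assuming pytest and pytest-playwright are installed and the test is run as a subprocess with the same node ID and flags as the commands above; the helper names are hypothetical.

```python
import os
import subprocess
import sys
from typing import Callable

NODE_ID = (
    "tests/playwright/test_htmx_interactions.py"
    "::TestHTMXInteractions::test_tool_form_submission"
)


def run_pytest_once(node_id: str = NODE_ID) -> bool:
    """Run one pytest invocation; True if it exited with code 0."""
    env = {**os.environ}
    env.setdefault("TEST_BASE_URL", "http://localhost:8080")
    result = subprocess.run(
        [sys.executable, "-m", "pytest", node_id, "-x", "-q", "--browser", "chromium"],
        env=env,
        check=False,
    )
    return result.returncode == 0


def stress(run_once: Callable[[], bool], iterations: int = 10) -> tuple[int, int]:
    """Call run_once `iterations` times without stopping early; return (failures, iterations)."""
    failures = sum(1 for _ in range(iterations) if not run_once())
    return failures, iterations
```

Running `failed, total = stress(run_pytest_once)` and printing `f"{failed}/{total} runs failed"` gives a flakiness rate that can be compared before and after the fixture and synchronization fixes.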

Expected Behavior

All Playwright E2E tests should pass reliably when run together, not just in isolation.

Environment Info

| Key | Value |
|---|---|
| Runtime | Python 3.13, Playwright 1.58.0 |
| Platform | Linux (WSL2) |
| Browser | Chromium (headless) |
| Database | SQLite (local) / PostgreSQL 18 via pgbouncer (docker-compose) |
| Topology | 3 gateway workers behind nginx (docker-compose) |

Labels: SHOULD (P2: Important but not vital; high-value items that are not crucial for the immediate release), bug (Something isn't working), testing (Testing: unit, e2e, manual, automated, etc.), ui (User Interface)