[EPIC][TESTING][UI]: Comprehensive Playwright E2E Test Suite for MCP Gateway Admin UI #2519

@crivetimihai

Goal

Implement and verify complete end-to-end Playwright test coverage for the MCP Gateway Admin UI, so that all user workflows, CRUD operations, HTMX interactions, and administrative features are automatically validated before each release.

The framework and many of the tests listed below should already exist, but they have not yet been refined, verified, expanded, or optimized. The goal of this epic is to make the test suite complete, comprehensive, and fully working for all UI testing needs.

Prerequisites: #2136 (auth fix) must be resolved first.

Related Issues:


Why Now?

The Admin UI is the primary interface for administrators managing MCP servers, tools, and agents. Automated E2E coverage is needed to:

  1. Prevent UI regressions across releases
  2. Validate HTMX/Alpine.js interactions that are difficult to unit test
  3. Ensure authentication and RBAC work correctly from the user's perspective
  4. Catch integration issues between frontend and backend
  5. Enable confident refactoring of the UI codebase

Environment Configuration

Feature Flags That Control UI Visibility

Tests must detect and adapt to these feature flags. The recommended approach is to check element visibility rather than fail when features are disabled.

Important: Settings defaults (in config.py) differ from .env.example values:

| Feature Flag | Settings Default | .env.example | UI Tab/Section Affected |
|---|---|---|---|
| MCPGATEWAY_UI_ENABLED | false | true | Entire Admin UI |
| MCPGATEWAY_ADMIN_API_ENABLED | false | true | Admin API endpoints |
| EMAIL_AUTH_ENABLED | true | true | Organization section (Teams, Users, Tokens) |
| MCPGATEWAY_A2A_ENABLED | true | true | Agents -> A2A Agents tab |
| MCPGATEWAY_GRPC_ENABLED | false | false | Agents -> gRPC Services tab |
| PLUGINS_ENABLED | false | false | Extensions -> Plugins tab |
| LLMCHAT_ENABLED | false | false | LLM -> LLM Chat, LLM Settings tabs |
| TOOLOPS_ENABLED | false | false | MCP -> ToolOps tab |
| OBSERVABILITY_ENABLED | false | false | Monitoring -> Observability tab |
| MCPGATEWAY_PERFORMANCE_TRACKING | false | false | Monitoring -> Performance tab |
| STRUCTURED_LOGGING_DATABASE_ENABLED | false | false | System -> Logs tab (required for log viewing) |

Note: For testing, ensure .env explicitly sets MCPGATEWAY_UI_ENABLED=true and MCPGATEWAY_ADMIN_API_ENABLED=true since the Settings defaults are false.
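
As a rough illustration of the "check element visibility" approach, the sketch below maps each optional feature flag to a tab selector and probes it. The selectors are hypothetical placeholders that follow the #tab-* pattern; the real ids must be taken from the Admin UI markup. The visibility probe is passed in as a callable so the logic stays independent of the browser.

```python
from typing import Callable, Dict

# Hypothetical selectors (assumption): replace with the real tab ids or data-testids.
FEATURE_TAB_SELECTORS: Dict[str, str] = {
    "MCPGATEWAY_A2A_ENABLED": "#tab-a2a-agents",
    "MCPGATEWAY_GRPC_ENABLED": "#tab-grpc-services",
    "PLUGINS_ENABLED": "#tab-plugins",
    "LLMCHAT_ENABLED": "#tab-llm-chat",
    "TOOLOPS_ENABLED": "#tab-toolops",
    "OBSERVABILITY_ENABLED": "#tab-observability",
    "MCPGATEWAY_PERFORMANCE_TRACKING": "#tab-performance",
    "STRUCTURED_LOGGING_DATABASE_ENABLED": "#tab-logs",
}


def detect_enabled_features(is_visible: Callable[[str], bool]) -> Dict[str, bool]:
    """Return {flag: True/False} depending on whether the flag's tab is visible."""
    return {
        flag: bool(is_visible(selector))
        for flag, selector in FEATURE_TAB_SELECTORS.items()
    }
```

Inside an enabled_features fixture, the probe could be something like `lambda sel: page.locator(sel).first.is_visible()`, called on the page returned by authenticated_page.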

Recommended Test Environment .env

# Core (required)
MCPGATEWAY_UI_ENABLED=true
MCPGATEWAY_ADMIN_API_ENABLED=true
EMAIL_AUTH_ENABLED=true
MCPGATEWAY_UI_AIRGAPPED=true  # Use local assets to avoid CDN failures

# Test credentials (NEW - replaces BASIC_AUTH_USER/PASSWORD)
PLATFORM_ADMIN_EMAIL=admin@example.com
PLATFORM_ADMIN_PASSWORD=changeme

# Enable ALL optional features for full coverage
MCPGATEWAY_A2A_ENABLED=true
MCPGATEWAY_GRPC_ENABLED=true
PLUGINS_ENABLED=true
LLMCHAT_ENABLED=true
TOOLOPS_ENABLED=true
OBSERVABILITY_ENABLED=true
MCPGATEWAY_PERFORMANCE_TRACKING=true
STRUCTURED_LOGGING_DATABASE_ENABLED=true

# Disable password change requirement for tests
PASSWORD_CHANGE_ENFORCEMENT_ENABLED=false
ADMIN_REQUIRE_PASSWORD_CHANGE_ON_BOOTSTRAP=false

Key Test Infrastructure (conftest.py)

The following fixtures and utilities should be implemented:

| Fixture/Class | Purpose |
|---|---|
| authenticated_page | Returns a Page logged into admin UI via form POST |
| enabled_features | Detects which feature tabs are visible (depends on authenticated_page) |
| skip_if_feature_disabled | Decorator to skip tests with warning when feature disabled |
| ConsoleErrorCollector | Collects browser console errors, auto-asserts no errors at test end |
| NetworkRequestCollector | Monitors network requests, flags 4xx/5xx failures (with allowlist support) |
| VisualComparator | Screenshot comparison with 1% pixel threshold (requires pixelmatch, Pillow) |
| PerformanceTimer | Measures page load times, asserts < 3s threshold |
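
A minimal sketch of the two collectors, assuming they are attached through Playwright's page.on("console", ...) and page.on("response", ...) events. The method names (attach, assert_no_errors) and the substring-based allowlist matching are illustrative choices, not the existing conftest.py API.

```python
from typing import Any, Iterable, List, Tuple


class ConsoleErrorCollector:
    """Records browser console messages whose type is 'error'."""

    def __init__(self) -> None:
        self.errors: List[str] = []

    def attach(self, page: Any) -> None:
        # Playwright passes a ConsoleMessage exposing .type and .text.
        page.on("console", self._on_console)

    def _on_console(self, msg: Any) -> None:
        if msg.type == "error":
            self.errors.append(msg.text)

    def assert_no_errors(self) -> None:
        if self.errors:
            raise AssertionError(f"Console errors: {self.errors}")


class NetworkRequestCollector:
    """Records responses with status >= 400 unless the URL is allowlisted."""

    def __init__(self, allowlist: Iterable[str] = ()) -> None:
        # Assumption: an allowlist entry matches when it is a substring of the URL.
        self.allowlist = tuple(allowlist)
        self.failures: List[Tuple[int, str]] = []

    def attach(self, page: Any) -> None:
        # Playwright passes a Response exposing .status and .url.
        page.on("response", self._on_response)

    def _on_response(self, response: Any) -> None:
        allowed = any(part in response.url for part in self.allowlist)
        if response.status >= 400 and not allowed:
            self.failures.append((response.status, response.url))
```

A fixture could create a collector, attach it to the page, yield it to the test, and call assert_no_errors() during teardown so every test gets the check automatically.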

Important: Do NOT use wait_until='networkidle' for admin pages - the SSE connection (/admin/events) keeps a connection open indefinitely. Use domcontentloaded + selector waits instead.
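
The navigation pattern described above might look like the following sketch. The /admin path and the default timeout are assumptions; the ready selector is one of the data-testid values listed later in this issue.

```python
from typing import Any


def open_admin(
    page: Any,
    base_url: str,
    ready_selector: str = '[data-testid="overview-tab"]',
    timeout_ms: int = 10_000,
) -> None:
    """Navigate to the admin UI without waiting for network idle."""
    # Assumption: the admin UI is served under /admin.
    url = f"{base_url.rstrip('/')}/admin"
    # 'domcontentloaded' returns once the DOM is parsed; the open SSE stream
    # would otherwise keep 'networkidle' from ever being reached.
    page.goto(url, wait_until="domcontentloaded")
    # Wait for a concrete element that proves the page is usable.
    page.wait_for_selector(ready_selector, timeout=timeout_ms)
```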


User Stories

US-1: QA Engineer - Automated Regression Coverage

As a QA Engineer, I want automated E2E tests that cover all Admin UI workflows so that every release is validated without manual effort.

Acceptance Criteria:

  • All CRUD operations tested for each entity type
  • Tab navigation and pagination tested
  • Form validation errors tested
  • Tests run in CI on every PR
  • Tests gracefully skip when optional features are disabled

US-2: Developer - Confident Refactoring

As a Developer, I want comprehensive UI tests so I can refactor frontend code without fear of breaking user workflows.

Acceptance Criteria:

  • Tests cover all critical user paths
  • Tests fail fast on breaking changes
  • Clear error messages indicate what broke

US-3: Security Engineer - Authentication Validation

As a Security Engineer, I want tests that validate authentication flows and access control from the UI perspective.

Acceptance Criteria:

  • Email/password login tested (not HTTP Basic Auth)
  • Password change flow tested
  • Admin-only pages properly protected
  • Session expiration handled correctly

Test Strategy

Test Organization

tests/playwright/
├── conftest.py                    # Shared fixtures, feature detection, console/network monitors
├── pages/                         # Page Object Model (25 page objects)
├── entities/                      # CRUD tests (always-on features)
├── entities_optional/             # CRUD tests (feature-flagged)
├── features/                      # Feature tests (auth, observability, plugins, etc.)
├── interactions/                  # UI interaction tests (navigation, htmx, modals, forms)
├── accessibility/                 # WCAG AA compliance (required)
├── chaos/                         # Multi-tab and stress testing (lower priority)
├── network/                       # Network condition simulation (local only)
├── performance/                   # Performance and memory tests
├── visual/                        # Visual regression (1% threshold)
├── cross_browser/                 # Browser-specific tests
└── realtime/                      # SSE and real-time tests

Test Markers

# pyproject.toml - key markers
markers = [
    "smoke: Critical path tests (< 2 min total)",
    "crud: Entity CRUD operations",
    "a11y: Accessibility tests (WCAG AA required)",
    "chaos: Multi-tab and stress tests (lower priority)",
    "network: Network condition simulation (local only)",
    "perf: Performance threshold tests",
    "requires_*: Feature flag requirements",
]

Implementation Tasks

Phase 1: Foundation & Fixtures

1.1 Core Fixtures

1.2 Page Objects (Always-On Features)

  • PO-1: LoginPage - form login, error handling, SSO buttons
  • PO-2: AdminPage - sidebar, tab navigation, dark mode toggle
  • PO-3: ServersPage (Virtual Servers/Catalog) - full CRUD
  • PO-4: GatewaysPage (MCP Servers) - CRUD + connectivity test
  • PO-5: ToolsPage - CRUD + schema display
  • PO-6: ResourcesPage - CRUD + URI handling
  • PO-7: PromptsPage - CRUD + arguments
  • PO-8: RootsPage - CRUD
  • PO-9: McpRegistryPage - browse, search, register individual servers
  • PO-10: MetricsPage - admin dashboard stats
  • PO-11: ExportImportPage - export/import workflows
  • PO-12: LogsPage - view/stream logs
  • PO-13: MaintenancePage - cleanup, rollup operations
  • PO-14: VersionInfoPage - version info, services status, support bundle

1.3 Page Objects (Optional Features)

  • PO-15: A2AAgentsPage (requires MCPGATEWAY_A2A_ENABLED)
  • PO-16: GrpcServicesPage (requires MCPGATEWAY_GRPC_ENABLED)
  • PO-17: LlmChatPage (requires LLMCHAT_ENABLED)
  • PO-18: LlmSettingsPage (requires LLMCHAT_ENABLED)
  • PO-19: ToolOpsPage (requires TOOLOPS_ENABLED)
  • PO-20: ObservabilityPage (requires OBSERVABILITY_ENABLED)
  • PO-21: PerformancePage (requires MCPGATEWAY_PERFORMANCE_TRACKING)
  • PO-22: PluginsPage (requires PLUGINS_ENABLED)
  • PO-23: TeamsPage (requires EMAIL_AUTH_ENABLED)
  • PO-24: UsersPage (requires EMAIL_AUTH_ENABLED + admin)
  • PO-25: TokensPage (requires EMAIL_AUTH_ENABLED)

Phase 2: Authentication Tests

Note: To test AUTH-6/AUTH-7 (password change flow), you'll need a separate test profile with PASSWORD_CHANGE_ENFORCEMENT_ENABLED=true.

  • AUTH-1: Email/password login success via form POST
  • AUTH-2: Login with invalid credentials (error message display)
  • AUTH-3: Login with missing fields (validation)
  • AUTH-4: Logout functionality (cookie cleared)
  • AUTH-5: Session expiration handling (redirect to login)
  • AUTH-6: Password change required flow ⚠️ requires enforcement enabled
  • AUTH-7: Password validation errors ⚠️ requires enforcement enabled
  • AUTH-8: SSO provider buttons display (when SSO providers enabled)
  • AUTH-9: Admin-only page protection (non-admin gets restricted view)
  • AUTH-10: JWT cookie httpOnly flag (always), Secure flag (HTTPS only)
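
For AUTH-10, one possible check works on the cookie list that Playwright's BrowserContext.cookies() returns (dictionaries with name, httpOnly, and secure keys, among others). This is a sketch only; the cookie name is not stated in this issue, so the caller has to supply it.

```python
from typing import Any, Dict, List


def check_auth_cookie(cookies: List[Dict[str, Any]], name: str, https: bool) -> List[str]:
    """Return the problems found with the auth cookie (an empty list means OK)."""
    cookie = next((c for c in cookies if c.get("name") == name), None)
    if cookie is None:
        return [f"cookie {name!r} not set"]
    problems: List[str] = []
    # httpOnly is expected in every deployment.
    if not cookie.get("httpOnly"):
        problems.append("httpOnly flag missing")
    # Secure is only expected when the UI is served over HTTPS.
    if https and not cookie.get("secure"):
        problems.append("Secure flag missing on HTTPS")
    return problems
```

A test could call it as `check_auth_cookie(page.context.cookies(), "jwt_token", https=base_url.startswith("https://"))`, where "jwt_token" is a placeholder for the real cookie name, and assert that the result is empty.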

Phase 3: Core Entity CRUD Tests (Always-On)

3.1 Virtual Servers (Catalog) #catalog-panel

  • SRV-1: List servers with pagination
  • SRV-2: Create new server via inline form #add-server-form
  • SRV-3: Edit existing server (via edit modal)
  • SRV-4: Delete server with confirmation dialog
  • SRV-5: Activate/deactivate server toggle
  • SRV-6: Associate tools/resources/prompts with server
  • SRV-7: Server search functionality (#catalog-search-input)
  • SRV-8: View server details via view modal
  • SRV-9: Filter by team (when EMAIL_AUTH_ENABLED)
  • SRV-10: Show inactive toggle

3.2 MCP Servers (Gateways) #gateways-panel

  • GW-1: List gateways with pagination
  • GW-2: Create new gateway (name, URL, transport type)
  • GW-3: Edit gateway configuration
  • GW-4: Delete gateway
  • GW-5: Activate/deactivate gateway
  • GW-6: Test gateway connectivity button (#gateway-test-modal)
  • GW-7: View associated tools/resources/prompts counts
  • GW-8: OAuth configuration (when OAuth enabled)
  • GW-9: Passthrough headers configuration
  • GW-10: Fetch Tools from MCP Server (fetchToolsForGateway)

3.3 Tools #tools-panel

  • TOOL-1: List tools with pagination
  • TOOL-2: Create new REST tool via #add-tool-form
  • TOOL-3: Create MCP tool (from gateway discovery)
  • TOOL-4: Edit tool configuration
  • TOOL-5: Delete tool
  • TOOL-6: Activate/deactivate tool
  • TOOL-7: Tool search and filtering
  • TOOL-8: View tool details (input schema, annotations)
  • TOOL-9: Test tool execution via #tool-test-modal
  • TOOL-10: Tool visibility settings (public/team/private)
  • TOOL-11: Bulk import dropdown - JSON array paste or file upload
  • TOOL-12: Test case generation modal (#testcase-gen-modal)
  • TOOL-13: Bulk test case generation (#bulk-testcase-gen-modal)

3.4 Resources #resources-panel

  • RES-1: List resources with pagination
  • RES-2: Create new resource
  • RES-3: Edit resource configuration
  • RES-4: Delete resource
  • RES-5: Activate/deactivate resource
  • RES-6: Resource URI validation
  • RES-7: Resource search functionality
  • RES-8: Resource template handling
  • RES-9: Test resource via #resource-test-modal (runResourceTest())
  • RES-10: View resource details

3.5 Prompts #prompts-panel

  • PRMT-1: List prompts with pagination
  • PRMT-2: Create new prompt
  • PRMT-3: Edit prompt details
  • PRMT-4: Delete prompt
  • PRMT-5: Activate/deactivate prompt
  • PRMT-6: Prompt arguments handling
  • PRMT-7: Prompt search functionality
  • PRMT-8: Test prompt via #prompt-test-modal (runPromptTest())

3.6 Roots #roots-panel

  • ROOT-1: List roots
  • ROOT-2: Add new root URI
  • ROOT-3: Delete root
  • ROOT-4: Root path validation
  • ROOT-5: Export root configuration (exportRoot())

3.7 MCP Registry #mcp-registry-panel

  • REG-1: Browse registry servers
  • REG-2: Search registry
  • REG-3: Register individual server (per-server "Add" button)
  • REG-4: Check server status

Note: Bulk registration is not currently implemented. Only per-server registration is available.

Phase 4: Optional Entity CRUD Tests

4.1 A2A Agents (requires MCPGATEWAY_A2A_ENABLED)

  • A2A-1 through A2A-7: Full CRUD + test connectivity + view skills

4.2 gRPC Services (requires MCPGATEWAY_GRPC_ENABLED)

  • GRPC-1 through GRPC-7: Full CRUD + reflection + get methods

4.3 Users (requires EMAIL_AUTH_ENABLED + admin)

  • USR-1 through USR-8: Full CRUD + admin toggle + force password change

4.4 Teams (requires EMAIL_AUTH_ENABLED)

  • TEAM-1 through TEAM-7: Full CRUD + member management

4.5 API Tokens (requires EMAIL_AUTH_ENABLED)

  • TKN-1 through TKN-5: Generate, view, revoke, copy tokens

Phase 5: UI Interaction Tests

5.1 Navigation & Layout

  • NAV-1 through NAV-7: Tab switching, sidebar, hash navigation, dark mode

5.2 Tables & Pagination

  • TBL-1 through TBL-8: Button-based pagination (no text input), per-page size, row actions

Note: Pagination is button-based only (no direct page number text input).

5.3 Modals & Forms

  • MDL-1 through MDL-8: Open/close, backdrop, escape key, validation, nested modals

5.4 HTMX Interactions

  • HTMX-1 through HTMX-7: Tab loading, form submission, partial refresh, indicators

5.5 Search & Filter

  • SRCH-1 through SRCH-5: Client-side search, status filter, team filter, reset

Note: Entity search is client-side only and does NOT update URL params.

Phase 6: Optional Feature Tests

6.1 Observability (requires OBSERVABILITY_ENABLED)

  • OBS-1 through OBS-9: Traces, filters, saved queries, sub-tabs, auto-polling

Prerequisite: Tests require trace data or should handle empty states gracefully.

6.2 Performance (requires MCPGATEWAY_PERFORMANCE_TRACKING)

  • PERF-1 through PERF-5: Metrics, charts, latency percentiles

6.3 LLM Chat (requires LLMCHAT_ENABLED)

  • LLM-1 through LLM-4: Chat interface, model selection, error handling

6.4 LLM Settings (requires LLMCHAT_ENABLED)

  • LLMS-1 through LLMS-5: Provider CRUD, model availability

6.5 ToolOps (requires TOOLOPS_ENABLED)

  • TOP-1 through TOP-2: Panel loads, configuration display

6.6 Plugins (requires PLUGINS_ENABLED, admin only)

  • PLG-1 through PLG-5: List, enable/disable, details, refresh

Phase 7: Admin-Only Features

7.1 Metrics #metrics-panel

  • MET-1 through MET-3: Dashboard cards, key stats, cache statistics

7.2 Export/Import #export-import-panel

  • EXP-1 through IMP-5: Export, selective export, import, validation, progress

7.3 System Logs #logs-panel

Prerequisite: Requires STRUCTURED_LOGGING_DATABASE_ENABLED=true

  • LOG-1 through LOG-4: View, filter, search, download logs

7.4 Maintenance #maintenance-panel

  • MNT-1 through MNT-5: Cleanup, rollup operations

7.5 Version Info #version-info-panel

  • VER-1 through VER-6: App info, platform info, services status, support bundle

Phase 8: Visual & Cross-Browser

8.1 Visual Regression (1% Pixel Threshold)

  • VIS-1 through VIS-8: Page baselines, dark mode, mobile, per-browser baselines
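
To make the 1% pixel threshold concrete, here is a dependency-free sketch of the comparison. It assumes both screenshots have already been decoded into equally sized sequences of RGB tuples (for example with Pillow, which this issue lists as a requirement) and that "within threshold" means at most 1% of pixels differ. A real VisualComparator may use pixelmatch-style perceptual diffing instead.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]


def diff_ratio(baseline: Sequence[Pixel], actual: Sequence[Pixel], tolerance: int = 0) -> float:
    """Fraction of pixels where any channel differs by more than `tolerance`."""
    if len(baseline) != len(actual):
        raise ValueError("images must contain the same number of pixels")
    if not baseline:
        return 0.0
    changed = sum(
        1
        for a, b in zip(baseline, actual)
        if any(abs(x - y) > tolerance for x, y in zip(a, b))
    )
    return changed / len(baseline)


def within_threshold(
    baseline: Sequence[Pixel], actual: Sequence[Pixel], threshold: float = 0.01
) -> bool:
    """True when the share of changed pixels does not exceed the threshold."""
    return diff_ratio(baseline, actual) <= threshold
```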

8.2 Responsive Design

  • RESP-1 through RESP-6: Mobile/tablet/desktop viewports, sidebar behavior

8.3 Cross-Browser Matrix

  • XBROW-1 through XBROW-5: Chromium, Firefox, WebKit full suites

Phase 9: CI/CD Integration

  • CI-1 through CI-7: GitHub Actions, parallel execution, artifacts, reports, nightly runs

Phase 10: Accessibility Testing (WCAG AA Required)

⚠️ UI Changes Required: Many a11y tests will fail on current UI without product changes.
Related Issues: #2480 (manual testing), #2275 (keyboard navigation epic)
Recommendation: Mark tests as @pytest.mark.xfail until UI changes are implemented.

10.1 Core Accessibility (axe-core)

  • A11Y-1 through A11Y-6: axe-core scans on all pages, WCAG AA violations

10.2 Keyboard Navigation

  • A11Y-7 through A11Y-13: Tab order, focus indicators, modal trap, arrow keys

10.3 Screen Reader Support

  • A11Y-14 through A11Y-19: Labels, ARIA, live regions, landmarks, headings

10.4 Visual Accessibility

  • A11Y-20 through A11Y-23: Color contrast, dark mode, focus visibility

Phase 11: Performance & Memory Tests

11.1 Page Load (< 3s threshold)

  • PERF-T1 through PERF-T5: Login, dashboard, tab switch, modal, search
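
The < 3s check could be expressed as a small context manager. This is a sketch of what a PerformanceTimer might look like, not the existing fixture; the constructor parameters and the injectable clock (useful for testing the timer itself) are assumptions.

```python
import time
from typing import Callable, Optional


class PerformanceTimer:
    """Times the enclosed block and fails if it exceeds the threshold."""

    def __init__(
        self,
        label: str,
        threshold_s: float = 3.0,
        clock: Callable[[], float] = time.perf_counter,
    ) -> None:
        self.label = label
        self.threshold_s = threshold_s
        self._clock = clock
        self.elapsed: Optional[float] = None

    def __enter__(self) -> "PerformanceTimer":
        self._start = self._clock()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.elapsed = self._clock() - self._start
        # Only enforce the threshold when the block itself succeeded,
        # so a timing failure never hides the original error.
        if exc_type is None and self.elapsed >= self.threshold_s:
            raise AssertionError(
                f"{self.label} took {self.elapsed:.2f}s "
                f"(limit {self.threshold_s:.2f}s)"
            )
        return False
```

In a test, the navigation and the selector wait would go inside `with PerformanceTimer("dashboard load"):` so the measured time covers the moment the page becomes usable.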

11.2 Memory Leak Detection

  • MEM-1 through MEM-5: Tab switching, modal cycles, HTMX loads, Chart.js cleanup

11.3 Resource Cleanup

  • MEM-6 through MEM-8: XHR cancellation, SSE close, intervals cleared

Phase 12: Chaos & Multi-Tab Testing (Lower Priority)

12.1 Multi-Tab Scenarios

  • CHAOS-1 through CHAOS-5: Multiple tabs, concurrent edits, cross-tab logout

12.2 Rapid Interaction

  • CHAOS-6 through CHAOS-9: Rapid navigation, double-click prevention

12.3 State Consistency

  • CHAOS-10 through CHAOS-13: LocalStorage, dark mode, tab state persistence

Phase 13: Network Simulation (Local Only)

13.1 Slow Network (Slow 3G: 500kbps, 400ms latency)

  • NET-1 through NET-4: Loading indicators, navigation, timeout errors
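
One way to apply the Slow 3G profile is through the Chrome DevTools Protocol, which works in Chromium only. The sketch below converts 500 kbps to the bytes-per-second value that Network.emulateNetworkConditions expects; applying the same rate to upload and download is an assumption, since the issue gives a single figure.

```python
from typing import Any, Dict


def slow_3g_conditions(kbps: int = 500, latency_ms: int = 400) -> Dict[str, Any]:
    """Build CDP parameters for Network.emulateNetworkConditions."""
    # CDP expects throughput in bytes per second: 500 kbps -> 62,500 B/s.
    bytes_per_second = kbps * 1000 / 8
    return {
        "offline": False,
        "latency": latency_ms,
        "downloadThroughput": bytes_per_second,
        "uploadThroughput": bytes_per_second,
    }


def apply_slow_network(page: Any) -> None:
    """Throttle the given page (Chromium only, via a CDP session)."""
    session = page.context.new_cdp_session(page)
    session.send("Network.emulateNetworkConditions", slow_3g_conditions())
```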

13.2 Offline Mode

  • NET-5 through NET-7: Graceful errors, retry button

13.3 Request Failures

  • NET-8 through NET-12: 500/401/403/404/timeout handling
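
Request failures can be simulated without touching the backend by intercepting requests with page.route(). The handlers below are a sketch: the JSON body shape is an assumption about what the UI expects from a failed call, and any URL pattern passed to page.route() would have to match the real admin endpoints.

```python
import json
from typing import Any, Callable


def respond_with_status(status: int, detail: str = "simulated failure") -> Callable[[Any], None]:
    """Build a route handler that answers every matched request with `status`."""

    def handler(route: Any) -> None:
        route.fulfill(
            status=status,
            content_type="application/json",
            body=json.dumps({"detail": detail}),  # assumed error payload shape
        )

    return handler


def abort_as_timeout(route: Any) -> None:
    """Route handler that makes the request fail as if it had timed out."""
    route.abort("timedout")
```

Usage would be along the lines of `page.route("**/admin/tools*", respond_with_status(500))` (hypothetical pattern), followed by assertions on the error message the UI displays.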

Phase 14: Negative & Edge Case Tests

14.1 Input Validation

  • VAL-1 through VAL-7: XSS, SQL injection, long input, unicode, whitespace

14.2 Error Handling

  • ERR-1 through ERR-5: Field errors, duplicates, dependencies, session expiry

14.3 Data Integrity

  • DATA-1 through DATA-5: Immediate list updates after CRUD

Phase 15: Real-Time & SSE Tests

15.1 SSE Connection

  • SSE-1 through SSE-4: Connect, reconnect, UI updates, logout close

15.2 Live Updates

  • SSE-5 through SSE-7: Log streaming, observability updates, no duplicates

Makefile Targets

| Target | Browsers | Purpose | Time |
|---|---|---|---|
| test-ui-smoke | Chromium | Quick sanity check | ~2 min |
| test-ui-smoke-all | All 3 | Smoke across browsers | ~5 min |
| test-ui / test-ui-lite | Chromium | Full suite, single browser | ~10 min |
| test-ui-full | All 3 | Complete cross-browser suite | ~25 min |
| test-ui-a11y | Chromium | WCAG AA compliance | ~3 min |
| test-ui-visual | Chromium | Visual regression (1% threshold) | ~5 min |
| test-ui-perf | Chromium | Performance thresholds | ~2 min |
| test-ui-chaos | Chromium | Multi-tab stability | ~3 min |
| test-ui-slow-network | Chromium | Slow network (local only) | ~5 min |
| test-ui-ci | All 3 | CI pipeline target | ~25 min |

Definition of Done

Per Test

  • Uses appropriate data-testid selectors where available
  • Uses skip_if_feature_disabled for optional features
  • Is deterministic (no flakiness, no time.sleep)
  • Cleans up created entities
  • Uses console_collector fixture to detect JS errors
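
A sketch of how skip_if_feature_disabled could combine the warning with the skip. It assumes the decorated test declares the enabled_features fixture as a parameter and that the fixture yields a flag-to-bool mapping; the skip action is injectable so the helper does not depend on a specific test runner.

```python
import functools
import unittest
import warnings
from typing import Any, Callable, Mapping


def _default_skip(reason: str) -> None:
    # Fallback skip action; a pytest suite would more likely pass pytest.skip.
    raise unittest.SkipTest(reason)


def skip_if_feature_disabled(flag: str, skip: Callable[[str], Any] = _default_skip):
    """Skip the decorated test, with a warning, when `flag` is not enabled."""

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, enabled_features: Mapping[str, bool], **kwargs):
            if not enabled_features.get(flag, False):
                reason = f"{flag} is disabled; skipping {test_func.__name__}"
                warnings.warn(reason)  # keeps the skip visible in the test report
                skip(reason)
                return None
            return test_func(*args, enabled_features=enabled_features, **kwargs)

        return wrapper

    return decorator
```

With pytest, a test might be decorated as `@skip_if_feature_disabled("PLUGINS_ENABLED", skip=pytest.skip)` and take enabled_features as an argument alongside the page fixture.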

Epic Complete

  • All phases completed (1-15)
  • CI pipeline runs full suite on PRs
  • Total execution < 15 minutes (parallel, single browser)
  • Visual regression baselines established (1% threshold)
  • Cross-browser tests passing (Chromium, Firefox, WebKit)
  • Accessibility tests pass WCAG AA
  • Performance tests pass (< 3s page load)
  • Feature detection works correctly (warns but doesn't fail)

Test Count Estimate

| Category | Test Count |
|---|---|
| Core Functionality | ~180 |
| Quality & Robustness | ~99 |
| Total Unique Tests | ~279 |
| Full (all 3 browsers) | ~837 |

Success Criteria

| Metric | Target |
|---|---|
| Entity CRUD coverage | 100% |
| Authentication flows | 100% |
| CI execution time | < 15 min (single browser) |
| Flaky test rate | < 5% |
| Accessibility (WCAG AA) | 100% compliance |
| Page load time | < 3 seconds |
| Visual regression | 1% pixel threshold |
| Console errors | 0 (auto-detected) |

Available data-testid Selectors

| Selector | Element |
|---|---|
| [data-testid="overview-tab"] | Overview tab link |
| [data-testid="gateways-tab"] | MCP Servers tab link |
| [data-testid="servers-tab"] | Virtual Servers tab link |
| [data-testid="tools-tab"] | Tools tab link |
| [data-testid="search-input"] | Search input field |

(and more...)

⚠️ Important: The data-testid inventory is incomplete. Many tabs/inputs do not have test IDs.
Fallback Strategy: [data-testid] → #tab-* / #*-panel → #element-id → semantic selectors
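
A fallback chain like this could be implemented as a helper that returns the first candidate selector matching at least one element. This is a sketch; the helper name is hypothetical and the candidates passed in remain the test author's responsibility.

```python
from typing import Any, Iterable, Optional


def first_matching_selector(page: Any, candidates: Iterable[str]) -> Optional[str]:
    """Return the first selector that matches an element on the page, or None."""
    for selector in candidates:
        # Locator.count() reports how many elements currently match.
        if page.locator(selector).count() > 0:
            return selector
    return None
```

For the Tools tab, the candidates might be `['[data-testid="tools-tab"]', '#tab-tools', '#tools-panel']`, where '#tab-tools' is a guess that follows the #tab-* pattern.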


References


Important Notes

  1. Feature Detection: Tests MUST gracefully skip disabled features with a warning.

  2. Test Data Isolation: Use UUID-suffixed names for created entities.

  3. HTMX Waits: Do NOT use networkidle for admin pages - SSE keeps connection open. Use domcontentloaded + selector waits.

  4. Authentication: Uses PLATFORM_ADMIN_EMAIL/PASSWORD (form login), NOT BASIC_AUTH_USER/PASSWORD.

  5. Console Error Detection: All tests should use console_collector fixture.

  6. Accessibility is Required: WCAG AA compliance is a requirement, not optional.

  7. Browser Matrix: -lite targets = Chromium only; full targets = all 3 browsers.

  8. Network Simulation: Local testing only, not run in CI.

  9. Visual Regression: 1% pixel threshold, per-browser baselines in baselines/{browser}/.
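
For note 2 (test data isolation), a minimal sketch of UUID-suffixed naming. The helper name and the 8-character suffix length are arbitrary choices.

```python
import uuid


def unique_name(prefix: str, suffix_len: int = 8) -> str:
    """Return e.g. 'e2e-tool-3f9a1c2b' so parallel runs never collide on names."""
    return f"{prefix}-{uuid.uuid4().hex[:suffix_len]}"
```

Keeping a recognizable prefix also makes it easier for a cleanup step to find and delete leftovers from failed runs.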


Open Questions (Resolved)

  1. data-testid: Add incrementally as tests are written
  2. Stub Services: Use skip-with-warning; optionally add stub fixtures
  3. A11y vs UI Changes: Mark as xfail until UI changes implemented (#2275: [EPIC][A11Y]: Keyboard navigation and shortcuts; #2480: [TESTING][ACCESSIBILITY]: Admin UI WCAG Compliance, Keyboard Navigation, Screen Reader Support)
  4. Multi-Browser Scope: Full matrix for smoke/visual; single browser for full suite
  5. SSE Fixtures: Create fixtures that call APIs to generate observable events
  6. Password Change Tests: Separate test-auth-enforcement.env profile
  7. Feature Detection Auth: enabled_features depends on authenticated_page fixture

Metadata

Labels

  • SHOULD - P2: Important but not vital; high-value items that are not crucial for the immediate release
  • chore - Linting, formatting, dependency hygiene, or project maintenance chores
  • epic - Large feature spanning multiple issues
  • frontend - Frontend development (HTML, CSS, JavaScript)
  • javascript - Javascript or typescript
  • playwright - Automated UI testing with playwright
  • python - Python / backend development (FastAPI)
  • test-automation - Automated testing
  • testing - Testing (unit, e2e, manual, automated, etc)
  • ui - User Interface
