Skip to content

2182 - Observability loading issues#2208

Merged
crivetimihai merged 6 commits intomainfrom
2182-observability-loading-issues
Jan 20, 2026
Merged

2182 - Observability loading issues#2208
crivetimihai merged 6 commits intomainfrom
2182-observability-loading-issues

Conversation

@gcgoncalves
Copy link
Copy Markdown
Collaborator

@gcgoncalves gcgoncalves commented Jan 20, 2026

🐛 Bug-fix PR

📌 Summary

This PR addresses several issues:

  1. PostgreSQL GroupingError in Observability: The observability API endpoints for tools, prompts, and resources would fail when using a PostgreSQL database due to an SQL GROUP BY error. This prevented users on PostgreSQL
    from using the observability features.
  2. Frontend Chart Rendering Bugs: The observability dashboard experienced rendering errors, such as "Canvas is already in use," when navigating between tabs or when charts were redrawn. This led to a broken and unreliable
    user experience.
  3. Inconsistent Tooling Behavior: Some data-generating tools did not correctly handle workspace directories, and some write-intensive tools did not properly respect read-only mode.
  4. Partial Data Loading in UI: The observability sub-tabs (Metrics, Tools, Prompts) would re-fetch and re-render their content unnecessarily on every view, leading to slow performance and flickering.

These fixes are important for ensuring the platform is stable, performant, and database-agnostic, providing a reliable experience for administrators and developers monitoring the system.

Closes #2182

🔁 Reproduction Steps

  1. PostgreSQL GroupingError:

    • Configure the application with a PostgreSQL database.
    • Navigate to Admin > Observability and open the "Tools", "Prompts", or "Resources" sub-tabs.
    • Observe API failures for the analytics data and errors in the UI.
  2. Frontend Chart Rendering Errors:

    • Navigate to Admin > Observability.
    • Quickly switch between the "Metrics", "Tools", and "Prompts" tabs.
    • Observe "Canvas is already in use" errors in the browser's developer console and broken chart visualizations.

🐞 Root Cause

  • PostgreSQL Error: The issue stemmed from SQLAlchemy generating different column expressions for the SELECT list and the GROUP BY clause when extracting data from JSON fields. PostgreSQL requires these expressions to be identical, which was not the case, leading to a GroupingError.
  • Frontend Chart Bug: Chart.js instances were not being properly destroyed before being re-initialized on the same <canvas> element. This occurred because each observability panel managed its own charts independently and without a proper cleanup lifecycle when the view was changed.

💡 Fix Description

  • PostgreSQL Error: The queries in mcpgateway/admin.py were refactored to create a single SQLAlchemy expression for the JSON field extraction. This expression is now stored in a variable and reused in both the SELECT and GROUP BY clauses, ensuring compatibility with PostgreSQL's strict grouping requirements.
  • Frontend Chart Bug:
    • A global window.chartRegistry was introduced in mcpgateway/static/admin.js to manage the lifecycle of all Chart.js instances centrally.
    • The observability panels now register their charts with this registry and use it to destroy old charts before creating new ones.
    • Logic was added to automatically destroy charts when a user navigates away from the observability tab, preventing memory leaks and rendering conflicts.
    • The partial views are now loaded only once and cached, preventing redundant API calls and re-renders.

📋 Testing

  • End-to-End Test (tests/e2e/test_admin_apis.py): A new test test_observability_endpoints_with_database was added to call the affected API endpoints and ensure they return a successful response with the correct data
    structure when backed by a database.
  • Unit Tests (tests/unit/mcpgateway/test_admin_observability_sql.py): New test classes were added to verify the query construction. These tests are parameterized to run against both "postgresql" and "sqlite" dialects,
    ensuring the fix works for both and doesn't introduce regressions.

🧪 Verification

Check Command Status
Lint suite make lint
Unit tests make test
Coverage ≥ 90 % make coverage
Manual regression no longer fails steps / screenshots

✅ Checklist

  • Code formatted (make black isort pre-commit)
  • No secrets/credentials committed

@gcgoncalves gcgoncalves force-pushed the 2182-observability-loading-issues branch from 5c76698 to c0d0418 Compare January 20, 2026 10:43
@gcgoncalves gcgoncalves marked this pull request as ready for review January 20, 2026 11:27
gcgoncalves and others added 6 commits January 20, 2026 20:04
Resolves a GroupingError that occurred in PostgreSQL when querying
observability data (tools, prompts, resources). The error was caused by
SQLAlchemy generating different column expressions for the SELECT list
and the GROUP BY clause when extracting data from JSON fields.

This fix refactors the queries to create a single expression for the
JSON field extraction and reuses it in both the SELECT and GROUP BY
clauses, ensuring compatibility with PostgreSQL's strict grouping requirements.

Additionally, this commit introduces:
  - Unit tests parametrized for both PostgreSQL and SQLite dialects to
  verify the fix at the query construction level.
  - An end-to-end test that calls the affected API endpoints to ensure they
  function correctly with a database backend.

Signed-off-by: Gabriel Costa <gabrielcg@proton.me>
Introduces a global Chart.js registry to manage chart instances
across the admin UI, preventing "canvas already in use" errors.
Enhances observability panels (metrics, tools, prompts) by:

 - Centralizing chart destruction and registration.
 - Implementing visibility checks to defer rendering of charts on hidden canvases.
 - Adding safeguards to prevent redundant and concurrent loading of partial views.
 - Ensuring proper cleanup of charts and auto-refresh intervals when navigating
  between tabs or unloading the page.

These changes improve the stability and performance of the observability
dashboard, providing a smoother user experience.

Signed-off-by: Gabriel Costa <gabrielcg@proton.me>
Signed-off-by: Gabriel Costa <gabrielcg@proton.me>
The RBAC decorator expects a dict with 'email' and 'db' keys, not a
MagicMock. Updated the tests to use create_mock_user_context() which
provides the correct structure for permission checks.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
- Remove redundant chart destruction on observability tab entry
  (charts are already destroyed when leaving the tab)
- Replace ineffective 'destroy' DOM event listeners with beforeunload
  handlers (Alpine.js doesn't emit 'destroy' DOM events)
- Move e2e test before __main__ block so it runs with pytest --main
- Update observability_resources.html to use global chart registry
  for consistent chart lifecycle management
- Add resources- prefix to chart cleanup when leaving observability tab

These fixes address blank chart issues when returning to the
observability tab and ensure proper cleanup of intervals/handlers.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Addresses two issues identified in review:

1. Blank charts on return: When leaving observability tab, charts are
   destroyed but `metricsLoaded` etc. flags remained true, preventing
   partials from reloading. Now dispatch `observability:leave` event
   that resets loaded flags so partials reload on return.

2. Orphaned intervals: Auto-refresh timers continued running after
   leaving the tab. Now each partial (metrics, tools, prompts, resources)
   listens for `observability:leave` and calls cleanup() to stop intervals.

The cleanup lifecycle is now:
- Tab leave: dispatches observability:leave event
- Partials: listen for event, call cleanup() to stop intervals
- Parent: resets loaded flags so partials reload on next visit
- Event handlers are properly removed in cleanup() to avoid leaks

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
@crivetimihai crivetimihai force-pushed the 2182-observability-loading-issues branch from c0d0418 to 03b638a Compare January 20, 2026 21:43
@crivetimihai crivetimihai merged commit 0dbceb6 into main Jan 20, 2026
51 checks passed
@crivetimihai crivetimihai deleted the 2182-observability-loading-issues branch January 20, 2026 21:52
kcostell06 pushed a commit to kcostell06/mcp-context-forge that referenced this pull request Feb 24, 2026
* Resolve PostgreSQL GROUP BY error in analytics

Resolves a GroupingError that occurred in PostgreSQL when querying
observability data (tools, prompts, resources). The error was caused by
SQLAlchemy generating different column expressions for the SELECT list
and the GROUP BY clause when extracting data from JSON fields.

This fix refactors the queries to create a single expression for the
JSON field extraction and reuses it in both the SELECT and GROUP BY
clauses, ensuring compatibility with PostgreSQL's strict grouping requirements.

Additionally, this commit introduces:
  - Unit tests parametrized for both PostgreSQL and SQLite dialects to
  verify the fix at the query construction level.
  - An end-to-end test that calls the affected API endpoints to ensure they
  function correctly with a database backend.

Signed-off-by: Gabriel Costa <gabrielcg@proton.me>

* 2182 - Improve observability charts lifecycle

Introduces a global Chart.js registry to manage chart instances
across the admin UI, preventing "canvas already in use" errors.
Enhances observability panels (metrics, tools, prompts) by:

 - Centralizing chart destruction and registration.
 - Implementing visibility checks to defer rendering of charts on hidden canvases.
 - Adding safeguards to prevent redundant and concurrent loading of partial views.
 - Ensuring proper cleanup of charts and auto-refresh intervals when navigating
  between tabs or unloading the page.

These changes improve the stability and performance of the observability
dashboard, providing a smoother user experience.

Signed-off-by: Gabriel Costa <gabrielcg@proton.me>

* Linting and testing fix

Signed-off-by: Gabriel Costa <gabrielcg@proton.me>

* fix(tests): use proper user context in observability SQL tests

The RBAC decorator expects a dict with 'email' and 'db' keys, not a
MagicMock. Updated the tests to use create_mock_user_context() which
provides the correct structure for permission checks.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* fix(observability): address chart lifecycle and cleanup issues

- Remove redundant chart destruction on observability tab entry
  (charts are already destroyed when leaving the tab)
- Replace ineffective 'destroy' DOM event listeners with beforeunload
  handlers (Alpine.js doesn't emit 'destroy' DOM events)
- Move e2e test before __main__ block so it runs with pytest --main
- Update observability_resources.html to use global chart registry
  for consistent chart lifecycle management
- Add resources- prefix to chart cleanup when leaving observability tab

These fixes address blank chart issues when returning to the
observability tab and ensure proper cleanup of intervals/handlers.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* fix(observability): stop auto-refresh and reset state on tab leave

Addresses two issues identified in review:

1. Blank charts on return: When leaving observability tab, charts are
   destroyed but `metricsLoaded` etc. flags remained true, preventing
   partials from reloading. Now dispatch `observability:leave` event
   that resets loaded flags so partials reload on return.

2. Orphaned intervals: Auto-refresh timers continued running after
   leaving the tab. Now each partial (metrics, tools, prompts, resources)
   listens for `observability:leave` and calls cleanup() to stop intervals.

The cleanup lifecycle is now:
- Tab leave: dispatches observability:leave event
- Partials: listen for event, call cleanup() to stop intervals
- Parent: resets loaded flags so partials reload on next visit
- Event handlers are properly removed in cleanup() to avoid leaks

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: Gabriel Costa <gabrielcg@proton.me>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG][UI]: Metrics flickering on

2 participants