Skip to content

perf(dashboard): batch per-source reads to fix source-toggle 429s without exempting GETs (#3735)#3748

Merged
Yeraze merged 2 commits into
mainfrom
fix/dashboard-batch-reads-3735
Jun 25, 2026
Merged

perf(dashboard): batch per-source reads to fix source-toggle 429s without exempting GETs (#3735)#3748
Yeraze merged 2 commits into
mainfrom
fix/dashboard-batch-reads-3735

Conversation

@Yeraze

@Yeraze Yeraze commented Jun 25, 2026

Copy link
Copy Markdown
Owner

Summary

Fixes #3735 (source toggling trips the API rate limiter → 429s break the UI for 15 min) by cutting request volume, rather than exempting all GETs from the limiter as proposed in #3741.

Why not just exempt GETs (#3741)?

skip: ... || req.method === 'GET' removes back-pressure entirely for reads — exactly what hammers low-powered or heavily-utilized servers. The actual problem is that the dashboard fires far too many requests:

  • Per 15s poll, the unified view fetched one GET per dataset per source: nodes + traceroutes + neighbor-info + channels (4×N), plus status (N).
  • At 5 sources that's ~25 req/15s ≈ 1.7/sec — already over the 1000-per-15-min (~1.1/sec) limit before the user touches anything. Toggling just spikes it past the edge.

Fix — batch the reads

Backend

  • src/server/services/sourceDashboardData.tsbuildSourceNodes/Channels/Traceroutes/NeighborInfo extracted verbatim from the existing per-dataset handlers (same masking/gating), plus buildSourceDashboard which applies the same per-dataset permission checks the routes did via middleware.
  • GET /api/sources/:id/dashboard — bundles a source's four datasets (single-source view: 5→1).
  • GET /api/unified/dashboard?sources= — one request for the entire unified view (4N→1). A dataset the caller can't read returns [] rather than 403-ing the whole bundle.
  • The four individual endpoints (/nodes, /traceroutes, /neighbor-info, /channels) are now thin wrappers over the shared builders — behavior unchanged, proven by the existing sourceRoutes.* suites staying green.

Frontend

  • useDashboardSourceData and useDashboardUnifiedData consume the aggregates.
  • The source-toggle refresh is debounced so a flurry of rapid toggles coalesces into a single refetch cycle.

Result

Unified view drops from ~4N+N to ~1+N requests per poll, and the existing rate limiter stays fully in place — no protection removed. This supersedes #3741.

Tests

  • sourceRoutes.dashboard.test.ts — bundle shape + per-dataset permission gating (denied dataset → [], no wasted query).
  • unifiedRoutes.test.ts/unified/dashboard returns one bundle per source and honors ?sources=.
  • useDashboardData.test.ts — updated to assert the batched request shape.
  • Full suite: 7544 passed, 0 failed.

Closes #3735

🤖 Generated with Claude Code

…ut request volume (#3735)

Source-heavy dashboards fired one GET per dataset per source on every 15s
poll (nodes+traceroutes+neighbor-info+channels = 4×N, plus status = N). At 5
sources that's ~25 req/15s — already over the 1000-per-15-min API limit before
any user action, and toggling a source spiked it enough to trip 429s that broke
the UI for the full window (#3735). The proposed fix (exempting all GETs from
the limiter, #3741) removes back-pressure protection for low-powered / busy
servers; this instead cuts the volume at the source.

Backend:
- New shared builders (src/server/services/sourceDashboardData.ts) extracted
  verbatim from the per-dataset GET handlers, so individual routes and the new
  aggregates compute identical, identically-masked results.
- GET /api/sources/:id/dashboard — bundles a source's 4 datasets (5→1 for the
  single-source view).
- GET /api/unified/dashboard?sources= — one request for the whole unified view
  (4N→1). Per-dataset permission gating mirrors the individual routes (a denied
  dataset returns [] rather than 403-ing the bundle).

Frontend:
- useDashboardSourceData / useDashboardUnifiedData consume the aggregates.
- Debounce the source-toggle refresh so a flurry of toggles coalesces into one
  refetch cycle.

Net: unified view ~4N+N → ~1+N requests/poll; the existing rate limiter stays
fully in place. The four individual endpoints are unchanged in behavior
(validated by the existing sourceRoutes suites).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SJPe6J5vKrcbwzt6vCdtrt
@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown

Claude encountered an error —— View job


I'll analyze this and get back to you.

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


PR Review: perf(dashboard): batch per-source reads to fix source-toggle 429s

Tasks

  • Gather context and understand the PR
  • Read changed files
  • Analyze code quality, bugs, security, performance, and test coverage
  • Post review feedback

Overall this is a well-designed, well-motivated change. The core approach — extract shared builder functions, add two aggregate endpoints, swap the frontend to consume them — is exactly right. A few issues worth addressing before merge:


Security

/api/unified/dashboard is accessible to unauthenticated callers

unifiedRoutes.ts:1309 registers this endpoint under router.use(optionalAuth()) (line 73), which means an anonymous user can call GET /api/unified/dashboard with no authentication and receive all source bundles. Each bundle goes through buildSourceDashboard which calls hasPermission(user, …) with user = null. Verify this path returns [] for every dataset for anonymous users. If buildSourceChannels (which uses the optionalAuth pattern from the individual channel route) is intended to return data for anonymous users with channel permissions, that's fine — but make sure the behavior is intentional and consistent with the pre-existing per-source endpoints. The individual /api/sources/:id/dashboard correctly uses optionalAuth() as well. Looks consistent, but worth a comment confirming the anonymous-access surface is deliberate.

CSV injection via ?sources= parameter

unifiedRoutes.ts:1313-1314:

const requested = typeof req.query.sources === 'string' && req.query.sources.length > 0
  ? new Set(req.query.sources.split(',').map((s) => s.trim()).filter(Boolean))
  : null;

The values are only used as keys into allSources.filter(s => requested.has(s.id)). Since s.id values come from the database and the filter is a simple set membership test, there's no injection risk. However, there's no length guard on the sources string — a caller could send a 1 MB comma-separated string. Consider adding req.query.sources.length < 4096 or similar.


Bugs / Correctness

buildSourceNodes passes wrong sourceId to getNode (sourceDashboardData.ts:93)

const localNode = await databaseService.nodes.getNode(localNodeInfo.nodeNum);

The individual GET /:id/nodes handler in the old code called getNode(localNodeInfo.nodeNum, source.id) (two args). The builder here calls it without the sourceId argument. If getNode has an optional sourceId overload and returns results scoped by source, omitting it could return a node from a different source. Compare to the equivalent logic that was in the route before this PR was opened.

traceroutes limit parameter is not forwarded from the aggregate endpoint

buildSourceTraceroutes accepts a limit parameter (default 50), but buildSourceDashboard calls it without specifying a limit (buildSourceTraceroutes(source, user)). The individual GET /:id/traceroutes endpoint honored ?limit= from the query string. The aggregate dashboard endpoint silently discards that knob. This is probably intentional (dashboard always gets the default 50), but the PR description says "behavior unchanged" — worth noting this difference. A ?tracerouteLimit= knob on the aggregate endpoint would let callers tune it without breaking the default.

buildSourceNeighborInfo calls getSetting('maxNodeAgeHours') on every poll per source

sourceDashboardData.ts:205 — this is the same as the pre-existing per-source route (so not a regression), but the buildSourceDashboard function now calls all four builders in parallel via Promise.all. On a 5-source setup that's 5 concurrent calls to getSetting('maxNodeAgeHours'). The setting is global — it could be cached or fetched once and passed in. Not a correctness bug, but relevant to the "cut server load" goal of this PR.


Code Quality

any[] typed mcNodes array in buildSourceNodes (sourceDashboardData.ts:51)

const mcNodes: any[] = [];

This was probably copied from the existing route handler (the PR claims verbatim extraction). A typed intermediate would be better than any[], but acceptable as-is since the shape is clearly documented by the returned literal objects.

Magic constants inline (sourceDashboardData.ts:237-238)

const TX_MQTT = 5;
const TX_UDP = 6;

These are duplicated from the equivalent code in the pre-existing neighbor-info handler. This PR copies them instead of referencing the canonical meshtastic.ts constants (or at least a named import). CLAUDE.md calls out: "Use shared constants from src/server/constants/meshtastic.ts for PortNum, RoutingError, and helper functions — never magic numbers for protocol values." This applies here.

unknown[] return types on public API

All four builder functions return Promise<unknown[]>. The downstream callers cast with (req as any).user and spread ...ni etc., which works but makes type-checking fragile. The SourceDashboardPayload interface uses unknown[] for all four fields. Given the builders are now a stable public API shared across two routes plus tests, tightening these types would help — even if only to a dedicated interface per dataset.

Debounce on source toggle (DashboardPage.tsx:73-78)

The 400ms debounce coalesces rapid toggle refetches, which is good. However, the refreshSourcesTimer ref is never cleaned up on unmount. If the component unmounts while the timer is pending, the invalidateQueries fires against a potentially-stale QueryClient. Add a useEffect cleanup:

useEffect(() => () => {
  if (refreshSourcesTimer.current) clearTimeout(refreshSourcesTimer.current);
}, []);

Test Coverage

Tests are good overall. A few gaps:

  1. sourceRoutes.dashboard.test.ts doesn't test the meshcore source type. buildSourceNodes has a substantial MeshCore branch that returns early — if a meshcore source is passed, the test would not cover it.

  2. unifiedRoutes.test.ts — the ?sources= test only verifies one source is returned, but doesn't test that a malformed/unknown source id (e.g. ?sources=,,,) is handled gracefully. Currently filter(Boolean) drops empty strings, but a source id that exists as whitespace (' ') would be included in the Set but filter would also drop it after trim(). Worth a test.

  3. Debounce behavior in DashboardPage is not tested — the useDashboardSourceData and useDashboardUnifiedData hooks correctly test their fetch shape, but the coalescing of rapid toggle-refetch cycles is only tested implicitly.

  4. Anonymous access path for GET /unified/dashboard has no test — add one asserting the endpoint returns [] arrays (not 403) for datasets that require auth.


Minor Nits

  • sourceDashboardData.ts:37SourceRow is typed as { id: string; name: string; type: string }. This is a subset of the DB row. Fine, but a comment saying "matches the shape returned by getAllSources" would help future readers avoid accidentally trusting fields that aren't there.
  • unifiedRoutes.ts:1317Promise.all(selected.map(…)) runs all source dashboards concurrently. On a server with 20 sources this could spawn 20×4 concurrent DB queries. Consider adding a concurrency limit (e.g. via p-limit) for large deployments, or at least document the tradeoff.

Summary

The approach is sound and the request reduction math checks out. The main items I'd want addressed before merge:

  1. Bug: getNode(localNodeInfo.nodeNum) missing source.id arg (may be a regression vs. prior code).
  2. Rule violation: TX_MQTT = 5; TX_UDP = 6; — use the constants from meshtastic.ts.
  3. Missing cleanup: timer ref leak on unmount in DashboardPage.
  4. Nice to have: test for anonymous access to /unified/dashboard and for the MeshCore branch in the dashboard bundle.

…rs once, guard ?sources= length

- buildSourceNeighborInfo: replace inline TX_MQTT=5/TX_UDP=6 with the canonical
  TransportMechanism.MQTT / .MULTICAST_UDP constants (CLAUDE.md).
- maxNodeAgeHours is global: fetch it once in the /unified/dashboard route and
  thread it through buildSourceDashboard → buildSourceNeighborInfo, so a
  multi-source poll no longer re-queries the same setting once per source.
- Cap the ?sources= query string (4096) before split; document that anonymous
  access yields empty bundles via the existing per-dataset permission gates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SJPe6J5vKrcbwzt6vCdtrt
@Yeraze

Yeraze commented Jun 25, 2026

Copy link
Copy Markdown
Owner Author

Thanks for the review — addressed in d6986138:

Fixed

  • Magic transport constantsTX_MQTT=5/TX_UDP=6 now use the canonical TransportMechanism.MQTT / TransportMechanism.MULTICAST_UDP from constants/meshtastic.ts (per CLAUDE.md).
  • maxNodeAgeHours fetched per source — good catch given the PR's load goal. It's a global setting, so /unified/dashboard now fetches it once and threads it through buildSourceDashboard → buildSourceNeighborInfo (single-source callers still self-fetch). One getSetting per poll instead of N.
  • ?sources= length — capped at 4096 chars before split (no injection surface since values are only set-membership keys against DB ids, but this bounds the allocation).
  • Anonymous access — confirmed deliberate and added a comment: with user=null the per-dataset gates in buildSourceDashboard return [] for everything, so anonymous callers get empty bundles. Matches the per-source endpoints.

Declined (with reasoning)

  • getNode(localNodeInfo.nodeNum) missing sourceId — not a regression: this is verbatim from the original nodes route (origin/main sourceRoutes.ts:990), whose comment states the lookup is intentional "regardless of sourceId" (the local node is injected even if its row is owned by another source). The 2-arg call you may be thinking of is in a different handler (the /status route). Kept as-is to preserve behavior.
  • Traceroute ?limit= not exposed on the aggregate — intentional. The dashboard frontend never passed ?limit (it always used the default 50), so behavior is unchanged for the only caller; the individual /:id/traceroutes endpoint still honors the knob for anything else. Didn't add a ?tracerouteLimit= since nothing needs it (YAGNI).
  • any[] on mcNodes — verbatim from the original handler; left as-is to keep the extraction a faithful no-behavior-change move (as you noted, acceptable).

Full suite green (7544 passed) after the changes.

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


PR Review: perf(dashboard): batch per-source reads to fix source-toggle 429s

Tasks

  • Gather context and understand the PR
  • Read changed files (post-fixup commit)
  • Analyze code quality, bugs, security, performance, and test coverage
  • Post review feedback

This is a well-structured, high-impact change and the fixup commit addressed all the critical issues from the first review round. The code is in good shape. A few remaining items worth calling out:


Bug: getNode still called without sourceId in buildSourceNodes

sourceDashboardData.ts:99

const localNode = await databaseService.nodes.getNode(localNodeInfo.nodeNum);

The author declined this in the previous round citing that it's "verbatim from the original nodes route," but that claim doesn't hold up. Looking at the actual original node list route in sourceRoutes.ts:

// sourceRoutes.ts:852 — the original GET /:id/nodes handler
.getNode(localNodeInfo.nodeNum, source.id)

The original uses source.id. The /status route also calls getNode(localNodeInfo.nodeNum, source.id) (line 852). The only place getNode is called without a sourceId is in the nodes.test.ts file as legacy API. The getNode signature (nodeNum, sourceId?: string) scans all sources when sourceId is omitted — so this call in buildSourceNodes can return a node record owned by a different source, which then gets pushed into the current source's node list. This is a cross-source data leak for deployments with 2+ sources and matching local-node nodeNums (uncommon, but possible).

Fix: change line 99 to:

const localNode = await databaseService.nodes.getNode(localNodeInfo.nodeNum, source.id);

Correctness: Anonymous users get empty channels (unintentionally?)

buildSourceChannels (sourceDashboardData.ts:176):

if (user && await hasPermission(user, channelResource, 'read', source.id)) {
  accessible.push(channel);
}

When user is null, no channels are added. But the individual GET /:id/channels endpoint (sourceRoutes.ts:974) uses optionalAuth() and the old channel route (pre-PR) did expose channel metadata to unauthenticated users based on per-channel permission. After this PR, anonymous users get channels: [] from the dashboard bundle. If the pre-existing /channels endpoint served public channels to anonymous users, the dashboard bundle now silently drops those channels for unauthenticated visitors.

This may be intentional — the PR description states "with user=null the per-dataset gates … return [] for everything" — but that's not actually true for channels in the original route: channels at optionalAuth were per-channel gated, not wholesale blocked. Worth verifying the pre-PR behavior on GET /:id/channels with no session matches the new buildSourceChannels(source, null) output.


Missing timer cleanup on unmount (DashboardPage.tsx)

First round identified this; it was not addressed in d698613. The refreshSourcesTimer ref (DashboardPage.tsx:72-78) is never cleaned up:

const refreshSourcesTimer = useRef<ReturnType<typeof setTimeout> | null>(null);
const refreshSourcesDebounced = () => {
  if (refreshSourcesTimer.current) clearTimeout(refreshSourcesTimer.current);
  refreshSourcesTimer.current = setTimeout(() => {
    queryClient.invalidateQueries({ queryKey: ['dashboard', 'sources'] });
  }, 400);
};

If the component unmounts during the 400ms window (e.g., user navigates away immediately after toggling a source), invalidateQueries fires against a stale context. The fix is a single useEffect:

useEffect(() => () => {
  if (refreshSourcesTimer.current) clearTimeout(refreshSourcesTimer.current);
}, []);

Performance: Promise.all over all sources with no concurrency limit

unifiedRoutes.ts:1327:

const bundles = await Promise.all(selected.map((s) => buildSourceDashboard(s, user, { maxNodeAgeHours })));

Each buildSourceDashboard itself runs four concurrent DB queries via its inner Promise.all. At 20 sources that's 80 simultaneous DB queries on every poll. SQLite handles this fine (serialize under WAL), but PostgreSQL connection pools can get saturated. Not blocking, but worth noting for future multi-source scale work.


Test coverage gaps

  1. buildSourceNodes with source.id fix — the sourceRoutes.dashboard.test.ts beforeEach mocks mockDb.nodes.getNode but the test for "bundles all four datasets" doesn't assert the getNode call used source.id. If the bug above is fixed, a targeted assertion expect(mockDb.nodes.getNode).toHaveBeenCalledWith(expect.any(Number), 'src-mqtt') would prevent regression.

  2. Anonymous access to /unified/dashboard — previously flagged, still absent. There's no test asserting that createApp(null) GET /dashboard returns 200 with empty-array datasets rather than 403 or data leakage.

  3. useDashboardUnifiedData request URL — the hook encodes sources as a query param (/api/unified/dashboard?sources=src-1,src-2), but useDashboardData.test.ts doesn't have a test for useDashboardUnifiedData that verifies this URL shape. The test for mergeUnifiedSourceData is good, but the fetch path is untested.


Summary

Issue Severity
🐛 getNode without sourceId in buildSourceNodes — can return a node from a different source Medium
🔍 Anonymous users may silently lose public channel visibility vs. old /channels route Low
🧹 Timer ref not cleaned up on unmount Low
📊 No concurrency cap on Promise.all over all sources Low (future scale)
🧪 Missing test: anonymous access to /unified/dashboard Low

The getNode source-isolation bug is the only one I'd push on before merge — the rest are quality improvements. Everything else from the first review round was correctly addressed.


View job

@Yeraze

Yeraze commented Jun 25, 2026

Copy link
Copy Markdown
Owner Author

Closing the loop on the re-review before merge:

  • getNode without sourceId — verified against origin/main: line 990 (getNode(localNodeInfo.nodeNum), no sourceId) is inside the /:id/nodes route (router.get('/:id/nodes', … at 929), carrying the comment "Fetch the full node record from DB (regardless of sourceId) and inject it". Line 852 (getNode(…, source.id)) is a different route — /:id/status (at 763). So the nodes route always omitted sourceId intentionally; my helper is a faithful extraction and introduces no change. If that intentional "regardless of sourceId" injection is actually undesirable, it's a pre-existing behavior on main and should be changed deliberately in its own PR (it would now flow through buildSourceNodes for both callers) — not silently flipped inside a no-behavior-change refactor.
  • Anonymous channelsbuildSourceChannels is the verbatim body of the pre-PR /:id/channels handler (same optionalAuth + per-channel hasPermission gate). Anonymous output is therefore identical to before this PR; no metadata that used to be served is now dropped.
  • Timer cleanup on unmount — the queryClient is app-level (from the provider), not tied to DashboardPage's lifecycle, so a post-unmount invalidateQueries is a harmless no-op rather than a stale-context write. Happy to add the useEffect cleanup as a tidy-up follow-up, but it's not a correctness issue here.

Merging — CI green, full suite 7544 passing.

@Yeraze Yeraze merged commit dcf056c into main Jun 25, 2026
20 checks passed
@Yeraze Yeraze deleted the fix/dashboard-batch-reads-3735 branch June 25, 2026 16:11
Yeraze added a commit that referenced this pull request Jun 25, 2026
buildSourceNodes injected the manager's local node by calling
getNode(localNodeInfo.nodeNum) with no sourceId — a cross-source first-match
that, on a multi-source deployment where two sources have heard the same
nodeNum, could inject another source's row (and its position) into this
source's node list. Pass source.id so the lookup is per-source; if this source
hasn't stored the local node yet, it falls through to the synthesized minimal
record as before.

Pre-existing behavior carried over verbatim during the #3748 extraction
(origin/main sourceRoutes.ts:990, 'regardless of sourceId'); surfaced in the
#3748 review. The /:id/status route already scoped this lookup correctly.


Claude-Session: https://claude.ai/code/session_01SJPe6J5vKrcbwzt6vCdtrt

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: Source toggling triggers HTTP 429 rate limit, breaking the UI for 15 minutes (v4.11.6)

1 participant