
[Obs Presentation] SLO Error: Can not create edge >rabbitmq/rmdata.Web.Common.Dto:PluginProtocol~rmdata.Web.Service with nonexistant source >rabbitmq/rmdata.Web.Common.Dto:PluginProtocol #256174

@rmyz

Description


Summary

Sub-issue of #255799.

Our SLOs are detecting the following error:

Can not create edge >rabbitmq/rmdata.Web.Common.Dto:PluginProtocol~rmdata.Web.Service with nonexistant source >rabbitmq/rmdata.Web.Common.Dto:PluginProtocol

Investigation

Error details

| Field | Value |
|---|---|
| Error type | Error |
| Error label | PageFatalReactError |
| Page | /app/apm/service-map |
| Route pattern | /service-map |
| Service | kibana-frontend v9.4.0 |
| Origin | apm plugin chunk (apm.chunk.5984.js; functions Yo, restore, new Gm, add) |
| Environment | Production (Serverless Observability, Azure northeurope) |
| Browser | Chrome 145 (Windows) |
| Feature flag | flag_apm_serviceMapUseReactFlow: false (legacy Cytoscape-based service map) |

Root cause analysis

The APM service map page crashes when the graph library (Cytoscape.js) attempts to add an edge between two nodes, but the source node does not exist in the graph. The error message is explicit: it cannot create an edge from >rabbitmq/rmdata.Web.Common.Dto:PluginProtocol to rmdata.Web.Service because the source node >rabbitmq/rmdata.Web.Common.Dto:PluginProtocol has not been added to the graph.

This is a data consistency issue between the service map API response and the graph construction logic. The backend returns a list of nodes (services and external dependencies) and a list of edges (connections between them). When an edge references a node ID that is not present in the node list, Cytoscape.js throws an error. Because the error is not caught before it reaches React, it takes down the whole page (hence the PageFatalReactError label).
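One way to confirm this class of bug in a captured response is to check the invariant before anything reaches Cytoscape.js. The sketch below is illustrative only: it assumes elements in Cytoscape.js's JSON form (each element has a data object, and edges also carry data.source and data.target), and the ServiceMapElement type and findOrphanedEdges helper are hypothetical names, not existing Kibana code.

```typescript
// Simplified Cytoscape.js element definition (hypothetical type for this sketch):
// nodes carry only an id; edges additionally carry source and target node ids.
interface ServiceMapElement {
  data: {
    id: string;
    source?: string;
    target?: string;
  };
}

// Returns every edge whose source or target id does not match a node id.
// Passing any of these edges to cy.add() makes Cytoscape.js throw.
function findOrphanedEdges(elements: ServiceMapElement[]): ServiceMapElement[] {
  // Elements without both a source and a target are treated as nodes.
  const nodeIds = new Set<string>();
  for (const el of elements) {
    if (el.data.source === undefined || el.data.target === undefined) {
      nodeIds.add(el.data.id);
    }
  }
  return elements.filter((el) => {
    const { source, target } = el.data;
    if (source === undefined || target === undefined) {
      return false; // a node, not an edge
    }
    return !nodeIds.has(source) || !nodeIds.has(target);
  });
}

// Example shaped like the reported error: the edge's source node is absent.
const elements: ServiceMapElement[] = [
  { data: { id: 'rmdata.Web.Service' } },
  {
    data: {
      id: '>rabbitmq/rmdata.Web.Common.Dto:PluginProtocol~rmdata.Web.Service',
      source: '>rabbitmq/rmdata.Web.Common.Dto:PluginProtocol',
      target: 'rmdata.Web.Service',
    },
  },
];

const orphans = findOrphanedEdges(elements);
```

Run against both the raw API payload and the transformed Cytoscape elements, a check like this could help tell suspect area 1 (frontend transformation) apart from suspect area 2 (backend response).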

The problematic node ID combines the > dependency prefix, the rabbitmq resource type, a slash (/), and a RabbitMQ queue/exchange name (rmdata.Web.Common.Dto:PluginProtocol) that itself contains dots (.) and a colon (:). This suggests the node ID generation or matching logic may mishandle complex dependency names in one of these ways:

  1. Filtering out the node but keeping the edge: The backend or frontend filters/deduplicates external dependency nodes differently from how it generates edges, leaving orphaned edge references.
  2. ID mismatch due to encoding/escaping: The node ID and the edge's source reference are generated through different code paths that encode special characters differently, so they don't match.
  3. External dependency grouping inconsistency: External dependencies (like RabbitMQ queues) are grouped under a parent node (e.g., >rabbitmq) on the node side, but the edge references the specific sub-resource ID that was not materialized as a separate node.
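Mechanisms 2 and 3 can be illustrated with hypothetical ID builders. None of the functions below are taken from Kibana; they only show how two code paths that each look reasonable can produce different IDs for the same dependency, and how routing both paths through one shared helper (suggested fix 2) removes the mismatch by construction.

```typescript
// Dependency resource from the reported error (without the '>' node prefix).
const resource = 'rabbitmq/rmdata.Web.Common.Dto:PluginProtocol';

// Mechanism 2 (hypothetical): the node path escapes the name, the edge path does not.
const nodeIdEscaped = (res: string): string => `>${encodeURIComponent(res)}`;
const edgeSourceRaw = (res: string): string => `>${res}`;

// Mechanism 3 (hypothetical): nodes are grouped by resource type (the text
// before the first '/'), while edges keep the full sub-resource.
const groupedNodeId = (res: string): string => `>${res.split('/')[0]}`;

// Fix 2 (hypothetical helper): one function that both node and edge code call.
function dependencyNodeId(res: string): string {
  return `>${res}`;
}

const escapedNode = nodeIdEscaped(resource); // '>rabbitmq%2Frmdata.Web.Common.Dto%3APluginProtocol'
const groupedNode = groupedNodeId(resource); // '>rabbitmq'
const edgeSource = edgeSourceRaw(resource); // '>rabbitmq/rmdata.Web.Common.Dto:PluginProtocol'
```

In either mechanism the edge's source string never equals any node ID, which is exactly the condition Cytoscape.js rejects.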

The user was viewing the service map with ENVIRONMENT_ALL across a 15-day range (rangeFrom=now-15d), which increases the likelihood of surfacing complex inter-service topologies with messaging queues and unusual dependency names.

Stacktrace analysis

| Frame | File | Function | Role |
|---|---|---|---|
| 1 | apm.chunk.5984.js | Yo | Throws the graph error; likely the edge-source validation in the bundled Cytoscape.js code |
| 2 | apm.chunk.5984.js | restore | Cytoscape.js restore method; inserts the new elements into the graph |
| 3 | apm.chunk.5984.js | new Gm | Cytoscape.js constructor (minified as Gm), likely the element collection built by add(); validates edges during initialization |
| 4 | apm.chunk.5984.js | add | Cytoscape.js add method; adds elements (nodes and edges) to the graph |
| 5 | apm.chunk.5984.js | <anonymous> | APM component that calls cy.add(elements) to populate the service map |
| 6-10 | kbn-ui-shared-deps-npm.dll.js | PltcFuiiPu | React render/commit cycle and effect scheduling |

All five frames in apm.chunk.5984.js come from the same chunk, which bundles the Cytoscape.js library together with the service map component. The call flow is: a React effect (frame 5) calls cy.add() (frame 4), which constructs the new elements (frame 3) and restores them into the graph (frame 2); the edge-source validation (frame 1) then throws because the source node is missing.

Suspect areas (prioritized)

| Priority | Location | Reason |
|---|---|---|
| 1 | Service map element transformation (node/edge generation) | The code that transforms the /api/apm/service-map API response into Cytoscape elements likely produces edges whose source/target IDs don't match the generated node IDs, especially for external dependencies whose names contain /, ., and :. |
| 2 | Service map API backend (node/edge generation) | The backend may return edges referencing granular external dependency sub-resources (e.g., specific RabbitMQ queues) while only returning a grouped parent node (e.g., >rabbitmq), creating a mismatch. |
| 3 | External dependency node ID encoding | The > prefix convention for external dependencies, combined with the / and : characters in the queue name, may cause inconsistent ID generation between the node-creation and edge-creation code paths. |

Key files

  • x-pack/solutions/observability/plugins/apm/public/components/app/service_map/ -- service map React component and Cytoscape integration
  • x-pack/solutions/observability/plugins/apm/public/components/app/service_map/cytoscape_options.ts -- Cytoscape configuration and element handling
  • x-pack/solutions/observability/plugins/apm/server/routes/service_map/ -- backend route that returns nodes and edges for the service map
  • x-pack/solutions/observability/plugins/apm/common/service_map.ts -- shared types and ID generation utilities for service map elements

Suggested fixes

  1. Validate edges before adding to graph: Before calling cy.add(elements), filter out any edges whose source or target node ID is not present in the node set. Log a warning for discarded edges instead of crashing the page.
  2. Ensure node/edge ID consistency: Audit the code paths that generate node IDs and edge source/target references for external dependencies, ensuring they produce identical strings for the same dependency regardless of special characters.
  3. Add missing dependency nodes as fallback: If an edge references a node that doesn't exist, auto-create a placeholder node for that dependency rather than crashing. This is a defensive approach that preserves the graph topology.
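Fixes 1 and 3 could share one pre-processing step that runs before cy.add(). The sketch below assumes Cytoscape.js-style element definitions; the sanitizeElements function, the OrphanStrategy option, and the placeholder data flag are all hypothetical. Rather than logging directly, it returns warning messages so the caller can route them to whatever logger the plugin already uses.

```typescript
// Simplified Cytoscape.js element definition (hypothetical type for this sketch).
interface ServiceMapElement {
  data: {
    id: string;
    source?: string;
    target?: string;
    placeholder?: boolean; // hypothetical marker for synthesized nodes
  };
}

type OrphanStrategy = 'drop' | 'placeholder';

interface SanitizeResult {
  elements: ServiceMapElement[];
  warnings: string[];
}

function sanitizeElements(input: ServiceMapElement[], strategy: OrphanStrategy): SanitizeResult {
  // Split into nodes and edges (edges have both a source and a target).
  const nodes: ServiceMapElement[] = [];
  const edges: ServiceMapElement[] = [];
  for (const el of input) {
    if (el.data.source !== undefined && el.data.target !== undefined) {
      edges.push(el);
    } else {
      nodes.push(el);
    }
  }

  const nodeIds = new Set(nodes.map((n) => n.data.id));
  const keptEdges: ServiceMapElement[] = [];
  const warnings: string[] = [];

  for (const edge of edges) {
    const { id, source, target } = edge.data;
    // source and target are defined here: they were checked when splitting above.
    const missing = [source as string, target as string].filter((nodeId) => !nodeIds.has(nodeId));
    if (missing.length === 0) {
      keptEdges.push(edge);
    } else if (strategy === 'drop') {
      // Fix 1: discard the edge and report it instead of crashing.
      warnings.push(`Dropped edge ${id}: missing node(s) ${missing.join(', ')}`);
    } else {
      // Fix 3: synthesize each missing endpoint, then keep the edge.
      for (const nodeId of missing) {
        if (nodeIds.has(nodeId)) continue; // self-loop: the same id appears twice
        nodes.push({ data: { id: nodeId, placeholder: true } });
        nodeIds.add(nodeId);
        warnings.push(`Added placeholder node ${nodeId} for edge ${id}`);
      }
      keptEdges.push(edge);
    }
  }

  // Nodes are listed before edges; every kept edge now references an existing node id.
  return { elements: [...nodes, ...keptEdges], warnings };
}
```

In the component, usage would look roughly like calling sanitizeElements(elements, 'drop'), logging the returned warnings, and passing the returned elements to cy.add(). Dropping keeps the map faithful to what the backend returned; placeholders preserve the topology but would likely need distinct styling so users can tell those nodes were synthesized.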

Reproduction

  1. Open a serverless observability project with services that communicate via RabbitMQ (or other messaging queues with complex names containing /, ., :)
  2. Navigate to /app/apm/service-map
  3. Set environment to ENVIRONMENT_ALL and a broad time range (e.g., 15 days)
  4. The page crashes when Cytoscape.js attempts to add an edge for a messaging queue dependency whose source node ID does not match any node in the graph

The specific URL from the error document:

/app/apm/service-map?comparisonEnabled=true&environment=ENVIRONMENT_ALL&kuery=&offset=1296000000ms&rangeFrom=now-15d&rangeTo=now&serviceGroup=
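The URL's parameters are consistent with each other: offset=1296000000ms is exactly 15 days (15 × 86,400,000 ms), matching rangeFrom=now-15d, so the comparison period is presumably shifted back by one full range. The short sketch below extracts the parameters and checks that arithmetic; parseQuery and offsetToDays are illustrative helpers, not Kibana APIs, and only the plain digits-plus-ms offset form seen here is handled.

```typescript
// URL captured in the error document.
const errorUrl =
  '/app/apm/service-map?comparisonEnabled=true&environment=ENVIRONMENT_ALL&kuery=&offset=1296000000ms&rangeFrom=now-15d&rangeTo=now&serviceGroup=';

// Minimal query-string parser for this sketch (does not treat '+' as a space).
function parseQuery(url: string): Record<string, string> {
  const queryStart = url.indexOf('?');
  const query = queryStart === -1 ? '' : url.slice(queryStart + 1);
  const params: Record<string, string> = {};
  for (const pair of query.split('&')) {
    if (pair === '') continue;
    const eq = pair.indexOf('=');
    const key = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? '' : pair.slice(eq + 1);
    params[decodeURIComponent(key)] = decodeURIComponent(value);
  }
  return params;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000; // 86,400,000 ms

// Converts an offset such as '1296000000ms' into days.
function offsetToDays(offset: string): number {
  const match = /^(\d+)ms$/.exec(offset);
  if (match === null) {
    throw new Error(`Unsupported offset format: ${offset}`);
  }
  return Number(match[1]) / MS_PER_DAY;
}

const params = parseQuery(errorUrl);
const offsetDays = offsetToDays(params.offset ?? ''); // 15
```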

Additional context

  • The flag_apm_serviceMapUseReactFlow feature flag is false, meaning this user is on the legacy Cytoscape-based service map. If the ReactFlow-based service map handles edge validation differently, this error may not occur when that flag is enabled.
  • The transaction type is user-interaction, and error.custom.classes contains euiFieldNumber css-vehtd3-euiFieldNumber-compressed, suggesting the error was triggered after the user interacted with a numeric input field on the page (possibly adjusting the time range or comparison offset), which caused a service map re-render with updated data.
  • The same project (rmdata_dev_serverless) and organization also appear in other SLO errors, suggesting this environment has a complex service topology that exercises edge cases in the APM UI.
  • The error is tagged PageFatalReactError, meaning it crashes the entire page for the user.
  • Observed on Kibana 9.4.0 (git rev d83e0ba465ad) in an Azure northeurope serverless observability project.

Metadata

Labels: Team:obs-presentation (Focus: APM UI, Infra UI, Hosts UI, Universal Profiling, Obs Overview and left Navigation), bug (Fixes for quality problems that affect the customer experience)
