x-pack/filebeat/input/entityanalytics: add minimal-state mode for agentless environments #49159

@efd6

The entity-analytics input stores state in a local bbolt database (one file per input at <data_dir>/kvstore/<input_id>.db). Agentless environments don't provide persistent local storage, so the input can't run there.

This issue tracks adding a minimal-state mode that stores only checkpoint data (continuation tokens, timestamps, entity ID sets) in the Elasticsearch state store — the same mechanism CEL and httpjson inputs already use in agentless deployments. The minimal-state mode coexists with the existing implementation behind a config option.

Approach

The current implementation stores full entity data (users, devices, groups) locally for deletion detection and, in EntraID's case, transitive group membership computation. All four providers can operate with just checkpoint state:

| Provider | Current persistent state | Minimal state |
|---|---|---|
| EntraID | Users, devices, groups, relationship tree, delta URLs | User delta URL only |
| Active Directory | Users, devices, whenChanged timestamp | whenChanged timestamp + entity ID set |
| Okta | Users, devices, timestamps, pagination cursors | lastUpdated timestamp + entity ID set |
| Jamf | Computers, timestamps | Page cursor + entity ID set |

EntraID achieves minimal state by fetching all groups with members on each sync and computing transitive membership in working storage rather than persisting the full entity graph between syncs.
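The in-memory computation could look something like the sketch below. It assumes the sync yields a map from group ID to direct member IDs, that every group appears as a key in that map (even when empty), and that members may be users or nested groups. The function name and data shape are hypothetical and are not the input's actual types.

```go
package main

import (
	"fmt"
	"sort"
)

// transitiveGroups returns, for each non-group member, the sorted list of
// groups it belongs to directly or through nested groups. members maps a
// group ID to the IDs of its direct members.
func transitiveGroups(members map[string][]string) map[string][]string {
	// Invert the relation: member ID -> groups that directly contain it.
	parents := make(map[string][]string)
	for group, ms := range members {
		for _, m := range ms {
			parents[m] = append(parents[m], group)
		}
	}

	result := make(map[string][]string)
	for id := range parents {
		if _, isGroup := members[id]; isGroup {
			continue // only report memberships for leaf entities
		}
		// Walk upward through parent groups; seen guards against cycles.
		seen := make(map[string]bool)
		stack := append([]string(nil), parents[id]...)
		for len(stack) > 0 {
			g := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			if seen[g] {
				continue
			}
			seen[g] = true
			stack = append(stack, parents[g]...)
		}
		groups := make([]string, 0, len(seen))
		for g := range seen {
			groups = append(groups, g)
		}
		sort.Strings(groups)
		result[id] = groups
	}
	return result
}

func main() {
	members := map[string][]string{
		"g-all":    {"g-admins"},
		"g-admins": {"u-alice", "g-ops"},
		"g-ops":    {"u-bob"},
	}
	tg := transitiveGroups(members)
	fmt.Println(tg["u-alice"]) // prints [g-admins g-all]
	fmt.Println(tg["u-bob"])   // prints [g-admins g-all g-ops]
}
```

Nothing here needs to survive between syncs: the map is rebuilt from the fetched groups each time and discarded afterwards.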

For AD, Okta, and Jamf, deletion detection stores the previous sync's entity ID set in the ES state store.
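Deletion detection from ID sets reduces to a set difference. A minimal sketch, assuming entity IDs are plain strings and that a full sync yields the complete current ID set (the function name is hypothetical):

```go
package main

import (
	"fmt"
	"sort"
)

// deletedIDs returns the IDs present in the previous sync's set but absent
// from the current one, sorted for deterministic output.
func deletedIDs(prev, curr []string) []string {
	current := make(map[string]struct{}, len(curr))
	for _, id := range curr {
		current[id] = struct{}{}
	}
	var deleted []string
	for _, id := range prev {
		if _, ok := current[id]; !ok {
			deleted = append(deleted, id)
		}
	}
	sort.Strings(deleted)
	return deleted
}

func main() {
	prev := []string{"u1", "u2", "u3"} // loaded from the state store
	curr := []string{"u2", "u4"}       // observed in this sync
	fmt.Println(deletedIDs(prev, curr)) // prints [u1 u3]
}
```

After emitting deletion events for the result, the current set would replace the stored one as the baseline for the next sync.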

A config option selects the implementation. Both modes produce identical output documents, so the legacy implementation remains available as a fallback during validation. The option will be hidden in agentless mode, where minimal-state is required.

Implementation runtime

The minimal-state approach applies regardless of implementation runtime. Two paths are available:

  1. Beats input: Add a minimal-state mode to the existing entity-analytics input. Requires refactoring the input to accept statestore.States (tracked in #49160, "x-pack/filebeat/input/entityanalytics: refactor input to accept statestore.States").
  2. OTel receiver: Implement as a new OpenTelemetry receiver. The elasticsearchstorage extension provides ES-backed state storage in the OTel runtime, making this path viable.

The per-provider sync flows, state requirements, and output documents are identical in both cases.

Related work

  • elastic/beats#41492 — Entity Analytics Input GA
  • The agentless-controller needs to add entity-analytics to AGENTLESS_ELASTICSEARCH_STATE_STORE_INPUT_TYPES.
  • Fleet integration packages will default to minimal-state mode and hide the option in agentless deployments.
  • Downstream latest transforms can provide a "current state" view of entities and handle deletion detection via TTL. These are recommended but not required by this work.
