[linux] migrate Linux metrics data streams to TSDB #17379
AndersonQ merged 17 commits into elastic:main
Force-pushed from 57914bb to 15d89c6
Pull request overview
Migrates several Linux integration metrics data streams to Elasticsearch TSDB (time_series) data streams by enabling `index_mode: "time_series"` and annotating fields with `metric_type`/`dimension`, so metrics can be stored and queried efficiently as time series.
Changes:
- Enable TSDB (`elasticsearch.index_mode: "time_series"`) for the conntrack, entropy, iostat, ksm, memory, pageinfo, raid, and service data streams.
- Mark common identifying fields (e.g., agent/cloud/container/host) as `dimension: true` and add stream-specific dimensions (e.g., device/service/raid name).
- Annotate numeric metric fields with `metric_type` (gauge/counter).
Reviewed changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/linux/data_stream/service/manifest.yml | Enables TSDB index mode for the service metrics data stream. |
| packages/linux/data_stream/service/fields/fields.yml | Adds dimension for service name and metric_type for service resource metrics. |
| packages/linux/data_stream/service/fields/ecs.yml | Marks host.name as a TSDB dimension for service metrics. |
| packages/linux/data_stream/service/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container, etc.) for service metrics. |
| packages/linux/data_stream/raid/manifest.yml | Enables TSDB index mode for the raid metrics data stream. |
| packages/linux/data_stream/raid/fields/fields.yml | Marks raid name as a dimension and annotates numeric fields with metric_type. |
| packages/linux/data_stream/raid/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container/host.name, etc.) for raid metrics. |
| packages/linux/data_stream/pageinfo/manifest.yml | Enables TSDB index mode for the pageinfo metrics data stream. |
| packages/linux/data_stream/pageinfo/fields/fields.yml | Annotates buddyinfo numeric fields with metric_type: gauge for TSDB. |
| packages/linux/data_stream/pageinfo/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container/host.name, etc.) for pageinfo metrics. |
| packages/linux/data_stream/memory/manifest.yml | Enables TSDB index mode for the memory metrics data stream. |
| packages/linux/data_stream/memory/fields/fields.yml | Adds metric_type annotations across paging/swap/hugepages metrics for TSDB. |
| packages/linux/data_stream/memory/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container/host.name, etc.) for memory metrics. |
| packages/linux/data_stream/ksm/manifest.yml | Enables TSDB index mode for the ksm metrics data stream. |
| packages/linux/data_stream/ksm/fields/fields.yml | Annotates KSM numeric fields with metric_type for TSDB. |
| packages/linux/data_stream/ksm/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container/host.name, etc.) for ksm metrics. |
| packages/linux/data_stream/iostat/manifest.yml | Enables TSDB index mode for the iostat metrics data stream. |
| packages/linux/data_stream/iostat/fields/fields.yml | Marks disk device name as a dimension and annotates iostat numeric fields with metric_type. |
| packages/linux/data_stream/iostat/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container/host.name, etc.) for iostat metrics. |
| packages/linux/data_stream/entropy/manifest.yml | Enables TSDB index mode for the entropy metrics data stream. |
| packages/linux/data_stream/entropy/fields/fields.yml | Annotates entropy numeric fields with metric_type: gauge for TSDB. |
| packages/linux/data_stream/entropy/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container/host.name, etc.) for entropy metrics. |
| packages/linux/data_stream/conntrack/manifest.yml | Enables TSDB index mode for the conntrack metrics data stream. |
| packages/linux/data_stream/conntrack/fields/fields.yml | Annotates conntrack numeric fields with metric_type for TSDB. |
| packages/linux/data_stream/conntrack/fields/agent.yml | Adds common TSDB dimensions (agent/cloud/container/host.name, etc.) for conntrack metrics. |
```yaml
  description: bytes in
- name: in.packets
  type: long
  format: bytes
```
system.service.resources.network.in.packets is a packet count but is still declared with format: bytes, which will cause incorrect formatting/units in Kibana and exported field docs. Remove the bytes format (or switch to a numeric format appropriate for counts).
Suggested change: remove the `format: bytes` line.
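Here is a minimal sketch of the field after applying the suggestion. It assumes the field sits under the `system.service.resources.network` group in the service data stream's `fields.yml`. The `metric_type: counter` value is also an assumption (a running packet total) and is not taken from the PR.

```yaml
# Hypothetical excerpt from the system.service.resources.network group.
- name: in.packets
  type: long
  # Assumed: a running packet total, so counter.
  metric_type: counter
  # No `format: bytes`: the value counts packets, not bytes.
```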
Vale Linting Results
Summary: 1 warning, 4 suggestions found
| File | Line | Rule | Message |
|---|---|---|---|
| packages/linux/docs/README.md | 306 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'for example' instead of 'e.g'. |
💡 Suggestions (4)
| File | Line | Rule | Message |
|---|---|---|---|
| packages/linux/docs/README.md | 100 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| packages/linux/docs/README.md | 214 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| packages/linux/docs/README.md | 281 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| packages/linux/docs/README.md | 331 | Elastic.Wordiness | Consider using 'all' instead of 'all of '. |
The Vale linter checks documentation changes against the Elastic Docs style guide.
To use Vale locally or report issues, refer to Elastic style guide for Vale.
Enable time series data streams (TSDB) for 8 of the 11 data streams in the Linux integration: conntrack, entropy, iostat, ksm, memory, pageinfo, raid, and service.

For each data stream:
- Add `elasticsearch.index_mode: "time_series"` to manifest.yml
- Annotate numeric fields with the appropriate metric_type (gauge/counter)
- Mark dimension fields to uniquely identify each time series

Common dimensions (all 8 data streams):
- agent.id
- agent.name
- cloud.account.id
- cloud.availability_zone
- cloud.instance.id
- cloud.provider
- cloud.region
- container.id
- host.name

Integration-specific dimensions:
- iostat: linux.iostat.name (disk device)
- raid: system.raid.name (RAID array)
- service: system.service.name (systemd service)

Excluded data streams:
- socket: transient entities with no persistent time series
- users: transient sessions with no numeric metrics
- network_summary: fields use object wildcard mappings that cannot carry metric_type annotations, limiting TSDB benefits

Assisted by Cursor
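The three per-data-stream steps above can be sketched as the following fragments. This is an illustration and is not copied from the PR. Field nesting is flattened to dotted names for brevity. Using `agent.id` and `linux.ksm.stats.full_scans` as the examples is our choice, and the `counter` type for `full_scans` is an assumption.

```yaml
# manifest.yml (other keys omitted): store the data stream as TSDB.
elasticsearch:
  index_mode: "time_series"
---
# fields/agent.yml: a common identifying field marked as a dimension.
- name: agent.id
  type: keyword
  dimension: true
---
# fields/fields.yml: a numeric field annotated with its metric type.
# Assumed: full_scans only ever grows, so it is treated as a counter.
- name: linux.ksm.stats.full_scans
  type: long
  metric_type: counter
```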
Force-pushed from fa8f4eb to f53ba33
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated 12 comments.
```yaml
type: long
format: percent
```
linux.memory.hugepages.used.pct is declared as type: long with format: percent, while other percent fields in this data stream (for example linux.memory.swap.used.pct) use scaled_float with unit: percent. If the hugepages percentage is non-integer, the current mapping will truncate/round; consider switching this field to scaled_float and adding unit: percent for consistency.
Suggested change (replaces `type: long` and `format: percent`):
```yaml
type: scaled_float
format: percent
unit: percent
```
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
@cmacknz, I checked with AI and they're the same. The only difference is that I'm adding … Anyway, I'm confirming with the ES/es-storage-engine team that adding redundant dimensions has at most a negligible impact on storage.
Hey @Oddly, I was checking, and I don't think we can confidently map all fields on the … So, I don't think it makes sense to try to change anything here.
Good point, thanks for looking at this!
…/integrations into 16511-linux-metrics-TSDB
@cmacknz, it's done. I checked with the ES team: they said redundant dimensions have a negligible impact and do not help queries, so there is no need to have …
@rdner, @orestisfl I believe Craig's questions have been answered. When you have some time, could you review it?
👍 thanks
Fixes made:

| Data Stream | Field | Added |
|---|---|---|
| entropy | system.entropy.pct | unit: percent |
| iostat | read.per_sec.bytes | unit: byte |
| iostat | write.per_sec.bytes | unit: byte |
| iostat | busy | format: percent, unit: percent |
| memory | hugepages.used.bytes | unit: byte |
| memory | hugepages.default_size | unit: byte |
| memory | direct_efficiency.pct | unit: percent |
| memory | kswapd_efficiency.pct | unit: percent |
| service | resources.cpu.usage.ns | unit: nanos |
| service | resources.memory.usage.bytes | format: bytes, unit: byte |
| service | network.in.bytes | unit: byte |
| service | network.out.bytes | unit: byte |
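For reference, here is a sketch of how two rows of the table above might look in the service data stream's `fields.yml`. The `unit` and `format` values come from the table. The field types, the `metric_type` values, and the group nesting are assumptions.

```yaml
- name: system.service.resources
  type: group
  fields:
    # Assumed: cumulative CPU time consumed by the unit, so counter.
    - name: cpu.usage.ns
      type: long
      metric_type: counter
      unit: nanos
    # Assumed: current memory usage, so gauge.
    - name: memory.usage.bytes
      type: long
      metric_type: gauge
      format: bytes
      unit: byte
```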
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated no new comments.
rdner left a comment
Some questions and clarifications.
After more discussion with the ES team, I have final guidance on what should be a dimension for TSDB. The tl;dr is:
So I'll look into adding more fields as dimensions. I'll keep the PR in draft until I've updated them and found out if …
It comes from here: https://github.com/elastic/package-spec/blob/5f23052266aab46c2b17423f07ac528c4026fded/spec/integration/data_stream/fields/fields.spec.yml#L39-L40
However, it might not be working as expected (elastic/kibana#207849). Even so, I think it's better to keep it, since it should work.
💚 Build Succeeded
cc @AndersonQ
rdner left a comment
Changes look good to me.
I think we should have a proper issue for the TODO/follow-up you added about the boolean dimension and remove the comment from this PR.
It's not blocking, though.
Package linux - 1.1.0 containing this change is available at https://epr.elastic.co/package/linux/1.1.0/
Proposed commit message
Summary of the changes
Linux Integration - TSDB Field Analysis
Overview
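- iostat: dimension `name`
- memory: `vmstat` (flattened), not a metric
- pageinfo: `nodes.*` (object), not a metric
- raid: dimensions `name`, `level`; `disks.states.*` (object), not a metric
- service: dimensions `name`, `unit`, `fragment_path`

Common Infrastructure Dimensions (agent.yml)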
All 8 TSDB-enabled data streams share the same agent.yml, which defines 24 infrastructure dimensions. These are correct and make sense.

- `agent.id`
- `agent.name`
- `cloud.account.id`
- `cloud.availability_zone`
- `cloud.instance.id`
- `cloud.instance.name`
- `cloud.machine.type`
- `cloud.provider`
- `cloud.region`
- `cloud.project.id`
- `cloud.image.id`
- `container.id`
- `container.image.name`
- `container.name`
- `host.architecture`
- `host.domain`
- `host.hostname`
- `host.id`
- `host.name`
- `host.os.family`
- `host.os.name`
- `host.os.platform`
- `host.type`
- `host.containerized`: `boolean`; … dimensions from validating correctly, see package-spec#1106

Note on the service data stream
The service data stream splits these dimensions differently: ecs.yml defines the dimension on `host.architecture`, `host.name`, `host.os.family`, `host.os.name`, `host.os.platform`, and `host.type`, while agent.yml covers the rest. The total is equivalent.

Common Non-Dimension Fields (shared across TSDB-enabled streams)
These fields appear in every data stream and CANNOT / SHOULD NOT be dimensions:
- `@timestamp`
- `data_stream.type`
- `data_stream.dataset`
- `data_stream.namespace`
- `event.module`: `linux`; zero discriminating value
- `event.dataset`
- `ecs.version`
- `event.duration`
- `service.address`
- `service.type`: `linux`; zero discriminating value
- `container.labels`: `object` type not supported for dimensions
- `host.ip`
- `host.mac`
- `host.os.kernel`
- `host.os.version`
- `host.os.build`
- `host.os.codename`

Per-Data-Stream Reports
1. conntrack
TSDB: Yes | Entity: Host-level (one series per host)
Dimensions (24)
Only the 24 common infrastructure dimensions (listed above). No domain-specific dimensions needed -- conntrack is a single host-level summary.
Metrics (8)
- `linux.conntrack.summary.drop`
- `linux.conntrack.summary.early_drop`
- `linux.conntrack.summary.entries`
- `linux.conntrack.summary.found`
- `linux.conntrack.summary.ignore`
- `linux.conntrack.summary.insert_failed`
- `linux.conntrack.summary.invalid`
- `linux.conntrack.summary.search_restart`

Cannot Be Dimension
- `linux.conntrack`
- `linux.conntrack.summary`

Missing Dimensions
None. Host-level data is fully identified by the infrastructure dimensions.
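Here is a sketch of the conntrack annotations. It assumes that `entries` reports the current table size (gauge) and that the other summary fields are cumulative event counts (counter). The report lists these fields as metrics without giving their types, so these `metric_type` choices and the `long` types are assumptions.

```yaml
- name: linux.conntrack.summary
  type: group
  fields:
    # Current number of conntrack table entries (assumed gauge).
    - name: entries
      type: long
      metric_type: gauge
    # Cumulative drop count (assumed counter).
    - name: drop
      type: long
      metric_type: counter
    # Cumulative failed inserts (assumed counter).
    - name: insert_failed
      type: long
      metric_type: counter
```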
2. entropy
TSDB: Yes | Entity: Host-level (one series per host)
Dimensions (24)
Only the 24 common infrastructure dimensions. No domain-specific dimensions needed -- entropy is a single host-level value.
Metrics (2)
- `system.entropy.available_bits`
- `system.entropy.pct`

Cannot Be Dimension
- `system.entropy`

Missing Dimensions
None. Host-level data is fully identified by the infrastructure dimensions.
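Per the file summary, both entropy fields are annotated as gauges, and the fixes table adds `unit: percent` to `system.entropy.pct`. The sketch below assumes `long` and `scaled_float` as the field types.

```yaml
- name: system.entropy
  type: group
  fields:
    # Bits of entropy currently available.
    - name: available_bits
      type: long
      metric_type: gauge
    # Available entropy relative to the pool size.
    - name: pct
      type: scaled_float
      metric_type: gauge
      unit: percent
```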
3. iostat
TSDB: Yes | Entity: Per-device per host (one series per block device per host)
Dimensions (25)
24 common + 1 domain-specific:
- `linux.iostat.name`

Metrics (13)
- `linux.iostat.read.request.merges_per_sec`
- `linux.iostat.write.request.merges_per_sec`
- `linux.iostat.read.request.per_sec`
- `linux.iostat.write.request.per_sec`
- `linux.iostat.read.per_sec.bytes`
- `linux.iostat.read.await`
- `linux.iostat.write.per_sec.bytes`
- `linux.iostat.write.await`
- `linux.iostat.request.avg_size`
- `linux.iostat.queue.avg_size`
- `linux.iostat.await`
- `linux.iostat.service_time`
- `linux.iostat.busy`

Cannot Be Dimension
- `linux.iostat`

Missing Dimensions
None. Device + host fully identifies each time series.
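Here is a sketch of the iostat dimension plus two annotated metrics. `dimension: true` on `name` comes from the PR description, and the `unit`/`format` values come from the fixes table. The `float` types and the `gauge` choice (values computed per collection interval rather than running totals) are assumptions.

```yaml
- name: linux.iostat
  type: group
  fields:
    # Block device name: one time series per device per host.
    - name: name
      type: keyword
      dimension: true
    # Bytes read per second over the interval (assumed gauge).
    - name: read.per_sec.bytes
      type: float
      metric_type: gauge
      unit: byte
    # Percentage of time the device was busy (assumed gauge).
    - name: busy
      type: float
      metric_type: gauge
      format: percent
      unit: percent
```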
4. ksm
TSDB: Yes | Entity: Host-level (one series per host)
Dimensions (24)
Only the 24 common infrastructure dimensions. KSM is a single host-level subsystem.
Metrics (7)
- `linux.ksm.stats.pages_shared`
- `linux.ksm.stats.pages_sharing`
- `linux.ksm.stats.pages_unshared`
- `linux.ksm.stats.pages_volatile`
- `linux.ksm.stats.full_scans`
- `linux.ksm.stats.stable_node_chains`
- `linux.ksm.stats.stable_node_dups`

Cannot Be Dimension
- `linux.ksm`
- `linux.ksm.stats`

Missing Dimensions
None. Host-level data is fully identified by the infrastructure dimensions.
5. memory
TSDB: Yes | Entity: Host-level (one series per host)
Dimensions (24)
Only the 24 common infrastructure dimensions. Memory is host-level.
Metrics (24)
- `linux.memory.page_stats.pgscan_kswapd.pages`
- `linux.memory.page_stats.pgscan_direct.pages`
- `linux.memory.page_stats.pgfree.pages`
- `linux.memory.page_stats.pgsteal_kswapd.pages`
- `linux.memory.page_stats.pgsteal_direct.pages`
- `linux.memory.page_stats.direct_efficiency.pct`
- `linux.memory.page_stats.kswapd_efficiency.pct`
- `linux.memory.swap.total`
- `linux.memory.swap.used.bytes`
- `linux.memory.swap.free`
- `linux.memory.swap.out.pages`
- `linux.memory.swap.in.pages`
- `linux.memory.swap.readahead.pages`
- `linux.memory.swap.readahead.cached`
- `linux.memory.swap.used.pct`
- `linux.memory.hugepages.total`
- `linux.memory.hugepages.used.bytes`
- `linux.memory.hugepages.used.pct`
- `linux.memory.hugepages.free`
- `linux.memory.hugepages.reserved`
- `linux.memory.hugepages.surplus`
- `linux.memory.hugepages.default_size`
- `linux.memory.hugepages.swap.out.fallback`
- `linux.memory.hugepages.swap.out.pages`

Cannot Be Dimension
- `linux.memory`
- `linux.memory.page_stats`
- `linux.memory.swap`
- `linux.memory.hugepages`
- `linux.memory.vmstat`

Missing Dimensions
None. Host-level data is fully identified by the infrastructure dimensions.
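Here is a sketch of the hugepages fields. It uses the mapping suggested in review (`scaled_float` with `format: percent` and `unit: percent` for `used.pct`) and the `unit: byte` values from the fixes table. The `gauge` values and the `long` types for the byte fields are assumptions.

```yaml
- name: linux.memory.hugepages
  type: group
  fields:
    # Bytes currently used by huge pages (assumed gauge).
    - name: used.bytes
      type: long
      metric_type: gauge
      unit: byte
    # Share of huge pages in use; scaled_float keeps fractional values.
    - name: used.pct
      type: scaled_float
      metric_type: gauge
      format: percent
      unit: percent
    # Default huge page size (assumed gauge).
    - name: default_size
      type: long
      metric_type: gauge
      unit: byte
```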
6. network_summary
TSDB: No (not enabled) | Entity: Host-level
Dimensions (0)
No dimensions defined. The agent.yml in this data stream does not have `dimension: true` on any field.

Metrics (0)
No explicit metrics. All data fields use dynamic objects:
- `system.network_summary.ip.*`
- `system.network_summary.tcp.*`
- `system.network_summary.udp.*`
- `system.network_summary.udp_lite.*`
- `system.network_summary.icmp.*`

Cannot Be Dimension
- `system.network_summary.ip.*`: `object` type not supported for dimensions
- `system.network_summary.tcp.*`: `object` type not supported for dimensions
- `system.network_summary.udp.*`: `object` type not supported for dimensions
- `system.network_summary.udp_lite.*`: `object` type not supported for dimensions
- `system.network_summary.icmp.*`: `object` type not supported for dimensions

TSDB Conversion Blockers
- … (`ip.*`, `tcp.*`, etc.) use wildcard field names -- TSDB requires explicit field definitions with `metric_type`
- … `/proc/net/snmp` and `/proc/net/netstat` but cannot be marked as such without expanding to explicit fields
- … `metric_type: counter`, add `dimension: true` to agent/host/cloud/container fields in agent.yml, and add `index_mode: "time_series"` to the manifest

7. pageinfo
TSDB: Yes | Entity: Host-level (one series per host)
Dimensions (24)
Only the 24 common infrastructure dimensions.
Metrics (33)
- `linux.pageinfo.buddy_info.DMA.0` through `linux.pageinfo.buddy_info.DMA.10`
- `linux.pageinfo.buddy_info.DMA32.0` through `linux.pageinfo.buddy_info.DMA32.10`
- `linux.pageinfo.buddy_info.Normal.0` through `linux.pageinfo.buddy_info.Normal.10`

Cannot Be Dimension
- `linux.pageinfo`
- `linux.pageinfo.buddy_info`
- `linux.pageinfo.buddy_info.DMA`
- `linux.pageinfo.buddy_info.DMA32`
- `linux.pageinfo.buddy_info.Normal`
- `linux.pageinfo.nodes.*`: `object` type not supported

Missing Dimensions
None. The zone names (DMA, DMA32, Normal) are encoded in the field path rather than as a dimension value. This is a structural design choice -- if the zones were dynamic, a `zone` dimension would be needed, but since they're hardcoded field names, the current approach works.

8. raid
TSDB: Yes | Entity: Per-RAID-device per host
Dimensions (26)
24 common + 2 domain-specific:
- `system.raid.name`
- `system.raid.level`

Metrics (6)
- `system.raid.disks.active`
- `system.raid.disks.total`
- `system.raid.disks.spare`
- `system.raid.disks.failed`
- `system.raid.blocks.total`
- `system.raid.blocks.synced`

Cannot Be Dimension
- `system.raid`
- `system.raid.status`
- `system.raid.sync_action`
- `system.raid.disks.states.*`: `object` type not supported

Missing Dimensions
None. Device name + level + host fully identifies each RAID time series.
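Here is a sketch of the raid dimensions and two disk metrics. `name` and `level` as dimensions come from the report above. The field types and the `gauge` choice (disk counts describe the array's current state) are assumptions.

```yaml
- name: system.raid
  type: group
  fields:
    # Array name, for example md0: one series per array per host.
    - name: name
      type: keyword
      dimension: true
    # RAID level of the array.
    - name: level
      type: keyword
      dimension: true
    # Number of active disks (assumed gauge).
    - name: disks.active
      type: long
      metric_type: gauge
    # Number of failed disks (assumed gauge).
    - name: disks.failed
      type: long
      metric_type: gauge
```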
9. service
TSDB: Yes | Entity: Per-systemd-service per host
Dimensions (27)
24 common (split between agent.yml + ecs.yml) + 3 domain-specific:
- `systemd.fragment_path`
- `systemd.unit`
- `system.service.name`

Metrics (7)
- `system.service.resources.cpu.usage.ns`
- `system.service.resources.memory.usage.bytes`
- `system.service.resources.tasks.count`
- `system.service.resources.network.in.bytes`
- `system.service.resources.network.in.packets`
- `system.service.resources.network.out.packets`
- `system.service.resources.network.out.bytes`

Cannot Be Dimension
- `system.service`
- `system.service.resources`
- `system.service.resources.network`
- `system.service.load_state`
- `system.service.state`
- `system.service.sub_state`
- `system.service.state_since`: `date` type not supported for dimensions
- `system.service.exec_code`
- `process.name`
- `process.pid`
- `process.pgid`
- `process.ppid`
- `process.exit_code`
- `process.working_directory`
- `user.name`
- `host.os.full`

Missing Dimensions
None. Service name + unit + fragment_path + host fully identifies each service time series.
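Here is a sketch of how the service data stream's dimensions could be declared. Per the note on the service data stream, `ecs.yml` carries some `host.*` dimensions. `system.service.name`, `systemd.unit`, and `systemd.fragment_path` are the domain-specific dimensions. Two assumptions apply: that the ECS-external override syntax is used for `host.name`, and that the `systemd.*` fields live in `fields.yml` as `keyword`.

```yaml
# fields/ecs.yml (excerpt): reuse the ECS definition, add the dimension flag.
- external: ecs
  name: host.name
  dimension: true
---
# fields/fields.yml (excerpt): service-specific dimensions.
- name: system.service.name
  type: keyword
  dimension: true
- name: systemd.unit
  type: keyword
  dimension: true
- name: systemd.fragment_path
  type: keyword
  dimension: true
```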
10. socket
TSDB: No (not enabled) | Entity: Per-socket per host
Dimensions (0)
No dimensions defined. The agent.yml does not have `dimension: true` on any field.

Metrics (0)
No metrics defined. The socket data stream captures point-in-time socket snapshots, not numeric measurements over time.

Cannot Be Dimension
- `system.socket`
- `system.socket.local.ip`
- `system.socket.local.port`
- `system.socket.remote.ip`
- `system.socket.remote.port`
- `system.socket.remote.host`
- `system.socket.remote.etld_plus_one`
- `system.socket.remote.host_error`
- `system.socket.process.cmdline`
- `network.direction`
- `network.type`
- `process.name`
- `process.executable`
- `process.pid`
- `user.full_name`
- `user.id`

TSDB Conversion Assessment
Not recommended. The socket data stream captures a snapshot of all open sockets at each collection interval. This is event-like data (the set of sockets changes constantly), not a stable set of time series with numeric measurements. There are no numeric metrics to track over time -- the value is in the enumeration itself.
11. users
TSDB: No (not enabled) | Entity: Per-session per host
Dimensions (0)
No dimensions defined. The agent.yml does not have `dimension: true` on any field.

Metrics (0)
No metrics defined. The users data stream captures point-in-time session records.

Cannot Be Dimension
- `system.users`
- `system.users.id`
- `system.users.seat`
- `system.users.path`
- `system.users.type`
- `system.users.service`
- `system.users.remote`
- `system.users.state`
- `system.users.scope`
- `system.users.leader`
- `system.users.remote_host`
- `source.ip`
- `source.port`

TSDB Conversion Assessment
Not recommended. Like socket, the users data stream captures a point-in-time snapshot of logged-in sessions. Sessions are inherently transient -- they appear and disappear. There are no numeric metrics to track over time.
Summary of Findings
Correctness of Existing TSDB Configuration
All 8 TSDB-enabled data streams have correct dimension and metric assignments:
- … (`iostat.name`, `raid.name`, `raid.level`, `service.name`, `systemd.unit`, `systemd.fragment_path`) correctly identify the measured entity

Non-TSDB Data Streams
- … `metric_type`

No Missing Dimensions Found
For all TSDB-enabled data streams, the current dimension set is complete. The entity being measured is fully identified:
Tests with TSDB-migration-test-kit
Use the TSDB migration test kit to test.
Run the test for the following data streams:
Checklist
- [ ] I have reviewed tips for building integrations and this pull request is aligned with them.
- … `changelog.yml` file.
- [ ] I have verified that Kibana version constraints are current according to guidelines.
- [ ] I have verified that any added dashboard complies with Kibana's Dashboard good practices

How to test this PR locally
… conntrack, entropy, iostat, ksm, memory, pageinfo, raid, and service
`elastic-package build -v && elastic-package install -v`

Related issues