Add GPU vector indexing monitoring to _xpack/usage by mayya-sharipova · Pull Request #141932 · elastic/elasticsearch

mayya-sharipova · 2026-02-05T14:53:02Z

Expose GPU vector indexing usage via the _xpack/usage and _xpack APIs

GET _xpack response (features section):

{
  "gpu_vector_indexing": {
    "available": true,
    "enabled": true
  }
}

_xpack field semantics (local node only):

- available: License permits GPU indexing
- enabled: the node handling this request has GPU hardware and hasn't disabled it via vectors.indexing.use_gpu=false

GET _xpack/usage response:

{
  "gpu_vector_indexing": {
    "available": true,
    "enabled": true,
    "index_build_count": 30,
    "nodes_with_gpu": 3,
    "nodes": [
      { "type": "NVIDIA L4", "memory_in_bytes": 24000000000,
        "enabled": true, "index_build_count": 10 },
      { "type": "NVIDIA L4", "memory_in_bytes": 24000000000,
        "enabled": true, "index_build_count": 10 },
      { "type": "NVIDIA A100", "memory_in_bytes": 80000000000,
        "enabled": true, "index_build_count": 10 }
    ]
  }
}

_xpack/usage field semantics (cluster-wide):

- available: License permits GPU indexing
- enabled: at least one node has configured GPU hardware and hasn't disabled it via vectors.indexing.use_gpu=false
- index_build_count: total GPU index builds across cluster
- nodes_with_gpu: count of data nodes with GPU support
- nodes[]: per-node GPU details (type, memory, enabled, build count)
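
To make the relationship between the per-node entries and the cluster-wide fields concrete, here is a minimal Python sketch of the roll-up. It is an illustration only, not the PR's Java implementation: `summarize_gpu_nodes` is a hypothetical helper, and it assumes the cluster-wide values are plain aggregates of the `nodes[]` entries, which matches the sample response (10 + 10 + 10 = 30 builds, 3 nodes).

```python
from typing import Any


def summarize_gpu_nodes(nodes: list[dict[str, Any]]) -> dict[str, Any]:
    """Derive the cluster-wide fields from per-node GPU entries.

    Assumption (hypothetical helper): each cluster-wide field is a simple
    aggregate of the per-node entries shown in the sample response.
    """
    return {
        # True if at least one node has a GPU and has not disabled it
        "enabled": any(node["enabled"] for node in nodes),
        # Total GPU index builds across all reporting nodes
        "index_build_count": sum(node["index_build_count"] for node in nodes),
        # Number of nodes that reported GPU details
        "nodes_with_gpu": len(nodes),
    }


# Per-node entries copied from the sample _xpack/usage response
sample_nodes = [
    {"type": "NVIDIA L4", "memory_in_bytes": 24000000000,
     "enabled": True, "index_build_count": 10},
    {"type": "NVIDIA L4", "memory_in_bytes": 24000000000,
     "enabled": True, "index_build_count": 10},
    {"type": "NVIDIA A100", "memory_in_bytes": 80000000000,
     "enabled": True, "index_build_count": 10},
]

summary = summarize_gpu_nodes(sample_nodes)
```

With an empty node list the same helper yields `enabled: false` and zero counts, which is the shape one would expect for a cluster without GPUs.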

elasticsearchmachine · 2026-02-05T14:53:29Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

libs/gpu-codec/src/main/java/org/elasticsearch/gpu/codec/ES92GpuHnswVectorsWriter.java

...re/src/test/java/org/elasticsearch/xpack/core/gpu/GpuVectorIndexingFeatureSetUsageTests.java

x-pack/plugin/gpu/src/test/java/org/elasticsearch/xpack/gpu/GpuUsageTransportActionTests.java

- Add no-GPU scenario to randomUsage() for better test coverage
- Make buildUsage() static to enable direct unit testing
- Simplify GpuUsageTransportActionTests by removing mock boilerplate
- Add test for empty cluster response scenario
- Fix licenseNotAvailable test: index_build_count must be 0
- Verify GPU names are reported even when setting is disabled

The GPU yaml REST tests call the _xpack/usage and _xpack/info APIs, which internally iterate over all registered XPack feature actions (XPackUsageFeatureAction.ALL / XPackInfoFeatureAction.ALL). With the INTEG_TEST distribution (the default), only x-pack-core and x-pack-security modules are loaded, so action handlers for other features (e.g. aggregate_metric) are missing, causing a 500 error. Switch to DistributionType.DEFAULT so the full set of XPack modules is available and the usage/info endpoints can query all features.

Introduces gpu_vector_indexing_telemetry transport version for proper serialization of GpuVectorIndexingFeatureSetUsage in mixed clusters. Without a proper transport version, sending GPU usage data to older nodes that don't have the NamedWriteable registered would cause deserialization failures. With this change, GPU telemetry is filtered out when responding to nodes older than 9.3.

Transport version IDs:
- 9.4: 9277000
- 9.3: 9250004
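
The filtering described in this commit message can be sketched roughly as follows. This is a simplified Python model, not Elasticsearch's transport-version API: `peer_understands_gpu_usage` and `usages_for_peer` are hypothetical names, and the rule that the last two digits of an ID form a patch slot on the same release line is an assumption made for illustration. Only the two IDs come from the PR.

```python
GPU_TELEMETRY_9_4 = 9277000  # from the PR: transport version ID on 9.4
GPU_TELEMETRY_9_3 = 9250004  # from the PR: backported transport version ID on 9.3


def peer_understands_gpu_usage(peer_version_id: int) -> bool:
    """Return True if a peer at this transport version can read GPU usage.

    Assumption: IDs at or above the 9.4 ID support the feature, and IDs on
    the same release line as the 9.3 ID (same leading digits, last two
    digits treated as a patch slot) support it from the 9.3 ID onward.
    """
    if peer_version_id >= GPU_TELEMETRY_9_4:
        return True
    same_release_line = peer_version_id // 100 == GPU_TELEMETRY_9_3 // 100
    return same_release_line and peer_version_id >= GPU_TELEMETRY_9_3


def usages_for_peer(peer_version_id: int, usages: dict) -> dict:
    """Drop the GPU usage entry for peers that cannot deserialize it."""
    if peer_understands_gpu_usage(peer_version_id):
        return dict(usages)
    return {name: usage for name, usage in usages.items()
            if name != "gpu_vector_indexing"}


all_usages = {"security": {"available": True},
              "gpu_vector_indexing": {"available": True, "enabled": True}}
for_old_peer = usages_for_peer(9250003, all_usages)
for_new_peer = usages_for_peer(9277000, all_usages)
```

The point of the sketch is the shape of the guard: the sender decides per peer whether the new usage entry is included, so an older node never receives a payload it has no reader for.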

mayya-sharipova · 2026-02-12T14:13:35Z

@ChrisHegarty Do you have any additional feedback?

ChrisHegarty

LGTM

Remove per-node index_build_count assertion since with 2 nodes and 1 shard, the indexing node may not be nodes.0. The cluster-level index_build_count assertion already validates indexing occurred. Also use filter_path to return only gpu_vector_indexing from xpack.usage.
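
As a rough client-side illustration of the same idea, the Python sketch below builds the filtered usage URL and checks only the cluster-level counter. `usage_url` and `indexing_occurred` are hypothetical helpers, and the base URL is a placeholder; `filter_path` is the standard Elasticsearch response-filtering query parameter.

```python
from urllib.parse import urlencode


def usage_url(base_url: str) -> str:
    """Build a _xpack/usage URL that returns only the gpu_vector_indexing section."""
    query = urlencode({"filter_path": "gpu_vector_indexing"})
    return f"{base_url.rstrip('/')}/_xpack/usage?{query}"


def indexing_occurred(response: dict) -> bool:
    """Check the cluster-level counter only.

    The position of the indexing node inside nodes[] is not guaranteed,
    so this deliberately avoids asserting on nodes[0].
    """
    return response["gpu_vector_indexing"]["index_build_count"] > 0


# Hypothetical two-node response where the second node did the build
example_response = {
    "gpu_vector_indexing": {
        "index_build_count": 1,
        "nodes": [
            {"index_build_count": 0},
            {"index_build_count": 1},
        ],
    }
}
url = usage_url("http://localhost:9200/")
```

In the example response an assertion on the first node's counter would fail even though indexing happened, which is the situation the commit message describes.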

mayya-sharipova · 2026-02-17T20:31:05Z

@elasticsearchmachine run elasticsearch-ci/checkPart2 / gpu-tests

elasticsearchmachine · 2026-02-18T18:37:45Z

💔 Backport failed

The backport operation could not be completed due to the following error:

An unhandled error occurred. Please consult the logs

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 141932`

Backport for #141932

…on-sliced-reindex

* upstream/main: (120 commits)
  [Fleet] Add OpAMP field mappings to fleet-agents (elastic#142550)
  Clarify `expectedSize` behaviour of `ReleasableBytesStreamOutput` (elastic#142451)
  Refactor KnnIndexTester to tidy up some options (elastic#142651)
  Fixed with elastic#142638 already (elastic#142655)
  Change *OverTimeTests to extend AbstractAggregationTestCase (elastic#142659)
  Fix byteRefBlockHashSize for release mode (elastic#142668)
  Mute org.elasticsearch.xpack.esql.tree.EsqlNodeSubclassTests testTransform {class org.elasticsearch.xpack.esql.plan.logical.MMR} elastic#142674
  Fix PAUSED_FOR_NODE_REMOVAL shard blocking QUEUED promotion (elastic#142637)
  Mute org.elasticsearch.xpack.logsdb.RandomizedRollingUpgradeIT testIndexingStandardSource elastic#142670
  Revert "[ESQL] Introduce pluggable external datasource framework (elastic#141678) (elastic#142663)
  Mute org.elasticsearch.xpack.esql.spatial.SpatialPushDownGeoShapeIT testQuantizedXY elastic#141234
  PromQL: infer start/end from query DSL filters (elastic#142580)
  Add GPU vector indexing monitoring to _xpack/usage (elastic#141932)
  Fix testTrackerClearShutdown: use non-zero startTimeMillis for DONE status (elastic#142646)
  Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeIT test elastic#142426
  ESQL_ Move time_zone to GA (elastic#142287)
  Mute org.elasticsearch.xpack.esql.qa.multi_node.GenerativeIT test elastic#142426
  DOCS: Convert Painless diagrams to mermaid (elastic#141851)
  ES|QL: fix validation in generative tests (elastic#142638)
  Unmute tests that do not reproduce failures (elastic#141712)
  ...

mayya-sharipova added >feature :Search Relevance/Vectors Vector search v9.3.1 v9.4.0 labels Feb 5, 2026

elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Feb 5, 2026

mayya-sharipova requested review from ChrisHegarty, Copilot and ldematte February 5, 2026 14:55

Copilot AI reviewed Feb 5, 2026

mayya-sharipova force-pushed the gpu-monitoring branch from 56f957b to 726c7d6 Compare February 5, 2026 18:27

mayya-sharipova force-pushed the gpu-monitoring branch from 726c7d6 to 74f965e Compare February 5, 2026 19:34

mayya-sharipova added the test-gpu Run tests using a GPU label Feb 5, 2026

ChrisHegarty reviewed Feb 6, 2026

mayya-sharipova added 4 commits February 6, 2026 09:10

Add GPU vector indexing actions to non-operator list

235bf9f

Merge branch 'main' into gpu-monitoring

a11ec26

mayya-sharipova added the auto-backport Automatically create backport pull requests when merged label Feb 6, 2026

ChrisHegarty approved these changes Feb 12, 2026

Merge remote-tracking branch 'upstream/main' into gpu-monitoring

a08ebfb

mayya-sharipova added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Feb 13, 2026

mayya-sharipova added 5 commits February 13, 2026 15:43

Adjust changes

1aca938

Merge branch 'main' into gpu-monitoring

370f7c9

Merge branch 'main' into gpu-monitoring

cd299d0

Merge remote-tracking branch 'upstream/main' into gpu-monitoring

f0c5f5c

Fix transport version conflicts after merge

80d1e24

mayya-sharipova added 2 commits February 17, 2026 16:28

Merge branch 'main' into gpu-monitoring

378f188

Merge remote-tracking branch 'upstream/main' into gpu-monitoring

be7461c

elasticsearchmachine merged commit 4c8ac0b into elastic:main Feb 18, 2026
36 checks passed

mayya-sharipova deleted the gpu-monitoring branch February 18, 2026 18:37

elasticsearchmachine added the backport pending label Feb 18, 2026

mayya-sharipova mentioned this pull request Feb 18, 2026

Add GPU vector indexing monitoring to _xpack/usage (#141932) #142660

Merged

mayya-sharipova removed the backport pending label Feb 18, 2026

ChrisHegarty mentioned this pull request Feb 24, 2026

Failure serializing org.elasticsearch.xpack.gpu.NodeGpuStatsResponse #142936

Closed

andreidan mentioned this pull request Feb 25, 2026

Guard GpuStatsAction calls by transport version check #142934

Merged

ChrisHegarty mentioned this pull request Mar 2, 2026

Guard gpu_vector_indexing stats transport fanout in mixed-version clusters #143310

Closed

elasticsearchmachine merged 15 commits into elastic:main from mayya-sharipova:gpu-monitoring
