Add GPU vector indexing monitoring to _xpack/usage #141932

Merged
elasticsearchmachine merged 15 commits into elastic:main from mayya-sharipova:gpu-monitoring on Feb 18, 2026

@mayya-sharipova mayya-sharipova commented Feb 5, 2026

Expose GPU vector indexing usage via the _xpack/usage and _xpack APIs

GET _xpack response (features section):

{
  "gpu_vector_indexing": {
    "available": true,
    "enabled": true
  }
}

_xpack field semantics (local node only):

  • available: License permits GPU indexing
  • enabled: the node handling this request has GPU hardware and hasn't disabled it via vectors.indexing.use_gpu=false
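The local-node rules above can be sketched in a few lines of Python. This is only an illustration of the stated semantics, not the Elasticsearch implementation: `local_feature_info` and its parameters are hypothetical names, and the only setting value assumed to disable the feature is `false`, as named in the description.

```python
def local_feature_info(license_allows_gpu, has_gpu_hardware, use_gpu_setting=None):
    """Build the gpu_vector_indexing entry of GET _xpack for one node.

    Hypothetical helper: all parameter names are assumptions made for this sketch.
    """
    # The description only says the feature is off when
    # vectors.indexing.use_gpu=false; any other value (or no value) is
    # treated here as "not disabled".
    disabled = str(use_gpu_setting).lower() == "false"
    return {
        "gpu_vector_indexing": {
            "available": bool(license_allows_gpu),
            "enabled": bool(has_gpu_hardware) and not disabled,
        }
    }


if __name__ == "__main__":
    print(local_feature_info(True, True))
    print(local_feature_info(True, True, use_gpu_setting="false"))
```

Under these assumptions, a node with a GPU but `vectors.indexing.use_gpu=false` reports `enabled: false`, while `available` follows the license alone.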

GET _xpack/usage response:

{
  "gpu_vector_indexing": {
    "available": true,
    "enabled": true,
    "index_build_count": 30,
    "nodes_with_gpu": 3,
    "nodes": [
      { "type": "NVIDIA L4", "memory_in_bytes": 24000000000,
        "enabled": true, "index_build_count": 10 },
      { "type": "NVIDIA L4", "memory_in_bytes": 24000000000,
        "enabled": true, "index_build_count": 10 },
      { "type": "NVIDIA A100", "memory_in_bytes": 80000000000,
        "enabled": true, "index_build_count": 10 }
    ]
  }
}

_xpack/usage field semantics (cluster-wide):

  • available: License permits GPU indexing
  • enabled: at least one node has configured GPU hardware and hasn't disabled it via vectors.indexing.use_gpu=false
  • index_build_count: total GPU index builds across cluster
  • nodes_with_gpu: count of data nodes with GPU support
  • nodes[]: per-node GPU details (type, memory, enabled, build count)
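As a rough illustration of how the cluster-wide fields relate to the per-node entries, the Python sketch below rolls a list of node entries up into the summary shown in the example response. It assumes that `nodes` lists only GPU nodes and that the cluster total is the sum of the per-node counts, which matches the example (10 + 10 + 10 = 30) but is not confirmed as the actual implementation. `summarize_gpu_usage` is a hypothetical name, and how the license state interacts with the other fields is not modeled.

```python
def summarize_gpu_usage(license_available, nodes):
    """Roll per-node GPU entries up into the cluster-wide usage fields.

    Hypothetical helper for illustration; not the Elasticsearch code.
    """
    nodes = list(nodes)
    return {
        "available": bool(license_available),
        # enabled: at least one node has a usable, non-disabled GPU
        "enabled": any(n["enabled"] for n in nodes),
        # index_build_count: total GPU index builds across the cluster
        "index_build_count": sum(n["index_build_count"] for n in nodes),
        # nodes_with_gpu: assumed here to equal the number of listed nodes
        "nodes_with_gpu": len(nodes),
        "nodes": nodes,
    }


# Per-node entries copied from the example response above.
example_nodes = [
    {"type": "NVIDIA L4", "memory_in_bytes": 24000000000,
     "enabled": True, "index_build_count": 10},
    {"type": "NVIDIA L4", "memory_in_bytes": 24000000000,
     "enabled": True, "index_build_count": 10},
    {"type": "NVIDIA A100", "memory_in_bytes": 80000000000,
     "enabled": True, "index_build_count": 10},
]

if __name__ == "__main__":
    print(summarize_gpu_usage(True, example_nodes))
```

With an empty node list the same function yields `enabled: false` and zero counts, which is the shape one would expect for a cluster without GPUs.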

@elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) on Feb 5, 2026
@elasticsearchmachine
Pinging @elastic/es-search-relevance (Team:Search Relevance)

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@mayya-sharipova added the test-gpu label (Run tests using a GPU) on Feb 5, 2026
- Add no-GPU scenario to randomUsage() for better test coverage
- Make buildUsage() static to enable direct unit testing
- Simplify GpuUsageTransportActionTests by removing mock boilerplate
- Add test for empty cluster response scenario
- Fix licenseNotAvailable test: index_build_count must be 0
- Verify GPU names are reported even when setting is disabled
The GPU yaml REST tests call the _xpack/usage and _xpack/info APIs,
which internally iterate over all registered XPack feature actions
(XPackUsageFeatureAction.ALL / XPackInfoFeatureAction.ALL). With the
INTEG_TEST distribution (the default), only x-pack-core and
x-pack-security modules are loaded, so action handlers for other
features (e.g. aggregate_metric) are missing, causing a 500 error.

Switch to DistributionType.DEFAULT so the full set of XPack modules
is available and the usage/info endpoints can query all features.
@mayya-sharipova added the auto-backport label (Automatically create backport pull requests when merged) on Feb 6, 2026
Introduces gpu_vector_indexing_telemetry transport version for proper
serialization of GpuVectorIndexingFeatureSetUsage in mixed clusters.

Without a proper transport version, sending GPU usage data to older
nodes that don't have the NamedWriteable registered would cause
deserialization failures. With this change, GPU telemetry is filtered
out when responding to nodes older than 9.3.

Transport version IDs:
- 9.4: 9277000
- 9.3: 9250004
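The gating described in this commit message can be pictured with the Python sketch below. It is not the Elasticsearch `TransportVersion` logic. It assumes the two IDs act as thresholds: a peer supports the telemetry if its ID is at or above the 9.4 ID, or if it sits on the same patch line as the 9.3 ID (same leading digits, assumed here to be everything but the last two) and is at or above it. The function names and the patch-line rule are assumptions made for illustration.

```python
GPU_TELEMETRY_MAIN_ID = 9277000   # 9.4 ID from the commit message
GPU_TELEMETRY_PATCH_ID = 9250004  # 9.3 ID from the commit message


def supports_gpu_telemetry(peer_version_id):
    """Return True if a peer with this transport version ID can read GPU usage.

    Assumption: patch IDs share every digit except the last two.
    """
    if peer_version_id >= GPU_TELEMETRY_MAIN_ID:
        return True
    same_patch_line = peer_version_id // 100 == GPU_TELEMETRY_PATCH_ID // 100
    return same_patch_line and peer_version_id >= GPU_TELEMETRY_PATCH_ID


def filter_usages_for_peer(usages, peer_version_id):
    """Drop the GPU entry when the receiving node could not deserialize it."""
    if supports_gpu_telemetry(peer_version_id):
        return dict(usages)
    return {name: usage for name, usage in usages.items()
            if name != "gpu_vector_indexing"}


if __name__ == "__main__":
    usages = {"gpu_vector_indexing": {"available": True}, "security": {}}
    print(filter_usages_for_peer(usages, 9250003))  # hypothetical older peer
```

The point of filtering on the sending side is that an older node never receives a payload type it has no reader registered for.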
@mayya-sharipova
@ChrisHegarty Do you have any additional feedback?

@ChrisHegarty ChrisHegarty left a comment

LGTM

@mayya-sharipova added the auto-merge-without-approval label (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) on Feb 13, 2026
Remove per-node index_build_count assertion since with 2 nodes and 1
shard, the indexing node may not be nodes.0. The cluster-level
index_build_count assertion already validates indexing occurred.
Also use filter_path to return only gpu_vector_indexing from xpack.usage.
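A client-side equivalent of the check described in this commit might look like the Python sketch below. It builds a `_xpack/usage` URL with `filter_path=gpu_vector_indexing` and asserts only on the cluster-level count, so it does not matter which entry in `nodes` did the indexing. The base URL, the helper names, and the sample payload are assumptions for illustration. The actual test is a YAML REST test, not Python.

```python
from urllib.parse import urlencode


def gpu_usage_url(base_url, filter_path="gpu_vector_indexing"):
    """Build a _xpack/usage URL that returns only the GPU section."""
    query = urlencode({"filter_path": filter_path})
    return f"{base_url.rstrip('/')}/_xpack/usage?{query}"


def gpu_indexing_occurred(usage_response):
    """Check the cluster-level count instead of a specific nodes[i] entry."""
    return usage_response["gpu_vector_indexing"]["index_build_count"] > 0


if __name__ == "__main__":
    # Hypothetical local cluster address.
    print(gpu_usage_url("http://localhost:9200"))

    # Hypothetical two-node response where the second node built the index.
    sample = {"gpu_vector_indexing": {
        "index_build_count": 1,
        "nodes": [{"index_build_count": 0}, {"index_build_count": 1}],
    }}
    print(gpu_indexing_occurred(sample))
```

Checking `nodes[0]` on the sample payload would fail even though indexing happened, which is the flakiness the commit removes.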
@mayya-sharipova
@elasticsearchmachine run elasticsearch-ci/checkPart2 / gpu-tests

@elasticsearchmachine elasticsearchmachine merged commit 4c8ac0b into elastic:main Feb 18, 2026
36 checks passed
@mayya-sharipova mayya-sharipova deleted the gpu-monitoring branch February 18, 2026 18:37
@elasticsearchmachine
💔 Backport failed

The backport operation could not be completed due to the following error:

An unhandled error occurred. Please consult the logs

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 141932`

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Feb 18, 2026
Backport for elastic#141932
elasticsearchmachine pushed a commit that referenced this pull request Feb 18, 2026
Backport for #141932
szybia added a commit to szybia/elasticsearch that referenced this pull request Feb 19, 2026
Labels

  • auto-backport (Automatically create backport pull requests when merged)
  • auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!)
  • >feature
  • :Search Relevance/Vectors (Vector search)
  • Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
  • test-gpu (Run tests using a GPU)
  • v9.3.1
  • v9.4.0

4 participants