
Fix serialization of NodeGpuStatsResponse when no GPU is present (#142937) #142991

Merged
elasticsearchmachine merged 1 commit into elastic:9.3 from mayya-sharipova:gpu_stats_ser_fix_9.3
Feb 24, 2026

Conversation

@mayya-sharipova
Contributor

NodeGpuStatsResponse serializes totalGpuMemoryInBytes with writeVLong, which throws an IllegalStateException when given the sentinel value -1 that GPUSupport.getTotalGpuMemory() returns when no compatible GPU is available. The fix replaces -1 with 0L (which does not change the semantics) and adds a serialization round-trip test that reproduces the original failure and validates the fix.
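The failure mode and the fix can be illustrated without Elasticsearch. The following is a minimal, hypothetical Java sketch, not the code from this PR: writeVLong and readVLong here are self-contained stand-ins for a 7-bits-per-byte variable-length encoding that, like the writeVLong described above, rejects negative values with an IllegalStateException. The names VLongSketch, sanitizeTotalGpuMemory and roundTrip are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

/** Hypothetical, dependency-free sketch of the -1 sentinel bug and the 0L replacement. */
public class VLongSketch {

    /** Sentinel meaning "no compatible GPU", as described in the PR. */
    static final long NO_GPU_SENTINEL = -1L;

    /** Writes a non-negative long, 7 data bits per byte; a set high bit means "more bytes follow". */
    static void writeVLong(ByteArrayOutputStream out, long value) {
        if (value < 0) {
            // Negative values cannot be encoded this way: this is the original failure mode.
            throw new IllegalStateException("Negative longs unsupported [" + value + "]");
        }
        while ((value & ~0x7FL) != 0L) {
            out.write((int) ((value & 0x7FL) | 0x80L));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Reads a value produced by writeVLong. */
    static long readVLong(ByteArrayInputStream in) {
        long result = 0L;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new IllegalStateException("Unexpected end of stream");
            }
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalStateException("Malformed VLong");
    }

    /** Hypothetical stand-in for the fix: replace the -1 sentinel with 0L before serializing. */
    static long sanitizeTotalGpuMemory(long totalGpuMemoryInBytes) {
        return totalGpuMemoryInBytes == NO_GPU_SENTINEL ? 0L : totalGpuMemoryInBytes;
    }

    /** Serializes and deserializes a value, in the spirit of a round-trip test. */
    static long roundTrip(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVLong(out, value);
        return readVLong(new ByteArrayInputStream(out.toByteArray()));
    }

    public static void main(String[] args) {
        try {
            roundTrip(NO_GPU_SENTINEL);
            System.out.println("unexpected: the raw sentinel was serialized");
        } catch (IllegalStateException e) {
            System.out.println("before the fix: " + e.getMessage());
        }
        System.out.println("after the fix, no GPU: " + roundTrip(sanitizeTotalGpuMemory(NO_GPU_SENTINEL)));
        long realGpuBytes = 24L * 1024 * 1024 * 1024; // hypothetical 24 GiB GPU
        System.out.println("after the fix, real GPU: " + roundTrip(sanitizeTotalGpuMemory(realGpuBytes)));
    }
}
```

Running main shows the raw -1 being rejected, while the sanitized value round-trips as 0 and a real memory size round-trips unchanged. Mapping the sentinel to 0L keeps the field a plain non-negative byte count, so the compact unsigned encoding can stay as it is; switching the field to a signed encoding would instead change the bytes sent on the wire.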

resolves #142936

Backport for #142937

@mayya-sharipova added the backport, auto-merge-without-approval, :Search Relevance/Vectors, and v9.3.2 labels on Feb 24, 2026
@mayya-sharipova marked this pull request as ready for review on February 24, 2026, 20:04
@elasticsearchmachine merged commit 1e2a999 into elastic:9.3 on Feb 24, 2026
35 checks passed
@mayya-sharipova deleted the gpu_stats_ser_fix_9.3 branch on February 24, 2026, 21:28
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Mar 3, 2026
uses writeVLong to serialize totalGpuMemoryInBytes,
which is -1 when no GPU is present. This causes
repeated WARN log flooding from OutboundHandler on
multi-node clusters with non-GPU nodes.

The bug was resolved in elastic#142991 for 9.3.2.

Adds known issue docs with a mitigation (raising
OutboundHandler log level to ERROR).
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Mar 3, 2026
PR elastic#142660 introduced a bug where NodeGpuStatsResponse
uses writeVLong to serialize totalGpuMemoryInBytes,
which is -1 when no GPU is present. This causes
repeated WARN log flooding from OutboundHandler on
multi-node clusters with non-GPU nodes.

The bug was resolved in elastic#142991 for 9.3.2.

Adds known issue docs with a mitigation (raising
OutboundHandler log level to ERROR).
mayya-sharipova added a commit that referenced this pull request Mar 3, 2026
PR #142660 introduced a bug where NodeGpuStatsResponse
uses writeVLong to serialize totalGpuMemoryInBytes,
which is -1 when no GPU is present. This causes
repeated WARN log flooding from OutboundHandler on
multi-node clusters with non-GPU nodes.

The bug was resolved in #142991 for 9.3.2.

Adds known issue docs with a mitigation (raising
OutboundHandler log level to ERROR).
shmuelhanoch pushed a commit to shmuelhanoch/elasticsearch that referenced this pull request Mar 4, 2026
PR elastic#142660 introduced a bug where NodeGpuStatsResponse
uses writeVLong to serialize totalGpuMemoryInBytes,
which is -1 when no GPU is present. This causes
repeated WARN log flooding from OutboundHandler on
multi-node clusters with non-GPU nodes.

The bug was resolved in elastic#142991 for 9.3.2.

Adds known issue docs with a mitigation (raising
OutboundHandler log level to ERROR).

Labels

backport
auto-merge-without-approval (automatically merge pull request when CI checks pass; NB doesn't wait for reviews!)
:Search Relevance/Vectors (vector search)
v9.3.2
