Exclude Default Inference Endpoints from Cluster State Storage#125242
jimczi merged 2 commits into elastic:main
Conversation
When retrieving a default inference endpoint for the first time, the system automatically creates the endpoint. However, unlike the `put inference model` action, the `get` action does not redirect the request to the master node. Since elastic#121106, we rely on the assumption that every model creation (`put model`) must run on the master node, as it modifies the cluster state. However, this assumption led to a bug where the get action tries to store default inference endpoints from a different node. This change resolves the issue by preventing default inference endpoints from being added to the cluster state. These endpoints are not strictly needed there, as they are already reported by inference services upon startup. **Note:** This bug did not prevent the default endpoints from being used, but it caused repeated attempts to store them in the index, resulting in logging errors on every usage.
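The fix described above boils down to a guard in the model registry's store path: default endpoints are persisted to the inference index but never written into the cluster state, so no master-node hop is needed. The following is a hypothetical, simplified sketch (not the actual Elasticsearch code; class and method names are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the store path after this change: default endpoints
// skip the cluster-state update entirely.
class ModelRegistrySketch {
    private final Set<String> clusterStateModels = new HashSet<>();
    private final Set<String> indexModels = new HashSet<>();

    void storeModel(String inferenceId, boolean isDefaultEndpoint) {
        // The model document is persisted in the inference index either way.
        indexModels.add(inferenceId);
        if (isDefaultEndpoint) {
            // Default endpoints are reported by the inference services at
            // startup, so they never need to live in the cluster state.
            // Skipping the update also means the `get`-triggered creation
            // no longer has to run on the master node.
            return;
        }
        clusterStateModels.add(inferenceId);
    }

    boolean inClusterState(String inferenceId) {
        return clusterStateModels.contains(inferenceId);
    }
}
```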
Pinging @elastic/ml-core (Team:ML)
Is this because other plugins can access the default endpoints directly in memory through the inference plugin? Also, we should backport this change, right?
Yes, through the local model registry, since all services register their default models there.
The main change is not backported yet, so I'll manually add it in the current PR.
if (out.getTransportVersion().onOrAfter(TransportVersions.INFERENCE_MODEL_REGISTRY_METADATA)) {
    out.writeBoolean(returnMinimalConfig);
}
This is a leftover from #121106. It's ok to remove since the transport serialisation is not used on a HandledTransportAction, which executes directly on the receiving node.
* Add ModelRegistryMetadata to Cluster State (#121106): This commit integrates `MinimalServiceSettings` (introduced in #120560) into the cluster state for all registered models in the `ModelRegistry`. These settings allow consumers to access configuration details without requiring asynchronous calls to retrieve full model configurations. To ensure consistency, the cluster state metadata must remain synchronized with the models in the inference index. If a mismatch is detected during startup, the master node performs an upgrade to load all model settings from the index.
* Fix test compilation
* Fix serialisation
* Exclude Default Inference Endpoints from Cluster State Storage (#125242): see the PR description above.
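The startup consistency check from #121106 can be pictured as a reconcile step: if the model ids tracked in the cluster-state metadata diverge from the models stored in the inference index, the master reloads the settings from the index. A hypothetical, much-simplified sketch (names are illustrative, not the actual Elasticsearch code):

```java
import java.util.Map;

// Hypothetical sketch: reconcile cluster-state model metadata against the
// inference index at startup, treating the index as the source of truth.
class MetadataUpgradeSketch {
    /** Returns the settings map that should live in the cluster state. */
    static Map<String, String> reconcile(Map<String, String> clusterStateSettings,
                                         Map<String, String> indexSettings) {
        // A mismatch in the tracked model ids means the metadata is stale.
        if (!clusterStateSettings.keySet().equals(indexSettings.keySet())) {
            // Upgrade: load all minimal settings from the index.
            return indexSettings;
        }
        return clusterStateSettings;
    }
}
```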
…ting (#125369) The Elastic inference service removes the default models at startup if the node cannot access EIS. Since #125242, we no longer store default models in the cluster state, but we still tried to delete them. This change ensures that we don't try to update the cluster state when a default model is deleted, since the delete is not performed on the master node and default models are never stored in the cluster state.
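The follow-up (#125369) mirrors the store-path guard on the delete path: removing a default model must not attempt a cluster-state update, because the delete does not run on the master node and the model was never in the cluster state. A hypothetical sketch of that guard (illustrative names, not the actual Elasticsearch code):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: only non-default models trigger a cluster-state
// metadata removal on delete.
class DeleteSketch {
    final Set<String> clusterStateModels = new HashSet<>();

    /** Returns true if the cluster-state metadata was modified. */
    boolean deleteModel(String inferenceId, boolean isDefaultEndpoint) {
        if (isDefaultEndpoint) {
            // Default models are never stored in the cluster state, and this
            // code may run on a non-master node: skip the update entirely.
            return false;
        }
        return clusterStateModels.remove(inferenceId);
    }
}
```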
…ting (#125369) (#125597) The Elastic inference service removes the default models at startup if the node cannot access EIS. Since #125242 we don't store default models in the cluster state but we still try to delete them. This change ensures that we don't try to update the cluster state when a default model is deleted since the delete is not performed on the master node and default models are never stored in the cluster state.
Note: This bug did not prevent the default endpoints from being used, but it caused repeated attempts to store them in the index, resulting in logging errors on every usage. This is an unreleased bug, so marking it as non-issue.