[ML] Inference endpoints UI serverless: Enables adaptive allocations and allow user to set max allocations#222726

Merged
alvarezmelissa87 merged 25 commits into elastic:main from
alvarezmelissa87:ml-inference-endpoints-remove-allocations-fields
Jun 27, 2025
@alvarezmelissa87 alvarezmelissa87 commented Jun 5, 2025

Summary

Related issue: #221827

For now, the changes in this PR apply only in serverless.

This PR adds the following changes in a serverless environment:

  • removes the allocations/threads input fields from the inference endpoint creation UI and replaces them with an input for max allocations
  • adds informative text for the user when adaptive allocations will be enabled
  • always sets adaptive allocations to be enabled and min_allocations to 0
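As a rough illustration of the three bullets above, the settings sent for a serverless endpoint could be assembled along these lines. This is a minimal sketch, not the code in this PR: `buildServerlessServiceSettings` and the input validation are hypothetical, and the `adaptive_allocations` field names are assumed to match the Elasticsearch service settings. The two constants mirror the ones visible in the reviewed snippet further down.

```typescript
// Hypothetical shape of the service settings for a serverless endpoint.
// Field names are an assumption based on Elasticsearch's adaptive allocations settings.
interface AdaptiveAllocations {
  enabled: boolean;
  min_number_of_allocations: number;
  max_number_of_allocations: number;
}

interface ServerlessServiceSettings {
  adaptive_allocations: AdaptiveAllocations;
  num_threads: number;
}

// Same names as the constants shown in the review snippet of this PR.
const MIN_ALLOCATIONS = 0;
const DEFAULT_NUM_THREADS = 1;

// Hypothetical helper: the user supplies only the maximum number of allocations;
// adaptive allocations are always enabled and the minimum is always 0.
function buildServerlessServiceSettings(maxAllocations: number): ServerlessServiceSettings {
  if (!Number.isInteger(maxAllocations) || maxAllocations < 1) {
    // Validation rule is an assumption made for this sketch.
    throw new RangeError('max allocations must be a positive integer');
  }
  return {
    adaptive_allocations: {
      enabled: true,
      min_number_of_allocations: MIN_ALLOCATIONS,
      max_number_of_allocations: maxAllocations,
    },
    num_threads: DEFAULT_NUM_THREADS,
  };
}
```

Keeping the fixed values in named constants means the only user-controlled input is the maximum, which matches the single input field described above.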

Entry points tested:

  • Inference endpoints list page > Add endpoint button
  • Playground > Connect to an LLM button
  • Connectors list page > Create connector button
  • AI Assistant > Set up GenAI Connector button
  • Index management > create index with mapping > add semantic text field

TASKS

- [ ] Implement a helper class to calculate an appropriate value for num_threads based on the max allocations specified by the user, optimized for search with high resource use.

  • ML nodes will set a default number of threads in serverless trained model APIs - this will require a backend change (I will link PR here when available)
    • Until that change is made, num_allocations will be defaulted to 1 as the endpoint currently requires that parameter
  • minimum allocations will always be 0
  • Add serverless check in AI Connector to ensure behavior is the same

TO NOTE

The field overrides added are a temporary solution until the endpoint returning the service's configurable fields can be updated.

Because the code is shared with the AI Connector, this behavior also applies to the Elasticsearch service when on serverless.

Checklist

Check the PR satisfies following conditions.

Reviewers should verify this PR satisfies this list as well.

  • Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
  • Documentation was added for features that require explanation or tutorials
  • Unit or functional tests were updated or added to match the most common scenarios
  • If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
  • This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The release_note:breaking label should be applied in these situations.
  • Flaky Test Runner was used on any tests changed
  • The PR description includes the appropriate Release Notes section, and the correct release_note:* label is applied per the guidelines

@alvarezmelissa87 alvarezmelissa87 self-assigned this Jun 5, 2025
@alvarezmelissa87 alvarezmelissa87 added the Feature:Inference UI ML Inference endpoints UI and AI connector label Jun 5, 2025
@arisonl

This comment was marked as resolved.

@alvarezmelissa87 alvarezmelissa87 force-pushed the ml-inference-endpoints-remove-allocations-fields branch from 2d51451 to 4254225 Compare June 10, 2025 15:15
@alvarezmelissa87 alvarezmelissa87 force-pushed the ml-inference-endpoints-remove-allocations-fields branch from 4254225 to 33cb73e Compare June 10, 2025 16:02
@alvarezmelissa87 alvarezmelissa87 changed the title from "[WIP][ML] Inference endpoints UI serverless: enable adaptive allocations and allow user to set max allocations" to "[ML] Inference endpoints UI serverless: enable adaptive allocations and allow user to set max allocations" Jun 10, 2025
@alvarezmelissa87 alvarezmelissa87 marked this pull request as ready for review June 10, 2025 16:03
@alvarezmelissa87 alvarezmelissa87 requested review from a team as code owners June 10, 2025 16:03
@shubhaat

This comment was marked as resolved.

@Samiul-TheSoccerFan Samiul-TheSoccerFan left a comment

Looks good so far! Just wondering, are any changes needed in the Index Management package as well? Since it's also possible to create inference endpoints from there, it might be worth validating whether any updates are required on that side too.

const MIN_ALLOCATIONS = 0;
const DEFAULT_NUM_THREADS = 1;

export const getInferenceApiParams = (data: any, enforceAdaptiveAllocations: boolean) => {
Contributor Author

Note on this function: ideally we would do this in the form serializer, but because we currently need enforceAdaptiveAllocations (which isn't available outside of the component), we have to keep it as an external function.
Once this change is included in all environments and we no longer need that flag, this will be moved to the serializer. cc @jcger 🙏


// TODO: Remove when https://github.com/elastic/kibana/issues/133107 is resolved
const formDeserializer = (data: ConnectorFormSchema): ConnectorFormSchema => {
if (
Contributor

I couldn't find a better alternative. We have to hardcode the serializer/deserializer this way. We'll open an issue to improve our framework

Contributor Author

@jcger - thank you so much for your feedback 🙏
I updated with all suggested changes here 1e4dd5a

  • renamed to isServerless in all areas for the connector
  • stored the value in context
  • moved data manipulation to serializer/deserializer
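To make the serializer/deserializer idea concrete, here is a minimal sketch of such a pair. It does not reproduce the actual code from 1e4dd5a: the function names, the flat `ProviderConfig` record, and the `max_number_of_allocations` / `adaptive_allocations` keys are assumptions chosen for illustration.

```typescript
// Hypothetical, simplified provider config; the real form schema is richer.
type ProviderConfig = Record<string, unknown>;

interface AdaptiveAllocations {
  enabled: boolean;
  min_number_of_allocations: number;
  max_number_of_allocations: number;
}

const MIN_ALLOCATIONS = 0;

// Form -> API: on serverless, fold the single "max allocations" form value
// into an adaptive_allocations object. Otherwise return the input unchanged.
function serializeProviderConfig(config: ProviderConfig, isServerless: boolean): ProviderConfig {
  const max = config['max_number_of_allocations'];
  if (!isServerless || typeof max !== 'number') {
    return config;
  }
  const rest: ProviderConfig = { ...config };
  delete rest['max_number_of_allocations'];
  const adaptive: AdaptiveAllocations = {
    enabled: true,
    min_number_of_allocations: MIN_ALLOCATIONS,
    max_number_of_allocations: max,
  };
  return { ...rest, adaptive_allocations: adaptive };
}

// API -> form: flatten adaptive_allocations back into the single form value
// so an existing endpoint can be shown in the edit form.
function deserializeProviderConfig(config: ProviderConfig): ProviderConfig {
  const adaptive = config['adaptive_allocations'] as Partial<AdaptiveAllocations> | undefined;
  if (!adaptive || typeof adaptive.max_number_of_allocations !== 'number') {
    return config;
  }
  const rest: ProviderConfig = { ...config };
  delete rest['adaptive_allocations'];
  return { ...rest, max_number_of_allocations: adaptive.max_number_of_allocations };
}
```

Writing the two functions as inverses of each other keeps the form state flat while the API payload stays nested, which is the usual reason to put this logic in a serializer rather than in the component.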

*/

const { actionTypeId, name, config, secrets } = data;
const connectorData = getInferenceApiParams(data, !!enforceAdaptiveAllocations) ?? data;
Contributor

I think it would be better to add the condition here to check that it's only called when the connector type is the inference connector.
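The guard suggested here could look roughly like the following. This is only a sketch: `applyToInferenceConnector` is a hypothetical name, and the `'.inference'` action type id is an assumption about how the inference connector is registered.

```typescript
// Assumption: the inference connector is identified by this action type id.
const INFERENCE_CONNECTOR_TYPE_ID = '.inference';

interface ConnectorData {
  actionTypeId: string;
  config: Record<string, unknown>;
}

// Hypothetical wrapper: run the inference-specific transform only for the
// inference connector and hand every other connector back untouched.
function applyToInferenceConnector(
  data: ConnectorData,
  transform: (d: ConnectorData) => ConnectorData
): ConnectorData {
  return data.actionTypeId === INFERENCE_CONNECTOR_TYPE_ID ? transform(data) : data;
}
```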

Contributor Author

This data manipulation now lives in the form serializer/deserializer so this helper function is no longer needed.
Changes made here 1e4dd5a

export interface ActionConnectorFieldsProps {
readOnly: boolean;
isEdit: boolean;
enforceAdaptiveAllocations?: boolean;
Contributor

It's too specific for the inference connector. For now, it's using the value of isServerless, so let's call it that instead. If the requirements for determining when the inference connector should enforceAdaptiveAllocations change, we can adapt. For now, let's go for the minimum required changes for the current needs and keep it as simple as possible. I'll add an extra comment for the lazy-loading components that don't set the context the same way we do in the rest of the plugin.


const InferenceAPIConnectorFields: React.FunctionComponent<ActionConnectorFieldsProps> = ({
isEdit,
enforceAdaptiveAllocations,
Contributor

Let's use isServerless via the Kibana context instead.
The component should render like this:

    <InferenceServiceFormFields
      http={http}
      isEdit={isEdit}
      enforceAdaptiveAllocations={isServerless}
      toasts={toasts}
    />

Contributor Author

Updated in 1e4dd5a

actionTypeRegistry,
ruleTypeRegistry,
share: pluginsStart.share,
enforceAdaptiveAllocations: !!pluginsStart.serverless,
Contributor

Let's call this isServerless. Same for the rest.

Contributor Author

Updated in 1e4dd5a

@jcger jcger left a comment

I'd recommend testing that the field is shown when it should be, ensuring we don't break it with a change to the isServerless feature. Approving because that test isn't on our side, and because it could be done in a future PR
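One lightweight way to pin down the behavior this review asks to test is to express field visibility as a pure function and assert on it. The sketch below is an assumption-heavy illustration, not the component's real logic: `getVisibleAllocationFields` is hypothetical, and an actual test would more likely render the form component and query for the inputs.

```typescript
// Field names taken from this PR's description, plus the assumed max allocations key.
type AllocationField = 'num_allocations' | 'num_threads' | 'max_number_of_allocations';

// Hypothetical pure function: on serverless only the max allocations input is
// shown; elsewhere the original allocations/threads inputs remain.
function getVisibleAllocationFields(isServerless: boolean): AllocationField[] {
  return isServerless ? ['max_number_of_allocations'] : ['num_allocations', 'num_threads'];
}

// Minimal checks that would fail if a change to the isServerless handling
// accidentally hid or exposed the wrong inputs.
const serverlessFields = getVisibleAllocationFields(true);
const defaultFields = getVisibleAllocationFields(false);
if (!serverlessFields.includes('max_number_of_allocations')) {
  throw new Error('max allocations input should be visible on serverless');
}
if (defaultFields.includes('max_number_of_allocations')) {
  throw new Error('max allocations input should be hidden outside serverless');
}
```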

interface ConnectorFormFieldsProps {
actionTypeModel: ActionTypeModel | null;
isEdit: boolean;
enforceAdaptiveAllocations?: boolean;
Contributor

this can be removed now, I can't see it being used

Contributor Author

Good catch! Removed in 670417b

setResetForm?: (value: ResetForm) => void;
}

interface ProviderConfig {
Contributor

nit: rename it to something that makes it clear that it's only used/needed by the inference connector, something like InferenceConnectorProviderConfig

Contributor Author

Updated in 82c3ef5

Member

Could you please add tests for the new logic?

@alvarezmelissa87
Contributor Author

@elasticmachine merge upstream

@elasticmachine
Contributor

elasticmachine commented Jun 27, 2025

⏳ Build in-progress

History

cc @alvarezmelissa87

@alvarezmelissa87
Contributor Author

Created a follow up issue #225700 for adding tests.

@alvarezmelissa87 alvarezmelissa87 merged commit 536ddcc into elastic:main Jun 27, 2025
10 checks passed
@alvarezmelissa87 alvarezmelissa87 deleted the ml-inference-endpoints-remove-allocations-fields branch June 27, 2025 21:12
@kibanamachine kibanamachine added the backport missing Added to PRs automatically when the are determined to be missing a backport. label Jul 1, 2025
@kibanamachine
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create backports automatically, add a backport:* label; to prevent reminders, add the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 222726 locally
cc: @alvarezmelissa87

@peteharverson peteharverson added backport:skip This PR does not require backporting and removed backport missing Added to PRs automatically when the are determined to be missing a backport. backport:version Backport to applied version labels labels Jul 2, 2025
@peteharverson peteharverson changed the title from "[ML] Inference endpoints UI serverless: enable adaptive allocations and allow user to set max allocations" to "[ML] Inference endpoints UI serverless: Enables adaptive allocations and allow user to set max allocations" Sep 24, 2025
saikatsarkar056 added a commit that referenced this pull request Jan 21, 2026
…UI (#249098)

### Summary

Adds tests for serverless adaptive allocations feature (#222726).

### Run Tests

```
yarn test:jest --no-collectCoverage x-pack/platform/packages/shared/kbn-inference-endpoint-ui-common/src/components/inference_service_form_fields.test.tsx

yarn test:jest --no-collectCoverage x-pack/platform/packages/shared/kbn-inference-endpoint-ui-common/src/components/inference_flyout_wrapper.test.tsx

```
### Checklist

Check the PR satisfies following conditions. 

Reviewers should verify this PR satisfies this list as well.

- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [x] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.

### Identify risks

Does this PR introduce any risks? For example, consider risks like hard
to test bugs, performance regression, potential of data loss.

Describe the risk, its severity, and mitigation for each identified
risk. Invite stakeholders and evaluate how to proceed before merging.

- [ ] [See some risk
examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)
- [ ] ...

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
qn895 pushed a commit to qn895/kibana that referenced this pull request Jan 22, 2026
…UI (elastic#249098)

dennis-tismenko pushed a commit to dennis-tismenko/kibana that referenced this pull request Jan 22, 2026
…UI (elastic#249098)


Labels

  • backport:skip (This PR does not require backporting)
  • ci:project-deploy-elasticsearch (Create an Elasticsearch Serverless project)
  • Feature:Inference UI (ML Inference endpoints UI and AI connector)
  • :ml
  • release_note:enhancement
  • v9.2.0
