[SigEvents] Convert feature duplication evaluators to createPrompt pattern #256534
Merged
crespocarlos merged 22 commits into elastic:main on Mar 18, 2026
Refactor `createSemanticUniquenessEvaluator` and `createIdConsistencyEvaluator` from `inferenceClient.output()` to the `createPrompt` + `executeUntilValid` pattern used by the rest of the `@kbn/evals` ecosystem.

- Extract system/user prompts into `.text` mustache templates
- Define prompt objects with `createPrompt` and Zod input schemas
- Replace `inferenceClient.output()` with `executeUntilValid`
- Preserve all score calculation logic, evaluator signatures, and return shapes
- Move evaluators from flat file into `feature_duplication/` directory

Closes elastic#255869

Made-with: Cursor
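To illustrate the general shape of the pattern described above (templated prompts, a validated input, and repeated model calls until the output passes validation), here is a minimal, dependency-free sketch. Everything in it is a hypothetical stand-in: `PromptSketch`, `renderTemplate`, `callUntilValid`, the `{ score, explanation }` verdict shape, and the template strings are invented for illustration and are not the actual `createPrompt` or `executeUntilValid` signatures, which this conversation does not show. Hand-written validators take the place of Zod schemas so the block runs on its own.

```typescript
// Hypothetical sketch only: none of these names come from @kbn/evals or the PR diff.

type Validator<T> = (value: unknown) => T; // throws when the value is invalid

interface PromptSketch<I> {
  name: string;
  validateInput: Validator<I>; // stands in for a Zod input schema
  systemTemplate: string; // stands in for a `.text` mustache template file
  userTemplate: string;
}

// Minimal {{variable}} substitution. Real mustache also supports sections and escaping.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, key: string) => vars[key] ?? '');
}

// Render the prompt, call the model, and retry until the output validates.
async function callUntilValid<I extends Record<string, string>, O>(
  prompt: PromptSketch<I>,
  input: unknown,
  callModel: (system: string, user: string) => Promise<unknown>,
  validateOutput: Validator<O>,
  maxAttempts = 3
): Promise<O> {
  const vars = prompt.validateInput(input); // reject bad input before any model call
  const system = renderTemplate(prompt.systemTemplate, vars);
  const user = renderTemplate(prompt.userTemplate, vars);
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(system, user);
    try {
      return validateOutput(raw);
    } catch (error) {
      lastError = error; // remember why this attempt failed, then try again
    }
  }
  throw new Error(`No valid response after ${maxAttempts} attempts: ${String(lastError)}`);
}

// Assumed input and output shapes, for illustration only.
type FeaturesInput = { features: string };
type Verdict = { score: number; explanation: string };

function validateFeaturesInput(value: unknown): FeaturesInput {
  if (typeof value !== 'object' || value === null) throw new Error('expected an object');
  const { features } = value as Record<string, unknown>;
  if (typeof features !== 'string') throw new Error('features must be a string');
  return { features };
}

function validateVerdict(value: unknown): Verdict {
  if (typeof value !== 'object' || value === null) throw new Error('expected an object');
  const { score, explanation } = value as Record<string, unknown>;
  if (typeof score !== 'number' || score < 0 || score > 1) {
    throw new Error('score must be a number in [0, 1]');
  }
  if (typeof explanation !== 'string') throw new Error('explanation must be a string');
  return { score, explanation };
}

const examplePrompt: PromptSketch<FeaturesInput> = {
  name: 'example_uniqueness_prompt',
  validateInput: validateFeaturesInput,
  systemTemplate: 'You judge whether extracted features duplicate each other.',
  userTemplate: 'Compare these features: {{ features }}',
};
```

Because the retry loop only returns a value that has passed `validateOutput`, the surrounding evaluator can keep its score calculation and return shape untouched, which is the property the description says the refactor preserves.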
4 tasks
@kbn/evals evaluator format
@elasticmachine merge upstream
…-kbn-evals-format
…_duplication_evaluators Made-with: Cursor
/ci
viduni94
reviewed
Mar 11, 2026
...n-evals-suite-streams/src/evaluators/feature_duplication_evaluators/id_consistency_prompt.ts
...ckages/shared/kbn-evals-suite-streams/src/evaluators/feature_duplication_evaluators/index.ts
…-kbn-evals-format
…ators Made-with: Cursor
💛 Build succeeded, but was flaky
mbondyra
added a commit
to mbondyra/kibana
that referenced
this pull request
Mar 19, 2026
…d_agent_navigation2

* commit '9289d6b5502db245e645e190b0246554396c6c20': (34 commits)
  [api-docs] 2026-03-19 Daily api_docs build (elastic#258471)
  [Shared UX][DateRangePicker] Missing parts (elastic#258229)
  [Dashboard] Keep pinned_panels separate in read response (elastic#258444)
  Move inheritance: true to top level in .coderabbit.yml (elastic#258461)
  [DOCS] 9.3.2 Kibana release notes (elastic#257332)
  adds routing accept metric attribute to the cps metric (elastic#258168)
  [ML] AI/Inference Connector creation: use 'location' field to correctly set provider config (elastic#250838)
  [Lens] Add e2e test for legend list layout (elastic#258160)
  [SigEvents] Convert feature duplication evaluators to createPrompt pattern (elastic#256534)
  Add actionable-obs author to .coderabbit.yml (elastic#257922)
  [DOCS] 9.2.7 Kibana release notes (elastic#257331)
  Grant Serverless editor/viewer access to ES v2 indices (elastic#258384)
  [SigEvents][Evals] Rename terminology for KI features and KI queries (elastic#258361)
  [EDR Workflows][Osquery] Add shared table toolbar components and redesign saved queries list (elastic#258394)
  [Automatic Import V2] Upload samples using an existing index (elastic#258074)
  Add GET /inference_features route to expose feature registry (elastic#258044)
  fix additional fields not included (elastic#257625)
  [Discover] [Metrics] Add tier 2 journeys for Metrics in Discover E2E (elastic#255036)
  [Lens as code] Support correct X-Axis types in ES|QL visualizations (elastic#258159)
  Update APM (main) (elastic#254880)
  ...
flash1293
pushed a commit
to flash1293/kibana
that referenced
this pull request
Mar 19, 2026
…ttern (elastic#256534)

Closes elastic#255869

## Summary

Converts the two LLM evaluators in feature duplication evals (`createSemanticUniquenessEvaluator` and `createIdConsistencyEvaluator`) from `inferenceClient.output()` with inline system prompts to the `createPrompt` + `executeUntilValid` pattern used by `@kbn/evals`. System prompts are extracted into `.text` mustache template files, and tool schemas are preserved as prompt version tool definitions. The CODE evaluator (`featureDuplicationEvaluator`) is unchanged. All evaluator names, signatures, and return shapes are preserved.

### Checklist

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed
- [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [ ] Review the [backport guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing) and apply applicable `backport:*` labels.

### Identify risks

No significant risks. This is a structural refactor of evaluator internals: the evaluator contracts (names, signatures, return shapes) are unchanged. The LLM prompts are semantically identical, just moved from inline strings to versioned mustache templates.

---

🤖 Co-authored with AI assistance.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Viduni Wickramarachchi <viduni.wickramarachchi@elastic.co>
jeramysoucy
pushed a commit
to jeramysoucy/kibana
that referenced
this pull request
Mar 26, 2026
…ttern (elastic#256534)