Skip to content

[SigEvents][Evals] Rename terminology for KI features and KI queries#258361

Merged
viduni94 merged 4 commits intoelastic:mainfrom
viduni94:sigevents-evals-renaming
Mar 18, 2026
Merged

[SigEvents][Evals] Rename terminology for KI features and KI queries#258361
viduni94 merged 4 commits intoelastic:mainfrom
viduni94:sigevents-evals-renaming

Conversation

@viduni94
Copy link
Copy Markdown
Contributor

@viduni94 viduni94 commented Mar 18, 2026

Summary

Updated SigEvents terminology as below:

  • Feature extraction --> KI feature extraction
  • Query generation --> KI query generation

Checklist

  • The PR description includes the appropriate Release Notes section, and the correct release_note:* label is applied per the guidelines
  • Review the backport guidelines and apply applicable backport:* labels.

Summary by CodeRabbit

Release Notes

  • Documentation

    • Updated evaluation suite documentation with clarified terminology around feature extraction and query generation workflows.
  • Refactor

    • Standardized naming conventions across evaluation components for improved consistency and clarity.
    • Renamed evaluators and dataset references to better reflect their functional purpose.
    • Updated environment variable names for clearer configuration semantics.

@viduni94 viduni94 self-assigned this Mar 18, 2026
@viduni94 viduni94 added release_note:skip Skip the PR/issue when compiling release notes backport:skip This PR does not require backporting Feature:SigEvents Significant events feature, related to streams and rules/alerts (RnA) models:eis/google-gemini-3.0-flash Run LLM evals against model: eis/google-gemini-3.0-flash models:eis/openai-gpt-oss-120b Run LLM evals against model: eis/openai-gpt-oss-120b Team:SigEvents Project team working on Significant Events (deprecated) evals:streams-sigevents This label is deprecated. Use `evals:significant-events` to run the Significant Events eval suite. models:judge:eis/google-gemini-3.1-pro Override LLM-as-a-judge connector for evals: eis/google-gemini-3.1-pro labels Mar 18, 2026
@viduni94
Copy link
Copy Markdown
Contributor Author

/ci

@viduni94 viduni94 added models:eis/anthropic-claude-4.6-opus Run LLM evals against model: eis/anthropic-claude-4.6-opus models:eis/gp-llm-v2 Run LLM evals against model: eis/gp-llm-v2 models:eis/openai-gpt-5.2 Run LLM evals against model: eis/openai-gpt-5.2 models:eis/openai-gpt-oss-20b Run LLM evals against model: eis/openai-gpt-oss-20b models:eis/anthropic-claude-4.6-sonnet Run LLM evals against model: eis/anthropic-claude-4.6-sonnet models:eis/google-gemini-3.1-pro Run LLM evals against model: eis/google-gemini-3.1-pro models:llm-gateway/claude-haiku-4-5 Run LLM evals against model: llm-gateway/claude-haiku-4-5 labels Mar 18, 2026
@viduni94 viduni94 marked this pull request as ready for review March 18, 2026 15:34
@viduni94 viduni94 requested review from a team as code owners March 18, 2026 15:34
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 18, 2026

📝 Walkthrough

Walkthrough

This pull request systematically renames terminology across the significant events evaluation suite: "KI extraction" becomes "KI feature extraction," "KI duplication" becomes "KI feature duplication," and "rule generation" becomes "KI query generation." The refactor updates type names, function signatures, constants, variable names, file paths, and documentation throughout the codebase while preserving underlying logic.

Changes

Cohort / File(s) Summary
Dataset Type Definitions
datasets/types.ts, datasets/index.ts, datasets/otel_demo.ts
Renamed interfaces and fields from RuleGenerationScenario/KIExtractionScenario to KIQueryGenerationScenario/KIFeatureExtractionScenario; updated DatasetConfig field names from ruleGeneration/kiExtraction to kiQueryGeneration/kiFeatureExtraction; imported type changed from ValidKIType to ValidKIFeatureType.
KI Feature Extraction Evaluators
src/evaluators/ki_feature_extraction_evaluators.ts, src/evaluators/ki_feature_extraction_evaluators.test.ts
Renamed public constants, types, and functions: VALID_KI_TYPESVALID_KI_FEATURE_TYPES, ValidKITypeValidKIFeatureType, KIExtractionEvaluationExampleKIFeatureExtractionEvaluationExample, createKIExtractionEvaluatorscreateKIFeatureExtractionEvaluators, ki_countki_feature_count; updated evaluator logic to operate on features instead of KIs.
KI Feature Extraction Test Suite
evals/significant_events/ki_feature_extraction/ki_feature_extraction.spec.ts, evals/significant_events/ki_feature_extraction/collect_sample_documents.ts
Renamed evaluators import source from ki_extraction_evaluators to ki_feature_extraction_evaluators, updated function calls from createKIExtractionEvaluators to createKIFeatureExtractionEvaluators, updated scenario type parameter from KIExtractionScenario to KIFeatureExtractionScenario.
KI Feature Duplication Evaluators
src/evaluators/ki_feature_duplication_evaluators.ts, evals/significant_events/ki_feature_duplication/*
Renamed exported constant from kiDuplicationEvaluator to kiFeatureDuplicationEvaluator, updated internal identifier from ki_duplication to ki_feature_duplication, renamed dataset constant from KI_DUPLICATION_DATASETS to KI_FEATURE_DUPLICATION_DATASETS.
KI Query Generation Evaluators
src/evaluators/ki_query_generation_evaluators.ts, src/evaluators/ki_query_generation_evaluators.test.ts
Renamed interfaces from RuleGeneration* to KIQueryGeneration* (e.g., RuleGenerationEvaluationExampleKIQueryGenerationEvaluationExample), renamed functions createRuleGenerationEvaluatorscreateKIQueryGenerationEvaluators, renamed evaluator identifier from rule_generation_code_evaluator to ki_query_generation_code_evaluator.
KI Query Generation Test Suite
evals/significant_events/ki_query_generation/ki_query_generation.spec.ts, evals/significant_events/ki_query_generation/resolve_ki_sources.ts, evals/significant_events/ki_query_generation/get_computed_ki_features_from_docs.ts
Updated dataset iteration to use kiQueryGeneration instead of ruleGeneration, renamed data generators from canonicalKIsFromExpectedGroundTruth to canonicalKIFeaturesFromExpectedGroundTruth, renamed loadKIsFromSnapshot to loadKIFeaturesFromSnapshot, updated environment variable from RULE_GENERATION_KI_SOURCE to KI_QUERY_GENERATION_KI_FEATURE_SOURCE.
Data Generators
src/data_generators/canonical_ki_features.ts, src/data_generators/canonical_ki_features.test.ts, src/data_generators/load_ki_features_from_snapshot.ts, src/data_generators/load_ki_features_from_snapshot.test.ts, src/data_generators/sigevents_ki_features_index.ts, src/data_generators/sigevents_ki_features_index.test.ts, src/data_generators/replay.ts
Renamed exported functions and constants across data generation modules: canonicalKIsFromExpectedGroundTruthcanonicalKIFeaturesFromExpectedGroundTruth, loadKIsFromSnapshotloadKIFeaturesFromSnapshot, KIS_TEMP_INDEXKI_FEATURES_TEMP_INDEX, getSigeventsSnapshotKIsIndexgetSigeventsSnapshotKIFeaturesIndex; deleted old canonical_kis.test.ts and created canonical_ki_features.test.ts.
Workflow Scripts
scripts/significant_events_snapshots/lib/gcs.ts, scripts/significant_events_snapshots/lib/significant_events_workflow.ts
Updated function imports from getSigeventsSnapshotKIsIndex to getSigeventsSnapshotKIFeaturesIndex, updated variable names from kiIndex to kiFeaturesIndex.
Documentation
evals/significant_events/README.md
Updated terminology throughout suite table headings, spec file paths, evaluator references, environment variables, and example code to reflect KI feature extraction/duplication and KI query generation naming conventions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • crespocarlos
  • spong
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: renaming terminology throughout the codebase from KI extraction/duplication to KI feature extraction/duplication, and from rule generation to KI query generation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Warning

Tools execution failed with the following error:

Failed to run tools: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)


Comment @coderabbitai help to get the list of available commands and usage tips.

import type { BoundInferenceClient } from '@kbn/inference-common';
import { evaluate } from '../../../src/evaluate';
import { KI_DUPLICATION_DATASETS } from './ki_duplication_datasets';
import { KI_FEATURE_DUPLICATION_DATASETS } from './ki_feature_duplication_datasets';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side node: Is this evaluating duplication or deduplication ;-)

Copy link
Copy Markdown
Contributor Author

@viduni94 viduni94 Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's evaluating duplication (by scoring uniqueness)
Correct me if I'm wrong @klacabane

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we're evaluating duplication :)

} from './sigevents_ki_features_index';

describe('sigevents_kis_index', () => {
describe('sigevents_ki_features_index', () => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the streams_ki_features_index

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In here's it's not referring to the streams_ki_features_index.
It's referring to the snapshotable copy that the Significant Events snapshot workflow writes so it can be included in ES snapshots: sigevents-streams-features-<scenario>

Copy link
Copy Markdown
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Did skim through it and looks aligned.

I left some nits, don't hold back on this from getting it merged. We can keep iterating on it, it is not critical. Important is that the basics are aligned.

@viduni94 viduni94 removed models:eis/gp-llm-v2 Run LLM evals against model: eis/gp-llm-v2 models:llm-gateway/claude-haiku-4-5 Run LLM evals against model: llm-gateway/claude-haiku-4-5 models:eis/google-gemini-3.0-flash Run LLM evals against model: eis/google-gemini-3.0-flash models:eis/openai-gpt-oss-20b Run LLM evals against model: eis/openai-gpt-oss-20b labels Mar 18, 2026
@viduni94 viduni94 added models:judge:llm-gateway/gemini-3.1-pro-preview Override LLM-as-a-judge connector for evals: llm-gateway/gemini-3.1-pro-preview and removed models:eis/google-gemini-3.1-pro Run LLM evals against model: eis/google-gemini-3.1-pro models:judge:eis/google-gemini-3.1-pro Override LLM-as-a-judge connector for evals: eis/google-gemini-3.1-pro labels Mar 18, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_query_generation_evaluators.ts (1)

319-321: ⚠️ Potential issue | 🟡 Minor

Residual “generated rules” wording should be renamed to “generated queries.”

The success explanation still uses old terminology, which can confuse users and break string-based expectations around the KI query rename.

💡 Suggested patch
-        : `All ${queries.length} generated rules passed code validation`,
+        : `All ${queries.length} generated queries passed code validation`,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_query_generation_evaluators.ts`
around lines 319 - 321, The success message still uses the old phrase "generated
rules"; update the string in the ternary expression that builds the explanation
(the branch that currently returns `All ${queries.length} generated rules passed
code validation`) to use "generated queries" instead. Locate the construction
that references issues, score, and queries (the expression using issues.join and
`All ${queries.length} ...`) in the KI query generation evaluator and replace
the wording accordingly, and run/update any tests or consumers that assert on
that exact message literal.
🧹 Nitpick comments (1)
x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_feature_extraction_evaluators.test.ts (1)

134-134: Minor inconsistency: describe block name doesn't match new terminology.

The describe block says 'ki count evaluator' but the evaluator is now named 'ki_feature_count'. Consider updating for consistency:

-describe('ki count evaluator', () => {
+describe('ki feature count evaluator', () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_feature_extraction_evaluators.test.ts`
at line 134, Update the test suite description to match the evaluator's current
name: change the describe block string from 'ki count evaluator' to
'ki_feature_count evaluator' (or similar consistent phrasing) so it aligns with
the evaluator identifier ki_feature_count used in the tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In
`@x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_query_generation_evaluators.ts`:
- Around line 319-321: The success message still uses the old phrase "generated
rules"; update the string in the ternary expression that builds the explanation
(the branch that currently returns `All ${queries.length} generated rules passed
code validation`) to use "generated queries" instead. Locate the construction
that references issues, score, and queries (the expression using issues.join and
`All ${queries.length} ...`) in the KI query generation evaluator and replace
the wording accordingly, and run/update any tests or consumers that assert on
that exact message literal.

---

Nitpick comments:
In
`@x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_feature_extraction_evaluators.test.ts`:
- Line 134: Update the test suite description to match the evaluator's current
name: change the describe block string from 'ki count evaluator' to
'ki_feature_count evaluator' (or similar consistent phrasing) so it aligns with
the evaluator identifier ki_feature_count used in the tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 65860eeb-44b2-428f-a124-df431f39f97f

📥 Commits

Reviewing files that changed from the base of the PR and between 5538e8c and 5a1fdba.

📒 Files selected for processing (27)
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/README.md
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/datasets/index.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/datasets/otel_demo.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/datasets/types.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_feature_duplication/ki_feature_duplication.spec.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_feature_duplication/ki_feature_duplication_datasets.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_feature_extraction/collect_sample_documents.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_feature_extraction/ki_feature_extraction.spec.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_query_generation/extract_log_text.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_query_generation/get_computed_ki_features_from_docs.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_query_generation/ki_query_generation.spec.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_query_generation/resolve_ki_sources.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/scripts/significant_events_snapshots/lib/gcs.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/scripts/significant_events_snapshots/lib/significant_events_workflow.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/canonical_ki_features.test.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/canonical_ki_features.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/canonical_kis.test.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/load_ki_features_from_snapshot.test.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/load_ki_features_from_snapshot.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/replay.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/sigevents_ki_features_index.test.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/sigevents_ki_features_index.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_feature_duplication_evaluators.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_feature_extraction_evaluators.test.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_feature_extraction_evaluators.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_query_generation_evaluators.test.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_query_generation_evaluators.ts
💤 Files with no reviewable changes (1)
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/canonical_kis.test.ts

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_query_generation_evaluators.test.ts (1)

30-33: Avoid the non-null assertion when selecting the evaluator.

Line 33 uses !, which makes failures less explicit if the evaluator name drifts. Prefer an explicit guard with a clear error.

Proposed fix
 const getKIQueryGenerationCodeEvaluator = (esClient: ElasticsearchClient) =>
-  createKIQueryGenerationEvaluators(esClient).find(
-    (evaluator) => evaluator.name === 'ki_query_generation_code_evaluator'
-  )!;
+  {
+    const evaluator = createKIQueryGenerationEvaluators(esClient).find(
+      (item) => item.name === 'ki_query_generation_code_evaluator'
+    );
+    if (!evaluator) {
+      throw new Error('ki_query_generation_code_evaluator was not registered');
+    }
+    return evaluator;
+  };

As per coding guidelines: "Avoid non-null assertions (!) unless locally justified in TypeScript."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_query_generation_evaluators.test.ts`
around lines 30 - 33, The test uses a non-null assertion when selecting the
evaluator which hides failures if the evaluator list changes; update
getKIQueryGenerationCodeEvaluator to explicitly check the result of
createKIQueryGenerationEvaluators(esClient).find(...) (looking for
evaluator.name === 'ki_query_generation_code_evaluator') and throw a clear error
(e.g., new Error with message identifying 'ki_query_generation_code_evaluator'
not found) when the evaluator is undefined so failures are explicit and
informative.
x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_feature_extraction_evaluators.test.ts (1)

12-17: Create evaluators once and reuse for both lookups.

You can avoid duplicate factory calls by creating one evaluator list and reusing it for find().

♻️ Suggested refactor
-const evidenceGroundingEvaluator = createKIFeatureExtractionEvaluators().find(
+const kiFeatureExtractionEvaluators = createKIFeatureExtractionEvaluators();
+
+const evidenceGroundingEvaluator = kiFeatureExtractionEvaluators.find(
   (evaluator) => evaluator.name === 'evidence_grounding'
 );
-const kiFeatureCountEvaluator = createKIFeatureExtractionEvaluators().find(
+const kiFeatureCountEvaluator = kiFeatureExtractionEvaluators.find(
   (evaluator) => evaluator.name === 'ki_feature_count'
 );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_feature_extraction_evaluators.test.ts`
around lines 12 - 17, The test calls createKIFeatureExtractionEvaluators()
twice; instead, call it once, assign the returned array to a variable (e.g.,
evaluators), and then reuse that array for the two find() lookups to obtain
evidenceGroundingEvaluator and kiFeatureCountEvaluator; update references to use
the single evaluators variable and remove the duplicate factory invocation in
createKIFeatureExtractionEvaluators().
x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/load_ki_features_from_snapshot.test.ts (1)

98-123: Assert the source snapshot index in this happy-path test.

This only pins the temp rename target today. Because allowNoMatches: true, a bad getSigeventsSnapshotKIFeaturesIndex() rename can quietly degrade into an empty restore and this test would still pass. Please also assert the indices argument passed to restoreSnapshot.

✅ Suggested test tightening
+import { getSigeventsSnapshotKIFeaturesIndex } from './sigevents_ki_features_index';
...
     expect(mockRestoreSnapshot).toHaveBeenCalledWith(
       expect.objectContaining({
         snapshotName: 'payment-unreachable',
+        indices: [getSigeventsSnapshotKIFeaturesIndex('payment-unreachable')],
         renamePattern: '(.+)',
         renameReplacement: KI_FEATURES_TEMP_INDEX,
         allowNoMatches: true,
       })
     );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/load_ki_features_from_snapshot.test.ts`
around lines 98 - 123, The test currently only asserts the rename target for
restoreSnapshot, leaving the source snapshot index unchecked and allowing
misnamed source indices to slip by; add an assertion on mockRestoreSnapshot to
include the indices field (the source snapshot index) — either compute the
expected value by calling
getSigeventsSnapshotKIFeaturesIndex('payment-unreachable') (or importing the
source-index constant used by loadKIFeaturesFromSnapshot) and then assert
expect(mockRestoreSnapshot).toHaveBeenCalledWith(expect.objectContaining({
indices: expectedIndex })), ensuring the restore call uses the correct source
index.
x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_feature_extraction_evaluators.ts (1)

449-476: Add an explicit return type to this exported factory.

This public helper currently relies on a fairly wide inferred return type across the base path and the scenario-criteria path. Pinning it will make future refactors fail at the declaration instead of widening the evaluator API silently.

✍️ Suggested tightening
 export const createKIFeatureExtractionEvaluators = (scenarioCriteria?: {
   criteriaFn: (criteria: EvaluationCriterion[]) => Evaluator;
   criteria: EvaluationCriterion[];
-}) => {
+}): ReadonlyArray<KIFeatureExtractionEvaluator> => {
As per coding guidelines, "Prefer explicit return types for public APIs and exported functions in TypeScript."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_feature_extraction_evaluators.ts`
around lines 449 - 476, The exported factory createKIFeatureExtractionEvaluators
currently relies on inference across the base (selectEvaluators) and scenario
branch (createScenarioCriteriaLlmEvaluator) paths; add an explicit return type
to the function signature (for example Evaluator[] or the more specific
Evaluator<KIFeatureExtractionEvaluationExample, KIFeatureExtractionOutput>[] if
you want to constrain elements) so the public API is pinned — update the
signature of createKIFeatureExtractionEvaluators to include that return type and
import the Evaluator type if necessary.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/load_ki_features_from_snapshot.ts`:
- Around line 16-17: The code uses a single global constant
KI_FEATURES_TEMP_INDEX ('sigevents-replay-temp-features') causing collisions
across concurrent runs; change to generate a unique temp index per invocation
(e.g., append a UUID or timestamp) and pass that generated name into the
restore, search, and cleanup flows instead of the global KI_FEATURES_TEMP_INDEX
constant—update all places that reference KI_FEATURES_TEMP_INDEX (the restore
call, search/scroll logic, and delete/cleanup logic mentioned in the comment
ranges) to accept and use a per-call tempIndex parameter so each run restores,
queries, and deletes its own index while keeping KI_FEATURES_SEARCH_LIMIT
unchanged.

---

Nitpick comments:
In
`@x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/load_ki_features_from_snapshot.test.ts`:
- Around line 98-123: The test currently only asserts the rename target for
restoreSnapshot, leaving the source snapshot index unchecked and allowing
misnamed source indices to slip by; add an assertion on mockRestoreSnapshot to
include the indices field (the source snapshot index) — either compute the
expected value by calling
getSigeventsSnapshotKIFeaturesIndex('payment-unreachable') (or importing the
source-index constant used by loadKIFeaturesFromSnapshot) and then assert
expect(mockRestoreSnapshot).toHaveBeenCalledWith(expect.objectContaining({
indices: expectedIndex })), ensuring the restore call uses the correct source
index.

In
`@x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_feature_extraction_evaluators.test.ts`:
- Around line 12-17: The test calls createKIFeatureExtractionEvaluators() twice;
instead, call it once, assign the returned array to a variable (e.g.,
evaluators), and then reuse that array for the two find() lookups to obtain
evidenceGroundingEvaluator and kiFeatureCountEvaluator; update references to use
the single evaluators variable and remove the duplicate factory invocation in
createKIFeatureExtractionEvaluators().

In
`@x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_feature_extraction_evaluators.ts`:
- Around line 449-476: The exported factory createKIFeatureExtractionEvaluators
currently relies on inference across the base (selectEvaluators) and scenario
branch (createScenarioCriteriaLlmEvaluator) paths; add an explicit return type
to the function signature (for example Evaluator[] or the more specific
Evaluator<KIFeatureExtractionEvaluationExample, KIFeatureExtractionOutput>[] if
you want to constrain elements) so the public API is pinned — update the
signature of createKIFeatureExtractionEvaluators to include that return type and
import the Evaluator type if necessary.

In
`@x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_query_generation_evaluators.test.ts`:
- Around line 30-33: The test uses a non-null assertion when selecting the
evaluator which hides failures if the evaluator list changes; update
getKIQueryGenerationCodeEvaluator to explicitly check the result of
createKIQueryGenerationEvaluators(esClient).find(...) (looking for
evaluator.name === 'ki_query_generation_code_evaluator') and throw a clear error
(e.g., new Error with message identifying 'ki_query_generation_code_evaluator'
not found) when the evaluator is undefined so failures are explicit and
informative.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 823eb02f-894c-4f51-94ba-166add805585

📥 Commits

Reviewing files that changed from the base of the PR and between 5538e8c and 5a1fdba.

📒 Files selected for processing (27)
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/README.md
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/datasets/index.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/datasets/otel_demo.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/datasets/types.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_feature_duplication/ki_feature_duplication.spec.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_feature_duplication/ki_feature_duplication_datasets.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_feature_extraction/collect_sample_documents.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_feature_extraction/ki_feature_extraction.spec.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_query_generation/extract_log_text.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_query_generation/get_computed_ki_features_from_docs.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_query_generation/ki_query_generation.spec.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/evals/significant_events/ki_query_generation/resolve_ki_sources.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/scripts/significant_events_snapshots/lib/gcs.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/scripts/significant_events_snapshots/lib/significant_events_workflow.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/canonical_ki_features.test.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/canonical_ki_features.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/canonical_kis.test.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/load_ki_features_from_snapshot.test.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/load_ki_features_from_snapshot.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/replay.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/sigevents_ki_features_index.test.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/sigevents_ki_features_index.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_feature_duplication_evaluators.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_feature_extraction_evaluators.test.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_feature_extraction_evaluators.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_query_generation_evaluators.test.ts
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/evaluators/ki_query_generation_evaluators.ts
💤 Files with no reviewable changes (1)
  • x-pack/platform/packages/shared/kbn-evals-suite-streams/src/data_generators/canonical_kis.test.ts

@viduni94 viduni94 added models:judge:eis/google-gemini-3.1-pro Override LLM-as-a-judge connector for evals: eis/google-gemini-3.1-pro and removed models:judge:llm-gateway/gemini-3.1-pro-preview Override LLM-as-a-judge connector for evals: llm-gateway/gemini-3.1-pro-preview labels Mar 18, 2026
@viduni94 viduni94 removed models:eis/anthropic-claude-4.6-opus Run LLM evals against model: eis/anthropic-claude-4.6-opus models:eis/openai-gpt-5.2 Run LLM evals against model: eis/openai-gpt-5.2 models:eis/openai-gpt-oss-120b Run LLM evals against model: eis/openai-gpt-oss-120b (deprecated) evals:streams-sigevents This label is deprecated. Use `evals:significant-events` to run the Significant Events eval suite. models:eis/anthropic-claude-4.6-sonnet Run LLM evals against model: eis/anthropic-claude-4.6-sonnet models:judge:eis/google-gemini-3.1-pro Override LLM-as-a-judge connector for evals: eis/google-gemini-3.1-pro labels Mar 18, 2026
@elasticmachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

cc @viduni94

@viduni94 viduni94 merged commit 65976cd into elastic:main Mar 18, 2026
17 checks passed
mbondyra added a commit to mbondyra/kibana that referenced this pull request Mar 19, 2026
…d_agent_navigation2

* commit '9289d6b5502db245e645e190b0246554396c6c20': (34 commits)
  [api-docs] 2026-03-19 Daily api_docs build (elastic#258471)
  [Shared UX][DateRangePicker] Missing parts (elastic#258229)
  [Dashboard] Keep pinned_panels separate in read response (elastic#258444)
  Move inheritance: true to top level in .coderabbit.yml (elastic#258461)
  [DOCS] 9.3.2 Kibana release notes (elastic#257332)
  adds routing accept metric attribute to the cps metric (elastic#258168)
  [ML] AI/Inference Connector creation: use 'location' field to correctly set provider config  (elastic#250838)
  [Lens] Add e2e test for legend list layout (elastic#258160)
  [SigEvents] Convert feature duplication evaluators to createPrompt pattern (elastic#256534)
  Add actionable-obs author to .coderabbit.yml (elastic#257922)
  [DOCS] 9.2.7 Kibana release notes (elastic#257331)
  Grant Serverless editor/viewer access to ES v2 indices (elastic#258384)
  [SigEvents][Evals] Rename terminology for KI features and KI queries (elastic#258361)
  [EDR Workflows][Osquery] Add shared table toolbar components and redesign saved queries list (elastic#258394)
  [Automatic Import V2] Upload samples using an existing index (elastic#258074)
  Add GET /inference_features route to expose feature registry (elastic#258044)
  fix additional fields not included (elastic#257625)
  [Discover] [Metrics] Add tier 2 journeys for Metrics in Discover E2E (elastic#255036)
  [Lens as code] Support correct X-Axis types in ES|QL visualizations (elastic#258159)
  Update APM (main) (elastic#254880)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport:skip This PR does not require backporting Feature:SigEvents Significant events feature, related to streams and rules/alerts (RnA) release_note:skip Skip the PR/issue when compiling release notes Team:SigEvents Project team working on Significant Events v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants