[Automatic Import V2] Upload samples using an existing index #258074

Merged
bhapas merged 17 commits into elastic:main from bhapas:aiv2_index_import
Mar 18, 2026

Conversation

@bhapas bhapas commented Mar 17, 2026

Summary

Adds support for loading samples from an existing Elasticsearch index into the automatic-import-samples index via the same upload API used for file uploads. The index is queried for up to 1000 random documents that have event.original; those values are stored as samples. Request schema (OpenAPI + generated types) and the data stream saved object schema are updated to support the new flow.

Changes

Upload API (same route, extended body)

  • Route: POST .../integrations/{id}/data_streams/{id}/upload now accepts either:
    • File path: samples (string[]) + originalSource (e.g. sourceType: 'file', sourceValue: 'file.log'), or
    • Index path: sourceIndex (index name) + originalSource (e.g. sourceType: 'index', sourceValue: '<index>').
  • Backend: When sourceIndex is present, the handler runs an ES search on that index with exists: { field: 'event.original' }, function_score + random_score, size: 1000, _source: ['event.original'], then maps hits to rawSamples and calls addSamplesToDataStream (unchanged). Returns 400 if no documents with event.original are found.
  • Validation: Request body must include originalSource and at least one of samples or sourceIndex (enforced with a Zod .refine() in the route).
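The index path described above can be sketched roughly as follows. This is a minimal illustration, not the handler from the PR: the helper names `buildSampleSearch` and `hitsToRawSamples` are hypothetical, the `_source` shape is assumed to be nested (`event: { original }`), and treating only non-empty strings as "valid" is an assumption. Only the query ingredients (`exists`, `function_score` + `random_score`, `size: 1000`, `_source: ['event.original']`) come from the description.

```typescript
// Hypothetical shape of a search hit; only the event.original field matters here.
interface SearchHit {
  _source?: { event?: { original?: unknown } };
}

// Builds a search request for up to `size` randomly scored documents
// that have an event.original field.
function buildSampleSearch(sourceIndex: string, size = 1000) {
  return {
    index: sourceIndex,
    size,
    _source: ['event.original'],
    query: {
      function_score: {
        query: { exists: { field: 'event.original' } },
        random_score: {},
      },
    },
  };
}

// Maps hits to raw samples, dropping hits without a usable event.original.
// Assumption: "valid" means a non-empty string.
function hitsToRawSamples(hits: SearchHit[]): string[] {
  return hits
    .map((hit) => hit._source?.event?.original)
    .filter((value): value is string => typeof value === 'string' && value.length > 0);
}
```

A caller would presumably respond with 400 when `hitsToRawSamples` returns an empty array, and otherwise pass the result on to `addSamplesToDataStream`.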

Request schema

  • OpenAPI (data_stream.schema.yaml): Upload request body now has required: [originalSource]; samples and sourceIndex are optional. Description updated for the two modes.
  • Generated types (data_stream.gen.ts): UploadSamplesToDataStreamRequestBody has optional samples and sourceIndex, required originalSource.
  • Tests (data_stream.test.ts): Adjusted for optional samples; added “accepts payload with sourceIndex and originalSource” and “rejects empty sourceIndex when provided”.
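As a rough illustration of these rules, the sketch below checks a body by hand. The PR enforces the rule with a Zod `.refine()`; this dependency-free version only mirrors the stated constraints. The names `UploadBody` and `validateUploadBody` are hypothetical, and treating "provided" as "not undefined" is an assumption.

```typescript
interface OriginalSource {
  sourceType: 'file' | 'index';
  sourceValue: string;
}

// Hypothetical body shape: samples and sourceIndex optional, originalSource required.
interface UploadBody {
  samples?: string[];
  sourceIndex?: string;
  originalSource: OriginalSource;
}

// Returns a list of validation errors; an empty list means the body is acceptable.
function validateUploadBody(body: Partial<UploadBody>): string[] {
  const errors: string[] = [];
  if (body.originalSource === undefined) {
    errors.push('originalSource is required');
  }
  // An empty sourceIndex is rejected when the field is provided.
  if (body.sourceIndex !== undefined && body.sourceIndex.trim() === '') {
    errors.push('sourceIndex must not be empty when provided');
  }
  // Assumption: "at least one of" is checked by presence, not by content.
  if (body.samples === undefined && body.sourceIndex === undefined) {
    errors.push('either samples or sourceIndex is required');
  }
  return errors;
}
```

For example, `validateUploadBody({ sourceIndex: 'logs-foo', originalSource: { sourceType: 'index', sourceValue: 'logs-foo' } })` yields no errors, while a body with `sourceIndex: ''` yields one.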

Client & UI

  • API (api.ts): UploadSamplesRequest and uploadSamplesToDataStream support optional samples and optional sourceIndex; body is built from whichever is provided plus originalSource.
  • Create Data Stream flyout: When “Select index” is used and the user clicks “Analyze Logs”, it calls uploadSamplesMutation.mutateAsync with sourceIndex and originalSource: { sourceType: 'index', sourceValue: selectedIndex } (no new hook or route).
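The client-side body construction might look something like the sketch below. It is an assumption-laden illustration: `UploadSamplesInput` and `buildUploadBody` are hypothetical names, the real `UploadSamplesRequest` likely carries additional fields (such as the integration and data stream IDs), and the behavior when both `samples` and `sourceIndex` are set is not stated in the description, so here both are simply forwarded.

```typescript
interface OriginalSource {
  sourceType: 'file' | 'index';
  sourceValue: string;
}

// Hypothetical input: either samples (file mode) or sourceIndex (index mode).
interface UploadSamplesInput {
  samples?: string[];
  sourceIndex?: string;
  originalSource: OriginalSource;
}

// Builds the request body from whichever source field is provided, plus originalSource.
function buildUploadBody(input: UploadSamplesInput): Record<string, unknown> {
  const body: Record<string, unknown> = { originalSource: input.originalSource };
  if (input.samples !== undefined) {
    body.samples = input.samples;
  }
  if (input.sourceIndex !== undefined) {
    body.sourceIndex = input.sourceIndex;
  }
  return body;
}

// Index mode, as the flyout would call it after "Select index" + "Analyze Logs".
const selectedIndex = 'logs-example'; // hypothetical index name
const indexBody = buildUploadBody({
  sourceIndex: selectedIndex,
  originalSource: { sourceType: 'index', sourceValue: selectedIndex },
});
```

Omitting undefined keys keeps the serialized body limited to the fields the chosen mode actually uses.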

Data stream saved object schema

  • data_stream_schema.ts: Under metadata, added optional original_source: { source_type: 'file' | 'index', source_value: string } so the data stream can record where its samples came from (for display/audit). Persistence of this field is not implemented in this PR.
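To make the shape concrete, here is a small sketch of the new metadata field as a TypeScript type, together with a purely hypothetical mapper from the camelCase request field to the snake_case saved object field. The PR states that persistence is not implemented, so `toOriginalSourceMetadata` and the interface names are illustrative assumptions only.

```typescript
// Shape of the optional metadata field as described in the schema change.
interface OriginalSourceMetadata {
  source_type: 'file' | 'index';
  source_value: string;
}

// Hypothetical slice of the data stream metadata; other fields omitted.
interface DataStreamMetadata {
  original_source?: OriginalSourceMetadata;
}

// Hypothetical mapper from the request's originalSource to the saved object field.
function toOriginalSourceMetadata(source: {
  sourceType: 'file' | 'index';
  sourceValue: string;
}): OriginalSourceMetadata {
  return { source_type: source.sourceType, source_value: source.sourceValue };
}

const metadata: DataStreamMetadata = {
  original_source: toOriginalSourceMetadata({ sourceType: 'index', sourceValue: 'logs-example' }),
};
```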

Tests

  • Route tests (data_stream_routes.test.ts): Cover file upload path (samples only), index path (search + addSamplesToDataStream), index with no event.original (400), and filtering of hits without valid event.original.

@bhapas bhapas self-assigned this Mar 17, 2026
@bhapas bhapas added the release_note:skip, backport:skip, and Team:Integration-Experience labels Mar 17, 2026
@bhapas bhapas marked this pull request as ready for review March 17, 2026 10:18
@bhapas bhapas requested a review from a team as a code owner March 17, 2026 10:18
@elasticmachine

Pinging @elastic/integration-experience (Team:Integration-Experience)

@bhapas bhapas marked this pull request as draft March 18, 2026 10:46
@bhapas bhapas marked this pull request as ready for review March 18, 2026 11:39
@elasticmachine

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Scout: [ platform / workflows_management ] plugin / local-serverless-observability_complete - Workflow execution concurrency control - drop strategy drops new executions until there is an already running execution
  • [job] [logs] Scout: [ platform / workflows_management ] plugin / local-serverless-security_complete - Workflow execution concurrency control - drop strategy drops new executions until there is an already running execution
  • [job] [logs] Scout: [ platform / dashboard-stateful-classic ] plugin / local-stateful-classic - dashboard REST schema - Registered embeddable schemas have not changed
  • [job] [logs] FTR Configs #147 / serverless observability UI - ML and Discover discover/observabilitySolution/context_awareness extension getRowIndicatorProvider should render log.level row indicators on Surrounding documents page

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| automaticImportVTwo | 105.5KB | 105.8KB | +263.0B |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
| --- | --- | --- | --- |
| automaticImportVTwo | 9.2KB | 9.2KB | +48.0B |

History

cc @bhapas

@robester0403 robester0403 left a comment

LGTM, small nit.

@bhapas bhapas merged commit 1357f06 into elastic:main Mar 18, 2026
20 checks passed
mbondyra added a commit to mbondyra/kibana that referenced this pull request Mar 19, 2026
…d_agent_navigation2

* commit '9289d6b5502db245e645e190b0246554396c6c20': (34 commits)
  [api-docs] 2026-03-19 Daily api_docs build (elastic#258471)
  [Shared UX][DateRangePicker] Missing parts (elastic#258229)
  [Dashboard] Keep pinned_panels separate in read response (elastic#258444)
  Move inheritance: true to top level in .coderabbit.yml (elastic#258461)
  [DOCS] 9.3.2 Kibana release notes (elastic#257332)
  adds routing accept metric attribute to the cps metric (elastic#258168)
  [ML] AI/Inference Connector creation: use 'location' field to correctly set provider config  (elastic#250838)
  [Lens] Add e2e test for legend list layout (elastic#258160)
  [SigEvents] Convert feature duplication evaluators to createPrompt pattern (elastic#256534)
  Add actionable-obs author to .coderabbit.yml (elastic#257922)
  [DOCS] 9.2.7 Kibana release notes (elastic#257331)
  Grant Serverless editor/viewer access to ES v2 indices (elastic#258384)
  [SigEvents][Evals] Rename terminology for KI features and KI queries (elastic#258361)
  [EDR Workflows][Osquery] Add shared table toolbar components and redesign saved queries list (elastic#258394)
  [Automatic Import V2] Upload samples using an existing index (elastic#258074)
  Add GET /inference_features route to expose feature registry (elastic#258044)
  fix additional fields not included (elastic#257625)
  [Discover] [Metrics] Add tier 2 journeys for Metrics in Discover E2E (elastic#255036)
  [Lens as code] Support correct X-Axis types in ES|QL visualizations (elastic#258159)
  Update APM (main) (elastic#254880)
  ...
flash1293 pushed a commit to flash1293/kibana that referenced this pull request Mar 19, 2026
…#258074)

jeramysoucy pushed a commit to jeramysoucy/kibana that referenced this pull request Mar 26, 2026
…#258074)

Labels

backport:skip, Feature:AutomaticImport, release_note:skip, Team:Integration-Experience, v9.4.0

4 participants