Commit 6378ff3
# Backport
This will backport the following commits from `main` to `8.x`:
- [[Auto Import] CSV format support (#194386)](https://github.com/elastic/kibana/pull/194386)
<!--- Backport version: 9.4.3 -->
### Questions?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)
## [Auto Import] CSV format support (#194386)

### Release Notes

Automatic Import can now create integrations for logs in the CSV format. Given the maturity of log format support, we also remove the note asking for logs in JSON/NDJSON format.

### Summary

**Added: the CSV feature**

The issue is #194342.

When the user adds a log sample whose format the LLM recognizes as CSV, we now parse the samples and insert the [csv](https://www.elastic.co/guide/en/elasticsearch/reference/current/csv-processor.html) processor into the generated pipeline.

If a header is present, we use it for the field names and add a [drop](https://www.elastic.co/guide/en/elasticsearch/reference/current/drop-processor.html) processor that removes the header from the document stream by comparing the values to the header values.

If the header is missing, we ask the LLM to generate a list of column names, providing context such as the package and data stream titles.

Should the header or the LLM suggestion prove unsuitable for a specific column, we fall back to `column1`, `column2`, and so on. To avoid duplicate column names, we append suffixes like `_2` as necessary.

If the format appears to be CSV but the `csv` processor fails, we bubble up an error using the recently introduced `ErrorThatHandlesItsOwnResponse` class. This also provides the first example of passing an error's additional attributes (in this case, the original CSV error) back to the client; the error message is composed on the client side.

**Removed: supported formats message**

The message asking the user to upload logs in `JSON/NDJSON format` is removed in this PR:

<img width="741" alt="image" src="https://github.com/user-attachments/assets/34d571c3-b12c-44a1-98e3-d7549160be12">

**Refactoring**

The refactoring makes the "→JSON" conversion process more uniform across the different chains and centralizes processor definitions in `.../server/util/processors.ts`.

The log format chain now expects the LLM to follow `SamplesFormat` when providing the information, rather than an ad-hoc format.

When testing, the `fail` method is [not supported in `jest`](https://stackoverflow.com/a/54244479/23968144), so it is removed.

See the PR for examples and follow-up.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Source commit on `main`: 6a72037007d8f71504f444911c9fa25adfb1bb89
Co-authored-by: Ilya Nikokoshev <ilya.nikokoshev@elastic.co>
1 parent 7a80e6f · commit 6378ff3
47 files changed: 853 additions & 132 deletions
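The column-name fallback described in the summary (generic `columnN` names when the header or LLM suggestion is unusable, plus `_2`-style suffixes for duplicates) might look like the following. This is a hedged sketch of the described behavior, not the shipped implementation.

```typescript
// Sketch of the column-name fallback from the PR summary; the real Kibana
// implementation may differ in details (validation rules, suffix format).
function toSafeColumnNames(suggested: Array<string | undefined>): string[] {
  const seen = new Map<string, number>();
  return suggested.map((name, i) => {
    // Fall back to column1, column2, ... when no usable name is available.
    const usable = name !== undefined && /^[A-Za-z_][A-Za-z0-9_]*$/.test(name);
    const base = usable ? (name as string) : `column${i + 1}`;
    // Deduplicate repeated names by appending _2, _3, ...
    const count = seen.get(base) ?? 0;
    seen.set(base, count + 1);
    return count === 0 ? base : `${base}_${count + 1}`;
  });
}
```

For example, `toSafeColumnNames(['ts', undefined, 'ts'])` would yield `['ts', 'column2', 'ts_2']` under this sketch.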
File tree
- x-pack/plugins
- integration_assistant
- __jest__/fixtures
- common
- api
- analyze_logs
- model
- public/components/create_integration/create_integration_assistant/steps/data_stream_step
- server
- graphs
- csv
- kv
- log_type_detection
- unstructured
- lib/errors
- routes
- util
- translations/translations
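The error-bubbling idea from the summary — an error that carries extra attributes (the original CSV failure) back to the client, which composes the final message — can be sketched as below. The `ErrorThatHandlesItsOwnResponse` name comes from the PR; the subclass, its interface, and the response shape here are assumptions for illustration.

```typescript
// Sketch only: a hypothetical error subclass that forwards the original CSV
// error to the client as an attribute, per the PR summary. The actual
// ErrorThatHandlesItsOwnResponse interface in Kibana is assumed here.
class CsvParseError extends Error {
  constructor(
    message: string,
    // Extra attributes forwarded to the client alongside the error.
    public readonly attributes: { originalCsvError: string },
  ) {
    super(message);
    this.name = 'CsvParseError';
  }

  // Hypothetical shape of the HTTP response body the client would receive;
  // the client-side code composes the user-facing message from it.
  toResponseBody(): { message: string; attributes: { originalCsvError: string } } {
    return { message: this.message, attributes: this.attributes };
  }
}

const err = new CsvParseError('csv processor failed', {
  originalCsvError: 'unexpected number of fields on line 3',
});
```

The design point the PR makes is that the server ships the raw attributes and the client owns the message wording, so translations and formatting stay client-side.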
Some content is hidden: large commits have some diff content hidden by default.
The per-file diff bodies were hidden in this view; only the per-file change counts survive:

- 2 additions & 0 deletions
- 4 additions & 0 deletions
- 6 additions & 0 deletions
- 41 additions & 0 deletions
- 2 additions & 0 deletions
- 20 additions & 0 deletions
- 18 additions & 0 deletions
- 2 additions & 1 deletion
- 1 addition & 2 deletions
- 2 additions & 0 deletions