[ML] Process delimited files like semi-structured text #56038
Merged
droberts195 merged 3 commits into elastic:master from droberts195:use_csv_processor_in_file_structure_finder_ingest on Jan 28, 2020
Changes the file upload functionality to process delimited files by splitting them into messages, then sending these to the ingest pipeline as a single field for further processing in Elasticsearch. The csv_importer has been removed, and the old sst_importer has been replaced with a similar message_importer that has been enhanced to cover the edge cases required by delimited file processing. Previously the file upload functionality parsed CSV in the browser. Parsing CSV in the ingest pipeline instead makes the Kibana file upload functionality more easily interchangeable with Filebeat, so the configurations it creates can more easily be used to import data with the same structure repeatedly in production. Companion to elastic/elasticsearch#51492
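The key change is that the browser no longer parses CSV: each record is shipped whole in one field and split inside Elasticsearch. A minimal sketch of the data involved, assuming the pipeline uses the Elasticsearch csv ingest processor (suggested by the branch name, use_csv_processor_in_file_structure_finder_ingest) and that documents ultimately reach Elasticsearch through the _bulk API. The column names and the `toBulkBody` helper are hypothetical, not taken from this PR.

```javascript
// Hypothetical sketch, not Kibana's code: each delimited record travels as a
// single `message` string, and the csv ingest processor splits it into fields.
const pipeline = {
  description: 'Hypothetical pipeline for an uploaded CSV file',
  processors: [
    {
      csv: {
        field: 'message', // the single field the importer populates
        target_fields: ['timestamp', 'host', 'value'], // assumed column names
        separator: ',',
        quote: '"',
      },
    },
  ],
};

// Build an NDJSON body for the _bulk API: one action line plus one source
// line per message, each terminated by a newline as _bulk requires.
function toBulkBody(messages) {
  return messages
    .map((message) => JSON.stringify({ index: {} }) + '\n' + JSON.stringify({ message }) + '\n')
    .join('');
}
```

Such a body would be sent to something like `/<index>/_bulk?pipeline=<pipeline-id>`, so the same pipeline can later be reused by Filebeat for files with the same structure.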
Contributor
Pinging @elastic/ml-ui (:ml)
darnautov reviewed on Jan 28, 2020

    // multiline_start_pattern regex
    // if it does, it is a legitimate end of line and can be pushed into the list,
    // if not, it must be a newline char inside a field value, so keep looking.
    async read(text) {

Contributor
It doesn't seem like the method is async.
Member
This is a leftover from the very first CSV parsing library that was used, which needed to be async.
This can go, but you'll also have to change the read method in ndjson_importer.js, as well as remove the await from line 210 of import_view.js.
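Why the cleanup touches three places: an `async` function always returns a Promise, whereas `await` on an ordinary value simply yields that value. A small illustration, using hypothetical `readAsync`/`readSync` stand-ins rather than the actual importer code:

```javascript
// Hypothetical stand-ins, not the Kibana importers. Both split text on
// newlines purely for illustration.
async function readAsync(text) {
  return text.split('\n'); // an async function always returns a Promise
}

function readSync(text) {
  return text.split('\n'); // returns the array directly
}

// A caller written like `const data = await importer.read(text)` works with
// either version: awaiting a non-Promise value just yields that value.
async function importText(read, text) {
  const data = await read(text);
  return data.length;
}
```

So keeping the `await` after making `read()` synchronous would not break anything, but it would be redundant; changing both importers' `read()` methods together keeps the caller's contract uniform.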
darnautov reviewed on Jan 28, 2020
Comment on lines +62 to +68

    if (this.multilineStartRegex === null || line.match(this.multilineStartRegex) !== null) {
      message = message.replace(/\r$/, '');
      data.push({ message });
      message = '';
    } else {
      message += '\n';
    }

Contributor
Looks like the same logic as lines 39-50; it might deserve a small dedicated method.
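The duplicated block decides whether a line boundary ends a message (the next line matches the multiline start pattern) or is a newline inside a field value, then trims a Windows `\r` and pushes the finished message. A minimal sketch of how that could be factored out, assuming a `multilineStartRegex` like the one in the excerpt; the class and method names (`MessageSplitter`, `isMessageStart`, `addMessage`) are hypothetical, not the merged code.

```javascript
// Hypothetical sketch of grouping lines into messages. Use a non-global
// regex: RegExp.prototype.test() with the g flag is stateful via lastIndex.
class MessageSplitter {
  constructor(multilineStartRegex = null) {
    // null means every line starts a new message
    this.multilineStartRegex = multilineStartRegex;
  }

  read(text) {
    const data = [];
    const lines = text.split('\n');
    if (lines[lines.length - 1] === '') {
      lines.pop(); // ignore the empty string after a trailing newline
    }
    let message = null; // null while no message is in progress
    for (const line of lines) {
      if (message === null || this.isMessageStart(line)) {
        // A legitimate end of the previous message: flush it.
        this.addMessage(data, message);
        message = line;
      } else {
        // A newline inside a field value (e.g. a quoted CSV field): keep it.
        message += '\n' + line;
      }
    }
    this.addMessage(data, message);
    return data;
  }

  isMessageStart(line) {
    return this.multilineStartRegex === null || this.multilineStartRegex.test(line);
  }

  // The shared "small dedicated method": strip a trailing \r left by Windows
  // line endings, then push the message unless it is empty.
  addMessage(data, message) {
    if (message === null) {
      return;
    }
    const cleaned = message.replace(/\r$/, '');
    if (cleaned !== '') {
      data.push({ message: cleaned });
    }
  }
}
```

For example, `new MessageSplitter(/^\d/).read('1,"a\nb"\r\n2,c\r\n')` keeps the quoted newline inside the first message and yields two messages, `1,"a\nb"` and `2,c`.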
Author
@elasticmachine merge upstream
Contributor
💚 Build Succeeded
To update your PR or re-run it, just comment with:
droberts195 added a commit that referenced this pull request on Jan 28, 2020
gmmorris added a commit to gmmorris/kibana that referenced this pull request on Jan 28, 2020
* master: (21 commits)
  [SIEM][Detection Engine] critical blocker updates to latest ECS version
  [Monitoring] Fix inaccuracies in logstash pipeline listing metrics (elastic#55868)
  Resetting errors and removing duplicates (elastic#56054)
  Add flag to opt out from sub url tracking (elastic#55672)
  [SIEM][Detection Engine] critical bug, fixes duplicate tags
  [ML] Anomaly Detection: Fix persist/restore of refreshInterval in globalState. (elastic#56113)
  [ML] Single Metric Viewer: Fix annnotations refresh. (elastic#56107)
  adapt ObjectToConfigAdapter.getFlattenedPaths to consider arrays as final values (elastic#56105)
  Add Appender.receiveAllLevels option to fix LegacyAppender (elastic#55752)
  [ML] Process delimited files like semi-structured text (elastic#56038)
  Charts plugin (combining ui/color_maps and EuiUtils) (elastic#55469)
  fix tutorial documentation (elastic#55996)
  [ML] Fix persist/restore of time/refreshInterval in data visualizer. (elastic#56122)
  [Index Management] Fix errors with validation (elastic#56072)
  [Index Management] Add try/catch when parsing index filter from URI (elastic#56051)
  [NP] add HTTP resources testing strategies (elastic#54908)
  [ML] Single Metric Viewer: Fix brush update on short recent timespans. (elastic#56125)
  [Uptime] Add timeout for slow process to skipped functional tests (elastic#56065)
  refactor (elastic#56121)
  Move tests in dashboard into appropriate folders (elastic#55304)
  ...
Checklist
Use strikethroughs to remove checklist items you don't feel are applicable to this PR.
- [ ] This was checked for cross-browser compatibility, including a check against IE11
- [ ] Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
- [ ] Documentation was added for features that require explanation or tutorials
- [ ] Unit or functional tests were updated or added to match the most common scenarios
- [ ] This was checked for keyboard-only and screenreader accessibility
For maintainers