ESQL: Add configurable bracket-based multi-value support for CSV reader #143890
Merged
costin merged 6 commits into elastic:main on Mar 10, 2026
Collaborator
Hi @costin, I've created a changelog YAML for you.
Collaborator
Pinging @elastic/es-analytical-engine (Team:Analytics)
bpintea (Contributor) reviewed on Mar 9, 2026 and left a comment:
Should we add some tests for the warnings too?
Also, string MVs would be good (["foo", "bar"]).
    private final DateTimeFormatter datetimeFormatter;

    private final boolean bracketMultiValues;
    private static final int MAX_WARNINGS = 20;
Contributor
I think the warning headers mechanism already has a max.
Contributor
It's probably good to have a max here too, but when we exceed it, should we add a note to the warnings saying that there are more?
    if (logErrors) {
        logger.warn("Skipping malformed CSV row (error {}/{}): {}", errorCount, errorPolicy.maxErrors(), message);
        if (warnings.size() < MAX_WARNINGS) {
            warnings.add("Row error: " + message);
Contributor
Do we want to note which row?
Here and below.
Comment on lines +120 to +123
     * FROM s3://bucket/employees.csv WITH {"multi_value_syntax": "brackets"}
     * }</pre>
     * <pre>{@code
     * FROM s3://bucket/data.csv WITH {"multi_value_syntax": "brackets", "error_mode": "skip_row"}
Contributor
s/FROM/EXTERNAL/
(here and in the PR description)
Enable bracket syntax [a,b,c] for multi-value fields in CSV files via a new multi_value_syntax option (default: none, opt-in: brackets). When enabled, values wrapped in brackets are split and parsed per the column type, producing multi-value ESQL blocks. Add error summary logging at close() for non-FAIL_FAST policies.
- CsvFormatOptions: added multiValueSyntax field with BRACKETS enum
- CsvFormatReader: bracket-aware tryConvertMultiValue in tryConvertValue
- CsvFormatReaderTests: comprehensive multi-value bracket tests
Developed using AI-assisted tooling
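The option-plus-dispatch flow described in this commit can be sketched roughly as below. This is a hypothetical illustration, not the PR's code: MultiValueSyntax, fromOption, and BracketDispatch are made-up names, the parser function stands in for the per-column-type conversion, and splitting is deliberately naive (no escape handling). The fallback to NONE follows the default stated in this commit message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch: these names are illustrative, not the real
// CsvFormatOptions / CsvFormatReader classes.
enum MultiValueSyntax {
    NONE,     // brackets are kept as literal text
    BRACKETS; // "[a,b,c]" becomes a multi-value field

    // Parses the raw option value; a missing option falls back to NONE,
    // the default stated in this commit message.
    static MultiValueSyntax fromOption(String raw) {
        if (raw == null) {
            return NONE;
        }
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "none":
                return NONE;
            case "brackets":
                return BRACKETS;
            default:
                throw new IllegalArgumentException("unknown multi_value_syntax [" + raw + "]");
        }
    }
}

final class BracketDispatch {
    private BracketDispatch() {}

    // Returns a single parsed value, a List of parsed values, or null for "[]".
    // Splitting is naive here (no escape handling) to keep the focus on dispatch.
    static Object convert(String raw, MultiValueSyntax syntax, Function<String, ?> parser) {
        String v = raw.trim();
        boolean bracketed = v.length() >= 2 && v.startsWith("[") && v.endsWith("]");
        if (syntax != MultiValueSyntax.BRACKETS || bracketed == false) {
            return parser.apply(v); // scalar path, so one column can mix scalars and arrays
        }
        String inner = v.substring(1, v.length() - 1).trim();
        if (inner.isEmpty()) {
            return null; // empty brackets map to null
        }
        List<Object> values = new ArrayList<>();
        for (String element : inner.split(",", -1)) {
            values.add(parser.apply(element.trim())); // each element parsed per column type
        }
        return values;
    }
}
```

With a long column, "[1, 2, 3]" yields a three-element list while a plain "7" in the same column stays a scalar; with the option set to none, "[a,b]" passes through unchanged.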
Fix FQN usage (ElementType, Locale, Map), use options.escapeChar() in splitBracketContent instead of a hardcoded backslash, add a proper escaped-comma test, and document the warnings list for future use.
Developed using AI-assisted tooling
Restore the full class-level Javadoc removed during implementation, update it with multi_value_syntax option docs, and fix brace/formatting style to match the original code.
Developed using AI-assisted tooling
Developed using AI-assisted tooling
- Fix FROM to EXTERNAL in Javadoc examples
- Include row number in error/warning messages
- Add overflow note when warnings exceed MAX_WARNINGS
- Add string MV, long MV, warning, and overflow tests
- Expose package-private getWarnings() for test access
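The review fixes above (row numbers in messages, a cap of MAX_WARNINGS, and an overflow note) can be pictured with a small bounded collector. This is a hypothetical sketch: RowWarnings, addRowError, and the exact message and overflow wording are assumptions, not the PR's implementation; only MAX_WARNINGS = 20 and the package-private getWarnings() come from the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bounded warning collector; the class name and message
// formats are assumptions, not the actual CsvFormatReader code.
final class RowWarnings {
    static final int MAX_WARNINGS = 20; // same cap as in the review snippet

    private final List<String> warnings = new ArrayList<>();
    private int dropped = 0;

    // Records a row error with the caller-supplied row number until the cap is reached.
    void addRowError(long rowNumber, String message) {
        if (warnings.size() < MAX_WARNINGS) {
            warnings.add("Row " + rowNumber + " error: " + message);
        } else {
            dropped++; // counted so the overflow can still be reported
        }
    }

    // Package-private accessor for tests; appends an overflow note when warnings were dropped.
    List<String> getWarnings() {
        List<String> result = new ArrayList<>(warnings);
        if (dropped > 0) {
            result.add("... and " + dropped + " more row error(s) not shown (limit " + MAX_WARNINGS + ")");
        }
        return result;
    }
}
```

Counting dropped entries instead of silently ignoring them is what lets the last warning tell the user that more errors exist, which was the reviewer's concern.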
6185f65 to 6626fe5
szybia added a commit to szybia/elasticsearch that referenced this pull request on Mar 10, 2026
…locations
* upstream/main: (126 commits)
  Update KnnIndexTester to use more settings from datasets (elastic#143869)
  fix: dynamic template vector array is overridden by automatic dense_vector mapping (elastic#143733)
  ES|QL: Don't reuse the same alias for _fork column (elastic#143909)
  Close and initialize clients after each node upgrade in logsdb rolling upgrade tests. (elastic#143823)
  ESQL: Added GroupedTopNOperator for LIMIT BY, compute only (elastic#143476)
  Handle views in ResolveIndexAction (elastic#143561)
  Improve reindex rethrottle API in stateless (elastic#143771)
  Use a copy of the SearchExecutionContext for each Percolator execution (elastic#142765)
  Log the stacktrace when we encounter a deprecation warning for `default_metric` (elastic#143929)
  ESQL: evaluate ReferenceAttributes to potentially FieldAttributes for full-text functions restriction (elastic#143893)
  Add ClusterStateSerializationStats Serializatation Tests (elastic#142703)
  Adds Coordination Diagnostics Tests (elastic#142709)
  Upgrade Elasticsearch to Apache Lucene 10.4 (elastic#141882)
  ESQL: Add configurable bracket-based multi-value support for CSV reader (elastic#143890)
  time series es819 binary dv use up to a 1mb block size (elastic#143049)
  Dynamically enable / disable plugins in correspondence to stateless mode. (elastic#142147)
  ES|QL: Implement first/last_over_time for tdigest (elastic#143832)
  Document CHANGE_POINT limitation (elastic#143877)
  Fix OperationsOnSeqNoDisabledIndicesIT (elastic#143892)
  [Test] Test that sequence numbers are not pruned with retention lease (elastic#143825)
  ...
Add bracket-based multi-value support ([a,b,c]) for CSV fields, enabled by default via multi_value_syntax (values: brackets, none).

When enabled, values wrapped in brackets are split on commas (respecting escaped commas via the configured escape character) and each element is parsed per the declared column type, producing native ESQL multi-value blocks. Empty brackets [] map to null. Scalar and multi-value fields can be mixed freely within the same column.
Also adds error summary logging at close() for non-FAIL_FAST policies to address the review feedback from #143779: users see a single INFO-level summary instead of having to check individual WARN log lines.
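A rough picture of the close()-time summary: errors are counted per row and, unless the policy is FAIL_FAST, one INFO line is emitted when the reader closes. This is a hypothetical sketch: ErrorMode, ErrorSummaryReader, onRow, summaryMessage, and the message text are illustrative assumptions, and java.util.logging stands in for the Elasticsearch logger.

```java
import java.util.logging.Logger;

// Hypothetical sketch; names and message format are assumptions.
// java.util.logging is used here in place of the Elasticsearch logger.
enum ErrorMode { FAIL_FAST, SKIP_ROW }

final class ErrorSummaryReader implements AutoCloseable {
    private static final Logger LOGGER = Logger.getLogger(ErrorSummaryReader.class.getName());

    private final ErrorMode mode;
    private long rowsRead = 0;
    private long errorCount = 0;

    ErrorSummaryReader(ErrorMode mode) {
        this.mode = mode;
    }

    // Called once per row; under FAIL_FAST the first malformed row aborts the read.
    void onRow(boolean malformed) {
        rowsRead++;
        if (malformed) {
            if (mode == ErrorMode.FAIL_FAST) {
                throw new IllegalStateException("malformed CSV row " + rowsRead);
            }
            errorCount++;
        }
    }

    // One-line summary, or null when there is nothing to report.
    String summaryMessage() {
        if (mode == ErrorMode.FAIL_FAST || errorCount == 0) {
            return null;
        }
        return "CSV read finished: skipped " + errorCount + " of " + rowsRead + " rows (mode " + mode + ")";
    }

    @Override
    public void close() {
        String summary = summaryMessage();
        if (summary != null) {
            LOGGER.info(summary); // a single INFO line instead of scanning per-row WARN lines
        }
    }
}
```

FAIL_FAST never needs a summary because the first error already surfaces as an exception, which is why the summary is limited to the lenient policies.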
Follows the convention used by ClickHouse (Array() notation) and the existing ESQL test CSV framework (CsvTestUtils).