ESQL: Add configurable bracket-based multi-value support for CSV reader #143890

Merged
costin merged 6 commits into elastic:main from costin:esql/csv-multivalue-support
Mar 10, 2026

Member

@costin costin commented Mar 9, 2026

Add bracket-based multi-value support ([a,b,c]) for CSV fields, enabled
by default via multi_value_syntax (values: brackets, none).

When enabled, values wrapped in brackets are split on commas (respecting
escaped commas via the configured escape character) and each element is
parsed per the declared column type, producing native ESQL multi-value
blocks. Empty brackets [] map to null. Scalar and multi-value fields
can be mixed freely within the same column.
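
For readers following along, here is a minimal sketch of the splitting step described above. It is not the code from this PR: the class name `BracketSplitSketch` and the helper `parseLongs` are hypothetical, the method name `splitBracketContent` is borrowed from the commit notes but its signature here is assumed, and quoted string elements are not handled. It only illustrates the stated rules: bracketed values are split on unescaped commas, `[]` yields no elements (which the reader maps to null), and unbracketed values stay scalar.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of bracket-based multi-value splitting; not the PR's CsvFormatReader. */
final class BracketSplitSketch {

    private BracketSplitSketch() {}

    /**
     * Returns null when the field is not wrapped in brackets (caller treats it as a scalar),
     * an empty list for "[]" (caller maps it to null), otherwise the raw element strings.
     */
    public static List<String> splitBracketContent(String field, char escapeChar) {
        String trimmed = field.trim();
        if (trimmed.length() < 2 || trimmed.charAt(0) != '[' || trimmed.charAt(trimmed.length() - 1) != ']') {
            return null;
        }
        String content = trimmed.substring(1, trimmed.length() - 1);
        List<String> elements = new ArrayList<>();
        if (content.isBlank()) {
            return elements;
        }
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c == escapeChar && i + 1 < content.length() && content.charAt(i + 1) == ',') {
                // Escaped comma: keep the comma as data and skip the escape character.
                current.append(',');
                i++;
            } else if (c == ',') {
                // Unescaped comma: close the current element.
                elements.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        elements.add(current.toString().trim());
        return elements;
    }

    /** Stand-in for "parsed per the declared column type", shown here for a long column only. */
    public static List<Long> parseLongs(List<String> elements) {
        List<Long> values = new ArrayList<>(elements.size());
        for (String element : elements) {
            values.add(Long.parseLong(element));
        }
        return values;
    }
}
```

With a backslash as the escape character, `[a\,b,c]` splits into the two elements `a,b` and `c`, while `plain` is returned as null and handled as a scalar.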

Also adds error summary logging at close() for non-FAIL_FAST policies
to address the review feedback from #143779 — users see an INFO-level
summary instead of having to check individual WARN log lines.
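
As a rough illustration of the summary-at-close idea, the sketch below counts skipped rows and emits one summary line when the reader is closed. Everything in it is assumed for illustration: the class `ErrorSummarySketch`, its methods, the message wording, and the `Consumer<String>` that stands in for an INFO-level logger call. The actual reader in this PR may track and report errors differently.

```java
import java.util.function.Consumer;

/** Hypothetical sketch: one summary line at close() instead of relying on per-row WARN lines. */
final class ErrorSummarySketch implements AutoCloseable {

    private final boolean failFast;
    private final Consumer<String> infoLog; // stand-in for an INFO-level logger call
    private long rowsRead;
    private long rowsSkipped;

    ErrorSummarySketch(boolean failFast, Consumer<String> infoLog) {
        this.failFast = failFast;
        this.infoLog = infoLog;
    }

    /** Called once for every row encountered, good or bad. */
    public void rowRead() {
        rowsRead++;
    }

    /** Called for a malformed row; fails immediately under a fail-fast policy. */
    public void rowSkipped(String reason) {
        if (failFast) {
            throw new IllegalStateException("Malformed CSV row: " + reason);
        }
        rowsSkipped++;
    }

    @Override
    public void close() {
        // Only lenient policies produce a summary, and only when something was skipped.
        if (failFast == false && rowsSkipped > 0) {
            infoLog.accept("CSV read finished: skipped " + rowsSkipped + " of " + rowsRead + " rows due to errors");
        }
    }
}
```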

EXTERNAL "s3://bucket/employees.csv"
EXTERNAL "s3://bucket/data.csv" WITH {"multi_value_syntax": "none"}
EXTERNAL "s3://bucket/data.csv" WITH {"multi_value_syntax": "brackets", "error_mode": "skip_row"}

Follows the convention used by ClickHouse (Array() notation) and the
existing ESQL test CSV framework (CsvTestUtils).

@costin costin requested a review from bpintea March 9, 2026 17:30
@elasticsearchmachine elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team (ESQL/Aggs/Geo)) Mar 9, 2026
@elasticsearchmachine
Collaborator

Hi @costin, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@costin costin enabled auto-merge (squash) March 9, 2026 17:32
Contributor

@bpintea bpintea left a comment

Should we add some tests for the warnings too?
Also, string MVs would be good (["foo", "bar"]).

private final DateTimeFormatter datetimeFormatter;

private final boolean bracketMultiValues;
private static final int MAX_WARNINGS = 20;

Contributor

I think the warning headers "mechanism" has a max already.

Contributor

It's probably good to have a max here too, but should we note in the warnings when we exceed the max that there are more?

if (logErrors) {
    logger.warn("Skipping malformed CSV row (error {}/{}): {}", errorCount, errorPolicy.maxErrors(), message);
    if (warnings.size() < MAX_WARNINGS) {
        warnings.add("Row error: " + message);

Contributor

Do we want to note which row?
Here and below.

Comment on lines +120 to +123
* FROM s3://bucket/employees.csv WITH {"multi_value_syntax": "brackets"}
* }</pre>
* <pre>{@code
* FROM s3://bucket/data.csv WITH {"multi_value_syntax": "brackets", "error_mode": "skip_row"}

Contributor

s/FROM/EXTERNAL
(here and PR's description)

costin added 6 commits March 10, 2026 10:32
Enable bracket syntax [a,b,c] for multi-value fields in CSV files via
a new multi_value_syntax option (default: none, opt-in: brackets).
When enabled, values wrapped in brackets are split and parsed per the
column type, producing multi-value ESQL blocks. Add error summary
logging at close() for non-FAIL_FAST policies.

- CsvFormatOptions: added multiValueSyntax field with BRACKETS enum
- CsvFormatReader: bracket-aware tryConvertMultiValue in tryConvertValue
- CsvFormatReaderTests: comprehensive multi-value bracket tests

Developed using AI-assisted tooling
Fix FQN usage (ElementType, Locale, Map), use options.escapeChar()
in splitBracketContent instead of hardcoded backslash, add proper
escaped comma test, and document warnings list for future use.

Developed using AI-assisted tooling
Restore the full class-level Javadoc removed during implementation,
update it with multi_value_syntax option docs, and fix brace/formatting
style to match the original code.

Developed using AI-assisted tooling
Developed using AI-assisted tooling
- Fix FROM to EXTERNAL in Javadoc examples
- Include row number in error/warning messages
- Add overflow note when warnings exceed MAX_WARNINGS
- Add string MV, long MV, warning, and overflow tests
- Expose package-private getWarnings() for test access
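
The review follow-ups listed above (row numbers in messages, an overflow note once the cap is hit) can be sketched roughly as follows. This is an illustration only: the class `RowWarningsSketch`, the `recordRowError` method, and the exact message formats are assumptions, and the real `getWarnings()` may return something different. Only the `MAX_WARNINGS = 20` value is taken from the diff excerpt quoted in the review.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a capped warnings list with row numbers and an overflow note. */
final class RowWarningsSketch {

    static final int MAX_WARNINGS = 20; // value from the diff excerpt quoted in the review

    private final List<String> warnings = new ArrayList<>();
    private int errorCount;

    /** Records a row error; only the first MAX_WARNINGS messages are kept verbatim. */
    public void recordRowError(long rowNumber, String message) {
        errorCount++;
        if (warnings.size() < MAX_WARNINGS) {
            warnings.add("Row [" + rowNumber + "] error: " + message);
        }
    }

    /** Returns the kept warnings, plus one trailing note when more errors occurred than were kept. */
    public List<String> getWarnings() {
        List<String> result = new ArrayList<>(warnings);
        if (errorCount > MAX_WARNINGS) {
            result.add("... and " + (errorCount - MAX_WARNINGS) + " more row errors");
        }
        return result;
    }
}
```

Recording 25 errors in this sketch yields 21 entries: the first 20 messages and a final note that 5 more row errors occurred.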
@costin costin force-pushed the esql/csv-multivalue-support branch from 6185f65 to 6626fe5 Compare March 10, 2026 08:35
@costin costin requested a review from bpintea March 10, 2026 08:35
Contributor

@bpintea bpintea left a comment

Lgtm

@costin costin merged commit 50dfc61 into elastic:main Mar 10, 2026
36 checks passed
@costin costin deleted the esql/csv-multivalue-support branch March 10, 2026 10:58
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 10, 2026
…locations

* upstream/main: (126 commits)
  Update KnnIndexTester to use more settings from datasets (elastic#143869)
  fix: dynamic template vector array is overridden by automatic dense_vector mapping (elastic#143733)
  ES|QL: Don't reuse the same alias for _fork column (elastic#143909)
  Close and initialize clients after each node upgrade in logsdb rolling upgrade tests. (elastic#143823)
  ESQL: Added GroupedTopNOperator for LIMIT BY, compute only (elastic#143476)
  Handle views in ResolveIndexAction (elastic#143561)
  Improve reindex rethrottle API in stateless (elastic#143771)
  Use a copy of the SearchExecutionContext for each Percolator execution (elastic#142765)
  Log the stacktrace when we encounter a deprecation warning for `default_metric` (elastic#143929)
  ESQL: evaluate ReferenceAttributes to potentially FieldAttributes for full-text functions restriction (elastic#143893)
  Add ClusterStateSerializationStats Serializatation Tests (elastic#142703)
  Adds Coordination Diagnostics Tests (elastic#142709)
  Upgrade Elasticsearch to Apache Lucene 10.4 (elastic#141882)
  ESQL: Add configurable bracket-based multi-value support for CSV reader (elastic#143890)
  time series es819 binary dv use up to a 1mb block size (elastic#143049)
  Dynamically enable / disable plugins in correspondence to stateless mode. (elastic#142147)
  ES|QL: Implement first/last_over_time for tdigest (elastic#143832)
  Document CHANGE_POINT limitation (elastic#143877)
  Fix OperationsOnSeqNoDisabledIndicesIT (elastic#143892)
  [Test] Test that sequence numbers are not pruned with retention lease (elastic#143825)
  ...
Labels

:Analytics/ES|QL (AKA ESQL), >enhancement, ES|QL|DS (ES|QL datasources), Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), v9.4.0

3 participants