ESQL: Add error policy and configurable options for CSV format reader #143779

Merged
costin merged 3 commits into elastic:main from costin:esql/ds/format-error-policy
Mar 9, 2026

@costin costin (Member) commented Mar 6, 2026

Adds resilient error handling and configurable parsing options to the ESQL CSV datasource format reader.

Error policy — three modes control how malformed rows are handled during CSV ingestion:

  • FAIL_FAST — abort on the first error (default, equivalent to Spark FAILFAST)
  • SKIP_ROW — drop the malformed row and continue (equivalent to Spark DROPMALFORMED, DuckDB ignore_errors)
  • NULL_FIELD — null-fill unparseable fields while keeping the row (equivalent to Spark PERMISSIVE)
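
The three modes can be illustrated with a minimal sketch. This is not the code from the PR: the class `ErrorPolicySketch`, the `parse` helper, and the integer-only columns are assumptions made for illustration, and only the mode names come from the description above. The sketch also uses `NumberFormatException` for brevity, which a production reader may well avoid.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the three error-policy modes; not the PR's implementation. */
final class ErrorPolicySketch {

    /** Mode names taken from the PR description. */
    enum ErrorPolicy { FAIL_FAST, SKIP_ROW, NULL_FIELD }

    /**
     * Parses rows of integer columns under the given policy.
     * Assumption: every column is an integer; a real reader handles many types.
     */
    static List<Integer[]> parse(List<String[]> rows, ErrorPolicy policy) {
        List<Integer[]> out = new ArrayList<>();
        for (String[] raw : rows) {
            Integer[] parsed = new Integer[raw.length];
            boolean rowOk = true;
            for (int i = 0; i < raw.length; i++) {
                try {
                    parsed[i] = Integer.valueOf(raw[i].trim());
                } catch (NumberFormatException e) {
                    if (policy == ErrorPolicy.FAIL_FAST) {
                        // FAIL_FAST: abort on the first error
                        throw new IllegalArgumentException("malformed value [" + raw[i] + "] in column " + i, e);
                    }
                    rowOk = false;
                    parsed[i] = null; // NULL_FIELD: the field becomes null, the row is kept
                }
            }
            // SKIP_ROW: a row with any bad field is dropped entirely
            if (rowOk || policy == ErrorPolicy.NULL_FIELD) {
                out.add(parsed);
            }
        }
        return out;
    }
}
```

With one malformed row out of three, `SKIP_ROW` returns two rows, `NULL_FIELD` returns three rows with a null in place of the bad value, and `FAIL_FAST` throws.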

An error budget (max_errors, max_error_ratio) caps how many errors are tolerated before aborting, giving operators fine-grained control over data quality vs. throughput.
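
A rough sketch of how such a budget could be checked follows. The class name, the `recordRow` and `exceeded` methods, and the choice to evaluate the ratio against all rows seen so far are assumptions; the PR may count or time the check differently. The ratio test multiplies rather than divides, in line with the follow-up commit note further down this page.

```java
/** Hypothetical error budget; names and exact semantics are assumptions, not the PR's code. */
final class ErrorBudgetSketch {
    private final long maxErrors;       // absolute cap on tolerated errors
    private final double maxErrorRatio; // cap on errors / rows seen, between 0.0 and 1.0
    private long errors;
    private long rows;

    ErrorBudgetSketch(long maxErrors, double maxErrorRatio) {
        this.maxErrors = maxErrors;
        this.maxErrorRatio = maxErrorRatio;
    }

    /** Records one processed row and whether it contained an error. */
    void recordRow(boolean hadError) {
        rows++;
        if (hadError) {
            errors++;
        }
    }

    /** True once either cap is exceeded; errors / rows > ratio is rewritten as errors > ratio * rows. */
    boolean exceeded() {
        return errors > maxErrors || errors > maxErrorRatio * rows;
    }
}
```

A reader would call `recordRow` once per row and abort the scan as soon as `exceeded()` returns true.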

Configurable format options: delimiter, quote/escape characters, comment prefix, null representation, encoding, datetime format, and max field size can all be set per-query via WITH parameters.
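
As an illustration of how the listed options might be collected from a per-query map, here is a small sketch. Everything specific in it is hypothetical: the record `CsvOptionsSketch`, the key names (`delimiter`, `null_value`, `max_field_size`, and so on), and the default values are placeholders, not the names or defaults used by the PR's `CsvFormatOptions`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical options holder; keys and defaults are illustrative assumptions. */
record CsvOptionsSketch(
    char delimiter,
    char quote,
    char escape,
    String commentPrefix,
    String nullValue,
    Charset encoding,
    String datetimeFormat,
    int maxFieldSize
) {
    /** Builds options from a config map, falling back to (assumed) defaults for absent keys. */
    static CsvOptionsSketch fromMap(Map<String, ?> config) {
        return new CsvOptionsSketch(
            singleChar(config, "delimiter", ','),
            singleChar(config, "quote", '"'),
            singleChar(config, "escape", '\\'),
            text(config, "comment", null),
            text(config, "null_value", ""),
            Charset.forName(text(config, "encoding", StandardCharsets.UTF_8.name())),
            text(config, "datetime_format", null),
            Integer.parseInt(text(config, "max_field_size", "65536"))
        );
    }

    private static String text(Map<String, ?> config, String key, String defaultValue) {
        Object value = config.get(key);
        return value == null ? defaultValue : value.toString();
    }

    private static char singleChar(Map<String, ?> config, String key, char defaultValue) {
        String value = text(config, key, String.valueOf(defaultValue));
        if (value.length() != 1) {
            throw new IllegalArgumentException("option [" + key + "] must be a single character, got [" + value + "]");
        }
        return value.charAt(0);
    }
}
```

Validating single-character options up front, as `singleChar` does here, surfaces a bad setting before any data is read.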

Both features are wired through the existing FormatReader SPI via a new withConfig(Map) method, keeping the interface backward-compatible.
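
One common way to add a method to an SPI without breaking existing implementations is a Java default method. Whether the PR does exactly this is an assumption; the sketch below uses stand-in types (`FormatReaderSketch`, `LegacyReaderSketch`, `ConfigurableCsvReaderSketch`) rather than the real `FormatReader` interface, and its `delimiter` key is hypothetical.

```java
import java.util.Map;

/** Stand-in for an SPI interface; not the real FormatReader. */
interface FormatReaderSketch {
    String formatName();

    /**
     * New method with a default body, so implementations written before it
     * existed still compile and simply ignore the configuration.
     */
    default FormatReaderSketch withConfig(Map<String, ?> config) {
        return this;
    }
}

/** A pre-existing implementation that knows nothing about withConfig. */
final class LegacyReaderSketch implements FormatReaderSketch {
    @Override
    public String formatName() {
        return "legacy";
    }
}

/** An implementation that opts in and returns a reconfigured copy. */
final class ConfigurableCsvReaderSketch implements FormatReaderSketch {
    private final char delimiter;

    ConfigurableCsvReaderSketch(char delimiter) {
        this.delimiter = delimiter;
    }

    char delimiter() {
        return delimiter;
    }

    @Override
    public String formatName() {
        return "csv";
    }

    @Override
    public FormatReaderSketch withConfig(Map<String, ?> config) {
        Object value = config.get("delimiter"); // hypothetical key name
        if (value == null || value.toString().isEmpty()) {
            return this;
        }
        // Returning a new instance keeps the shared, registered reader immutable.
        return new ConfigurableCsvReaderSketch(value.toString().charAt(0));
    }
}
```

Returning a copy rather than mutating the reader means one query's settings cannot leak into another query that uses the same registered instance.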

Developed using AI-assisted tooling

Introduce ErrorPolicy with three modes (FAIL_FAST, SKIP_ROW, NULL_FIELD)
and an error budget (maxErrors, maxErrorRatio) for resilient CSV parsing.
Add CsvFormatOptions for configurable delimiter, quote/escape characters,
comment prefix, null representation, encoding, datetime format, and max
field size. Extend FormatReader SPI with withConfig() for per-query
configuration.
@costin costin added >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v9.4.0 labels Mar 6, 2026
@costin costin requested a review from bpintea March 6, 2026 22:12
@elasticsearchmachine (Collaborator)

Hi @costin, I've created a changelog YAML for you.

@elasticsearchmachine (Collaborator)

Pinging @elastic/es-analytical-engine (Team:Analytics)

costin added 2 commits March 7, 2026 00:12
Replace exception-based error flow with return-code approach using
a lastFieldError field, eliminating exception allocation and stack
trace filling on parse failures. Pre-compute projected column arrays
(int[], DataType[], Attribute[]) at schema init to avoid autoboxing
and list lookups per field. Hoist invariant checks (comment filter,
null value, log flag) into constructor-time booleans. Reuse Object[]
row buffer across rows. Replace division with multiplication in
error budget ratio check.
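
The return-code approach described in this commit message can be sketched roughly as follows. Only the `lastFieldError` field name and the reused `Object[]` row buffer come from the message; the class, the hand-rolled `parseLong`, and the 18-digit limit are assumptions made to keep the example short.

```java
/** Hypothetical sketch of exception-free field parsing; not the PR's code. */
final class ReturnCodeParserSketch {
    private String lastFieldError;    // null after a successful parse, a message otherwise
    private final Object[] rowBuffer; // allocated once and reused for every row

    ReturnCodeParserSketch(int columns) {
        this.rowBuffer = new Object[columns];
    }

    String lastFieldError() {
        return lastFieldError;
    }

    Object[] row() {
        return rowBuffer;
    }

    /** Parses a decimal long without throwing; on failure returns 0 and sets lastFieldError. */
    long parseLong(String s) {
        lastFieldError = null;
        int start = s.startsWith("-") ? 1 : 0;
        int digits = s.length() - start;
        if (digits == 0 || digits > 18) { // assumption: cap at 18 digits to sidestep overflow
            lastFieldError = "empty or out-of-range number";
            return 0;
        }
        long result = 0;
        for (int i = start; i < s.length(); i++) {
            int d = s.charAt(i) - '0';
            if (d < 0 || d > 9) {
                lastFieldError = "not a number";
                return 0;
            }
            result = result * 10 + d;
        }
        return start == 1 ? -result : result;
    }

    /** Fills the shared buffer, nulling bad fields; returns the number of field errors in the row. */
    int fillRow(String[] fields) {
        int errors = 0;
        for (int i = 0; i < rowBuffer.length; i++) {
            long value = parseLong(fields[i]);
            if (lastFieldError != null) {
                rowBuffer[i] = null;
                errors++;
            } else {
                rowBuffer[i] = value;
            }
        }
        return errors;
    }
}
```

Because a failed parse only sets a field, no exception object is allocated and no stack trace is filled in, which is the cost the commit message says it removes.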
@costin costin enabled auto-merge (squash) March 7, 2026 06:57
@bpintea bpintea (Contributor) left a comment

LG, left just one note.

    private void onFieldError(String message, String value, Attribute attr) {
        errorCount++;
        if (logErrors) {
            logger.warn(

This is good, but I guess we might want to evolve how we inform the user about the error. I believe that unless the policy is FAIL_FAST, a user would need to check the logs to see what happened.
We might want to add warnings?

@costin costin merged commit 4e671c4 into elastic:main Mar 9, 2026
36 checks passed
@costin costin deleted the esql/ds/format-error-policy branch March 9, 2026 11:48
@tylerperk tylerperk added the ES|QL|DS ES|QL datasources label Mar 18, 2026
Labels

:Analytics/ES|QL AKA ESQL >enhancement ES|QL|DS ES|QL datasources Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.4.0

4 participants