
ESQL: CSV schema inference and parsing enhancements #144050

Merged
costin merged 2 commits into elastic:main from costin:csv-schema-inference-enhancements
Mar 12, 2026


@costin
Member

@costin costin commented Mar 11, 2026

Standard CSV files use plain column names without type annotations. Before this change, such headers caused a parsing failure, blocking all normal CSV files from being read.

This adds sample-based schema inference for plain headers and fixes boolean case-sensitivity, datetime format flexibility, and numeric type alias recognition.
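As an illustration only, sample-based inference of this kind might look roughly like the sketch below. It is not the code from this PR: the class name `CsvTypeInferenceSketch`, the `ColumnType` enum, the widening rules, and the accepted datetime formats are all assumptions made for the example. Each sampled cell is classified on its own (booleans matched case-insensitively, datetimes tried against several ISO forms), and a column's type is widened whenever its sampled rows disagree.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are taken from the PR.
final class CsvTypeInferenceSketch {

    enum ColumnType { BOOLEAN, LONG, DOUBLE, DATETIME, KEYWORD }

    // Plain decimal or scientific notation; keeps "NaN", "1d" and hex floats out of DOUBLE.
    private static final String DECIMAL = "[+-]?(\\d+\\.?\\d*|\\.\\d+)([eE][+-]?\\d+)?";

    /** Classifies a single non-empty cell value. */
    static ColumnType classify(String raw) {
        String value = raw.trim();
        // Case-insensitive, so TRUE, True and true are all booleans.
        if (value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false")) {
            return ColumnType.BOOLEAN;
        }
        try {
            Long.parseLong(value);
            return ColumnType.LONG;
        } catch (NumberFormatException notALong) {
            // Fall through to the wider types.
        }
        if (value.matches(DECIMAL)) {
            return ColumnType.DOUBLE;
        }
        return isIsoDateTime(value) ? ColumnType.DATETIME : ColumnType.KEYWORD;
    }

    /** Accepts ISO offset date-times, local date-times, and plain dates (an assumed set). */
    static boolean isIsoDateTime(String value) {
        try { OffsetDateTime.parse(value); return true; } catch (DateTimeParseException e) { /* try next */ }
        try { LocalDateTime.parse(value); return true; } catch (DateTimeParseException e) { /* try next */ }
        try { LocalDate.parse(value); return true; } catch (DateTimeParseException e) { /* not a date */ }
        return false;
    }

    /** Widens the type seen so far: LONG + DOUBLE becomes DOUBLE, any other mismatch becomes KEYWORD. */
    static ColumnType merge(ColumnType seen, ColumnType next) {
        if (seen == null || seen == next) {
            return next;
        }
        boolean bothNumeric = isNumeric(seen) && isNumeric(next);
        return bothNumeric ? ColumnType.DOUBLE : ColumnType.KEYWORD;
    }

    private static boolean isNumeric(ColumnType type) {
        return type == ColumnType.LONG || type == ColumnType.DOUBLE;
    }

    /** Infers one type per column from already-split sample rows; empty cells are ignored. */
    static List<ColumnType> inferColumnTypes(List<String[]> sampleRows, int columnCount) {
        ColumnType[] inferred = new ColumnType[columnCount];
        for (String[] row : sampleRows) {
            for (int i = 0; i < columnCount && i < row.length; i++) {
                String cell = row[i];
                if (cell == null || cell.trim().isEmpty()) {
                    continue;
                }
                inferred[i] = merge(inferred[i], classify(cell));
            }
        }
        List<ColumnType> result = new ArrayList<>(columnCount);
        for (ColumnType type : inferred) {
            // A column with no usable samples defaults to KEYWORD.
            result.add(type == null ? ColumnType.KEYWORD : type);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> sample = List.of(
            new String[] { "1", "TRUE", "2026-03-11", "a" },
            new String[] { "2.5", "false", "2026-03-12T07:04:00Z", "7" }
        );
        System.out.println(inferColumnTypes(sample, 4)); // prints [DOUBLE, BOOLEAN, DATETIME, KEYWORD]
    }
}
```

Falling back to KEYWORD on any disagreement is a deliberately conservative choice in this sketch; the actual change may resolve conflicts differently.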

Developed with AI-assisted tooling

@costin costin added the >enhancement and ES|QL|DS (ES|QL datasources) labels Mar 11, 2026
@costin costin requested a review from bpintea March 11, 2026 17:46
@elasticsearchmachine elasticsearchmachine added the v9.4.0 and needs:triage (Requires assignment of a team area label) labels Mar 11, 2026
@costin costin added the :Analytics/ES|QL (AKA ESQL) label and removed the needs:triage (Requires assignment of a team area label) label Mar 11, 2026
@elasticsearchmachine elasticsearchmachine added the Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)) label Mar 11, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Hi @costin, I've created a changelog YAML for you.

@costin costin force-pushed the csv-schema-inference-enhancements branch from fcfdeab to ad4f9a1 Compare March 11, 2026 17:51
@costin costin enabled auto-merge (squash) March 11, 2026 17:53
Contributor

@bpintea bpintea left a comment


🤖-assisted reviewed.


private static boolean hasTypeAnnotations(String[] columns) {
    for (String column : columns) {
        if (column.trim().contains(":")) {
Contributor


How about taking escaping into account here? Otherwise a column name that itself contains a colon would be misread as annotated; the colon is not reserved by the RFC.

}
}

private List<Attribute> inferSchemaFromBatchReader(String headerLine) throws IOException {
Contributor


Might be worth sharing some code with inferSchemaFromSample.

@costin costin force-pushed the csv-schema-inference-enhancements branch 2 times, most recently from 487ea3d to 7fdcb73 Compare March 11, 2026 22:30
Plain CSV headers without type annotations now trigger automatic
schema inference instead of failing, unblocking all standard CSV
files. Also fixes boolean case-sensitivity, datetime format
flexibility, and numeric type alias recognition.
@costin costin force-pushed the csv-schema-inference-enhancements branch from 7fdcb73 to 4ffcdd4 Compare March 12, 2026 07:04
@costin
Member Author

costin commented Mar 12, 2026

Addressed both review comments:

  1. Escaping in hasTypeAnnotations — quoted column names like "host:port" are now skipped when checking for type annotations. If a column is wrapped in the configured quote char, the : inside is treated as part of the name, not a type separator. Added testQuotedColumnNameWithColon to cover this.

  2. Code sharing — extracted newCsvIterator(Reader) and collectSampleRows(Iterator, String) as shared helpers. Both inferSchemaFromSample and inferSchemaFromBatchReader now delegate to these instead of duplicating the CSV schema setup and row-sampling loop.
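For readers following along, the quoted-name check from point 1 could be sketched roughly as follows. This is not the merged code: the class name `HeaderAnnotationSketch`, the `isQuoted` helper, and the extra `quote` parameter are assumptions made for the example, and the header is assumed to have already been split into columns by a quote-aware parser.

```java
// Hypothetical sketch: a simplified stand-in for the check described above.
final class HeaderAnnotationSketch {

    /** True when the column is fully wrapped in the quote character, e.g. "host:port". */
    static boolean isQuoted(String column, char quote) {
        return column.length() >= 2
            && column.charAt(0) == quote
            && column.charAt(column.length() - 1) == quote;
    }

    /** Returns true if any unquoted column contains a ':' type separator. */
    static boolean hasTypeAnnotations(String[] columns, char quote) {
        for (String column : columns) {
            String trimmed = column.trim();
            if (isQuoted(trimmed, quote)) {
                // A ':' inside a quoted name is part of the name, not a type separator.
                continue;
            }
            if (trimmed.contains(":")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] plain = { "\"host:port\"", "status" };
        String[] annotated = { "id:long", "name:keyword" };
        System.out.println(hasTypeAnnotations(plain, '"'));     // prints false
        System.out.println(hasTypeAnnotations(annotated, '"')); // prints true
    }
}
```

With this shape, a header such as `"host:port",status` would be routed to schema inference, while `id:long,name:keyword` would still be treated as annotated.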

@costin costin disabled auto-merge March 12, 2026 12:36
@costin costin merged commit d84d141 into elastic:main Mar 12, 2026
35 of 36 checks passed
@costin costin deleted the csv-schema-inference-enhancements branch March 12, 2026 12:36
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 12, 2026
…elocations

* upstream/main: (49 commits)
  CCS logging fixes (elastic#144070)
  Improve CPS cluster exclusion handling (elastic#143488)
  Remove snapshot condition now that node_reduce phase is in non-snapshot builds (elastic#144090)
  Drop deprecation warnings when updating a mapping in the cluster state applier (elastic#143884) (elastic#144040)
  Add ensureGreenAndNoInitializingShards helper (elastic#144044)
  Removed unnecessary applies_to blocks from deprecated query (elastic#144096)
  [CPS] Use single CrossProjectModeDecider instance (elastic#144030)
  Fix ESQL TS requests with LIMIT 0 (elastic#144031)
  ESQL: Remove `create` methods in aggs (elastic#144098)
  ES|QL: Refactor ChangeLimitOperator (elastic#144017)
  Add Paginated Hit Source Tests (elastic#142592)
  Fix test failure not preferred (elastic#144019)
  Remove serialization logic from EIS authorization response (elastic#144021)
  ESQL: CSV schema inference and parsing enhancements (elastic#144050)
  ESQL: Fix incorrectly optimized fork with nullify unmapped_fields (elastic#143030)
  Fix MMR release test using subqueries (elastic#144087)
  Refactoring `UserAgentPlugin` (elastic#140712)
  Drop non-finite samples in Prometheus remote write (elastic#144055)
  [TEST] Wait for internal inference indices to be created in authorization IT (elastic#143885)
  Disable ndjson datasource QA tests in release-tests (elastic#143992)
  ...
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026

Labels

:Analytics/ES|QL (AKA ESQL), >enhancement, ES|QL|DS (ES|QL datasources), Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), v9.4.0


3 participants