ESQL: CSV schema inference and parsing enhancements by costin · Pull Request #144050 · elastic/elasticsearch

costin · 2026-03-11T17:46:22Z

Standard CSV files use plain column names without type annotations. Before this change, such headers caused a parsing failure, blocking all normal CSV files from being read.

This adds sample-based schema inference for plain headers and fixes boolean case-sensitivity, datetime format flexibility, and numeric type alias recognition.
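As a rough illustration of what sample-based inference for plain headers can look like, here is a minimal sketch. It is not the code from this PR: the class, enum, and method names are hypothetical, the candidate type set is simplified, and the widening rule (LONG plus DOUBLE becomes DOUBLE, any other conflict falls back to KEYWORD) is an assumption. It also shows case-insensitive boolean detection and acceptance of more than one ISO-8601 date shape.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are taken from the PR.
final class CsvSchemaInferenceSketch {

    // Simplified candidate types (assumption, not the ES|QL type set).
    enum ColumnType { BOOLEAN, LONG, DOUBLE, DATETIME, KEYWORD }

    // Classify a single non-empty cell, trying the narrowest type first.
    static ColumnType inferValue(String value) {
        if (value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false")) {
            return ColumnType.BOOLEAN; // TRUE, True, and true are all accepted
        }
        try {
            Long.parseLong(value);
            return ColumnType.LONG;
        } catch (NumberFormatException e) {
            // not an integer, try the next candidate
        }
        try {
            Double.parseDouble(value);
            return ColumnType.DOUBLE;
        } catch (NumberFormatException e) {
            // not a floating-point number either
        }
        return isDateTime(value) ? ColumnType.DATETIME : ColumnType.KEYWORD;
    }

    // Accept several ISO-8601 shapes: with offset, without offset, date only.
    static boolean isDateTime(String value) {
        try { OffsetDateTime.parse(value); return true; } catch (DateTimeParseException e) { /* next */ }
        try { LocalDateTime.parse(value); return true; } catch (DateTimeParseException e) { /* next */ }
        try { LocalDate.parse(value); return true; } catch (DateTimeParseException e) { /* next */ }
        return false;
    }

    // Combine the type seen so far with the type of a new sample.
    static ColumnType merge(ColumnType current, ColumnType next) {
        if (current == null || current == next) {
            return next;
        }
        boolean numeric = (current == ColumnType.LONG || current == ColumnType.DOUBLE)
            && (next == ColumnType.LONG || next == ColumnType.DOUBLE);
        return numeric ? ColumnType.DOUBLE : ColumnType.KEYWORD;
    }

    // Infer one type per header column from the sampled rows; empty cells are ignored.
    static Map<String, ColumnType> inferSchema(String[] header, List<String[]> sampleRows) {
        ColumnType[] types = new ColumnType[header.length];
        for (String[] row : sampleRows) {
            for (int i = 0; i < header.length && i < row.length; i++) {
                String cell = row[i].trim();
                if (cell.isEmpty() == false) {
                    types[i] = merge(types[i], inferValue(cell));
                }
            }
        }
        Map<String, ColumnType> schema = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            // Columns with no usable samples default to KEYWORD.
            schema.put(header[i].trim(), types[i] == null ? ColumnType.KEYWORD : types[i]);
        }
        return schema;
    }
}
```

Calling `CsvSchemaInferenceSketch.inferSchema(header, rows)` with a plain header and a handful of sampled rows returns an ordered map from column name to inferred type. Falling back to KEYWORD on conflicting samples keeps the reader permissive, at the cost of losing a narrower type when a single sample is malformed.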

Developed with AI-assisted tooling

elasticsearchmachine · 2026-03-11T17:47:35Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2026-03-11T17:47:36Z

Hi @costin, I've created a changelog YAML for you.

bpintea

🤖-assisted reviewed.

bpintea · 2026-03-11T18:02:03Z

...atasource-csv/src/main/java/org/elasticsearch/xpack/esql/datasource/csv/CsvFormatReader.java

+
+    private static boolean hasTypeAnnotations(String[] columns) {
+        for (String column : columns) {
+            if (column.trim().contains(":")) {


How about considering the escaping? I imagine column names could otherwise contain a colon; it's not reserved by the RFC.

bpintea · 2026-03-11T18:03:58Z

...atasource-csv/src/main/java/org/elasticsearch/xpack/esql/datasource/csv/CsvFormatReader.java

            }
        }

+        private List<Attribute> inferSchemaFromBatchReader(String headerLine) throws IOException {


Might be worth sharing some code with inferSchemaFromSample.

Plain CSV headers without type annotations now trigger automatic schema inference instead of failing, unblocking all standard CSV files. Also fixes boolean case-sensitivity, datetime format flexibility, and numeric type alias recognition.

costin · 2026-03-12T07:05:00Z

Addressed both review comments:

- Escaping in hasTypeAnnotations: quoted column names like "host:port" are now skipped when checking for type annotations. If a column is wrapped in the configured quote char, the : inside is treated as part of the name, not a type separator. Added testQuotedColumnNameWithColon to cover this.
- Code sharing: extracted newCsvIterator(Reader) and collectSampleRows(Iterator, String) as shared helpers. Both inferSchemaFromSample and inferSchemaFromBatchReader now delegate to these instead of duplicating the CSV schema setup and row-sampling loop.
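The quote-aware check described in the first item above could look roughly like the following. This is a sketch under assumptions, not the merged code: the class name, the `isQuoted` helper, and passing the quote character as a parameter are all hypothetical, and a quoted name followed by a type annotation is deliberately not handled here.

```java
// Hypothetical sketch of a quote-aware annotation check; not the code from the PR.
final class TypeAnnotationCheckSketch {

    // A column counts as quoted when it both starts and ends with the quote char.
    static boolean isQuoted(String column, char quote) {
        return column.length() >= 2
            && column.charAt(0) == quote
            && column.charAt(column.length() - 1) == quote;
    }

    // Returns true if any unquoted header column contains a ':' type separator.
    static boolean hasTypeAnnotations(String[] columns, char quote) {
        for (String column : columns) {
            String trimmed = column.trim();
            if (isQuoted(trimmed, quote)) {
                continue; // a ':' inside quotes is part of the column name
            }
            if (trimmed.contains(":")) {
                return true;
            }
        }
        return false;
    }
}
```

With this shape, a header such as `"host:port",name` is treated as plain and would go through schema inference, while `id:long,name:keyword` is still recognized as annotated.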

costin added the >enhancement and ES|QL|DS (ES|QL datasources) labels Mar 11, 2026

costin requested a review from bpintea March 11, 2026 17:46

elasticsearchmachine added the v9.4.0 and needs:triage (Requires assignment of a team area label) labels Mar 11, 2026

costin added the :Analytics/ES|QL (AKA ESQL) label and removed the needs:triage (Requires assignment of a team area label) label Mar 11, 2026

elasticsearchmachine added the Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)) label Mar 11, 2026

costin force-pushed the csv-schema-inference-enhancements branch from fcfdeab to ad4f9a1 Compare March 11, 2026 17:51

costin enabled auto-merge (squash) March 11, 2026 17:53

bpintea approved these changes Mar 11, 2026

costin force-pushed the csv-schema-inference-enhancements branch 2 times, most recently from 487ea3d to 7fdcb73 Compare March 11, 2026 22:30

costin force-pushed the csv-schema-inference-enhancements branch from 7fdcb73 to 4ffcdd4 Compare March 12, 2026 07:04

Merge branch 'main' into csv-schema-inference-enhancements

4c087b0

costin disabled auto-merge March 12, 2026 12:36

costin merged commit d84d141 into elastic:main Mar 12, 2026
35 of 36 checks passed

costin deleted the csv-schema-inference-enhancements branch March 12, 2026 12:36

costin merged 2 commits into elastic:main from costin:csv-schema-inference-enhancements
