ESQL: CSV schema inference and parsing enhancements #144050
costin merged 2 commits into elastic:main
Pinging @elastic/es-analytical-engine (Team:Analytics)
Hi @costin, I've created a changelog YAML for you.
Force-pushed from fcfdeab to ad4f9a1.
```java
private static boolean hasTypeAnnotations(String[] columns) {
    for (String column : columns) {
        if (column.trim().contains(":")) {
            // ...
        }
    }
    // ...
}
```

How about considering escaping here? Otherwise I imagine column names could contain a `:` themselves; the character isn't reserved by the CSV RFC (RFC 4180).
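One way to address the escaping concern, sketched under an assumed convention: treat `\:` as a literal colon that belongs to the column name, and treat only an unescaped `:` as the separator before a type annotation. `HeaderAnnotations` and `unescapedColonIndex` are hypothetical names for illustration, not code from this PR, and the PR does not say which escaping rule (if any) was adopted.

```java
/**
 * Sketch only. Assumes a hypothetical header convention where "\:" is a
 * literal colon inside a column name and only an unescaped ':' separates
 * the column name from a type annotation (e.g. "ts:datetime").
 */
final class HeaderAnnotations {

    private HeaderAnnotations() {}

    /** Returns the index of the first colon not preceded by a backslash escape, or -1. */
    static int unescapedColonIndex(String column) {
        for (int i = 0; i < column.length(); i++) {
            char c = column.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, whatever it is
            } else if (c == ':') {
                return i;
            }
        }
        return -1;
    }

    /** True if any header cell contains an unescaped colon (assumed convention). */
    static boolean hasTypeAnnotations(String[] columns) {
        for (String column : columns) {
            if (unescapedColonIndex(column.trim()) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```

With this rule a header such as `ratio a\:b,name` is treated as plain and goes through schema inference, while `id:long,name` is still recognized as annotated.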
```java
private List<Attribute> inferSchemaFromBatchReader(String headerLine) throws IOException {
    // ...
}
```

Might be worth sharing some code with `inferSchemaFromSample`.
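A minimal sketch of what a shared helper could look like, assuming both entry points reduce to "header names plus sampled rows in, one type per column out". `SampleInference`, `ColumnType`, and the widening rules (integers mixed with decimals widen to `DOUBLE`, any other conflict falls back to `KEYWORD`, all-empty columns default to `KEYWORD`) are assumptions for illustration, not the ES|QL `DataType` model or the PR's actual logic. Booleans are matched case-insensitively, in line with the boolean fix described in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of a shared sample-based type inference helper (hypothetical names and rules). */
final class SampleInference {

    /** Hypothetical, simplified column types; not the ES|QL DataType enum. */
    enum ColumnType { NULL, BOOLEAN, LONG, DOUBLE, KEYWORD }

    private SampleInference() {}

    /** Classifies one raw cell value. */
    static ColumnType classify(String raw) {
        String v = raw.trim();
        if (v.isEmpty()) {
            return ColumnType.NULL; // empty cells carry no type information
        }
        // Case-insensitive booleans: "true", "TRUE", "False", ...
        if (v.equalsIgnoreCase("true") || v.equalsIgnoreCase("false")) {
            return ColumnType.BOOLEAN;
        }
        try {
            Long.parseLong(v);
            return ColumnType.LONG;
        } catch (NumberFormatException notAnInteger) {
            // fall through and try a floating-point value
        }
        try {
            Double.parseDouble(v);
            return ColumnType.DOUBLE;
        } catch (NumberFormatException notANumber) {
            return ColumnType.KEYWORD;
        }
    }

    /** Combines the type observed so far with a new observation (assumed widening rules). */
    static ColumnType widen(ColumnType a, ColumnType b) {
        if (a == b || b == ColumnType.NULL) return a;
        if (a == ColumnType.NULL) return b;
        boolean bothNumeric = (a == ColumnType.LONG || a == ColumnType.DOUBLE)
            && (b == ColumnType.LONG || b == ColumnType.DOUBLE);
        return bothNumeric ? ColumnType.DOUBLE : ColumnType.KEYWORD;
    }

    /** Infers one type per header column from the sampled rows. */
    static List<ColumnType> inferTypes(String[] header, List<String[]> sampleRows) {
        ColumnType[] types = new ColumnType[header.length];
        Arrays.fill(types, ColumnType.NULL);
        for (String[] row : sampleRows) {
            for (int i = 0; i < header.length && i < row.length; i++) {
                types[i] = widen(types[i], classify(row[i]));
            }
        }
        List<ColumnType> result = new ArrayList<>(Arrays.asList(types));
        // Columns with no non-empty sample value fall back to KEYWORD.
        result.replaceAll(t -> t == ColumnType.NULL ? ColumnType.KEYWORD : t);
        return result;
    }
}
```

If the two inference methods differ mainly in how they collect sample rows, a helper like this keeps the type rules in one place so a fix (such as the boolean case-sensitivity one) only has to be made once.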
Force-pushed from 487ea3d to 7fdcb73.
Plain CSV headers without type annotations now trigger automatic schema inference instead of failing, unblocking all standard CSV files. Also fixes boolean case-sensitivity, datetime format flexibility, and numeric type alias recognition.
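The datetime flexibility fix can be illustrated by trying several layouts in order until one parses. The accepted layouts below (ISO-8601 instants ending in `Z`, ISO local date-times with or without the `T` separator, and plain dates) and the choice to interpret zone-less values as UTC are assumptions for this sketch; the PR description does not list the formats it accepts. `LenientDatetime` is a hypothetical name.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

/** Sketch: parse a CSV datetime cell by trying several (assumed) layouts in order. */
final class LenientDatetime {

    private LenientDatetime() {}

    // Hypothetical list of zone-less date-time layouts, tried in order.
    private static final List<DateTimeFormatter> LOCAL_DATE_TIME_FORMATS = List.of(
        DateTimeFormatter.ISO_LOCAL_DATE_TIME,              // 2024-03-01T12:30:00
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")  // 2024-03-01 12:30:00
    );

    /** Returns epoch millis; zone-less values are read as UTC (assumption). */
    static long parseToEpochMillis(String raw) {
        String v = raw.trim();
        try {
            return Instant.parse(v).toEpochMilli(); // e.g. 2024-03-01T12:30:00Z
        } catch (DateTimeParseException notAnInstant) {
            // try the zone-less layouts next
        }
        for (DateTimeFormatter format : LOCAL_DATE_TIME_FORMATS) {
            try {
                return LocalDateTime.parse(v, format).toInstant(ZoneOffset.UTC).toEpochMilli();
            } catch (DateTimeParseException noMatch) {
                // try the next layout
            }
        }
        try {
            return LocalDate.parse(v).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        } catch (DateTimeParseException notADate) {
            throw new IllegalArgumentException("unrecognized datetime: " + raw);
        }
    }
}
```

Ordering matters in this design: the strictest layout (a full instant with an explicit zone) is tried first, so a value that carries its own offset is never reinterpreted as UTC.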
Force-pushed from 7fdcb73 to 4ffcdd4.
Addressed both review comments:
…elocations

* upstream/main: (49 commits)
  - CCS logging fixes (elastic#144070)
  - Improve CPS cluster exclusion handling (elastic#143488)
  - Remove snapshot condition now that node_reduce phase is in non-snapshot builds (elastic#144090)
  - Drop deprecation warnings when updating a mapping in the cluster state applier (elastic#143884) (elastic#144040)
  - Add ensureGreenAndNoInitializingShards helper (elastic#144044)
  - Removed unnecessary applies_to blocks from deprecated query (elastic#144096)
  - [CPS] Use single CrossProjectModeDecider instance (elastic#144030)
  - Fix ESQL TS requests with LIMIT 0 (elastic#144031)
  - ESQL: Remove `create` methods in aggs (elastic#144098)
  - ES|QL: Refactor ChangeLimitOperator (elastic#144017)
  - Add Paginated Hit Source Tests (elastic#142592)
  - Fix test failure not preferred (elastic#144019)
  - Remove serialization logic from EIS authorization response (elastic#144021)
  - ESQL: CSV schema inference and parsing enhancements (elastic#144050)
  - ESQL: Fix incorrectly optimized fork with nullify unmapped_fields (elastic#143030)
  - Fix MMR release test using subqueries (elastic#144087)
  - Refactoring `UserAgentPlugin` (elastic#140712)
  - Drop non-finite samples in Prometheus remote write (elastic#144055)
  - [TEST] Wait for internal inference indices to be created in authorization IT (elastic#143885)
  - Disable ndjson datasource QA tests in release-tests (elastic#143992)
  - ...
Standard CSV files use plain column names without type annotations. Before this change, such headers caused a parsing failure, blocking all normal CSV files from being read.
This adds sample-based schema inference for plain headers and fixes boolean case-sensitivity, datetime format flexibility, and numeric type alias recognition.
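Numeric type alias recognition can be sketched as normalizing a type annotation (the part after the colon in a header such as `id:int`, assuming the `name:type` form that `hasTypeAnnotations` looks for) through a lookup table before resolving it. The alias table and canonical names below are hypothetical examples, not the list this PR implements; `TypeAliases` is likewise a hypothetical name.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Sketch: map numeric type-annotation aliases to canonical type names (hypothetical table). */
final class TypeAliases {

    private TypeAliases() {}

    // Hypothetical alias table; the real set of aliases and target types may differ.
    private static final Map<String, String> NUMERIC_ALIASES = Map.of(
        "int", "integer",
        "integer", "integer",
        "short", "integer",
        "byte", "integer",
        "long", "long",
        "bigint", "long",
        "double", "double",
        "float", "double"
    );

    /** Returns the canonical name for a numeric alias, or empty if it is not a known alias. */
    static Optional<String> canonicalType(String annotation) {
        // Locale.ROOT keeps lower-casing independent of the JVM default locale.
        String key = annotation.trim().toLowerCase(Locale.ROOT);
        return Optional.ofNullable(NUMERIC_ALIASES.get(key));
    }
}
```

Lower-casing with `Locale.ROOT` rather than the default locale avoids surprises such as the Turkish dotless `i` turning `INT` into something that no longer matches `int`.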
Developed with AI-assisted tooling