Closed
Description
Right now, when loading data from formats like CSV or JSON, every value is converted into a StringEntry,
because those formats do not carry any kind of schema.
It would be very beneficial to have a mechanism that, when enabled, tries to guess the best possible type from each value and casts the entry to that type.
For example:
"true" -> boolean
"1" -> int
"2023-03-04" -> DateTime
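To make the guessing rules concrete, here is a minimal sketch in plain PHP (8.0+), independent of the library's API. The function name `guess_value()` is hypothetical, and the rules it applies are assumptions for illustration only: just the literal words true/false become booleans, numbers with leading zeros stay strings, and only `YYYY-MM-DD` values that are real calendar dates become dates.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of any existing API: guesses the most
 * specific type for a raw string value and returns the cast value.
 * Anything that matches no rule is returned unchanged as a string.
 */
function guess_value(string $raw): bool|int|float|\DateTimeImmutable|string
{
    $value = trim($raw);

    // Booleans: only the literal words, so "1" and "0" remain integers.
    $lower = strtolower($value);
    if ($lower === 'true') {
        return true;
    }
    if ($lower === 'false') {
        return false;
    }

    // Integers: FILTER_VALIDATE_INT rejects leading zeros ("007") and
    // values outside the platform integer range, returning false for them.
    $int = filter_var($value, FILTER_VALIDATE_INT);
    if ($int !== false) {
        return $int;
    }

    // Floats: plain decimal notation only (assumption: no exponents).
    if (preg_match('/^[+-]?(0|[1-9]\d*)\.\d+$/', $value) === 1) {
        return (float) $value;
    }

    // Dates: YYYY-MM-DD that is also a valid calendar date.
    if (preg_match('/^(\d{4})-(\d{2})-(\d{2})$/', $value, $m) === 1
        && checkdate((int) $m[2], (int) $m[3], (int) $m[1])
    ) {
        return new \DateTimeImmutable($value, new \DateTimeZone('UTC'));
    }

    // Fallback: keep the original string untouched.
    return $raw;
}

var_dump(guess_value('true')); // bool(true)
var_dump(guess_value('1'));    // int(1)
var_dump(guess_value('007'));  // string(3) "007"
```

The order of the checks matters: booleans and integers are tested before dates, so that ambiguous values get the narrowest sensible type, and anything unrecognized falls back to a string, which matches the current behavior.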
I see it as a custom transformer, something that can be used like this:
df()
    ->read(from_csv('...'))
    ->autoCast()
    ->write(to_parquet('...'))
    ->run();
xaviermarchegay