…he correct format for read_csv_auto
    filename = os.path.join(os.path.dirname(os.path.realpath(__file__)), '..', 'data', name)
    return filename


class TestReadCSV(object):
Can we also add a test with the parallel_csv_reader?
Perhaps we can add a boolean parallel flag to the reader that controls whether the parallel read is enabled?
I am looking into this, but the parallel CSV reader is still locked behind context.options.experimental_parallel_csv_reader.
I was thinking I could temporarily switch that to true and reset it after binding, but I don't think I can do that reliably.
So maybe we should just upgrade it to a CSV reader option?
I think that is a good idea
Removing the DBConfig setting as well, or not yet?
Probably not yet? Maybe just let the CSV reader option override the DBConfig setting when it is supplied?
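A minimal Python sketch of the override semantics proposed here. It assumes the per-call option defaults to None, meaning "not supplied", and otherwise falls back to the database-wide experimental_parallel_csv_reader setting. The name resolve_parallel is hypothetical and is not part of DuckDB's API.

```python
from typing import Optional


def resolve_parallel(parallel_option: Optional[bool], db_setting: bool) -> bool:
    """Hypothetical helper: decide whether to use the parallel CSV reader.

    A per-call reader option wins when supplied; otherwise the
    database-wide setting (e.g. experimental_parallel_csv_reader) applies.
    """
    if parallel_option is None:
        # Option not supplied: keep the existing DBConfig behavior.
        return db_setting
    # Option supplied explicitly: it overrides the DBConfig setting.
    return parallel_option


print(resolve_parallel(None, False))  # False
print(resolve_parallel(True, False))  # True
print(resolve_parallel(False, True))  # False
```

Using None as the "not supplied" default, rather than a plain False, is what lets an explicit False still disable the parallel reader when the global setting is on.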
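I have a segfault that I really can't wrap my head around.
Test:
Result:
The only thing I'm doing with
    if (!py::none().is(encoding)) {
        if (!py::isinstance<py::str>(encoding)) {
            throw InvalidInputException("read_csv only accepts 'encoding' as a string");
        }
        string encoding = StringUtil::Lower(py::str(encoding));
        if (encoding != "utf8" && encoding != "utf-8") {
            throw BinderException("Copy is only supported for UTF-8 encoded files, ENCODING 'UTF-8'");
        }
    }
Ah, maybe it's because I reuse the variable name. In C++ the inner string encoding is already in scope inside its own initializer, so py::str(encoding) reads that uninitialized string instead of the Python argument. That is undefined behavior, which could explain a crash that only shows up on Linux; giving the lowered string its own name should avoid it.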
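For comparison, here is a Python sketch of the same UTF-8 check. The raw argument and its lowered form are kept under separate names, which mirrors the C++ fix of renaming the lowered string. Python has no self-initialization pitfall, so the separate name only keeps intent clear here. The name check_encoding is hypothetical. TypeError and ValueError are assumed stand-ins for InvalidInputException and BinderException.

```python
from typing import Optional


def check_encoding(encoding: object) -> Optional[str]:
    """Hypothetical Python mirror of the C++ check: accept only UTF-8 (or nothing)."""
    if encoding is None:
        # Not supplied: nothing to validate.
        return None
    if not isinstance(encoding, str):
        raise TypeError("read_csv only accepts 'encoding' as a string")
    # Store the lowered value under a new name instead of reusing 'encoding'.
    normalized = encoding.lower()
    if normalized not in ("utf8", "utf-8"):
        raise ValueError("only UTF-8 encoded files are supported")
    return normalized


print(check_encoding("UTF-8"))  # utf-8
```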
…llel_csv' config setting
… define aliases for python module methods
I am kind of confused by this giant regression. It seems consistent, but looking at
Mytherin left a comment
Thanks for the PR! LGTM
Thanks, I also have a branch that's nearly ready for a PR for the …, and the cleanup of the read_csv_relation internals we talked about.
This method is meant to mirror the read_csv method of pandas, as far as our current options allow.
The supported options are:
- header: can be given as 0, for compatibility with pandas, or as a boolean
- dtype: can be given as a dict {str: str} or as a list of str
- sep | delimiter: given as a string
- na_values: defines the null string
- skiprows: skips the first n rows
- compression: given as a string
- quotechar: given as a string
- escapechar: given as a string
- encoding: given as a string (only UTF-8 is supported)

Future improvements:
- Add names; then we can also support prefix by combining it into names.
- Add usecols; this can be done by pushing a projection on top of the read_csv_relation.
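The option handling above can be sketched in Python as a few argument normalizers. The sketch assumes the reader wants a boolean header flag, dtype as either a name-to-type mapping or a positional list of type names, and a single delimiter. It also shows the proposed expansion of prefix into names, for example X0, X1, X2 from prefix "X". All function names are hypothetical, and rejecting sep and delimiter together is an assumption of this sketch. This is not the PR's implementation.

```python
from typing import Dict, List, Optional, Union


def normalize_header(header: Union[bool, int, None]) -> Optional[bool]:
    """Map pandas-style header=0, or a boolean, to a boolean flag.

    None is treated here as "not supplied" (assumed: let the reader detect it).
    """
    # Check bool before int: in Python, True == 1 and False == 0.
    if header is None or isinstance(header, bool):
        return header
    if header == 0:
        # pandas header=0: the first row holds the column names.
        return True
    raise ValueError("header must be 0 or a boolean")


def normalize_dtype(dtype: object) -> Union[Dict[str, str], List[str], None]:
    """Accept dtype as {column name: type name} or as a positional list of type names."""
    if dtype is None:
        return None
    if isinstance(dtype, dict):
        return {str(name): str(type_name) for name, type_name in dtype.items()}
    if isinstance(dtype, (list, tuple)):
        return [str(type_name) for type_name in dtype]
    raise TypeError("dtype must be a dict {str: str} or a list of str")


def resolve_delimiter(sep: Optional[str], delimiter: Optional[str]) -> Optional[str]:
    """Treat sep and delimiter as aliases; this sketch rejects passing both."""
    if sep is not None and delimiter is not None:
        raise ValueError("pass either sep or delimiter, not both")
    return sep if sep is not None else delimiter


def names_from_prefix(prefix: str, column_count: int) -> List[str]:
    """Future improvement: expand a prefix into explicit column names."""
    return [f"{prefix}{index}" for index in range(column_count)]


print(normalize_header(0))                      # True
print(normalize_dtype(["INTEGER", "VARCHAR"]))  # ['INTEGER', 'VARCHAR']
print(names_from_prefix("X", 3))                # ['X0', 'X1', 'X2']
```

The order of checks in normalize_header matters. If header == 0 were tested first, header=False would also match, because False == 0 in Python, and it would be misread as pandas' header=0.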