Use 'utf-8' and 'replace' for everything except NC_STRING variable data. For NC_STRING variable data, look for _Encoding variable attribute, otherwise use 'utf-8'.
character variables.
force 'U' dtype in chartostring.
to a char variable with _Encoding set.
character array (type='S1') is given
|
@shoyer, I'm wondering how this change would impact xarray - especially the auto-conversion of char arrays to string arrays with the last dimension collapsed. This would only happen if the |
last dim of char variable.
|
@jswhit thanks for the heads up. Yes, I think this implementation as-is would break xarray, where we do our own char -> string array conversion. There are two ways to fix this:
I like this second option better. |
|
The second option would be nice, but quite difficult since How about adding a |
|
That would fit the existing API of the library, where any interpretation of attributes is configurable... |
|
I went ahead and added a |
|
I'm okay with methods for this. But going forward, this is probably a case for separate low level and high level interfaces, even if only the high level interface is exposed publicly. h5py uses this approach and it works quite well. |
|
OK, merging now. @shoyer, good idea about the low level interface. I'll create a separate ticket for that. |
Add check for _Encoding attribute for NC_STRING variables, otherwise use 'utf-8'. 'utf-8' is used everywhere else, 'default_encoding' global module variable is no longer used. getncattr method now takes optional kwarg 'encoding' (default 'utf-8') so encoding of attributes can be specified if desired. If _Encoding is specified for an NC_CHAR ('S1') variable, the chartostring utility function is used to convert the array of characters to an array of strings with one less dimension (the last dimension is interpreted as the length of each string) when reading the data. When writing the data, stringtochar is used to convert a numpy array of fixed length strings to an array of characters with one more dimension. chartostring and stringtochar now also have an 'encoding' kwarg.

The _Encoding attribute convention is being discussed in Unidata/netcdf-c#402.
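
The shape change described above can be sketched in pure Python. This is only an illustration of the idea, not the library's implementation: the helper names pick_encoding, chars_to_strings and strings_to_chars are hypothetical, nested lists stand in for numpy arrays, and stripping trailing NUL padding on read is an assumption of this sketch.

```python
def pick_encoding(attrs, default="utf-8"):
    """Return the '_Encoding' attribute if present, else the default (hypothetical helper)."""
    return attrs.get("_Encoding", default)


def chars_to_strings(chars, encoding="utf-8"):
    """Collapse the last dimension of a nested list of single bytes.

    Each innermost list holds one byte per element and becomes one string,
    so the result has one dimension fewer than the input.
    """
    if chars and isinstance(chars[0], bytes):
        # Innermost dimension: join the bytes, drop NUL padding (assumed), decode.
        return b"".join(chars).rstrip(b"\x00").decode(encoding, "replace")
    return [chars_to_strings(sub, encoding) for sub in chars]


def strings_to_chars(strings, strlen, encoding="utf-8"):
    """Expand each string into strlen single bytes, adding one dimension."""
    if isinstance(strings, str):
        # Encode, truncate to the fixed length, then pad with NUL bytes.
        raw = strings.encode(encoding)[:strlen].ljust(strlen, b"\x00")
        return [raw[i:i + 1] for i in range(strlen)]
    return [strings_to_chars(s, strlen, encoding) for s in strings]


# Round trip: two strings -> 2 x 4 grid of single bytes -> two strings.
enc = pick_encoding({"_Encoding": "latin-1"})
names = ["foo", "café"]
chars = strings_to_chars(names, 4, enc)
assert chars_to_strings(chars, enc) == names
```

The last dimension (4 here) plays the role of the string length, which is why reading removes a dimension and writing adds one. Truncating at a fixed byte length can split a multi-byte character, so the sketch decodes with 'replace' rather than raising.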