Auto-expand string width for Column insert#9559
Conversation
astropy/table/column.py
Outdated
| new_dtype = self.dtype.kind + str(values_str_len) | ||
|
|
||
| # Explicitly convert to dtype of this column. Needed because numpy 1.7 | ||
| # enforces safe casting by default, so . This isn't the case for 1.6 or 1.8+. |
There was a problem hiding this comment.
We don't support old numpies, so no need to think about how to workaround for <1.16
There was a problem hiding this comment.
This was just a copy/paste, but is removed now, good catch!
|
@taldcroft - a quick turnaround for 4.0 sounds good to me (one can see this as a bugfix) |
|
Yes, I like this idea, and since we already have the unicode sandwich and thus are deviation from numpy anyway, I think it is fine to try to make things a bit more userfriendly; I think it would be rare indeed that a user passes in a long string and only wants the first characters stored! |
|
I'm on board with this, but wondering whether we should consider emitting a warning in this case? |
astropy/table/column.py
Outdated
| # Explicitly convert to dtype of this column. Needed because numpy 1.7 | ||
| # enforces safe casting by default, so . This isn't the case for 1.6 or 1.8+. | ||
| values = np.asarray(values, dtype=new_dtype) | ||
| data = np.insert(self.astype(new_dtype), obj, values, axis=axis) |
There was a problem hiding this comment.
Add a copy=False inside the astype, otherwise you make an extra copy also if the dtype is unchanged.
There was a problem hiding this comment.
Different implementation so this does not apply now.
4b8e775 to
3d49fd1
Compare
I'm open to other opinions, but my inclination is no warning. Pandas does not warn in this case (of course it handles strings as objects anyway), and I think that that 99.9% of the time that users just want this to work silently. My original use case was an empty ECSV table with a string column that I then appended rows to. This should just work! |
| accord with the existing unit. | ||
|
|
||
| >>> t.add_row([-8, -9, 10 * u.cm / u.s, 11]) | ||
| >>> t.add_row([-8, 'string', 10 * u.cm / u.s, 10]) |
There was a problem hiding this comment.
I think this was a bad behavior previously, where it accepted an int -9 to a string column and silently converted. No longer, inserting the wrong type into a string column raises an exception now.
There was a problem hiding this comment.
That seems better indeed. Probably best to add a test for it...
| assert np.all(t['a'] == np.array([1, 2, 3, 4])) | ||
| assert np.allclose(t['b'], np.array([4.0, 5.1, 6.2, 7.2])) | ||
| assert np.all(t['c'] == np.array(['7', '8', '9', '1'])) | ||
| assert np.all(t['c'] == np.array(['7', '8', '9', '10'])) |
There was a problem hiding this comment.
The original was truncating.
3d49fd1 to
db43bd1
Compare
mhvk
left a comment
There was a problem hiding this comment.
@taldcroft - looks good module the use of np.shape instead of np.broadcast(..).shape, and possibly an extra test. I'll approve now since I'm travelling and may not be able to look at this again in the next few days.
astropy/table/column.py
Outdated
| mask = False | ||
| else: | ||
| mask = np.zeros(values.shape, dtype=bool) | ||
| mask = np.zeros(np.broadcast(values).shape, dtype=bool) |
| accord with the existing unit. | ||
|
|
||
| >>> t.add_row([-8, -9, 10 * u.cm / u.s, 11]) | ||
| >>> t.add_row([-8, 'string', 10 * u.cm / u.s, 10]) |
There was a problem hiding this comment.
That seems better indeed. Probably best to add a test for it...
|
p.s. I'm also in favour of not warning. |
Auto-expand string width for Column insert
This changes Column insert so that if you insert a string value that is longer than the available width then the column dtype is auto-expanded.
Example:
cc: @mhvk @astrofrog @bsipocz