fix: Support using pandas nullable types#77
Conversation
|
@AndyBryson thanks for your PR, I will review your changes ASAP |
bednar
left a comment
There was a problem hiding this comment.
Hi @AndyBryson,
thanks for your PR 👍. Can you please rebase your sources with main branch and satisfy our checklist?
Best
904f39c to
163cee8
Compare
|
Hi @bednar. That's mostly done. There is no CHANGELOG.md, so I've not updated it. |
bednar
left a comment
There was a problem hiding this comment.
@AndyBryson I've add the CHANGELOG.md to main branch.
You can add something like:
### Bugfix
1. [#77](https://github.com/InfluxCommunity/influxdb3-python/pull/77): Support using pandas nullable types
bednar
left a comment
There was a problem hiding this comment.
Could you please rebase your sources instead of merging? This approach would make it easier for us to review your changes.
Hopefully that's all done now. |
bednar
left a comment
There was a problem hiding this comment.
Thanks again for your PR, there are a few changes that sound reasonable from our POV:
| try: | ||
| from ...extras import pd | ||
|
|
||
| def _not_nan(x): | ||
| return not pd.isna(x) | ||
|
|
||
| except ImportError: | ||
| pd = None | ||
|
|
||
| def _not_nan(x): | ||
| return x == x | ||
|
|
There was a problem hiding this comment.
This can be simplify to because the pandas should be present:
| try: | |
| from ...extras import pd | |
| def _not_nan(x): | |
| return not pd.isna(x) | |
| except ImportError: | |
| pd = None | |
| def _not_nan(x): | |
| return x == x | |
| def _not_nan(x): | |
| from ...extras import pd | |
| return not pd.isna(x) | |
| if issubclass(value.type, np.integer): | ||
| field_value = f"{sep}{key_format}={{{val_format}}}i" | ||
| elif issubclass(value.type, np.bool_): | ||
| field_value = f'{sep}{key_format}={{{val_format}}}' | ||
| elif issubclass(value.type, np.floating): | ||
| if null_columns.iloc[index]: | ||
| field_value = f"""{{"" if math.isnan({val_format}) else f"{sep}{key_format}={{{val_format}}}"}}""" | ||
| field_value = f"""{{"" if pd.isna({val_format}) else f"{sep}{key_format}={{{val_format}}}i"}}""" | ||
| else: | ||
| field_value = f"{sep}{key_format}={{{val_format}}}i" |
There was a problem hiding this comment.
This can be simplify to:
if issubclass(value.type, np.integer) or issubclass(value.type, np.floating) or issubclass(value.type, np.bool_): # noqa: E501
suffix = 'i' if issubclass(value.type, np.integer) else ''
if null_columns.iloc[index]:
field_value = f"""{{"" if pd.isna({val_format}) else f"{sep}{key_format}={{{val_format}}}{suffix}"}}""" # noqa: E501
else:
field_value = f"{sep}{key_format}={{{val_format}}}{suffix}"Allow nullable types that aren't floats. This means that we can upload data frames that have int/bool columns that have empty cells and still maintain the correct type.
bednar
left a comment
There was a problem hiding this comment.
Thanks again for this PR 👍
LGTM 🚀

Closes #76
Proposed Changes
Use pandas to decide what is
NaNto allow us to have nullable bools, ints and strings.Profiling
With a real world dataset, 10_000 lines, 156 columns
exiting method, no pandas null
1.86 s ± 26.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
new method, no pandas null
1.7 s ± 103 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
new method, with pandas null
1.51 s ± 25.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Checklist