ENH: Add use_nullable_dtypes for read_html #50286
pandas/tests/io/test_html.py
Outdated
res = self.read_html(out, attrs={"class": "dataframe"}, index_col=0)[0]
tm.assert_frame_equal(res, df)

@pytest.mark.parametrize("nullable_backend", ["pandas", "pyarrow"])
Suggested change:
- @pytest.mark.parametrize("nullable_backend", ["pandas", "pyarrow"])
+ @pytest.mark.parametrize("dtype_backend", ["pandas", "pyarrow"])
pandas/tests/io/test_html.py
Outdated
out = df.to_html(index=False)
with pd.option_context("mode.string_storage", storage):
    with pd.option_context("mode.nullable_backend", nullable_backend):
Suggested change:
- with pd.option_context("mode.nullable_backend", nullable_backend):
+ with pd.option_context("mode.dtype_backend", nullable_backend):
use_nullable_dtypes : bool = False
    Whether to use nullable dtypes as default when reading data. If
    set to True, nullable dtypes are used for all dtypes that have a nullable
    implementation, even if no nulls are present.
Could you add the additional paragraph about mode.dtype_backend being available, as the other docstrings have? (It should start with "The nullable dtype implementation".)
Thanks @phofl
Hi folks, I'm not sure this is the right venue for comments on patches after the fact, but I just updated my codebase from pandas 1.5.3 to the current version (2.2 at the time of this post) and noticed that 2.0 changed the default na_values by adding the string "None": https://pandas.pydata.org/docs/whatsnew/v2.0.0.html#:~:text=Added%20%22None%22%20to%20default%20na_values%20in%20read_csv()%20(GH%2050286

Treating "None" as NaN introduced a breaking change in my script: it still ran without runtime errors, but it processed the data differently, causing errors in the output dataset. I had a CSV file with "None" intentionally present in some columns so that the word would show on a dashboard. The issue didn't surface until that null value reached an np.where() whose condition checked for "None"; the observation then followed an undesired logic path.

I addressed this by copying the default na_values list from pandas 1.5.3 and overriding the one in pandas 2.2 (I'd noticed that a number of new values had appeared in the default list in addition to "None").

I'm not sure I can recommend a better way to introduce a change like this, or a better way to communicate it to users; the change was mentioned pretty far down the release notes. You probably don't want to put FutureWarnings in read_csv() for everyone who uses it, as that would get pretty annoying. At any rate, I wanted to make a note of this, as adding or removing values from the default na_values list might introduce a "soft" breaking change when moving to new pandas versions.

Cheers,
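For anyone hitting the same issue, here is a minimal sketch of the workaround described above: turn off the built-in NA strings with keep_default_na=False and pass an explicit na_values list, so the result no longer depends on which defaults a given pandas version ships. The sample data and the contents of explicit_na are assumptions for illustration only; this is not the full pandas 1.5.3 default list, which you would copy from that version's documentation.

```python
import io

import pandas as pd

# Hypothetical sample data: "None" is a meaningful label here,
# while the empty field in the last row really is missing.
csv_text = "name,status\nalice,None\nbob,active\ncarol,\n"

# Illustrative NA strings only (NOT the complete pandas 1.5.3 defaults).
explicit_na = ["", "NA", "N/A", "NaN", "nan", "NULL", "null"]

# keep_default_na=False disables the version-dependent default list,
# so only the strings in explicit_na are parsed as missing values.
explicit_df = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,
    na_values=explicit_na,
)

none_kept_as_text = bool(explicit_df["status"].iloc[0] == "None")
empty_still_missing = bool(explicit_df["status"].isna().iloc[2])
print(explicit_df)
```

Pinning the list this way trades convenience for stability: new NA strings added in future releases will not be picked up automatically, which is the intent here.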
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.