FIX Fixes set_output with list input#27044
Conversation
Micky774
left a comment
There was a problem hiding this comment.
LGTM, thanks for the fix!
OmarManzoor
left a comment
There was a problem hiding this comment.
LGTM. Thanks @thomasjpfan
| index = original_input.index if _is_pandas_df(original_input) else None | ||
| return _wrap_in_pandas_container( | ||
| data_to_wrap=data_to_wrap, | ||
| index=getattr(original_input, "index", None), | ||
| index=index, |
There was a problem hiding this comment.
Hi @thomasjpfan,
I have had a regression since 1.3.1 and located that in those lines of code.
My understanding is the following but I could be mistaken so please correct me: those lines have changed the set_output behaviour such that
- before, if original_input had an index attribute (as is unfortunately the case for lists, but also for pd.Series), the index gets wrapped, and bad things ensued for lists,
- now, if original_input is not a pd.DataFrame (including intented targets lists but unfortunately also pd.Series), the index does not get wrapped.
So while the fix is valuable for lists, it is not for the case where input are pd.Series.
Would you agree? and if so is a good way to solve this changing from is_pandas_df to a such function is_pandas_df_or_series?
Thank you for your attention.
There was a problem hiding this comment.
The set_output API was only designed with 2-D containers in mind. This means it does not consider pd.Series.
If you have an use case for set_output with a 1-D container, may you open an issue about it?
There was a problem hiding this comment.
Oh I did not know that, OK thank you for your feedback!
- Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044 modified: environment.yml - Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044 modified: requirements.txt - Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044
#1087) * modified: test_transactionencoder.py - Added two new tests, `test_get_feature_names_out` and `test_set_output`. Passing these tests is a step towards the output of `TransactionEncoder` being formatted as a pandas.DataFramed by default. * modified: transactionencoder.py - Added `get_feature_names_out` method to `TransactionEncoder` to expose the `set_output` method. * modified: tests/test_transactionencoder.py - Updated test to include more checks. It is now back in a failing state. * modified: tests/test_transactionencoder.py - Updated test_set_output docstring to be more explicit. - Added numpy assertion to check that the transformed output columns match the original columns_ attribute for test_set_output. - Added numpy assertion to check that the get_feature_names_out output match the original columns_ attribute for test_get_feature_names_out. * modified: transactionencoder.py - Added logic similar to that in `sklearn.base.ClassNamePrefixFeaturesOutMixin` and `sklearn.base.OneToOneFeatureMixin` for the get_feature_names_out method. * modified: docs/sources/user_guide/preprocessing/TransactionEncoder.ipynb - Updated the user guide to show both the get_feature_names_out method and the set_output method. * modified: docs/sources/CHANGELOG.md - Updated changelog to reflect new features. * modified: docs/sources/CHANGELOG.md - Updated issue number. * modified: docs/sources/CHANGELOG.md - Updated issue number (again) to reflect the PR link instead of the issue link. * modified: mlxtend/preprocessing/transactionencoder.py - Ran isort over imports to fix failing check in PR. * modified: requirements.txt - Increased scikit-learn version to minimum required for set_output to work. * modified: environment.yml - Bumped scikit-learn version up to 1.2.2 to match requirements.txt. * modified: .github/workflows/python-package-conda.yml - Bumped scikit-learn version up to 1.2.2 to match environment.yml and requirements.txt. * modified: mlxtend/preprocessing/tests/test_transactionencoder.py - Updated `test_inverse_transform` to passing state by removing conversion to numpy array. * modified: .github/workflows/python-package-conda.yml - Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044 modified: environment.yml - Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044 modified: requirements.txt - Updated scikit-learn version to 1.3.1 to integerate fix from scikit-learn/scikit-learn#27044 * Update mlxtend/preprocessing/transactionencoder.py * Update mlxtend/preprocessing/transactionencoder.py * Update mlxtend/preprocessing/transactionencoder.py --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
Reference Issues/PRs
Closes #27037
What does this implement/fix? Explain your changes.
This PR correctly checks that the input is a dataframe before getting
index. The original issue happened becauselist.indexexists but is not a valid index.