[MRG] ENH: Adds inverse_transform to ColumnTransformer #11639
thomasjpfan wants to merge 55 commits into scikit-learn:main from
Conversation
```python
for name, trans, cols in self.transformers:
    col_indices = _get_column_indices(X, cols)
    if not all_indexes.isdisjoint(set(col_indices)):
        self._invertible = (False,
```
This doesn't appear to be covered by tests.
`test_column_transformer_inverse_transform_with_overlaping_slices` should cover this; I added the ValueError message to the assertion.
```python
if not Xs:
    # All transformers are None
    return np.zeros((X.shape[0], 0))
```
I added test_column_transformer_inverse_transform_all_transformers_drop to cover this.
```python
inverse_Xs[:, indices] = inverse_X

if self._X_is_sparse:
    return sparse.csr_matrix(inverse_Xs)
```
I added an assert to test_column_transformer_sparse_array to cover this.
```
Returns
-------
Xt : array-like, shape = [n_samples, n_features]
```
Note that it is a pandas DataFrame when ..
```python
try:
    import pandas as pd
    return pd.DataFrame(inverse_Xs, columns=self._X_columns)
except ImportError:
```
This makes no sense. Either we promise pandas or we don't. Changing the return type on the basis of not having a dependency can't happen.
```python
input_indices = []
for name, trans, cols in self.transformers:
    col_indices = _get_column_indices(X, cols)
    if not all_indexes.isdisjoint(set(col_indices)):
```
I'm probably missing something. I can't see where you update all_indexes to be non-empty. If I'm right, add tests
Good catch! It turns out `pytest.raises(..., match=...)` was needed to properly test the exception.
```python
input_indices.append(col_indices)

# check remainder
remainder_indices = self._remainder[2]
```
We should make _remainder a namedtuple so that this is more legible
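A minimal sketch of the namedtuple suggestion, assuming illustrative field names (the PR itself stores the remainder as a plain tuple whose third element holds the column indices):

```python
from collections import namedtuple

# Hypothetical field names for illustration only; not the PR's actual code.
Remainder = namedtuple("Remainder", ["name", "transformer", "indices"])

remainder = Remainder(name="remainder", transformer="drop", indices=[3, 4])

# remainder.indices reads more legibly than remainder[2],
# while tuple indexing still works for backward compatibility.
print(remainder.indices)  # [3, 4]
print(remainder[2])       # [3, 4]
```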
```python
input_indices.append(remainder_indices)

self._input_indices = input_indices
self._X_features = X.shape[1]
```
Perhaps name this _n_features_in
jorisvandenbossche left a comment
Thanks for working on it!
Added a few comments (didn't look at the tests yet)
```python
col_indices_set = set(col_indices)
if not all_indexes.isdisjoint(col_indices_set):
    self._invert_error = ("Unable to invert: transformers "
                          "contain overlaping columns")
```
overlaping -> overlapping
I also think the "Unable to invert" part is already included in the message in inverse_transform?
```python
        "contain overlaping columns")
    return
if trans == 'drop':
    self._invert_error = "'{}' drops columns".format(name)
```
I would add something explicitly saying that dropping columns is not supported
```python
    Private function to calcuate indicies for inverse_transform
    """
    # checks for overlap
    all_indexes = set()
```
minor nit, but maybe also use 'indices' instead of 'indexes' since all other variables do that?
```python
self._input_indices = input_indices
self._n_features_in = X.shape[1]
self._X_columns = X.columns if hasattr(X, 'columns') else None
```
maybe getattr(X, 'columns', None) ?
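For illustration, `getattr` with a default collapses the `hasattr` ternary into a single call. `FakeFrame` below is a stand-in for a pandas DataFrame, used only to keep the example self-contained:

```python
import numpy as np

class FakeFrame:
    # Stand-in for a pandas DataFrame, which exposes a .columns attribute.
    columns = ["a", "b"]

def get_columns(X):
    # Equivalent to: X.columns if hasattr(X, 'columns') else None
    return getattr(X, "columns", None)

print(get_columns(FakeFrame()))       # ['a', 'b']
print(get_columns(np.zeros((1, 2))))  # None (ndarrays have no .columns)
```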
```python
self._output_indices = []
cur_index = 0

Xs = self._fit_transform(X[0:1], None, _transform_one, fitted=True)
```
instead of transforming here again, we could also save the dimensions of the outputs in self.fit_transform itself?
or actually, you can also pass `Xs` if we want to keep the code here?
That is a great idea! Thank you!
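The suggestion above — recording each transformer's output width during fit_transform instead of re-transforming a slice of X — could be sketched like this (a simplified illustration, not the PR's actual code):

```python
import numpy as np

def record_output_indices(Xs):
    """Map each transformer's output block to a slice of the stacked result."""
    output_indices = []
    cur = 0
    for X in Xs:
        n_cols = X.shape[1]
        output_indices.append(slice(cur, cur + n_cols))
        cur += n_cols
    return output_indices

# Two transformer outputs: 3 columns and 1 column.
slices = record_output_indices([np.zeros((2, 3)), np.zeros((2, 1))])
print(slices)  # [slice(0, 3, None), slice(3, 4, None)]
```

With these slices saved on the estimator, inverse_transform can route each output block back to its transformer without calling transform again.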
```python
trans = FunctionTransformer(
    validate=False, accept_sparse=True, check_inverse=False)

inv_transformers.append((name, trans, sub, get_weight(name)))
```
it seems name is not used below, so not needed to pass it?
```python
    inverse_Xs = sparse.lil_matrix((Xs[0].shape[0],
                                    self._n_features_in))
else:
    inverse_Xs = np.zeros((Xs[0].shape[0], self._n_features_in))
```

```python
else:
    inverse_Xs[:, indices] = inverse_X.toarray()
else:
    if inverse_X.ndim == 1:
```
Are you referring to `inverse_Xs[:, indices] = ...` or `inverse_X.ndim == 1`?
`inverse_Xs[:, indices] = ...` runs when `inverse_X` is sparse and `X` is not sparse. `test_column_transformer_sparse_array` was updated to test this. `inverse_X.ndim == 1` runs when `inverse_X` is not sparse and only has one dimension. `test_column_transformer_sparse_stacking` tests for this use case.
I meant the second about inverse_x being 1-dimensional. I would think that any valid sklearn transformer should always return 2D output? (but maybe I am overlooking something). And so I would expect input for inverse_transform to have the same constraint.
preprocessing.LabelEncoder's transform and inverse_transform return 1-D arrays.
LabelEncoder shouldn't be used on X
This PR was updated to remove the check for 1D outputs.
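For context on the 1-D case discussed above, a 1-D transformer output can be aligned with a 2-D assignment by promoting it to a column first. This is illustrative only, since the PR ultimately removed the 1-D check:

```python
import numpy as np

inverse_Xs = np.zeros((3, 2))
inverse_X = np.array([1.0, 2.0, 3.0])  # a 1-D output such as LabelEncoder's

if inverse_X.ndim == 1:
    # reshape(-1, 1) turns shape (3,) into a single column of shape (3, 1)
    inverse_X = inverse_X.reshape(-1, 1)

# Assign the column block back into the reconstructed input.
inverse_Xs[:, [0]] = inverse_X
print(inverse_Xs[:, 0])  # [1. 2. 3.]
```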
Force-pushed 350652b to 81e3e68
Force-pushed b6db9ca to 360d09b
This pull request introduces 1 alert when merging 9d0cd00 into 7166cd5 - view on LGTM.com

new alerts:

Comment posted by LGTM.com
jnothman left a comment
I wonder if we could come up with a helpful example of this. I'm not sure it's necessary. Noting the available functionality in the user guide might be worthwhile.
Only a partial review.
```python
else:
    inverse_Xs[:, indices] = inverse_X.toarray()
else:
    if inverse_X.ndim == 1:
```

LabelEncoder shouldn't be used on X
sklearn/pipeline.py (Outdated)

```python
def _inverse_transform_one(transformer, X, weight, **fit_params):
    weight = weight or 1
```
I'd rather make 1 if weight is None else weight more explicitly... bool(weight) is not a good way to check for None
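The pitfall being flagged is that `weight or 1` treats any falsy weight, including a legitimate zero, as missing. A quick comparison (illustrative):

```python
def weight_or(weight):
    # Bool-based fallback: any falsy weight (including 0) becomes 1.
    return weight or 1

def weight_is_none(weight):
    # Explicit None check: only a truly missing weight becomes 1.
    return 1 if weight is None else weight

print(weight_or(0))          # 1  -- a zero weight is silently replaced
print(weight_is_none(0))     # 0
print(weight_is_none(None))  # 1
```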
A motivating example would be nice. I will see if there is a place to add it in

This modifies whats_new/v0.20.rst and should be updated

I have a use case internally where I want to use the preprocessor to process

@glemaitre thank you for the idea! Let's see if there is a way to integrate it into one of our examples 🤔

do you want to merge with master for another round of reviews?

@glemaitre What kind of insights do you get when you have the quantiles in the original scale?
```python
def _calculate_inverse_indices(self, X, Xs):
    """
    Private function to calcuate indices for inverse_transform
```

Minor typo:

```diff
- Private function to calcuate indices for inverse_transform
+ Private function to calculate indices for inverse_transform
```
Ouch. I don't remember what my use case was now. I should have directly given what I was programming at that time. It could have been linked to some neuroscience stuff, but I am unsure now.

Is this merge-able?

@Jude188 Not right now; do you have an example where this feature would be useful to you?

@thomasjpfan I've got a case where I'm using sklearn for only data preprocessing and not using an estimator. This is because the model that I have doesn't really have a … I'm sure I did a terrible job of explaining that!
Hello, any news on this PR? I'm working on interpretability and started adding an inverse_transform for ColumnTransformer myself. I stumbled upon this PR, and it would be extremely helpful to me.
Reference Issues/PRs
Fixes #11463
What does this implement/fix? Explain your changes.
- `inverse_transform` with overlap or `drop` will raise a `ValueError`
- `_calculate_inverse_indices` is used to connect indices from the output space back to the input space.
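The overlap rule described above can be sketched with set logic — a simplified illustration under the PR's stated semantics, not its exact implementation:

```python
def check_no_overlap(column_groups):
    """Raise ValueError if two transformers claim the same input column."""
    seen = set()
    for indices in column_groups:
        group = set(indices)
        if not seen.isdisjoint(group):
            raise ValueError("Unable to invert: transformers "
                             "contain overlapping columns")
        seen |= group

check_no_overlap([[0, 1], [2, 3]])      # disjoint groups: no error
try:
    check_no_overlap([[0, 1], [1, 2]])  # column 1 is claimed twice
except ValueError as exc:
    print(exc)
```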